🤖 A Glimpse of Google's AI Power
Dear curious minds,
I bought a lifetime membership of the Small Bets community by Daniel Vassallo quite a while ago and really like the community and the talks from invited speakers. This week, a new program by Louie Bacaj and Chris Wong about creating and succeeding with your own newsletter started, and I signed up for it. It consists of six sessions in total, with practical exercises for building your own newsletter. Mine has existed for quite some time, but there is still so much to try and improve. Be prepared for a few experiments in this and the following issues.
In this week's issue, I bring to you the following topics:
OpenAI Competition: Google Gemini Unveiled
A Breakthrough In Character Animation
AI Prompt Writing Tips
Tech Term: Retrieval Augmented Generation
If nothing sparks your interest, feel free to move on; otherwise, let's dive in!
🤖🚀 OpenAI Competition: Google Gemini Unveiled
After initially postponing the release to January, Google announced and partially released its ChatGPT competitor Gemini on December 6, 2023.
Gemini comes in three versions:
Gemini Ultra (most capable for complex tasks),
Gemini Pro (versatile for a broad range of tasks),
Gemini Nano (efficient for on-device tasks, especially mobile).
Its groundbreaking feature is its multimodal design, which seamlessly integrates text, code, audio, image, and video inputs, unlike GPT-3 and GPT-4, which were only upgraded for multimodal tasks later.
Benchmark tests show Gemini Ultra outperforming GPT-4 in various domains, including reasoning, reading comprehension, and Python code generation, though it lags slightly in common-sense reasoning.
Notably, Gemini Pro excelled in image recognition, document understanding, and audio speech translation, outclassing OpenAI's models in these areas.
Gemini demonstrates remarkable capabilities in understanding and responding to visuals and text simultaneously, like interpreting and responding to drawings in real-time.
It can turn images into code, suggest creative uses for objects, and engage in logical problem-solving, showcased through various demos.
Gemini's coding capability is also highlighted, showing significant improvements in languages like Python, Java, C++, and Go.
However, Gemini won't have image generation capabilities at launch; this feature is planned for future updates.
Google emphasizes safety and responsibility in Gemini's development, with comprehensive evaluations for bias and toxicity, but remains secretive about its training data sources.
Gemini Pro is now integrated into Google products like Bard and will soon be available to developers via an API (a minimal sketch of such a call follows below), while Gemini Ultra is expected to launch next year.
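For developers who want to experiment once access opens up, here is a minimal sketch of what a Gemini Pro call could look like with Google's google-generativeai Python package. The package name, model identifier, and method names follow Google's developer announcement, but the API is brand new and may still change, so treat this as an illustration rather than a reference.

```python
# Minimal sketch: calling Gemini Pro via Google's Python SDK.
# Assumes `pip install google-generativeai` and an API key from Google AI Studio;
# the model name and method signatures may still change once the API is generally available.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # never hard-code a real key

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Summarize the three Gemini model sizes in one sentence each."
)
print(response.text)
```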
My take: It appears that we finally have a true competitor to OpenAI's GPT-4, but sadly, we can't use the Gemini Ultra model yet. The Gemini Pro version integrated into Bard is comparable to the free version of ChatGPT, but not outstanding. The lack of multimodality in the current version is especially disappointing. The documentation Google shared is very detailed but lacks information about the training data; most likely, Google is trying to avoid copyright disputes.
🎥🤖 A Breakthrough in Character Animation
Alibaba Group's Intelligent Computing Institute has introduced Animate Anyone, a new framework that uses diffusion models for character animation.
The system animates characters from still images, preserving detailed features and ensuring smooth motion.
The method excels in fashion video synthesis and human dance generation, outperforming other image-to-video approaches.
Demonstrations include animation of various character types, including humans and cartoons.
My take: Fashion models and social media influencers face new challenges, as AI tools like Animate Anyone could potentially reduce demand for their services. While these developments are exciting for the tech world, they bring a mixed bag of consequences for creative professionals.
🤖✍️ AI Prompt Writing Tips
Notion published a blog article that emphasizes the importance of skillfully crafting AI prompts to effectively harness the power of generative AI. The tips and insights can be used directly in Notion, but they apply to any other tool where you prompt an AI.
Tips for Effective Prompts:
Speak Normally: Treat the AI like a human conversation partner. There is no benefit in talking robotically to it the way you might to today's smart speakers like Alexa.
Be Concise: Provide clarity without unnecessary complexity to avoid misinterpretation.
Avoid Negative Phrases: Phrase requests positively to prevent misdirection.
Include Necessary Details: Adding more details in prompts leads to more accurate responses.
Assign an Identity to the Model: Stating something like βYou are a marketerβ helps the AI focus on relevant responses.
Force the AI to move slowly: Asking the AI to think step by step, or even to take a deep breath before crafting a reply, often improves the results.
To maximize AI efficiency, iteratively refine your prompt, and if the results are still not as desired, add 'few-shot examples' to guide the AI's responses (see the sketch below).
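To make that concrete, here is a small, hypothetical Python sketch that assembles a few-shot prompt: an assigned identity, a couple of example pairs, and the actual request. The example products and taglines are my own illustration, not taken from Notion's article; the resulting prompt can be pasted into Notion AI, ChatGPT, or any other tool.

```python
# A hypothetical few-shot prompt, assembled as a plain string.
# The identity, example pairs, and final request are illustrative only.
identity = "You are a marketer who writes short, punchy product taglines."

examples = [
    ("Noise-cancelling headphones", "Silence the world. Hear what matters."),
    ("Reusable water bottle", "One bottle. Zero waste."),
]

request = "Smart desk lamp that adapts to your focus"

prompt_lines = [identity, "", "Here are a few examples:"]
for product, tagline in examples:
    prompt_lines.append(f"Product: {product}\nTagline: {tagline}")
prompt_lines.append(f"\nProduct: {request}\nTagline:")

prompt = "\n".join(prompt_lines)
print(prompt)  # paste this into whichever AI tool you use
```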
The article also points out that the AI occasionally replies with inaccuracies, termed "hallucinations", which underscores the significant human effort still required to guide it.
Nevertheless, it acknowledges the vast potential AI offers in complementing human capabilities.
My take: The article is a nice summary of how to prompt an AI. The area is still quite new and likely to change, but for now, it pays to apply the tips and tricks mentioned in the article.
💻📖 Tech Term: Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a method used with AI language models such as ChatGPT. It combines two things: a language model and a system for searching for information. Here's how it works:
Language Model: This is the part that generates text. It's trained on a huge amount of data and learns how to write sentences that make sense.
Retrieval System: This is like a search engine inside the model. When the model gets a question or needs information to answer something, this system searches through a database to find relevant information.
So, when RAG is used, the language model gets help from the retrieval system, much as you might use Google to look up facts while writing an essay. The retrieval system finds the information, and the language model uses it to give better answers.
It's helpful because the language model alone might not always have the most recent or specific info. By using RAG, the model can give answers that are more accurate and up-to-date.
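To make the flow tangible, here is a deliberately tiny, self-contained Python sketch of the RAG pattern: a toy "database" of facts, a naive keyword-based retrieval step, and a prompt that hands the retrieved facts to a language model. Real systems use embeddings and vector search instead of keyword matching, and the documents below are made up, so read this as a conceptual sketch only.

```python
# Toy illustration of the RAG flow: retrieve relevant facts, then let the
# language model answer using them. Real systems use embeddings and vector
# search; the keyword overlap below is only a stand-in for the retrieval step.

DOCUMENTS = [
    "Gemini Ultra is Google's most capable model for complex tasks.",
    "Gemini Nano is designed to run efficiently on mobile devices.",
    "Retrieval Augmented Generation combines a retriever with a language model.",
]

def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many of the question's words they contain."""
    words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question: str) -> str:
    """Combine the retrieved facts with the question into one prompt."""
    context = "\n".join(retrieve(question, DOCUMENTS))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# The resulting prompt would then be sent to the language model of your choice.
print(build_prompt("Which Gemini model runs on mobile devices?"))
```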
❓🤔 Mind Bender
What are the privacy concerns when using AI in personal knowledge management (PKM)?
Think about the question and what your own answer would be. If you are curious what GPT-4 replies to this prompt, take a look here. You are welcome to share any thoughts by replying to this mail. I would love to hear from you!
Disclaimer: This newsletter is written with the aid of AI. I use AI as an assistant to generate and optimize the text. However, the amount of AI used varies depending on the topic and the content. I always curate and edit the text myself to ensure quality and accuracy. The opinions and views expressed in this newsletter are my own and do not necessarily reflect those of the sources or the AI models.
This publication is free. If you'd like to support me, please recommend this newsletter to anyone you think would enjoy it!