🤖 AidfulAI Newsletter #23: Meta's App Integration Catapults AI into the Mainstream
Dear curious minds,
Welcome to the ultimate newsletter for those interested in Artificial Intelligence (AI) and Personal Knowledge Management (PKM). This week's issue covers the following topics:
Meta Unveiled Emu: A New Image Generation Model
AI Goes Mainstream: Meta Releases AI Experiences to Their Apps
ChatGPT's New Voice and Image Features
LeoLM: A Leap Forward in German Language AI Research
Navigating the AI Venture Landscape: Differentiation, Innovation, and Opportunities for Startups (podcast)
In a small but significant shift, this month's newsletter is stepping away from the traditional format of designated sections for 'Major AI News,' 'Privacy-Friendly AI,' 'PKM and AI,' and 'Podcasts.' I've opted for a more fluid approach to content delivery. This change stems from a desire for flexibility: in the past, the structured layout felt limiting and sometimes forced me to find topics just to fill each category. By loosening these constraints, I hope to bring you a broader, more natural range of discussions that flow seamlessly from one subject to the next. Thank you for your continued readership, and I hope you find this new format engaging and informative.
And as always, if nothing sparks your interest, feel free to move on; otherwise, let's dive in!
🎨🖼 Meta Unveiled Emu: A New Image Generation Model
Meta released a new AI model, Emu (Expressive Media Universe), capable of generating highly realistic and aesthetically pleasing images from text prompts.
The breakthrough: "Quality tuning," a method where Emu was fine-tuned on just 2000 exceptionally high-quality images instead of a massive dataset. Automatic filters were initially used to sift through billions of images, but human annotators played a crucial role in curating the final 2000-image dataset.
When compared to its pre-trained base model and the state-of-the-art SDXL 1.0 model, Emu scored higher in terms of visual appeal and text faithfulness.
Emu is not limited to photorealistic images but excels even in non-photorealistic settings like sketches and cartoons.
My take: While Emu's capabilities are undeniably impressive, there has so far been no open-source release, nor any announcement that one will follow. Even though Meta has open-sourced groundbreaking work such as Llama and Segment Anything, the description in the research paper is currently the only publicly accessible information on Emu.
🤖📱 AI Goes Mainstream: Meta Releases AI Experiences to Their Apps
At Meta Connect, an event held on September 27–28, Meta announced various AI releases and integrations.
Besides the Emu image generation tool (covered above) and the announcements of the Ray-Ban Meta smart glasses and the Quest 3 mixed reality headset, there were major announcements of AI integrations in Meta's social media apps:
txt2stickers: Using Llama 2 and the image generation model Emu to create customized stickers based on your text inputs. This feature will first be rolled out to English-speaking users.
Image Editing: New tools announced for Instagram that enable creative image transformations, using Emu and Segment Anything to change the style or background of an image. Meta states that it is experimenting with visible and invisible markers to indicate that content is AI-generated.
Chatbot Meta AI: A conversational assistant coming to multiple Meta platforms. Besides WhatsApp, Messenger and Instagram, Meta AI is also announced for the Ray-Ban Meta smart glasses and Quest 3 mixed reality headset.
Niche AI Chatbots with Celebrities: Meta introduced a range of specialized chatbots played by celebrity personas, each covering a specific interest, from sports debates with Tom Brady to interactive pen-and-paper role-playing sessions with Snoop Dogg.
Furthermore, Meta announced AI Studio, a platform for creating new AIs and integrating them into Meta's social media apps, even without programming experience. It is aimed especially at businesses and creators who want to engage with their customers and followers.
My take: The integration of AI functionalities directly into commonly used platforms such as messaging apps marks a significant milestone for generative AI. While Meta addressed privacy concerns during the event and in a blog article, one key aspect that still requires clarity is how user data will be used for future AI model training. It is essential to know precisely what data is utilized and whether deleting previous chats or interactions effectively excludes them from future training runs. Transparency in data handling is crucial for ensuring user trust in these AI-driven systems.
🎙📸 ChatGPT's New Voice and Image Features
OpenAI announced the roll-out of two new features to their ChatGPT Plus and Enterprise customers over the next two weeks:
Voice Interaction: ChatGPT will enable voice conversations. You can ask questions, request stories, and have a back-and-forth chat using voice. This feature is initially coming to the mobile iOS and Android versions. Users can opt in via settings and choose from five different voices.
Image Understanding: You will be able to use ChatGPT to discuss and analyze images you upload. This is useful for various tasks, from meal planning based on your fridge's contents to work-related data analysis.
The blog article also states that OpenAI is working with companies like Spotify on specific use cases, such as translating podcasts into other languages using the podcaster's own voice.
My take: The competition is catching up, and OpenAI is releasing new features to stay ahead. However, the ability to upload images is nothing new; it is already available in Microsoft's Bing AI and Google's Bard. A plus is OpenAI's stated commitment to limiting ChatGPT's ability to analyze or comment on individuals in images in order to respect privacy.
🗣🇩🇪 LeoLM: A Leap Forward in German Language AI Research
A blog article announced LeoLM (Linguistically Enhanced Open Language Model), a suite of German-language foundation models developed by LAION, enabled by a compute grant on HessianAI's new supercomputer 42.
The models LeoLM/leo-hessianai-7b and LeoLM/leo-hessianai-13b are based on the Llama-2 architecture and were trained on a large corpus of high-quality German text to offer advanced language understanding. A model based on Llama-2-70b is slated for release at a later stage.
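Since both checkpoints are published under the LeoLM namespace on the Hugging Face Hub, they can in principle be loaded with the standard transformers API. The sketch below is my own illustration, not code from the announcement: only the two model IDs come from the blog article, while the helper name and generation settings are assumptions.

```python
# Hypothetical sketch: generating German text with a LeoLM checkpoint via
# Hugging Face transformers. Only the model IDs are from the announcement;
# the function name and generation settings are illustrative.

MODEL_ID_7B = "LeoLM/leo-hessianai-7b"
MODEL_ID_13B = "LeoLM/leo-hessianai-13b"

def generate_german(prompt: str, model_id: str = MODEL_ID_7B,
                    max_new_tokens: int = 64) -> str:
    """Return a German continuation of `prompt` (downloads the full weights)."""
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Note that these are base models, not chat-tuned ones, so they are best suited to plain text continuation rather than instruction following.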
The team also developed GermanBench, a collection of translated English benchmarks, to evaluate LeoLM's performance.
Despite these advancements, LeoLM's German performance still lags behind its English counterparts on some benchmarks, indicating room for improvement.
They released a PDF with examples generated by LeoLM.
My take: The emergence of LeoLM is a noteworthy development, especially when considering that up to now, larger foundation models have been the only viable option for generating high-quality text in languages like German. Smaller, open-source models have often struggled to achieve multilingual capabilities due to limited parameters. The advent of LeoLM could potentially change the landscape by providing a more privacy-focused alternative without sacrificing text quality, thereby reducing dependence on a few major organizations. This is a significant step forward in enhancing transparency and user control over data privacy.
🎙️🤖 Navigating the AI Venture Landscape: Differentiation, Innovation, and Opportunities for Startups (podcast)
Miles Grimshaw, General Partner at the venture capital firm Benchmark, was a guest in episode 90 of the Gradient Podcast.
The episode delves into the challenges and opportunities in the AI start-up landscape, with Grimshaw sharing insights from his venture capital experience.
Grimshaw highlights the critical need for AI startups to stand out by offering unique value propositions in a saturated market, and suggests moving beyond conventional "co-pilot" approaches in AI. He emphasizes the importance of innovating new business models and user experiences.
Grimshaw also notes that language models and diffusion models have potential applications in non-obvious domains like biology, and that generative AI deserves more exploration in such fields.
My take: It would have been interesting to hear Grimshaw's thoughts on how startups could integrate privacy-centric models as a point of differentiation. As consumers become more privacy-conscious, this could become a significant selling point.
Disclaimer: This newsletter is written with the aid of AI. I use AI as an assistant to generate and optimize the text. However, the amount of AI used varies depending on the topic and the content. I always curate and edit the text myself to ensure quality and accuracy. The opinions and views expressed in this newsletter are my own and do not necessarily reflect those of the sources or the AI models.