AI becomes our new conversation partner
Dear curious mind,
Before we dive into this issue, I want to address a recent technical hiccup. In the last two issues, some readers may have noticed inaccessible images in the emails. I apologize for any inconvenience this may have caused. The problem has been fixed in the web versions of these newsletters. You can find the updated versions at "The secret to free Claude 3.5 Sonnet usage" and "Dragon vs eagle: The AI battle heats up".
Now, let's explore the exciting world of AI developments in this latest edition of Aidful News. In this issue:
Shared Insight
The Rise of Voice-Based AI: Are Conversations with AIs Our Future?
AI Update
The Biggest Reveals from OpenAI's DevDay 2024
Canvas: OpenAI Introduces New Interface for Writing and Coding with ChatGPT
Type with Your Voice: Wispr Flow's Innovative AI Solution
Media Recommendation
Andrej Karpathy's AI-Curated Podcast: "Histories of Mysteries"
Shared Insight
The Rise of Voice-Based AI: Are Conversations with AIs Our Future?
The landscape of human-AI interaction is rapidly evolving, with recent developments suggesting a significant shift towards voice-based interfaces. This trend raises the question: are conversations with AIs becoming our primary mode of interaction with technology?
Several recent advancements highlight this shift:
NotebookLM's Natural Audio Generation: Google's NotebookLM has been generating attention for its ability to produce audio that feels remarkably natural. This development showcases AI's growing capability to engage in human-like vocal interactions, blurring the line between artificial and human voices.
ChatGPT's Advanced Voice Mode: OpenAI has made significant strides with the global release of ChatGPT's Advanced Voice Mode. While it was initially unavailable to Plus users in Europe, recent announcements indicate a rollout in the UK and access for EU business customers and team accounts. The remaining users will likely follow.
Wispr Flow: Voice as a Keyboard Replacement: The release of Wispr Flow (highlighted in the AI Update section of this issue), which transcribes voice input and markets itself as a keyboard replacement, further underscores the move towards voice-based interfaces. This tool hints at a future where typing could become secondary to speaking for many digital interactions.
These advancements suggest we're heading towards a future in which conversing with AI is as natural as typing on a keyboard. This shift to voice-based AI could make technology significantly more accessible, especially for people with physical or visual impairments. And because speaking is often faster than typing, it could enhance productivity for everyone else as well. At the same time, more human-like AI voices could break down barriers and make technology feel more approachable.
However, this trend also raises questions:
Privacy Concerns: How will constant voice interaction affect our privacy in public and shared spaces?
Dependence: Could over-reliance on voice AI impact our ability to communicate effectively with humans?
Cultural Impact: How might widespread adoption of voice AI change social norms and the way people communicate with each other?
As we approach this potentially transformative shift, it's important to think about both the exciting possibilities and the challenges that come with a more conversational relationship with AI. The future may indeed be spoken, but how we navigate this new landscape will determine whether it enhances or complicates our digital lives.
AI Update
The Biggest Reveals from OpenAI's DevDay 2024
OpenAI kicked off DevDay 2024 in San Francisco on October 1st, the first of three events, with London and Singapore to follow. The event primarily targeted developers, but it offers everyone a glimpse into the future as OpenAI envisions it.
OpenAI demonstrated the o1 model's impressive capabilities in rapid application development. In a live demo, an iPhone app was built end-to-end with a single prompt in less than 30 seconds. This shows the potential for dramatically accelerated software development cycles.
Key announcements include:
Realtime API: This new API allows developers to build functionality similar to ChatGPT's Advanced Voice Mode within their own apps. It enables real-time speech-to-speech interactions, opening up possibilities to develop more immersive and natural AI-powered experiences. However, doing this is not cheap. It is priced at $0.06 per minute of audio input and $0.24 per minute of audio output.
o1 Model Enhancements: OpenAI's recently released o1 model, known for its superior reasoning capabilities, received a significant boost with increased API rate limits matching those of GPT-4o (10,000 requests per minute).
Cost Reductions: Repeated GPT-4o input is now cheaper thanks to automatic prompt caching. When a request begins with the same long prefix as a recent one (for example, an identical system prompt or shared context), the already processed input tokens are reused and billed at a 50% discount. The model still generates a fresh response every time; only the repeated part of the prompt is not reprocessed. This reduces cost and latency, benefiting developers without requiring additional effort.
Multi-modal Fine-tuning: A new tool allows users to customize GPT-4o's understanding of both images and text after its initial training. This fine-tuning process is like giving the AI extra lessons on specific topics you care about, making it more versatile and better suited to your needs. It's an additional step that adapts the model to your data, enhancing its ability to work with visual and textual information in ways that are most relevant to you.
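To get a feeling for what the Realtime API pricing mentioned above means in practice, here is a minimal sketch that estimates the cost of a single voice session. It only uses the two per-minute prices quoted in this issue; the function name and the example conversation are my own assumptions for illustration, and actual billing may be calculated differently.

```python
# Per-minute audio prices in USD as quoted in this issue.
# Assumption: cost scales linearly with minutes; real billing may differ.
AUDIO_INPUT_PER_MIN = 0.06
AUDIO_OUTPUT_PER_MIN = 0.24


def realtime_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimate the cost in USD of one speech-to-speech session."""
    cost = input_minutes * AUDIO_INPUT_PER_MIN + output_minutes * AUDIO_OUTPUT_PER_MIN
    return round(cost, 2)


# Hypothetical 10-minute conversation: the user speaks for 4 minutes,
# the AI answers for 6 minutes.
print(realtime_cost(4, 6))  # 1.68
```

A single conversation costs less than two dollars in this example, but the numbers add up quickly for an app with many users who talk to it every day.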
My take: OpenAI's DevDay 2024 demonstrates the company's commitment to diversifying AI models for different purposes rather than pursuing a one-size-fits-all approach. The introduction of the Realtime API promises to revolutionize how developers interact with and implement AI in their applications. The rapid pace of innovation in AI development tools is truly remarkable, enabling individual developers to create applications that would have required entire teams just a few years ago. It's an exciting time for AI development, and I'm eager to see how these new tools will be utilized in the coming months.
Canvas: OpenAI Introduces New Interface for Writing and Coding with ChatGPT
OpenAI has unveiled Canvas, a new interface for ChatGPT that enhances collaboration on writing and coding projects beyond simple chat interactions.
Canvas opens in a separate window, allowing users to work side-by-side with ChatGPT on more complex tasks. It's built with GPT-4o and is currently in beta.
Key features of Canvas include:
Inline feedback and suggestions from ChatGPT
Highlighting specific sections to ask for modifications or feedback
Direct text and code editing by users
Shortcuts for common writing and coding tasks
Version history with a back button
Writing shortcuts in Canvas include suggesting edits, adjusting length, changing reading level, adding final polish, and inserting emojis.
Coding shortcuts include reviewing code, adding logs and comments, fixing bugs, and porting to different programming languages.
Canvas rolled out to ChatGPT Plus and Team users globally, with Enterprise and Edu users getting access next week. It's planned to be available to all ChatGPT Free users once out of beta.
OpenAI trained GPT-4o specifically for this collaborative role, developing core behaviors like triggering the canvas, generating diverse content, making targeted edits, and providing inline critique.
My take: By providing a more visual and interactive workspace, Canvas could greatly enhance productivity for writers and coders. However, unlike Anthropic's Artifacts feature, Canvas currently does not execute code, which limits its functionality for real-time coding collaboration. Additionally, the current lack of change highlighting makes it challenging to quickly assess and review AI-applied modifications. This could slow down the review process, especially for larger projects. While Canvas is a step in the right direction for more interactive AI assistance, these limitations show there's still room for improvement in making AI collaboration tools more user-friendly and efficient.
Type with Your Voice: Wispr Flow's Innovative AI Solution
Wispr has launched Flow, an advanced AI-powered voice typing tool that promises to revolutionize how we write on a Mac.
Flow integrates seamlessly with all applications on your computer, allowing users to "just speak, and Flow writes for you, everywhere on your computer."
Key features that set Flow apart from other dictation tools:
Auto-edits: Flow can automatically correct and refine your dictation, even mid-sentence.
Sounds like you: Adapts to your personal writing style and even adjusts it to the app you are using.
Strengthens your writing: Offers suggestions to enhance your text.
Accurate name recognition: Correctly handles even uncommon names.
Early adoption data shows impressive results:
Increased typing speed by nearly 4x compared to keyboard input, allowing users to dictate at up to 220 words per minute (WPM).
Even users who type over 100 WPM are choosing Flow for its efficiency.
After two weeks, most users prefer Flow to their keyboards.
Jeremy Howard, renowned AI researcher and founder of fast.ai, shared on X that he now uses voice typing for the majority of his work.
Even coding, at least with AI assistance, is a usage scenario, as Mckay Wrigley shared in a video on X. He uses Flow in the code editor Cursor to create a Perplexity clone in just 8 minutes, showcasing its potential for rapid development.
Flow offers a free tier with 2,000 words per week and a Pro plan at $12/month for unlimited words and advanced features.
My take: The endorsement from industry leaders and early adoption data suggest that Flow could indeed be a game-changer in how we interact with our computers.
If you are a Mac user, you might also want to take a look at the competitors, namely MacWhisper and SuperWhisper. The latter offers the usage of local AI models, which is a privacy-friendly alternative.
To my knowledge, there is so far no comparable application for Windows or Linux users. However, if voice typing proves successful, we will likely see the existing tools launch on other platforms and new alternatives emerge.
Media Recommendation
Andrej Karpathy's AI-Curated Podcast: "Histories of Mysteries"
Andrej Karpathy, a prominent figure in the AI community, has recently shared his latest project on X: an AI-curated podcast called "Histories of Mysteries".
The podcast consists of 10 episodes, each focusing on a different historical mystery or intriguing topic.
What makes this podcast unique is its creation process, which heavily leveraged AI tools:
Research was conducted using ChatGPT, Claude, and Google
NotebookLM was used to generate podcast audio and write descriptions
Ideogram created digital art for episodes and the podcast itself
The entire curation process took Karpathy only about two hours, showcasing the potential of AI in content creation and curation.
You can find the podcast on Spotify.
My take: This project demonstrates the incredible leverage AI tools can provide in content creation. While it raises questions about the future of AI-generated content on the internet, it also opens up exciting possibilities for individuals to curate and share knowledge efficiently. I recommend giving the podcast a listen during your next walk or drive to experience this AI-curated content firsthand and form your own opinion on its quality and implications.
Disclaimer: This newsletter is written with the aid of AI. I use AI as an assistant to generate and optimize the text. However, the amount of AI used varies depending on the topic and the content. I always curate and edit the text myself to ensure quality and accuracy. The opinions and views expressed in this newsletter are my own and do not necessarily reflect those of the sources or the AI models.