Dear curious mind,
DeepSeek's r1 model release shocked the AI landscape and triggered both a stock market panic and a wave of competitive releases from other companies. But behind the headlines lies a deeper shift: reasoning models are fundamentally changing how AI operates – and how we should use it.
In this issue:
💡 Shared Insight
Reasoning Models: The LLM Evolution That We Need to Use Differently
📰 AI Update
Mistral Small 3: Low-Latency AI That Fits on Your GPU
Qwen: Alibaba Released Vision, Long Context, and a Competitive Base Model
OpenAI Counters DeepSeek r1 with o3-mini
🌟 Media Recommendation
DeepSeek's AI Breakthrough: Why the Market Panic Misses the Bigger Picture
Anthropic's CEO Perspective on DeepSeek and AI Export Controls
💡 Shared Insight
Reasoning Models: The LLM Evolution That We Need to Use Differently
The landscape of Large Language Models (LLMs) is undergoing a fundamental shift with the rise of reasoning models. While traditional chat models are very good at back-and-forth conversations, these new AI systems are designed for deeper analysis and complex problem-solving. This evolution represents more than an incremental improvement - it's changing how we interact with LLMs.
OpenAI pioneered the reasoning model approach with their o1 models in September 2024, and other major players are now following. Google has released an experimental version of its Gemini 2.0 Flash Thinking model, Alibaba introduced the open QwQ, and DeepSeek launched the currently hyped and also open r1. Major players including Anthropic, xAI, Meta, and Mistral are expected to release their first reasoning models in the coming weeks.
As Dario Amodei, CEO of Anthropic, points out, this is a relatively new research area with significant potential gains from relatively low investment. This industry-wide movement signals a fundamental shift in AI development.
Understanding the Technology
Traditional AI development focused on making training runs bigger. The new approach uses more compute during actual usage (inference), enabling deeper reasoning capabilities and more sophisticated problem-solving abilities. This seemingly simple shift opens up new possibilities for tackling complex problems.
To achieve this, reasoning models are trained with reinforcement learning techniques that bake the well-known "chain of thought" answering style into the model itself. As a result, the models generate a large number of potential reasoning paths that are then used to produce the final answer. OpenAI, who pioneered this approach, keeps the reasoning process hidden, likely to prevent competitors from learning from and replicating their model's thought patterns. In contrast, all other reasoning models so far make their step-by-step logic visible, offering fascinating insights into how these systems actually think through problems.
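One simple way to picture how multiple reasoning paths can be combined into a final answer is the self-consistency technique: sample several chains of thought and take a majority vote over their final answers. The sketch below illustrates only that voting idea; `sample_chain_of_thought` is a hypothetical stand-in for a real LLM call, with canned answers for demonstration:

```python
from collections import Counter

def sample_chain_of_thought(question: str, seed: int) -> str:
    # Hypothetical stand-in for one sampled LLM reasoning path.
    # Here we fake runs that mostly agree on the answer "42".
    fake_runs = {0: "42", 1: "42", 2: "41"}
    return fake_runs[seed % 3]

def self_consistent_answer(question: str, n_samples: int = 3) -> str:
    """Sample several reasoning paths, then majority-vote the final answers."""
    answers = [sample_chain_of_thought(question, seed) for seed in range(n_samples)]
    most_common, _count = Counter(answers).most_common(1)[0]
    return most_common

print(self_consistent_answer("What is 6 * 7?"))  # prints "42"
```

Real reasoning models fold this kind of exploration into a single long chain of thought rather than explicit voting, but the intuition - spend more inference compute, aggregate it into one answer - is the same.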
Guidelines for Effective Use
Reasoning models offer significant advantages in accuracy through their built-in self-correcting mechanisms, though their outputs should still be verified. As demonstrated in Ben Hylak's case study published in the Latent Space newsletter, these models work best when approached differently than traditional chat models. Rather than engaging in back-and-forth conversations, reasoning models excel when provided comprehensive context upfront, functioning more like "report generators".
When choosing between reasoning and chat models, consider the complexity of your task - use reasoning models for complex problems, coding tasks, and detailed analysis, while sticking with chat models for simple questions and quick interactions. The key to success with reasoning models lies in how you structure your inputs. Provide detailed initial prompts with all relevant background information and clear desired outcomes. While these models may take longer to respond, the trade-off comes in the form of higher quality, more thoughtfully reasoned answers.
Hylak suggests using voice memos as an effective way to capture and provide detailed context, especially for complex problems. This approach allows you to comprehensively explain your needs without the constraints of typed text, leading to better results from the reasoning model.
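The "comprehensive context upfront" advice can be made concrete with a small prompt-building helper. The section names and example content below are purely illustrative, not a prescribed format - the point is front-loading goal, background, constraints, and the desired outcome into one prompt instead of a chat exchange:

```python
def build_reasoning_prompt(goal: str, background: str,
                           constraints: list[str], desired_output: str) -> str:
    """Assemble one detailed, front-loaded prompt for a reasoning model."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Goal:\n{goal}\n\n"
        f"Background:\n{background}\n\n"
        f"Constraints:\n{constraint_lines}\n\n"
        f"Desired output:\n{desired_output}\n"
    )

prompt = build_reasoning_prompt(
    goal="Review this database schema for scaling problems.",
    background="Postgres 15, roughly 2M new rows per day, read-heavy analytics.",
    constraints=["No downtime migrations", "Keep the ORM layer unchanged"],
    desired_output="A prioritized report of risks with suggested fixes.",
)
print(prompt)
```

A voice memo transcript works well as the `background` section: dictate everything you know about the problem, paste the transcript in, and let the model do the structuring.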
Looking Ahead
In the future, LLM providers will likely offer automatic model selection that chooses the best-suited model for each request, similar to how ChatGPT currently adds web search results when a request is likely to benefit from recent information. Until then, understanding these differences helps you manually select the appropriate tool for your needs.
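Until providers route requests automatically, a crude manual heuristic might look like the toy router below. The model names and keyword list are purely illustrative assumptions, not any provider's actual routing logic:

```python
def pick_model(task: str) -> str:
    """Toy router: send complex analytical work to a reasoning model."""
    complex_markers = ("analyze", "debug", "prove", "design", "refactor")
    if any(marker in task.lower() for marker in complex_markers):
        return "reasoning-model"  # e.g. an o1- or r1-class model
    return "chat-model"           # fast conversational model for quick questions

print(pick_model("Debug this race condition"))  # prints "reasoning-model"
print(pick_model("What's the weather like?"))   # prints "chat-model"
```

In practice the decision is a judgment call about task complexity rather than keyword matching, but the split - slow and thorough versus fast and conversational - is the one described above.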
Beyond human interactions, AI agents leveraging LLMs as tools will particularly benefit from reasoning models' enhanced capabilities - their improved accuracy leads to better task completion, and the longer processing times are often inconsequential for automated workflows that don't require real-time responses.
📰 AI Update
Mistral Small 3: Low-Latency AI That Fits on Your GPU [Mistral blog]
The 24B-parameter model matches the performance of Llama 3.3 70B on various benchmarks. Its smaller size makes it possible to run the model at decent speed on consumer hardware. Mistral AI also teases upcoming models with enhanced reasoning capabilities, signaling a new wave of innovation in open-source AI.
Qwen: Alibaba Released Vision, Long Context, and a Competitive Base Model [Qwen blog: VL, 1M, Max]
In a series of three major releases, Alibaba's Qwen team introduced Qwen2.5 VL, an openly available vision-language model (in 3B, 7B, and 72B parameter sizes) with enhanced capabilities in visual recognition, document parsing, and video understanding, alongside Qwen2.5-1M, open-source models (7B and 14B parameters) capable of processing up to 1 million tokens. The third release, Qwen2.5-Max with 325B parameters, shows competitive performance against other non-reasoning models on various benchmarks, but it is not openly available and is only accessible through Alibaba Cloud's API and chat services.
OpenAI Counters DeepSeek r1 with o3-mini [OpenAI blog]
Responding to DeepSeek's r1 reasoning model, OpenAI released o3-mini, a cost-efficient AI model with a special "high" reasoning mode that takes longer but delivers enhanced performance through an extended reasoning process. The model matches or exceeds previous models' capabilities while being 24% faster in standard mode. In a departure from previous launches, o3-mini is also immediately available in the EU via ChatGPT and the API.
🌟 Media Recommendation
DeepSeek's AI Breakthrough: Why the Market Panic Misses the Bigger Picture
A seismic shift hit tech stocks this week after the Chinese AI company DeepSeek demonstrated something remarkable - they trained a competitive AI model for just $5.576M. Ben Thompson's analysis in his article "DeepSeek FAQ" brilliantly unpacks why this revelation sent shockwaves through Wall Street, while also highlighting why the market's knee-jerk reaction might be shortsighted.
The story centers on DeepSeek CEO Liang Wenfeng's bold stance on open source, which he laid out in a 2024 interview: "In the face of disruptive technologies, moats created by closed source are temporary," he declared. While many other companies guard their AI secrets, DeepSeek openly shares its models and how they were trained.
While many investors panicked about Nvidia's future as DeepSeek showed that competitive models can be trained with way less compute, Thompson's analysis reveals a more nuanced reality. The emergence of reasoning models like DeepSeek's r1 actually demands more compute power for inference. Additionally, the efficiency gains in training could enable companies to explore more experimental approaches and model architectures - potentially driving even greater demand for computing power. In other words, DeepSeek's breakthrough might actually accelerate GPU adoption instead of slowing it down.
My take: Thompson's deep dive captures a pivotal moment in AI development. DeepSeek's achievement isn't about reducing compute needs - it's about making AI more accessible and efficient, which historically has led to increased overall usage. This could ultimately accelerate AI adoption across industries, driving even greater demand for computing infrastructure.
Anthropic's CEO Perspective on DeepSeek and AI Export Controls
Dario Amodei, co-founder and CEO of Anthropic, shared his analysis of DeepSeek's recent model releases and their implications for US export controls on AI chips to China.
Key points from his perspective:
DeepSeek's v3 and r1 models represent expected progress in AI development rather than breakthrough innovations
He draws parallels to Anthropic's own trajectory, noting that Claude 3.5 Sonnet, trained months after GPT-4, achieved similar or better quality at significantly lower cost
DeepSeek reportedly has around 50,000 Hopper-generation chips, despite the existing export restrictions
Future scenarios according to Amodei:
Based on scaling laws, superintelligent AI systems are likely to emerge in 2026-2027
Two possible futures:
US companies and allies exclusively possess advanced AI capabilities
Both US and Chinese companies achieve superintelligence, with China needing access to advanced US chips to get there
He advocates for stricter export controls to prevent the second scenario
My take: While Amodei makes valid points about technological progress, the competition and open-source approach by DeepSeek is valuable for the AI community. Having advanced AI capabilities concentrated solely in US companies could pose greater risks than a more distributed development landscape. The open sharing of knowledge and models, as practiced by DeepSeek and Meta, leads to more responsible and transparent AI development compared to closed, proprietary systems.
Disclaimer: This newsletter is written with the aid of AI. I use AI as an assistant to generate and optimize the text. However, the amount of AI used varies depending on the topic and the content. I always curate and edit the text myself to ensure quality and accuracy. The opinions and views expressed in this newsletter are my own and do not necessarily reflect those of the sources or the AI models.