Dear curious mind,
Curious about running AI locally instead of relying on cloud services? In this issue, I share the unfiltered reality of my first week attempting to ditch ChatGPT for local models. Spoiler alert: It hasn't exactly gone according to plan.
In this issue:
💡 Shared Insight
Breaking Free from the Cloud: Week One with Local AI Models
📰 AI Update
Llama 4 Arrives With 10 Million Tokens Context Length
Open Deep Search: The Free Alternative to Perplexity and ChatGPT Search
NotebookLM Levels Up: Mind Maps, Maintain Links to Sources, Discover Sources And More
🌟 Media Recommendation
Podcast: Naval Ravikant's Mind-Expanding Insights
💡 Shared Insight
Breaking Free from the Cloud: Week One with Local AI Models
Last week, I shared my intention to reduce my dependency on cloud AI providers in favour of running models locally. I realized I completely forgot to mention which models I'm actually using! For those of you with similar hardware (a GPU with 24GB memory), here are my current go-to local models:
General purpose:
Mistral Small 3.1 24B q4
Gemma 3 27B q4
Reasoning (thinking):
QWQ 32B q4
Coding:
Qwen2.5 Coder 32B q4
These models work quite well not just on PCs with dedicated GPUs, but also on Macs with M-chips and 24GB+ memory. The “q4” at the end of the model names stands for 4-bit quantization, which compresses the parameters to roughly a quarter of their usual 16-bit size while having only a minimal effect on response quality.
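To see why q4 matters on a 24 GB card, a quick back-of-the-envelope memory estimate helps. This is only a sketch: the parameter counts come from the model names above, and the ~10% overhead factor for the KV cache and activations is my own rough assumption.

```python
# Rough VRAM estimate for a quantized model.
# overhead covers KV cache and activations (rough assumption: +10%).
def vram_gb(params_billion: float, bits_per_param: float, overhead: float = 1.10) -> float:
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total * overhead / 1e9

for name, params in [("Mistral Small 3.1 24B", 24), ("Gemma 3 27B", 27)]:
    fp16 = vram_gb(params, 16)  # unquantized 16-bit weights
    q4 = vram_gb(params, 4)     # 4-bit quantized weights
    print(f"{name}: ~{fp16:.0f} GB at fp16, ~{q4:.0f} GB at q4")
```

At 16 bits, even the 24B model needs roughly 50 GB; at q4, both models land comfortably under 24 GB, which is exactly why these sizes are the sweet spot for this class of hardware.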
What's becoming increasingly clear in the local AI space is a split: there are excellent models that run well with 4-bit quantization on 24 GB of memory, and then there are the larger, more powerful models like DeepSeek V3, DeepSeek R1, and now Llama 4 (covered in the AI Update section of this issue), which require substantially more memory and, so far, cannot run on most personal devices.
My first week of this experiment has been... challenging. I'll be honest – I've still found myself turning to ChatGPT and Grok for some tasks, primarily because their outputs are presented more attractively than what I'm getting locally so far. I've come to realize that the UI/UX is just as important as the underlying model quality. I'm exploring options to improve this aspect. Beyond the popular SillyTavern and Open WebUI interfaces, I've discovered a new tool called ChatWise that looks quite interesting.
More updates to come as I continue this journey toward privacy-focused AI usage!
📰 AI Update
Llama 4 Arrives With 10 Million Tokens Context Length [Meta blog]
Meta has released the long-awaited next generation of Llama in the form of two Llama 4 models. Both use a mixture of experts (MoE) architecture, with 109 billion and 400 billion total parameters respectively. The smaller model, Llama 4 Scout, features an outstanding context length of 10 million tokens, vastly surpassing anything available so far. If this really works as announced, it will enable you to use large document collections or code repositories as background knowledge for your requests. Unfortunately, even the smaller of the two Llama 4 models cannot be run on consumer-grade GPUs like my GeForce 3090 with its 24 GB of VRAM.
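A quick calculation shows why Scout won't fit: an MoE model activates only some of its experts per token, but the full set of weights still has to sit in memory. Using the 109B total parameter count from Meta's announcement, and ignoring KV-cache and runtime overhead for simplicity:

```python
# Memory needed just for the weights of a model, ignoring all overhead.
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# Llama 4 Scout: 109B total parameters. Even with aggressive 4-bit
# quantization, the weights alone far exceed a 24 GB consumer GPU.
scout_q4 = weight_gb(109, 4)
print(f"Llama 4 Scout at q4: ~{scout_q4:.1f} GB of weights vs. 24 GB of VRAM")
```

Roughly 54 GB at q4 means multi-GPU rigs or machines with large unified memory are the realistic minimum for now.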
Open Deep Search: The Free Alternative to Perplexity and ChatGPT Search [GitHub repository]
Open Deep Search (ODS) is a new open-source AI framework from Sentient Foundation, a non-profit committed to advancing open-source AI, that rivals proprietary search tools like Perplexity and ChatGPT Search. By combining open-source models like DeepSeek-R1 with web search capabilities and advanced reasoning agents, ODS offers a customizable, high-performance alternative without vendor lock-in.
NotebookLM Levels Up: Mind Maps, Maintain Links to Sources, Discover Sources And More [Post by Josh Woodward, Google blog]
NotebookLM has recently introduced several key improvements. New features include Mind Maps for visualizing connections between ideas, Discover Sources for finding relevant web content, significantly enhanced PDF understanding capabilities, and improved citations with direct links to original sources. If you are open to using cloud-based tools, NotebookLM is currently the most powerful option for working with a model that grounds its responses in multiple documents.
🌟 Media Recommendation
Podcast: Naval Ravikant's Mind-Expanding Insights
In an incredible episode of Modern Wisdom, Naval Ravikant (AngelList co-founder) shares many insights that will transform how you think!
This three-hour conversation is full of Naval's thoughts about technology, relationships, mortality, and what truly matters in life. I really couldn't stop listening and found myself constantly marking highlights while listening to the episode in my favourite podcast player, Snipd (link to the episode in Snipd).
Three out of my countless highlights:
Say No by Default: Start saying "no" more often when you're not sure. When you're young, try new things by saying "yes" a lot. But as you get older and know what you want, protect your time by saying "no" to most requests. This isn't being mean—it's making space for what truly matters to you.
Thinking About the Big Questions: Remember that life is short, which helps make daily problems seem smaller. Kids naturally ask big questions like "Why are we here?" before they're taught to stop. Don't lose this curiosity. Keep asking these important questions because they help you figure out what really matters in your life.
Living in the Present: Each moment only happens once, so try to really experience it. When you worry about tomorrow or get stuck thinking about yesterday, you miss what's happening right now. Being distracted means missing real life as it happens. The best gift you can give yourself is to fully experience this moment, right here, right now.
If you're looking for something that will challenge your thinking and give you an entirely new perspective on success and fulfilment, listen to this podcast episode!
Disclaimer: This newsletter is written with the aid of AI. I use AI as an assistant to generate and optimize the text. However, the amount of AI used varies depending on the topic and the content. I always curate and edit the text myself to ensure quality and accuracy. The opinions and views expressed in this newsletter are my own and do not necessarily reflect those of the sources or the AI models.