Dear curious mind,
In a world where AI capabilities grow daily, so do concerns about data privacy. This issue shares how you can use the power of AI while keeping your sensitive information exactly where it belongs: under your control.
In this issue:
💡 Shared Insight
Running Open-Source LLMs Locally For Private AI Computing
📰 AI Update
DeepSeek-R1: The Open-Source Answer to OpenAI's o1
OpenAI Operator: The OpenAI Competitor to Anthropic's Computer Use Agent
ByteDance UI-TARS: The Open-Source Agent That Outperforms Anthropic's Computer Use
🌟 Media Recommendation
Podcast: Unlock o1's Full Potential with Strategies Shared in The Next Wave Podcast
💡 Shared Insight
Running Open-Source LLMs Locally For Private AI Computing
In today's AI-driven world, protecting sensitive data is important to me. While cloud-based LLMs offer impressive capabilities, they require sending your data to external servers. Even with strong privacy policies, there's always a risk when sharing sensitive information – whether it's medical records, financial data, personal communications, or confidential business information.
Running LLMs locally is the most privacy-friendly solution. When your data never leaves your device, you maintain complete control over your information.
However, setting up local LLM infrastructure requires a thoughtful approach. Let's explore how to find the right solution for your needs.
Start with Cloud Testing
Before investing in local setup, use cloud-based APIs (as described in the previous issue) to:
Test the capabilities of different open-source models
Compare performance against your current solutions
Experiment with smaller versions of the same models
Identify the minimum model size that meets your needs
This crucial first step helps you understand exactly what you need before making any hardware decisions. Services like OpenRouter and Groq make it easy to experiment with various models and sizes while paying only for what you use.
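For illustration, here is a minimal Python sketch of such an experiment against OpenRouter's OpenAI-compatible API; the exact model IDs and the use of the openai package are my own assumptions and may need adjusting to what is currently offered:

```python
# Compare open-source models of different sizes via OpenRouter's
# OpenAI-compatible API (assumes the `openai` package and an API key).
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # replace with your own key
)

# Example model IDs of different sizes; check openrouter.ai for current names.
models = [
    "meta-llama/llama-3.1-8b-instruct",
    "meta-llama/llama-3.1-70b-instruct",
]

prompt = "Summarize the key risks of sending medical records to cloud APIs."

for model in models:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```

Running the same prompt against a small and a large variant of the same model family quickly shows whether the smaller one is good enough for your use case.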
Determine Your Hardware Requirements
Once you've identified the model size that works for you, you can determine the hardware needed to run it locally. The requirements scale with model size, both in memory and in compute.
Memory requirements:
Depends on quantization (parameter compression):
fp16: 2 bytes per parameter (default size during training)
int8: 1 byte per parameter (minimal performance loss)
int4: 0.5 bytes per parameter (some performance impact)
As a rule of thumb: a model needs approximately its parameter count in GB of memory when using int8. This means a 7B model requires ~7GB of RAM to run.
Add at least 1GB of memory overhead for operating system requirements.
While sufficient memory determines whether you can run a model at all, the available compute determines how fast your text is generated.
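To make the rule of thumb concrete, here is a small Python sketch (not a specific tool, just the arithmetic from the list above) that estimates the memory needed for a given model size and quantization:

```python
# Rough memory estimate for running a local LLM, using the
# bytes-per-parameter figures above plus ~1GB OS overhead.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
OVERHEAD_GB = 1.0  # rule-of-thumb overhead for the operating system

def required_memory_gb(params_billions: float, quant: str) -> float:
    # 1 billion parameters at 1 byte each is roughly 1 GB.
    model_gb = params_billions * BYTES_PER_PARAM[quant]
    return model_gb + OVERHEAD_GB

for quant in ("fp16", "int8", "int4"):
    print(f"7B model at {quant}: ~{required_memory_gb(7, quant):.1f} GB")
# int8 gives ~8 GB (7GB model + 1GB overhead), matching the rule of thumb.
```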
Check Your Existing Hardware
With your target model size in mind, evaluate your current devices. Standard CPU-only PCs and laptops are suitable for smaller models (around 7B parameters), while gaming PCs with decent GPUs can handle medium-sized models efficiently. MacBooks with M-chips offer excellent performance for local AI computing, and even recent smartphones can run highly optimized smaller models. You might discover your existing hardware is sufficient, especially if you're willing to work with smaller, optimized models.
Specialized Hardware Options
If your existing hardware isn't sufficient, or you want a power-efficient, always-on device that serves your home or business, several options exist:
NVIDIA Jetson Orin Nano: The Entry-Level Option
Recently received a huge performance boost through a software update
8GB memory
$250 price point
Perfect for smaller quantized models
Mac Mini: The Budget-Friendly Powerhouse
Base model (M4, 16GB): $599
Advanced config (M4 Pro, 64GB): $1,999
Excellent balance of capabilities and cost
PC with High-End Gaming GPU: The Performance Option
NVIDIA GeForce RTX 3090/4090 with 24GB VRAM (or even the announced 5090 with 32GB)
Price range: $2,000-4,000 depending on configuration
Can run medium-sized models at high speed
Note: Not licensed for server/datacenter use
NVIDIA Project Digits: The Home AI Supercomputer
128GB shared memory
Power-efficient ARM processor from MediaTek
$3,000 price point (release announced for May 2025)
Can run large models at decent speed and is suited as an always-on system
Ideal for small companies or serious enthusiasts
Run LLMs Locally
For those ready to run models locally, I highly recommend starting with Ollama. It offers a straightforward way to download and run various open-source models from your terminal with minimal setup. At a later stage, you can use Ollama as a backend for applications like the browser plugin ChatHub (highlighted in the previous issue) or the even more powerful Open WebUI. If you prefer a built-in graphical interface, LM Studio is an excellent alternative, though be aware that it is not free for commercial use. While there are many other options available, these two provide the easiest entry point for most users.
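As a minimal sketch of what talking to a local model can look like (assuming Ollama is installed and a model such as llama3.2 has already been pulled with `ollama pull`), you can query the local server via its REST API from Python:

```python
# Query a locally running Ollama server (default port 11434).
# Assumes `ollama pull llama3.2` has been run beforehand.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.2",          # any model you have pulled locally
    "prompt": "Explain quantization in one paragraph.",
    "stream": False,              # return one JSON object instead of a stream
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())

print(result["response"])  # the generated text; data never leaves your machine
```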
The Future of Private AI Computing
Thanks to companies like Meta and DeepSeek sharing their models openly, running LLMs locally is becoming a viable option. The key is following a structured approach: test in the cloud first, identify your minimum requirements, and then choose the most cost-effective hardware solution that meets your needs. As models become more efficient and hardware options expand, truly private AI computing will only become more accessible to everyone.
📰 AI Update
DeepSeek-R1: The Open-Source Answer to OpenAI's o1 [GitHub repository]
The Chinese company DeepSeek continues to fulfill OpenAI's original mission: releasing state-of-the-art AI models for everyone as open source. DeepSeek's reasoning model R1 with 671B parameters is available for free on their website and at competitive prices through their API. They also offer smaller versions of their models that you can run on your own hardware. All releases are under the commercially friendly MIT license.
OpenAI Operator: The OpenAI Competitor to Anthropic's Computer Use Agent [OpenAI article]
OpenAI has released Operator, an AI agent system that can control a web browser to perform real-world tasks like booking restaurants, shopping, and buying tickets by interacting with websites just as a human would through mouse clicks and keyboard inputs. Operator is currently available to users with the $200 Pro subscription in the US.
ByteDance UI-TARS: The Open-Source Agent That Outperforms Anthropic's Computer Use [GitHub repository]
UI-TARS is a new AI agent that can control your computer by seeing and understanding what's on your screen and, according to the shared benchmarks, performs better than Anthropic's Computer Use. It was released in three sizes (2B, 7B, and 72B parameters), and surprisingly, the two smaller versions show relatively small performance differences compared to the large one. As the models are released as open source, you can run them on your own hardware.
🌟 Media Recommendation
Podcast: Unlock o1's Full Potential with Strategies Shared in The Next Wave Podcast
Matt Wolfe and Nathan Lands shared key insights about using o1 effectively in a recent episode of The Next Wave podcast. The hosts emphasized that most users significantly underestimate this model's capabilities by treating it like a regular chatbot.
The core strategy for maximizing o1's performance is to provide comprehensive context upfront instead of a back-and-forth chat conversation. While the model often takes minutes to generate a reply, the results are dramatically better when it is given extensive relevant information in the initial prompt.
The hosts recommend using o1 specifically for complex tasks like:
Software architecture and new feature development
Content creation requiring detailed context analysis
Projects needing careful consideration of multiple factors
For simpler tasks like minor code changes or basic requests, regular models like GPT-4o are more appropriate due to their faster response times.
Real-world example: Nathan used this approach to develop an impressive game prototype, with o1 handling complex coding tasks after being given comprehensive context about the project vision and technical requirements.
My take: This episode highlights that reasoning models like o1 are a fundamentally different tool that requires a new approach to unlock their potential. The key is shifting from conversation to providing comprehensive context in the initial request. I love to do this by transcribing my voice, as this minimizes the time I need to input all the information relevant to my problem.
A short sidenote: the token-to-word conversion in this episode is not correct. 100 tokens correspond to roughly 75 words, so o1's context size of 128k tokens amounts to about 96k words, which is around 300–350 pages and therefore (only) one book.
Secondly, the "Repo Prompt" passage confused me a bit. On 𝕏, podcast host Nathan shares a link to the Repo Prompt tool. The Mac application lets you select files and copy them together with a prompt to the clipboard, then paste it all as input to your AI chat interface. The tool also provides capabilities to integrate the results back into your own files. This workflow makes it easier to use your AI subscriptions with your own files.
Disclaimer: This newsletter is written with the aid of AI. I use AI as an assistant to generate and optimize the text. However, the amount of AI used varies depending on the topic and the content. I always curate and edit the text myself to ensure quality and accuracy. The opinions and views expressed in this newsletter are my own and do not necessarily reflect those of the sources or the AI models.