Your Data. Your Model. Your Machine.
Stop sending your sensitive documents to someone else's server. With LM Studio, you run powerful open-source models locally: zero cost, zero tracking, and 100% offline.
Download LM Studio, download a model (like Llama 3.2), and start the local server. Your AI is now available at http://localhost:1234/v1—fully compatible with the OpenAI API.
Recommended Models (2026)
Choose a model based on your hardware. Q4_K_M quantization is the gold standard for speed and intelligence.
Hardware & Model Guide
- Model: Llama 3.2 3B
- RAM: 4 GB
- Best for: Summarization, email drafts, basic chat.
- Model: Phi-4 Mini / Qwen 2.5 7B
- RAM: 6-8 GB
- Best for: Logic-heavy tasks and writing code.
- Model: Qwen 2.5 14B / DeepSeek R1
- RAM: 16 GB+
- Best for: Complex research and creative writing.
The 11 Essential Integrations
Local AI Ecosystem
Copilot Plugin: Chat with your entire vault. Ask: "What do I know about [topic]?"
Local RAG: Drop in PDFs/Folders. Get answers with local citations.
Browser Sidekick: Summarize any webpage or YouTube video directly in Chrome/Arc.
CLI Patterns: Pipe content to 200+ pre-made "patterns" (e.g., extract_wisdom).
Power User: Automating with n8n
Connect local AI to 400+ apps. Your data stays in your network while your agents work 24/7.
- Trigger: New RSS post or Email received.
- Process: HTTP Request to
http://host.docker.internal:1234/v1. - Action: Summarize text and save to Notion or send to Slack.
- Key Note: Use
host.docker.internalinstead oflocalhostinside Docker containers.
Quick Reference & Troubleshooting
Integration Summary
- HARPA AI / Raycast
- Setup: < 5 mins
- Usage: Browser/Keyboard
- n8n / Fabric
- Setup: 20 mins
- Usage: Automation/CLI
If your browser extension can't reach LM Studio, enable the CORS toggle in the Developer tab and restart the server.
Key Takeaways
Local models mean your internal company docs, medical records, or personal journals never touch the cloud.
Almost any tool that supports OpenAI can be "tricked" into using your local model by changing the Base URL.
A 4-bit quantized model (Q4) is 4x smaller and significantly faster than the original with almost zero loss in perceptible intelligence.