Run state-of-the-art AI models on your own hardware. No API keys, no billing meters, no data leaving your machine. One command. Permanent savings.
The Full Story
From AI's origins to running your own models — with interactive visualizations at every step. Each section builds on the last, culminating in a live cost comparison.
The foundry CLI covers the full workflow: model run, list, download, info, load, and unload, plus service control commands.
Why Foundry Local
Foundry Local doesn't wrap cloud AI — it replaces it at the infrastructure level. Same application code, same API format, different endpoint: your own machine.
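The "same code, different endpoint" claim can be sketched with the official openai Python SDK (assumed installed). The port 5272 and OpenAI-compatible endpoint come from this page; the model alias "phi-4-mini" and the placeholder API key are illustrative assumptions:

```python
from openai import OpenAI

# Same SDK, same call shape you'd use against the cloud -- only the
# endpoint changes. Assumes the Foundry Local service is running on its
# default port and a model with the alias "phi-4-mini" is loaded.
client = OpenAI(
    base_url="http://localhost:5272/v1",  # your own machine, not a cloud API
    api_key="not-needed",                 # the local service ignores the key
)

reply = client.chat.completions.create(
    model="phi-4-mini",  # assumed model alias
    messages=[{"role": "user", "content": "Summarize GDPR in one sentence."}],
)
print(reply.choices[0].message.content)
```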
Every token is processed on your device; nothing is sent to external servers. Data residency for GDPR, HIPAA, SOC 2, and regulated industries is guaranteed by the architecture itself, not by a provider's policy.
No network round trip. No shared API queue. No Tuesday-afternoon slowdowns. First token in milliseconds — real-time AI that actually feels real-time to your users.
Run 1 million tokens or 1 billion — the cost is identical. Hardware is a one-time investment. No surprise bills, no rate limits, no pricing changes on the provider's schedule.
Head-to-Head
An honest comparison across the dimensions that matter most to production AI deployments.
FAQ
Everything you need to know before running AI on your own hardware.
No. Foundry Local runs on CPU, GPU, NPU, and Apple Silicon — it auto-detects the best available hardware. A Copilot+ PC NPU delivers excellent performance for most models. GPU and CPU work fine too.
The Azure AI Foundry catalog includes Phi-4 Mini (3.8B), Phi-4 (14B), Llama 3.2 3B, Llama 3.1 8B, Mistral 7B, DeepSeek-R1, and more — all quantized and hardware-optimized.
Yes. Foundry Local exposes an OpenAI-compatible REST API on localhost:5272. Every SDK, every framework, every tool that talks to OpenAI also talks to Foundry Local — change one URL.
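As a minimal sketch using only the Python standard library, here is what an OpenAI-format request to the local endpoint looks like. The port and API shape come from this page; the model alias in the usage comment is an assumption:

```python
import json
from urllib import request

BASE_URL = "http://localhost:5272/v1"  # Foundry Local's default local endpoint

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Build an OpenAI-format chat completion request aimed at the local service."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With the service running and a model loaded (alias assumed):
#   with request.urlopen(build_chat_request("phi-4-mini", "Hello")) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```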
Windows x64 and ARM64 (including Copilot+ PCs), and macOS. Minimum: 8 GB RAM, 3 GB free disk space. Internet only needed for first-time model downloads.
Yes. Foundry Local runs as a background service. Use foundry service start to launch it, then call localhost:5272 from any process on the machine.
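In practice that looks like the following sketch. The service-start command appears on this page; the /v1/models health check is an assumption based on the OpenAI-compatible API convention:

```shell
# Launch the Foundry Local background service
foundry service start

# Any process on the machine can now reach it (assumed default port 5272);
# listing models is a quick way to confirm the service is up
curl http://localhost:5272/v1/models
```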
At 1,000 users doing 10 requests/day of 3,000 tokens: GPT-4o costs ~$4,500/month ($54,000/year). Foundry Local costs $0/month. Use the calculator to model your exact scenario.
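The arithmetic behind that estimate, as a small sketch. The ~$5 per million tokens blended GPT-4o rate is an assumption chosen to be consistent with the figures above, not a quoted price:

```python
def monthly_api_cost(users, requests_per_day, tokens_per_request,
                     usd_per_million_tokens, days=30):
    """Cloud API bill for one month at a flat blended per-token rate."""
    tokens = users * requests_per_day * days * tokens_per_request
    return tokens / 1_000_000 * usd_per_million_tokens

# 1,000 users x 10 requests/day x 3,000 tokens, at an assumed ~$5/M blend
cloud = monthly_api_cost(1_000, 10, 3_000, 5.0)
print(f"Cloud: ~${cloud:,.0f}/month, ~${cloud * 12:,.0f}/year")
# -> ~$4,500/month, ~$54,000/year; Foundry Local: $0/month at any volume
```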
Get Started Today
Start with the AI evolution story — it sets the stage for everything that follows. Or jump straight to the cost calculator to see your savings.