Run state-of-the-art AI models on your own hardware. No API keys, no billing meters, no data leaving your machine. One command. Permanent savings.
The Full Story
From AI's origins to running your own models — with interactive visualizations at every step. Each section builds on the last, culminating in a live cost comparison.
The foundry CLI covers the full workflow: model run, list, download, info, load, and unload, plus service control commands.
Why Foundry Local
Foundry Local doesn't wrap cloud AI — it replaces it at the infrastructure level. Same application code, same API format, different endpoint: your own machine.
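The "same code, different endpoint" claim can be sketched with the official openai Python SDK (assumed installed). The port 5272 and OpenAI-compatible endpoint come from this page; the model alias "phi-4-mini" and the placeholder API key are illustrative assumptions:

```python
from openai import OpenAI

# Same SDK, same call shape you'd use against the cloud -- only the
# endpoint changes. Assumes the Foundry Local service is running on its
# default port and a model with the alias "phi-4-mini" is loaded.
client = OpenAI(
    base_url="http://localhost:5272/v1",  # your own machine, not a cloud API
    api_key="not-needed",                 # the local service ignores the key
)

reply = client.chat.completions.create(
    model="phi-4-mini",  # assumed model alias
    messages=[{"role": "user", "content": "Summarize GDPR in one sentence."}],
)
print(reply.choices[0].message.content)
```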
Every token is processed on your device; nothing is sent to external servers. Data residency for GDPR, HIPAA, SOC 2, and regulated industries is guaranteed by the architecture itself, not by a provider's policy.
No network round trip. No shared API queue. No Tuesday-afternoon slowdowns. First token in milliseconds — real-time AI that actually feels real-time to your users.
Run 1 million tokens or 1 billion — the cost is identical. Hardware is a one-time investment. No surprise bills, no rate limits, no pricing changes on the provider's schedule.
Head-to-Head
An honest comparison across the dimensions that matter most to production AI deployments.
FAQ
Everything you need to know before running AI on your own hardware.
No. Foundry Local runs on CPU, GPU, NPU, and Apple Silicon — it auto-detects the best available hardware. A Copilot+ PC NPU delivers excellent performance for most models. GPU and CPU work fine too.
The Azure AI Foundry catalog includes Phi-4 Mini (3.8B), Phi-4 (14B), Llama 3.2 3B, Llama 3.1 8B, Mistral 7B, DeepSeek-R1, and more — all quantized and hardware-optimized.
Yes. Foundry Local exposes an OpenAI-compatible REST API on localhost:5272. Every SDK, every framework, every tool that talks to OpenAI also talks to Foundry Local — change one URL.
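As a minimal sketch using only the Python standard library, here is what an OpenAI-format request to the local endpoint looks like. The port and API shape come from this page; the model alias in the usage comment is an assumption:

```python
import json
from urllib import request

BASE_URL = "http://localhost:5272/v1"  # Foundry Local's default local endpoint

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Build an OpenAI-format chat completion request aimed at the local service."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With the service running and a model loaded (alias assumed):
#   with request.urlopen(build_chat_request("phi-4-mini", "Hello")) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```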
Windows x64 and ARM64 (including Copilot+ PCs), and macOS. Minimum: 8 GB RAM, 3 GB free disk space. Internet only needed for first-time model downloads.
Yes. Foundry Local runs as a background service. Use foundry service start to launch it, then call localhost:5272 from any process on the machine.
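In practice that looks like the following sketch. The service-start command appears on this page; the /v1/models health check is an assumption based on the OpenAI-compatible API convention:

```shell
# Launch the Foundry Local background service
foundry service start

# Any process on the machine can now reach it (assumed default port 5272);
# listing models is a quick way to confirm the service is up
curl http://localhost:5272/v1/models
```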
At 1,000 users doing 10 requests/day of 3,000 tokens: GPT-4o costs ~$4,500/month ($54,000/year). Foundry Local costs $0/month. Use the calculator to model your exact scenario.
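The arithmetic behind that estimate, as a small sketch. The ~$5 per million tokens blended GPT-4o rate is an assumption chosen to be consistent with the figures above, not a quoted price:

```python
def monthly_api_cost(users, requests_per_day, tokens_per_request,
                     usd_per_million_tokens, days=30):
    """Cloud API bill for one month at a flat blended per-token rate."""
    tokens = users * requests_per_day * days * tokens_per_request
    return tokens / 1_000_000 * usd_per_million_tokens

# 1,000 users x 10 requests/day x 3,000 tokens, at an assumed ~$5/M blend
cloud = monthly_api_cost(1_000, 10, 3_000, 5.0)
print(f"Cloud: ~${cloud:,.0f}/month, ~${cloud * 12:,.0f}/year")
# -> ~$4,500/month, ~$54,000/year; Foundry Local: $0/month at any volume
```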
Get Started Today
Start with the AI evolution story — it sets the stage for everything that follows. Or jump straight to the cost calculator to see your savings.