## Installation

Foundry Local runs on Windows (x64 / ARM64) and macOS. Minimum requirements: 8 GB RAM, 3 GB of free disk space, and an internet connection for first-time model downloads.
```shell
# Windows — via winget
winget install Microsoft.FoundryLocal

# macOS — via Homebrew
brew tap microsoft/foundry
brew install foundry-local
```
### Verify Installation

```shell
foundry --version
Foundry Local 1.x.x
```
## Model Commands

All model commands follow the pattern `foundry model <command> [model-alias]`. Model aliases are short, memorable names like `phi-4-mini`.
### `foundry model run`

Downloads the model if it is not already cached, loads it into the service, and starts an interactive chat session in the terminal.
```shell
foundry model run phi-4-mini
✓ Downloading phi-4-mini (INT4, 2.5GB)...
✓ Optimizing for your hardware (NPU detected)...
✓ Server ready on localhost:5272

You > Hello, what can you do?
Phi-4 Mini > I can help with writing, coding, analysis...
```
The model stays running as a background service after `foundry model run` exits, so your applications can call `localhost:5272` without reloading the model each time.
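Because the service keeps running, a script can poll the endpoint until it is ready before sending requests. A minimal standard-library sketch: the `/v1/models` listing route is an assumption (only the `/v1` base endpoint is documented here), and `probe` is injectable so the loop can be exercised without a running service.

```python
import time
import urllib.error
import urllib.request

def default_probe(url="http://localhost:5272/v1/models"):
    # Assumed OpenAI-compatible model-listing route; any successful
    # response means the service is up.
    try:
        urllib.request.urlopen(url, timeout=2)
        return True
    except (urllib.error.URLError, OSError):
        return False

def wait_for_service(probe=default_probe, attempts=10, delay=1.0):
    """Poll until the Foundry Local service answers, or give up."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

With the service started (via `foundry service start` or `foundry model run`), `wait_for_service()` returns `True` once `localhost:5272` responds.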
### `foundry model list`

Lists all models in the catalog — both downloaded and available for download.
```shell
foundry model list
ALIAS           SIZE    STATUS      HARDWARE
phi-4-mini      2.5 GB  downloaded  NPU/CPU
phi-4           8.5 GB  available   GPU/CPU
llama-3.2-3b    2.0 GB  available   CPU/GPU
llama-3.1-8b    5.0 GB  available   GPU
mistral-7b      4.5 GB  available   GPU
deepseek-r1-7b  5.0 GB  available   GPU
```
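For scripting, the tabular output above can be parsed into records, e.g. to find which models are already cached. A sketch based on the sample columns shown here; the real output may differ in spacing or fields, and in practice the text would come from `subprocess.run(["foundry", "model", "list"], ...)`.

```python
SAMPLE = """\
ALIAS           SIZE    STATUS      HARDWARE
phi-4-mini      2.5 GB  downloaded  NPU/CPU
phi-4           8.5 GB  available   GPU/CPU
"""

def parse_model_list(text):
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        rows.append({
            "alias": parts[0],
            # SIZE is two tokens ("2.5" and "GB"); rejoin them.
            "size": f"{parts[1]} {parts[2]}",
            "status": parts[3],
            "hardware": parts[4],
        })
    return rows

models = parse_model_list(SAMPLE)
downloaded = [m["alias"] for m in models if m["status"] == "downloaded"]
print(downloaded)  # ['phi-4-mini']
```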
### `foundry model download`

Downloads a model to the local cache without starting the service. Useful for pre-caching models before going offline.
```shell
foundry model download phi-4-mini
Downloading phi-4-mini... [████████████████████] 100%
✓ Saved to ~/.foundry/models/phi-4-mini
```
### `foundry model info`

Shows detailed information about a specific model: full name, quantization type, hardware compatibility, license, and source.
```shell
foundry model info phi-4-mini
Alias:        phi-4-mini
Full name:    phi-4-mini-instruct-cuda-int4-rtn-block-32-acc-level-4
Parameters:   3.8B
Quantization: INT4 (RTN, block-32)
Size:         2.5 GB
Hardware:     NPU, NVIDIA CUDA, CPU
License:      MIT
Source:       Azure AI Foundry catalog
```
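The `Key: value` layout above is easy to turn into a dict, e.g. to check hardware compatibility before loading a model. A sketch; the field names are taken from the sample output and may differ in other versions.

```python
SAMPLE_INFO = """\
Alias: phi-4-mini
Parameters: 3.8B
Hardware: NPU, NVIDIA CUDA, CPU
License: MIT
"""

def parse_model_info(text):
    info = {}
    for line in text.strip().splitlines():
        # Split on the first colon only; values may contain commas.
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info

info = parse_model_info(SAMPLE_INFO)
print("NPU" in info["Hardware"])  # True
```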
### `foundry model load`

Loads an already-downloaded model into the inference service without starting an interactive session.
```shell
foundry model load phi-4-mini
✓ phi-4-mini loaded · API available at localhost:5272
```
### `foundry model unload`

Removes a model from memory, freeing hardware resources for other workloads.
```shell
foundry model unload phi-4-mini
✓ phi-4-mini unloaded · 2.5 GB freed
```
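Since `load` and `unload` pair naturally, a batch job can wrap them in a context manager so memory is freed even if the job fails partway. A sketch that shells out to the CLI; it assumes `foundry` is on `PATH`, and `runner` is injectable so the pairing can be tested without the binary.

```python
import contextlib
import subprocess

@contextlib.contextmanager
def loaded_model(alias, runner=subprocess.run):
    # Load on entry; always unload on exit, even if the body raises.
    runner(["foundry", "model", "load", alias], check=True)
    try:
        yield alias
    finally:
        runner(["foundry", "model", "unload", alias], check=True)

# Usage:
# with loaded_model("phi-4-mini"):
#     ...  # call the API on localhost:5272
```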
## Service Commands

Manage the Foundry Local background service directly.
```shell
# Start the service (without loading a model)
foundry service start

# Check service status
foundry service status
✓ Running · localhost:5272 · phi-4-mini loaded

# Stop the service
foundry service stop

# Show service logs
foundry service logs
```
## Using the API

Once a model is loaded, call it with any OpenAI-compatible client. The endpoint is `http://localhost:5272/v1`.
```python
# Python — openai SDK
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5272/v1",
    api_key="not-needed",  # any string works
)

response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
```shell
# curl — test from any terminal
curl http://localhost:5272/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
```
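If the `openai` package isn't available, the same request works with only the standard library. A sketch against the `/v1/chat/completions` endpoint shown above; the request builder is split out so the payload can be inspected without a running service.

```python
import json
import urllib.request

ENDPOINT = "http://localhost:5272/v1/chat/completions"

def build_chat_request(model, messages, endpoint=ENDPOINT):
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )

def chat(model, messages):
    # Send the request and pull out the first completion's text.
    with urllib.request.urlopen(build_chat_request(model, messages)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (with a model loaded):
# print(chat("phi-4-mini", [{"role": "user", "content": "Hello!"}]))
```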
## Quick Reference

| Command | What it does |
|---|---|
| `foundry model run <alias>` | Download (if needed) + load + interactive chat |
| `foundry model list` | Show all catalog models and download status |
| `foundry model download <alias>` | Download model to local cache only |
| `foundry model info <alias>` | Show model details, hardware, license |
| `foundry model load <alias>` | Load cached model into service |
| `foundry model unload <alias>` | Remove model from memory |
| `foundry service start/stop/status` | Control the background service |
| `foundry service logs` | View service output and errors |