Foundation Models On-Device LLM: The Tool Protocol
The post names the on-device LLM contract Apple shipped at WWDC 2025 and walks the routing question: when is LanguageModelSession the right answer, when is AppIntent, when is MCP, and when is none of the above.
iOS 26 ships a 3-billion-parameter language model on every Apple Intelligence-capable device.1 Apple calls the framework Foundation Models. The framework is local. Inference is optimized for Apple silicon and runs on-device; network is not in the call path. The model lives at SystemLanguageModel.default, your app gets a LanguageModelSession, and the typed surface for letting that session do useful work is the Tool protocol.
The Tool protocol is the part that matters for app developers. Without it, the on-device LLM is a chat completion endpoint with no connection to your app’s data, your user’s data, or the rest of the system. With it, the model can call typed Swift functions, get typed results back, and reason about the answer in its next turn. Tool-augmented on-device generation is the framework’s actual capability. The chat surface is the demo.
TL;DR
- Foundation Models gives every Apple Intelligence-eligible device a 3B-parameter LLM at SystemLanguageModel.default. The model is local; inference is optimized for Apple silicon and runs on-device; network is out of the call path.
- The Tool protocol is the contract between the model and your app. A tool declares typed Arguments, returns a typed Output, and is bound to a LanguageModelSession at construction time. @Generable and @Guide annotations let the model produce typed Swift values directly, not just strings. The decoder is part of the framework, not your code.
- The routing rule between Foundation Models, App Intents, and MCP is who runs the model and where. Foundation Models = your app runs the model on-device. App Intents = Apple Intelligence runs the model on-device and routes to your app. MCP = an external host runs the model wherever it likes and reaches into your app through a tool server.
What The Framework Actually Provides
Three primitives carry the framework: the model, the session, and the tool.2
SystemLanguageModel. A reference to the on-device foundation model. The default instance is bound to the user’s device, available on Apple Intelligence-eligible hardware, and exposes capability checks the app reads at runtime to decide whether the model is available. The framework supports configuration through SystemLanguageModel(useCase:guardrails:), custom adapters, and GenerationOptions, but you do not pick arbitrary cloud model IDs the way you would against OpenAI or Anthropic; Apple ships and updates the on-device model per OS release, and the framework hands you whichever version is currently installed.
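A sketch of that configuration surface, assuming the initializers Apple documents (SystemLanguageModel(useCase:), GenerationOptions(temperature:)); the tagging use case and prompt here are illustrative, not prescriptive:

```swift
import FoundationModels

// Sketch: a use-case-specialized model plus per-call GenerationOptions.
func tagNote(_ note: String) async throws -> String {
    let model = SystemLanguageModel(useCase: .contentTagging)
    let session = LanguageModelSession(
        model: model,
        instructions: "Produce short topic tags for the user's note."
    )
    let options = GenerationOptions(temperature: 0.3)
    return try await session.respond(to: note, options: options).content
}
```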
LanguageModelSession. A stateful object that holds conversation state across calls. The session takes a system prompt at construction, accumulates user/assistant turns over time, and exposes async methods for generating responses. Sessions are lightweight to create and disposable; you create one per task, not one per app. A meditation timer creates a session for “summarize my last 7 days of practice”; a recipe app creates a different session for “convert this for two people instead of four.”
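Per-task construction is cheap enough to do inline. A minimal sketch, with illustrative instructions:

```swift
import FoundationModels

// Two features, two disposable sessions; neither outlives its task.
let practiceSession = LanguageModelSession(
    instructions: "You summarize a user's meditation practice history."
)
let recipeSession = LanguageModelSession(
    instructions: "You adapt recipes for different serving counts."
)
```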
Tool (the protocol). A Swift protocol that declares an Arguments type, an Output type, and an async call(arguments:) function. Tools are bound to a session at construction (LanguageModelSession(tools: [...], instructions: ...)). When the model decides it needs a tool, it emits a structured call; the framework decodes the arguments, runs the tool, encodes the result, and feeds it back into the session for the next turn. The model does not see Swift; the framework does the marshalling.
The Tool protocol shape, condensed:
```swift
import FoundationModels

struct WaterEntryLookup: Tool {
    let name = "lookup_water_entries"
    let description = "Look up water intake entries for a given date range."

    // The app's own data layer, injected when the tool is constructed.
    let store: WaterStore

    @Generable
    struct Arguments {
        @Guide(description: "Start date in ISO-8601 format")
        let startDate: String
        @Guide(description: "End date in ISO-8601 format")
        let endDate: String
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        let formatter = ISO8601DateFormatter()
        let entries = try store.entries(
            from: formatter.date(from: arguments.startDate) ?? .now,
            to: formatter.date(from: arguments.endDate) ?? .now
        )
        return ToolOutput(GeneratedContent(properties: [
            "count": entries.count,
            "total_ml": entries.reduce(0) { $0 + $1.amountMl }
        ]))
    }
}
```
The model sees a tool description and a JSON-shaped argument schema. The Swift code sees typed input and typed output. The decode/encode boundary is the part Apple owns.
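Binding closes the loop: the session takes the tool at construction, and the model decides when to call it. A sketch, assuming the hypothetical WaterStore data layer from the example above:

```swift
import FoundationModels

// Sketch: bind the tool at construction, then prompt. The framework,
// not your code, decides when lookup_water_entries actually fires.
func answerWaterQuestion(store: WaterStore) async throws -> String {
    let session = LanguageModelSession(
        tools: [WaterEntryLookup(store: store)],
        instructions: "You are a hydration assistant. Use tools for data lookups."
    )
    let response = try await session.respond(to: "How much water did I drink this week?")
    return response.content
}
```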
Generable And Guide: Typed Output Without A Parser
The same annotation system that makes tool arguments typed also lets the model produce typed Swift values directly.3 A @Generable struct declares its shape; the framework constrains the model’s output to match.
```swift
import FoundationModels

@Generable
struct PracticeSummary {
    @Guide(description: "Single-sentence headline summarizing the user's week")
    let headline: String
    @Guide(description: "Total practice duration this week in minutes")
    let totalMinutes: Int
    @Guide(description: "Three short observations as bullet points")
    let observations: [String]
}

let session = LanguageModelSession(instructions: "You are a meditation coach.")
let summary = try await session.respond(
    to: "Summarize this week of practice given the entries.",
    generating: PracticeSummary.self
).content
```
The model returns a PracticeSummary value. No JSON parsing in your code, no string-matching for “headline:”, no fallback path for a malformed object. The framework uses constrained decoding to keep the model’s token-by-token output structurally aligned with the schema, so structurally invalid output does not slip past the boundary.
The Swift-typed surface is what distinguishes the framework from cloud LLM SDKs. Cloud SDKs (OpenAI Structured Outputs, Anthropic tool use, others) also support constrained decoding, but the typed value the developer receives is a JSON object validated against a schema, then decoded into a Codable Swift type as a separate step. Foundation Models collapses those steps: the @Generable macro and the framework’s decoder produce a typed Swift value as the direct return, with the per-field @Guide annotations carrying intent into the generation constraint. The output is typed because the generation was typed against the Swift schema, not against a JSON spec the developer reconstructed in Swift.
@Guide annotations are how you communicate per-field intent to the model without writing it into the prompt. Each field’s description becomes part of the generation constraint. Field-level guides keep the prompt clean and the schema close to the data.
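Descriptions are not the only guide. A sketch of constraint-carrying guides, assuming the GenerationGuide forms (.range, .count) from Apple’s guided-generation docs; the bounds are illustrative:

```swift
import FoundationModels

// Sketch: guides that constrain values, not just describe them.
@Generable
struct ConstrainedSummary {
    @Guide(description: "Total practice minutes this week", .range(0...600))
    let totalMinutes: Int
    @Guide(description: "Exactly three short observations", .count(3))
    let observations: [String]
}
```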
The Routing Question, Three Ways
Apple now offers three protocol surfaces an app can use to expose its domain to a language model. They route to different runners.
Foundation Models (LanguageModelSession). Your app loads the on-device model and runs the inference. Tools the session can call are tools your app’s code defines. The model never leaves the device. The user does not invoke this through Siri; your app’s code does. The use case is inside your app: a meditation app that uses the LLM to summarize a week, a recipe app that adapts a recipe for fewer servings, a water tracker that turns “I had a glass with lunch” into a structured entry.
App Intents. Apple Intelligence runs an LLM on the user’s behalf (Apple’s first-party agent) and routes capability calls to your app’s AppIntent types. Your app does not run the model. You declare typed actions through the App Intents framework, and Apple’s system stack decides when to invoke them based on user request, Spotlight query, Siri input, or Shortcuts orchestration. Covered in detail in App Intents Are Apple’s New API to Your App.4
MCP. An external host (Claude Desktop, Claude Code, Cursor, ChatGPT) runs whichever model the developer chose. Your app exposes a server that the host’s model can call. The model runs wherever the host runs it; tool calls cross a JSON-RPC transport. Covered in Two Agent Ecosystems, One Shopping List and the routing-question synthesis in App Intents vs MCP.5
The routing decision boils down to who is the agent.
```
┌─────────────────────────────────────────────┐
│         Who is the language model?          │
└──────┬───────────────┬───────────────┬──────┘
       │               │               │
┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
│ Your app's  │ │ Apple       │ │ External    │
│ own use of  │ │ Intelligence│ │ host's      │
│ LLM         │ │ agent       │ │ agent       │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
       │               │               │
       ▼               ▼               ▼
Foundation Models A pp Intents        MCP
+ Tool protocol   + AppEntity    + tools/list
(on-device, your  (system runs   (host runs
 app runs model)   the model)     the model)
```
A meditation app summarizing the user’s week uses Foundation Models because the app itself wants to call the model and present a result inside the app. The same app’s “log a 5-minute session” capability uses App Intents so Siri can invoke it. The same app’s “show me my last meditation log entries” capability used by a Claude Code session uses MCP. Three different runners, three different obligations, one shared domain layer underneath.
Inference Budgets: What The Framework Asks Of You
Running an LLM on-device is not free. Apple silicon handles the inference, but the model still has a context window, a token budget, and a wall-clock latency that depends on the device. Three constraints shape how you design with the framework:6
Availability is per-device. Not every iOS 26 device has Apple Intelligence. Older iPhones, locked-down devices, and devices where the user has disabled Apple Intelligence return a non-available state from SystemLanguageModel.default.availability. Code that calls LanguageModelSession without checking availability surfaces a generation error at runtime; the right pattern is to branch the UI on the availability state ahead of time and present an LLM-free path when the state is not available. Treat the model as a feature flag, not a guarantee.
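The branch-ahead-of-time pattern, as a minimal sketch (availability cases per Apple’s SystemLanguageModel docs):

```swift
import FoundationModels

// Sketch: compute the flag up front; never let respond(to:) be the first
// place you discover the model is missing.
var weeklyReviewUsesLLM: Bool {
    switch SystemLanguageModel.default.availability {
    case .available:
        return true
    case .unavailable(.modelNotReady):
        // Model assets still downloading; worth re-checking later.
        return false
    case .unavailable:
        // Ineligible hardware or Apple Intelligence disabled.
        return false
    }
}
```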
Latency is non-trivial. First-token latency on the iPhone 16 Pro is usable for in-app interactions; longer generations and tool-calling chains are not instant. UI patterns that work for cloud LLM streaming work here too; do not block the main thread, do show progressive output, and do design for the case where the user navigates away mid-generation.
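A progressive-output sketch, assuming the streaming surface Apple documents: streamResponse(to:) yields cumulative snapshots of the response so far, not deltas:

```swift
import FoundationModels

// Sketch: render each snapshot as it arrives; the loop simply ends
// if the surrounding task is cancelled mid-generation.
func streamWeeklySummary(render: (String) -> Void) async throws {
    let session = LanguageModelSession(instructions: "You are a meditation coach.")
    for try await partial in session.streamResponse(to: "Summarize this week of practice.") {
        render(partial)  // each element is the whole response so far
    }
}
```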
Context windows are smaller than cloud. The on-device model has a smaller context window than GPT-4-class cloud models. Long documents need summarization or chunking. Long conversation history needs trimming. Tool outputs that return large structured payloads should return a reference (an ID, a key) the next turn can re-fetch on demand, not the entire payload inline.
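The reference pattern in tool form, sketched with a hypothetical ReportStore:

```swift
import FoundationModels

// Sketch: return a handle plus headline numbers, not the payload.
// A second tool (not shown) resolves report sections by ID on demand.
struct WeeklyReportBuilder: Tool {
    let name = "build_weekly_report"
    let description = "Build the weekly report; returns its ID and entry count."
    let store: ReportStore  // hypothetical app data layer

    @Generable
    struct Arguments {
        @Guide(description: "ISO-8601 date of the week's start")
        let weekStart: String
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        let report = try store.buildReport(weekStarting: arguments.weekStart)
        return ToolOutput(GeneratedContent(properties: [
            "report_id": report.id,
            "entry_count": report.entryCount
        ]))
    }
}
```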
The constraint set is similar to designing for a low-end edge runtime, not a frontier cloud model. The framework’s affordances make it more pleasant; the underlying physical limits do not move.
When To Reach For Foundation Models
The framework’s strongest fits are where on-device, low-friction generation is the product:
Reformatting and rewriting. Turn a user’s freeform note into a structured entry, polish a draft message, summarize a captured transcript. Latency tolerance is moderate; data sensitivity is high; cloud inference is overkill.
Local synthesis over private data. A workout app turning a user’s workout history into a “this week” summary. A finance app explaining a user’s spending pattern. A journal app surfacing themes across a quarter of entries. The data should not leave the device; the answer should appear in-app; the prompt is bounded.
Lightweight tool-calling for app-internal automation. An app that lets the user say “show me Tuesday’s meditation log” and uses a tool to fetch the underlying records, then formats the answer. The agent is the app, the tool is the app’s own data layer, the model is local.
Type-conforming generation. Anywhere the app would otherwise hand-write a JSON parser or a string template, @Generable plus @Guide is a more durable surface.
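The first case on that list, sketched end to end; the entry shape and instructions are illustrative:

```swift
import FoundationModels

// Sketch: freeform note -> structured entry via guided generation.
@Generable
struct WaterEntry {
    @Guide(description: "Amount of water in milliliters")
    let amountMl: Int
    @Guide(description: "ISO-8601 timestamp; infer 'with lunch' as midday today")
    let loggedAt: String
}

func parseWaterNote(_ note: String) async throws -> WaterEntry {
    let session = LanguageModelSession(
        instructions: "Extract a structured water-intake entry from the user's note."
    )
    return try await session.respond(to: note, generating: WaterEntry.self).content
}
```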
When To Not Reach For Foundation Models
The framework is the wrong answer for several common cases:
Anything the user might ask Siri. “Log 250ml of water”, “Start a 5-minute meditation”, “Add bananas to my list” are App Intents. Apple Intelligence is the runner; your app is the destination. Foundation Models is for inside-the-app generation, not for Siri-routed actions. If you build the same capability twice (App Intent + LanguageModelSession with a tool), the App Intent wins because the user invokes Siri, not your in-app screen.
Anything an external LLM agent should drive. A Claude Code session reaching into your app’s domain belongs over MCP. The app does not run the LLM; the host does; the model lives wherever the host put it. Foundation Models cannot serve external agents.
Heavy reasoning on large documents. The on-device model is small. A 200-page contract, a long codebase context, or multi-image reasoning over many photos belongs in cloud inference (yours or a vendor’s), where the context window and parameter count match the workload. Tasks that exceed the framework’s envelope produce concrete errors: exceeded context window, guardrail violations, unsupported locales. Surface those errors deliberately rather than designing flows that depend on the model handling out-of-envelope work.
Cross-device and cross-user workflows. The on-device model has access only to what the app passes into the session. Cross-device sync (timer state from Watch to iPhone), cross-user collaboration (shared lists, shared documents), and any flow that benefits from server-side coordination need a server. The model is not a network primitive.
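When the envelope errors named above do fire, catch them by name. A sketch, assuming the LanguageModelSession.GenerationError cases Apple documents:

```swift
import FoundationModels

// Sketch: route each concrete envelope error to a deliberate UI outcome
// instead of a generic failure state.
func respondOrExplain(_ session: LanguageModelSession, to prompt: String) async -> String {
    do {
        return try await session.respond(to: prompt).content
    } catch let error as LanguageModelSession.GenerationError {
        switch error {
        case .exceededContextWindowSize:
            return "That input is too long for the on-device model; try a shorter range."
        case .guardrailViolation:
            return "The on-device model declined this request."
        case .unsupportedLanguageOrLocale:
            return "That language isn't supported on-device yet."
        default:
            return "On-device generation failed; showing the non-LLM view."
        }
    } catch {
        return "On-device generation failed; showing the non-LLM view."
    }
}
```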
What I Would Build Differently In My Stack
The framework rewards a specific architecture choice that is easy to get wrong on the first pass: the capabilities the user invokes through the app’s UI should be the same capabilities the LLM consumes as tools, not duplicate prose paths.
A meditation app might add an LLM-summarized “weekly review” pane. The naive build is one prompt: “Here are the user’s entries this week, write a paragraph.” The better build defines a WeeklyEntries tool the model can call when it needs to know what was in the week, plus structured WeeklySummary output via @Generable. The first build is brittle (the model has to ingest a long entry list every call), expensive in tokens, and produces unstructured prose. The second is durable (the tool-call separates “what happened” from “how to talk about it”), cheap (the model only fetches what it needs), and structured (the result is a typed Swift value).
The pattern composes with App Intents and MCP cleanly. The same WeeklyEntries query is also the body of an AppIntent parameter resolver and an MCP tool handler. One Swift function; three surfaces. The model calls the same function the user calls.
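A sketch of that shared-function shape; the store and entry types are hypothetical, and the AppIntent and MCP wrappers (not shown) would call the same query:

```swift
import Foundation
import FoundationModels

// The shared domain query: plain Swift, no protocol surface attached.
struct WeeklyEntriesQuery {
    let store: MeditationStore  // hypothetical data layer

    func entries(weekContaining date: Date) throws -> [MeditationEntry] {
        guard let week = Calendar.current.dateInterval(of: .weekOfYear, for: date) else {
            return []
        }
        return try store.entries(from: week.start, to: week.end)
    }
}

// Surface 1 of 3: the Foundation Models tool. An AppIntent's perform()
// and an MCP tool handler would wrap the same query.
struct WeeklyEntriesTool: Tool {
    let name = "weekly_entries"
    let description = "Fetch meditation entries for the week containing a date."
    let query: WeeklyEntriesQuery

    @Generable
    struct Arguments {
        @Guide(description: "Any ISO-8601 date inside the target week")
        let date: String
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        let date = ISO8601DateFormatter().date(from: arguments.date) ?? .now
        let entries = try query.entries(weekContaining: date)
        return ToolOutput(GeneratedContent(properties: [
            "count": entries.count,
            "total_minutes": entries.reduce(0) { $0 + $1.minutes }
        ]))
    }
}
```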
The other architecture decision: tool descriptions are part of the prompt. The model reads Tool.description to decide whether and when to call. Treat the description like a docstring you actually expect a future contributor to read; the model is the future contributor.
What The Pattern Means For The Apple Stack On iOS 26+
Three takeaways.
- The on-device LLM is a runtime feature, not a backend. Treat it like a system framework with a context window and an on-device inference budget, not like a remote service. The architecture decisions are availability, latency, context-window discipline, and structured output.
- The Tool protocol is the surface. Without tools, the model is a chat completion endpoint with no connection to your domain. With tools, the model becomes a structured query layer over your app’s data.
- The routing rule between Foundation Models, App Intents, and MCP is “who runs the model.” Inside-the-app generation goes to Foundation Models. Apple Intelligence-routed capabilities go to App Intents. External-agent capabilities go to MCP. The same Swift domain function can be called by all three surfaces.
The full Apple Ecosystem cluster: typed App Intents for Apple Intelligence; MCP servers for cross-LLM agents; the routing question between the two; Live Activities for the Lock Screen state machine; Liquid Glass patterns for the visual layer; multi-platform shipping for cross-device reach. The hub is at the Apple Ecosystem Series. For broader iOS-with-AI-agents context, see the iOS Agent Development guide.
FAQ
What is the Foundation Models framework in iOS 26?
Foundation Models is Apple’s framework for accessing the on-device language model that ships with Apple Intelligence-eligible devices in iOS 26 (and iPadOS 26, macOS 26, visionOS 26). The framework exposes SystemLanguageModel, LanguageModelSession, and the Tool protocol so apps can run typed, on-device LLM calls without network access.
How does the Tool protocol work?
A Tool is a Swift type that declares an Arguments struct (annotated with @Generable and @Guide), an async call(arguments:) method, and a name + description the model uses to decide when to call. Tools are bound to a LanguageModelSession at construction. When the model decides it needs a tool, the framework decodes the arguments, runs the call, and feeds the typed output back into the session.
What’s the difference between Foundation Models, App Intents, and MCP?
Foundation Models is for your app to run the LLM on-device for in-app generation. App Intents is for Apple Intelligence (the system agent) to call your app’s typed capabilities. MCP is for external LLM hosts (Claude, ChatGPT, etc.) to call your app’s typed tools across a JSON-RPC transport. The three protocols differ in who runs the model. The same Swift domain function can serve all three.
Can Foundation Models call MCP tools?
No. LanguageModelSession.tools accepts conformers to Apple’s Tool protocol, not MCP tool servers. To bridge the two, you would write a Foundation Models Tool whose call method invokes an MCP client. Apple has not shipped a built-in adapter; the bridge would be app-side code.
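A sketch of that bridge, with MCPClient as an entirely hypothetical client type; nothing in either framework ships this adapter:

```swift
import FoundationModels

// Sketch: a Foundation Models tool that proxies to an MCP server.
// MCPClient and callTool(name:arguments:) are hypothetical.
struct MCPBridgeTool: Tool {
    let name = "remote_lookup"
    let description = "Look up records via the app's MCP tool server."
    let client: MCPClient

    @Generable
    struct Arguments {
        @Guide(description: "Query forwarded verbatim to the MCP tool")
        let query: String
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        let text = try await client.callTool(name: "lookup", arguments: ["query": arguments.query])
        return ToolOutput(GeneratedContent(properties: ["result": text]))
    }
}
```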
Is the on-device model good enough for production?
For the use cases the framework is designed for (reformatting, summarization, structured generation over local data, lightweight tool-calling), yes. For frontier reasoning over large contexts, multimodal understanding at scale, or cross-document reasoning, no. The on-device model is a 3-billion-parameter model with a smaller context window than cloud LLMs; pick workloads that fit the envelope.
References
1. Apple Developer, “Apple Intelligence and machine learning” and the WWDC 2025 session “Meet the Foundation Models framework”. The framework’s headline number (a 3-billion-parameter on-device language model) is from Apple’s WWDC 2025 announcement.
2. Apple Developer, “FoundationModels framework”: SystemLanguageModel, LanguageModelSession, Tool, ToolOutput, and supporting types.
3. Apple Developer, “Generating Swift data structures with guided generation” and the @Generable/@Guide macro reference. Type-constrained generation as a first-class capability via constrained decoding.
4. Author’s analysis in App Intents Are Apple’s New API to Your App, April 28, 2026.
5. Author’s analysis in Two Agent Ecosystems, One Shopping List, April 29, 2026, and App Intents vs MCP: The Routing Question, April 30, 2026.
6. Apple Developer, “Adopting Apple Intelligence in your app” and “SystemLanguageModel” for availability patterns. Apple’s WWDC 2025 sessions cover the on-device inference path on Apple silicon and per-device availability constraints.