Apple Foundation Models: The On-Device LLM Framework, Explained

June 03, 2026 · 18 min read · Updated July 07, 2026

ios swift foundation-models apple-intelligence on-device-llm ios-26

The Foundation Models framework gives an app direct, free, offline access to the same on-device large language model that powers Apple Intelligence¹. No API key, no per-token bill, no network round trip, no data leaving the device. For a class of features that used to mean a cloud LLM and a privacy review, the cost now rounds to zero. The trade is capability: the on-device model is small, the context window is finite, and the framework draws hard lines around what it will and will not do. Knowing those lines is the whole game.

This is the reference for the framework itself: the types you actually call, the one feature that makes it worth using, and the point where you should stop and reach for something bigger.

TL;DR

LanguageModelSession is the entry point. Create one, call respond(to:), get text back. Multi-turn context lives in the session; single-turn work gets a fresh session each time².
Guided generation is the reason to use this framework. Annotate a Swift type with @Generable and the model returns that type, populated and type-checked, instead of a string you have to parse³.
The Tool protocol lets the model call your code mid-generation to fetch data or take an action, then fold the result back into its answer⁴.
Check SystemLanguageModel.default.availability before you do anything. The model is absent on ineligible devices, with Apple Intelligence off, or while it downloads⁵.
The context window is real and small. SystemLanguageModel.default.contextSize reports the token budget shared across prompt and response⁶. On-device the budget is 4K tokens; the Private Cloud Compute model raises it to 32K¹⁴. Plan for it, or the session throws.
Requires iOS 26 and an Apple-Intelligence-capable device. Below that floor, the framework does not exist. The iOS 27 betas extend the same API with image input, per-request tool-calling control, and a server model on Private Cloud Compute¹²¹³¹⁴.

What the framework is, and what it is not

Foundation Models is not a wrapper around a cloud endpoint. The model lives on the device, ships with the operating system, and runs against the Neural Engine. That single fact drives every design decision in the API and every decision you make using it.

What you get: text generation, summarization, classification, extraction, short-form rewriting, and structured output, all on-device and all free. What you do not get: a frontier model. Apple built the on-device model for focused language tasks inside an app, not open-ended reasoning, not long-document analysis, not world knowledge you can quiz. Apple says as much, and the framing matters because it sets expectations the API will otherwise let you violate¹.

The mental model that keeps you out of trouble: treat the on-device model as a fast, private, free intern who is excellent at shaping text and terrible at knowing facts. Hand it material and a clear task. Do not ask it questions it has no way to answer.

LanguageModelSession: the entry point

Every interaction starts with a session.

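import FoundationModels

// Placeholder input; in an app this comes from your own data.
let reviewText = "Great pasta, slow service, would still go back."

let session = LanguageModelSession()
let response = try await session.respond(to: "Summarize this review in one sentence: \(reviewText)")
print(response.content)  // the generated summary, a String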

The session holds conversation state. Each call to respond(to:) appends to the running transcript, so a session you keep around remembers what came before. For a chat feature, that is what you want. For independent one-shot tasks (summarize this, classify that), create a fresh session per call so stale context does not leak in and eat your token budget².
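
To make the budget arithmetic concrete, here is a minimal, framework-free sketch. TranscriptBudget is a hypothetical type, not part of Foundation Models, the token counts are illustrative inputs rather than measurements, and 4,096 stands in for the 4K on-device budget. It models the point above: a retained session keeps every turn in the window, so its remaining budget only shrinks, while a fresh session starts empty each time.

```swift
/// Hypothetical stand-in for a session's transcript budget. Every turn's prompt
/// and response stay in the window, so a retained session only ever grows.
struct TranscriptBudget {
    let contextSize: Int
    private(set) var used = 0

    init(contextSize: Int) {
        self.contextSize = contextSize
    }

    /// Records one turn if it fits and returns true. Returns false when the turn
    /// would overflow, which is the point where a real session throws.
    mutating func record(promptTokens: Int, responseTokens: Int) -> Bool {
        let turn = promptTokens + responseTokens
        guard used + turn <= contextSize else { return false }
        used += turn
        return true
    }
}

// One retained session reused for four unrelated tasks of 900 + 300 tokens each.
var retained = TranscriptBudget(contextSize: 4096)
let retainedOutcomes = (1...4).map { _ in
    retained.record(promptTokens: 900, responseTokens: 300)
}

// A fresh session per task: each one starts with an empty transcript.
let freshOutcomes = (1...4).map { _ -> Bool in
    var fresh = TranscriptBudget(contextSize: 4096)
    return fresh.record(promptTokens: 900, responseTokens: 300)
}

print(retainedOutcomes)  // [true, true, true, false]
print(freshOutcomes)     // [true, true, true, true]
```

The fourth task fails only because the first three are still sitting in the retained transcript; none of them needed to be there.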

respond(to:) is async throws. It suspends while the model works and throws when the request exceeds the context window, when the model is unavailable, or when guardrails reject the content. Every one of those is a real branch you handle, not an edge case you ignore.
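
One way to keep those branches honest is to name them in your own code. The sketch below is hypothetical: GenerationFailure and Recovery are illustrative types, not framework API. In the shipping SDK the thrown errors arrive as LanguageModelSession.GenerationError, with cases such as exceededContextWindowSize and guardrailViolation; the point here is only that each failure maps to a different recovery.

```swift
/// Hypothetical mirror of the three failure branches described above.
enum GenerationFailure: Error {
    case contextWindowExceeded
    case modelUnavailable
    case guardrailRejected
}

/// What the app does next; each branch gets its own response.
enum Recovery: Equatable {
    case shortenInputOrStartFreshSession
    case showFallbackUI
    case explainContentWasDeclined
}

func recovery(for failure: GenerationFailure) -> Recovery {
    switch failure {
    case .contextWindowExceeded:
        return .shortenInputOrStartFreshSession  // trim, chunk, or drop stale turns
    case .modelUnavailable:
        return .showFallbackUI                   // same paths as the availability check
    case .guardrailRejected:
        return .explainContentWasDeclined        // retrying the same input will not help
    }
}
```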

For a responsive UI, stream instead of waiting. streamResponse(to:) yields partial output as the model produces it, which turns a three-second stall into text that appears as it forms⁷.
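
Assuming, as in the iOS 26 SDK, that each streamed element carries the whole response generated so far rather than only the new tokens, the UI should replace its text on every update instead of appending to it. The framework-free sketch below shows that consumption pattern. render is a hypothetical helper, and an AsyncStream of plain strings stands in for the framework's stream, whose elements wrap the partial output in their own type.

```swift
/// Drives a UI from a stream of cumulative snapshots, where each element is
/// the whole response so far. Returns the final text and the update count.
func render<S: AsyncSequence>(_ snapshots: S) async throws -> (text: String, updates: Int)
    where S.Element == String
{
    var text = ""
    var updates = 0
    for try await snapshot in snapshots {
        text = snapshot   // replace, don't append: the snapshot already holds earlier text
        updates += 1      // in SwiftUI, this is where a @State property would change
    }
    return (text, updates)
}

let snapshots = AsyncStream<String> { continuation in
    for partial in ["The", "The pasta", "The pasta was great."] {
        continuation.yield(partial)
    }
    continuation.finish()
}
let rendered = try await render(snapshots)
print(rendered.text)  // The pasta was great.
```

Appending instead would render "TheThe pastaThe pasta was great.", the classic first-streaming-UI bug.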

Guided generation: the feature that earns the framework

Here is the part worth the price of admission. Most LLM integrations spend a third of their code coaxing valid JSON out of a model and the other two-thirds defending against the times it fails anyway. Foundation Models deletes that work.

Annotate a Swift type with @Generable, ask the session to generate it, and the model returns an instance of that type, populated and type-safe³:

@Generable
struct Recipe {
    @Guide(description: "The dish name")
    let title: String

    @Guide(description: "Ingredients, each as 'quantity item'")
    let ingredients: [String]

    @Guide(description: "Total minutes, start to finish", .range(5...240))
    let minutes: Int
}

let session = LanguageModelSession()
let response = try await session.respond(
    to: "A weeknight pasta for two.",
    generating: Recipe.self
)
let recipe = response.content   // a Recipe, not a String

No parsing. No JSONDecoder. No retry loop for malformed output. The @Guide macro constrains individual fields: a description the model reads as instruction, and optional limits like a numeric range or a regular expression the output must match⁸. The framework does not ask the model nicely for a number between 5 and 240; it constrains decoding so the field cannot come back otherwise.
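
A toy version makes "constrains decoding" concrete. This illustrates the general technique, not Apple's implementation: canComplete and decodeInt are hypothetical, the vocabulary is ten digits plus an end marker "$", and a real decoder masks the model's full token vocabulary. At each step the decoder keeps only tokens that can still lead to an integer in 5...240, so a "model" that would write 999 if left alone ends up emitting 99.

```swift
/// Could `prefix` (digits only) still grow into an integer inside `range`?
/// A leading zero is allowed only for the single digit "0".
func canComplete(_ prefix: String, in range: ClosedRange<Int>) -> Bool {
    guard let value = Int(prefix), value >= 0 else { return false }
    if prefix.hasPrefix("0") { return prefix == "0" && range.contains(0) }
    // Integers that start with `prefix` and have k more digits fill the interval
    // [value * 10^k, value * 10^k + 10^k - 1]; test k = 0, 1, 2, ... in turn.
    var low = value
    var width = 1
    while low <= range.upperBound {
        if low + width - 1 >= range.lowerBound { return true }
        low *= 10
        width *= 10
    }
    return false
}

/// Greedy constrained decoding over a toy vocabulary of digits plus "$"
/// (end of field): take the model's highest-ranked token that keeps the field
/// valid; every other token is masked out.
func decodeInt(in range: ClosedRange<Int>, ranking: (String) -> [String]) -> Int? {
    var output = ""
    while true {
        let next = ranking(output).first { token in
            if token == "$" { return Int(output).map { range.contains($0) } ?? false }
            return canComplete(output + token, in: range)
        }
        guard let token = next else { return nil }   // nothing valid was ranked
        if token == "$" { return Int(output) }
        output += token
    }
}

// A "model" that always prefers big digits: unconstrained, it would say 999.
let prefersNines = ["9", "8", "7", "6", "5", "4", "3", "2", "1", "0", "$"]
let minutes = decodeInt(in: 5...240, ranking: { _ in prefersNines })
print(minutes ?? -1)  // 99
```

Note the trade: the field is guaranteed valid, not guaranteed sensible, which is why the @Guide description still matters.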

The discipline this enforces is the real value. You design the output type first, in Swift, with the compiler checking it. The model fills a contract you defined instead of returning prose you reverse-engineer. For extraction, form-filling, and any feature that turns language into data, guided generation is the difference between a demo and shipping code.

One control worth knowing: respond(to:generating:) defaults includeSchemaInPrompt to true, which injects your type’s shape into the prompt to bias the model toward it. Leave it on unless the model already knows the format from training or from earlier turns in the session; turning it off to save tokens on a format the model has not seen is how you get garbage back⁹.

Tool calling: letting the model reach your code

Guided generation shapes what comes out. Tool calling changes what goes in. A tool is a piece of your code the model can invoke mid-generation to fetch information it does not have or perform an action, then continue its answer using the result⁴.

A tool conforms to the Tool protocol: a name, a description the model reads to decide when to call it, a @Generable Arguments type, and a call(arguments:) method that does the work⁴:

struct FindContacts: Tool {
    let name = "findContacts"
    let description = "Find a specific number of contacts from the address book"

    @Generable
    struct Arguments {
        @Guide(description: "How many contacts to return", .range(1...10))
        let count: Int
    }

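    func call(arguments: Arguments) async throws -> [String] {
        // Fetch `arguments.count` contacts (for example with the Contacts
        // framework) and return formatted names. Placeholder data here.
        let names = ["Ada Lovelace", "Grace Hopper", "Alan Turing"]
        return Array(names.prefix(arguments.count))
    }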
}

let session = LanguageModelSession(tools: [FindContacts()])
let response = try await session.respond(to: "Draft a dinner invite to three of my contacts.")

The flow: the model decides it needs contacts, calls your tool with a validated count, you return data, and the model writes the invite using real names. The arguments arrive type-checked through the same guided-generation machinery, so you never parse the model’s intent out of free text. The tool description is your main lever on when the model reaches for it, so write it like a function doc that another engineer (with no other context) has to read and use correctly.
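
To see the round trip in isolation, here is a framework-free sketch of what respond(to:) does for you when a tool is attached. Everything in it is hypothetical: SketchTool, FakeFindContacts, ToolCall, and runToolCall are illustrative names, and the explicit argument check stands in for the guided generation that, in the real framework, keeps count inside 1...10 before your call(arguments:) ever runs.

```swift
/// What the model emits when it decides to use a tool (hypothetical shape).
struct ToolCall {
    let toolName: String
    let count: Int
}

/// Minimal stand-in for the Tool protocol: a name, an allowed argument range,
/// and the work itself.
protocol SketchTool {
    var name: String { get }
    var allowedCounts: ClosedRange<Int> { get }
    func call(count: Int) -> [String]
}

struct FakeFindContacts: SketchTool {
    let name = "findContacts"
    let allowedCounts = 1...10
    let addressBook = ["Ada Lovelace", "Grace Hopper", "Alan Turing", "Edsger Dijkstra"]

    func call(count: Int) -> [String] {
        Array(addressBook.prefix(count))
    }
}

enum ToolError: Error, Equatable {
    case unknownTool(String)
    case invalidArgument(Int)
}

/// One round trip: find the tool by name, check the argument, run it, and
/// return the text the model reads before it continues its answer.
func runToolCall(_ call: ToolCall, tools: [any SketchTool]) throws -> String {
    guard let tool = tools.first(where: { $0.name == call.toolName }) else {
        throw ToolError.unknownTool(call.toolName)
    }
    guard tool.allowedCounts.contains(call.count) else {
        throw ToolError.invalidArgument(call.count)
    }
    return tool.call(count: call.count).joined(separator: ", ")
}
```

The name lookup is why the tool's name and description carry so much weight: they are all the model has to pick the right entry.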

This is also the seam where Foundation Models meets the rest of the agent story. A tool the on-device model calls and an App Intent Apple Intelligence calls are different surfaces with the same shape: a named, described, typed capability. Design the capability once and you can expose it through both.

Availability: the check you cannot skip

The model is not always there. It is absent on devices that do not support Apple Intelligence, when the user has it switched off, and during the window when the operating system is still downloading model assets. Ship code that assumes the model exists and it will crash, silently degrade, or hang for a population of your users you never tested on.

Check SystemLanguageModel.default.availability and branch on the reason⁵:

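// Hypothetical app-side enum: one product response per availability state.
enum IntelligenceUI {
    case show, hide, promptToEnable, retryLater
}

func intelligenceUI() -> IntelligenceUI {
    switch SystemLanguageModel.default.availability {
    case .available:
        return .show            // Show the intelligence feature.
    case .unavailable(.deviceNotEligible):
        return .hide            // This device will never have the model.
    case .unavailable(.appleIntelligenceNotEnabled):
        return .promptToEnable  // Prompt the user to turn on Apple Intelligence.
    case .unavailable(.modelNotReady):
        return .retryLater      // Downloading or otherwise not ready yet.
    case .unavailable:
        return .hide            // Unknown reason. Fail closed.
    }
}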

The three reasons demand three different product responses, and conflating them is the most common way these features feel broken. deviceNotEligible is permanent: hide the feature, do not nag. appleIntelligenceNotEnabled is a setting the user controls: a one-time prompt is fair. modelNotReady is temporary: retry, do not show an error. Build the unavailable path with the same care as the happy path, because for a real slice of devices it is the only path.
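
For the temporary case, one simple approach is a bounded poll rather than an error screen. The sketch below is an assumption, not a framework facility: waitUntilReady is a hypothetical helper, and its isReady and wait closures are injection points. In an app, isReady might read SystemLanguageModel.default.isAvailable and wait might sleep or back off; injecting them keeps the retry logic testable without the model. Only the modelNotReady branch should reach it, since the other two reasons will not resolve on their own while you wait.

```swift
/// Checks `isReady` up to `maxAttempts` times, calling `wait(attempt)` between
/// checks. Returns true as soon as a check succeeds, false if attempts run out.
func waitUntilReady(maxAttempts: Int,
                    isReady: () -> Bool,
                    wait: (Int) -> Void) -> Bool {
    guard maxAttempts > 0 else { return false }
    for attempt in 1...maxAttempts {
        if isReady() { return true }
        if attempt < maxAttempts { wait(attempt) }  // no pointless wait after the last check
    }
    return false
}

// Simulated model that becomes ready on the third check.
var checks = 0
var waits: [Int] = []
let ready = waitUntilReady(maxAttempts: 5,
                           isReady: { checks += 1; return checks >= 3 },
                           wait: { waits.append($0) })
print(ready, checks, waits)  // true 3 [1, 2]
```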

When the model is available and you know a request is coming, call prewarm() on the session to load model resources ahead of time, so the first real response lands faster¹⁰. It is worth it on a screen the user is about to act on and wasteful if you call it speculatively.

Hands-on: a complete feature in one file

The pieces above compose into a real feature in less code than most networking layers need for one endpoint. The example below is a complete, compilable SwiftUI screen that turns freeform meeting notes into structured action items: availability check, @Generable output type, one guided-generation call, and the three unavailable branches handled. Every Foundation Models symbol it uses comes from the framework surface documented above²³⁵⁸.

import SwiftUI
import FoundationModels

@Generable
struct ActionItems {
    @Guide(description: "One-sentence summary of the meeting")
    let summary: String

    @Guide(description: "Concrete follow-up tasks, each starting with a verb")
    let tasks: [String]

    @Guide(description: "How urgent the follow-ups are overall", .anyOf(["low", "medium", "high"]))
    let urgency: String
}

struct MeetingNotesView: View {
    @State private var notes = ""
    @State private var result: ActionItems?
    @State private var errorMessage: String?

    var body: some View {
        Form {
            TextField("Paste meeting notes", text: $notes, axis: .vertical)
                .lineLimit(6...12)

            Button("Extract action items") {
                Task { await extract() }
            }
            .disabled(notes.isEmpty)

            if let result {
                Section(result.summary) {
                    ForEach(result.tasks, id: \.self) { Text($0) }
                    Text("Urgency: \(result.urgency)")
                }
            }

            if let errorMessage {
                Text(errorMessage).foregroundStyle(.secondary)
            }
        }
    }

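    private func extract() async {
        // Clear the previous run so a stale result or error never lingers.
        result = nil
        errorMessage = nil
        switch SystemLanguageModel.default.availability {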
        case .available:
            do {
                let session = LanguageModelSession()
                let response = try await session.respond(
                    to: "Extract the action items from these notes: \(notes)",
                    generating: ActionItems.self
                )
                result = response.content
            } catch {
                errorMessage = "The model could not process these notes."
            }
        case .unavailable(.appleIntelligenceNotEnabled):
            errorMessage = "Turn on Apple Intelligence in Settings to use this feature."
        case .unavailable(.modelNotReady):
            errorMessage = "The model is still downloading. Try again shortly."
        case .unavailable:
            errorMessage = "This feature needs an Apple Intelligence-capable device."
        }
    }
}

Three details worth noticing in a sample this small. The output type is the API: ActionItems defines exactly what the feature produces, and the @Guide constraint on urgency means the string cannot come back as anything outside the three allowed values⁸. The session is created per call because each extraction is independent; a retained session would drag prior notes into the token budget². And the unavailable branches produce three different user experiences, not one generic error, which is the difference between a feature that degrades honestly and one that looks broken. Paste the file into an iOS 26 project, run it on an Apple Intelligence-capable device, and it works.

The context window, and where it stops being enough

SystemLanguageModel.default.contextSize reports the token budget the model works inside, and that budget is shared: prompt plus response together must fit⁶. The number is small relative to a cloud model, and you feel it fast on real input. A long document, a full chat history, a fat tool result: any of them can blow the budget and make respond throw.

Two failure modes follow, and both are yours to prevent. First, the slow creep: a multi-turn session accumulates transcript until one more turn overflows. Manage it by starting fresh sessions for unrelated work and by keeping per-turn input lean. Second, the single oversized request: a 20-page PDF does not fit, full stop. Chunk it, summarize the chunks, then reason over the summaries (the map-reduce that LLM engineers know well), or accept that the task is the wrong shape for an on-device model.
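
The chunk-then-summarize map-reduce above can be sketched in plain Swift. This is a minimal sketch under stated assumptions: chunk and mapReduceSummarize are hypothetical helpers, chunk size is measured in characters rather than tokens (pick a limit comfortably under your token budget), and the summarize closure stands in for a model call, for example one fresh LanguageModelSession per chunk so no chunk inherits another's transcript.

```swift
import Foundation

/// Packs paragraphs (split on blank lines) into chunks of at most
/// `maxCharacters`; a single paragraph longer than the limit is hard-split.
func chunk(_ text: String, maxCharacters: Int) -> [String] {
    precondition(maxCharacters > 0, "chunk size must be positive")
    var chunks: [String] = []
    var current = ""
    for paragraph in text.components(separatedBy: "\n\n") where !paragraph.isEmpty {
        var rest = Substring(paragraph)
        while !rest.isEmpty {
            let separator = current.isEmpty ? "" : "\n\n"
            if current.count + separator.count + rest.count <= maxCharacters {
                current += separator                 // fits: keep packing
                current.append(contentsOf: rest)
                rest = ""
            } else if current.isEmpty {
                chunks.append(String(rest.prefix(maxCharacters)))  // oversized paragraph
                rest = rest.dropFirst(maxCharacters)
            } else {
                chunks.append(current)               // full: start a new chunk
                current = ""
            }
        }
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}

/// Map: summarize each chunk. Reduce: summarize the joined summaries, repeating
/// until everything fits in one chunk.
func mapReduceSummarize(_ text: String,
                        maxCharacters: Int,
                        summarize: (String) throws -> String) rethrows -> String {
    let pieces = chunk(text, maxCharacters: maxCharacters)
    guard pieces.count > 1 else { return try summarize(pieces.first ?? "") }
    let combined = try pieces.map(summarize).joined(separator: "\n\n")
    // Stop recursing if summaries fail to shrink the text, so this always terminates.
    guard combined.count < text.count else {
        return try summarize(String(combined.prefix(maxCharacters)))
    }
    return try mapReduceSummarize(combined, maxCharacters: maxCharacters, summarize: summarize)
}
```

Splitting on paragraph boundaries is a deliberate choice: a chunk that ends mid-sentence summarizes worse than one that ends at a natural break, and the hard split exists only as a fallback for a single oversized paragraph.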

The context window is the cleanest signal for the decision that actually matters with this framework: when to stay on-device and when to leave. The numbers are now public: the on-device model works inside a 4K-token budget, and the Private Cloud Compute server model raises that to 32K¹⁴. The chunking advice above applies with those figures attached.

What the iOS 27 betas add

Everything above describes the framework as it shipped in iOS 26, and all of it still holds. The iOS 27 betas extend the same surface in four directions, none of which breaks the iOS 26 mental model¹².

The prompt takes images. The on-device model gains Vision capabilities: you insert an image attachment into a prompt alongside text and the model answers about both. The new types are Attachment, ImageAttachmentContent, and ImageReference, and attachments accept UIImage, NSImage, CGImage, Core Image types, CoreVideo pixel buffers, and file URLs¹²¹³. Images work at any size and aspect ratio, but they spend from the same token budget your text does, so the 4K on-device window becomes the design constraint fast¹³. The full walkthrough is in Foundation Models image input in iOS 27.

Tool calling gets a throttle. GenerationOptions gains a toolCallingMode you set per request, which controls how the model interacts with the tools you attached, and the Vision framework ships ready-made OCRTool and BarcodeReaderTool implementations you attach to a session instead of writing your own recognition code¹⁵. The behavior details live in tool-calling control in iOS 27.

A bigger model, one line away. PrivateCloudComputeLanguageModel runs the same API against Apple’s server model on Private Cloud Compute, behind an entitlement, with the 32K context window and reasoning the on-device model does not have¹²¹⁴. Guided generation and tools work unchanged; switching models means changing the session’s model argument.

Sessions get more control surface. The betas add ContextOptions, TranscriptErrorHandlingPolicy, dynamic profiles (DynamicInstructions, LanguageModelSession.DynamicProfile), and a custom language model provider protocol (LanguageModel, LanguageModelExecutor) that lets a session drive a model you supply rather than the system’s¹². watchOS also joins the platform list at 27.0¹².

The framing to keep: iOS 26 code compiles and behaves the same on iOS 27. The betas widen what a prompt can carry and where the model can run; they do not change what the framework is.

When not to use Foundation Models

The framework is free, private, and offline, which makes it tempting to reach for everywhere. Resist. Reach past it when:

You need real reasoning or breadth of world knowledge. The on-device model is small by design. Open-ended reasoning, code generation, and deep analysis belong to a frontier cloud model. Asking the on-device model for them produces confident, wrong answers.
The input does not fit the context window and chunking would destroy the meaning. Some tasks need to see everything at once.
You need a model you control: a specific checkpoint, a fine-tune, custom weights, deterministic versioning across OS updates. Apple ships and updates the model on its schedule, not yours.
You are below iOS 26 or on an ineligible device. The framework simply is not there, and the availability check will tell you so on every run.

For the on-device cases the framework does not cover (a custom model, your own weights, training on the device), the layers below are Core ML for a fixed converted model, MLX for open-weight models and fine-tunes you own, and iOS 27’s Core AI when you need explicit control over specialization and scheduling. For the cases that genuinely need scale, Private Cloud Compute or a cloud LLM behind a privacy boundary is still the honest answer. Foundation Models is not a replacement for any of them. It is the right first reach for focused language work on text you already hold, and the wrong reach for everything else.

The skill this framework rewards is not prompt-craft. It is taste about scope: feeding the model tasks it is good at, designing @Generable types that capture exactly what you need, and recognizing the moment the work outgrows the device. Build with those instincts and the on-device model does a surprising amount of real work for free. Ignore them and you ship a feature that breaks for every user whose input ran one token too long.

FAQ

Is Apple’s Foundation Models framework free to use?

Yes. The framework gives an app direct, free, offline access to the same on-device model that powers Apple Intelligence. There is no API key, no per-token bill, and no network round trip¹.

What devices and iOS version does Foundation Models require?

It requires iOS 26 and an Apple-Intelligence-capable device. Below that floor the framework does not exist, and even on a supported OS the model is absent on ineligible devices, with Apple Intelligence turned off, or while the model downloads. Always check SystemLanguageModel.default.availability before you use it⁵.

How do I get structured, type-safe output instead of a string?

Annotate a Swift type with @Generable and the model returns that type, populated and type-checked, instead of a string you have to parse. This guided generation is the single feature that makes the framework worth using³.

What is the context window of Apple’s on-device model?

SystemLanguageModel.default.contextSize reports the token budget, which is shared across the prompt and the generated response⁶. The on-device model offers 4K tokens; the Private Cloud Compute model offers 32K¹⁴. Long documents and long multi-turn histories will exceed the on-device budget, so plan for the limit or the session throws.

Does Foundation Models work offline, and does it send data to Apple?

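The on-device model runs entirely on the device. No data leaves the device and no network round trip is required, which is what makes it suitable for features that used to need a cloud LLM and a privacy review¹. The exception is PrivateCloudComputeLanguageModel in the iOS 27 betas, which by design runs on Apple’s server model on Private Cloud Compute¹⁴.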

Can the on-device model call my own code mid-generation?

Yes. The Tool protocol lets the model invoke your code to fetch data or take an action during generation, then fold the result back into its answer⁴.

When should I not use Foundation Models?

Reach past it when you need a frontier model: open-ended reasoning, code generation, long-document analysis, or world knowledge. Apple built the on-device model for focused language tasks inside an app, so asking it for general intelligence produces confident, wrong answers¹.

What does iOS 27 add to Foundation Models?

The iOS 27 betas add image input (attachments in the prompt, created from UIImage, CGImage, pixel buffers, and more), per-request tool-calling control through GenerationOptions, ready-made OCRTool and BarcodeReaderTool Vision tools, and PrivateCloudComputeLanguageModel for running the same API against Apple’s 32K-context server model¹²¹³¹⁴¹⁵. iOS 26 code runs unchanged.

1. Apple Developer, “Foundation Models” framework overview. Apple describes the framework as access to the on-device model that powers Apple Intelligence, suited to focused language tasks such as text generation, summarization, classification, and structured output rather than open-ended reasoning or world knowledge.
2. Apple Developer, “LanguageModelSession” and “Generating content and performing tasks with Foundation Models”. A session holds multi-turn context; Apple’s guidance is to create a new session for each distinct single-turn interaction.
3. Apple Developer, “Generable” and “Prompting an on-device foundation model”. The @Generable macro lets the framework return a populated, type-checked Swift value rather than a string.
4. Apple Developer, “Tool” protocol. Defines protocol Tool<Arguments, Output>: Sendable with required name, description, and parameters: GenerationSchema, plus call(arguments:) async throws -> Output. The Arguments type conforms to ConvertibleFromGeneratedContent and is typically declared @Generable.
5. Apple Developer, “SystemLanguageModel.Availability” and its UnavailableReason. Cases: .available and .unavailable(...) with reasons deviceNotEligible, appleIntelligenceNotEnabled, and modelNotReady. SystemLanguageModel.default.isAvailable is the convenience boolean.
6. Apple Developer, “SystemLanguageModel.contextSize”. An instance property (reached through SystemLanguageModel.default) documented as the maximum context size, representing the total tokens across input prompt and generated response.
7. Apple Developer, “LanguageModelSession.streamResponse(to:)”. Streams partial generated output as the model produces it, for incremental UI updates.
8. Apple Developer, “Guide(description:_:)”. A peer macro that attaches a natural-language description and optional constraints (numeric ranges, regular-expression guides) to a @Generable property. Requires iOS 26.0+.
9. Apple Developer, “respond(to:schema:includeSchemaInPrompt:options:)”. includeSchemaInPrompt defaults to true; Apple’s discussion recommends keeping the default unless the model already knows the expected format.
10. Apple Developer, “LanguageModelSession.prewarm()”. Asks the framework to load model resources ahead of a known upcoming request to reduce first-response latency.
11. Author’s related analysis: On-Device LLMs with Apple’s Foundation Models, Custom Adapters for Foundation Models, Foundation Models Use Cases, and Agentic Workflows on Foundation Models. The App Intents and tool-surface argument is developed in App Intents Are Apple’s New API to Your App.
12. Apple Developer, “Foundation Models” framework topics as of July 2026. The types marked beta for the 27.0 releases include Attachment, ImageAttachmentContent, and ImageReference (prompt attachments); ContextOptions and TranscriptErrorHandlingPolicy; DynamicInstructions and LanguageModelSession.DynamicProfile (dynamic profiles); PrivateCloudComputeLanguageModel with the com.apple.developer.private-cloud-compute entitlement; and the custom provider surface LanguageModel, LanguageModelCapabilities, and LanguageModelExecutor. The framework’s platform list adds watchOS 27.0 (beta).
13. Apple, WWDC26 session 241, “What’s new in the Foundation Models framework”. Image attachments “can be created from a variety of types including UIImage, NSImage, CGImage, Core Image types, CoreVideo Pixel Buffers, and file URLs”; “the model supports images in any size and aspect ratio,” and “larger images will consume more tokens and incur more latency.”
14. Apple, WWDC26 session 319, “Build with the new Apple Foundation Model on Private Cloud Compute”. “The on-device model offers 4k, and with PCC you get 32K”; the session demonstrates switching from the on-device model to the PCC server model by changing one line, with guided generation and tool calling working the same on both.
15. Apple Developer, “GenerationOptions.ToolCallingMode” (iOS 27 beta; the toolCallingMode property and the init(samplingMode:temperature:maximumResponseTokens:toolCallingMode:) initializer), and the Vision framework’s “OCRTool” and “BarcodeReaderTool” (iOS 27 beta), which conform to the Foundation Models Tool protocol.