Foundation Models on Private Cloud Compute
The on-device Foundation Model grew a sibling. iOS 27 gives the framework a server-scale model that runs on Private Cloud Compute, with a 32K context window and reasoning, and you reach it by changing one line of code [1]. Same LanguageModelSession, same Generable, same Tool protocol [4]. The bigger move sits underneath: Apple opened the framework to nearly any LLM through a public protocol, so the on-device model, the cloud model, a local model you ship, an open-source model from Hugging Face, and soon Claude and Gemini all answer to the same Swift API [2]. You stop coding against a model and start coding against a slot you can swap.
This post is the cloud-and-provider layer on top of the framework reference. If you have not met LanguageModelSession, the Tool protocol, or guided generation yet, start with the Foundation Models framework explainer and the iOS 27 tool-calling post, then come back.
TL;DR
- Private Cloud Compute brings a larger server model into the Foundation Models framework, reachable from the on-device model with a one-line change. It offers a 32K context window against the on-device 4K, supports reasoning at three levels, and runs from iOS, macOS, visionOS, and watchOS [1].
- The privacy posture matches the system model’s: Apple designed PCC so user data is never stored and is used only for the request, independently verified by researchers, with no API keys, no account setup, and no token cost to the developer [1].
- Each user gets a daily request limit counted against their iCloud account, upgradeable through iCloud+. Handle the limit in your UI by checking the model’s quota state and showing a persistent, actionable control rather than an alert. Apply for access on the developer website; PCC is available to apps with fewer than 2M downloads [1].
- The new LanguageModel protocol makes every model a swap-in: System, PCC, Core AI for local models on the ANE, MLX for the Hugging Face community, and provider packages from Anthropic and Google to come [2]. DynamicProfile lets a single session move between those models mid-conversation, so a brainstorming turn can use PCC with high temperature and a review turn can drop to the on-device model to save server calls [3].
A bigger model, the same three lines
Last year the pitch was that prompting the on-device model takes three lines: create a session, call respond, read the answer [1]. This year that pitch extends to the cloud. The framework offers a unified Swift API regardless of which model you talk to, so switching from the on-device System model to the PCC model changes the model you construct and nothing else [1]. Structured output through Generable and tool calling behave identically across both [1].
Louis on session 319: prompting the on-device model takes three lines, and switching to the PCC server model is a one-line change to a much larger model with bigger context and reasoning.
The shape of the swap, in the framework’s own terms:
```swift
import FoundationModels

// On-device: the System model.
let onDevice = LanguageModelSession(model: SystemLanguageModel.default)

// Cloud: swap the model. Same session API, same prompts, same tools.
let cloud = LanguageModelSession(model: PrivateCloudComputeLanguageModel.default)
let summary = try await cloud.respond(to: "Summarize this 30-page contract.")
```
The symbol names come straight from the session: Apple exposes the cloud model as PrivateCloudComputeLanguageModel, and the session shows context size read off a contextSize property on both SystemLanguageModel and PrivateCloudComputeLanguageModel [1]. Because the cloud model conforms to the same LanguageModel protocol every other model conforms to, the rest of your code does not notice the difference [2].
One constraint carries over from the on-device model and deserves a hard check: PCC runs only on devices that support Apple Intelligence. Check the availability API and handle the case where Apple Intelligence is unavailable, the same way you already gate the on-device model [1].
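The gating logic can be sketched without touching the framework at all. The types below (`ModelAvailability`, `Backend`, `chooseBackend`) are hypothetical stand-ins invented for illustration; the real availability API and its cases may look different, so treat this as the shape of the decision rather than working integration code.

```swift
// Hypothetical stand-ins: these are NOT framework types.
enum ModelAvailability: Equatable {
    case available
    case unavailable(reason: String)
}

enum Backend: Equatable {
    case privateCloudCompute
    case onDevice
    case disabled(reason: String)
}

/// Prefers the cloud model when asked for and available, falls back to the
/// on-device model, and otherwise reports why the feature is off.
func chooseBackend(cloud: ModelAvailability,
                   onDevice: ModelAvailability,
                   wantsCloud: Bool) -> Backend {
    if wantsCloud, cloud == .available {
        return .privateCloudCompute
    }
    switch onDevice {
    case .available:
        return .onDevice
    case .unavailable(let reason):
        return .disabled(reason: reason)
    }
}
```

Returning a `disabled` case with a reason, instead of throwing, keeps the "Apple Intelligence is unavailable" path a normal UI state the app has to render.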
What PCC buys you, and what it costs
PCC is Apple’s answer to the use cases the on-device model cannot reach: assistants that reason over large user input, or features that fire many tool calls with large outputs [1]. The trade is concrete rather than vibes-based, and the session lays it out as a head-to-head.
| | On-device System model | Private Cloud Compute |
|---|---|---|
| Privacy | On-device | Data never stored, used only for the request [1] |
| Connectivity | Works offline | Requires an internet connection [1] |
| Request limits | None | Daily limit per user [1] |
| Context size | 4K | 32K [1] |
| Reasoning | None | Three levels: light, moderate, deep [1] |
Two rows carry most of the decision. The jump from 4K to 32K is what makes the “summarize a long document with images” feature viable on the cloud model and cramped on the on-device one [1]. Reasoning is the other: where a plain response reads the prompt and generates, a reasoning response generates extra text in a separate segment of the transcript before it answers [1]. The three levels scale that thinking budget. Light gathers a little extra context, moderate reasons deeper, and deep can produce a reasoning segment longer than the answer itself [1]. You set the level when you call respond on the session [1].
Reasoning is not free. The reasoning segment is text the model generates, so it consumes tokens and counts against the 32K context budget [1]. The session is blunt about the discipline that demands: decide between on-device and PCC, and pick the reasoning level, from data rather than vibes [1]. Apple shipped a new Evaluations framework in Xcode for exactly that, because the on-device model performs better than you expect at many tasks and the only way to know is to measure [1].
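A back-of-envelope calculation shows how quickly a reasoning segment eats into the window. This is a sketch under stated assumptions: `ContextBudget` is a hypothetical helper, the token counts are invented, and "32K" is taken to mean 32,768 tokens, which the session does not spell out.

```swift
/// Hypothetical helper for budgeting one request. Not a framework type.
struct ContextBudget {
    let window: Int          // total context window, in tokens
    var promptTokens: Int    // instructions + prompt + prior transcript
    var reasoningTokens: Int // tokens spent in the reasoning segment

    /// Tokens left for the visible answer after prompt and reasoning.
    var remainingForAnswer: Int {
        max(0, window - promptTokens - reasoningTokens)
    }
}

// Same 20,000-token prompt, with and without a long reasoning segment.
let plain = ContextBudget(window: 32_768, promptTokens: 20_000, reasoningTokens: 0)
let deep = ContextBudget(window: 32_768, promptTokens: 20_000, reasoningTokens: 9_000)

print(plain.remainingForAnswer) // 12768
print(deep.remainingForAnswer)  // 3768
```

With these made-up numbers, a deep reasoning pass leaves under a third of the answer room that a plain response has, which is the kind of trade the session asks you to measure instead of guess.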
The privacy posture is the headline
A server model that handles the user’s private input is usually where the privacy story falls apart. PCC is built so it does not. Apple designed Private Cloud Compute with end-to-end privacy in mind, ensuring user data is never stored and is used only for the request, and researchers have independently verified the design [1]. PCC already powers Apple Intelligence’s own complex tasks; the framework opens that same infrastructure to your app [1].
The operational consequences are the part developers feel. PCC is integrated into the OS alongside iCloud, so there is no authentication to wire up, no API keys to rotate, and no account setup to ask the user for [1]. The user needs a device that supports Apple Intelligence and nothing more. There are no token costs to you as the developer; each user gets a daily limit, and users can raise it through iCloud+ [1]. The model is available to apps with fewer than 2M downloads, and you apply on the developer website [1].
Session 319 on the privacy guarantees: no account setup, no authentication, no API keys, and no token cost to the developer, with each user’s requests counted against their iCloud account.
Handling the daily limit without breaking the UI
The daily limit is the one place a cloud model intrudes on UX, and the session is opinionated about how to handle it. Requests count against the user’s iCloud account, and a request that exceeds the limit throws an error [1]. Surfacing that raw error in the UI is the wrong move, because the error is not actionable [1].
Instead, check the quota state on the model and render your own control. The session checks isLimitReached on the model’s quotaUsage and, when the limit is exceeded, shows a button that lets the user manage or upgrade their limit [1]. Two rules govern the presentation. Do not use an alert, because the limit state should persist rather than be dismissed; update your UI’s state instead, for example by disabling the request button and showing a subtle label with an upgrade action beneath it [1]. And detect the approaching case too: the model exposes a belowLimit state so you can warn a user who is close, letting them decide which requests are worth spending [1].
```swift
// Sketch following the session's pattern.
// showUpgradeAffordance() and showNearingLimitNotice() are the app's own UI helpers.
let quota = PrivateCloudComputeLanguageModel.default.quotaUsage
if quota.isLimitReached {
    // Persistent label + upgrade button. No alert.
    showUpgradeAffordance()
} else if quota.belowLimit {
    // Optional: warn the user they are nearing the daily limit.
    showNearingLimitNotice()
}
```
Xcode helps you build this without burning real quota. In the scheme’s Debug Options, the “Simulate Apple Foundation Models Availability” setting offers “Quota Usage Limit Reached” and “Nearing Usage Limit,” so you can exercise both UI states in the simulator [1].
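One way to honor the "persistent state, not an alert" rule is to derive the request area's appearance from the quota state as plain data. The sketch below is framework-free: `QuotaState`, `RequestAreaState`, and the label strings are hypothetical names chosen for illustration, and the mapping from the framework's quota properties to `QuotaState` is left to the app.

```swift
// Hypothetical stand-in for the three quota situations described above.
enum QuotaState {
    case normal
    case nearingLimit
    case limitReached
}

/// What the request area should show. This is state the view keeps rendering,
/// not a one-shot alert the user can dismiss.
struct RequestAreaState: Equatable {
    var sendEnabled: Bool
    var label: String?
    var showsUpgradeButton: Bool
}

func requestAreaState(for quota: QuotaState) -> RequestAreaState {
    switch quota {
    case .normal:
        return RequestAreaState(sendEnabled: true, label: nil, showsUpgradeButton: false)
    case .nearingLimit:
        // Still usable: warn, so the user can choose what to spend requests on.
        return RequestAreaState(sendEnabled: true,
                                label: "You are close to today's limit.",
                                showsUpgradeButton: false)
    case .limitReached:
        // Disable sending and keep an actionable control visible.
        return RequestAreaState(sendEnabled: false,
                                label: "Daily limit reached.",
                                showsUpgradeButton: true)
    }
}
```

Because the function is pure, both simulated Xcode states can be covered by unit tests as well as by hand in the simulator.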
Bring your own LLM: the provider protocol
The deeper change in iOS 27 is that Foundation Models stopped being a single-model framework. Apple rebuilt the on-device System model and added three more first-party options, then opened the door to everyone else. PCC brings the server model with reasoning and 32K context. Core AI runs local models efficiently on the Apple Neural Engine. MLX unlocks the thousands of models in the MLX community on Hugging Face by model ID [2]. And because all of it sits on a new public protocol, frontier providers can ship Swift packages of their own; Apple named Anthropic and Google as bringing Claude and Gemini to Swift developers through the same framework [2].
Christopher Webb on session 339: beyond the system model, the framework adds PCC, Core AI, and MLX, and a public protocol lets providers like Anthropic and Google extend it with their own Swift packages.
The protocol has two pieces, and the split is the whole design. LanguageModel describes the model to the framework: it declares capabilities and hands back a configuration. LanguageModelExecutor is where the work lives, with an initializer that takes that configuration, a prewarm for loading weights or opening connections ahead of the first request, and a respond that streams generation back to the session [2]. The configuration is the link between the two, and it is the lookup key. Each session holds an executor store; when a model produces a configuration the store has not seen, the framework builds an executor and caches it, and the session describes the configuration as Hashable, so a second model with the same configuration resolves to the same executor [2]. That caching is what lets a stateful integration hold a KV cache or a persistent connection across calls instead of redoing work [2].
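The configuration-as-lookup-key idea is easy to see in miniature. The following is a sketch of the caching pattern only, using hypothetical stand-ins (`Configuration`, `Executor`, `ExecutorStore`) that mirror the description; the framework's actual executor store is internal and its shape is an assumption here.

```swift
// Hypothetical stand-ins that mirror the described design; not framework types.
struct Configuration: Hashable {
    let modelID: String
    let endpoint: String
}

final class Executor {
    let configuration: Configuration
    init(configuration: Configuration) {
        // A real executor might load weights or open a connection here.
        self.configuration = configuration
    }
}

final class ExecutorStore {
    private var cache: [Configuration: Executor] = [:]
    private(set) var buildCount = 0

    /// Returns the cached executor for a configuration, building it on first use.
    func executor(for configuration: Configuration) -> Executor {
        if let existing = cache[configuration] { return existing }
        let executor = Executor(configuration: configuration)
        cache[configuration] = executor
        buildCount += 1
        return executor
    }
}

let store = ExecutorStore()
let a = store.executor(for: Configuration(modelID: "demo", endpoint: "local"))
let b = store.executor(for: Configuration(modelID: "demo", endpoint: "local"))
print(a === b, store.buildCount) // true 1
```

Two equal configurations resolve to one executor instance, which is the property that would let a stateful integration keep its KV cache or connection alive between calls.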
For a model provider, the executor’s job is translation. The framework hands it a transcript, a sequence of typed entries, and the executor maps those entries onto whatever roles its own inference engine speaks [2]. Apple defines six entry types: instructions, prompts, tool calls, tool outputs, responses, and reasoning [2]. A model with only system, user, and assistant roles maps tool calls and reasoning to assistant; a model with a dedicated tool role routes there instead [2]. Each request also carries the developer’s intent in two property bags: ContextOptions for what goes into the prompt, like reasoning level or a response schema, and GenerationOptions for the decoder loop, like sampling, temperature, and length [2]. On the way out, the executor streams events on a channel, leading with a metadata update (model and request IDs) and a usage update (prompt token counts) before the text deltas, so the developer learns a request’s cost without waiting for the whole stream [2].
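The role translation can be sketched as a single function. `EntryKind` and `Role` are hypothetical stand-ins, not the framework's transcript types. The tool-call mapping follows the session's description; where tool outputs land in an engine without a tool role, and keeping reasoning on the assistant role in both cases, are assumptions made here for the sake of a complete example.

```swift
// The six entry kinds named in the session; the enum itself is a stand-in.
enum EntryKind: CaseIterable {
    case instructions, prompt, toolCall, toolOutput, response, reasoning
}

// Roles of a typical chat-style inference engine (assumed for illustration).
enum Role: String {
    case system, user, assistant, tool
}

/// Maps a transcript entry kind onto an engine role.
/// `hasToolRole` says whether the target engine has a dedicated tool role.
func role(for kind: EntryKind, hasToolRole: Bool) -> Role {
    switch kind {
    case .instructions:
        return .system
    case .prompt:
        return .user
    case .response, .reasoning:
        return .assistant
    case .toolCall:
        // Per the session: assistant when there is no tool role.
        return hasToolRole ? .tool : .assistant
    case .toolOutput:
        // Assumption: feed results back as user content without a tool role.
        return hasToolRole ? .tool : .user
    }
}
```

Keeping the mapping in one exhaustive switch means the compiler flags the function if a provider later extends the stand-in enum with a new entry kind.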
The error story matters to app developers even if they never write a provider. Foundation Models ships LanguageModelError for the cases every model hits: context window overflows, rate limits, refusals, and more [2]. A provider should throw one of those when it fits, because any framework user already knows how to catch it, and reserve custom error types for failures only its own service produces, like a subscription tier or account state [2]. Providers also get room to differentiate through custom response metadata (tokens-per-second, time-to-first-token) and custom segment types that extend the protocol to new modalities such as audio or video, all flowing through the same session [2]. Cloud providers get a pointed credential reminder: do not take an API key as a plain string; offer a token provider or sign-in flow, persist tokens in the Keychain, and pair it with device attestation through App Attest [2].
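The "shared errors first, custom errors only when nothing fits" rule might look like this inside a provider. Everything here is hypothetical: `CommonModelError` stands in for the shared cases the text attributes to LanguageModelError (whose real case names are not given in the source), and `DemoProviderError` and `BackendFailure` belong to an imaginary provider.

```swift
// Stand-in for the shared error cases; real case names are an assumption.
enum CommonModelError: Error, Equatable {
    case contextWindowExceeded
    case rateLimited
    case refusal
}

// Failures only this imaginary provider's service can produce.
enum DemoProviderError: Error, Equatable {
    case subscriptionTierTooLow
    case accountSuspended
}

// Hypothetical failure codes coming back from the provider's backend.
enum BackendFailure {
    case promptTooLong, tooManyRequests, contentPolicy, planLimit, suspended
}

/// Prefers a shared error when one fits, so callers can reuse existing
/// catch blocks, and falls back to provider-specific errors otherwise.
func mapFailure(_ failure: BackendFailure) -> Error {
    switch failure {
    case .promptTooLong: return CommonModelError.contextWindowExceeded
    case .tooManyRequests: return CommonModelError.rateLimited
    case .contentPolicy: return CommonModelError.refusal
    case .planLimit: return DemoProviderError.subscriptionTierTooLow
    case .suspended: return DemoProviderError.accountSuspended
    }
}
```

An app that already handles rate limits for the system model would then handle this provider's rate limits with no new code, and only the subscription and account cases need provider-aware handling.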
Agentic implications: route models inside one session
The provider protocol and PCC pay off when you stop thinking about one model per app and start thinking about one model per task. That is what DynamicProfile enables. It lets a single LanguageModelSession switch models mid-conversation, selecting the best configuration for the task in front of it [3].
Erik and Oliver on session 242: a craft app declares profiles that act as agents, brainstorming on PCC at high temperature, planning with deep reasoning, and reviewing on the on-device model to save server calls.
The session’s example is a craft app with three phases. Brainstorming wants broad knowledge and creativity, so its profile uses PrivateCloudComputeLanguageModel with temperature set to 1 [3]. Planning wants depth, so it stays on PCC and sets reasoningLevel to deep [3]. Reviewing is routine guidance as the user works, so it drops to SystemLanguageModel to save unnecessary server calls, which also keeps the user’s daily PCC quota for the work that needs it [3]. The body of a DynamicProfile re-evaluates on every prompt, so as the app changes mode the session changes persona: swapping hats, or swapping agents [3].
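The routing table behind that example can be written down as data. This sketch does not use DynamicProfile, whose exact syntax is not shown in the source; `Phase`, `Profile`, `ModelChoice`, and `ReasoningLevel` are hypothetical stand-ins that only capture which settings each phase selects.

```swift
// Hypothetical stand-ins; the real profile API is not reproduced here.
enum ModelChoice: Equatable { case onDevice, privateCloudCompute }
enum ReasoningLevel: Equatable { case light, moderate, deep }

struct Profile: Equatable {
    var model: ModelChoice
    var temperature: Double?       // nil means "leave the default"
    var reasoning: ReasoningLevel? // nil means "no reasoning requested"
}

enum Phase { case brainstorming, planning, reviewing }

/// Re-evaluated per prompt: the current app phase decides the profile.
func profile(for phase: Phase) -> Profile {
    switch phase {
    case .brainstorming:
        // Broad knowledge and creativity: cloud model, temperature 1.
        return Profile(model: .privateCloudCompute, temperature: 1.0, reasoning: nil)
    case .planning:
        // Depth: stay on the cloud model and ask for deep reasoning.
        return Profile(model: .privateCloudCompute, temperature: nil, reasoning: .deep)
    case .reviewing:
        // Routine guidance: on-device, which also preserves the daily quota.
        return Profile(model: .onDevice, temperature: nil, reasoning: nil)
    }
}
```

Calling `profile(for:)` before each prompt mimics the per-prompt re-evaluation the session describes: when the app's phase changes, the next request picks up a different model and settings.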
Routing between models of different context sizes forces a discipline the on-device-only framework never demanded. Moving from PCC’s 32K to the on-device 4K may require trimming entries to fit, and the session names a privacy use too: redact private information from existing entries when moving to a less private model [3]. The framework’s historyTransform applies a local, non-destructive transform before prompting, so you trim for one model without losing context the next turn might need [3]. Mutation costs something: appending to the transcript preserves the KV cache and minimizes time-to-first-token, while rewriting history (removing entries, changing tools, updating instructions) typically invalidates the cache and adds latency [3]. Last year the session API was append-only to guarantee that optimization; this year Apple took the training wheels off, and the only way to know a model’s caching behavior is to measure with the Foundation Models Instrument in Xcode [3].
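A non-destructive trim can be sketched as a pure function over a copy of the history. This is not the framework's historyTransform, whose signature is not given in the source; `Entry`, `trimmed(_:toFit:)`, and the four-characters-per-token estimate are assumptions made for illustration.

```swift
// Hypothetical transcript entry; not the framework's type.
struct Entry: Equatable {
    enum Kind { case instructions, prompt, response }
    let kind: Kind
    let text: String

    /// Crude estimate (about 4 characters per token); real tokenizers differ.
    var estimatedTokens: Int { max(1, text.count / 4) }
}

/// Returns a trimmed copy that fits `budget`, keeping all instructions and
/// the most recent other entries. The input array is left untouched.
func trimmed(_ history: [Entry], toFit budget: Int) -> [Entry] {
    let instructions = history.filter { $0.kind == .instructions }
    var remaining = budget - instructions.reduce(0) { $0 + $1.estimatedTokens }
    var kept: [Entry] = []
    // Walk from newest to oldest and stop at the first entry that does not fit.
    for entry in history.reversed() where entry.kind != .instructions {
        guard entry.estimatedTokens <= remaining else { break }
        remaining -= entry.estimatedTokens
        kept.append(entry)
    }
    return instructions + kept.reversed()
}
```

Because the full history survives, a later turn routed back to the larger model can still see everything, at the cost the paragraph above names: a prompt built from rewritten history will likely miss the KV cache.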
Decision: on-device, PCC, or your own provider
The three options are not a ladder. Each is right for a different shape of problem.
Reach for the on-device System model first. It is free, it works offline, it has no request limits, and the iOS 27 rebuild made it better at instruction following and added image input [2]. Its 4K context is the real ceiling [1]. Evaluate before you assume you need more, because the session warns you will be surprised how well it performs [1].
Reach for Private Cloud Compute when the task exceeds the on-device model and the data is sensitive. That covers long documents that need the 32K window, multi-step reasoning, and many tool calls with large outputs [1]. PCC is the only cloud option that keeps Apple’s privacy posture with no keys, no account, and no token cost, paid for by a per-user daily limit you design around [1]. Pick it when you would otherwise stand up your own server model and dread the privacy review.
Reach for your own provider when you need a specific model the platform does not give you. That means Core AI for a local model you bundle and run on the ANE, MLX for an open-source model by ID, or a provider package (Claude, Gemini) for a frontier model [2]. You take on credential handling, attestation, and the privacy disclosure, and in exchange you get a named model behind the same LanguageModelSession your app already speaks [2]. The session is explicit that on-device and cloud models have very different privacy characteristics, and the user deserves to know which one is answering [2].
Mix them in one session when phases differ. That is the DynamicProfile case: PCC for the heavy creative or reasoning turns, the on-device model for routine ones, each profile carrying its own model, temperature, and reasoning level [3].
FAQ
How do I switch from the on-device model to Private Cloud Compute?
Change the model you pass to LanguageModelSession. The framework offers a unified Swift API across models, so moving from the on-device System model to PrivateCloudComputeLanguageModel is a one-line change, and your prompts, Generable output, and tools work the same [1]. PCC runs only on devices that support Apple Intelligence, so keep your availability check [1].
Is Private Cloud Compute as private as the on-device model?
Apple designed PCC so user data is never stored and is used only for the request, and the design has been independently verified by researchers [1]. It is integrated into the OS alongside iCloud, so there are no API keys, no account setup, and no authentication for you to manage [1]. On-device still wins on offline operation and unlimited requests; PCC wins on context size and reasoning [1].
What does PCC cost, and what is the daily limit?
There are no token costs to you as the developer [1]. Each user gets a daily request limit counted against their iCloud account, and users can upgrade through iCloud+ for a higher limit [1]. Handle the limit in your UI by checking the model’s quota state (isLimitReached, belowLimit) and showing a persistent, actionable upgrade control rather than an alert [1]. The model is available to apps with fewer than 2M downloads, and you apply on the developer website [1].
What does “bring your own LLM provider” actually mean?
Apple added a public LanguageModel protocol, so any model can plug into the Foundation Models framework and be called through the same API as Apple’s own [2]. Beyond the System model and PCC, the framework adds Core AI for local models on the ANE and MLX for Hugging Face community models, and Apple named Anthropic and Google as shipping Swift packages for Claude and Gemini [2]. A provider implements LanguageModel plus a LanguageModelExecutor that translates the framework’s transcript into its own format and streams generation back [2].
Can one session use more than one model?
Yes. DynamicProfile lets a single LanguageModelSession switch models mid-conversation, choosing the best configuration per task [3]. A profile carries its own model, instructions, temperature, and reasoning level, and the profile body re-evaluates on every prompt, so a session can brainstorm on PCC and review on the on-device model in the same conversation [3]. Watch the context-size gap between models and the KV-cache cost of rewriting history when you do [3].
The full Apple Ecosystem cluster: the Foundation Models framework explainer; the iOS 27 tool-calling controls; the agentic workflow distinction; and the on-device LLM. The hub is the Apple Ecosystem Series. For broader iOS-with-AI-agents context, see the iOS Agent Development guide.
1. Apple, WWDC 2026 session 319, “Build with the new Apple Foundation Model on Private Cloud Compute”, presented by Louis. Source for: the one-line switch from the on-device model to PrivateCloudComputeLanguageModel; the 4K vs 32K context comparison; reasoning at light, moderate, and deep levels set when calling respond; the contextSize property on SystemLanguageModel and PrivateCloudComputeLanguageModel; the privacy design (data never stored, used only for the request, independently verified); no API keys, no account setup, no token cost, iCloud-counted daily limit upgradeable via iCloud+; availability for apps under 2M downloads and the developer-website application; the quotaUsage isLimitReached/belowLimit handling and the no-alert UI guidance; and the Xcode “Simulate Apple Foundation Models Availability” debug option.
2. Apple, WWDC 2026 session 339, “Bring an LLM provider to the Foundation Models framework”, presented by Christopher Webb. Source for: the public LanguageModel protocol and LanguageModelExecutor; the configuration-as-lookup-key executor store and Hashable configuration; the additional model options (Core AI on the ANE, MLX via Hugging Face); the rebuilt on-device System model with image input; Anthropic and Google shipping Swift packages for Claude and Gemini; the six transcript entry types and role mapping; ContextOptions and GenerationOptions; the metadata/usage/text-delta streaming order; prewarm; LanguageModelError versus custom errors; custom response metadata and custom segment types; credential and App Attest guidance; and the privacy-characteristics disclosure between on-device and cloud models.
3. Apple, WWDC 2026 session 242, “Build agentic app experiences with the Foundation Models framework”, presented by Erik and Oliver. Source for: DynamicProfile switching models within a LanguageModelSession; the craft-app example (brainstorming on PCC at temperature 1, planning with deep reasoningLevel, reviewing on SystemLanguageModel); profile body re-evaluation per prompt; trimming and redacting the transcript when moving between models; historyTransform as a local non-destructive transform; and the KV-cache implications of appending versus rewriting history, measured with the Foundation Models Instrument in Xcode.
4. Apple Developer, “Foundation Models” framework and the “Tool” protocol. The framework’s LanguageModelSession, guided generation via @Generable, and the Tool protocol that the on-device model invokes mid-generation carry over unchanged to the PCC model and to provider models that conform to the new LanguageModel protocol.