When The LLM Lives In Your App Vs In Your Tooling

From the guide: Claude Code Comprehensive Guide

A Swift app on iOS 26 has two LLMs touching it, at very different layers. One is the on-device model the user runs through the app’s LanguageModelSession. The other is the agent the developer ran through Claude Code or Cursor or Codex CLI to write the app in the first place. Conflating those two LLMs is the most common architecture mistake in agentic Apple development. They are not the same problem; they do not share a security model; they do not share a deployment story; and the patterns that work for one actively fail for the other.

The runtime LLM is a feature shipped to the user. The tooling LLM is an instrument the developer holds. The runtime model lives behind the user’s privacy expectations, the system’s availability checks, and the App Store review. The tooling model lives behind the developer’s API key, the IDE’s filesystem permissions, and a code review the developer is responsible for. The two stacks rarely intersect, and when they do (an MCP server the developer uses to operate the app’s domain during development that the runtime app could also expose for end-user automation), the trust boundary moves and the architecture has to acknowledge it.

The post names that distinction and the routing question that follows: which LLM should serve which capability, and what does each owe the user.

TL;DR

  • The runtime LLM is Foundation Models (SystemLanguageModel.default plus the Tool protocol). Inference is local, the model ships with the OS, the app runs the call on the user’s behalf.1
  • The tooling LLM is whatever the developer chose: Claude in Claude Code, GPT in Cursor, the OpenAI models behind Codex CLI. Inference is remote (Anthropic’s infrastructure, OpenAI’s, or whatever provider is configured), the model is wherever the host put it, the developer drives the agent.
  • The two LLMs do not share security, deployment, latency budgets, or accountability. A capability that makes sense at one layer is often the wrong shape at the other.
  • The same MCP server the developer uses during a Claude Code session is not automatically the right surface for end-user agent automation. The trust boundary changes; what was a developer-controlled tool becomes a user-controlled (or system-controlled) tool.

Two Stacks, Same Word “LLM”

The collision happens in conversations like this. Someone says “we should add an LLM to the app.” Whether that means a feature the user invokes (write me a meditation summary, polish this draft, classify this photo) or a tool the developer wires into their own iteration loop (let Claude Code write the migration, let Cursor refactor the view) is not clear from the sentence. Both are LLM additions. Neither is the same engineering decision.

Foundation Models is one stack. The model lives at SystemLanguageModel.default, has a fixed context window, runs on Apple silicon, never leaves the device, and is gated by the user’s Apple Intelligence eligibility.1 The app developer constrains inputs through @Generable types, exposes app capabilities through the Tool protocol, and ships a binary that calls the model when the feature triggers. The user invokes the feature; the OS supplies the model; the app stitches them together.
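A minimal sketch of that stitching, hedged: the @Generable type and prompt are illustrative, and the exact respond(to:generating:) overload and availability cases should be checked against the shipping iOS 26 SDK.

```swift
import FoundationModels

// Illustrative output shape; constrained generation fills it in.
@Generable
struct SessionSummary {
    @Guide(description: "A one-sentence summary of the session")
    var headline: String
}

func summarize(_ transcript: String) async throws -> SessionSummary? {
    // Respect the eligibility gate before touching the model.
    guard case .available = SystemLanguageModel.default.availability else {
        return nil
    }
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Summarize this session: \(transcript)",
        generating: SessionSummary.self
    )
    return response.content
}
```

The guard is not optional politeness: availability varies by device, region, and user settings, so every runtime-LLM feature needs a non-LLM fallback path.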

Claude Code, Cursor, Codex CLI, and the other agentic IDEs form a different stack. The model lives wherever the host LLM provider runs it (Anthropic’s servers for Claude, OpenAI’s for GPT, etc.). The IDE is the host. The MCP servers are tools the host’s model can call. The developer’s machine has filesystem access, shell access, and whatever else the IDE chose to expose. The developer invokes the agent; the agent reaches into the developer’s filesystem; output lands in the developer’s project.2

Same word “LLM,” very different blast radii.

Six Axes Where The Two Stacks Diverge

Six properties make the divergence concrete:

| Property | Runtime LLM (Foundation Models) | Tooling LLM (Claude Code, Cursor, Codex CLI) |
|---|---|---|
| Where inference runs | On-device (Apple silicon) | On the LLM provider’s infrastructure |
| Who runs the call | The app, in response to user action | The developer, during the iteration loop |
| Who is accountable | The app developer (App Store review) | The developer (their commits, their code review) |
| What the model touches | The app’s data inside the app sandbox | The developer’s filesystem, shell, MCP tools |
| Trust boundary | User → app → on-device model | Developer → IDE → remote model + MCP servers |
| Cost of misuse | Privacy, app crash, App Store rejection | Bad code, security leak, broken build |

The trust boundary is the load-bearing row. The runtime LLM operates inside the app’s sandbox under the user’s privacy expectations; the tooling LLM operates inside the developer’s machine under the developer’s authority. A pattern like let the LLM run a shell command is normal in tooling (Claude Code does this constantly through its Bash tool)3 and a non-starter in runtime: Foundation Models has no Bash tool, and the Tool protocol is a typed Swift function the app developer wrote and reviews.1

The misuse cost row is the consequence of getting the trust boundary wrong. A runtime LLM that exfiltrates user data to a server is a privacy violation and a guideline rejection. A tooling LLM that exfiltrates the developer’s source code to an LLM provider is, depending on the developer’s contract, either expected behavior or a leak. Both matter; they matter for different reasons.

The MCP Server That Sits Between

The cleanest place to see the boundary move is when a single MCP server is used by both stacks. Get Bananas ships an MCP server that exposes shopping-list operations: read items, add items, mark complete. The same server runs in two places.4

In the developer’s Claude Code session during iteration, the MCP server is a tool the developer’s agent calls to manipulate the developer’s own list. The server runs against a JSON file in iCloud Drive. The developer wired the server into their MCP host config; the host knows to call it; the agent reads/writes shopping items as part of larger development tasks.

In a future end-user agent surface (a hypothetical Apple Intelligence MCP integration, or an external Claude desktop user pointing at a shared list), the same MCP server has different obligations. The caller is no longer Blake-the-developer with full filesystem trust; the caller is an end user whose authentication, authorization, and intent vetting are not the developer’s responsibility. The MCP server has to enforce those guardrails (or refuse to expose itself) before that surface becomes safe.

The same JSON-RPC method, add_item, is the right shape when served to a developer over a local stdio transport with no auth, and a data integrity hazard when served from an internet-reachable host to an arbitrary end user with no auth. The MCP server is the same code; the surrounding deployment changes everything.
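The wire payload is identical in both deployments, which is exactly why the deployment is the decision. The shape below follows the MCP specification’s tools/call method; the id and arguments are illustrative.

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "add_item",
    "arguments": { "name": "bananas" }
  }
}
```

Nothing in this request says who the caller is or what they are entitled to; authentication and authorization live in the transport and deployment around it.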

That is the routing rule for MCP servers in agentic Apple development. The server is a typed contract over a domain. Where it sits in the stack (developer tool vs end-user surface) is a deployment decision, not a protocol decision. Code review the deployment; do not assume the protocol’s permissive defaults are the deployment’s right defaults.

The On-Device Tool Protocol Is Not An MCP Server

A common confusion: Foundation Models has a Tool protocol, and MCP has tool calls. Are they the same? No, and the difference matters for routing.

The Foundation Models Tool protocol is a Swift API the app developer implements:1

struct WaterEntryLookup: Tool {
    let name = "lookup_water_entries"
    let description = "Look up water intake entries for a given date range."
    @Generable struct Arguments { ... }
    func call(arguments: Arguments) async throws -> ToolOutput { ... }
}

The tool runs inside the app process. The model the tool serves is the device’s SystemLanguageModel. The arguments and outputs are Swift types. The developer reviews the implementation, the App Store reviews the app. The user invokes a feature; the app’s session calls the tool; the local model uses the result.
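Attaching the tool is the session’s job. A usage sketch, assuming the LanguageModelSession(tools:instructions:) initializer shape this post’s stack implies; the instructions text is illustrative.

```swift
// The on-device model decides when to invoke the tool mid-response;
// the app only wires the tool into the session.
let session = LanguageModelSession(
    tools: [WaterEntryLookup()],
    instructions: "Answer questions about the user's water intake."
)
let answer = try await session.respond(to: "How much water did I log today?")
```

The tool never leaves the app process: the model requests a call, the session invokes the Swift function, and the typed result flows back into generation.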

An MCP tool is a JSON-RPC method exposed by an MCP server, which is a separate process the host LLM (Claude, GPT, etc.) connects to:2

{
  "name": "add_item",
  "description": "Add an item to the shopping list.",
  "inputSchema": {"type": "object", "properties": {"name": {"type": "string"}}}
}

The tool runs outside the agent’s process, in whatever language the developer chose, talking JSON over stdio or Streamable HTTP. The model is wherever the host put it. The arguments are JSON validated against the schema. The accountability is with whoever deployed the MCP server.

The two protocols solve overlapping problems with different scopes:

| Decision | Foundation Models Tool | MCP tool |
|---|---|---|
| Caller | The on-device language model | An external agent (Claude, GPT, Cursor, etc.) |
| Where it runs | Inside the app process, on-device | A separate process the host connects to |
| Schema language | Swift @Generable types | JSON Schema |
| Trust posture | App owns it; user’s privacy posture | Developer or vendor owns it; agent’s authority |
| Update cadence | App update | Server redeployment |

The routing rule is straightforward: if the capability serves the app’s own LLM features for end users, it goes in a Foundation Models Tool. If the capability serves an external agent (developer or end user) operating across processes, it goes in an MCP tool. Some apps need both; the same Swift function can back both adapters, but the adapters live at different stack layers and ship through different release cycles.5

Hooks Are Where The Tooling LLM Earns Its Place

The tooling LLM’s blast radius makes hooks the load-bearing safety primitive. Claude Code’s hook system runs scripts on lifecycle events (PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, Stop).6 An iOS developer using Claude Code sets up hooks not because the agent is malicious, but because the agent’s authority is broad: filesystem write, shell execution, git commits, push.

The patterns that earn their hook slot in agentic Apple work:

A PreToolUse block on Bash commands that match xcodebuild or xcrun without explicit approval. Claude Code can run builds, erase simulators, invoke signing or export steps, or mutate generated project state if you let it. The hook turns “the agent ran a build” into “the agent asked to run a build and got a yes.” Slowing the agent down on irreversible actions is the right tradeoff for the developer’s confidence.

A PostToolUse validator on every Edit or Write tool call that touches a .pbxproj file. The Xcode project file is human-readable but agent-toxic; one wrong line silently breaks the build for every developer on the team. A hook that runs plutil -lint (or a similar structural check) on every .pbxproj write, before the change is committed, is the difference between “the agent wrote the migration in five minutes” and “the agent wrote the migration, plus forty-five minutes of git bisect.”

A Stop hook that runs swift build (or the appropriate build command) before letting the agent declare a task done. Agents are trained on what done looks like in conversation. The hook makes “done” mean “the build still compiles,” which is the only definition that matters for shipping.
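Wired together, the three patterns above are one settings fragment. The shape follows Anthropic’s hooks reference; the script paths are hypothetical — the guard script would exit 2 to block the tool call, and the lint script would run plutil -lint on the written file.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/guard-xcodebuild.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/lint-pbxproj.sh" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "swift build" }
        ]
      }
    ]
  }
}
```

The matchers gate by tool name, so the Bash guard never fires on file edits and the project-file lint never fires on shell commands; each hook pays its latency cost only where its risk lives.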

The runtime LLM does not need any of this. Foundation Models has no shell, no git, no project file, no MCP server config. The on-device Tool is whatever Swift function the app developer wrote; the user invokes a feature; nothing escapes the app’s sandbox or entitlements unless the app’s own Tool implementation does it.

The asymmetry is the point. The tooling LLM has more authority and needs more guardrails. The runtime LLM has less authority by construction. Apple did the work of making the runtime LLM safe; the developer does the work of making the tooling LLM safe.

Architecture Rules

Three architectural rules follow from the runtime/tooling distinction.

Pick the layer per capability, not per app. A meditation app might use Foundation Models for in-app summarization (runtime LLM, on-device, ships with the app) and expose an MCP server the developer uses with Claude Code to bulk-import session history during iteration. The two LLMs serve different jobs at different layers. Treating them as one decision produces a worse outcome at both layers than treating them as two.

Code-review the tooling LLM’s reach. A Claude Code session with full filesystem access and remote MCP servers is a powerful development environment and a generous attack surface. The mitigation is not “trust the agent”; the mitigation is hooks, scoped permissions, and a developer who reads the diff. The agent works for you; the agent is not you.

Ship the runtime LLM’s Tool set as a stable API. Foundation Models tools are part of your app’s binary contract. Removing or renaming a tool between releases is a behavioral change for users who relied on the feature. Treat tool definitions like UI affordances, not like internal helpers.

What I Would Build Differently In My Stack

Two patterns the cluster’s apps either ship or wish they shipped.

Build the domain layer first; let runtime tools and tooling MCP servers wrap the same Swift functions. The dual-adapter pattern from App Intents vs MCP extends naturally to runtime LLM tools. A logWater(amount:caller:) domain method is wrapped by an AppIntent (Apple Intelligence surface), an MCP tool (external agent surface), and a Foundation Models Tool (in-app runtime LLM surface). Three protocol adapters, one domain function, three caller classes (system agent, external agent, on-device model) with three different obligations. The function does not know which caller invoked it; the adapters carry the trust signals.
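A sketch of that layering, with every name hypothetical beyond the post’s own logWater(amount:caller:) anchor; the ToolOutput initializer and adapter shapes should be checked against the shipping SDK.

```swift
// One domain function; it never learns which surface called it.
enum CallerClass { case systemAgent, externalAgent, onDeviceModel }

struct HydrationStore {
    func logWater(amount: Double, caller: CallerClass) throws {
        // Persist the entry; an audit log can record the caller class.
    }
}

// Adapter 3 of 3: the in-app runtime LLM surface (Foundation Models).
// The App Intents and MCP adapters wrap the same function at their own layers.
struct LogWaterTool: Tool {
    let name = "log_water"
    let description = "Log a water intake amount in milliliters."
    @Generable struct Arguments {
        @Guide(description: "Amount in milliliters")
        var amount: Double
    }

    let store: HydrationStore

    func call(arguments: Arguments) async throws -> ToolOutput {
        try store.logWater(amount: arguments.amount, caller: .onDeviceModel)
        return ToolOutput("Logged \(arguments.amount) ml.")
    }
}
```

The adapter is deliberately thin: schema, trust signal, translation. Anything thicker belongs in the domain function, where all three callers benefit from it.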

Treat the agent’s MCP servers as code, not as configuration. A .mcp.json referenced in an iOS project is a scope and precedence trust surface (covered in The Repo Shouldn’t Get to Vote on Its Own Trust). Claude Code resolves MCP server scope as local > project > user, and project-scoped servers prompt the developer for approval before they are used. The agent reads the config and connects to the servers the developer approves; the developer reviews the config and the servers. Adding an MCP server to a project is a code review, not a configuration tweak.
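The file under review is small, which is exactly why it is easy to wave through. A sketch of the .mcp.json shape: the server name is hypothetical; the script path comes from the Get Bananas project cited in the references.

```json
{
  "mcpServers": {
    "get-bananas": {
      "command": "node",
      "args": ["mcp-extension/server/index.js"]
    }
  }
}
```

Four lines of JSON name a binary the agent will execute with the developer’s permissions. That is why project-scoped servers prompt for approval, and why this file deserves the same diff scrutiny as source.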

When Foundation Models Is Right And When The Tooling LLM Is Right

The decision tree the cluster’s posts converge on:

Is the capability a feature an end user invokes inside your app?
├── Yes → Runtime LLM (Foundation Models or cloud LLM behind an Apple Intelligence-aware surface)
│         Use the Tool protocol for app-internal tool calls.
│         Use App Intents for capabilities the system agent should reach.
└── No → It is part of the developer's iteration loop.
          ├── Is the capability local to one developer's machine? → Tooling LLM
          │     Use Claude Code, Cursor, or Codex CLI directly.
          │     Wrap shared utilities as MCP servers behind hooks.
          └── Is the capability shared across the team? → Tooling LLM with shared MCP servers
                Deploy the MCP server somewhere the team can reach.
                Code review the server like production code; gate dangerous tools behind explicit approval.

The decision rarely produces a tie. When it does (the same capability could legitimately serve both end users and developers), the answer is two adapters, not one shared surface, because the trust postures and update cadences are different enough that one surface trying to serve both will compromise on both.

What The Pattern Means For Apps Shipping On iOS 26+

Three takeaways.

  1. Two LLMs, two stacks. The runtime LLM (Foundation Models, on-device) is the user’s agent operating on their data inside your app. The tooling LLM (Claude Code, Cursor, Codex CLI) is the developer’s agent operating on the developer’s machine to build the app. They share the word “LLM” and almost nothing else.

  2. The trust boundary is the architecture. Where the model runs, who runs it, and what it touches define the obligations. Patterns that fit one boundary actively fail the other.

  3. MCP servers carry the boundary. The same server is a developer tool in one deployment and an end-user surface in another. The protocol does not change; the deployment does, and the deployment is the part that needs the engineering attention.

The full Apple Ecosystem cluster: typed App Intents for Apple Intelligence; MCP servers for cross-LLM agents; the routing question between the two; Foundation Models for the on-device LLM and the Tool protocol; Live Activities for the iOS Lock Screen state machine; the watchOS runtime contract on Apple Watch; SwiftUI internals for the framework substrate; RealityKit’s spatial mental model for visionOS scenes; SwiftData schema discipline for persistence; Liquid Glass patterns for the visual layer; multi-platform shipping for cross-device reach. The hub is at the Apple Ecosystem Series. For broader iOS-with-AI-agents context, see the iOS Agent Development guide.

FAQ

What’s the difference between Foundation Models and Claude Code from an architecture standpoint?

Foundation Models is a runtime feature: the LLM ships with iOS 26, runs on the user’s device through SystemLanguageModel.default, and is invoked when the app’s LanguageModelSession triggers. The app runs the call on the user’s behalf. Claude Code is a development tool: the LLM runs on Anthropic’s infrastructure (or the configured Claude provider), the developer’s machine hosts the IDE, and the agent has access to the developer’s filesystem, shell, and MCP servers. The developer drives the agent; the agent helps build the app.

Should the same MCP server serve both my agent and my end users?

Probably not. The same JSON-RPC contract can be the right shape for both, but the deployments are different: developer-side stdio with no auth is normal for a developer tool, and a hazard for an end-user surface. The protocol is reusable; the deployment is not. If you do expose the same server to both, treat it as two deployments behind one codebase, not one surface for both audiences.

Why does the tooling LLM need hooks but the runtime LLM does not?

The tooling LLM has filesystem access, shell access, MCP servers, and arbitrary code execution authority on the developer’s machine. The runtime LLM (Foundation Models) has whatever the app’s Tool implementations expose, inside the app’s sandbox, with no shell. The blast radius is asymmetric. Hooks give the developer pre-execution review and post-execution validation on the broad authority. The runtime LLM does not need them because its authority is constrained by construction.

Can a single Swift domain function serve both runtime and tooling LLM use cases?

Yes, and that is the right pattern. The dual-adapter approach (one Swift function, multiple protocol wrappers) extends from App Intents vs MCP to include Foundation Models tools. The function does not know which caller invoked it; the adapters carry the schema, trust signals, and protocol-specific obligations. Three adapters, one domain method.

Where do hosted cloud LLMs (OpenAI, Anthropic API direct) fit into this picture?

Cloud LLMs called from inside an app at runtime are a third category: runtime LLM with off-device inference. They share Foundation Models’s “app runs the call on the user’s behalf” trust posture but lose the on-device privacy story and the OS-supplied availability story. The decision tree extends: cloud runtime LLMs are appropriate for capabilities that genuinely exceed the on-device model’s envelope (large context, frontier reasoning, multimodal at scale) and compatible with the user’s privacy expectations, disclosed transparently. Foundation Models is the default when the workload fits; cloud is the escalation when it does not.

References


  1. Author’s analysis in Foundation Models On-Device LLM: The Tool Protocol, April 30, 2026, covering SystemLanguageModel, LanguageModelSession, the Tool protocol, @Generable / @Guide macros, and constrained generation. 

  2. Anthropic, “Model Context Protocol” and “MCP Specification: Tools (2025-06-18)”. JSON-RPC tool exposure, host/server architecture, and the stdio + Streamable HTTP transports. 

  3. Anthropic, “Claude Code reference: Hooks”. PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, Stop lifecycle events; the validation surface that wraps the tooling LLM’s broad authority. 

  4. Author’s analysis in Two Agent Ecosystems, One Shopping List, April 29, 2026, and the Get Bananas project’s MCP server (mcp-extension/server/index.js). The single-codebase, multi-deployment pattern. 

  5. Author’s analysis in App Intents vs MCP: The Routing Question, April 30, 2026. The dual-adapter pattern (one Swift domain method, two protocol wrappers) extended in this post to a triple-adapter pattern with Foundation Models as the third caller class. 

  6. Anthropic, “Hooks reference”. Lifecycle events, matchers, command shape, and the role of hooks as pre-execution validation against agent authority. 
