Three Surfaces: Human, Apple Intelligence, Agent

Every meaningful capability in an iOS app on iOS 26+ can now be invoked from up to three surfaces. The same Swift function that logs a glass of water can be triggered by a human tapping a button, by Apple Intelligence routing a Siri request, or by an external agent (Claude Code, Cursor, ChatGPT) calling an MCP tool. Three different callers, three different obligations, three different rendering surfaces. The capability is the same; the surfaces are not.

A lot of iOS architecture mistakes come from designing for one surface and then force-fitting the capability into the others. UI flows leak into Siri responses; agent tools that are correct for a developer become hazards for end users; on-device LLM features assume cloud-grade context. The cluster has been mapping these surfaces in individual posts. The post at hand is the synthesis: the three surfaces, their differences, the routing rule, and what an app’s domain layer needs to look like to serve all three without compromising any.

The mental model: pick a domain capability. Ask which of the three surfaces should be able to invoke it. Ask which can. Ask what each surface needs from the capability and what the capability owes back. The answers shape the architecture.

TL;DR

Three surfaces: human (SwiftUI views, taps, gestures, screen), Apple Intelligence (App Intents, Siri, Shortcuts, Spotlight), agent (MCP servers, external LLM hosts).
Each surface has different obligations: trust posture, latency budget, rendering location, persistence semantics, error handling, accessibility requirements.
The right architecture is a domain layer beneath the surfaces. Each surface is a thin adapter over the same Swift functions; the function takes a typed Caller argument so it can branch on cross-surface rules (rate limits, audit, confirmations) without knowing surface protocol details.
Not every capability serves all three. Deciding which surfaces a capability serves is the design call. Hiding capabilities from the surfaces that should not have them is as much a product decision as exposing them to the surfaces that should.

Surface One: Human

The human surface is the screen. The user looks at the app, taps, scrolls, drags, swipes, types. The framework is SwiftUI (or UIKit, or for some workloads RealityKit on visionOS). The rendering happens on the user’s device, in the app process, against the user’s chosen color scheme, dynamic type size, and accessibility settings.¹

What the human surface needs from a capability:

A visual affordance. A button, a list row, a swipe gesture, a context menu. The capability has to be discoverable through the app’s navigation and styled consistently with the rest of the UI.
Real-time feedback. Every interaction needs an immediate visible response. A button that fires a long-running operation has to show a progress indicator, an enabled/disabled state, an animation.
Accessibility. VoiceOver labels, Dynamic Type support, color contrast, motor-control alternatives. The human surface has the most demanding accessibility requirements because the user interacts directly with the rendering.
Error visibility. Errors land in the user’s view. A failed save shows an alert; a network timeout shows a retry; a permission denial shows a settings link.

What the human surface owes back to the capability:

User intent that is unambiguous. The user tapped a specific button; the capability knows exactly what was requested. There is no inference layer.
Tight latency budget. A tap that takes more than a few hundred milliseconds to respond visibly feels broken. The capability has to be either fast or designed to show progress immediately.
No external authority. The user is in the app; the user is the agent in the loosest sense (the human is the one driving the action). No third-party LLM, no system agent, just the user’s hands.

The human surface is the longest-standing of the three. Every iOS framework, design pattern, and accessibility rule the platform has accumulated since its first release is in service of this surface. The other two surfaces are recent enough that the patterns are still settling.

Surface Two: Apple Intelligence

The Apple Intelligence surface is the system agent. Siri, Shortcuts, Spotlight, the system suggestion stack. The user speaks, types into Spotlight, or chains an action in Shortcuts; the system routes the request through the App Intents framework, finds an AppIntent that matches, resolves the parameters, and runs the intent’s perform() body. The framework is App Intents.²

What the Apple Intelligence surface needs from a capability:

A typed schema. AppIntent types declare @Parameter properties; AppEntity types provide persistent identity for things the system can talk about; AppEnum types name closed sets of options. The system reads the schema at install time.
Identity that survives the app process. A water entry the user logged through Siri yesterday should be referenceable through Siri today. The AppEntity model gives the system a stable way to talk about objects across sessions.
Quiet error handling. Errors do not land in a user’s view; they land in a Siri response, a Shortcuts output, or a Spotlight result. The error format the system expects is structured (Apple’s AppIntentError plus LocalizedError-conforming throws), not visual.
Idempotence under retry. The system can re-invoke an intent during a Shortcut chain or after a partial failure. Capabilities that mutate state need to be safe under repeat calls or to surface a clear “already done” semantic.
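
The idempotence requirement above can be sketched in plain Swift. This is a minimal illustration, not App Intents API: WaterLog, LogOutcome, and the requestID key are hypothetical names, and the sketch assumes the adapter can derive a key that stays the same when the system re-invokes the same logical request.

```swift
import Foundation

// Hypothetical, simplified entry type for this sketch.
struct WaterEntry: Equatable {
    let id: UUID
    let milliliters: Double
    let timestamp: Date
}

// The "already done" semantic is explicit in the return type.
enum LogOutcome: Equatable {
    case created(WaterEntry)
    case alreadyDone(WaterEntry)
}

final class WaterLog {
    private var entriesByRequest: [String: WaterEntry] = [:]

    var count: Int { entriesByRequest.count }

    /// `requestID` is an assumed caller-supplied key that is stable across retries.
    func log(milliliters: Double, at timestamp: Date, requestID: String) -> LogOutcome {
        if let existing = entriesByRequest[requestID] {
            return .alreadyDone(existing)   // repeat call: no second write
        }
        let entry = WaterEntry(id: UUID(), milliliters: milliliters, timestamp: timestamp)
        entriesByRequest[requestID] = entry
        return .created(entry)
    }
}

let waterLog = WaterLog()
let first = waterLog.log(milliliters: 250, at: Date(), requestID: "req-1")
let retry = waterLog.log(milliliters: 250, at: Date(), requestID: "req-1")
```

The retry returns the entry created by the first call instead of writing a second one, so a re-run Shortcut step leaves the store unchanged.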

What the Apple Intelligence surface owes back to the capability:

The user’s actual identity. The system knows who the user is, has authenticated them via the OS, and runs the intent in their context. The capability does not need to verify identity beyond what the OS provides.
System-level rendering. The result the intent returns gets formatted by the system into the appropriate chrome (Lock Screen banner, Siri response card, Shortcuts output). The app does not control how the response presents.
Discoverability without your code running. App Intents can be invoked when your app is not running. The system reads the schema and surfaces the capability proactively.

The trust posture: Apple Intelligence is Apple’s first-party agent. The user did not configure it; the system did. The user trusts the OS; the OS trusts the App Intents schema your app shipped through review. The trust chain is OS → app. App Intents do support requestConfirmation(...) and foreground-mode confirmations, so capabilities that need an “are you sure?” step can technically live there; whether high-risk confirmations belong inside a Siri turn or on the app’s own screen is a product judgment, not a platform constraint. Anything irreversible (account deletion, destructive bulk edits, payment) is usually safer on the human surface even though App Intents can request confirmation.³

Surface Three: Agent

The agent surface is every other LLM-powered system that wants to operate the app’s domain. Claude Desktop, Claude Code, Cursor, the ChatGPT desktop app, Codex CLI, custom agent harnesses. The framework is the Model Context Protocol: an MCP server exposes the app’s domain through JSON-RPC tools/call methods; the host LLM discovers tools at session start and calls them by name with a JSON payload.⁴

What the agent surface needs from a capability:

A JSON-RPC contract. Tool name, description, inputSchema, optional outputSchema. The agent reads the description to decide whether to call; it follows the schema to format arguments.
A useful description. The model decides when to use the tool based on its description. Treat the description like a docstring you expect another developer (the model) to read. Vague descriptions produce wrong tool selection.
Errors with two shapes. Tool execution errors return as a content block plus isError: true on the tool result the model reads. Protocol-level errors (malformed request, missing tool, transport failure) return as standard JSON-RPC error responses the host handles. The tool author owns the first; the protocol owns the second.
Stateless or explicitly-stateful semantics. MCP is stateful at the protocol layer (session lifecycle, session IDs in Streamable HTTP), but durable domain identity is server-side responsibility, not a protocol-level guarantee. If the same identifier should mean the same thing across sessions, the server has to enforce it.
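
The first of the two error shapes can be sketched as follows. The field names (content, type, text, isError) follow the tool-result shape described above; the Swift type names, the logWater stand-in, and the 1 to 5000 ml range are assumptions made for illustration, not part of any MCP SDK. Protocol-level errors are left to the transport layer and are not modeled here.

```swift
import Foundation

// Hypothetical Swift types mirroring a tool result: content blocks plus isError.
struct TextContent: Codable, Equatable {
    var type = "text"
    let text: String
}

struct ToolResult: Codable, Equatable {
    let content: [TextContent]
    let isError: Bool
}

enum DomainError: Error {
    case amountOutOfRange(Double)
}

// Hypothetical domain call standing in for the real capability.
func logWater(milliliters: Double) throws -> String {
    guard (1...5000).contains(milliliters) else {
        throw DomainError.amountOutOfRange(milliliters)
    }
    return "Logged \(Int(milliliters)) ml"
}

// A tool execution error becomes a result the model can read,
// not a JSON-RPC protocol error.
func handleLogWaterTool(milliliters: Double) -> ToolResult {
    do {
        let message = try logWater(milliliters: milliliters)
        return ToolResult(content: [TextContent(text: message)], isError: false)
    } catch {
        let text = "Could not log water: amount must be between 1 and 5000 ml."
        return ToolResult(content: [TextContent(text: text)], isError: true)
    }
}

let failure = handleLogWaterTool(milliliters: -10)
let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys]
let data = try encoder.encode(failure)
let json = String(decoding: data, as: UTF8.self)
print(json)
```

Writing the error text for the model, with the valid range spelled out, gives the host a chance to correct the arguments and call again.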

What the agent surface owes back to the capability:

The host’s authentication, not the user’s. The trust comes from whoever deployed the MCP server. Developer’s local stdio: the developer’s own filesystem permissions. Internet-reachable HTTP: whatever auth the server enforces. The capability has to assume the identity claim is whatever the server gave it.
Variable latency tolerance. The host can wait longer than the human surface or the Apple Intelligence surface. A tool call that takes thirty seconds is acceptable on the agent surface and unacceptable on the others.
No rendering surface. The result is text or structured data the model interprets. No chrome, no UI, no system formatting.

The trust posture: the MCP server is the developer’s contract for who gets to call it. Two deployments of the same server (local stdio for development, internet HTTP for end users) have very different trust postures and need very different guardrails. The protocol is the same; the deployment is the architecture. Covered in detail in App Intents vs MCP: The Routing Question and When the LLM Lives in Your App vs in Your Tooling.⁵

The Six Axes The Surfaces Disagree On

Pulling the three surfaces into a comparison table makes the architecture decisions concrete:

Axis	Human	Apple Intelligence	Agent
Caller identity	The user (in-app, authenticated by OS)	The user (system-resolved through OS)	The host’s identity claim (server-enforced)
Latency budget	Hundreds of milliseconds	Seconds (Siri turn-taking)	Seconds to tens of seconds
Rendering	App’s SwiftUI views	System chrome (banner, Siri card, Shortcuts)	Content blocks the model interprets
Discovery	App’s navigation	App Intent schema read at install	Tool list returned at session start
Persistence semantics	App-managed state	`AppEntity` identity across sessions	Server-managed; not protocol-level
Error format	Alerts, banners, view state	`AppIntentError` + `LocalizedError` throws	Tool exec: content + `isError`; protocol: JSON-RPC `error`

The disagreements compose. A capability designed for the human surface assumes tight latency, rich rendering, app-managed errors. Force-fitting it through Apple Intelligence loses the rendering control and adds OS-mediated identity. Force-fitting it through the agent surface loses the rendering entirely and shifts the trust boundary to whoever deployed the server. The capability has to be re-shaped, not just re-wrapped.
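
The error-format row is one place where re-shaping shows up in code. A minimal sketch, assuming a hypothetical DomainError and a Presentation enum that stands in for what each adapter would really produce (view state for an alert, a thrown LocalizedError, an MCP tool result):

```swift
import Foundation

enum Surface { case human, appleIntelligence, agent }

enum DomainError: Error {
    case storeUnavailable
    case permissionDenied
}

// Hypothetical stand-ins for the real per-surface outputs.
enum Presentation: Equatable {
    case alert(message: String, offersRetry: Bool)
    case localizedThrow(message: String)
    case toolResult(text: String, isError: Bool)
}

// One domain error, three renderings.
func present(_ error: DomainError, on surface: Surface) -> Presentation {
    let message: String
    let retryable: Bool
    switch error {
    case .storeUnavailable:
        message = "The entry could not be saved."
        retryable = true
    case .permissionDenied:
        message = "This action is not permitted."
        retryable = false
    }
    switch surface {
    case .human:
        return .alert(message: message, offersRetry: retryable)
    case .appleIntelligence:
        return .localizedThrow(message: message)
    case .agent:
        return .toolResult(text: message, isError: true)
    }
}

let onScreen = present(.storeUnavailable, on: .human)
let toAgent = present(.storeUnavailable, on: .agent)
```

Only the human branch carries a retry affordance; the other two surfaces receive the same message in the format their caller expects.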

The Architecture Rule: Domain Layer Below The Surfaces

The pattern that survives across the three surfaces is a domain layer beneath them. The domain layer is plain Swift functions: typed inputs, typed outputs, no protocol assumptions. Each surface is a thin adapter over the domain. The same logWater(amount:caller:) function backs the SwiftUI button, the App Intent’s perform(), and the MCP tool’s handler.

The sketch (real production would conform WaterEntry to AppEntity for the App Intent return and inject domain as a dependency rather than a top-level reference; guards and store are assumed to exist in scope):

// Domain layer (the actual capability)
func logWater(amount: Measurement<UnitVolume>, at: Date, caller: Caller) throws -> WaterEntry {
    try guards.requireWritePermission(caller)
    let entry = WaterEntry(amount: amount, timestamp: at)
    try store.insert(entry)
    return entry
}

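// Adapter A: human surface (SwiftUI button)
Button("Log 250ml") {
    do {
        // logWater is synchronous and throwing, so no Task or await is needed
        let entry = try domain.logWater(
            amount: .init(value: 250, unit: .milliliters),
            at: .now,
            caller: .human
        )
        // Update view state with entry, show confirmation animation, etc.
    } catch {
        // Surface the failure in view state (alert, retry affordance)
    }
}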

// Adapter B: Apple Intelligence surface (AppIntent)
struct LogWaterIntent: AppIntent {
    // static let avoids shared mutable state under strict concurrency checking
    static let title: LocalizedStringResource = "Log Water"
    @Parameter(title: "Amount") var amount: Measurement<UnitVolume>
    func perform() async throws -> some IntentResult & ReturnsValue<WaterEntry> {
        let entry = try domain.logWater(amount: amount, at: .now, caller: .siri)
        return .result(value: entry)  // WaterEntry conforms to AppEntity
    }
}

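// Adapter C: agent surface (body of an MCP tool handler;
// ml and hostName are assumed to come from the tool call and the session)
let entry = try domain.logWater(
    amount: .init(value: ml, unit: .milliliters),
    at: .now,
    caller: .mcp(host: hostName)
)
// .text(...) stands in for whatever result type the MCP library in use expects
return .text("Logged \(entry.amount) at \(entry.timestamp)")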

Three callers. One domain function. The domain function takes a Caller parameter so it can enforce different rules per surface (rate limits, audit logging, confirmation requirements) without each surface having to re-implement them. The adapters are dumb; the domain is smart.
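
A minimal sketch of what the Caller type and the guard it feeds might look like. The three cases come from the adapters above; the Guards class, its trusted-host allowlist, and the audit-log strings are assumptions for illustration, not the author's implementation.

```swift
import Foundation

// The surface that invoked the domain call.
enum Caller: Equatable {
    case human
    case siri
    case mcp(host: String)
}

enum PolicyError: Error, Equatable {
    case writeNotAllowed(Caller)
}

// Hypothetical guard object: one place for cross-surface write rules and audit.
final class Guards {
    private let trustedHosts: Set<String>
    private(set) var auditLog: [String] = []

    init(trustedHosts: Set<String>) {
        self.trustedHosts = trustedHosts
    }

    func requireWritePermission(_ caller: Caller) throws {
        switch caller {
        case .human, .siri:
            // Identity for these callers comes from the OS.
            auditLog.append("allowed: OS-authenticated caller")
        case .mcp(let host):
            // Agent callers are checked against the server's own allowlist.
            guard trustedHosts.contains(host) else {
                auditLog.append("denied: \(host)")
                throw PolicyError.writeNotAllowed(caller)
            }
            auditLog.append("allowed: \(host)")
        }
    }
}

let guards = Guards(trustedHosts: ["claude-code"])
try guards.requireWritePermission(.human)
try guards.requireWritePermission(.mcp(host: "claude-code"))

var denied: PolicyError?
do {
    try guards.requireWritePermission(.mcp(host: "unknown-host"))
} catch let error as PolicyError {
    denied = error
} catch {
    // No other error type is thrown in this sketch.
}
```

Because the adapters only pass a Caller value, adding a rate limit or a confirmation gate later changes this one type rather than three call sites.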

The shape generalizes the dual-adapter pattern from App Intents vs MCP; adding the human surface as a third caller class is the natural extension. The Foundation Models on-device LLM, when used inside the app, sits on the human surface (the user invoked an in-app feature that happens to call the model); the runtime LLM is not a fourth surface, it is a way of executing capabilities that already belong to the human surface.⁶

Not Every Capability Serves All Three Surfaces

Equal exposure is not the goal. Different capabilities belong on different surfaces.

Capabilities that should usually require foreground human presence. Photo capture, biometric authentication, sensitive PII entry, payment confirmation, account deletion. The human has to be looking at the screen, has to consent, has to authenticate. Apple Intelligence can technically foreground the app and request confirmation; the agent surface has no equivalent presence guarantee. The product judgment is that these capabilities should run as foreground UI with explicit deliberate action, not as a quiet Siri or background tool call.

Capabilities that should live on the human + Apple Intelligence surfaces. Most user-facing actions. Log water, start a meditation, add an item to the list, show Tuesday’s entries. The user might tap a button or might say “Hey Siri.” Both surfaces are valid; both should reach the same domain function.

Capabilities that should live on all three surfaces. Cross-process integrations. A shopping list shared across the user’s iPhone and a Claude Code session that imports recipes. The human surface owns day-to-day use; the Apple Intelligence surface owns Siri/Spotlight reach; the agent surface owns the developer- or user-driven external workflow.

Capabilities that should live on the agent surface only. Developer or admin bulk imports without an end-user review flow, integrations with external systems, agent-orchestrated workflows that have no Siri or in-app expression. Bulk-import 500 historical entries from a developer’s CSV during a one-time backfill. End-user file imports often have a human-surface flow (Shortcuts can pass files; an in-app importer can chunk progress); the agent-only case is the workflow that genuinely has no place in either of the other two surfaces.

The decision is the design. Listing the surfaces a capability does not serve is as important as listing the ones it does.

What I Would Build Differently

Two patterns the cluster’s apps either ship or wish they shipped.

Make Caller a first-class type in the domain layer. Every public domain function takes a Caller argument. The type encodes which surface invoked the call (.human, .siri, .mcp(host:)). Domain logic branches on it for confirmation prompts, rate limits, audit logging, and sensitive-action gates. The alternative (each surface re-implementing the rules) drifts; the centralized version stays consistent.

Treat surface coverage as an explicit checklist. When adding a capability, the design doc lists which of the three surfaces should expose it and which should refuse it. The refuse list is not a default; it is a deliberate choice, recorded with its reason. An example entry: “Refused: Apple Intelligence surface, because the capability requires user-attention proof Siri cannot provide.” With the reasoning on record, a later audit can catch drift.
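
One way to make the checklist checkable is to keep it as data next to the code. A sketch under stated assumptions: CapabilityCoverage, Decision, and the idea of comparing the checklist against a set of registered adapters are hypothetical names and mechanics, not something the cluster's apps are confirmed to ship.

```swift
import Foundation

enum Surface: String, CaseIterable, Hashable {
    case human, appleIntelligence, agent
}

enum Decision: Equatable {
    case expose
    case refuse(reason: String)
}

struct CapabilityCoverage {
    let name: String
    let decisions: [Surface: Decision]

    /// Surfaces with no recorded decision. These are gaps in the checklist,
    /// not implicit refusals.
    var undecided: [Surface] {
        Surface.allCases.filter { decisions[$0] == nil }
    }

    /// Surfaces that have an adapter registered without an `expose` decision.
    func drift(registeredAdapters: Set<Surface>) -> [Surface] {
        registeredAdapters
            .filter { decisions[$0] != Decision.expose }
            .sorted { $0.rawValue < $1.rawValue }
    }
}

let deleteAccount = CapabilityCoverage(
    name: "deleteAccount",
    decisions: [
        .human: .expose,
        .appleIntelligence: .refuse(
            reason: "requires user-attention proof Siri cannot provide"
        ),
    ]
)

let gaps = deleteAccount.undecided
let drifted = deleteAccount.drift(registeredAdapters: [.human, .agent])
```

Here the agent surface shows up twice: once as undecided and once as drift, because an adapter exists for a surface nobody signed off on.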

What The Pattern Means For Apps Shipping On iOS 26+

Three takeaways.

Three surfaces, three trust postures. Human, Apple Intelligence, agent. Each has obligations the others do not. Designing for one and force-fitting into others produces bad architecture on every surface.
Domain below; adapters above. One Swift function per capability; thin adapters per surface; the function takes a Caller parameter so it can enforce surface-specific rules in one place.
Not every capability serves all three. Hiding a capability from the surfaces that should not have it is as much a design decision as exposing it. The refuse list earns its place.

The full Apple Ecosystem cluster: typed App Intents for the Apple Intelligence surface; MCP servers for the agent surface; the routing question between them; Foundation Models for on-device LLM features inside the human surface; the runtime vs tooling LLM distinction; Live Activities for the iOS Lock Screen state machine; the watchOS runtime contract on Apple Watch; SwiftUI internals for the human surface’s substrate; RealityKit’s spatial mental model for visionOS scenes; SwiftData schema discipline for persistence across surfaces; Liquid Glass patterns for the human visual layer; multi-platform shipping for cross-device reach. The hub is at the Apple Ecosystem Series. For broader iOS-with-AI-agents context, see the iOS Agent Development guide.

FAQ

What are the three surfaces of an iOS app?

The human surface (SwiftUI views, taps, gestures, screen), the Apple Intelligence surface (App Intents, Siri, Shortcuts, Spotlight), and the agent surface (MCP servers exposed to external LLM hosts like Claude Code, Cursor, ChatGPT). Each has its own caller identity, latency budget, rendering location, persistence semantics, and trust posture. A capability that wants to serve more than one surface should sit on a domain layer beneath thin per-surface adapters.

Should every capability be exposed to all three surfaces?

No. Some capabilities are correctly limited to one or two surfaces. Photo capture, biometric authentication, and sensitive-action confirmations are usually best as foreground human-surface flows because the trust signals (user attention, deliberate action) are most reliably present there. Developer-driven bulk operations belong on the agent surface alone when no end-user review flow exists. The design call is which surfaces a capability serves and which it refuses.

What’s the difference between the Apple Intelligence and agent surfaces?

Apple Intelligence is Apple’s first-party agent: the user invokes Siri, Shortcuts, or Spotlight; the system routes through App Intents. Trust comes from the OS. The agent surface is every other LLM host: developers run Claude Code or Cursor, end users run Claude Desktop or ChatGPT. Trust comes from whoever deployed the MCP server. App Intents are the protocol surface for the first; MCP is the protocol surface for the second.

Where does the on-device Foundation Models LLM fit in?

Inside the human surface. When the user invokes an in-app feature that calls Foundation Models, the runtime LLM is the implementation of a human-surface capability, not a fourth surface. The runtime LLM has no Siri or external-host caller of its own. Foundation Models tools are how the on-device model reads/writes app domain state; the user is the one driving the call.

How does the domain-layer pattern simplify multi-surface code?

By centralizing the rules. One Swift function takes a Caller argument and enforces surface-specific behavior (confirmation prompts, rate limits, audit logging) in one place. Each surface is a thin adapter (SwiftUI binding, AppIntent.perform, MCP handler) that translates the surface’s protocol to the domain function. Drift between surfaces becomes much harder because the rules have one source of truth.

References

¹ Author’s analysis in What SwiftUI Is Made Of, April 30, 2026, covering the value-typed view tree, result-builder DSL, and the substrate underneath the human surface.
² Author’s analysis in App Intents Are Apple’s New API to Your App, April 28, 2026, covering AppIntent, AppEntity, AppEnum, and the typed-schema model that lets Apple Intelligence operate the app.
³ Apple Developer, “App Intents framework”. Surface for declaring intents, entities, parameters, and queries that Apple Intelligence, Siri, Shortcuts, and Spotlight can route. Discovery is install-time-plus-update; donation and indexing surface intents into Spotlight searches and Siri suggestions.
⁴ Anthropic, “Model Context Protocol” and “MCP Specification: Tools (2025-06-18)”. JSON-RPC tool exposure, host/server architecture, the stdio + Streamable HTTP transports, and inputSchema / optional outputSchema.
⁵ Author’s analysis in App Intents vs MCP: The Routing Question, April 30, 2026, and When the LLM Lives in Your App vs in Your Tooling, May 1, 2026. The deployment-not-protocol framing for trust posture and the runtime/tooling LLM distinction.
⁶ Author’s analysis in Foundation Models On-Device LLM: The Tool Protocol, April 30, 2026. The on-device LLM as a runtime feature backing human-surface capabilities; the Tool protocol as the bridge between the in-app model and the app’s domain.