Apple's First-Party Answer to Prompt Injection

Apple now cites Simon Willison by name. In WWDC 2026 session 347, an Apple security engineer frames agentic risk exactly the way this blog’s security thread has for a year: “we can look to Simon Willison’s Lethal Trifecta, which describes that a user is in most danger whenever an agentic system has: access to private data, exposure to untrusted content, and the ability to externally communicate.”1 The session, the Privacy and Security group lab, and a security.apple.com announcement the same week add up to the most complete picture yet of how the platform vendor with the largest device fleet thinks about securing agents: deterministic guardrails as the baseline, probabilistic ones as reinforcement, and infrastructure attestation underneath it all.


The lethal trifecta, cited at 5:55 in session 347.

TL;DR

  • Session 347 is Apple’s first-party prompt-injection doctrine: identify untrusted context through threat modeling, then “focus on deterministic mitigations as a baseline because their security guarantees are easier to audit and reason about,” with probabilistic mitigations like spotlighting layered on top.1
  • The guardrails are shipping APIs, not advice. Foundation Models lifecycle event modifiers give deterministic hooks: .onToolCall intercepts every tool call before execution and blocks it by throwing, and .historyTransform rewrites the transcript before each inference pass for spotlighting delimiters and PII redaction.1
  • App Intents enforces risk automatically: intents inherit risk metadata from the schemas they adopt, a risk evaluation system triggers contextual confirmations, and authenticationPolicy can be overridden only toward stricter.1
  • The same week, Apple extended Private Cloud Compute beyond its own data centers to Google Cloud on NVIDIA hardware, keeping the same five core requirements and rooting software attestation “in at least two separate roots of trust from independent vendors.”2
  • The Privacy and Security group lab filled in the texture: Apple describes using this deterministic-plus-probabilistic stack across Siri AI, Safari, and Xcode, and says Xcode constrains agents with tool allowlists when it acts as an MCP server.3

The doctrine: deterministic first, probabilistic second

Session 347 walks an example app through a threat model that will look familiar to anyone running agents in production. Indirect prompt injection is defined as “instructions embedded in extra context provided to the model with the intent to redirect control flow,” and the session splits its consequences into two effects worth keeping apart: data poisoning, “an attacker influencing the parameters of an executed action,” and action poisoning, “where the attacker influences what action to execute.”1 The session is honest about the state of the art in a way vendor material rarely is: “solving indirect prompt injection is an active research area, meaning that our best approach at the moment is to understand how much your app is at risk, and aim to mitigate that risk.”1
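To make the data-versus-action split concrete, here is a minimal Swift sketch that flags both effects on a proposed tool call. It assumes a narrowly scoped feature where the app knows which tool a request should produce, and that each argument carries a provenance label. `Provenance`, `ProposedAction`, and `classify` are hypothetical names, not an Apple API, and real provenance tracking for model-generated values is much harder than a per-argument tag.

```swift
// Hypothetical sketch: these types are illustrative, not an Apple API.

/// Where a value visible to the model came from.
enum Provenance {
    case user       // typed or spoken by the user
    case untrusted  // copied from web pages, emails, tool output, etc.
}

struct Argument {
    let value: String
    let source: Provenance
}

/// A tool call the model has proposed but that has not run yet.
struct ProposedAction {
    let tool: String
    let arguments: [String: Argument]
}

enum PoisoningEffect: Equatable {
    case action(expected: String, proposed: String)  // attacker changed what runs
    case data(argument: String)                       // attacker shaped how it runs
}

/// Flags both effects for a feature whose expected tool is known in advance.
func classify(_ proposal: ProposedAction, expectedTool: String) -> [PoisoningEffect] {
    var effects: [PoisoningEffect] = []
    if proposal.tool != expectedTool {
        effects.append(.action(expected: expectedTool, proposed: proposal.tool))
    }
    // Sorted so findings come out in a stable order.
    for name in proposal.arguments.keys.sorted() where proposal.arguments[name]?.source == Provenance.untrusted {
        effects.append(.data(argument: name))
    }
    return effects
}
```

An action finding means the model is about to do something the feature never asked for; a data finding means the right tool may run with attacker-chosen inputs, which is why the two call for different checks.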

The ordering principle is the part worth quoting in design reviews. Deterministic mitigations come first “because their security guarantees are easier to audit and reason about”; probabilistic mitigations are worth adding because “different models could more effectively enforce these restrictions,” but the session immediately concedes the limit: spotlighting “is a probabilistic mitigation because the prompt injection could be constructed in a way that negates the spotlighting.”1 User confirmations and device-unlock requirements land on the deterministic side of the ledger. Redacted PII never reaches the model at all, “and thus cannot be exfiltrated.”1 Apple states it has used these mitigations in designing Siri AI.1

One subtlety from the threat model deserves attention because it catches a case most allowlists miss. A create-timer action looks harmless until you notice its optional label parameter: a prompt injection can set the label to attacker-controlled text, and “a subsequent query to list timers, can then pull this attacker controlled data into that context, thus poisoning the new context too.”1 Low-risk tools with writable string fields double as persistence mechanisms for injections: whatever they store comes back later as context.
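One way to close that loop is to carry a taint bit on anything a tool persists, so the read path can hand it back as untrusted. The sketch below assumes the app knows whether untrusted content was in the model's context at write time. `TimerStore`, `TimerRecord`, and `ContextEntry` are hypothetical, and tainting the whole field is a deliberately conservative choice because model-written text cannot be attributed character by character.

```swift
// Hypothetical sketch: an app-side timer store, not an Apple API.

struct TimerRecord {
    let seconds: Int
    let label: String
    /// True when the label was written while untrusted content was in the
    /// model's context. Model-written text cannot be attributed precisely,
    /// so the taint covers the whole field.
    let labelIsUntrusted: Bool
}

/// One piece of context handed back to the model, with its trust level.
struct ContextEntry: Equatable {
    let text: String
    let trusted: Bool
}

final class TimerStore {
    private(set) var timers: [TimerRecord] = []

    /// The "harmless" write path: it persists a free-form string.
    func create(seconds: Int, label: String, untrustedContentInContext: Bool) {
        timers.append(TimerRecord(seconds: seconds, label: label,
                                  labelIsUntrusted: untrustedContentInContext))
    }

    /// The read path carries the taint forward, so a later "list timers"
    /// result can be spotlighted instead of treated as trusted app data.
    func listForModel() -> [ContextEntry] {
        timers.map { ContextEntry(text: "\($0.seconds)s: \($0.label)", trusted: !$0.labelIsUntrusted) }
    }
}
```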

The Foundation Models guardrail APIs

The implementation half of the session maps the doctrine onto two shipping surfaces. In the Foundation Models framework, lifecycle event modifiers are “callbacks that deterministically trigger at certain lifecycle points in a session execution.”1

.onToolCall is the action checkpoint. It “is guaranteed to trigger when the LLM outputs a tool call, before the executor runs the tool,” and the contract is the useful part: “if this callback throws an error, then the tool is never executed.”1 The session’s example gates a financial-impact tool behind user confirmation in one place and gets coverage for every tool call in the session. The shape is the same one this blog argued for in approval prompts are not authorization: the check lives in the execution path, not in the model’s instructions.
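The same contract is easy to reproduce in any agent loop. This Swift sketch is modeled on the .onToolCall behavior the session describes (a callback that runs before every tool and blocks it by throwing), but every type in it, including `ToolExecutor`, `ToolCall`, `makeExecutor`, and the `financialImpactTools` set, is hypothetical rather than the Foundation Models API.

```swift
// Hypothetical sketch of an execution-path checkpoint modeled on the
// .onToolCall contract described in session 347. Not the Foundation Models API.

struct ToolCall {
    let name: String
    let arguments: [String: String]
}

enum ToolCallError: Error, Equatable {
    case userDeclined(tool: String)
    case unknownTool(String)
}

typealias ToolBody = ([String: String]) -> String

struct ToolExecutor {
    /// Runs for every call before the tool body; throwing means the body never runs.
    let onToolCall: (ToolCall) throws -> Void
    let tools: [String: ToolBody]

    func execute(_ call: ToolCall) throws -> String {
        guard let body = tools[call.name] else { throw ToolCallError.unknownTool(call.name) }
        try onToolCall(call)  // the check sits in the execution path, not in the prompt
        return body(call.arguments)
    }
}

/// Hypothetical app policy: tools with financial impact need an explicit yes.
let financialImpactTools: Set<String> = ["transferFunds"]

func makeExecutor(tools: [String: ToolBody],
                  confirm: @escaping (ToolCall) -> Bool) -> ToolExecutor {
    ToolExecutor(
        onToolCall: { call in
            if financialImpactTools.contains(call.name) && !confirm(call) {
                throw ToolCallError.userDeclined(tool: call.name)
            }
        },
        tools: tools
    )
}
```

Because the check wraps `execute` rather than living in the system prompt, no wording in untrusted content can route a call around it; the model can only propose, and the executor decides.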

.historyTransform is the input checkpoint. It “fires before the transcript is rendered to the model for inference,” both on new user requests and on every loop iteration, and the session uses it for the two prompt mitigations: wrapping tool outputs from untrusted sources in spotlighting delimiters, and replacing sensitive data with a redaction placeholder.1 A detail that matters for implementers: transformed entries are scoped to the current inference pass only, so transformations re-apply each iteration, with the @SessionProperty annotation as the escape hatch for expensive stateful transformations.1
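A framework-neutral version of that input checkpoint might look like the sketch below, which redacts known sensitive strings and then wraps untrusted tool output in delimiters. The `[REDACTED]` placeholder follows the session; the `<<untrusted>>` delimiter format, the `Entry` type, and `historyTransform` itself are assumptions for illustration. Returning a transformed copy instead of mutating the stored transcript mirrors the per-pass scoping: the transform has to run again on every iteration.

```swift
// Hypothetical sketch of a per-pass transcript transform modeled on the
// .historyTransform behavior described in session 347. Not Apple's types.

enum EntrySource { case user, appTool, untrustedTool }

struct Entry: Equatable {
    var source: EntrySource
    var text: String
}

/// Replaces every occurrence of `target` with `placeholder` (stdlib only).
func replacingAll(in text: String, _ target: String, with placeholder: String) -> String {
    guard !target.isEmpty else { return text }
    var out = ""
    var i = text.startIndex
    while i < text.endIndex {
        if text[i...].hasPrefix(target) {
            out += placeholder
            i = text.index(i, offsetBy: target.count)
        } else {
            out.append(text[i])
            i = text.index(after: i)
        }
    }
    return out
}

/// Escapes angle brackets so tool output cannot forge the spotlight delimiters.
func escapeDelimiters(_ text: String) -> String {
    var out = ""
    for ch in text {
        switch ch {
        case "<": out += "&lt;"
        case ">": out += "&gt;"
        default: out.append(ch)
        }
    }
    return out
}

/// Returns a transformed copy for this inference pass only; the stored
/// transcript is untouched, so the transform re-runs on every iteration.
func historyTransform(_ transcript: [Entry], sensitive: [String]) -> [Entry] {
    transcript.map { (original: Entry) -> Entry in
        var entry = original
        for value in sensitive {  // redaction first, so secrets never reach the model
            entry.text = replacingAll(in: entry.text, value, with: "[REDACTED]")
        }
        if entry.source == .untrustedTool {  // spotlighting: probabilistic, not a guarantee
            entry.text = "<<untrusted>>\n" + escapeDelimiters(entry.text) + "\n<</untrusted>>"
        }
        return entry
    }
}
```

Escaping angle brackets stops tool output from forging a closing delimiter, but spotlighting stays probabilistic: the model can still follow instructions inside the fence. Redaction is the step that actually takes data out of reach.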

App Intents: risk metadata you inherit, not write

The Siri-facing side gets its guardrails from the schema system. When an intent adopts an intent schema, risk metadata “is automatically assigned” based on the schema’s side effects: destructive, exfiltrating, and shared-content-updating actions are riskier, and “the system is more likely to trigger confirmations for high-risk tools.”1 A risk evaluation system combines that static metadata with dynamic system state to decide, contextually, whether to interpose a confirmation before the intent executes; declining blocks the intent entirely.1
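The shape of that evaluation, static metadata plus dynamic state feeding one decision, can be sketched as follows. The side-effect flags echo the categories the session names, but the weights, the threshold, the `SystemState` fields, and the schema table are all hypothetical, since Apple has not published how its risk evaluation system scores intents.

```swift
// Hypothetical sketch: weights, threshold, and schema table are invented for
// illustration; this is not the App Intents risk evaluation system.

struct SideEffects: OptionSet {
    let rawValue: Int
    static let destructive = SideEffects(rawValue: 1 << 0)
    static let exfiltrating = SideEffects(rawValue: 1 << 1)
    static let updatesSharedContent = SideEffects(rawValue: 1 << 2)
}

/// Static part: metadata an intent inherits from its schema.
func staticRiskScore(_ effects: SideEffects) -> Int {
    var score = 0
    if effects.contains(.destructive) { score += 1 }
    if effects.contains(.exfiltrating) { score += 2 }
    if effects.contains(.updatesSharedContent) { score += 1 }
    return score
}

/// Dynamic part: what the system knows right now.
struct SystemState {
    var untrustedContentInContext: Bool
}

func requiresConfirmation(_ effects: SideEffects, state: SystemState, threshold: Int = 2) -> Bool {
    let dynamicScore = state.untrustedContentInContext ? 1 : 0
    return staticRiskScore(effects) + dynamicScore >= threshold
}

/// Hypothetical schema table: intents inherit effects from the schema they adopt.
let schemaEffects: [String: SideEffects] = [
    "createTimer": [],
    "deleteNote": [.destructive],
    "sendMessage": [.exfiltrating],
]

func requiresConfirmation(schema: String, state: SystemState) -> Bool {
    // Unknown schemas get every side effect, so they fail closed.
    let everything: SideEffects = [.destructive, .exfiltrating, .updatesSharedContent]
    return requiresConfirmation(schemaEffects[schema] ?? everything, state: state)
}
```

In this sketch a destructive intent runs without a prompt in a clean context but gets a confirmation once untrusted content is present, anything exfiltrating always asks, and a declined confirmation would simply mean the intent never runs.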

Lock-screen exposure gets the same treatment. Because Siri works on a locked device, an attacker in physical possession can reach your intents, so custom intents set an authenticationPolicy, schemas carry sensitivity-based defaults, and the constraint is exactly right: “you can override the schema policy, but only to make it stricter,” with a build error naming the minimum allowed policy if you try to weaken it.1 The compiler refusing to let you under-protect an action is the most Apple-shaped prompt-injection mitigation imaginable.
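The stricter-only rule reduces to an ordered policy type and one comparison. Apple enforces it at build time; the hypothetical Swift sketch below checks it when an intent is registered instead, and its `LockPolicy` cases and schema table are stand-ins rather than the App Intents `authenticationPolicy` values.

```swift
// Hypothetical sketch of a policy floor; not the App Intents API.

/// Ordered from weakest to strictest.
enum LockPolicy: Int, Comparable {
    case allowedWhileLocked = 0
    case requiresUnlock = 1
    case requiresLocalAuthentication = 2

    static func < (lhs: LockPolicy, rhs: LockPolicy) -> Bool { lhs.rawValue < rhs.rawValue }
}

enum PolicyError: Error, Equatable {
    case weakerThanSchemaFloor(minimum: LockPolicy)
}

/// Sensitivity-based schema defaults (hypothetical table).
let schemaFloor: [String: LockPolicy] = [
    "openDoorLock": .requiresLocalAuthentication,
    "readMessages": .requiresUnlock,
    "startTimer": .allowedWhileLocked,
]

/// Overrides may only tighten the schema default; unknown schemas get the strictest policy.
func effectivePolicy(schema: String, requested: LockPolicy?) throws -> LockPolicy {
    let minimum = schemaFloor[schema] ?? .requiresLocalAuthentication
    guard let requested = requested else { return minimum }
    guard requested >= minimum else { throw PolicyError.weakerThanSchemaFloor(minimum: minimum) }
    return requested
}
```

Like the build error the session describes, the thrown error names the minimum allowed policy, so the fix is obvious to whoever hits it.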

The infrastructure layer: PCC leaves Apple’s data centers

Three days before the session aired, Apple published “Expanding Private Cloud Compute” on its security blog: new Apple Intelligence workloads now run on Google Cloud with NVIDIA GPUs, “extending our industry-leading PCC privacy commitments to third-party data centers for the first time.”2 The five core requirements carry over unchanged: “stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency.”2 What changes is the implementation: NVIDIA Confidential Computing, Intel CPUs with TDX, and Google’s Titan chip.2

Two design choices stand out against the confidential-computing status quo. For components that could exfiltrate user data if compromised, “software attestation is rooted in at least two separate roots of trust from independent vendors,” and Apple maintains “a cryptographically verifiable, append-only ledger of all Google Cloud hardware that is part of the PCC fleet” against supply-chain attacks.2

The architectural patterns from PCC on Apple silicon carry over too: per-request network parsing in a dedicated namespaced process, shared inference software recycled on a short time-to-live, attested keys held in a separate confidential VM isolated from external inputs.2

Control stays centralized: “Apple retains complete control over PCC software; Apple devices will only trust PCC software that is cryptographically approved by Apple,” with all binaries published for public inspection and live research-mode nodes reachable through the Apple Security Bounty Program.2 The rollout is staged, “gradually ramping towards the complete set of protections throughout the summer preview period.”2
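Stripped of the cryptography, the acceptance rule described above can be sketched as a policy check. In this hypothetical Swift model, a `verified` flag stands in for real attestation-quote and signature verification, a plain set stands in for the append-only hardware ledger (a real one would be a hash-chained or Merkle transparency log), and the vendor names are placeholders; none of it is Apple's implementation.

```swift
// Hypothetical sketch: no real cryptography, not Apple's PCC implementation.

struct Attestation {
    let vendor: String       // operator of the root of trust (placeholder name)
    let measurement: String  // software release the root vouches for
    let verified: Bool       // stand-in for real quote/signature verification
}

struct NodeEvidence {
    let hardwareID: String
    let attestations: [Attestation]
}

/// Trusts a node only if its hardware is on the fleet ledger and at least
/// `minimumIndependentRoots` distinct vendors vouch for the same approved release.
func shouldTrust(_ node: NodeEvidence,
                 ledger: Set<String>,
                 approvedReleases: Set<String>,
                 minimumIndependentRoots: Int = 2) -> Bool {
    guard ledger.contains(node.hardwareID) else { return false }  // supply-chain check
    var vendorsPerRelease: [String: Set<String>] = [:]
    for a in node.attestations where a.verified && approvedReleases.contains(a.measurement) {
        vendorsPerRelease[a.measurement, default: []].insert(a.vendor)
    }
    // A Set of vendors means two attestations from the same vendor count once.
    return vendorsPerRelease.values.contains { $0.count >= minimumIndependentRoots }
}
```

Requiring two independent vendors means compromising a single root of trust, or a single vendor's signing key, is no longer enough to get a node trusted.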

What the lab added

The Privacy and Security group lab ran the same week, and Apple publishes no captions for labs, so what follows is paraphrased from a locally transcribed recording rather than quoted.3 The panel connected the session’s doctrine to shipping surfaces: the deterministic-plus-probabilistic stack runs across Siri AI, Safari, and Xcode’s agentic features, and when Xcode acts as an MCP server, it constrains agents with allowlists of permitted tools.3 On the Siri AI architecture, a panelist described a dedicated hardened, sandboxed daemon with entitlement gating as the only path for collecting and formatting user data before it leaves for Private Cloud Compute, with multi-turn requests re-prompting permission for newly accessed data mid-conversation.3
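A server-side tool allowlist is simple to express. The sketch below assumes an MCP-style server with separate list and call paths (mirroring the protocol's `tools/list` and `tools/call` methods); the tool names and the `AllowlistedToolServer` type are hypothetical and say nothing about how Xcode implements its allowlists. Enforcing the list on the call path matters as much as hiding tools from the listing, because a client can name a tool it was never shown.

```swift
// Hypothetical sketch of a server-side tool allowlist; not Xcode's implementation.

enum ToolAccessError: Error, Equatable {
    case notAllowed(String)
}

struct AllowlistedToolServer {
    let tools: [String: ([String: String]) -> String]
    let allowlist: Set<String>

    /// Listing path (like tools/list): agents never see tools outside the allowlist.
    func listTools() -> [String] {
        tools.keys.filter { allowlist.contains($0) }.sorted()
    }

    /// Call path (like tools/call): re-checked, because a client can name any tool.
    func callTool(_ name: String, arguments: [String: String]) throws -> String {
        guard allowlist.contains(name), let body = tools[name] else {
            throw ToolAccessError.notAllowed(name)
        }
        return body(arguments)
    }
}
```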

Two more lab threads are worth flagging for follow-up. The panel said the Foundation Models privacy guarantees do not extend to third-party models reached through the framework’s language model protocol; the developer owns reading those providers’ terms and disclosing accordingly.3 And on the passkey lifecycle question that has dogged WebAuthn adoption, a panelist pointed to the Signal API as the answer: W3C WebAuthn Level 3 defines signalUnknownCredential, signalAllAcceptedCredentials, and signalCurrentUserDetails, which let relying parties tell authenticators about removed credentials and changed user details so stale passkeys can be deleted or updated.4

What to take from it

The useful part is not that Apple solved prompt injection; the session says plainly that nobody has. The useful part is watching a platform vendor commit to an ordering: deterministic controls in the execution path first, model-level hints second, infrastructure attestation underneath. For agent builders off Apple’s platforms, every piece has an equivalent: .onToolCall is your tool-call interceptor, .historyTransform is your context sanitizer, schema-inherited risk metadata is your tool-classification table, and stricter-only authenticationPolicy overrides are your policy floor. The framework names are Apple’s; the architecture is portable, and it matches the defense-in-depth this blog laid out in an agent with two untrusted inputs and runtime defense for tool-augmented agents.

FAQ

What is Apple’s recommended approach to prompt injection?

Threat-model first (identify untrusted context sources and action side effects), then apply “deterministic mitigations as a baseline because their security guarantees are easier to audit and reason about,” with probabilistic mitigations such as spotlighting added on top.1 Concretely: user confirmations and device-unlock requirements on risky actions, PII redaction and spotlighting delimiters on untrusted context.

What APIs implement these guardrails?

In Foundation Models, lifecycle event modifiers: .onToolCall (deterministically intercepts every tool call before execution; throwing blocks the tool) and .historyTransform (rewrites the transcript tail before each inference pass), with @SessionProperty for persistent transformations.1 In App Intents, schema-inherited risk metadata drives contextual confirmations, and authenticationPolicy controls lock-screen access with stricter-only overrides.1

Did Apple really move Private Cloud Compute to Google’s cloud?

Yes, for new Apple Intelligence workloads. PCC now extends to Google Cloud on NVIDIA GPUs with Intel TDX and Google’s Titan chip, keeping the same five PCC requirements, dual-vendor attestation roots, an append-only hardware ledger, and Apple-only software approval, ramping through a summer preview period.2 PCC’s guarantees still do not extend to third-party models like Gemini or Claude reached through the language model protocol.3

Does any of this apply outside Apple platforms?

The architecture does. Execution-path interceptors, context sanitizers, tool risk classification, and policy floors are portable patterns; Apple’s versions are notable because they ship as framework APIs with deterministic contracts rather than as guidance.


Apple’s mitigation stack lands in territory this blog has mapped for a year: the trifecta framing in an agent with two untrusted inputs, the execution-path argument in approval prompts are not authorization, and the infrastructure story in Foundation Models and Private Cloud Compute. The full series hub is the Apple Ecosystem Series.

References


  1. Apple, WWDC 2026 session 347, Secure your app: mitigate risks to agentic features. Official transcript. Source for the Simon Willison Lethal Trifecta citation (private data, untrusted content, external communication), the indirect-prompt-injection definition (“instructions embedded in extra context provided to the model with the intent to redirect control flow”), the data-poisoning and action-poisoning distinction, the active-research-area framing, the deterministic-baseline doctrine and the spotlighting caveat, the Siri AI usage statement, the timer-label context-poisoning example, the .onToolCall contract (guaranteed trigger before execution, throwing blocks the tool), the .historyTransform behavior (fires before each inference render, spotlighting delimiters, “[REDACTED]” placeholder, per-iteration scoping, @SessionProperty for stateful transformations), and the App Intents guardrails (schema-inherited risk metadata, the risk evaluation system combining static metadata and dynamic system state, contextual confirmations, authenticationPolicy with sensitivity-based schema defaults and stricter-only overrides enforced by a build error). 

  2. Apple Security Engineering and Architecture et al., Expanding Private Cloud Compute, Apple Security Research blog, June 8, 2026. Source for the Google Cloud and NVIDIA expansion (“extending our industry-leading PCC privacy commitments to third-party data centers for the first time”), the unchanged core requirements (“stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency”), the implementation stack (NVIDIA Confidential Computing, Intel CPUs with TDX, Google’s Titan chip), the dual-vendor attestation (“software attestation is rooted in at least two separate roots of trust from independent vendors”), the append-only hardware ledger, the carried-over architectural patterns (namespaced per-request parsing, short-TTL software recycling, isolated attested-key VMs), Apple’s retained software control, public binary inspection with bounty-program research access, and the summer preview ramp. 

  3. Apple, WWDC 2026 session 8009, Privacy and Security Group Lab. Paraphrased from a locally transcribed recording; Apple publishes no official captions for the labs, so the wording here is a paraphrase, not a quotation, and exact phrasing is unverified. Source for the deterministic-plus-probabilistic stack described across Siri AI, Safari, and Xcode; the Xcode MCP-server tool allowlists; the Siri AI hardened-daemon architecture with entitlement gating and mid-conversation permission re-prompts; the statement that PCC guarantees do not extend to third-party models reached through the language model protocol; and the panel’s pointer to the WebAuthn Signal API for passkey lifecycle. 

  4. W3C, Web Authentication: An API for accessing Public Key Credentials Level 3. Source for the Signal API methods signalUnknownCredential, signalAllAcceptedCredentials, and signalCurrentUserDetails, which let relying parties signal credential changes so authenticators can remove or update stale passkeys. 
