Apple's First-Party Answer to Prompt Injection

Q: What APIs implement these guardrails?

In Foundation Models, lifecycle event modifiers: .onToolCall (deterministically intercepts every tool call before execution; throwing blocks the tool) and .historyTransform (rewrites the transcript tail before each inference pass), with @SessionProperty for persistent transformations.1 In App Intents, schema-inherited risk metadata drives contextual confirmations, and authenticationPolicy controls lock-screen access with stricter-only overrides.1

Blake Crosley June 12, 2026 12 min read

security prompt-injection foundation-models app-intents private-cloud-compute agents wwdc26

Listen to article

Apple now cites Simon Willison by name. In WWDC 2026 session 347, an Apple security engineer frames agentic risk exactly the way this blog’s security thread has for a year: “we can look to Simon Willison’s Lethal Trifecta, which describes that a user is in most danger whenever an agentic system has: access to private data, exposure to untrusted content, and the ability to externally communicate.”¹ The session, the Privacy and Security group lab, and a security.apple.com announcement the same week add up to the most complete picture yet of how the platform vendor with the largest device fleet thinks about securing agents: deterministic guardrails as the baseline, probabilistic ones as reinforcement, and infrastructure attestation underneath it all.

Watch on Apple Developer ↗

The lethal trifecta, cited at 5:55 in session 347.

TL;DR

Session 347 is Apple’s first-party prompt-injection doctrine: identify untrusted context through threat modeling, then “focus on deterministic mitigations as a baseline because their security guarantees are easier to audit and reason about,” with probabilistic mitigations like spotlighting layered on top.¹
The guardrails are shipping APIs, not advice. Foundation Models lifecycle event modifiers give deterministic hooks: .onToolCall intercepts every tool call before execution and blocks it by throwing, and .historyTransform rewrites the transcript before each inference pass for spotlighting delimiters and PII redaction.¹
App Intents enforces risk automatically: intents inherit risk metadata from the schemas they adopt, a risk evaluation system triggers contextual confirmations, and authenticationPolicy can be overridden only toward stricter.¹
The same week, Apple extended Private Cloud Compute beyond its own data centers to Google Cloud on NVIDIA hardware, keeping the same five core requirements and rooting software attestation “in at least two separate roots of trust from independent vendors.”²
The Privacy and Security group lab filled in the texture: Apple describes using this deterministic-plus-probabilistic stack across Siri AI, Safari, and Xcode, whose agentic features use tool allowlists when Xcode acts as an MCP server.³

The doctrine: deterministic first, probabilistic second

Session 347 walks an example app through a threat model that will look familiar to anyone running agents in production. Indirect prompt injection is defined as “instructions embedded in extra context provided to the model with the intent to redirect control flow,” and the session splits its consequences into two effects worth keeping apart: data poisoning, “an attacker influencing the parameters of an executed action,” and action poisoning, “where the attacker influences what action to execute.”¹ The session is honest about the state of the art in a way vendor material rarely is: “solving indirect prompt injection is an active research area, meaning that our best approach at the moment is to understand how much your app is at risk, and aim to mitigate that risk.”¹

The ordering principle is the part worth quoting in design reviews. Deterministic mitigations come first “because their security guarantees are easier to audit and reason about”; probabilistic mitigations are worth adding because “different models could more effectively enforce these restrictions,” but the session immediately concedes the limit: spotlighting “is a probabilistic mitigation because the prompt injection could be constructed in a way that negates the spotlighting.”¹ User confirmations and device-unlock requirements land on the deterministic side of the ledger. Redaction keeps PII from ever reaching the model, “and thus cannot be exfiltrated.”¹ Apple states it has used these mitigations in designing Siri AI.¹

One subtlety from the threat model deserves attention because it catches a case most allowlists miss. A create-timer action looks harmless until you notice its optional label parameter: a prompt injection can set the label to attacker-controlled text, and “a subsequent query to list timers, can then pull this attacker controlled data into that context, thus poisoning the new context too.”¹ Side-effect-free tools with writable string fields are persistence mechanisms for injections.

The Foundation Models guardrail APIs

The implementation half of the session maps the doctrine onto two shipping surfaces. In the Foundation Models framework, lifecycle event modifiers are “callbacks that deterministically trigger at certain lifecycle points in a session execution.”¹

.onToolCall is the action checkpoint. It “is guaranteed to trigger when the LLM outputs a tool call, before the executor runs the tool,” and the contract is the useful part: “if this callback throws an error, then the tool is never executed.”¹ The session’s example gates a financial-impact tool behind user confirmation in one place and gets coverage for every tool call in the session. The shape is the same one this blog argued for in approval prompts are not authorization: the check lives in the execution path, not in the model’s instructions.

.historyTransform is the input checkpoint. It “fires before the transcript is rendered to the model for inference,” both on new user requests and on every loop iteration, and the session uses it for the two prompt mitigations: wrapping tool outputs from untrusted sources in spotlighting delimiters, and replacing sensitive data with a redaction placeholder.¹ A detail that matters for implementers: transformed entries are scoped to the current inference pass only, so transformations re-apply each iteration, with the @SessionProperty annotation as the escape hatch for expensive stateful transformations.¹

App Intents: risk metadata you inherit, not write

The Siri-facing side gets its guardrails from the schema system. When an intent adopts an intent schema, risk metadata “is automatically assigned” based on the schema’s side effects: destructive, exfiltrating, and shared-content-updating actions are riskier, and “the system is more likely to trigger confirmations for high-risk tools.”¹ A risk evaluation system combines that static metadata with dynamic system state to decide, contextually, whether to interpose a confirmation before the intent executes; declining blocks the intent entirely.¹

Lock-screen exposure gets the same treatment. Because Siri works on a locked device, an attacker in physical possession can reach your intents, so custom intents set an authenticationPolicy, schemas carry sensitivity-based defaults, and the constraint is exactly right: “you can override the schema policy, but only to make it stricter,” with a build error naming the minimum allowed policy if you try to weaken it.¹ The compiler refusing to let you under-protect an action is the most Apple-shaped prompt-injection mitigation imaginable.

The infrastructure layer: PCC leaves Apple’s data centers

Three days before the session aired, Apple published “Expanding Private Cloud Compute” on its security blog: new Apple Intelligence workloads now run on Google Cloud with NVIDIA GPUs, “extending our industry-leading PCC privacy commitments to third-party data centers for the first time.”² The five core requirements carry over unchanged: “stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency.”² What changes is the implementation: NVIDIA Confidential Computing, Intel CPUs with TDX, and Google’s Titan chip.²

Two design choices stand out against the confidential-computing status quo. For components that could exfiltrate user data if compromised, “software attestation is rooted in at least two separate roots of trust from independent vendors,” and Apple maintains “a cryptographically verifiable, append-only ledger of all Google Cloud hardware that is part of the PCC fleet” against supply-chain attacks.² The architectural patterns from PCC on Apple silicon carry over too: per-request network parsing in a dedicated namespaced process, shared inference software recycled on a short time-to-live, attested keys held in a separate confidential VM isolated from external inputs.² Control stays centralized: “Apple retains complete control over PCC software; Apple devices will only trust PCC software that is cryptographically approved by Apple,” with all binaries published for public inspection and live research-mode nodes reachable through the Apple Security Bounty Program.² The rollout is staged, “gradually ramping towards the complete set of protections throughout the summer preview period.”²

What the lab added

The Privacy and Security group lab ran the same week, and Apple publishes no captions for labs, so what follows is paraphrased from a locally transcribed recording rather than quoted.³ The panel connected the session’s doctrine to shipping surfaces: the deterministic-plus-probabilistic stack runs across Siri AI, Safari, and Xcode’s agentic features, and when Xcode acts as an MCP server, it constrains agents with allowlists of permitted tools.³ A separate Apple Intelligence lab drew a useful line between two failure modes developers conflate. A panelist distinguished a refusal error, where the model’s own alignment training declines a request and the failure surfaces under guided or structured generation, from a guardrail error, where a separate moderation model inspects the input and the output independently of the main model.⁵ The same panelist noted an opt-in setting that lets emotionally charged but legitimate input through rather than tripping the guardrail; the exact name of that setting was not legible in the recording and stays unconfirmed.⁵ On the Siri AI architecture, a panelist described a dedicated hardened, sandboxed daemon with entitlement gating as the only path for collecting and formatting user data before it leaves for Private Cloud Compute, with multi-turn requests re-prompting permission for newly accessed data mid-conversation.³

Two more lab threads are worth flagging for follow-up. The panel said the Foundation Models privacy guarantees do not extend to third-party models reached through the framework’s language model protocol; the developer owns reading those providers’ terms and disclosing accordingly.³ And on the passkey lifecycle question that has dogged WebAuthn adoption, a panelist pointed to the Signal API as the solved answer: web standards now define signalUnknownCredential, signalAllAcceptedCredentials, and signalCurrentUserDetails for keeping credentials in sync between relying parties and authenticators, and the API is real and shipping in W3C WebAuthn Level 3.⁴

What to take from it

The useful part is not that Apple solved prompt injection; the session says plainly that nobody has. The useful part is watching a platform vendor commit to an ordering: deterministic controls in the execution path first, model-level hints second, infrastructure attestation underneath. For agent builders off Apple’s platforms, every piece has an equivalent: .onToolCall is your tool-call interceptor, .historyTransform is your context sanitizer, schema-inherited risk metadata is your tool-classification table, and stricter-only authenticationPolicy overrides are your policy floor. The framework names are Apple’s; the architecture is portable, and it matches the defense-in-depth this blog laid out in an agent with two untrusted inputs and runtime defense for tool-augmented agents.

FAQ

What is Apple’s recommended defense against prompt injection?

Threat-model first (identify untrusted context sources and action side effects), then apply “deterministic mitigations as a baseline because their security guarantees are easier to audit and reason about,” with probabilistic mitigations such as spotlighting added on top.¹ Concretely: user confirmations and device-unlock requirements on risky actions, PII redaction and spotlighting delimiters on untrusted context.

What APIs implement these guardrails?

In Foundation Models, lifecycle event modifiers: .onToolCall (deterministically intercepts every tool call before execution; throwing blocks the tool) and .historyTransform (rewrites the transcript tail before each inference pass), with @SessionProperty for persistent transformations.¹ In App Intents, schema-inherited risk metadata drives contextual confirmations, and authenticationPolicy controls lock-screen access with stricter-only overrides.¹

Did Apple really move Private Cloud Compute to Google’s cloud?

Yes, for new Apple Intelligence workloads. PCC now extends to Google Cloud on NVIDIA GPUs with Intel TDX and Google’s Titan chip, keeping the same five PCC requirements, dual-vendor attestation roots, an append-only hardware ledger, and Apple-only software approval, ramping through a summer preview period.² PCC’s guarantees still do not extend to third-party models like Gemini or Claude reached through the language model protocol.³

Does any of this apply outside Apple platforms?

The architecture does. Execution-path interceptors, context sanitizers, tool risk classification, and policy floors are portable patterns; Apple’s versions are notable because they ship as framework APIs with deterministic contracts rather than as guidance.

Apple’s mitigation stack lands in territory this blog has mapped for a year: the trifecta framing in an agent with two untrusted inputs, the execution-path argument in approval prompts are not authorization, and the infrastructure story in Foundation Models and Private Cloud Compute. The full series hub is the Apple Ecosystem Series.

References

Apple, WWDC 2026 session 347, Secure your app: mitigate risks to agentic features. Official transcript. Source for the Simon Willison Lethal Trifecta citation (private data, untrusted content, external communication), the indirect-prompt-injection definition (“instructions embedded in extra context provided to the model with the intent to redirect control flow”), the data-poisoning and action-poisoning distinction, the active-research-area framing, the deterministic-baseline doctrine and the spotlighting caveat, the Siri AI usage statement, the timer-label context-poisoning example, the .onToolCall contract (guaranteed trigger before execution, throwing blocks the tool), the .historyTransform behavior (fires before each inference render, spotlighting delimiters, “[REDACTED]” placeholder, per-iteration scoping, @SessionProperty for stateful transformations), and the App Intents guardrails (schema-inherited risk metadata, the risk evaluation system combining static metadata and dynamic system state, contextual confirmations, authenticationPolicy with sensitivity-based schema defaults and stricter-only overrides enforced by a build error). ↩↩↩↩↩↩↩↩↩↩↩↩↩↩↩↩↩↩↩↩
Apple Security Engineering and Architecture et al., Expanding Private Cloud Compute, Apple Security Research blog, June 8, 2026. Source for the Google Cloud and NVIDIA expansion (“extending our industry-leading PCC privacy commitments to third-party data centers for the first time”), the unchanged core requirements (“stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency”), the implementation stack (NVIDIA Confidential Computing, Intel CPUs with TDX, Google’s Titan chip), the dual-vendor attestation (“software attestation is rooted in at least two separate roots of trust from independent vendors”), the append-only hardware ledger, the carried-over architectural patterns (namespaced per-request parsing, short-TTL software recycling, isolated attested-key VMs), Apple’s retained software control, public binary inspection with bounty-program research access, and the summer preview ramp. ↩↩↩↩↩↩↩↩↩
Apple, WWDC 2026 session 8009, Privacy and Security Group Lab. Paraphrased from a locally transcribed recording; Apple publishes no official captions for the labs, so the wording here is a paraphrase, not a quotation, and exact phrasing is unverified. Source for the deterministic-plus-probabilistic stack described across Siri AI, Safari, and Xcode; the Xcode MCP-server tool allowlists; the Siri AI hardened-daemon architecture with entitlement gating and mid-conversation permission re-prompts; the statement that PCC guarantees do not extend to third-party models reached through the language model protocol; and the panel’s pointer to the WebAuthn Signal API for passkey lifecycle. ↩↩↩↩↩↩
W3C, Web Authentication: An API for accessing Public Key Credentials Level 3. Source for the Signal API methods signalUnknownCredential, signalAllAcceptedCredentials, and signalCurrentUserDetails, which let relying parties signal credential changes so authenticators can remove or update stale passkeys. ↩
Apple, WWDC 2026 session 8011, Apple Intelligence Group Lab. Paraphrased from a locally transcribed recording of the WWDC 2026 Apple Intelligence Group Lab; Apple publishes no official captions for the labs, so the wording here is a paraphrase, not a quotation, and exact phrasing is unverified. Source for the distinction between a refusal error (the model’s own alignment training declining a request, surfaced under guided or structured generation) and a guardrail error (a separate moderation model inspecting input and output), and the opt-in setting that lets emotionally charged but legitimate input through; the name of that setting was not legible in the recording and is left unconfirmed. ↩↩