Foundation Models in iOS 27: Tool-Calling Control

iOS 26 gave an app an on-device large language model, a way to get type-safe output through @Generable, and a Tool protocol that let the model call your code mid-generation1. The model decided when to reach for a tool, and you wrote the tool. The one thing you could not do was steer the calling behavior itself, and the one thing you always had to do was write every tool by hand, including the ones every app needs. iOS 27 closes both gaps. GenerationOptions.ToolCallingMode lets you control how the model interacts with tools on a per-request basis2, and the Vision framework now ships two ready-made tools, OCRTool and BarcodeReaderTool, that you attach to a session without writing the recognition code yourself34. Together they finish the agentic loop the framework started: the model decides what to do, you decide how aggressively it is allowed to do it, and Apple supplies the perception tools that read the physical world.

What follows is the iOS 27 layer on top of the framework reference. If you have not met LanguageModelSession, the Tool protocol, or guided generation yet, start with the Foundation Models framework explainer and come back.

TL;DR

  • GenerationOptions.ToolCallingMode is a new iOS 27 structure that describes model behavior around tool usage, set per request through GenerationOptions2. Apple documents three modes.
  • The framework can change its mode after the first tool call so the model stops calling tools and produces a final response, which bounds a single request’s tool activity2.
  • OCRTool recognizes text in an image and returns a string of everything it read. You enable it by configuring your LanguageModelSession with an OCRTool instance3.
  • BarcodeReaderTool scans machine-readable codes and returns an array of Barcode results, each carrying the decoded content and the symbology type. You enable it the same way, by configuring the session with an instance4.
  • Both Vision tools let you override the default name and description, so you control how the model identifies and decides to use each one34.
  • Everything here is in the iOS 27 beta and the matching iPadOS, Mac Catalyst, macOS, and visionOS betas. ToolCallingMode and BarcodeReaderTool are also listed for the watchOS 27 beta; OCRTool is not234.

What changed between iOS 26 and iOS 27

The iOS 26 framework treated tool calling as binary at the API surface. You handed a session a set of tools, and from then on the model alone decided whether and how often to call them. That works for a single lookup. It gets awkward the moment you want different behavior across requests inside one session: one prompt where the model must consult a tool, another where you would rather it answer from context and skip the round trip.

iOS 27 moves that decision into your hands. ToolCallingMode is a value you pass through GenerationOptions, the same options object that already controls decoding25, and the mode is a property of the request, not the session. The built-in Vision tools change the other side of the equation: instead of writing an OCR pipeline or a barcode scanner and wrapping it in your own Tool conformance, you attach Apple’s implementation and spend your effort on the prompt.

GenerationOptions.ToolCallingMode: steering the calls

ToolCallingMode is a structure under GenerationOptions, available across the iOS 27, iPadOS 27, Mac Catalyst 27, macOS 27, visionOS 27, and watchOS 27 betas2. Apple’s abstract is one sentence: a value you use to describe the model behavior when it comes to tool usage2. The declaration is as plain as it gets:

// iOS 27 beta
struct ToolCallingMode

Apple’s documentation states that tool calling mode supports three modes2. The discussion text that would name each one is partially elided in the reference at the time of writing, so rather than guess at identifiers I will describe what the framework documents about the behavior, which is the part that actually drives your design.

The behavior Apple does spell out: the framework can change the mode after the first tool call, which lets the model produce a final response2. That single sentence is the load-bearing piece. It means a request can start in a posture where the model is free to (or required to) call a tool, and once that first call returns, the framework shifts the mode so the model stops reaching for tools and commits to an answer. The practical effect is a bound on a single request’s tool activity: you are not at the mercy of a model that keeps calling tools in a loop until it exhausts the context window.
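To make the bounding effect concrete, here is a toy simulation in plain Swift. It does not use FoundationModels at all, and the names ToyPosture and ToyRequest are hypothetical: they are not Apple's mode identifiers, which the reference does not spell out. The sketch only models the one documented behavior, a shift after the first tool call.

```swift
// Toy model only. These names are hypothetical and are NOT Apple's
// ToolCallingMode identifiers.
enum ToyPosture {
    case toolsAllowed      // the model may still call a tool
    case finalAnswerOnly   // the model must commit to a response
}

struct ToyRequest {
    var posture: ToyPosture = .toolsAllowed
    var toolCalls = 0

    // Simulates one generation step. `wantsTool` stands in for the
    // model's own inclination to call a tool at this step.
    mutating func step(wantsTool: Bool) -> String {
        if wantsTool && posture == .toolsAllowed {
            toolCalls += 1
            // Models the documented behavior: after the first tool call,
            // the mode shifts so the model produces a final response.
            posture = .finalAnswerOnly
            return "tool call"
        }
        return "final response"
    }
}

// A "model" that would call tools on every step is still bounded.
var request = ToyRequest()
var transcript: [String] = []
for _ in 0..<5 {
    let outcome = request.step(wantsTool: true)
    transcript.append(outcome)
    if outcome == "final response" { break }
}
print(transcript)
```

Even though the simulated model asks for a tool on every step, the request ends after one call and one answer. Whether the real framework applies the shift in every mode is not something the reference states, so treat this as an illustration of the idea and not of the implementation.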

You set the mode through the options object you already pass to respond(to:):

import FoundationModels

let session = LanguageModelSession(tools: [FindContacts()])

// A request where you want to govern tool-calling behavior explicitly.
var options = GenerationOptions()
options.toolCallingMode = .someMode   // one of the three documented modes
let response = try await session.respond(
    to: "Draft a dinner invite to three of my contacts.",
    options: options
)

In the listing above, .someMode is a placeholder, not a real identifier; substitute one of the three documented cases. The mechanism is what matters: the behavior is per request and carried by GenerationOptions. That object is the same iOS 26 structure that governs the decoding strategy, the way the model chooses output tokens, and the optional response-token cap you reach for only when guarding against runaway verbosity5. Tool-calling mode is a new dimension on a control surface you already use, not a new object to thread through your code.

The control sits at the request level instead of the session level because tool need is a property of the question, not the conversation. A chat session might field one turn that genuinely requires a contacts lookup and a next turn that is pure rephrasing the model can do from what it already holds. Forcing a tool call on the second turn wastes a round trip and burns tokens the shared context window cannot spare5. Per-request mode lets each turn declare its own posture.
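One way to keep that per-turn decision explicit in app code is to classify each turn before building its options. The sketch below is plain Swift with hypothetical types (TurnNeed, Turn, shouldOfferTools); none of them come from FoundationModels, and mapping the Boolean onto an actual ToolCallingMode value is left to you because the mode names are not reproduced here.

```swift
// Hypothetical app-side types. Neither is part of FoundationModels; you
// would translate the decision into a real ToolCallingMode value yourself.
enum TurnNeed {
    case needsFreshData      // e.g. a contacts lookup
    case answerFromContext   // e.g. rephrasing text the session already holds
}

struct Turn {
    let prompt: String
    let need: TurnNeed
}

// Decide per turn, not per session, whether a tool round trip is warranted.
func shouldOfferTools(for turn: Turn) -> Bool {
    switch turn.need {
    case .needsFreshData: return true
    case .answerFromContext: return false
    }
}

let turns = [
    Turn(prompt: "Draft a dinner invite to three of my contacts.", need: .needsFreshData),
    Turn(prompt: "Make that invite sound more casual.", need: .answerFromContext),
]
let decisions = turns.map(shouldOfferTools(for:))
print(decisions)
```

The second turn works only on text the session already holds, so it is the one where a forced tool call would waste a round trip.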

Built-in Vision tools: OCRTool and BarcodeReaderTool

The second half of the iOS 27 story lives in the Vision framework, not Foundation Models. Apple now ships two tools you attach to a LanguageModelSession the same way you attach one of your own, except you write none of the recognition code.

OCRTool

OCRTool recognizes text in an image. Apple’s abstract is exactly that, and the discussion is precise about the contract: the tool returns a string containing all recognized text from the image3. To turn it on, you configure your LanguageModelSession with an instance of OCRTool3. The declaration:

// iOS 27 beta, Vision framework
struct OCRTool

Attaching it follows the same shape as any tool, because to the session it is just another Tool:

import FoundationModels
import Vision

// Configure the session with an OCRTool instance to enable it.
let session = LanguageModelSession(tools: [OCRTool()])

let response = try await session.respond(
    to: "Pull the total and the date off this receipt image and summarize them."
)

The model decides when the prompt needs text out of an image, calls OCRTool, gets back a string of everything the tool read, and folds that string into its answer the same way it would fold the result of a tool you wrote3. You wrote no Vision request and no handling code. You attached a tool and described the job.

Apple lets you override the default name and description to customize how the model identifies and uses the tool3. Apart from the prompt itself, that hook is your most direct lever on when the model reaches for OCR. If your app reads receipts, writing the tool’s description in receipt terms biases the model toward calling it on receipt-shaped prompts and away from prompts where the image is decorative. The description is a function doc the model reads, so write it like one.

BarcodeReaderTool

BarcodeReaderTool scans machine-readable codes in an image4. Where OCRTool returns a flat string, the barcode tool returns structure: when the model encounters an image containing machine-readable codes, it can call this tool to decode them, and the tool returns an array of Barcode results, each containing the decoded content and the symbology type4. The declaration and attachment mirror OCRTool:

// iOS 27 beta, Vision framework
struct BarcodeReaderTool

import FoundationModels
import Vision

// Configure the session with a BarcodeReaderTool instance to enable it.
let session = LanguageModelSession(tools: [BarcodeReaderTool()])

let response = try await session.respond(
    to: "Scan this label and tell me what product it is and which standard the code uses."
)

The symbology type in each Barcode result is the detail that earns the structured return4. A QR code, an EAN-13 grocery barcode, and a PDF417 on a driver’s license are all machine-readable codes, and they mean different things to your app. Because the tool hands back the symbology alongside the decoded payload, the model (and your downstream code) can branch on the kind of code, not only the bytes inside it. As with OCRTool, you can override the default name and description to steer how the model identifies and uses the tool4.
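Branching on the kind of code might look something like the sketch below. CodeKind, ScannedCode, and NextStep are hypothetical stand-ins written for this example; they are not Vision's Barcode type or its symbology values. Only the shape, a decoded payload paired with a kind, is taken from the documented contract.

```swift
// Hypothetical stand-ins. Vision's real Barcode type and symbology values
// are not reproduced here; only the payload-plus-kind shape is assumed.
enum CodeKind {
    case qr, ean13, pdf417, other
}

struct ScannedCode {
    let payload: String
    let kind: CodeKind
}

enum NextStep: Equatable {
    case openLink(String)
    case lookUpProduct(String)
    case parseIdentityDocument
    case ignore
}

// Routes a decoded code by its kind, not only by the bytes inside it.
func route(_ code: ScannedCode) -> NextStep {
    switch code.kind {
    case .qr:
        // QR payloads are often URLs but not always, so check first.
        return code.payload.hasPrefix("https://") ? .openLink(code.payload) : .ignore
    case .ean13:
        return .lookUpProduct(code.payload)
    case .pdf417:
        return .parseIdentityDocument
    case .other:
        return .ignore
    }
}

print(route(ScannedCode(payload: "4006381333931", kind: .ean13)))
```

The same 13 digits mean nothing useful if they arrive without a kind attached, which is the reason a structured return beats a flat string here.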

The two tools share most of their beta availability: both are listed for iOS 27, iPadOS 27, Mac Catalyst 27, macOS 27, and visionOS 27, and only BarcodeReaderTool is also listed for watchOS 27, according to Apple’s reference pages34.

Composing the loop: perception plus controlled calling

The two features are interesting on their own and better together, because they sit on opposite ends of one agentic request. The Vision tools are perception, the model’s eyes on an image. ToolCallingMode is governance, your hand on how hard the model leans on those eyes.

Picture a pantry-restock feature. The user photographs a shelf. The session has both Vision tools attached and one tool of your own, a LookUpProduct that hits the app’s catalog. A single request asks the model to identify the items and build a reorder list. The model calls BarcodeReaderTool to decode the labels it can see, reads any printed text with OCRTool for the items without a clean code, and calls your LookUpProduct to resolve each decoded payload into a catalog entry. Three tools, one prompt, one coherent answer.

import FoundationModels
import Vision

let session = LanguageModelSession(tools: [
    OCRTool(),
    BarcodeReaderTool(),
    LookUpProduct(),     // your own Tool conformance over the app catalog
])

var options = GenerationOptions()
options.toolCallingMode = .someMode   // govern how the model sequences the calls
let response = try await session.respond(
    to: "Identify everything on this shelf and build a reorder list.",
    options: options
)

That is the loop the framework has been building toward. iOS 26 supplied the runtime model, guided generation, and the Tool protocol that lets the on-device model invoke your code without you parsing free text1. The architecture post in this cluster drew the line between that runtime model and the tooling LLM a developer runs in Claude Code to write the app, and made the case for a single Swift domain function backing a Foundation Models Tool, an App Intent, and an MCP tool through three thin adapters6. iOS 27 slots into the runtime side of that picture: the built-in Vision tools are domain functions Apple wrote and you mount, LookUpProduct is the domain function you wrote, the model orchestrates all of them, and ToolCallingMode is the throttle on the orchestration.
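The domain-function idea can be sketched for the pantry example in plain Swift. Everything below is hypothetical: Product, Catalog, buildReorderList, and the sample catalog entry are invented for illustration, and nothing imports FoundationModels. The point is that the lookup is ordinary code that a Tool conformance such as LookUpProduct could wrap as a thin adapter.

```swift
// Hypothetical domain layer for the pantry example. A Tool conformance,
// an App Intent, or an MCP tool could each wrap this as a thin adapter.
struct Product: Equatable {
    let sku: String
    let name: String
}

struct Catalog {
    private let productsByCode: [String: Product]

    init(productsByCode: [String: Product]) {
        self.productsByCode = productsByCode
    }

    // The single domain function: decoded payload in, catalog entry out.
    func lookUpProduct(code: String) -> Product? {
        productsByCode[code]
    }
}

// Resolves decoded payloads into a reorder list and keeps unknown codes
// separate so the caller can fall back to OCR text or ask the user.
func buildReorderList(
    codes: [String],
    catalog: Catalog
) -> (items: [Product], unresolved: [String]) {
    var items: [Product] = []
    var unresolved: [String] = []
    for code in codes {
        if let product = catalog.lookUpProduct(code: code) {
            items.append(product)
        } else {
            unresolved.append(code)
        }
    }
    return (items, unresolved)
}

// Sample data is made up for the example.
let catalog = Catalog(productsByCode: [
    "4006381333931": Product(sku: "PANTRY-001", name: "Rolled oats"),
])
let result = buildReorderList(
    codes: ["4006381333931", "0000000000000"],
    catalog: catalog
)
print(result.items.map(\.name), result.unresolved)
```

Keeping unresolved codes in the result, instead of dropping them, gives the model or the UI something to act on when a label decodes cleanly but is not in the catalog.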

The trust boundary does not move. OCRTool and BarcodeReaderTool run inside the app process on-device, against the user’s image, under the same sandbox and privacy posture as a tool you wrote yourself. Apple supplying the implementation changes who maintains the recognition code, not who is accountable for the feature. You still own the prompt, the session, the availability check, and the decision to put a camera in front of the user.

When to use each mode and tool

A few rules that follow from the contracts above.

Reach for ToolCallingMode when tool need varies per request. If every turn in a session needs the same tool behavior, the default is fine and the mode is noise. The mode earns its place when one request must consult a tool and another should answer from context, or when you want the framework’s after-first-call shift to bound a request that could otherwise loop2. Set it on the request, not once for the session, because that is where the control lives2.

Reach for OCRTool when the answer is text trapped in an image. Receipts, signs, handwritten notes, screenshots of text. The tool returns one string of everything it read3, so it fits prompts where you want the model to reason over the words, not the layout. If you need bounding boxes or per-line confidence, that is a lower-level Vision request, not this tool.

Reach for BarcodeReaderTool when the image carries machine-readable codes and the kind of code matters. Product labels, tickets, IDs, inventory tags. The structured return, decoded content plus symbology4, is the reason to prefer it over treating a barcode as generic text. Branch on the symbology in your own tool or your post-processing.

Override the name and description whenever your app has a specific job for a generic tool. Both Vision tools default to generic identities, and the model picks tools partly by their descriptions34. An app that only ever reads receipts should say so in the OCR tool’s description, so the model does not call it on every photo that happens to contain a word.

FAQ

What is GenerationOptions.ToolCallingMode in iOS 27?

It is a structure, new in the iOS 27 beta, that describes the model’s behavior around tool usage for a given request. You set it through the GenerationOptions you pass to respond(to:), so tool-calling behavior is a property of each request rather than the whole session. Apple documents three modes2.

How many tool-calling modes does Apple document, and what are they named?

Apple’s documentation states that tool calling mode supports three modes2. The reference text that names each individual mode is partially elided at the time of writing, so I describe the documented behavior rather than guess at the identifiers. The behavior Apple does state explicitly: the framework can change the mode after the first tool call so the model produces a final response, which bounds a single request’s tool activity2.

How do I enable Apple’s built-in OCR tool?

Configure your LanguageModelSession with an instance of OCRTool, the same way you attach any tool3. The model then calls it when a prompt needs text from an image, and the tool returns a string containing all the recognized text. OCRTool is in the Vision framework and is available in the iOS 27 beta3.

What does BarcodeReaderTool return?

It returns an array of Barcode results, each containing the decoded content and the symbology type4. The symbology lets you tell a QR code from an EAN-13 from a PDF417 and branch on the kind of code, not only its payload. You enable it by configuring a LanguageModelSession with a BarcodeReaderTool instance4.

Can I change how the model decides to use the built-in Vision tools?

Yes. Both OCRTool and BarcodeReaderTool let you override the default name and description to customize how the model identifies and uses the tool34. The description is the lever on when the model reaches for the tool, so writing it in your app’s own terms biases the model toward the right calls.

Do the built-in Vision tools send images off the device?

No. OCRTool and BarcodeReaderTool are Vision-framework tools that you attach to a Foundation Models session, and they run inside the app process on-device, under the same sandbox and privacy posture as a tool you write yourself134. Apple supplying the recognition code changes who maintains it, not where it runs or who is accountable for the feature.

The full Apple Ecosystem cluster: the Foundation Models framework explainer; the on-device LLM; the runtime vs tooling LLM distinction; custom adapters; typed App Intents; the new App Intents iOS 27 background execution and sync; the routing question against MCP tools; the Vision framework; Core ML inference; three surfaces. The hub is at the Apple Ecosystem Series. For broader iOS-with-AI-agents context, see the iOS Agent Development guide.



  1. Apple Developer, “Foundation Models” framework overview and “Tool” protocol. The iOS 26 framework introduced the on-device model, LanguageModelSession, guided generation via @Generable, and the Tool protocol that lets the model invoke app code mid-generation. 

  2. Apple Developer, “GenerationOptions.ToolCallingMode”. A structure (struct ToolCallingMode) available in the iOS 27.0, iPadOS 27.0, Mac Catalyst 27.0, macOS 27.0, visionOS 27.0, and watchOS 27.0 betas, abstracted as a value that describes model behavior around tool usage. Apple’s discussion states tool calling mode supports three modes and that the framework can change the mode after the first tool call, which lets the model produce a final response. 

  3. Apple Developer, “OCRTool”. A Vision-framework structure (struct OCRTool) available in the iOS 27.0, iPadOS 27.0, Mac Catalyst 27.0, macOS 27.0, and visionOS 27.0 betas, abstracted as a tool that recognizes text in an image. Apple’s discussion states the tool returns a string containing all recognized text, that you enable it by configuring your LanguageModelSession with an instance of OCRTool, and that you can override the default name and description. 

  4. Apple Developer, “BarcodeReaderTool”. A Vision-framework structure (struct BarcodeReaderTool) available in the iOS 27.0, iPadOS 27.0, Mac Catalyst 27.0, macOS 27.0, visionOS 27.0, and watchOS 27.0 betas, abstracted as a tool that scans machine-readable codes in an image. Apple’s discussion states the tool returns an array of Barcode results, each containing the decoded content and the symbology type, that you enable it by configuring your LanguageModelSession with an instance of BarcodeReaderTool, and that you can override the default name and description. 

  5. Apple Developer, “GenerationOptions”. The iOS 26 structure (struct GenerationOptions) whose options determine the decoding strategy the framework uses to adjust how the model chooses output tokens; Apple notes a strict response-token limit should be used only to guard against unexpectedly verbose responses, and that all input contributes to the shared context window. 

  6. Author’s analysis in Foundation Models Agentic Workflow: In-App vs Tooling LLM, May 1, 2026, on the runtime/tooling LLM distinction, the on-device Tool protocol’s trust boundary, and the single-domain-function, multiple-adapter pattern across Foundation Models tools, App Intents, and MCP. The routing question between those surfaces is developed in App Intents vs MCP: The Routing Question.
