Foundation Models from Python: the fm CLI

For a year, Apple’s on-device large language model lived behind a wall: you reached it only from Swift, only inside an app you built in Xcode1. macOS 27 tears that wall down. Apple now ships a command line tool called fm pre-installed with the OS, and a Foundation Models SDK for Python you install with pip1. The model that used to require a project, a build, and a LanguageModelSession in compiled Swift now answers a one-line shell command and runs inside a Jupyter notebook. Eric Gourlaouen, an engineer on the Foundation Models Framework team, framed the shift plainly in WWDC26 session 334: “until now, those models were only available from Swift code”1. The change is not a new model. The change is that the same on-device model is suddenly scriptable, automatable, and evaluable from outside an app, with no API key and no cloud cost1.

Apple introduces two new ways to reach the on-device Apple Foundation Model on macOS: the pre-installed fm command line tool and a Foundation Models SDK for Python.

TL;DR

  • macOS 27 ships fm, a pre-installed command line tool for the on-device Apple Foundation Model. Its subcommands include respond (one-shot prompt to stdout), chat (interactive session), and schema (define structured output)1.
  • fm respond takes options for the model (switch to Private Cloud Compute), an image input, and a schema for structured output; --help lists the rest1.
  • The Python SDK reaches the same on-device model from Python. It requires Python 3.10 or later, Xcode installed, and an Apple Silicon Mac, and you install it through pip or another package manager1.
  • The SDK mirrors the Swift framework: a LanguageModelSession you call respond on, tool calling, and guided generation, where a structure defined with the fm.generable decorator is passed to fm.respond as the generating argument1.
  • Both surfaces default to the always-available on-device model and can opt into the larger Private Cloud Compute model, which is more capable but carries usage limits1.
  • The payoff is prototyping and automation: shell scripts that sort files by meaning, and Python evaluation pipelines that grade prompt variants with Pandas and matplotlib1.

The fm command line tool

Open Terminal on macOS 27 and type fm, and the tool prints the commands it supports1. Apple highlights three. fm respond prompts the model and returns a response. fm chat starts an interactive conversation. fm schema creates a schema for structured output1. The simplest possible use is the one Eric demonstrated first: type fm respond, type a prompt, press enter, and read the model’s answer in the terminal a moment later1.

The tool comes pre-installed on macOS 27 and lives in the Terminal app; typing fm lists the available commands.

Two of the three commands, chat and respond, split along a clean line: exploration versus scripting. fm chat is for getting a first pulse of the model. You ask a question, ask a follow-up, and the conversation holds. It has its own slash commands: /model switches the conversation to the Private Cloud Compute model, and /save saves the conversation to resume later1. When you would rather have an inline response you can capture, as in a script, you reach for fm respond instead, which writes the model’s output to stdout1.

fm respond is where the options live. Eric named three explicitly. A model option prompts the Private Cloud Compute model rather than the on-device default. An image option includes an image in the prompt. And a schema option pairs with fm schema object to constrain the output to a structure you defined1. He noted there are more, and pointed at the help option to list them all1. The transcript names these options by role rather than by exact flag spelling (the one literal form shown on screen is fm schema object), so where I describe an option below I am describing the documented behavior, not inventing a flag string.

The model choice is the decision that matters most. By default fm uses the on-device model that comes with macOS, which is always available1. You can switch to the Apple Foundation Model on Private Cloud Compute, which Eric described as “a much bigger model than the on-device model, so it will perform better on complex problems,” with the tradeoff that it carries usage limits1. The default is the right starting point: it is free, local, and has no cap. You opt up to Private Cloud Compute when a task is genuinely hard enough to need it.

Building an automation script

The file-sorting demo is the clearest argument for why a CLI matters. Eric had a project folder full of draft and final asset versions and wanted a repeatable script that keeps the finals, backs them up, and moves the drafts to an archive disk1. The hard part is not moving files. The hard part is deciding which file is a draft when the names are messy. As he put it, calling into a language model from the script lets it “sort draft versus final files” even when “the names are messy and are difficult to sort predictably”1.

A practical script: load the files in a folder, prompt the model to sort drafts from finals, and act on a structured JSON result to back up and archive.

The shape generalizes to any “judgment over a list” automation. The script loads the files in the working directory, then prompts the model through fm respond to sort that list into two groups: final files and draft files1. To make the output usable, it defines a schema up front with fm schema object describing two fields, a list of finals and a list of drafts, and passes that schema to fm respond through the schema option1. The model returns its answer as JSON, and the script reads it to copy the finals to a backup and move the drafts to the archive1.

The structured-output step is the load-bearing one. A free-text answer would force the script to parse prose, the brittle part of every shell pipeline that talks to an LLM. By declaring the schema with fm schema object and getting JSON back, the script gets a contract it can act on directly1. The pattern is the same one Swift developers know as guided generation, exposed here as a CLI option1. Any task that ends in “do something deterministic with the model’s decision” wants exactly this shape: prompt, schema, JSON, act.
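The "act" half of that shape can be sketched in Python without guessing at any fm flag. This is a minimal sketch, not the script from the session: it assumes the model's JSON answer has already been captured as a string, and the key names finals and drafts are hypothetical, since the session describes the two fields but does not print the schema. The point it illustrates is that a structured answer lets the script check the model's decision against the real directory listing before touching anything.

```python
import json
from pathlib import Path

# Hypothetical field names; the session describes "a list of finals and a
# list of drafts" but does not show the schema's exact keys.
FINALS_KEY = "finals"
DRAFTS_KEY = "drafts"


def plan_actions(model_json: str, listing: list[str],
                 backup_dir: str, archive_dir: str) -> list[tuple[str, str, str]]:
    """Turn the model's JSON answer into (operation, source, destination) steps.

    Only names that actually appear in `listing` are acted on, so a file
    the model made up or misspelled is never touched.
    """
    result = json.loads(model_json)
    known = set(listing)
    finals = [n for n in result.get(FINALS_KEY, []) if n in known]
    # A file named in both groups is treated as a final: copying is the safer action.
    drafts = [n for n in result.get(DRAFTS_KEY, [])
              if n in known and n not in finals]

    actions = [("copy", n, str(Path(backup_dir) / n)) for n in finals]
    actions += [("move", n, str(Path(archive_dir) / n)) for n in drafts]
    return actions


# Example answer shaped like the two-field schema described above.
sample = json.dumps({
    "finals": ["logo_FINAL.png", "deck-v3-final.key"],
    "drafts": ["logo_old2.png", "ghost.txt"],  # ghost.txt is not in the folder
})
files = ["logo_FINAL.png", "logo_old2.png", "deck-v3-final.key"]

for step in plan_actions(sample, files, "backup", "archive"):
    print(step)
```

Returning a plan instead of moving files immediately keeps the model's judgment and the destructive step separate. A caller could print the plan as a dry run, then carry it out with shutil.copy2 and shutil.move.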

The Python SDK

The second surface is for a different person at a different moment. As Eric put it, “if you’re a machine learning engineer, you might use more Python than Swift,” and the SDK “makes it easy to use the on-device model in your Python code”1. The pitch rests on Python’s ecosystem: “Python has a rich ecosystem of open-source packages for machine learning and data science,” which means you can write evaluation pipelines and “leverage those packages to quantify the quality of your feature”1.

Installation has three requirements, all stated in the session: Python 3.10 or later, Xcode installed, and an Apple Silicon Mac. You install the SDK through pip or any other package manager of your choice1. The Apple Silicon and Xcode requirements are the tell that the package is a binding to the same on-device model the OS runs, not a hosted API.

The API will feel familiar to anyone who used the Swift framework2, by design: “the APIs and abstractions will quickly feel familiar”1. You prompt by creating a LanguageModelSession, optionally passing instructions, then calling session.respond with your prompt; the result contains the model’s output1. The SDK carries the framework’s core features across: text and image inputs, streaming responses, tool calling so the model can interact with your code, and guided generation for structured output1.

The grocery-app example: create a LanguageModelSession, call respond, expose a tool that fetches recent orders, and constrain output with the fm.generable decorator.

Two of those features got concrete treatment. For tool calling, Eric defined a tool the model can call to fetch a user’s last few orders, “so that it can provide more personalized information,” the same pattern as the Swift framework’s Tool protocol1. For guided generation, he used the fm.generable decorator to define the desired output structure, an ItemsSuggestion object, and passed it to fm.respond as the generating argument1. The decorator is the Python analog of Swift’s @Generable macro, and the generating argument is how you hand the model the shape you want back. Because the transcript shows these by role and object name rather than printing the full class body, treat ItemsSuggestion as the example’s name for a structure you would define yourself.
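To make the two roles concrete without inventing SDK syntax, here is a stand-in that uses only the Python standard library. Nothing in it is the SDK's API: the dataclass takes the place of a structure you would declare with fm.generable, the items field is a guess at what ItemsSuggestion might hold, and fetch_recent_orders is a hypothetical tool body with hard-coded data. It shows the contract each piece offers: a tool is an ordinary function the model can ask for, and a structured reply is something you can validate before using.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ItemsSuggestion:
    """Stand-in for the example's output structure.

    In the SDK this would be declared with the fm.generable decorator; the
    field below is hypothetical, since the session does not show the class body.
    """
    items: list[str] = field(default_factory=list)


def parse_suggestion(raw: str) -> ItemsSuggestion:
    """Validate a JSON reply against the expected shape, failing loudly."""
    data = json.loads(raw)
    items = data.get("items") if isinstance(data, dict) else None
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("expected an 'items' list of strings")
    return ItemsSuggestion(items=items)


def fetch_recent_orders(user_id: str, limit: int = 3) -> list[str]:
    """Hypothetical tool body: return a user's last few orders.

    Hard-coded data keeps the sketch runnable; a real tool would query a store.
    """
    history = {"u1": ["oat milk", "bananas", "coffee", "rice"]}
    return history.get(user_id, [])[-limit:]


print(fetch_recent_orders("u1"))
print(parse_suggestion('{"items": ["oat milk", "coffee"]}'))
```

With the real SDK, guided generation should make the manual validation step unnecessary. The stand-in is useful mainly for writing and testing the code around the model before the SDK is installed.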

Evaluation pipelines: the real reason to use Python

The case study is where the Python SDK stops being a convenience and becomes a method. Eric was building a feature to predict what a user wants to add to their grocery cart, and he had three different prompt implementations: a minimal one, a more descriptive one, and a detailed one that spelled out a full list of rules1. The question every prompt engineer faces is which one is actually best, and the honest answer requires measurement, not taste.

A Jupyter-notebook evaluation pipeline: generate eval data with a server model, run three prompt variants, store inputs and outputs in a Pandas DataFrame, grade with judge functions, and chart with matplotlib.

Apple is explicit that Swift developers have their own answer here. The Evaluations framework ships with Xcode 27 and makes it easy to create evaluations and track feature accuracy across iterations1. The Python SDK is the parallel path for data scientists who live in notebooks. Eric ran the whole analysis from Jupyter1.

The pipeline reads like a standard ML evaluation loop pointed at the on-device model. First, he used a large server model to generate evaluation data, giving him inputs and an expected output for each1. Then, for every input, he generated outputs from each of the three prompt implementations and stored the inputs and outputs as rows in a Pandas DataFrame1. Next, judge functions backed by a server model scored each output against criteria he chose, and those metrics went back into the DataFrame1. Finally, matplotlib turned the grades into charts1.

The charts told a story no amount of staring at prompts would have: the detailed prompt hit a high rate of generation errors, which Eric attributed to reaching the model’s maximum context window size; the two less-detailed prompts added excess items to the cart while the detailed one added fewer; the detailed prompt missed more expected items; and the minimal prompt hallucinated the most items1. Every prompt had a different failure mode, and only the measurement surfaced them. That is the argument for the whole approach. “With Python, I can make those iterations quickly right from my notebook without having to rebuild the whole project,” Eric said1.
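The grading step is the part that is easy to sketch without the SDK. The following is a simplified illustration, not Eric's pipeline: the rows are invented, plain Python structures stand in for the Pandas DataFrame, and a deterministic set comparison replaces the model-backed judge functions. It tracks three of the failure modes named above (generation errors, excess items, missed items) and leaves hallucination out, since judging that needs more than a set difference.

```python
from collections import defaultdict

# Hypothetical evaluation rows: (prompt variant, expected items, generated items).
# None stands for a generation error. In the session these rows lived in a
# Pandas DataFrame and the outputs came from the on-device model.
rows = [
    ("minimal",     {"milk", "eggs"}, {"milk", "eggs", "caviar", "truffles"}),
    ("minimal",     {"bread"},        {"bread", "jam"}),
    ("descriptive", {"milk", "eggs"}, {"milk", "eggs", "butter"}),
    ("descriptive", {"bread"},        {"bread"}),
    ("detailed",    {"milk", "eggs"}, {"milk"}),
    ("detailed",    {"bread"},        None),
]


def judge(expected, generated):
    """Deterministic stand-in for a model-backed judge function."""
    if generated is None:
        return {"error": 1, "excess": 0, "missed": 0}
    return {
        "error": 0,
        "excess": len(generated - expected),  # items added that were not expected
        "missed": len(expected - generated),  # expected items left out
    }


def summarize(rows):
    """Total each metric per prompt variant."""
    totals = defaultdict(lambda: {"runs": 0, "error": 0, "excess": 0, "missed": 0})
    for variant, expected, generated in rows:
        scores = judge(expected, generated)
        totals[variant]["runs"] += 1
        for name, value in scores.items():
            totals[variant][name] += value
    return dict(totals)


for variant, metrics in summarize(rows).items():
    print(variant, metrics)
```

Even on six made-up rows, each variant fails differently, and that is the kind of table the charts were drawn from. In a notebook, the same per-variant totals would come from grouping the DataFrame by prompt variant.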

When to reach for each

A few rules follow from the contracts above.

Reach for fm respond when a shell script needs a judgment call: sorting messy filenames, classifying a line of input, extracting a field from unstructured text. Pair it with fm schema object and the schema option so the script acts on JSON instead of parsing prose1.

Reach for fm chat when you are exploring, not scripting. It is the fastest way to get a first pulse of how the model handles your prompts, with /model to escalate to Private Cloud Compute and /save to keep a session1.

Reach for the Python SDK when you want to measure, not just call. The moment you have more than one prompt and need to know which is better, the notebook-plus-Pandas-plus-matplotlib loop is the tool, because the on-device model is free and local, so you can run an entire evaluation set against it without a bill1.

Default to the on-device model; opt up to Private Cloud Compute deliberately. The on-device model is always available and has no usage limit. Private Cloud Compute is bigger and better on complex problems but carries usage limits, so save it for the tasks that earn it1.

Prototype here, ship in Swift. Eric’s own framing is that you can use these tools “alongside your Xcode project, as a way to prototype and evaluate prompts,” or “on their own, to use the model in novel ways”1. The grocery-app example prototypes prompts in Python “before implementing them in Swift”1. The CLI and SDK shorten the loop between idea and evidence; the app is still where the feature lands.

FAQ

What is the fm command line tool?

fm is a command line tool that comes pre-installed with macOS 27 and reaches the on-device Apple Foundation Model from the Terminal app1. Its subcommands include respond to prompt the model and print a response, chat to start an interactive conversation, and schema to define structured output. You run it without an API key and without cloud cost, because the default model runs on-device1.

How do I get structured JSON out of fm?

Define a schema with fm schema object, then pass that schema to fm respond through its schema option. The model returns its answer as JSON matching the schema, which a script can act on directly instead of parsing free text1. The mechanism is the CLI version of the framework’s guided generation1.

What does the Foundation Models Python SDK require?

Python 3.10 or later, Xcode installed, and an Apple Silicon Mac1. You install it through pip or another package manager of your choice. The Apple Silicon and Xcode requirements reflect that the SDK binds to the same on-device model the OS runs, rather than calling a hosted API1.

How is the Python SDK different from the Swift framework?

It is the same model and a deliberately familiar API in a different language. You create a LanguageModelSession, call respond, expose tools, and use guided generation by defining a structure with the fm.generable decorator and passing it to fm.respond as the generating argument1. The reason to choose Python is the ecosystem: Pandas, matplotlib, Jupyter, and the rest of the data-science stack for evaluation pipelines that Swift does not reach as directly1.

When should I use Private Cloud Compute instead of the on-device model?

Both fm and the SDK default to the on-device model, which is always available and has no usage limit1. Switch to Private Cloud Compute, via fm respond’s model option or fm chat’s /model command, when a problem is complex enough to need the bigger model, accepting that it carries usage limits1.

The full Apple Ecosystem cluster: the Foundation Models framework explainer for the Swift foundation these tools mirror; the iOS 27 tool-calling controls for how the model uses tools; the Foundation Models agentic workflow for the on-device-versus-larger-model choice; and Xcode 27’s coding agents for the in-IDE side of an agent-heavy workflow. The hub is at the Apple Ecosystem Series. For the broader picture of building iOS with agents, see the iOS Agent Development guide.



  1. Apple, WWDC26 session 334, “Build AI-powered scripts with the fm CLI and Python SDK,” presented by Eric Gourlaouen of the Foundation Models Framework team. developer.apple.com/videos/play/wwdc2026/334. Source for: the fm tool pre-installed with macOS 27 and its respond, chat, and schema subcommands; fm chat’s /model and /save commands; fm respond’s model, image, schema, and help options; fm schema object for defining structured output and the JSON result contract; the file-sorting automation script; the on-device versus Private Cloud Compute model choice and the latter’s usage limits; the Python SDK’s requirements (Python 3.10+, Xcode, Apple Silicon, install via pip); LanguageModelSession, session.respond, tool calling, and guided generation via the fm.generable decorator passed to fm.respond as the generating argument; the Jupyter/Pandas/matplotlib evaluation pipeline, the three prompt variants, the judge functions backed by a server model, and the per-prompt failure modes (generation errors at max context window, excess items, missed items, hallucinated items); the Xcode 27 Evaluations framework reference; and the “prototype in Python before implementing in Swift” framing. The Python SDK GitHub repository with example snippets and documentation is referenced in the session but no URL is given on screen, so it is described rather than linked. 

  2. Apple Developer, “Foundation Models” framework overview. The WWDC25 Swift framework that introduced the on-device Apple Foundation Model, LanguageModelSession, guided generation, and the Tool protocol, which the fm CLI and Python SDK mirror on macOS 27. 
