Apple's New Speech Framework: SpeechAnalyzer vs SFSpeechRecognizer
iOS 26 introduces a new speech-recognition framework alongside the existing SFSpeechRecognizer. The new API surface is SpeechAnalyzer plus modules (SpeechTranscriber, SpeechDetector) that compose around it [1]. Apple’s own framing is that SpeechAnalyzer is the modern path: a new on-device model, long-form audio support, automatic language management, low latency for real-time use cases, and a modular architecture that supports adding more analysis types over time. SFSpeechRecognizer continues to ship and work; it remains the right tool for apps that depend on its custom-vocabulary feature, which the new framework does not yet offer.
This post compares the new framework against the old one. The frame is “when to migrate” rather than “how to use the new API,” because every team with a working SFSpeechRecognizer integration faces the same triage decision: is the new framework’s modern model and architecture worth the migration cost, or do existing custom-vocabulary investments justify staying?
TL;DR
- SpeechAnalyzer (iOS 26+) is Apple’s modern on-device speech-recognition framework. It coordinates analysis modules configured at init; iOS 26 ships three: SpeechTranscriber (long-form), DictationTranscriber (short-utterance, the SFSpeechRecognizer equivalent), and SpeechDetector (voice activity detection, must pair with a transcriber) [2].
- The new framework is built around long-form audio: lectures, meetings, multi-speaker conversations. It runs entirely on-device, manages languages automatically, and ships with a new proprietary Apple model that is reportedly 2× faster than Whisper Large V3 Turbo on equivalent transcription tasks [3].
- SFSpeechRecognizer continues to ship and work. The legacy framework retains the Custom Vocabulary feature (registering known keywords for higher accuracy on domain-specific terms), which the new framework does not yet offer.
- Migration is per-feature, not all-or-nothing. Apps that need long-form transcription, lower latency, or better distant-audio quality migrate to SpeechAnalyzer. Apps with Custom Vocabulary investments keep SFSpeechRecognizer for those features and add SpeechAnalyzer for new ones.
- The cluster’s Vision framework post covers Apple’s other on-device perception primitive; SpeechAnalyzer extends the same on-device, no-cloud pattern to audio.
The Architecture: Analyzer + Modules
SpeechAnalyzer is not a transcriber by itself. It is a coordinator that manages an audio analysis session and dispatches the audio buffer to one or more modules [2]. Modules are configured at init through the init(modules:) initializer, and analysis starts by feeding an AsyncSequence of audio buffers via start(inputSequence:):
import Speech

// Configure a transcriber module; volatile results enable live partial hypotheses.
let transcriber = SpeechTranscriber(
    locale: .current,
    transcriptionOptions: [],
    reportingOptions: [.volatileResults],
    attributeOptions: []
)

// The analyzer coordinates the session; its modules are fixed at init.
let analyzer = SpeechAnalyzer(modules: [transcriber])
try await analyzer.start(inputSequence: audioInputSequence)

// Consume streaming results; isFinal separates finalized text from volatile partials.
for try await result in transcriber.results {
    if result.isFinal {
        print(result.text)
    }
}
Three modules ship in iOS 26:
SpeechTranscriber. The speech-to-text module designed for long-form audio (lectures, meetings, multi-speaker conversations). Returns streaming results with timing per token, confidence scores, and a results AsyncSequence the app consumes through for try await. Each result has an isFinal flag separating volatile partial hypotheses from finalized text.
DictationTranscriber. The drop-in equivalent for the older SFSpeechRecognizer use case: short-utterance transcription with the same on-device model SFSpeechRecognizer uses. Apps migrating from SFSpeechRecognizer for short queries reach for DictationTranscriber; apps adopting the framework for long-form recording reach for SpeechTranscriber. The split matters because SpeechTranscriber and DictationTranscriber use different language coverage and different model paths.
SpeechDetector. Voice activity detection. Reports events when speech starts and ends within the audio stream. The detector cannot run alone; it must be paired with one of the transcriber modules in the same SpeechAnalyzer instance. Apps use it to gate transcription compute (don’t transcribe silence) or to drive UI affordances (“speak now” indicators).
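To make the pairing constraint concrete, a minimal sketch of composing SpeechDetector with SpeechTranscriber in one analyzer; the parameterless SpeechDetector initializer and the event-consumption details are assumptions, not confirmed API.

import Speech

// A transcriber configured as before; the detector rides alongside it.
let transcriber = SpeechTranscriber(
    locale: .current,
    transcriptionOptions: [],
    reportingOptions: [.volatileResults],
    attributeOptions: []
)
// Assumption: SpeechDetector exposes a simple initializer like this one.
let detector = SpeechDetector()

// Both modules attach to the same analyzer; the detector cannot run alone.
let analyzer = SpeechAnalyzer(modules: [detector, transcriber])
try await analyzer.start(inputSequence: audioInputSequence)

// Use detector events to gate UI ("speak now") or skip transcribing silence,
// while the transcriber streams text as in the earlier example.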
The modular architecture is the structural improvement over SFSpeechRecognizer. The old API combines audio session management, language detection, and transcription into a single object; the new API separates concerns, so apps compose the modules they need.
What the New Model Brings
The transcription model behind SpeechTranscriber is a new on-device model Apple developed specifically for this framework [4]. The improvements Apple highlights at WWDC 2025:
Long-form audio quality. The model is trained for sustained transcription over minutes or hours, not just short queries. Lectures, podcasts, multi-speaker meetings, and dictation sessions transcribe with accuracy that Apple positions against Whisper-class models. MacStories’ independent test measured roughly 2.2× faster than MacWhisper’s Large V3 Turbo build on equivalent transcription tasks [3].
Distant audio handling. Microphones placed across a room, conference-table audio with multiple speakers, audio with environmental noise. The model is trained for these conditions; SFSpeechRecognizer’s older model handles them less gracefully.
Real-time low-latency operation. The streaming results from SpeechTranscriber arrive faster than the old framework’s SFSpeechRecognitionTask.shouldReportPartialResults callbacks. Apps that surface live transcription (captioning, voice-driven UIs, dictation) get smoother updates.
Automatic language management. SpeechTranscriber(locale:) accepts a starting locale, but the model can adapt to mid-stream language switches. The old framework requires the developer to instantiate per-language recognizers and switch between them.
No app-size cost. The model ships with the OS, not with the app. Apps adopting SpeechAnalyzer do not bundle additional model weights. The contrast with shipping a Whisper-class model in the app bundle is significant: a competitive on-device transcription stack costs zero bundle bytes.
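A hedged sketch of what locale and asset handling look like in practice, following the WWDC 2025 session code; the supportedLocales, installedLocales, and AssetInventory calls are assumptions to the extent the shipping API differs.

import Speech

// Which locales the new model supports, and which are already installed on this device.
// Assumption: these async static properties match the WWDC 2025 session code.
let supported = await SpeechTranscriber.supportedLocales
let installed = await SpeechTranscriber.installedLocales
print("Supported: \(supported.count), installed: \(installed.count)")

let transcriber = SpeechTranscriber(
    locale: .current,
    transcriptionOptions: [],
    reportingOptions: [],
    attributeOptions: []
)

// If the locale's model assets are missing, ask the OS to download them.
// The weights live in the OS, never in the app bundle.
if let request = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
    try await request.downloadAndInstall()
}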
What the Old Framework Still Offers
SFSpeechRecognizer continues to ship and work in iOS 26. Three reasons an app might keep using it:
Custom Vocabulary. SFSpeechRecognitionRequest.contextualStrings lets the app register a list of known keywords (proper nouns, technical terms, product names) that the model will be more likely to recognize accurately. The feature substantially improves accuracy for domain-specific apps (medical dictation with drug names, legal apps with case citations, engineering apps with part numbers). SpeechAnalyzer does not yet offer Custom Vocabulary; for apps depending on this feature, migration would be a regression in accuracy.
Older OS support. SFSpeechRecognizer is available on iOS 10+; SpeechAnalyzer requires iOS 26+. Apps targeting iOS 18 and earlier need the legacy framework.
Existing integration that works. Apps with stable, audited, performant SFSpeechRecognizer integrations have no urgent reason to migrate. The new framework’s improvements matter most for new use cases (long-form transcription, distant audio, multi-speaker conversations); apps that handle short voice queries through the legacy API may not gain enough to justify the migration.
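For the Custom Vocabulary point above, a minimal sketch of the legacy contextualStrings path; the drug names are illustrative placeholders for a domain-specific term list.

import Speech

// Bias recognition toward domain-specific terms the default model would otherwise miss.
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true
request.requiresOnDeviceRecognition = true
request.contextualStrings = ["metoprolol", "levothyroxine", "amoxicillin"]

let recognizer = SFSpeechRecognizer(locale: .current)!
let task = recognizer.recognitionTask(with: request) { result, error in
    guard let result else { return }
    print(result.bestTranscription.formattedString)
}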
When To Migrate
Three migration triggers worth naming:
The app processes long-form audio. A meeting recorder, a lecture transcription app, a podcast-to-text tool. The new model’s training on sustained audio is the right fit; the old model degrades over long sessions. Migrate first.
The app needs distant or noisy audio. Conference-room transcription, interview recording with a single distant mic, audio captured in environments with ambient noise. The new model handles these conditions noticeably better.
The app surfaces live transcription UI. Caption overlays, dictation interfaces, voice-driven assistive UIs. The lower latency of streaming results from SpeechTranscriber makes the UI feel more responsive.
Cases that don’t necessarily warrant migration:
- Short voice queries with custom vocabulary (prescription dictation, legal terminology). Keep SFSpeechRecognizer for the vocabulary feature; reach for SpeechAnalyzer if Apple adds vocabulary support in a future release.
- Apps that need to support iOS 18 and earlier. SpeechAnalyzer is iOS 26-only; the codebase needs the legacy framework for older targets regardless.
The Side-By-Side Pattern
For apps that both target older OS versions and want the new framework’s quality on iOS 26+, the side-by-side pattern is the right approach:
import Speech

if #available(iOS 26.0, *) {
    let transcriber = DictationTranscriber(locale: .current)
    let analyzer = SpeechAnalyzer(modules: [transcriber])
    try await analyzer.start(inputSequence: audioInputSequence)
    for try await result in transcriber.results {
        if result.isFinal {
            handleTranscription(result.text)
        }
    }
} else {
    let recognizer = SFSpeechRecognizer(locale: .current)!
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = true
    request.requiresOnDeviceRecognition = true
    let task = recognizer.recognitionTask(with: request) { result, error in
        guard let result else { return }
        handleTranscription(result.bestTranscription.formattedString)
    }
}
DictationTranscriber is the right choice for the iOS 26+ branch because the migration target is the SFSpeechRecognizer use case (short queries with the same dictation model). Apps targeting long-form audio swap DictationTranscriber for SpeechTranscriber in the iOS 26 branch.
The two frameworks coexist; the runtime check picks the right one based on availability. Neither blocks the other; the app’s transcription pipeline adapts.
Privacy and the Speech Authorization Surface
Both frameworks share the same Speech framework permission (NSSpeechRecognitionUsageDescription in Info.plist) and the same user-facing authorization flow [5]. The privacy story is the same: speech transcription can run entirely on-device for both frameworks. SpeechAnalyzer is on-device-only by design; SFSpeechRecognizer stays on-device only when the requiresOnDeviceRecognition flag is set to true on the SFSpeechRecognitionRequest itself, otherwise it can fall back to a server-side path.
The implication: apps using SpeechAnalyzer should still handle the Speech authorization correctly. The user prompt, the Settings entry, and the App Store privacy nutrition label all use the same authorization mechanism.
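A minimal sketch of that shared authorization check; it is the same call whether the transcription then runs through SpeechAnalyzer or SFSpeechRecognizer, and the two handler functions are hypothetical app hooks.

import Speech

// Requires NSSpeechRecognitionUsageDescription in Info.plist.
// Live microphone capture additionally needs NSMicrophoneUsageDescription.
SFSpeechRecognizer.requestAuthorization { status in
    switch status {
    case .authorized:
        startTranscription()           // hypothetical app hook
    case .denied, .restricted, .notDetermined:
        showSpeechPermissionFallback() // hypothetical app hook
    @unknown default:
        break
    }
}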
For apps that stream microphone audio to the analyzer, the standard AVAudioSession configuration applies. The cluster’s Privacy Manifest post covers the manifest entries for Speech-using apps; both frameworks fall under the same privacy declarations.
The Agent-Workflow Connection
SpeechAnalyzer’s on-device model and structured output pair cleanly with two cluster patterns:
Foundation Models for in-app reasoning. A pipeline that transcribes audio with SpeechTranscriber, then summarizes the transcript with the on-device LLM (covered in Foundation Models on-device LLM), runs entirely on-device. Total network calls: zero. Total third-party data exposure: zero.
App Intents for voice-driven actions. An AppIntent that takes a transcript as input can be invoked through Vocal Shortcuts (covered in Accessibility as platform) or through Apple Intelligence’s action surface. The intent’s perform method runs SpeechAnalyzer to transcribe the input, then dispatches to the app’s logic. The whole flow is private and local.
The pattern: the new Speech framework completes the on-device perception triangle (Vision for images, Foundation Models for language reasoning, Speech for audio) that makes fully-local AI features practical for iOS apps.
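A sketch of the first pattern, transcribe-then-summarize, assuming the FoundationModels LanguageModelSession API and the same audioInputSequence as in the earlier examples; treat the glue code as illustrative rather than the post’s tested pipeline.

import Speech
import FoundationModels

// 1. Transcribe on-device.
let transcriber = SpeechTranscriber(
    locale: .current,
    transcriptionOptions: [],
    reportingOptions: [],
    attributeOptions: []
)
let analyzer = SpeechAnalyzer(modules: [transcriber])
try await analyzer.start(inputSequence: audioInputSequence)

var transcript = ""
for try await result in transcriber.results where result.isFinal {
    transcript += "\(result.text)\n"
}

// 2. Summarize with the on-device LLM. No network calls anywhere in the pipeline.
let session = LanguageModelSession()
let summary = try await session.respond(to: "Summarize this meeting transcript:\n\(transcript)")
print(summary.content)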
What This Pattern Means For iOS 26+ Apps
Three takeaways.
- Default to SpeechAnalyzer for new code. The modern model, modular architecture, and improved long-form / distant / live performance make it the right starting point. The legacy framework is the fallback when older OS support or Custom Vocabulary is required.
- Keep SFSpeechRecognizer for vocabulary-dependent apps. Until Apple adds Custom Vocabulary to the new framework, apps that depend on contextualStrings for accuracy on domain-specific terms keep the old API. The two frameworks coexist; mixing them per-feature is the right pattern.
- The on-device privacy story extends from Vision to Speech. Apps that built around Vision’s on-device CV now have the equivalent for audio. Combined with Foundation Models for reasoning, the full perception-to-language pipeline can run locally without third-party data exposure.
The full Apple Ecosystem cluster: typed App Intents; MCP servers; the routing question; Foundation Models; the runtime vs tooling LLM distinction; three surfaces; the single source of truth pattern; Two MCP Servers; hooks for Apple development; Live Activities; the watchOS runtime; SwiftUI internals; RealityKit’s spatial mental model; SwiftData schema discipline; Liquid Glass patterns; multi-platform shipping; the platform matrix; Vision framework; Symbol Effects; Core ML inference; Writing Tools API; Swift Testing; Privacy Manifest; Accessibility as platform; SF Pro typography; visionOS spatial patterns; what I refuse to write about. The hub is at the Apple Ecosystem Series. For broader iOS-with-AI-agents context, see the iOS Agent Development guide.
FAQ
Is SFSpeechRecognizer deprecated?
Apple has not formally deprecated SFSpeechRecognizer. It continues to ship in iOS 26 and remains supported. The framing in WWDC 2025 is that SpeechAnalyzer is the modern, recommended path for new code; the legacy framework is the right tool for specific cases (Custom Vocabulary, older OS support).
Can I use SpeechAnalyzer with pre-recorded audio files?
Yes. SpeechAnalyzer.start(inputSequence:) accepts an AsyncSequence of audio buffers. Apps wrap any audio source (microphone via AVAudioEngine, pre-recorded file URLs, AVAsset instances) into an AsyncSequence adapter and feed it to the analyzer. The transcription stream produces the same for try await result in transcriber.results consumption regardless of input source.
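One way to build that adapter for live microphone audio, sketched under the assumption that AnalyzerInput(buffer:) is the element type the analyzer consumes (as in Apple’s sample code); format conversion and error handling are omitted, and analyzer is a SpeechAnalyzer configured as in the earlier examples.

import Speech
import AVFoundation

// Bridge an AVAudioEngine tap into an AsyncStream the analyzer can consume.
let audioEngine = AVAudioEngine()
let (inputSequence, continuation) = AsyncStream.makeStream(of: AnalyzerInput.self)

let inputNode = audioEngine.inputNode
let format = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
    continuation.yield(AnalyzerInput(buffer: buffer))
}

try audioEngine.start()
try await analyzer.start(inputSequence: inputSequence)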
What happens to Custom Vocabulary if I migrate?
Custom Vocabulary is not currently supported by SpeechAnalyzer / SpeechTranscriber. Apps that depend on it for domain-specific accuracy should not migrate that path until Apple adds the feature. A hybrid approach (using SpeechAnalyzer for general transcription and SFSpeechRecognizer with contextualStrings for vocabulary-sensitive transcription) works in iOS 26.
Can I run SpeechAnalyzer server-side?
No. SpeechAnalyzer is an on-device-only framework. It does not have a server-side path. For server-side transcription, the right tools are cloud APIs (OpenAI Whisper API, Google Cloud Speech-to-Text, AWS Transcribe) or self-hosted models. The Apple framework’s value is precisely the on-device privacy and zero-cost-per-call story.
How does language detection work?
SpeechTranscriber(locale:) accepts an initial locale. The model can adapt to mid-stream language switches automatically. For apps where the language is known up front (a localized app’s dictation feature), specify it explicitly. For multilingual contexts (a meeting transcriber where speakers may switch), the automatic management is the right behavior.
Where does this fit with the cluster’s other on-device ML posts?
SpeechAnalyzer is the third pillar of the on-device perception stack: Vision (covered in Vision Framework) handles images, Speech handles audio, and Core ML (covered in Core ML On-Device Inference) is the engine underneath both. Foundation Models (covered in Foundation Models on-device LLM) handles language reasoning. Together they form a complete on-device AI pipeline that does not require network calls.
References
1. Apple Developer: Bring advanced speech-to-text to your app with SpeechAnalyzer (WWDC 2025 session 277). Introduction of the SpeechAnalyzer framework, modular architecture, and the new on-device transcription model.
2. Apple Developer Documentation: SpeechAnalyzer and SpeechTranscriber. The framework reference covering the analyzer-and-modules architecture.
3. MacStories: Hands-On: How Apple’s New Speech APIs Outpace Whisper for Lightning-Fast Transcription. Independent benchmark of the new model against Whisper Large V3 Turbo, reporting roughly 2× faster transcription on Apple silicon hardware.
4. Apple Developer Documentation: Bringing advanced speech-to-text capabilities to your app. Apple’s official adoption guide covering streaming results and multi-locale support.
5. Apple Developer Documentation: SFSpeechRecognizer.requestAuthorization(_:). The shared authorization surface for both speech frameworks.