Accessibility As Platform: Personal Voice, Live Speech, Eye Tracking, Music Haptics
Personal Voice (iOS 17), Live Speech (iOS 17), Eye Tracking (iOS 18), Music Haptics (iOS 18), Vocal Shortcuts (iOS 18). The arc of Apple’s recent accessibility releases is consistent: features that used to require third-party apps, dedicated hardware, or specialized integrations are becoming platform capabilities the OS handles. The result is fewer apps to install for the user and a different participation model for the developer: instead of building the feature, the developer either opts in to a system surface (Personal Voice authorization) or follows the standards every app should already meet (proper accessibility labels and hit targets for Eye Tracking).
This post walks through the developer surface of each feature. The frame is “what does my app have to do to participate” rather than “how do I implement this feature.” Apple has built the feature; the question is whether the app is ready to use it.
TL;DR
- Personal Voice (iOS 17+) lets a user record 15 minutes of audio to create an on-device synthesized voice for AAC and assistive communication apps. Apps integrate via AVSpeechSynthesizer.requestPersonalVoiceAuthorization() and check voiceTraits for .isPersonalVoice [1].
- Live Speech (iOS 17+) is a system feature: the user types text and the device speaks it (optionally with their Personal Voice). Apps do not integrate Live Speech directly; the feature works at the OS level across calls, FaceTime, and in-person communication.
- Eye Tracking (iOS 18+) controls the device via gaze plus Dwell Control through the front camera. Apps participate by following accessibility standards (proper accessibility labels, hit-target sizing, focus order); no dedicated API is required for most apps [2].
- Music Haptics (iOS 18+) translates music playback into Taptic Engine vibrations synchronized to audio via the MAMusicHapticsManager API in MediaAccessibility.framework. Any music app can integrate by setting MusicHapticsSupported in Info.plist, becoming the active Now Playing app, and supplying an ISRC [3].
- Vocal Shortcuts (iOS 18+) let users assign custom phrases to trigger Siri Shortcuts, including third-party AppIntent actions. The feature compounds with App Intents adoption (covered in App Intents Are Apple’s New API to Your App).
Personal Voice: The Authorization Pattern
Personal Voice is the accessibility feature with the most direct developer surface [1]. The user opts in through Settings > Accessibility > Personal Voice, records about 15 minutes of audio reading randomized prompts, and the device generates a synthesized voice locally using on-device machine learning. The voice is private to the user; it does not leave the device unless the user explicitly shares it with iCloud-paired devices.
For an app to use the user’s Personal Voice in AVSpeechSynthesizer, it must:
- Request authorization via AVSpeechSynthesizer.requestPersonalVoiceAuthorization(completionHandler:).
- Wait for the user to grant permission through the system prompt.
- On approval, query AVSpeechSynthesisVoice.speechVoices() and filter for voices whose voiceTraits contain .isPersonalVoice.
- Use the resulting AVSpeechSynthesisVoice like any other voice in an AVSpeechUtterance.
import AVFoundation

// Keep a strong reference; a synthesizer that is deallocated stops speaking mid-utterance.
let synthesizer = AVSpeechSynthesizer()

AVSpeechSynthesizer.requestPersonalVoiceAuthorization { status in
    guard status == .authorized else { return }
    let personalVoices = AVSpeechSynthesisVoice.speechVoices().filter { voice in
        voice.voiceTraits.contains(.isPersonalVoice)
    }
    if let voice = personalVoices.first {
        let utterance = AVSpeechUtterance(string: "Hello.")
        utterance.voice = voice
        synthesizer.speak(utterance)
    }
}
The authorization is sensitive. Apple’s guidance is that Personal Voice should primarily serve augmentative and alternative communication (AAC) apps and similar assistive contexts. A general-purpose voice-over app requesting Personal Voice authorization is likely to be denied by users and may face App Store review scrutiny.
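One way to keep the request scoped to an assistive flow is to check the current status before prompting. A minimal sketch, assuming the iOS 17 AVSpeechSynthesizer.personalVoiceAuthorizationStatus class property; the surrounding function name is illustrative:

import AVFoundation

func enablePersonalVoiceIfAppropriate() {
    switch AVSpeechSynthesizer.personalVoiceAuthorizationStatus {
    case .authorized:
        break // already granted; query speechVoices() as shown above
    case .notDetermined:
        // Only reach this from an explicitly assistive flow, not at app launch.
        AVSpeechSynthesizer.requestPersonalVoiceAuthorization { status in
            print("Personal Voice authorization: \(status)")
        }
    case .denied, .unsupported:
        break // fall back to system voices
    @unknown default:
        break
    }
}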
The on-device-first architecture matters here. The user’s voice training data and the resulting voice model never leave the device’s secure enclave area unless the user explicitly opts into iCloud sharing. App Store privacy nutrition labels for apps using Personal Voice should reflect zero data collection, since the synthesis happens locally and the audio output goes to the speaker, not to the network.
Live Speech: The Zero-Integration System Feature
Live Speech is the consumer-facing pairing for Personal Voice [4]. The user types text, the device speaks it, optionally using their Personal Voice. Live Speech works during phone calls, FaceTime calls, Mac SharePlay, and in-person conversations through the device speaker.
Apps do not integrate Live Speech directly. The feature operates at the OS level, intercepting typed text from the system Live Speech UI and routing it through the audio stack. From an app’s perspective, Live Speech is invisible: the audio stream that comes through the call (or that plays from the device speaker for in-person use) sounds like the user, but no app code is involved.
The implication for app developers: if your app handles voice (a calling app, a video chat app, an accessibility helper), the app’s audio pipeline must respect the system audio routing so that Live Speech can output through the same channel. Apps that fight the audio session (claiming exclusive control without consideration for system-level overlay sounds) break Live Speech.
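There is no Live Speech-specific API to call; the cooperation happens through ordinary audio session hygiene. As a rough sketch, a calling app might configure its session like this (the category, mode, and options are typical VoIP choices used here as assumptions, not requirements from Apple’s Live Speech documentation):

import AVFoundation

// A cooperative audio session for a calling app: use the shared AVAudioSession
// route rather than claiming exclusive control, so system-level audio
// (including Live Speech output) can travel through the same channel.
func configureCallAudioSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(
        .playAndRecord,
        mode: .voiceChat,
        options: [.allowBluetooth, .defaultToSpeaker]
    )
    try session.setActive(true)
}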
Eye Tracking: The Standards-Following Feature
Eye Tracking, introduced in iOS 18, lets users control iPhone and iPad through gaze direction plus Dwell Control [2]. The user calibrates the front camera in a few seconds, then navigates the UI by looking at elements; holding gaze on an element for the configured Dwell timeout activates it (tap, swipe, or other gestures, configurable in Switch Control).
The implementation is on-device. The front camera processes gaze data through on-device machine learning; the data does not leave the device. No additional hardware is required.
For most apps, supporting Eye Tracking does not require dedicated code. The feature works with any UI that follows standard accessibility conventions:
- Proper hit targets. Apple Human Interface Guidelines specify minimum 44pt by 44pt hit targets for tappable elements. Eye Tracking honors these. Buttons smaller than the minimum are harder to dwell-target accurately.
- Accessibility labels. Every interactive element should have a useful accessibilityLabel, set with the .accessibilityLabel(_:) modifier in SwiftUI or the accessibilityLabel property in UIKit. Eye Tracking surfaces the label as a tooltip-equivalent when the user dwells near the element (see the SwiftUI sketch after this list).
- Logical focus order. The Tab key on Mac and the focus engine on tvOS surface the same focus order Eye Tracking uses to skip between elements. Apps that use SwiftUI’s standard layout primitives get this for free; apps that override focus behavior need to verify.
- Dwell-friendly modal patterns. A modal that auto-dismisses on outside tap can frustrate Eye Tracking users whose dwell point may briefly leave the modal area. Apps with modal UI should provide explicit dismiss buttons.
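A minimal SwiftUI sketch of the first two points (the view and state names are illustrative): the same modifiers that serve VoiceOver also make a control easier to dwell-target.

import SwiftUI

struct PlaybackControls: View {
    @State private var isPlaying = false

    var body: some View {
        Button {
            isPlaying.toggle()
        } label: {
            Image(systemName: isPlaying ? "pause.fill" : "play.fill")
                // HIG minimum hit target; smaller buttons are harder to dwell on.
                .frame(minWidth: 44, minHeight: 44)
        }
        // Surfaced by VoiceOver and by dwell-based control.
        .accessibilityLabel(isPlaying ? "Pause" : "Play")
    }
}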
There is no documented per-view opt-out API for Eye Tracking, even for sensitive content or complex gesture-based games. The feature works on any visible content, and the app’s responsibility is to ensure the standard accessibility surface is correct.
The post on Three Surfaces of an iOS App covers the broader pattern: the visible UI is one surface, App Intents are another, accessibility is the third. Eye Tracking participates in the visible UI surface; getting that surface right is what enables Eye Tracking, Switch Control, VoiceOver, and Voice Control simultaneously.
Music Haptics: The Audio-To-Haptic Bridge
Music Haptics translates music playback into Taptic Engine vibrations synchronized to the audio [3]. The feature is opt-in per-user (Settings > Accessibility > Music Haptics) and works for any music app that integrates the API correctly, not just Apple Music.
The developer surface lives in MediaAccessibility.framework’s MAMusicHapticsManager (iOS 18+). A music app integrates Music Haptics through three steps:
- Declare support in Info.plist. Add the MusicHapticsSupported key with value YES. The system uses this to know whether the app participates in Music Haptics rendering.
- Become the active Now Playing app. The app must publish playback metadata through MPNowPlayingInfoCenter.default().nowPlayingInfo and own the now-playing audio session. The system needs a known active Now Playing source to drive haptic synthesis.
- Provide an ISRC for the playing track. The MPNowPlayingInfoPropertyInternationalStandardRecordingCode key (the International Standard Recording Code) lets the system look up the haptic track that pairs with the audio. Apple maintains a haptic asset library keyed by ISRC; tracks without an ISRC do not get haptics, but the rest of the now-playing integration still works.
import MediaPlayer
import MediaAccessibility
// Info.plist: MusicHapticsSupported = YES (boolean)
let info: [String: Any] = [
MPMediaItemPropertyTitle: track.title,
MPMediaItemPropertyArtist: track.artist,
MPNowPlayingInfoPropertyInternationalStandardRecordingCode: track.isrc,
// ... other now-playing properties
]
MPNowPlayingInfoCenter.default().nowPlayingInfo = info
The integration applies to any music app: a streaming client built on AVAudioEngine, a DJ app with custom decoders, a music-learning app with sample playback. The constraint is the ISRC and the active Now Playing role, not the underlying audio API. Apps that don’t have ISRCs (user-uploaded music with no metadata, generative music) simply don’t get haptics; the rest of the playback integration is unaffected.
For apps in adjacent spaces (rhythm games, music visualizations, sound-effects engines), Music Haptics is not designed for their audio. Those apps should reach for CHHapticEngine directly, with hand-authored haptic patterns synchronized to their own audio source.
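For completeness, a minimal CoreHaptics sketch of that approach (the function and parameter values are illustrative; in practice the timing would come from the app’s own audio clock):

import CoreHaptics

func playBeatHaptic() throws {
    let engine = try CHHapticEngine()
    try engine.start()

    // One hand-authored transient "tap"; real patterns would be authored per beat.
    let tap = CHHapticEvent(
        eventType: .hapticTransient,
        parameters: [
            CHHapticEventParameter(parameterID: .hapticIntensity, value: 1.0),
            CHHapticEventParameter(parameterID: .hapticSharpness, value: 0.6)
        ],
        relativeTime: 0
    )
    let pattern = try CHHapticPattern(events: [tap], parameters: [])
    let player = try engine.makePlayer(with: pattern)
    try player.start(atTime: CHHapticTimeImmediate)
}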
Vocal Shortcuts: Where Accessibility Meets App Intents
Vocal Shortcuts let users assign custom voice phrases to Siri Shortcuts, including those backed by third-party AppIntent types [5]. A user can configure “Marker” to trigger an AddTodoIntent registered by a to-do app; saying “Marker” anywhere, without invoking Siri’s wake phrase, triggers the intent.
The integration uses the App Intents framework the cluster has covered extensively, with one structural piece that’s easy to miss: the app must declare an AppShortcutsProvider that exposes AppShortcut entries with explicit phrases. A bare AppIntent exists in the system but is only invokable through the Shortcuts editor, where the user manually assembles a Shortcut. An AppShortcutsProvider registers system-visible shortcuts the user can immediately assign to a Vocal Shortcut, the Action Button, Siri, or Spotlight.
struct TodoShortcuts: AppShortcutsProvider {
static var appShortcuts: [AppShortcut] {
AppShortcut(
intent: AddTodoIntent(),
phrases: [
"Add a todo in \(.applicationName)",
"\(.applicationName) marker"
],
shortTitle: "Add Todo",
systemImageName: "checkmark.circle"
)
}
}
The phrases array is what the system surfaces to Siri and to Vocal Shortcuts. With the provider in place, the App Intent is immediately eligible for voice activation. Without it, the intent works through manual Shortcuts setup, but the path is longer and many users never reach it.
The pattern compounds with App Intents and App Intents vs MCP Tools. An App Intent that earns its place in the user’s Apple Intelligence surface, paired with an AppShortcutsProvider that declares how the user invokes it, also earns its place as a Vocal Shortcut target. The cluster’s argument that App Intents are the cross-system contract for “what an app can do” applies here: Vocal Shortcuts are another consumer of that same contract.
The Cross-Cutting Pattern: Standards Are The Integration
The accessibility features above share a structural property: each one is built on top of standards apps should already meet, with a small opt-in API surface for cases where the app must explicitly cooperate (Personal Voice authorization, Music Haptics metadata through MPNowPlayingInfoCenter).
The implication for development teams: accessibility work is not a separate workstream done after the app ships. The app’s accessibility labels, hit targets, focus order, and standard system API usage are what make Eye Tracking work, Live Speech route correctly, Music Haptics activate, and Vocal Shortcuts surface the right intents. Apps that treat accessibility as a checkbox at the end of the cycle ship features that work for VoiceOver but not for Eye Tracking, or that route audio in ways Live Speech can’t follow.
The cluster’s What I Refuse to Write About post argues for refusal as a positioning move. Accessibility refusals are the inverse: not “I refuse to add this,” but “I refuse to ship something that fails the standards every iOS app should already meet.”
When Apps Need Custom Accessibility Code
Three cases where the standards-following pattern doesn’t cover everything:
Custom drawing surfaces. A drawing app, a chart, or a custom-rendered game UI bypasses the SwiftUI/UIKit accessibility tree. The app must build its own accessibility tree using UIAccessibilityElement, UIAccessibilityCustomAction, and explicit accessibility properties for each meaningful element. Eye Tracking, VoiceOver, and Switch Control all rely on that tree being populated.
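A minimal UIKit sketch of that tree-building work, assuming a hypothetical custom-drawn bar chart:

import UIKit

final class BarChartView: UIView {
    // (label, value, frame in the view's coordinate space) for each drawn bar.
    var bars: [(label: String, value: Double, frame: CGRect)] = [] {
        didSet { rebuildAccessibilityElements() }
    }

    private func rebuildAccessibilityElements() {
        // Expose each bar as its own element so VoiceOver, Switch Control,
        // and Eye Tracking have something to land on.
        accessibilityElements = bars.map { bar in
            let element = UIAccessibilityElement(accessibilityContainer: self)
            element.accessibilityLabel = bar.label
            element.accessibilityValue = String(bar.value)
            element.accessibilityFrameInContainerSpace = bar.frame
            element.accessibilityTraits = .staticText
            return element
        }
    }
}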
Real-time gestural interactions. A game with continuous gesture input (drawing, drag-to-aim) does not map naturally to dwell-based or switch-based input. The right approach is to provide alternative control schemes (button-based input as an option) rather than to fight the accessibility system.
Accessibility-specific features. AAC apps, voice-augmentation apps, sign-language interpretation apps. These apps are accessibility products in their own right and integrate deeply with system frameworks (Personal Voice, Speech framework, Vision framework for sign language detection). The integration work is real and intentional.
What This Pattern Means For iOS 26+ Apps
Three takeaways.
- Accessibility participation is mostly standards-following, not feature-building. Apple has been moving accessibility into the platform layer. The work is making sure your app meets the standards Eye Tracking, Switch Control, VoiceOver, and Voice Control all rely on: proper labels, hit targets, focus order, system audio routing.
- Personal Voice integration is sensitive. If your app has a real AAC use case (assistive communication, voice augmentation, accessibility tooling), Personal Voice authorization is the right integration. For general-purpose apps, requesting Personal Voice authorization is more likely to confuse users than to help them.
- App Intents are accessibility infrastructure. A clean AppIntent is automatically eligible for Vocal Shortcuts, gets an accessible UI surface through Shortcuts, and integrates with the system’s voice-driven and switch-driven control modes. The cluster’s argument for App Intents adoption applies to accessibility too.
The full Apple Ecosystem cluster: typed App Intents; MCP servers; the routing question; Foundation Models; the runtime vs tooling LLM distinction; three surfaces; the single source of truth pattern; Two MCP Servers; hooks for Apple development; Live Activities; the watchOS runtime; SwiftUI internals; RealityKit’s spatial mental model; SwiftData schema discipline; Liquid Glass patterns; multi-platform shipping; the platform matrix; Vision framework; Symbol Effects; Core ML inference; Writing Tools API; Swift Testing; Privacy Manifest; what I refuse to write about. The hub is at the Apple Ecosystem Series. For broader iOS-with-AI-agents context, see the iOS Agent Development guide.
FAQ
Do I need to write any code to support Eye Tracking?
For most apps, no. Eye Tracking works automatically with any UI that follows standard accessibility conventions: proper hit targets (44pt minimum), useful accessibility labels, logical focus order, and standard system controls. Apps that draw their own UI (custom views, games, charts) need to populate the accessibility tree explicitly using UIAccessibilityElement or SwiftUI’s accessibility modifiers; that work is also what makes the app work for VoiceOver and Switch Control.
Can I use Personal Voice in a general-purpose voice-over app?
The system permits it via AVSpeechSynthesizer.requestPersonalVoiceAuthorization(), but Apple’s guidance and the App Store review process emphasize Personal Voice for assistive contexts (AAC, augmentative and alternative communication). General-purpose voice-over apps requesting Personal Voice authorization face two challenges: users are unlikely to grant authorization, and review may push back on the request as inappropriate use. If your use case is genuinely assistive, the integration is right; if it’s general-purpose narration, system voices are the right tool.
What’s the difference between Live Speech and Personal Voice?
Personal Voice is the on-device synthesized voice that sounds like the user. Live Speech is the system feature that lets the user type and have the device speak (using either a system voice or their Personal Voice). They are complementary: Personal Voice provides the voice, Live Speech provides the typing-to-speech UI. Apps integrate Personal Voice through AVSpeechSynthesizer; Live Speech is invisible to apps and operates at the OS level.
How do I add Music Haptics to a music app that uses AVAudioEngine?
You can. Music Haptics is not scoped to a specific playback API. The integration is: add MusicHapticsSupported = YES to Info.plist, publish the playing track’s metadata through MPNowPlayingInfoCenter.default().nowPlayingInfo (so the system recognizes your app as the active Now Playing source), and include MPNowPlayingInfoPropertyInternationalStandardRecordingCode with the track’s ISRC. The system handles haptic synthesis from there. Tracks without ISRCs do not get haptics, but the rest of the now-playing integration works normally.
What’s the App Intent design that gives the best Vocal Shortcuts experience?
Four principles. First, declare an AppShortcutsProvider for the app and register AppShortcut entries for the intents you want voice-accessible. Without the provider, the intent only reaches Vocal Shortcuts via manual Shortcuts editing. Second, the title and shortTitle should be short verb phrases (“Add Todo,” “Start Timer”) rather than descriptions. Third, parameters should be optional or have defaults so the user can invoke the intent without specifying every field. Fourth, the description should be a single clear sentence explaining the intent’s effect; this surfaces as context when the user picks a phrase to assign.
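A sketch of an intent shaped along those lines, using the hypothetical AddTodoIntent from earlier in the post:

import AppIntents

struct AddTodoIntent: AppIntent {
    // Short verb phrase; surfaced in Shortcuts and when assigning a Vocal Shortcut.
    static var title: LocalizedStringResource = "Add Todo"
    static var description = IntentDescription("Adds a todo to your inbox.")

    // Optional, so the intent can run without the user specifying every field.
    @Parameter(title: "Title")
    var todoTitle: String?

    func perform() async throws -> some IntentResult & ProvidesDialog {
        let title = todoTitle ?? "New todo"
        // ... persist the todo here ...
        return .result(dialog: "Added \(title).")
    }
}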
References
1. Apple Developer: Extend Speech Synthesis with personal and custom voices (WWDC 2023 session 10033). Introduces requestPersonalVoiceAuthorization and the .isPersonalVoice voice trait.
2. Apple Newsroom: Apple announces new accessibility features, including Eye Tracking. The iOS 18 accessibility feature announcement covering Eye Tracking, Music Haptics, and Vocal Shortcuts.
3. Apple Developer Documentation: MAMusicHapticsManager in MediaAccessibility.framework, the iOS 18+ Music Haptics integration surface. The Info.plist MusicHapticsSupported key, the MPNowPlayingInfoCenter active-source role, and MPNowPlayingInfoPropertyInternationalStandardRecordingCode together enable haptic synthesis for any music app that publishes the right metadata.
4. Apple Support: Use Live Speech on your iPhone, iPad, and Mac. The user-facing Live Speech setup guide; the feature operates at the system level without third-party app integration.
5. Apple Developer Documentation: App Intents. The framework that powers Vocal Shortcuts, Spotlight integration, and Apple Intelligence’s action surface for third-party apps.