Accessibility As Platform: Personal Voice, Live Speech, Eye Tracking, Music Haptics
Personal Voice (iOS 17), Live Speech (iOS 17), Eye Tracking (iOS 18), Music Haptics (iOS 18), Vocal Shortcuts (iOS 18). The arc of Apple’s recent accessibility releases is consistent: features that used to require third-party apps, dedicated hardware, or specialized integrations are becoming platform capabilities the OS handles. The result is fewer apps to install for the user and a different participation model for the developer: instead of building the feature, the developer either opts in to a system surface (Personal Voice authorization) or follows the standards every app should already meet (proper accessibility labels and hit targets for Eye Tracking).
This post walks through the developer surface of each feature. The frame is “what does my app have to do to participate” rather than “how do I implement this feature.” Apple has built the feature; the question is whether the app is ready to use it.
TL;DR
- Personal Voice (iOS 17+) lets a user record 15 minutes of audio to create an on-device synthesized voice for AAC and assistive communication apps. Apps integrate via AVSpeechSynthesizer.requestPersonalVoiceAuthorization() and check voiceTraits for .isPersonalVoice [1].
- Live Speech (iOS 17+) is a system feature: the user types text and the device speaks it (optionally with their Personal Voice). Apps do not integrate Live Speech directly; the feature works at the OS level across calls, FaceTime, and in-person communication.
- Eye Tracking (iOS 18+) controls the device via gaze plus Dwell Control through the front camera. Apps participate by following accessibility standards (proper accessibility labels, hit-target sizing, focus order); no dedicated API is required for most apps [2].
- Music Haptics (iOS 18+) translates music playback into Taptic Engine vibrations synchronized to audio via the MAMusicHapticsManager API in MediaAccessibility.framework. Any music app can integrate by setting MusicHapticsSupported in Info.plist, becoming the active Now Playing app, and supplying an ISRC [3].
- Vocal Shortcuts (iOS 18+) let users assign custom phrases to trigger Siri Shortcuts, including third-party AppIntent actions. The feature compounds with App Intents adoption (covered in App Intents Are Apple’s New API to Your App).
Personal Voice: The Authorization Pattern
Personal Voice is the accessibility feature with the most direct developer surface [1]. The user opts in through Settings > Accessibility > Personal Voice, records about 15 minutes of audio reading randomized prompts, and the device generates a synthesized voice locally using on-device machine learning. The voice is private to the user; it does not leave the device unless the user explicitly shares it with iCloud-paired devices.
For an app to use the user’s Personal Voice in AVSpeechSynthesizer, it must:
- Request authorization via AVSpeechSynthesizer.requestPersonalVoiceAuthorization(completionHandler:).
- Wait for the user to grant permission through the system prompt.
- On approval, query AVSpeechSynthesisVoice.speechVoices() and filter for voices whose voiceTraits contain .isPersonalVoice.
- Use the resulting AVSpeechSynthesisVoice like any other voice in an AVSpeechUtterance.
import AVFoundation

// Keep a strong reference; a synthesizer that is deallocated stops speaking mid-utterance.
let synthesizer = AVSpeechSynthesizer()

AVSpeechSynthesizer.requestPersonalVoiceAuthorization { status in
    guard status == .authorized else { return }
    let personalVoices = AVSpeechSynthesisVoice.speechVoices().filter { voice in
        voice.voiceTraits.contains(.isPersonalVoice)
    }
    if let voice = personalVoices.first {
        let utterance = AVSpeechUtterance(string: "Hello.")
        utterance.voice = voice
        synthesizer.speak(utterance)
    }
}
The authorization is sensitive. Apple’s guidance is that Personal Voice should primarily serve augmentative and alternative communication (AAC) apps and similar assistive contexts. A general-purpose voice-over app requesting Personal Voice authorization is likely to be denied by users and may face App Store review scrutiny.
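One way to keep the request scoped to an assistive flow is to check the current status before prompting. A minimal sketch, assuming the iOS 17 AVSpeechSynthesizer.personalVoiceAuthorizationStatus class property; the surrounding function name is illustrative:

import AVFoundation

func enablePersonalVoiceIfAppropriate() {
    switch AVSpeechSynthesizer.personalVoiceAuthorizationStatus {
    case .authorized:
        break // already granted; query speechVoices() as shown above
    case .notDetermined:
        // Only reach this from an explicitly assistive flow, not at app launch.
        AVSpeechSynthesizer.requestPersonalVoiceAuthorization { status in
            print("Personal Voice authorization: \(status)")
        }
    case .denied, .unsupported:
        break // fall back to system voices
    @unknown default:
        break
    }
}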
The on-device-first architecture matters here. The user’s voice training data and the resulting voice model never leave the device’s secure enclave area unless the user explicitly opts into iCloud sharing. App Store privacy nutrition labels for apps using Personal Voice should reflect zero data collection, since the synthesis happens locally and the audio output goes to the speaker, not to the network.
Live Speech: The Zero-Integration System Feature
Live Speech is the consumer-facing pairing for Personal Voice [4]. The user types text, the device speaks it, optionally using their Personal Voice. Live Speech works during phone calls, FaceTime calls, Mac SharePlay, and in-person conversations through the device speaker.
Apps do not integrate Live Speech directly. The feature operates at the OS level, intercepting typed text from the system Live Speech UI and routing it through the audio stack. From an app’s perspective, Live Speech is invisible: the audio stream that comes through the call (or that plays from the device speaker for in-person use) sounds like the user, but no app code is involved.
The implication for app developers: if your app handles voice (a calling app, a video chat app, an accessibility helper), the app’s audio pipeline must respect the system audio routing so that Live Speech can output through the same channel. Apps that fight the audio session (claiming exclusive control without consideration for system-level overlay sounds) break Live Speech.
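There is no Live Speech-specific API to call; the cooperation happens through ordinary audio session hygiene. As a rough sketch, a calling app might configure its session like this (the category, mode, and options are typical VoIP choices used here as assumptions, not requirements from Apple’s Live Speech documentation):

import AVFoundation

// A cooperative audio session for a calling app: use the shared AVAudioSession
// route rather than claiming exclusive control, so system-level audio
// (including Live Speech output) can travel through the same channel.
func configureCallAudioSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(
        .playAndRecord,
        mode: .voiceChat,
        options: [.allowBluetooth, .defaultToSpeaker]
    )
    try session.setActive(true)
}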
Eye Tracking: The Standards-Following Feature
Eye Tracking, introduced in iOS 18, lets users control iPhone and iPad through gaze direction plus Dwell Control [2]. The user calibrates the front camera in a few seconds, then navigates the UI by looking at elements; holding gaze on an element for the configured Dwell timeout activates it (tap, swipe, or other gestures, configurable in Switch Control).
The implementation is on-device. The front camera processes gaze data through on-device machine learning; the data does not leave the device. No additional hardware is required.
For most apps, supporting Eye Tracking does not require dedicated code. The feature works with any UI that follows standard accessibility conventions:
- Proper hit targets. Apple Human Interface Guidelines specify minimum 44pt by 44pt hit targets for tappable elements. Eye Tracking honors these. Buttons smaller than the minimum are harder to dwell-target accurately.
- Accessibility labels. Every interactive element should have a useful accessibilityLabel, set with the .accessibilityLabel(_:) modifier in SwiftUI or the accessibilityLabel property in UIKit. Eye Tracking surfaces the label as a tooltip-equivalent when the user dwells near the element (see the SwiftUI sketch after this list).
- Logical focus order. The Tab key on Mac and the focus engine on tvOS surface the same focus order Eye Tracking uses to skip between elements. Apps that use SwiftUI’s standard layout primitives get this for free; apps that override focus behavior need to verify.
- Dwell-friendly modal patterns. A modal that auto-dismisses on outside tap can frustrate Eye Tracking users whose dwell point may briefly leave the modal area. Apps with modal UI should provide explicit dismiss buttons.
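A minimal SwiftUI sketch of the first two points (the view and state names are illustrative): the same modifiers that serve VoiceOver also make a control easier to dwell-target.

import SwiftUI

struct PlaybackControls: View {
    @State private var isPlaying = false

    var body: some View {
        Button {
            isPlaying.toggle()
        } label: {
            Image(systemName: isPlaying ? "pause.fill" : "play.fill")
                // HIG minimum hit target; smaller buttons are harder to dwell on.
                .frame(minWidth: 44, minHeight: 44)
        }
        // Surfaced by VoiceOver and by dwell-based control.
        .accessibilityLabel(isPlaying ? "Pause" : "Play")
    }
}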
There is no documented per-view opt-out API for Eye Tracking, even for sensitive content or complex gesture-based games. The feature works on any visible content, and the app’s responsibility is to ensure the standard accessibility surface is correct.
The post on Three Surfaces of an iOS App covers the broader pattern: the visible UI is one surface, App Intents are another, accessibility is the third. Eye Tracking participates in the visible UI surface; getting that surface right is what enables Eye Tracking, Switch Control, VoiceOver, and Voice Control simultaneously.
Music Haptics: The Audio-To-Haptic Bridge
Music Haptics translates music playback into Taptic Engine vibrations synchronized to the audio [3]. The feature is opt-in per-user (Settings > Accessibility > Music Haptics) and works for any music app that integrates the API correctly, not just Apple Music.
The developer surface lives in MediaAccessibility.framework’s MAMusicHapticsManager (iOS 18+). A music app integrates Music Haptics through three steps:
- Declare support in Info.plist. Add the MusicHapticsSupported key with value YES. The system uses this to know whether the app participates in Music Haptics rendering.
- Become the active Now Playing app. The app must publish playback metadata through MPNowPlayingInfoCenter.default().nowPlayingInfo and own the now-playing audio session. The system needs a known active Now Playing source to drive haptic synthesis.
- Provide an ISRC for the playing track. The MPNowPlayingInfoPropertyInternationalStandardRecordingCode key (the International Standard Recording Code) lets the system look up the haptic track that pairs with the audio. Apple maintains a haptic asset library keyed by ISRC; tracks without an ISRC do not get haptics, but the rest of the now-playing integration still works.
import MediaPlayer
import MediaAccessibility
// Info.plist: MusicHapticsSupported = YES (boolean)
let info: [String: Any] = [
MPMediaItemPropertyTitle: track.title,
MPMediaItemPropertyArtist: track.artist,
MPNowPlayingInfoPropertyInternationalStandardRecordingCode: track.isrc,
// ... other now-playing properties
]
MPNowPlayingInfoCenter.default().nowPlayingInfo = info
The integration applies to any music app: a streaming client built on AVAudioEngine, a DJ app with custom decoders, a music-learning app with sample playback. The constraint is the ISRC and the active Now Playing role, not the underlying audio API. Apps that don’t have ISRCs (user-uploaded music with no metadata, generative music) simply don’t get haptics; the rest of the playback integration is unaffected.
For apps in adjacent spaces (rhythm games, music visualizations, sound-effects engines), Music Haptics is not designed for their audio. Those apps should reach for CHHapticEngine directly, with hand-authored haptic patterns synchronized to their own audio source.
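For completeness, a minimal CoreHaptics sketch of that approach (the function and parameter values are illustrative; in practice the timing would come from the app’s own audio clock):

import CoreHaptics

func playBeatHaptic() throws {
    let engine = try CHHapticEngine()
    try engine.start()

    // One hand-authored transient "tap"; real patterns would be authored per beat.
    let tap = CHHapticEvent(
        eventType: .hapticTransient,
        parameters: [
            CHHapticEventParameter(parameterID: .hapticIntensity, value: 1.0),
            CHHapticEventParameter(parameterID: .hapticSharpness, value: 0.6)
        ],
        relativeTime: 0
    )
    let pattern = try CHHapticPattern(events: [tap], parameters: [])
    let player = try engine.makePlayer(with: pattern)
    try player.start(atTime: CHHapticTimeImmediate)
}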
Vocal Shortcuts: Where Accessibility Meets App Intents
Vocal Shortcuts let users assign custom voice phrases to Siri Shortcuts, including those backed by third-party AppIntent types [5]. A user can configure “Marker” to trigger an AddTodoIntent registered by a to-do app; saying “Marker” anywhere, without invoking Siri’s wake phrase, triggers the intent.
The integration uses the App Intents framework the cluster has covered extensively, with one structural piece that’s easy to miss: the app must declare an AppShortcutsProvider that exposes AppShortcut entries with explicit phrases. A bare AppIntent exists in the system but is only invokable through the Shortcuts editor, where the user manually assembles a Shortcut. An AppShortcutsProvider registers system-visible shortcuts the user can immediately assign to a Vocal Shortcut, the Action Button, Siri, or Spotlight.
struct TodoShortcuts: AppShortcutsProvider {
static var appShortcuts: [AppShortcut] {
AppShortcut(
intent: AddTodoIntent(),
phrases: [
"Add a todo in \(.applicationName)",
"\(.applicationName) marker"
],
shortTitle: "Add Todo",
systemImageName: "checkmark.circle"
)
}
}
The phrases array is what the system surfaces to Siri and to Vocal Shortcuts. With the provider in place, the App Intent is immediately eligible for voice activation. Without it, the intent works through manual Shortcuts setup, but the path is longer and many users never reach it.
The pattern compounds with App Intents and App Intents vs MCP Tools. An App Intent that earns its place in the user’s Apple Intelligence surface, paired with an AppShortcutsProvider that declares how the user invokes it, also earns its place as a Vocal Shortcut target. The cluster’s argument that App Intents are the cross-system contract for “what an app can do” applies here: Vocal Shortcuts are another consumer of that same contract.
The Cross-Cutting Pattern: Standards Are The Integration
The accessibility features above share a structural property: each one is built on top of standards apps should already meet, with a small opt-in API surface for cases where the app must explicitly cooperate (Personal Voice authorization, Music Haptics metadata through MPNowPlayingInfoCenter).
The implication for development teams: accessibility work is not a separate workstream done after the app ships. The app’s accessibility labels, hit targets, focus order, and standard system API usage are what make Eye Tracking work, Live Speech route correctly, Music Haptics activate, and Vocal Shortcuts surface the right intents. Apps that treat accessibility as a checkbox at the end of the cycle ship features that work for VoiceOver but not for Eye Tracking, or that route audio in ways Live Speech can’t follow.
The cluster’s What I Refuse to Write About post argues for refusal as a positioning move. Accessibility refusals are the inverse: not “I refuse to add this,” but “I refuse to ship something that fails the standards every iOS app should already meet.”
When Apps Need Custom Accessibility Code
Three cases where the standards-following pattern doesn’t cover everything:
Custom drawing surfaces. A drawing app, a chart, or a custom-rendered game UI bypasses the SwiftUI/UIKit accessibility tree. The app must build its own accessibility tree using UIAccessibilityElement, UIAccessibilityCustomAction, and explicit accessibility properties for each meaningful element. Eye Tracking, VoiceOver, and Switch Control all rely on that tree being populated.
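A minimal UIKit sketch of that tree-building work, assuming a hypothetical custom-drawn bar chart:

import UIKit

final class BarChartView: UIView {
    // (label, value, frame in the view's coordinate space) for each drawn bar.
    var bars: [(label: String, value: Double, frame: CGRect)] = [] {
        didSet { rebuildAccessibilityElements() }
    }

    private func rebuildAccessibilityElements() {
        // Expose each bar as its own element so VoiceOver, Switch Control,
        // and Eye Tracking have something to land on.
        accessibilityElements = bars.map { bar in
            let element = UIAccessibilityElement(accessibilityContainer: self)
            element.accessibilityLabel = bar.label
            element.accessibilityValue = String(bar.value)
            element.accessibilityFrameInContainerSpace = bar.frame
            element.accessibilityTraits = .staticText
            return element
        }
    }
}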
Real-time gestural interactions. A game with continuous gesture input (drawing, drag-to-aim) does not map naturally to dwell-based or switch-based input. The right approach is to provide alternative control schemes (button-based input as an option) rather than to fight the accessibility system.
Accessibility-specific features. AAC apps, voice-augmentation apps, sign-language interpretation apps. These apps are accessibility products in their own right and integrate deeply with system frameworks (Personal Voice, Speech framework, Vision framework for sign language detection). The integration work is real and intentional.
What This Pattern Means For iOS 26+ Apps
Three takeaways.
- Accessibility participation is mostly standards-following, not feature-building. Apple has been moving accessibility into the platform layer. The work is making sure your app meets the standards Eye Tracking, Switch Control, VoiceOver, and Voice Control all rely on: proper labels, hit targets, focus order, system audio routing.
- Personal Voice integration is sensitive. If your app has a real AAC use case (assistive communication, voice augmentation, accessibility tooling), Personal Voice authorization is the right integration. For general-purpose apps, requesting Personal Voice authorization is more likely to confuse users than to help them.
- App Intents are accessibility infrastructure. A clean AppIntent is automatically eligible for Vocal Shortcuts, gets an accessible UI surface through Shortcuts, and integrates with the system’s voice-driven and switch-driven control modes. The cluster’s argument for App Intents adoption applies to accessibility too.
The full Apple Ecosystem cluster: typed App Intents; MCP servers; the routing question; Foundation Models; the runtime vs tooling LLM distinction; three surfaces; the single source of truth pattern; Two MCP Servers; hooks for Apple development; Live Activities; the watchOS runtime; SwiftUI internals; RealityKit’s spatial mental model; SwiftData schema discipline; Liquid Glass patterns; multi-platform shipping; the platform matrix; Vision framework; Symbol Effects; Core ML inference; Writing Tools API; Swift Testing; Privacy Manifest; what I refuse to write about. The hub is at the Apple Ecosystem Series. For broader iOS-with-AI-agents context, see the iOS Agent Development guide.
FAQ
Do I need to write any code to support Eye Tracking?
For most apps, no. Eye Tracking works automatically with any UI that follows standard accessibility conventions: proper hit targets (44pt minimum), useful accessibility labels, logical focus order, and standard system controls. Apps that draw their own UI (custom views, games, charts) need to populate the accessibility tree explicitly using UIAccessibilityElement or SwiftUI’s accessibility modifiers; that work is also what makes the app work for VoiceOver and Switch Control.
Can I use Personal Voice in a general-purpose voice-over app?
The system permits it via AVSpeechSynthesizer.requestPersonalVoiceAuthorization(), but Apple’s guidance and the App Store review process emphasize Personal Voice for assistive contexts (AAC, augmentative and alternative communication). General-purpose voice-over apps requesting Personal Voice authorization face two challenges: users are unlikely to grant authorization, and review may push back on the request as inappropriate use. If your use case is genuinely assistive, the integration is right; if it’s general-purpose narration, system voices are the right tool.
What’s the difference between Live Speech and Personal Voice?
Personal Voice is the on-device synthesized voice that sounds like the user. Live Speech is the system feature that lets the user type and have the device speak (using either a system voice or their Personal Voice). They are complementary: Personal Voice provides the voice, Live Speech provides the typing-to-speech UI. Apps integrate Personal Voice through AVSpeechSynthesizer; Live Speech is invisible to apps and operates at the OS level.
How do I add Music Haptics to a music app that uses AVAudioEngine?
You can. Music Haptics is not scoped to a specific playback API. The integration is: add MusicHapticsSupported = YES to Info.plist, publish the playing track’s metadata through MPNowPlayingInfoCenter.default().nowPlayingInfo (so the system recognizes your app as the active Now Playing source), and include MPNowPlayingInfoPropertyInternationalStandardRecordingCode with the track’s ISRC. The system handles haptic synthesis from there. Tracks without ISRCs do not get haptics, but the rest of the now-playing integration works normally.
What’s the App Intent design that gives the best Vocal Shortcuts experience?
Four principles. First, declare an AppShortcutsProvider for the app and register AppShortcut entries for the intents you want voice-accessible. Without the provider, the intent only reaches Vocal Shortcuts via manual Shortcuts editing. Second, the title and shortTitle should be short verb phrases (“Add Todo,” “Start Timer”) rather than descriptions. Third, parameters should be optional or have defaults so the user can invoke the intent without specifying every field. Fourth, the description should be a single clear sentence explaining the intent’s effect; this surfaces as context when the user picks a phrase to assign.
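A sketch of an intent shaped along those lines, using the hypothetical AddTodoIntent from earlier in the post:

import AppIntents

struct AddTodoIntent: AppIntent {
    // Short verb phrase; surfaced in Shortcuts and when assigning a Vocal Shortcut.
    static var title: LocalizedStringResource = "Add Todo"
    static var description = IntentDescription("Adds a todo to your inbox.")

    // Optional, so the intent can run without the user specifying every field.
    @Parameter(title: "Title")
    var todoTitle: String?

    func perform() async throws -> some IntentResult & ProvidesDialog {
        let title = todoTitle ?? "New todo"
        // ... persist the todo here ...
        return .result(dialog: "Added \(title).")
    }
}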
References
1. Apple Developer: Extend Speech Synthesis with personal and custom voices (WWDC 2023 session 10033). Introduces requestPersonalVoiceAuthorization and the .isPersonalVoice voice trait.
2. Apple Newsroom: Apple announces new accessibility features, including Eye Tracking. The iOS 18 accessibility feature announcement covering Eye Tracking, Music Haptics, and Vocal Shortcuts.
3. Apple Developer Documentation: MAMusicHapticsManager in MediaAccessibility.framework, the iOS 18+ Music Haptics integration surface. The Info.plist MusicHapticsSupported key, the MPNowPlayingInfoCenter active-source role, and MPNowPlayingInfoPropertyInternationalStandardRecordingCode together enable haptic synthesis for any music app that publishes the right metadata.
4. Apple Support: Use Live Speech on your iPhone, iPad, and Mac. The user-facing Live Speech setup guide; the feature operates at the system level without third-party app integration.
5. Apple Developer Documentation: App Intents. The framework that powers Vocal Shortcuts, Spotlight integration, and Apple Intelligence’s action surface for third-party apps.