AI as an Accessibility Layer: What Works Today on macOS
- MacSmithAI

- 15 hours ago
- 8 min read
Apple's built-in accessibility tools on macOS Tahoe 26 are better than they've ever been. VoiceOver has been mature for over a decade. Live Captions transcribe audio system-wide. Magnifier finally came to the Mac and works with your iPhone or any USB camera. Accessibility Reader reformats any text in any app into something easier to read. If you have visual or hearing impairments and you use a Mac, the baseline is solid.
What AI adds is a layer on top of that baseline — specifically, the parts of the daily experience where the built-in tools hit their limits. VoiceOver can tell you an image exists, but not what's in it. Live Captions transcribe audio, but don't summarize or answer follow-up questions. The Magnifier shows you a document, but doesn't read it back with context. Those are AI gaps, and Claude plus a few macOS features can fill them without replacing anything Apple already does well.
What follows is the practical combination — what the daily workflow looks like when you stack AI on top of the native accessibility tools. Visual impairments first, then hearing, then the caveats. I'll be direct about where the tools fall short, because pretending AI is a finished accessibility solution isn't useful to anyone trying to decide what to adopt.
Start with the baseline
Before layering on AI, make sure the native tools are doing what they can. The features worth knowing about on current macOS:
VoiceOver (⌘F5 to toggle) — full-featured screen reader, works with most apps, supports Braille displays including the new Braille Access workspace
Live Captions — system-wide real-time transcription of any audio, now supporting English variants, Mandarin, Cantonese, Spanish, French, Japanese, German, and Korean
Magnifier — new on Mac in Tahoe, uses Continuity Camera or any USB camera to zoom into documents, whiteboards, or anything in the physical world
Accessibility Reader — systemwide reading mode that reformats text with custom fonts, colors, spacing, and spoken output. Launches from any app.
Voice Control — full Mac operation by voice, for users who can't or don't want to use keyboard and mouse
All of these run on-device where possible, are free, and don't require a subscription. If you haven't explored them recently, the Tahoe updates are worth a pass through System Settings → Accessibility. Some of what AI tools promise is already there natively.
For visual impairments
Describing what's actually in an image
This is where AI earns its place fastest. VoiceOver will tell you an image is present and read its alt text — if the alt text exists, which, in the real world, it often doesn't. Screenshots from a colleague, charts in a PDF, photos on a web page, social media images: none of it has alt text.
Claude's desktop app has a feature that makes this a one-keypress workflow. Open the app, go to Settings → General → Desktop app, and enable the Quick Entry shortcut. Set it to something you'll remember — I use ⌃⌥C. Then, anywhere on your Mac, trigger the shortcut and you get both a text field and a screenshot crosshair. Drag a rectangle around the image you want described, type "describe this image in detail, including any text visible in it," and hit Enter.
Claude handles this well — it reads text in the image, describes visual elements, and can answer follow-up questions about the same image. For reading a chart or a graph, it'll describe both the structure and what the data shows, which is meaningfully better than "image of a chart."
Grant Claude the three permissions it needs in System Settings → Privacy & Security: Screen Recording, Accessibility, and Speech Recognition. Without those, the shortcut won't work.
Reading documents and dense formatting
VoiceOver reads text well but can struggle with documents that have complex layouts — multi-column PDFs, tables, scanned documents, forms. The flow that works: use Accessibility Reader for anything that's text-based and native; drop into Claude for anything where the structure matters.
Claude's chat interface accepts PDFs and images directly. Drag a document in, then ask for what you want: "summarize the key decisions in this contract" or "read me the content of table 3 row by row" or "describe the layout of this form and tell me what each field asks for." For form filling in particular, asking Claude to describe a form's structure before you try to interact with it saves a lot of trial and error.
Writing when typing is slow or difficult
Claude's desktop app supports voice input via Caps Lock — hold it to dictate, release to stop. The transcription is accurate enough for most drafting work, and the resulting text goes straight into a Claude conversation, where you can then iterate on it with follow-up voice prompts.
This is a different workflow than macOS's system-wide dictation, which types your words into whatever app is focused. Claude's version is built around the conversational loop — you speak, the model responds, you speak again, eventually you copy the result to where it needs to go. For composing emails, messages, or longer documents when physical typing is difficult, this works better than dictating directly into Mail.
Building your own accessibility tools
One workflow worth mentioning, because it's become more realistic in the past year: using Claude Code to build small personal tools that solve accessibility problems the mainstream tools don't address. Joe McCormick, a visually impaired engineer, has written publicly about using Claude Code to build a Chrome extension that describes images in Slack threads, an AI-powered spell checker, and a link-summarization extension. All built for his own use, all working around specific limitations he ran into.
The relevant thing isn't any particular tool he built — it's that the cost of building custom accessibility software has dropped a lot. If there's a specific daily annoyance that no existing tool addresses, describing it to Claude Code and iterating on a solution is now a realistic afternoon project, not a month-long engineering effort.
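To make the scale of that kind of project concrete, here's a minimal sketch of a personal image-describer script — the sort of thing Claude Code can produce in an afternoon. It uses the Anthropic Python SDK's Messages API with a base64 image block; the model name is an assumption (substitute whatever current model you use), and the script requires `pip install anthropic` plus an `ANTHROPIC_API_KEY` in your environment to actually run.

```python
import base64
import sys

def build_describe_request(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Build a Messages API payload asking for a detailed image description."""
    return {
        "model": "claude-sonnet-4-20250514",  # assumed model name; swap in your own
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {
                    # Images go to the API as base64-encoded content blocks
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": media_type,
                        "data": base64.b64encode(image_bytes).decode("ascii"),
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this image in detail, "
                            "including any text visible in it.",
                },
            ],
        }],
    }

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python describe.py screenshot.png
    # Requires the anthropic SDK and ANTHROPIC_API_KEY in the environment.
    import anthropic
    with open(sys.argv[1], "rb") as f:
        payload = build_describe_request(f.read())
    message = anthropic.Anthropic().messages.create(**payload)
    print(message.content[0].text)
```

From here, wrapping it in a hotkey (via Shortcuts or Raycast) or pointing it at the clipboard instead of a file path is exactly the kind of iteration Claude Code handles well.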
For hearing impairments
Live Captions plus summarization
macOS Live Captions do the real-time transcription well. Where they fall short is in giving you the summary or the analysis afterward — you get a scroll of text, not "here's what the meeting decided."
The workflow that fills this gap: run Live Captions during the call or meeting, then copy the transcript at the end and paste it into Claude. Ask for whatever you need — a summary of decisions made, action items by person, a list of open questions, whatever. Claude handles long transcripts well (200K token context, which is roughly 150,000 words of transcript), so even hour-long meetings fit in a single prompt.
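The arithmetic behind "even hour-long meetings fit" can be sketched as a quick pre-check. This uses the common rule of thumb of roughly 0.75 English words per token — a heuristic of mine, not an official tokenizer — so treat the result as an estimate:

```python
def fits_in_context(transcript: str, context_tokens: int = 200_000) -> bool:
    """Rough check: does a transcript fit in a single prompt?

    Uses the ~0.75 words-per-token heuristic for English text;
    an actual tokenizer would give an exact count.
    """
    words = len(transcript.split())
    estimated_tokens = words / 0.75          # ~1.33 tokens per word
    return estimated_tokens < context_tokens * 0.9  # headroom for the reply

# An hour of speech is roughly 9,000 words (~150 wpm) —
# about 12,000 tokens, far under the 200K limit.
print(fits_in_context("word " * 9_000))  # True
```

In practice this means a Live Captions transcript only approaches the limit somewhere past ten hours of continuous speech, so splitting is rarely necessary.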
For recorded audio — voicemails, audio messages, podcasts — the flow is similar. Transcribe first using macOS's built-in transcription (in the Notes app or via Live Captions), then hand the text to Claude for analysis.
Understanding tone and context in writing
A specific use case that comes up for Deaf and hard-of-hearing users: when most of your interpersonal communication happens in text, the tone and subtext of written messages becomes disproportionately important. Was that email passive-aggressive or just terse? Is this Slack message genuinely friendly or performatively so?
Claude is reasonably good at this — paste a message in and ask "what's the tone here, and what's the most likely intent behind it?" It won't always be right, but it's a second perspective when you don't have a hearing friend to sanity-check. The same selection-to-Claude shortcut covered earlier in this series works well for this.
Drafting voice-quality text
Going the other direction — writing text that will be read aloud or presented in a context where hearing users will judge its "spoken" quality — Claude is useful for checking whether your draft reads naturally. "Does this sound like something a person would say out loud, or does it read as written?" It catches the things that are fine on paper but awkward when spoken.
Where the AI tools fall short
Being honest about what doesn't work yet, because most posts on this topic skip this part:
Screen reader compatibility with AI tools is uneven. There's an open GitHub issue on Claude Code from users with NVDA and JAWS reporting that the CLI's streaming output, spinners, and special characters cause screen readers to freeze or mispronounce content. Gemini CLI has added a --screen-reader flag; Claude Code hasn't yet. If you're a screen reader user, the web and desktop app versions of Claude work much better than the CLI tools. That may change — but it's the state today.
Voice mode is mobile-only so far. Claude's conversational voice mode, which is the most accessibility-friendly interaction pattern, exists on iOS and Android but not on the macOS desktop app at time of writing. You can use voice input on Mac via Caps Lock dictation, but the back-and-forth spoken conversation model isn't there yet.
Real-time description doesn't exist. For users who'd benefit from continuous audio description of their screen — "a notification just appeared in the top right, from Slack, saying..." — there's nothing that does this well today. VoiceOver describes what you navigate to; AI tools describe what you explicitly point them at. A middle ground where an AI layer watches your screen and tells you about important changes as they happen is technically possible but nobody has shipped it.
Hallucination matters more here. When an AI describes an image wrong, a sighted user will usually catch it — they can glance at the image and see the mismatch. A blind user depending on the description has no second source. This is a real limitation. For anything where accuracy matters (medication labels, legal documents, contracts), the AI description is a starting point, not the final word. Pair it with a human sanity check when the stakes are high.
Captions are never 100%. Live Captions miss words. Names, technical terms, and anything spoken quickly or with an accent are where errors cluster. A transcript with a few wrong words is fine for catching the gist of a meeting but problematic for legally or medically important conversations. Know which category a given use case falls into.
Setting it up in an hour
If you want to try this on your own Mac, here's the setup that covers the most use cases with the least friction:
Update to macOS Tahoe 26 if you haven't. The Tahoe accessibility updates are substantial — Magnifier, Accessibility Reader, expanded Live Captions languages, Braille Access. Don't layer AI on top of an older OS when the native baseline has improved this much.
Install the Claude desktop app from claude.ai/download. The web version works, but the desktop app has the Quick Entry shortcut, screenshot integration, and Caps Lock voice input that matter for the workflows above.
Grant permissions. System Settings → Privacy & Security → enable Screen Recording, Accessibility, and Speech Recognition for Claude.
Set a Quick Entry shortcut you'll remember. In Claude → Settings → General → Desktop app. I use ⌃⌥C; use whatever doesn't conflict with your other shortcuts.
Tune your native accessibility settings. Spend 20 minutes in System Settings → Accessibility. Enable the specific features you want, assign hotkeys for toggling VoiceOver and Zoom, try the new Accessibility Reader on a random web page to see if you like the default settings.
Decide what's on device vs. in the cloud. Apple's on-device features (VoiceOver, Magnifier, Live Captions transcription) don't send your data anywhere. Claude processes data on Anthropic's servers. For anything sensitive — medical, legal, financial — use the on-device tools. For everything else, either is fine.
That's enough to cover the image description, document reading, voice drafting, and transcript summarization workflows. Additional tools (Raycast AI Commands with {selection}, macOS Shortcuts, Claude Code for custom scripts) fit on top of this base, but the base alone gets you most of the value.
A note on agency
A lot of writing about AI and accessibility has a vaguely savior-y framing that I don't think holds up. The tools don't "empower" anyone — they handle some mechanical work that would otherwise be slow or impossible, and that's useful. The person using them still has the expertise about their own needs, the final judgment on whether a description or summary is right, and the responsibility for the work being done. An AI that describes an image isn't doing the work of understanding the image — the user is, using the description as input.
This matters in a practical way: the useful workflows are the ones where you'd catch the AI getting something wrong. If Claude misdescribes a chart and you ask a follow-up that clarifies, that's the loop working. If an AI drafts and sends an email before you've read it, you've handed over something you shouldn't have. The test I use: if the AI got this wrong, would I know? If yes, it's a good tool for the task. If no, tighten the loop before relying on it — especially for anything with real consequences.