The terms "closed captions" and "subtitles" get used interchangeably almost everywhere — YouTube labels both as "CC", streaming services bundle them under one menu, and most viewers have never thought about the difference. They aren't the same thing. They serve different audiences and, in commercial or public-sector contexts, they're held to different legal standards.
The short version: subtitles are translation, captions are accessibility. The longer version is what this guide covers — what each contains, who it serves, when it's legally required, and why the auto-captions on your last YouTube upload almost certainly aren't real closed captions even though the platform calls them that.
Subtitles: dialogue translation for hearing audiences
Subtitles came out of cinema. The original problem was simple: a Swedish-language film needed to play in France, dubbing was expensive, so the dialogue got translated and printed along the bottom of the frame. The audience could hear the original audio — music, sound effects, the actors' tone — they just couldn't understand the words. Subtitles solved exactly that one problem.
That's still what subtitles do. A subtitle track contains:
- Spoken dialogue, transcribed or translated. Whatever the actors are saying, in whatever language the viewer reads.
- On-screen text that's narratively important. Signs, letters, text messages — anything whose meaning would otherwise be lost on a viewer who doesn't read the original-language script.
A subtitle track does not contain sound effects ([door slams]), music cues ([ominous music]), speaker IDs beyond what's visually obvious, or non-verbal vocalizations like laughter. The viewer hears those.
The audience is a hearing viewer who doesn't speak the source language — or a hearing viewer in a noisy environment (gym, bar, office) reading by choice. Subtitles assume a working ear and supply only the linguistic gap.
Most foreign-language films released theatrically use subtitles. Most Netflix titles in their original language with a "Subtitles" track use subtitles. Most YouTube auto-generated transcripts that appear via the "CC" button are functionally subtitles, regardless of the label.
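To make the shape concrete, here is what a subtitle track looks like on disk: numbered cues, a timestamp range, then dialogue text and nothing else. The cue text and timings below are invented for illustration; a small parser shows how little structure SRT needs.

```python
# A minimal sketch of an SRT subtitle track: dialogue only, no sound
# annotations, no speaker IDs. Cue content and timings are invented.
subtitle_track = """1
00:00:04,000 --> 00:00:06,000
Wait, don't go.

2
00:00:08,500 --> 00:00:11,000
I have to. You know that.
"""

def parse_srt(text):
    """Split an SRT string into (index, timing, dialogue) tuples."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        cues.append((int(lines[0]), lines[1], "\n".join(lines[2:])))
    return cues

for idx, timing, dialogue in parse_srt(subtitle_track):
    print(idx, timing, dialogue)
```

Note that every cue is dialogue — that is the defining property of a subtitle track.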
Closed captions: full audio transcription for accessibility
Closed captions were invented for a different audience and a different problem. The U.S. Public Broadcasting Service piloted television captioning in 1972, and decoder-based closed captioning reached American broadcast TV in 1980, specifically to give deaf and hard-of-hearing viewers access to programming. The premise was the inverse of subtitles: assume the viewer can see fine but can't hear at all, and transcribe the full audio track.
A closed caption track contains everything a subtitle track does, plus:
- Sound effects, in brackets. [door slams], [phone ringing], [footsteps approaching] — anything audible that carries narrative weight.
- Music cues. [ominous music], [upbeat 80s pop], song titles when relevant: [♪ "Don't Stop Believin'" — Journey ♪].
- Speaker identification when the speaker isn't visually obvious — off-screen voice, character with their back to camera, multiple people talking at once: Sarah: Wait, don't go.
- Tone and delivery cues where they change meaning: [whispering], [sarcastic], [shouting].
- Non-verbal vocalizations the soundtrack carries: [laughs], [sighs], [groans].
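The same scene rendered as a closed caption track carries that non-dialogue layer alongside the dialogue. A minimal WebVTT sketch, with invented cue text and timings:

```python
# A minimal sketch of a WebVTT closed caption track: music cue, speaker ID,
# and sound effect alongside the dialogue. All content is invented.
caption_track = """WEBVTT

00:00:01.000 --> 00:00:03.500
[ominous music]

00:00:04.000 --> 00:00:06.000
Sarah: Wait, don't go.

00:00:06.200 --> 00:00:07.000
[door slams]
"""

def annotation_cues(vtt_text):
    """Return cue lines that are bracketed annotations rather than dialogue."""
    return [line for line in vtt_text.splitlines()
            if line.startswith("[") and line.endswith("]")]

print(annotation_cues(caption_track))
```

Strip the two bracketed cues and the speaker ID and you are back to a subtitle track — that delta is exactly what the rest of this section enumerates.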
The "closed" in closed captions means togglable — the viewer turns them on or off (see the next section for how that contrasts with open captions). Captions assume zero audio access and rebuild the soundtrack in text.
The audience is broader than just deaf and hard-of-hearing viewers. Captions also serve sound-off environments — most social media viewing happens muted, with widely cited industry figures putting silent autoplay above 85% on Facebook — language learners, and anyone who processes written information faster than spoken. But the design target, and the legal one, is accessibility.
Open captions vs closed captions
A practical distinction sits inside the captioning world: open vs closed.
Closed captions are a separate text track delivered alongside the video. The viewer toggles them on or off in the player. Files are external (.vtt, .srt, .ttml, .scc) and the player overlays them at runtime. This is the default for broadcast TV, streaming, and modern web video.
Open captions are burned directly into the video frame as pixels. They cannot be turned off. The video and the caption are a single rendered file.
Closed captions win on flexibility. The viewer chooses whether to display them, what language they're in if multiple tracks ship, what color and size they prefer. A typo fix is one text edit, not a re-encode. Translation is an additional track, not a re-render. High-contrast captions and larger fonts are the player's job.
Open captions win on guarantee. They render on every platform, regardless of player support or whether the viewer knows the toggle exists. This matters for short-form social — TikTok, Reels, YouTube Shorts — where most playback is silent and many viewers never tap "CC" even when available. It also matters for archival deliveries to platforms with weak caption support.
Default closed for accessibility-grade work (editable, respects user preferences); default open for short-form social (the only way to guarantee captions appear). Many creators ship both — open burns on the social cut, closed track on the long-form version.
Legal and platform requirements
The legal floor for captions has tightened steadily, and not noticing has become expensive.
In the United States, the Americans with Disabilities Act (ADA) has been used since the late 2010s to require captions on public-facing video from any "public accommodation" — after a string of court decisions, that includes almost every commercial website with video. The DOJ formalized digital accessibility requirements for state and local government in 2024, requiring WCAG 2.1 AA conformance — which mandates captions (not just subtitles) for all pre-recorded audio in synchronized media. The CVAA has required captions on internet video that previously aired on US TV since 2012. Broadcast TV has been fully captioned by FCC mandate since 2006.
In the European Union, the European Accessibility Act (EAA) took effect in June 2025 and applies to a wide swath of digital services including commercial audiovisual content. The practical floor: captions for live and pre-recorded video, plus audio description, on most commercial services. The AVMSD has required progressive captioning targets from broadcasters and on-demand services since 2018.
Platform requirements layer on top of legal ones. Broadcast TV in the US, UK, and most of the EU requires captions on essentially all programming. Netflix, Disney+, Amazon, and Apple TV+ require captions at ingest. YouTube provides auto-generated captions but most experts (and most accessibility lawsuits) consider them insufficient — they routinely omit speaker IDs, sound effects, and music cues. TikTok, Instagram, and YouTube Shorts have built-in auto-caption tools but place the accuracy burden on creators.
Rule of thumb: if the video is hosted by a commercial entity, a public-sector organization, or any federally funded educational institution, real closed captions (not auto-generated subtitles) are the legal floor. Human-quality captioning runs $1–$3 per video minute from vendors, less for edits to auto-generated drafts.
How to produce each
The production pipeline differs more than the file format.
For subtitles:
- Generate or obtain a dialogue transcript. Auto-generated transcripts (YouTube's, Whisper, AssemblyAI, Supadata) are usually accurate enough after a 5–10% cleanup pass.
- Translate if needed. Machine translation handles the first draft; a human pass catches idioms. DeepL outperforms Google Translate on most language pairs.
- Time and segment cues to dialogue. Auto-generated tracks usually get this right; Subtitle Edit, Aegisub, or your editor's caption panel handles fixes.
- Export as SRT or VTT. See the SRT vs VTT guide for which format your destination needs.
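The export step is mechanical once segments are timed: serialize (start, end, text) triples into SRT. The segment data below is invented; in a real pipeline the timings come from the transcription tool.

```python
# Hedged sketch: serialize timed transcript segments into SRT.
# Segment data is invented; real timings would come from the
# transcription tool (Whisper, AssemblyAI, etc.).
def to_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render (start_sec, end_sec, text) triples as an SRT string."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{to_timestamp(start)} --> {to_timestamp(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"

segments = [(4.0, 6.0, "Wait, don't go."), (8.5, 11.0, "I have to.")]
print(to_srt(segments))
```

VTT differs mainly in the header line and the decimal separator in timestamps, which is why converters between the two are trivial.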
For pulling subtitles from existing YouTube videos, the download YouTube subtitles how-to covers extraction; the translate YouTube transcript how-to covers translation.
For closed captions, the workflow has more steps:
- Generate the dialogue transcript — same as above.
- Add sound effect annotations by re-watching and inserting [door slams], [ominous music], etc., wherever non-dialogue audio carries narrative weight. This is the step auto-captions skip.
- Add speaker IDs wherever the speaker isn't visually obvious — off-screen voices, characters with backs to camera, multi-person scenes.
- Add tone and delivery cues sparingly — [whispering], [shouting] — where they change meaning.
- Time everything to audio events, not just dialogue. A [door slams] cue should appear when the door slams.
- Verify against WCAG criteria — completeness, accuracy, synchronization, equivalence to the audio experience.
- Export as VTT (web), SCC (broadcast), or TTML (OTT/streaming).
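The annotation steps resist automation because the added information comes from watching the video, not from the dialogue track. Once a human has logged the non-dialogue events, though, merging them into the cue list is simple. A sketch, with invented cue and event data:

```python
# Hedged sketch: interleave human-logged audio events with existing
# dialogue cues, ordered by start time. All data here is invented;
# the person watching the video supplies the real annotations.
dialogue_cues = [(4.0, 6.0, "Sarah: Wait, don't go.")]
audio_events = [(1.0, 3.5, "[ominous music]"), (6.2, 7.0, "[door slams]")]

def merge_cues(dialogue, events):
    """Combine dialogue and annotation cues, sorted by start time."""
    return sorted(dialogue + events, key=lambda cue: cue[0])

for start, end, text in merge_cues(dialogue_cues, audio_events):
    print(f"{start:.1f}-{end:.1f}  {text}")
```

The merge is the easy part; the 30–60 minutes per video minute quoted later in this guide is spent producing the event list, not serializing it.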
The key thing: YouTube auto-captions are subtitles, not closed captions. They contain dialogue only — no sound effects, no speaker IDs, no music cues. Useful as a draft for caption work; treating them as compliance-grade closed captions is the kind of decision that ends in an accessibility lawsuit. Vendors like 3Play Media, Rev, and CaptionMax exist precisely because that gap is meaningful production work.
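One quick way to check whether a track you've been handed is captions or merely subtitles: scan the text for bracketed annotations and speaker IDs. A heuristic sketch, with an invented sample string:

```python
import re

# Heuristic sketch: flag whether caption text contains the markers real
# closed captions carry. Their absence strongly suggests a dialogue-only
# (subtitle) track. The sample text below is invented.
def looks_like_captions(text):
    """True if the text has bracketed annotations or 'Name:' speaker IDs."""
    has_annotation = bool(re.search(r"\[[^\]]+\]", text))          # [door slams]
    has_speaker_id = bool(re.search(r"^\w+:\s", text, re.MULTILINE))  # Sarah: ...
    return has_annotation or has_speaker_id

auto_generated = "wait don't go\ni have to you know that"
print(looks_like_captions(auto_generated))  # False: dialogue only
```

A heuristic like this catches the obvious cases only — it can't judge completeness or accuracy, which is what WCAG actually measures.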
For the underlying file formats, the subtitle file formats guide covers VTT, SCC, TTML, and the rest.
Frequently asked questions
Are YouTube auto-captions closed captions? No. They're auto-generated subtitles labeled as "CC" by the platform. Real closed captions include sound effects, music cues, and speaker identification — none of which YouTube's auto-caption system inserts. For ADA, EAA, or WCAG 2.1 AA compliance, auto-captions alone are usually not sufficient. They're a useful first draft, not a deliverable.
Can I convert subtitles to closed captions? Yes, manually. Open the subtitle file in a caption editor (Subtitle Edit, Aegisub, or your video editor's caption panel), re-watch the video, and add the missing pieces — speaker IDs, sound effect annotations, music cues, and tone descriptors. There's no automatic conversion because the missing information isn't in the dialogue track. Plan on 30–60 minutes of editing per finished video minute for first-draft caption work.
Which one ranks better in the YouTube algorithm? Neither. YouTube uses caption text for transcript indexing and search relevance, but it doesn't differentiate between subtitles and closed captions as ranking signals — both feed the same searchable text. What matters more is having any caption track at all and having it be accurate. Closed captions help retention indirectly, because silent-playback viewers are likelier to keep watching when captions are present.
What about multilingual content?
Subtitles handle this natively — one track per language, viewer picks. Captions can also be multilingual; the convention is that each translated caption track preserves all the non-dialogue annotations of the source. A French closed caption of an English film should flag [musique inquiétante] where the original flags [ominous music]. For short-form social, open captions in the dominant language plus a closed caption track in additional languages is the common shape.
Next steps
If you're producing video for any commercial or public-facing context, default to real closed captions, not auto-generated subtitles. The legal floor is moving that direction and the production cost gap is small once a workflow is set up.
For format choices, see the SRT vs VTT guide and the broader subtitle file formats guide. For pulling existing captions out of YouTube as a starting point, the download YouTube subtitles how-to covers extraction. For multilingual work, the translate YouTube transcript how-to covers the translation step.