If you work with video long enough, you'll meet a subtitle file you don't recognize. Someone hands you a .ass and your editor refuses it. A vendor sends a .ttml and asks if you can ingest it. YouTube exports a .sbv and you wonder why it isn't an SRT. The list of subtitle formats is longer than most people realize, but three cover almost everything and a handful of others cover the rest.
This guide is reference-style. Read it top to bottom for context, or skim to the format on your desk.
Why so many subtitle formats exist
Subtitle formats accreted over thirty years of video tech, each wave solving a problem the previous one couldn't.
The first wave came out of the DVD era in the late 1990s. SubRip (producing SRT files) ripped subtitles off DVDs into plain text — no styling, no positioning, just numbered cues with timestamps. It became the universal format because it was readable in Notepad and trivial to author.
The second wave was the web. As HTML5 video replaced Flash, browsers needed a native subtitle format that could carry positioning, styling, and accessibility metadata. The WHATWG and W3C designed WebVTT — superficially like SRT but with cue settings, CSS hooks, voice tags, and track-kind metadata.
The third wave came from creative communities. Anime fansub groups in the early 2000s wanted typeset karaoke, multi-color text, animation, and pixel-perfect positioning. SubStation Alpha (SSA) and its successor Advanced SubStation Alpha (ASS) were built for that — closer to a stylesheet than a caption file.
The fourth wave was broadcast and OTT compliance. Streaming platforms and broadcasters needed an XML-based exchange format with speaker IDs, sound effects, language tags, and multiple presentations. TTML and its profile DFXP filled that role. Legacy broadcast still uses SCC for CEA-608 closed captions.
Most of the time you only need to know three: SRT, VTT, and ASS.
SRT (SubRip)
SRT is the lingua franca. If a tool reads exactly one subtitle format, it's SRT. It originated as the output of SubRip, a Windows app for ripping DVD subtitles, and became a de facto standard because nothing matched it for simplicity.
The structure is dead simple — each subtitle is a numbered cue:
1
00:00:00,500 --> 00:00:03,000
Welcome to the demo.

2
00:00:03,500 --> 00:00:06,000
Let's get started.
At least three lines per cue: a sequential integer, a timestamp range using HH:MM:SS,mmm --> HH:MM:SS,mmm (the comma before milliseconds is a European-locale legacy), and the caption text, which can run to more than one line. A blank line separates cues. Modern tools read and write UTF-8; older tools sometimes default to local code pages and produce mojibake on non-Latin scripts.
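The cue layout described above is simple enough to parse in a few lines. What follows is a minimal sketch, not a full parser: the names Cue and parse_srt are made up for illustration, and it assumes well-formed UTF-8 input with blank lines between cues. Malformed blocks are skipped instead of repaired.

```python
import re
from dataclasses import dataclass

# Matches HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:  # hypothetical container, not part of any library
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(data: str) -> list[Cue]:
    """Split on blank lines, then read index, timing, and text per block."""
    data = data.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", data):
        lines = block.split("\n")
        if len(lines) < 3:
            continue  # incomplete cue: skipped in this sketch
        match = TIMING.match(lines[1].strip())
        if match is None or not lines[0].strip().isdigit():
            continue
        g = match.groups()
        cues.append(
            Cue(int(lines[0]), _to_ms(*g[:4]), _to_ms(*g[4:]), "\n".join(lines[2:]))
        )
    return cues


sample = """1
00:00:00,500 --> 00:00:03,000
Welcome to the demo.

2
00:00:03,500 --> 00:00:06,000
Let's get started.
"""

cues = parse_srt(sample)
```

Keeping times as integer milliseconds makes later shifting or re-serializing to another format straightforward.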
Where you'll see it: nearly every desktop player (VLC, MPV, QuickTime), editor (Premiere, Final Cut, DaVinci Resolve, CapCut), and upload pipeline (YouTube, Vimeo, OTT), plus most consumer devices that support external subtitles.
Strengths: universal compatibility, trivially editable, tiny files, no spec ceremony.
Weaknesses: no native styling (some players honor non-standard <i>, <b>, <u>, <font color> tags but it's inconsistent), no positioning, no metadata fields. If your work needs styled or positioned captions, SRT cannot carry them.
For the head-to-head, see the SRT vs VTT guide. For extracting a YouTube video directly to SRT, see the YouTube-to-SRT how-to. For picking the output format upfront, see the download YouTube subtitles how-to.
VTT (WebVTT)
VTT is the HTML5-native subtitle format: the format the <track> element is built around, published as a W3C Candidate Recommendation in 2019. Where SRT is a community convention, VTT has a real spec.
A minimal VTT file looks superficially like SRT:
WEBVTT

1
00:00:00.500 --> 00:00:03.000
Welcome to the demo.

2
00:00:03.500 --> 00:00:06.000
Let's get started.
Three differences from SRT: the file must start with the literal WEBVTT header, timestamps use a period before milliseconds (not a comma), and the cue identifier is optional. The interesting features are what VTT lets you add on top:
- Cue settings: directives after the timestamp position cues. line:0 position:50% align:center pins to the top, horizontally centered.
- Styling: ::cue pseudo-elements in your stylesheet, or inline <c.classname> tags mapped to CSS classes.
- Voice tags: <v Speaker Name>Hello there</v> tags speakers for accessibility and assistive tech.
- Metadata: NOTE comments, regions, chapter markers.
- Track kinds: subtitles, captions, descriptions, chapters, metadata via the <track> element's kind attribute.
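To make the cue settings and voice tags concrete, here is a small sketch that writes a VTT file with both. The helper names (format_vtt_time, vtt_cue, build_vtt) and the speaker name "Narrator" are assumptions for illustration; the settings string is the one from the list above.

```python
def format_vtt_time(ms: int) -> str:
    """Render integer milliseconds as HH:MM:SS.mmm (period, not comma)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def vtt_cue(start_ms, end_ms, text, settings="", speaker=None):
    """Build one cue; settings follow the timestamps on the same line."""
    if speaker:
        text = f"<v {speaker}>{text}</v>"
    timing = f"{format_vtt_time(start_ms)} --> {format_vtt_time(end_ms)}"
    if settings:
        timing += " " + settings
    return f"{timing}\n{text}"


def build_vtt(cues):
    """The WEBVTT header comes first, then cues separated by blank lines."""
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


doc = build_vtt([
    vtt_cue(500, 3000, "Welcome to the demo.",
            settings="line:0 position:50% align:center"),
    vtt_cue(3500, 6000, "Let's get started.", speaker="Narrator"),
])
```

The cue identifier is left out here, which the format allows.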
Where you'll see it: HTML5 <video>, browser-based players, accessibility-graded web content.
Strengths: native browser parsing, richer features than SRT, accessibility-aware, real spec.
Weaknesses: less universal in offline players and consumer editors. Many older NLEs don't support VTT and require a converter, and older systems frequently lack native support.
For the SRT-vs-VTT decision and conversion mechanics, see the SRT vs VTT guide.
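The differences listed earlier make the SRT-to-VTT direction almost purely mechanical. The sketch below (srt_to_vtt is an illustrative name) adds the header and swaps the millisecond separator on timing lines only. It keeps the SRT cue numbers, which VTT accepts as optional identifiers, and it does not touch any non-standard styling tags in the text.

```python
import re

# HH:MM:SS,mmm with the comma captured separately from the rest
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    out = []
    for line in body.split("\n"):
        if "-->" in line:
            # Only timing lines are rewritten, so caption text is left alone.
            line = SRT_TIMESTAMP.sub(r"\1.\2", line)
        out.append(line)
    return "WEBVTT\n\n" + "\n".join(out) + "\n"


srt_sample = """1
00:00:00,500 --> 00:00:03,000
Welcome to the demo.

2
00:00:03,500 --> 00:00:06,000
Let's get started.
"""

vtt_output = srt_to_vtt(srt_sample)
```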
ASS / SSA (Advanced SubStation Alpha)
ASS and its predecessor SSA are the styling-rich formats. Built by and for anime fansub communities in the late 1990s and early 2000s, and adopted by professional subtitling shops who need precise typography. Where SRT and VTT are caption files, ASS is closer to a stylesheet plus a script.
An ASS file is structured like an INI file:
[Script Info]
Title: Demo
ScriptType: v4.00+

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, ...
Style: Default,Arial,24,&H00FFFFFF, ...

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.50,0:00:03.00,Default,,0,0,0,,Welcome to the demo.
Dialogue: 0,0:00:03.50,0:00:06.00,Default,,0,0,0,,Let's get started.
Named styles at the top (font, size, colors, outline, shadow), then dialogue events that reference them. Inline override codes change properties mid-cue: {\an8} pins to top, {\b1} bolds, {\c&H00FF00&} recolors, {\fad(500,500)} adds a fade. Karaoke timing is built in.
Where you'll see it: anime fansubs (universally), professional subtitling for theatrical and streaming releases when typesetting matters, lyric videos. The reference authoring tool is Aegisub.
Strengths: pixel-perfect positioning, multi-color and multi-font typesetting, animation and karaoke effects, layered cues, named styles you can edit globally.
Weaknesses: support outside desktop players is thin. Players built on libass (VLC, mpv) render ASS, but most editors, browsers, and consumer devices don't without a plugin. Streaming platforms generally won't accept it as a sidecar, so typeset ASS gets burned into the video before delivery. Authoring complex files requires specialist tooling.
If you receive ASS and need to use it in a context that doesn't support it, the path is either burn into the video (ffmpeg with the subtitles filter) or strip styling and convert to SRT (losing everything except text and timing).
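The strip-and-convert path can be sketched in a few lines. This is a rough illustration under stated assumptions, not a complete converter: it assumes the [Events] Format line uses the field order shown earlier (Text as the tenth field), that times are in H:MM:SS.cc centiseconds, and that events are already in playback order. The function names are made up for this example.

```python
import re

# Inline override blocks such as {\an8} or {\fad(500,500)}
OVERRIDE = re.compile(r"\{[^}]*\}")


def ass_time_to_srt(t: str) -> str:
    """Convert H:MM:SS.cc (centiseconds) to HH:MM:SS,mmm."""
    h, m, rest = t.strip().split(":")
    s, cs = rest.split(".")
    return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{int(cs) * 10:03d}"


def ass_to_srt(ass_text: str) -> str:
    blocks = []
    for line in ass_text.splitlines():
        if not line.startswith("Dialogue:"):
            continue  # styles, script info, and comments are dropped
        # Split into at most 10 fields so commas inside the text survive.
        fields = line[len("Dialogue:"):].strip().split(",", 9)
        if len(fields) < 10:
            continue
        start, end, text = fields[1], fields[2], fields[9]
        text = OVERRIDE.sub("", text)
        # \N and \n are ASS line breaks, \h is a hard space.
        text = text.replace("\\N", "\n").replace("\\n", "\n").replace("\\h", " ")
        timing = f"{ass_time_to_srt(start)} --> {ass_time_to_srt(end)}"
        blocks.append(f"{len(blocks) + 1}\n{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"


ass_sample = "\n".join([
    "[Events]",
    "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text",
    r"Dialogue: 0,0:00:00.50,0:00:03.00,Default,,0,0,0,,{\an8}Welcome, to the demo.",
    r"Dialogue: 0,0:00:03.50,0:00:06.00,Default,,0,0,0,,Let's get\Nstarted.",
])

srt_from_ass = ass_to_srt(ass_sample)
```

Everything that made the file ASS (positioning, fades, named styles) is gone after this step, which is why typeset work is usually burned in instead.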
Other formats: SBV, TTML, DFXP, SCC
Beyond the big three, you'll meet four more in narrower contexts.
SBV (SubViewer / YouTube) is YouTube's legacy upload format. Same general idea as SRT but without cue numbers and with a period in timestamps:
0:00:00.500,0:00:03.000
Welcome to the demo.

0:00:03.500,0:00:06.000
Let's get started.
You'll see SBV when downloading captions from YouTube Studio's legacy subtitle editor. YouTube also accepts and exports SRT and VTT, so SBV is rarely the right deliverable. Conversion to SRT is mechanical: add cue numbers, change the , between start and end times to -->, swap the millisecond separator from period to comma, and zero-pad the hour field to two digits (0:00:03.000 becomes 00:00:03,000).
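Those conversion steps translate directly into code. A minimal sketch, assuming cues are separated by blank lines and every cue starts with a start,end timing line; sbv_to_srt is an illustrative name, and blocks that don't start with a timing line are skipped.

```python
import re

# H:MM:SS.mmm,H:MM:SS.mmm (hours may be one or more digits)
SBV_TIMING = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$"
)


def sbv_to_srt(sbv_text: str) -> str:
    body = sbv_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    blocks = []
    for block in re.split(r"\n\s*\n", body):
        lines = block.split("\n")
        match = SBV_TIMING.match(lines[0].strip())
        if match is None:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = match.groups()
        # Pad hours to two digits, use a comma before ms, and --> between times.
        timing = (
            f"{int(h1):02d}:{m1}:{s1},{ms1} --> {int(h2):02d}:{m2}:{s2},{ms2}"
        )
        text = "\n".join(lines[1:])
        blocks.append(f"{len(blocks) + 1}\n{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"


sbv_sample = """0:00:00.500,0:00:03.000
Welcome to the demo.

0:00:03.500,0:00:06.000
Let's get started.
"""

srt_from_sbv = sbv_to_srt(sbv_sample)
```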
TTML (Timed Text Markup Language) and its profile DFXP are the XML-based broadcast and OTT exchange standards. TTML carries everything: styled text, positioning, regions, multiple track presentations, language tags, role metadata. Verbose enough that nobody hand-edits it:
<tt xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p begin="00:00:00.500" end="00:00:03.000">Welcome to the demo.</p>
      <p begin="00:00:03.500" end="00:00:06.000">Let's get started.</p>
    </div>
  </body>
</tt>
You'll see TTML/DFXP when delivering to OTT platforms (Netflix, Disney+, Hulu), in IMSC profiles for broadcast TV, and in Apple HLS. If you're not delivering to one of those, don't author TTML by hand; convert from SRT or VTT using ffmpeg, Subtitle Edit, or a vendor's ingest tool.
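Reading TTML is more approachable than writing it. The sketch below pulls timing and text out of a file like the sample above using Python's standard XML parser. It is deliberately narrow: it assumes the TTML namespace shown in the sample, begin and end attributes on every p element in HH:MM:SS.mmm clock form, and it ignores styling, regions, and br line breaks. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample document; older DFXP files may use another.
TTML_NS = "{http://www.w3.org/ns/ttml}"


def clock_to_ms(value: str) -> int:
    """Convert HH:MM:SS.mmm to milliseconds (other TTML time forms not handled)."""
    h, m, s = value.split(":")
    return round((int(h) * 3600 + int(m) * 60 + float(s)) * 1000)


def ttml_cues(xml_text: str):
    """Return a list of (start_ms, end_ms, text) tuples, one per <p>."""
    root = ET.fromstring(xml_text)
    cues = []
    for p in root.iter(f"{TTML_NS}p"):
        text = "".join(p.itertext()).strip()  # flattens nested <span> text
        cues.append((clock_to_ms(p.get("begin")), clock_to_ms(p.get("end")), text))
    return cues


ttml_sample = """<tt xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p begin="00:00:00.500" end="00:00:03.000">Welcome to the demo.</p>
      <p begin="00:00:03.500" end="00:00:06.000">Let's get started.</p>
    </div>
  </body>
</tt>"""

extracted = ttml_cues(ttml_sample)
```

For real deliverables, a platform's ingest spec usually dictates the profile, so treat this as a way to inspect a file, not to validate it.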
SCC (Scenarist Closed Captions) is the legacy broadcast format for CEA-608 closed captions — the line-21 captions baked into NTSC broadcast video. SCC files are hex-encoded byte streams paired with SMPTE timecode, unreadable to humans:
Scenarist_SCC V1.0

01:00:00:00 9420 9420 9470 9470 5765 6c63 6f6d 6520 ...
You'll meet SCC when working with US broadcast television, archival video restoration, or accessibility compliance for over-the-air content. For modern web and streaming it's largely irrelevant, since TTML and VTT have replaced it. Conversion usually calls for a specialist tool (Subtitle Edit handles it; ffmpeg's SCC support is limited).
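The hex is less opaque than it looks. Each four-digit word is two bytes; the top bit of each byte is a parity bit, and once it is masked off, byte pairs whose first byte is below 0x20 are control codes while the rest are characters. The sketch below (decode_scc_line is an illustrative name) recovers only the plain text from one line. It is a simplification: it skips all control codes, so positioning and caption modes are lost, and it treats characters as ASCII even though the CEA-608 character set differs from ASCII in a few positions.

```python
def decode_scc_line(line: str):
    """Return (timecode, text) for one SCC data line, ignoring control codes."""
    parts = line.split()
    timecode, words = parts[0], parts[1:]
    chars = []
    for word in words:
        if len(word) != 4:
            continue  # not a two-byte hex word
        try:
            value = int(word, 16)
        except ValueError:
            continue
        hi = (value >> 8) & 0x7F  # strip the parity bit
        lo = value & 0x7F
        if hi < 0x20:
            continue  # control code pair or null padding
        chars.append(chr(hi))
        if lo >= 0x20:
            chars.append(chr(lo))
    return timecode, "".join(chars)


# The data line from the sample above, without the trailing ellipsis
timecode, text = decode_scc_line(
    "01:00:00:00 9420 9420 9470 9470 5765 6c63 6f6d 6520"
)
```

A real converter also has to track the control codes to know when a caption appears and clears, which is why a dedicated tool is the practical route.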
Closed captions and subtitles are not the same thing: captions include sound effects and speaker IDs for deaf and hard-of-hearing viewers, while subtitles carry dialogue only (often translated) and assume the viewer can hear the audio. The terms are used interchangeably in casual use but mean different things in compliance contexts.
Conversion table and tools
Most subtitle conversions are mechanical. Going from a richer format to a poorer one drops styling. Going the other way doesn't add information that isn't in the source.
| From → To | SRT | VTT | ASS | SBV | TTML | SCC |
| --- | --- | --- | --- | --- | --- | --- |
| SRT | - | Easy | Easy (no styling) | Easy | Easy | Specialist |
| VTT | Easy | - | Easy (limited styling) | Easy | Easy | Specialist |
| ASS | Lossy | Lossy | - | Lossy | Lossy | Specialist + lossy |
| SBV | Easy | Easy | Easy (no styling) | - | Easy | Specialist |
| TTML | Easy | Easy | Lossy (limited mapping) | Easy | - | Specialist |
| SCC | Easy | Easy | Easy (no styling) | Easy | Easy | - |
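"Easy" means a free desktop tool or an ffmpeg one-liner handles it. "Lossy" means the destination can't carry features the source has, usually styling and positioning. "Specialist" means you need a dedicated captioning tool because general-purpose converters don't write the format.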
Tools:
- Subtitle Edit (Windows, free, open source) is the swiss-army knife. Opens essentially every format ever invented and saves to any other.
- Aegisub (cross-platform, free) is the standard for ASS authoring. Also reads and writes SRT, VTT, SSA.
- ffmpeg handles SRT, VTT, ASS, SBV, and TTML, though not every format in both directions, so test your build. The classic command ffmpeg -i input.srt output.vtt works for most simple cases.
- Online converters (subtitleconverter.io, gotranscript.com) handle one-off jobs. Fine for non-sensitive content; don't upload client deliverables to free web tools.
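If you run the ffmpeg conversion often, wrapping it in a script saves retyping. This is a thin sketch around the command quoted above: build_convert_command and convert are hypothetical helper names, -y (overwrite the output without prompting) is a standard ffmpeg flag, and ffmpeg is assumed to be installed and on PATH.

```python
import shutil
import subprocess


def build_convert_command(src: str, dst: str) -> list[str]:
    """ffmpeg infers both formats from the file extensions."""
    return ["ffmpeg", "-y", "-i", src, dst]


def convert(src: str, dst: str) -> None:
    """Run the conversion, raising if ffmpeg is missing or exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_convert_command(src, dst), check=True)


command = build_convert_command("input.srt", "output.vtt")
```

Calling convert("input.srt", "output.vtt") then runs the same one-liner, and passing the arguments as a list avoids shell-quoting problems with file names that contain spaces.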
The most reliable conversion is no conversion — extract directly in the format you need. The download YouTube subtitles how-to covers picking the output format upfront.
Frequently asked questions
Which format does YouTube use?
YouTube accepts SRT, VTT, SBV, and several others on upload, and exposes captions in SRT, VTT, JSON3, and TTML through its timedtext endpoint. From an end-user perspective you can pull or push either SRT or VTT and both round-trip cleanly. SBV shows up only in the legacy subtitle editor.
What does my video editor accept?
Almost every NLE (Premiere Pro, Final Cut Pro, DaVinci Resolve, CapCut, Avid) accepts SRT as the universal sidecar. Many accept VTT but support is inconsistent across versions. ASS and TTML are rarely supported natively. If in doubt, deliver SRT.
Is there a "best" format?
Best per use case, not best overall. Maximum compatibility: SRT. HTML5 web video and accessibility: VTT. Typeset creative work (anime, lyric videos, theatrical): ASS. OTT and broadcast delivery: TTML or whatever the platform's ingest spec demands.
How do I add styling to subtitles?
Depends on the playback target. Web video: VTT with ::cue CSS or inline <c.classname> tags. Creative typesetting: ASS with named styles and override codes (Aegisub is the authoring tool). Burned-in styled subs in a delivered video: author ASS, then burn in with ffmpeg's subtitles filter. SRT cannot reliably carry styling — non-standard <b>, <i>, <u> work in some players and silently fail in others.
Next steps
If someone handed you a file you didn't recognize, the relevant section above tells you what tools open it and what to convert it to. If you're producing subtitles for a specific platform, work backward from the platform's accepted formats: web video means VTT, video editing and most uploads mean SRT, typesetting means ASS, OTT and broadcast mean TTML.
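When the extension is missing or wrong, the first few lines of the file usually give the format away. The sketch below is a heuristic sniffer built only from the markers shown in this guide (the WEBVTT header, the Scenarist_SCC header, the [Script Info] section, a tt root element, and the two timestamp shapes). The function name and return labels are made up for illustration, and it makes no attempt to tell SSA from ASS or to validate the file.

```python
import re

SRT_TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}", re.M
)
SBV_TIMING = re.compile(
    r"^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}\s*$", re.M
)
TTML_ROOT = re.compile(r"<(\w+:)?tt[\s>]")


def sniff_subtitle_format(text: str) -> str:
    """Guess the format from content; returns 'unknown' when nothing matches."""
    head = text.lstrip("\ufeff").lstrip()[:4000]
    if head.startswith("WEBVTT"):
        return "vtt"
    if head.startswith("Scenarist_SCC"):
        return "scc"
    if "[Script Info]" in head:
        return "ass"
    if TTML_ROOT.search(head):
        return "ttml"
    if SRT_TIMING.search(head):
        return "srt"
    if SBV_TIMING.search(head):
        return "sbv"
    return "unknown"


guess = sniff_subtitle_format("1\n00:00:00,500 --> 00:00:03,000\nWelcome to the demo.\n")
```

The header checks run before the timestamp checks because they are unambiguous, while timestamp shapes are only a fallback.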
The two formats worth knowing in depth are SRT and VTT — together they cover almost everything that isn't a creative-typography or compliance-broadcast workflow. The SRT vs VTT guide walks through the head-to-head decision. The download YouTube subtitles how-to covers picking output format at extraction time, and the YouTube-to-SRT how-to covers SRT specifically. For the broader picture, start with the YouTube transcripts guide.