SRT vs VTT: Subtitle Format Comparison and When to Use Each

Two subtitle formats see more use than all the others combined: SRT and VTT. If you've ever downloaded captions from YouTube, exported subtitles from a video editor, or attached a <track> element to an HTML5 video, you've used one or both. They look almost identical at a glance, and almost nobody can articulate the actual differences without looking them up.

The choice matters more than it seems. Hand a VTT file to an old DVD authoring tool and it'll silently fail. Embed an SRT into a <track> element and modern browsers will refuse to render it cleanly. Most of the time you can use either; some of the time you can't. This guide walks through what each format actually is, where they differ, and a practical decision framework for picking one.

What is an SRT file?

SRT stands for SubRip Subtitle. It originated in the late 1990s as the output format of SubRip, a Windows tool for ripping subtitles off DVDs. It became a de facto standard simply because nothing else was as easy to write, parse, or share: a plain text file (UTF-8 these days, legacy code pages in older files) with no spec ceremony.

The structure is dead simple. Each subtitle is a numbered "cue" block:

1
00:00:00,500 --> 00:00:03,000
Welcome to the demo.

2
00:00:03,500 --> 00:00:06,000
Let's get started.

Three parts per cue: a sequential number, a timestamp range using HH:MM:SS,mmm --> HH:MM:SS,mmm (note the comma before milliseconds), and the caption text, which may run to more than one line. A blank line separates cues. That's the whole format.
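
As a rough illustration of how little there is to parse, here is a minimal Python sketch (3.9+) that reads cues in the layout just described. The names Cue, to_ms, and parse_srt are made up for this example, and malformed blocks are skipped rather than repaired.

```python
import re
from dataclasses import dataclass

# SRT timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm (comma before milliseconds)
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s+-->\s+(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str

def to_ms(h, m, s, ms):
    """Convert the four timestamp fields to a millisecond offset."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def parse_srt(content: str) -> list[Cue]:
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 3 or not lines[0].strip().isdigit():
            continue  # incomplete cue: skip rather than guess
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        fields = match.groups()
        cues.append(
            Cue(
                index=int(lines[0]),
                start_ms=to_ms(*fields[:4]),
                end_ms=to_ms(*fields[4:]),
                text="\n".join(lines[2:]),  # caption text may span lines
            )
        )
    return cues
```

Feeding it the two-cue sample above yields two Cue objects, the first starting at 500 ms and ending at 3000 ms.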

Strengths:

Weaknesses:

For most of the work covered in the YouTube transcripts guide, SRT is the right answer. The YouTube-to-SRT how-to walks through extracting an SRT directly from a video URL.

What is a VTT file?

VTT is short for WebVTT, the Web Video Text Tracks format. It grew out of work at the WHATWG and W3C as the subtitle format for HTML5 video, and was published as a W3C Candidate Recommendation in 2019. Where SRT is a community-grown convention, VTT is a real spec.

A minimal VTT file looks like this:

WEBVTT

1
00:00:00.500 --> 00:00:03.000
Welcome to the demo.

2
00:00:03.500 --> 00:00:06.000
Let's get started.

Three differences from SRT jump out immediately: the file must start with the literal WEBVTT header, timestamps use a period before milliseconds (not a comma), and the cue identifier line is optional (you can omit the 1, 2, etc. — VTT works without them).
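
Those three differences are easy to see in code. The Python sketch below is a simplified reader, not a spec-complete WebVTT parser: it insists on the WEBVTT header, expects period-separated timestamps, and treats the identifier line as optional. The function name parse_vtt and the tuple layout it returns are assumptions made for this example.

```python
import re

# WebVTT timing line: period before milliseconds; the hours field may be omitted.
VTT_TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)

def parse_vtt(content: str):
    """Return a list of (identifier_or_None, start, end, text) tuples."""
    text = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    blocks = re.split(r"\n\s*\n", text)
    # Difference 1: the file must open with the WEBVTT header.
    # (Simplified check; the spec is stricter about what may follow it.)
    if not blocks[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT header")
    cues = []
    for block in blocks[1:]:
        lines = block.split("\n")
        # Difference 3: the identifier is optional, so the timing line
        # is either the first or the second line of the block.
        for i, line in enumerate(lines[:2]):
            match = VTT_TIMING.match(line)  # Difference 2: period, not comma
            if match:
                identifier = lines[0] if i == 1 else None
                body = "\n".join(lines[i + 1:])
                cues.append((identifier, match.group(1), match.group(2), body))
                break
        # Blocks with no timing line (for example NOTE comments) are ignored.
    return cues
```

Passing an SRT file to parse_vtt raises ValueError at the header check, which is a fair approximation of why the two formats are not interchangeable as-is.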

That's the basic shape. The interesting part is what VTT lets you add on top:

Strengths:

Weaknesses:

For HTML5 web video and any workflow involving the browser's <track> element, VTT is the right choice. Detail on accessibility-specific use cases lives in the closed captions vs subtitles guide.

Side-by-side comparison

| Feature | SRT (SubRip) | VTT (WebVTT) |
| --- | --- | --- |
| File extension | .srt | .vtt |
| MIME type | application/x-subrip (de facto) | text/vtt (registered) |
| Header line | None | WEBVTT required at top |
| Timestamp punctuation | Comma: 00:00:01,500 | Period: 00:00:01.500 |
| Cue identifier | Required (sequential integer) | Optional (any string) |
| Encoding | UTF-8 (modern); legacy code pages exist | UTF-8 required |
| Inline styling | Limited (non-standard <b>, <i>) | CSS via ::cue, <c.class>, voice tags |
| Positioning | None | Cue settings (line, position, align, size) |
| Metadata / regions | None | NOTE comments, regions, chapter markers |
| Speaker tagging | None (free-text in caption only) | <v Speaker> voice tags |
| Track kinds | N/A | subtitles, captions, descriptions, chapters, metadata |
| HTML5 <track> support | Not in spec (some browsers tolerate it) | Native, official |
| Video editor support | Universal | Limited; most NLEs prefer SRT |
| Streaming platform input | Universal (YouTube, Vimeo, OTT) | Accepted by most modern platforms; some still convert to SRT |
| Spec status | De facto, no formal spec | W3C Candidate Recommendation (2019) |

When to use SRT vs VTT

Two questions decide it.

Is this for a web page using HTML5 <video> and <track>? Use VTT. The <track> element officially expects WebVTT; SRT works in some browsers via leniency but isn't guaranteed and breaks features like accessibility metadata and chapter markers.

Anything else — a video editor, desktop player, YouTube/Vimeo upload, client deliverable? Use SRT. Universal compatibility, trivial to edit, supported everywhere.

That handles nearly every case. Edge cases:

For other subtitle formats beyond these two — TTML, SCC, SBV, ASS — see the subtitle file formats guide.

Converting between SRT and VTT

Conversion is mostly mechanical. SRT → VTT:

  1. Add WEBVTT followed by a blank line at the top.
  2. Replace every comma in timestamps with a period: 00:00:01,500 becomes 00:00:01.500.
  3. (Optional) Strip the numeric cue identifiers.
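
The SRT → VTT steps above can be sketched in a few lines of Python. This is an illustrative version under simple assumptions (well-formed input, no SRT-specific markup such as <font> tags to clean up); srt_to_vtt and its keep_identifiers flag are names invented for the example.

```python
import re

# Matches an SRT timing line and captures the parts around each comma.
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3})\s+-->\s+(\d{2}:\d{2}:\d{2}),(\d{3})"
)

def srt_to_vtt(srt: str, keep_identifiers: bool = True) -> str:
    out = ["WEBVTT", ""]  # step 1: header followed by a blank line
    lines = srt.lstrip("\ufeff").replace("\r\n", "\n").strip().split("\n")
    for i, line in enumerate(lines):
        is_identifier = (
            line.strip().isdigit()
            and i + 1 < len(lines)
            and SRT_TIMING.match(lines[i + 1]) is not None
        )
        if is_identifier and not keep_identifiers:
            continue  # step 3 (optional): drop the numeric identifier
        if SRT_TIMING.match(line):
            # step 2: comma -> period, on timing lines only, so commas
            # inside caption text are left alone
            line = SRT_TIMING.sub(r"\1.\2 --> \3.\4", line, count=1)
        out.append(line)
    return "\n".join(out) + "\n"
```

Restricting the replacement to timing lines is the one design choice worth noting: a blanket comma-to-period replace across the file would also rewrite punctuation in the captions themselves.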

VTT → SRT is the reverse:

  1. Remove the WEBVTT header and any NOTE/region/styling blocks.
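  2. Replace periods with commas in timestamps, and add a leading 00: wherever the VTT omits the hours field (WebVTT allows MM:SS.mmm; SRT expects HH:MM:SS,mmm).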
  3. Add sequential integer cue identifiers if missing.
  4. Strip cue settings (line:0 position:50%) — SRT doesn't support them.
  5. Strip inline VTT tags (<v>, <c>) — they'll render as literal text in SRT players.
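
The reverse direction, sketched the same way in Python. The assumptions are loud ones: only <v> and <c> tags are stripped (the two named in the list above), any block without a timing line is treated as header, NOTE, region, or styling material and dropped, and timestamps that omit the hours field are padded with 00:. The name vtt_to_srt is hypothetical.

```python
import re

# WebVTT timing line; hours are optional, cue settings may follow.
VTT_TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})"
)
# <v Speaker>, <v.loud Speaker>, <c>, <c.class> and their closing tags.
INLINE_TAG = re.compile(r"</?[vc](?:[ .][^>]*)?>")

def _srt_time(hms: str, ms: str) -> str:
    if hms.count(":") == 1:  # MM:SS form: SRT needs the hours field
        hms = "00:" + hms
    return f"{hms},{ms}"  # step 2: comma before milliseconds

def vtt_to_srt(vtt: str) -> str:
    text = vtt.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        # The timing line is first, or second when an identifier is present.
        idx = next(
            (i for i, line in enumerate(lines[:2]) if VTT_TIMING.match(line)),
            None,
        )
        if idx is None:
            continue  # step 1: header, NOTE, region and style blocks
        m = VTT_TIMING.match(lines[idx])
        # Rebuilding the line from the captured groups drops any trailing
        # cue settings such as "line:0 position:50%" (step 4).
        start = _srt_time(m.group(1), m.group(2))
        end = _srt_time(m.group(3), m.group(4))
        body = [INLINE_TAG.sub("", line) for line in lines[idx + 1:]]  # step 5
        number = len(cues) + 1  # step 3: fresh sequential identifiers
        cues.append(f"{number}\n{start} --> {end}\n" + "\n".join(body))
    return "\n\n".join(cues) + "\n"
```

Existing VTT identifiers are discarded and renumbered because SRT expects sequential integers, and a VTT identifier can be any string.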

For one-off conversions, subtitle editors (Subtitle Edit, Aegisub) handle it via File → Save As. For bulk work, the easier path is to extract directly in the format you need — the download YouTube subtitles how-to covers picking output format upfront, and the YouTube-to-SRT how-to covers SRT specifically. For programmatic conversion, a 20-line script in any language handles both directions — no library required.

Frequently asked questions

Can I use an SRT file inside an HTML5 <video> tag? Not reliably. The HTML spec only recognizes WebVTT for <track>, and WebVTT's parsing rules reject any file that doesn't begin with the WEBVTT signature, so a browser that renders a raw SRT is doing so outside the spec. Don't rely on SRT for web video. Convert to VTT (it's a punctuation change and a header line) and you'll get reliable cross-browser behavior plus access to positioning, styling, and accessibility features.

Why are timestamps formatted differently between the two? Historical accident. SRT inherited the comma decimal separator from European locales (where SubRip originated). WebVTT followed JavaScript and most computing conventions by using a period. Neither is "correct"; they're simply incompatible, and the separator is the first thing every conversion script handles.

Which format does YouTube use internally? YouTube's internal caption storage is proprietary — neither pure SRT nor pure VTT — but the platform exposes captions in both formats (and several others, including JSON3 and TTML) via its timedtext endpoint. Most extraction tools, including SubExtract, request the format you ask for and convert if needed. From an end-user perspective, you can pull either SRT or VTT for the same video; both round-trip cleanly.

Do screen readers care about SRT vs VTT? Yes, for accessibility-grade work. VTT carries explicit metadata that assistive technology can use: track kind (captions vs descriptions), voice tags identifying speakers, and chapter markers. SRT is plain text — a screen reader will read whatever's in the cue, but has no way to distinguish a sound effect label from dialogue, or to know which speaker is talking unless the caption text itself spells it out. For WCAG-compliant captioning and audio description tracks, VTT is the standard.
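
To make the speaker-metadata point concrete, here is a small Python sketch that pulls speaker names out of <v Speaker> voice tags in VTT cue text. It is a simplified illustration, not how any particular screen reader or browser works: the function name speakers is invented, and text inside nested tags within a voice span is not handled.

```python
import re

# <v Name> or <v.class Name>, followed by the text up to the next tag.
VOICE = re.compile(r"<v(?:\.[\w.-]+)?\s+([^>]+)>([^<]*)")

def speakers(cue_text: str) -> list[tuple[str, str]]:
    """Return (speaker, spoken_text) pairs found in one cue's text."""
    return [
        (name.strip(), spoken.strip())
        for name, spoken in VOICE.findall(cue_text)
    ]
```

The same dialogue in SRT, written as "ALICE: Hi there.", returns an empty list here: the speaker exists only as free text, so software has nothing structured to pick up.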

Next steps

If you came here trying to decide which format to extract from a YouTube video, the answer is usually SRT — it's universal, it edits cleanly, and you can convert to VTT in seconds if a web project later needs it. The video captions tool outputs both; the YouTube-to-SRT how-to and the download YouTube subtitles how-to cover the extraction step. For the broader picture — what a transcript actually is, all the formats you might encounter, and how to use them in real workflows — start with the YouTube transcripts guide.
