Steps to convert a YouTube transcript to plain text

1. Get the YouTube URL

Copy the video URL from your browser's address bar. Regular videos, Shorts, and live replays all work. Private and age-gated videos won't extract.
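The URL shapes mentioned above (regular watch pages, Shorts, live replays, and short youtu.be links) all carry the same video ID. The sketch below shows one way to pull that ID out with Python's standard library. The function name video_id and the sample ID are made up for illustration, and this is not a description of how SubExtract itself parses URLs.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}

def video_id(url: str) -> Optional[str]:
    """Return the video ID from a common YouTube URL shape, or None."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]
    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return segments[0] if segments else None
    if host in YOUTUBE_HOSTS:
        # Regular videos: https://www.youtube.com/watch?v=<id>
        if segments[:1] == ["watch"]:
            return parse_qs(parts.query).get("v", [None])[0]
        # Shorts and live replays: /shorts/<id> and /live/<id>
        if len(segments) >= 2 and segments[0] in ("shorts", "live"):
            return segments[1]
    return None

# "abc123XYZ_-" is a placeholder ID, not a real video.
print(video_id("https://www.youtube.com/watch?v=abc123XYZ_-&t=42s"))
print(video_id("https://www.youtube.com/shorts/abc123XYZ_-"))
```

Anything that returns None here is a URL shape this sketch does not recognize, which is a hint to re-copy the address from the browser.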

2. Paste into a transcript extractor

Drop the URL into SubExtract's video captions tool and click Extract. The tool fetches the captions track from YouTube — auto-generated or human-uploaded, whichever exists.

3. Choose plain text output

Switch the output mode to plain text (sometimes labeled "TXT" or "no timestamps"). This strips the SRT block numbering and timestamps, leaving only the spoken content as continuous prose.
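If you already have an .srt file and want to see what this step does, the stripping can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions (well-formed SRT blocks, no styling tags), not the tool's actual implementation. The names srt_to_text and TIMING are hypothetical.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)

def srt_to_text(srt: str) -> str:
    """Drop block numbers and timing lines, then join the rest with spaces."""
    lines = srt.lstrip("\ufeff").replace("\r\n", "\n").split("\n")
    kept = []
    for i, raw in enumerate(lines):
        line = raw.strip()
        if not line or TIMING.match(line):
            continue
        # A bare number counts as a block index only if a timing line follows.
        next_line = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if line.isdigit() and TIMING.match(next_line):
            continue
        kept.append(line)
    return " ".join(kept)

sample = """1
00:00:00,000 --> 00:00:02,500
welcome back to the channel

2
00:00:02,500 --> 00:00:05,000
today we are talking about
captions
"""
print(srt_to_text(sample))
# welcome back to the channel today we are talking about captions
```

Checking that a timing line follows before dropping a number keeps spoken numbers (a caption that is just "2024", say) in the output.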

4. Copy or save as TXT

Click Copy to grab the text into your clipboard, or Download to save a .txt file. Use it directly in a doc, paste it into ChatGPT or Claude, or import it into your notes app.

Plain text vs SRT — when to use which

YouTube captions can be exported in two main formats. They serve different purposes.

Plain text (.txt) strips all timing metadata. You get continuous prose, paragraph-style. Use this when:

Reading or skimming a video as text
Feeding the content to an LLM (ChatGPT, Claude, Gemini)
Drafting a blog post, summary, or show notes
Quoting in research or journalism
Translating long-form content

SRT (.srt) preserves block numbering and HH:MM:SS,mmm timestamps. Use this when:

Adding subtitles to a video editor (Premiere, DaVinci, Final Cut)
Loading captions into a video player (VLC, MPV, web HTML5)
Re-uploading captions to YouTube
Translating subtitles for a video while keeping sync

Quick rule: if you're going to read or process the words, use plain text. If you're going to display them on top of video, use SRT.
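To make the HH:MM:SS,mmm format concrete, here is a small sketch that converts one SRT timestamp into milliseconds, which is the kind of arithmetic an editor or player does to keep captions in sync. The function name to_millis is an illustrative choice.

```python
import re

# Hours, minutes, seconds, then a comma and three millisecond digits.
STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_millis(stamp: str) -> int:
    """Convert an SRT timestamp like "00:01:02,500" to milliseconds."""
    match = STAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis

print(to_millis("00:01:02,500"))  # 62500
```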

Cleaning up the output

Plain text from auto-captions usually needs minor edits before publishing.

Music and sound markers — auto-captions may include [Music], [Applause], or [Laughter]. Find-and-replace these out if you want pure prose.
Sentence breaks — auto-captions don't punctuate well. Run the text through an LLM with a prompt like "fix punctuation and paragraph breaks, keep wording verbatim" if you need readable prose.
Speaker labels — only present if the original captions had them. Auto-generated captions don't separate speakers.
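The find-and-replace for sound markers can also be scripted. The sketch below assumes the markers are exactly the three bracketed words named above. Other markers would need to be added to the pattern, and strip_markers is a hypothetical helper name.

```python
import re

# Bracketed sound markers, matched case-insensitively.
MARKER = re.compile(r"\[(?:music|applause|laughter)\]", re.IGNORECASE)

def strip_markers(text: str) -> str:
    """Remove sound markers and tidy the spaces they leave behind."""
    without = MARKER.sub(" ", text)
    # Collapse runs of spaces or tabs; newlines are left untouched.
    return re.sub(r"[ \t]{2,}", " ", without).strip()

print(strip_markers("[Music] hey everyone [Applause] thanks for coming [Laughter]"))
# hey everyone thanks for coming
```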

Common issues and fixes

Output still has timestamps — make sure you've selected plain text mode, not SRT. Some tools export both; pick the right tab before downloading.

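Encoding looks wrong (mojibake / accented characters broken): the file is almost certainly fine and the viewer is decoding it with the wrong encoding. Most extractors save UTF-8 by default, so if you opened the file in Excel or Notepad and see garbled accents, re-open it with "Open as UTF-8" or use a modern editor (VS Code, Sublime). If you re-save the file, keep it as UTF-8.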
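The garbling is easy to reproduce, which also shows how to undo it. The sketch below assumes the common case where UTF-8 bytes were decoded as Windows-1252. The helper name repair_mojibake is hypothetical, and text that passed through other encodings, or that contains bytes Windows-1252 leaves undefined, will not round-trip this way.

```python
def repair_mojibake(garbled: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252."""
    try:
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake (or already correct): leave it alone.
        return garbled

# Reproduce the problem: UTF-8 bytes read with the wrong encoding.
broken = "café".encode("utf-8").decode("cp1252")
print(broken)                   # cafÃ©
print(repair_mojibake(broken))  # café
```

Text that is already correct fails the round trip and comes back unchanged, so the function is safe to run on files that were never garbled.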

One giant paragraph with no breaks: auto-captions arrive as a continuous stream with no paragraph breaks. Either accept it for LLM input (LLMs generally handle unbroken text fine), or add paragraph breaks yourself or with an LLM pass.

Frequently asked questions

Why use plain text instead of SRT? Plain text is faster to read, easier to paste into other tools, and doesn't confuse LLMs with timestamp noise. SRT is only useful when you need timing data — for video editing or subtitle display.

I downloaded the file and the accented characters look broken — what happened? That's an encoding mismatch. The file is UTF-8 but your viewer is interpreting it as Windows-1252 or similar. Open the file in VS Code, Sublime, or any modern editor and it will display correctly. Avoid opening UTF-8 text files in legacy Notepad on older Windows.

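Can I get the transcript with line breaks per sentence? Most tools return continuous prose. To get one sentence per line, paste into an LLM with the prompt "split into one sentence per line, keep wording exact." Or, in a text editor with regex find-and-replace, use a substitution like s/\. /.\n/g (find a period followed by a space, replace it with a period and a newline). Note that this splits only on periods, not on question or exclamation marks, and it will also break after abbreviations such as "Dr."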
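The same idea as the regex above, written as a short Python sketch that also breaks after question and exclamation marks. It assumes the text is already punctuated, it still splits after abbreviations such as "Dr.", and the function name one_sentence_per_line is illustrative.

```python
import re

def one_sentence_per_line(text: str) -> str:
    """Replace the whitespace after ., ! or ? with a newline."""
    return re.sub(r"(?<=[.!?])\s+", "\n", text.strip())

print(one_sentence_per_line("Welcome back. Did you miss us? Let's go!"))
# Welcome back.
# Did you miss us?
# Let's go!
```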

Can I edit the transcript before saving? Yes — when SubExtract returns the result, the text is shown in a panel where you can paste, edit, and re-copy. For bulk find-and-replace (e.g. removing [Music] markers), edit after download in any text editor.

Does plain text preserve formatting from human-uploaded captions? Mostly. Speaker labels (>> John:) and other markers the creator typed into the caption text carry over as-is. SRT-specific tags (like <i> or position codes) are stripped when converting to TXT, so styling such as italics is lost.
