Why this works
LLMs like ChatGPT and Claude can't watch a YouTube video, but they can read text. The trick is to convert the video's spoken content into clean text the model can process.
YouTube already has captions for most videos (auto-generated or creator-uploaded). Extracting those captions gives you the video's spoken content as text, which the LLM can use directly as context.
Step-by-step
1. Get the video URL
Copy the URL of the public YouTube video.
2. Extract the transcript
Paste the URL into SubExtract. Click Extract Transcript. Wait a few seconds.
3. Choose plain text (no timestamps)
For LLM use, plain text without timestamps is cleanest. Timestamps consume tokens and don't help most analysis tasks.
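If your export already contains timestamps, you can strip them with a small script. The sketch below assumes each caption line may start with a timestamp such as `0:04`, `01:02:03`, or `[00:12]`. Adjust the pattern if your format differs. `strip_timestamps` is a hypothetical helper name, not part of SubExtract.

```python
import re

# Assumed format: an optional leading timestamp like "0:04", "01:02:03" or "[00:12]".
# Only timestamps at the start of a line are removed, so times spoken mid-sentence survive.
LEADING_TIMESTAMP = re.compile(r"^\s*[\[(]?\d{1,2}(?::\d{2}){1,2}[\])]?\s*")


def strip_timestamps(transcript: str) -> str:
    """Drop leading timestamps and join the caption lines into one paragraph."""
    pieces = []
    for line in transcript.splitlines():
        text = LEADING_TIMESTAMP.sub("", line).strip()
        if text:  # skip lines that held only a timestamp
            pieces.append(text)
    # Single spaces between fragments keep the text compact.
    return " ".join(pieces)


raw = "0:00\nHello and welcome.\n0:04 Today we cover tokens.\n[01:02:03] Thanks for watching."
print(strip_timestamps(raw))  # Hello and welcome. Today we cover tokens. Thanks for watching.
```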
4. Copy the transcript
Click "Copy" to get the full text on your clipboard.
5. Paste into ChatGPT or Claude
Open ChatGPT (chatgpt.com) or Claude (claude.ai). Start a new conversation.
Paste the transcript first, then ask your question:
[paste transcript here]
Summarize this video in 5 bullet points.
Or:
[paste transcript here]
What are the key arguments the speaker makes?
Or:
[paste transcript here]
Fact-check the claim in this video about [topic]. Is it accurate?
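If you assemble these messages in a script instead of by hand, a small helper keeps the order consistent: an optional one-line context, then the transcript, then the question. This is a sketch. `build_prompt` and its `topic` parameter are hypothetical names, and the result is simply the text you would paste into the chat box.

```python
from typing import Optional


def build_prompt(transcript: str, question: str, topic: Optional[str] = None) -> str:
    """Assemble one chat message: optional context line, transcript, then the question."""
    parts = []
    if topic:
        # A short description of the content helps the model ground its answer.
        parts.append(f"This is a transcript from a YouTube video on {topic}.")
    parts.append(transcript.strip())
    parts.append(question.strip())
    # Blank lines keep the transcript visually separate from the question.
    return "\n\n".join(parts)


print(build_prompt("Hello and welcome...", "Summarize this video in 5 bullet points.", topic="tokenization"))
```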
Token cost considerations
LLM APIs bill per token, and every model's context window is measured in tokens. Rough rule: 1 minute of speech ≈ 150 words ≈ 200 tokens.
| Video length | Approx. tokens | Fits in |
|---|---|---|
| 5 minutes | ~1,000 | Any free LLM |
| 30 minutes | ~6,000 | Most LLMs (GPT-4, Claude Sonnet, etc.) |
| 90 minutes | ~18,000 | Any model with a 32k+ context window, e.g. Claude (200k) or GPT-4o (128k) |
| 3+ hours | ~36,000+ | Long-context models such as Claude (200k) or Gemini Pro (1M) |
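The table's arithmetic can be written as a quick estimator. This is a rough sketch based on the rule of thumb above (200 tokens per minute, 150 words ≈ 200 tokens). Real tokenizers will give different counts. The 2,000-token `reserve` for your question and the model's reply is an arbitrary placeholder, not a documented limit.

```python
TOKENS_PER_MINUTE = 200  # rule of thumb: 1 minute ≈ 150 words ≈ 200 tokens


def tokens_from_minutes(minutes: float) -> int:
    """Estimate transcript tokens from video length."""
    return round(minutes * TOKENS_PER_MINUTE)


def tokens_from_text(transcript: str) -> int:
    """Estimate tokens from a word count (150 words ≈ 200 tokens)."""
    return round(len(transcript.split()) * 200 / 150)


def fits(tokens: int, context_window: int, reserve: int = 2000) -> bool:
    """True if the transcript plus a reserve for prompt and reply fits the window."""
    return tokens + reserve <= context_window


for minutes in (5, 30, 90, 180):
    estimate = tokens_from_minutes(minutes)
    print(minutes, estimate, fits(estimate, 128_000))
```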
For very long videos, split the transcript into sections and process them one at a time, or add it to a Claude Project or a ChatGPT custom GPT and work through it over several turns.
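You can automate splitting into sections with a greedy chunker. This sketch packs whole sentences into chunks under an estimated token budget. `split_into_chunks` and the default 6,000-token budget are assumptions for illustration. The sentence splitter is naive: auto-generated captions often lack punctuation, so a long unpunctuated run stays in one piece.

```python
import re
from typing import List


def split_into_chunks(transcript: str, max_tokens: int = 6000) -> List[str]:
    """Greedily pack whole sentences into chunks of roughly max_tokens or fewer."""
    if not transcript.strip():
        return []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in sentences:
        tokens = round(len(sentence.split()) * 200 / 150)  # ~200 tokens per 150 words
        # Start a new chunk when this sentence would push us over the budget.
        # A single sentence longer than the budget still becomes its own chunk.
        if current and current_tokens + tokens > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += tokens
    if current:
        chunks.append(" ".join(current))
    return chunks
```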
Use cases
Summarization: "Summarize this video in 200 words."
Outline extraction: "Extract a hierarchical outline of this video's content."
Q&A: "Based on this video, what does the speaker say about [topic]?"
Fact-checking: "Identify any factual claims in this video and rate their accuracy."
Comparative analysis: Paste two transcripts and ask: "Compare these two perspectives on the same topic."
Translation: "Translate this transcript to Spanish, preserving speaker tone."
Repurposing: "Convert this video transcript into a 1500-word blog post."
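Each use case is just a different instruction placed after the transcript, so the prompts can be kept as templates. This is an illustrative sketch. The wording comes from the list above, but the dictionary keys, the placeholder names (`{topic}`, `{language}`, `{words}`), and the helper functions are hypothetical. Labeling both transcripts in the comparison prompt is meant to help the model keep the two sources apart.

```python
from typing import Dict

# Wording from the use-case list; keys and {placeholders} are hypothetical.
TEMPLATES: Dict[str, str] = {
    "summary": "Summarize this video in 200 words.",
    "outline": "Extract a hierarchical outline of this video's content.",
    "qa": "Based on this video, what does the speaker say about {topic}?",
    "fact_check": "Identify any factual claims in this video and rate their accuracy.",
    "translate": "Translate this transcript to {language}, preserving speaker tone.",
    "blog": "Convert this video transcript into a {words}-word blog post.",
}


def use_case_prompt(kind: str, transcript: str, **fields: object) -> str:
    """Transcript first, then the filled-in instruction for the chosen use case."""
    return f"{transcript.strip()}\n\n{TEMPLATES[kind].format(**fields)}"


def compare_prompt(transcript_a: str, transcript_b: str) -> str:
    """Label both transcripts so the model can tell the two sources apart."""
    return (
        f"Transcript A:\n{transcript_a.strip()}\n\n"
        f"Transcript B:\n{transcript_b.strip()}\n\n"
        "Compare these two perspectives on the same topic."
    )


print(use_case_prompt("blog", "Hello and welcome...", words=1500))
```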
Common pitfalls
Including timestamps: wastes tokens, doesn't help most tasks. Strip them.
Pasting without context: tell the LLM what kind of content it is. A line such as "This is a transcript from a YouTube video on [topic]." helps it ground its responses.
Trusting LLM summaries blindly: LLMs can miss nuance and occasionally hallucinate. For high-stakes uses, always verify important claims against the original video.
Long videos in low-context LLMs: if you hit the context limit, summarize sections separately and combine, or use a model with longer context.
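The summarize-then-combine approach for long videos can be sketched as a two-pass loop. `ask_llm` is a hypothetical stand-in for however you call your model, whether that is an API client or you pasting by hand. It takes a prompt string and returns the reply. The chunks can come from any splitter, such as one that packs sentences under a token budget.

```python
from typing import Callable, List


def summarize_long(chunks: List[str], ask_llm: Callable[[str], str]) -> str:
    """Summarize each section separately, then ask for one combined summary."""
    # First pass: one request per section, each well under the context limit.
    partials = [
        ask_llm(f"{chunk}\n\nSummarize this section of a video transcript.")
        for chunk in chunks
    ]
    # Second pass: the partial summaries are short enough to send together.
    combined = "\n\n".join(f"Section {i}: {text}" for i, text in enumerate(partials, start=1))
    return ask_llm(f"{combined}\n\nCombine these section summaries into one summary of the whole video.")


# Usage with a fake model that returns the first 30 characters of each prompt.
print(summarize_long(["First part...", "Second part..."], lambda prompt: prompt[:30]))
```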
Frequently asked questions
Does the transcript include video descriptions or comments? No. The transcript is only the spoken/captioned content. To include descriptions or comments, extract them separately and concatenate them with the transcript.
Can I do this with private or members-only videos? No. Only public videos have accessible captions for extraction.
What's the most token-efficient format? Plain text without timestamps, single space between sentences, no special markup.
Can I automate this end-to-end? For programmatic LLM pipelines, an API-first transcript service is more appropriate. SubExtract is web-UI-first today; the manual paste workflow is fastest for one-off use.
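The sketch below covers the description/comments and token-efficiency questions above by preparing one combined text. It assumes you already have the description and comments as strings; how you obtain them is not covered here. `normalize` and `combine_sources` are hypothetical helper names, and the section labels are plain-text markers rather than special markup.

```python
import re
from typing import Optional, Sequence


def normalize(text: str) -> str:
    """Collapse newlines, tabs and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def combine_sources(
    transcript: str,
    description: Optional[str] = None,
    comments: Sequence[str] = (),
) -> str:
    """Concatenate labeled plain-text sections: description, transcript, comments."""
    sections = []
    if description:
        sections.append("Video description: " + normalize(description))
    sections.append("Transcript: " + normalize(transcript))
    if comments:
        sections.append("Comments: " + " | ".join(normalize(c) for c in comments))
    return "\n\n".join(sections)
```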