Website Crawler
Crawl an entire website and extract the Markdown content from every page.
Who crawls websites to text
- RAG and LLM training data
Crawl an entire docs site or knowledge base into clean Markdown chunks ready to feed into a vector store or fine-tuning pipeline.
- Documentation site to Markdown
Migrate a hosted documentation site to Markdown files for a new platform, static-site generator, or git-based workflow.
- Content audits at scale
Pull every blog post or article from a competitor's site to analyze topic depth, content patterns, and missed angles.
- Archival snapshots
Snapshot a website's full content for archival, citation, or comparison before and after a redesign.
- Search index seeding
Crawl a site once to seed a custom search engine, knowledge graph, or content recommendation system.
How to crawl a website to clean Markdown
- Paste the starting URL
The crawler begins from this URL, which is usually a docs root, blog index, or sitemap entry point.
- Set crawl depth and limits
Choose how deep to crawl from the starting URL and how many total pages to fetch. Defaults are sensible for most sites.
- Run the crawl
SubExtract walks internal links from the starting URL, extracting each page's main content as Markdown. Progress streams live.
- Download the bundle
Get all crawled pages as a single bundled file: one Markdown document per URL, with the source URL preserved as a header.
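The steps above can be sketched in a few lines of Python. This is only an illustration of a depth-limited, page-limited crawl over internal links, not SubExtract's actual implementation. The names `crawl`, `internal_links`, `bundle`, `max_depth`, and `max_pages` are hypothetical, and `fetch` and `to_markdown` are caller-supplied functions (assumptions) standing in for the HTTP download and the HTML-to-Markdown extraction.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(base_url, html):
    """Return absolute, fragment-free links on the same host as base_url."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            links.append(absolute)
    return links


def crawl(start_url, fetch, to_markdown, max_depth=2, max_pages=50):
    """Breadth-first crawl; returns {url: markdown} in visit order.

    fetch(url) returns the page HTML or None on failure (hypothetical helper).
    to_markdown(html) returns the page's main content (hypothetical helper).
    """
    queue = deque([(start_url, 0)])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        pages[url] = to_markdown(html)
        if depth >= max_depth:
            continue  # depth limit reached: keep the page, do not follow links
        for link in internal_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages


def bundle(pages):
    """Join pages into one file, each preceded by its source URL as a header."""
    return "\n\n".join(f"# Source: {url}\n\n{md}" for url, md in pages.items())
```

Calling `bundle(crawl(start_url, fetch, to_markdown))` would then produce a single text file. Breadth-first order is chosen here so that a low page limit keeps the pages closest to the starting URL; a production crawler would also need politeness delays, robots.txt handling, and error reporting, which this sketch omits.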