Website Crawler

Crawl an entire website and extract each page's content as Markdown.

Each page crawled costs 1 credit. Starting a job costs 1 additional credit.
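As a worked example of the pricing above, the total for a job is the number of pages crawled plus one credit for starting it. The function name `estimate_credits` is hypothetical and is not part of any SubExtract API; it only restates the arithmetic.

```python
def estimate_credits(pages_crawled: int) -> int:
    """Credits for one crawl job: 1 per page crawled, plus 1 to start the job."""
    if pages_crawled < 0:
        raise ValueError("pages_crawled must be non-negative")
    return pages_crawled + 1


print(estimate_credits(25))   # 25 pages + 1 job credit = 26
print(estimate_credits(100))  # 100 pages + 1 job credit = 101
```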

Who crawls websites to extract text

  • RAG and LLM training data

    Crawl an entire docs site or knowledge base into clean Markdown chunks ready to feed into a vector store or fine-tuning pipeline.

  • Documentation site to Markdown

    Migrate a hosted documentation site to Markdown files for a new platform, static-site generator, or git-based workflow.

  • Content audits at scale

    Pull every blog post or article from a competitor's site to analyze topic depth, content patterns, and missed angles.

  • Archival snapshots

    Snapshot a website's full content for archival, citation, or comparison before and after a redesign.

  • Search index seeding

    Crawl a site once to seed a custom search engine, knowledge graph, or content recommendation system.
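For the RAG use case above, one way to turn a crawled page into chunks is to split its Markdown at headings. This is a minimal sketch under assumptions: the helper name `split_markdown_by_heading` and the sample page are made up for illustration, and the chunking strategy is a common generic one, not something SubExtract is documented to do.

```python
import re

# ATX-style Markdown heading: one to six '#' characters, then the title.
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$")


def _append_chunk(chunks, title, body):
    """Store a section unless it is an empty preamble with no heading."""
    text = "\n".join(body).strip()
    if title is not None or text:
        chunks.append({"title": title, "text": text})


def split_markdown_by_heading(markdown):
    """Split a Markdown document into one chunk per heading section."""
    chunks = []
    title, body = None, []
    in_fence = False
    for line in markdown.splitlines():
        # Ignore '#' lines inside fenced code blocks (e.g. shell comments).
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            _append_chunk(chunks, title, body)
            title, body = match.group(1), []
        else:
            body.append(line)
    _append_chunk(chunks, title, body)
    return chunks


# Hypothetical page content, for illustration only.
page = """# Install
Run the installer.

## Configure
Edit the config file.
"""

for chunk in split_markdown_by_heading(page):
    print(chunk["title"], "->", chunk["text"])
```

Splitting at headings keeps each chunk topically coherent; very long sections may still need a further size-based split before embedding.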

How to crawl a website to clean Markdown

  1. Paste the starting URL

    The crawler begins from this URL. It is usually a docs root, blog index, or sitemap entry point.

  2. Set crawl depth and limits

    Choose how deep to crawl from the starting URL and how many total pages to fetch. The default settings work for most sites.

  3. Run the crawl

    SubExtract walks internal links from the starting URL, extracting each page's main content as Markdown. Progress streams live.

  4. Download the bundle

    Get all crawled pages as a single bundled file: one Markdown document per URL, with the source URL preserved as a header.
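Steps 2 and 3 above (a depth limit, a page limit, and a walk over internal links) can be sketched as a breadth-first traversal. This is an illustration under stated assumptions, not SubExtract's actual implementation, which the page does not describe. The `crawl` function, the `get_links` callable (a stand-in for fetching a page and reading its links), the default limits, and the `example.com` URLs are all hypothetical.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url, get_links, max_depth=2, max_pages=10):
    """Breadth-first walk of internal links, bounded by depth and page count.

    get_links(url) is a hypothetical callable returning the hrefs found on
    a page. The default limits are placeholders, not SubExtract's defaults.
    """
    start_url = urldefrag(start_url).url
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for href in get_links(url):
            # Resolve relative links and drop '#fragment' so that
            # /docs/api and /docs/api#auth count as the same page.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc != start_host:
                continue  # skip external links
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# A made-up in-memory site so the sketch runs without network access.
SITE = {
    "https://example.com/docs/": ["intro", "/docs/api", "https://other.example.org/"],
    "https://example.com/docs/intro": ["/docs/api#auth", "/docs/deep"],
    "https://example.com/docs/api": [],
    "https://example.com/docs/deep": ["/docs/deeper"],
    "https://example.com/docs/deeper": [],
}

for page_url in crawl("https://example.com/docs/", lambda u: SITE.get(u, []), max_depth=1):
    print(page_url)
```

Breadth-first order means pages closest to the starting URL are fetched first, so when the page limit cuts the crawl short, the pages that are dropped are the deepest ones.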
