Build-time markdown mirrors for agent readability: how they compare to Cloudflare's approach
When an AI agent visits your website, it gets HTML. On some sites that is fine. On JS-heavy or layout-heavy pages, the content is buried in noise. Build-time markdown mirrors can give agents a cleaner fetch target without changing the canonical HTML page.
Not every site needs a markdown mirror
If your pages already ship substantial, well-structured HTML, the raw page may already be a good enough fetch target for agents. Markdown mirrors are most useful when the raw HTML is thin, heavily templated, or dominated by layout chrome.
That is the more honest framing for this feature: markdown mirrors are an optional machine-facing artifact for the pages that benefit from them, not a blanket rule that every site should publish a public .md companion for every page.
The problem: some HTML is a bad fetch target
Many agents can extract useful text from HTML, but the quality of the result still depends on what your raw response looks like. A typical web page can be heavy with navigation, cookie banners, analytics tags, scripts, and layout wrappers that have nothing to do with the main body content.
When the raw HTML is mostly shell and very little body content, fetch-based agents either miss the important text or have to guess too much. That is the case markdown mirrors try to fix.
What are markdown mirrors?
A markdown mirror is a .md file that contains the same content as your HTML page, but stripped of layout, navigation, and scripts. Just the content, in clean markdown format.
For example, /blog/my-post/index.html gets a companion file at /blog/my-post.md. An AI agent can fetch the markdown version directly instead of parsing the HTML.
Your pages also get a `<link rel="alternate" type="text/markdown">` tag in the HTML head, so crawlers can discover the markdown version automatically when you enable the feature.
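As a sketch of how an agent could use that discovery path, the helper below (hypothetical, not part of agentmarkup's API) first looks for the alternate link tag and falls back to the `/blog/my-post/` → `/blog/my-post.md` naming convention described above:

```typescript
// Hypothetical discovery helper, not part of agentmarkup.
// Prefers the <link rel="alternate" type="text/markdown"> tag;
// falls back to the /path/ -> /path.md convention.
// The regex assumes rel/type/href appear in that order -- a sketch, not robust parsing.
function discoverMarkdownUrl(html: string, pageUrl: string): string {
  const match = html.match(
    /<link[^>]*rel="alternate"[^>]*type="text\/markdown"[^>]*href="([^"]+)"/
  );
  if (match) return new URL(match[1], pageUrl).toString();
  // Convention fallback: strip the trailing slash and append .md
  const url = new URL(pageUrl);
  url.pathname = url.pathname.replace(/\/$/, "") + ".md";
  return url.toString();
}
```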
How agentmarkup generates markdown mirrors
Enable the feature in your config and it runs at build time on every HTML page in your output:
```ts
// vite.config.ts or astro.config.mjs
agentmarkup({
  site: 'https://example.com',
  name: 'My Site',
  markdownPages: {
    enabled: true,
  },
})
```

The converter:
- Extracts the page title, meta description, and canonical URL from the HTML head
- Finds the main content area (`<main>`, `<article>`, or `<body>`)
- Strips navigation, headers, footers, sidebars, scripts, styles, SVGs, and forms
- Converts headings, lists, links, bold, italic, code, and blockquotes to markdown syntax
- Preserves code blocks intact
- Normalizes whitespace and deduplicates the page title
- Injects a `<link rel="alternate">` tag into the HTML for discovery
The result is a clean markdown file that an agent can read without wading through layout chrome.
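The strip-then-convert idea can be illustrated with a deliberately naive sketch. This is not agentmarkup's implementation (which presumably walks the DOM); it is a regex-based toy that shows the shape of the pipeline: find the main content area, strip chrome elements, convert a few common tags, then drop the rest:

```typescript
// Naive illustration of the strip-then-convert pipeline.
// Regex-based for brevity; a real converter parses the DOM.
function htmlToMarkdownSketch(html: string): string {
  // Prefer <main>/<article> content, else fall back to the whole input.
  const main =
    html.match(/<(?:main|article)[^>]*>([\s\S]*?)<\/(?:main|article)>/)?.[1] ??
    html;
  return main
    // Strip layout chrome and non-content elements wholesale.
    .replace(/<(nav|header|footer|aside|script|style|svg|form)[\s\S]*?<\/\1>/g, "")
    // Convert the most common block and inline elements.
    .replace(/<h1[^>]*>([\s\S]*?)<\/h1>/g, "# $1\n")
    .replace(/<h2[^>]*>([\s\S]*?)<\/h2>/g, "## $1\n")
    .replace(/<(?:strong|b)>([\s\S]*?)<\/(?:strong|b)>/g, "**$1**")
    .replace(/<p[^>]*>([\s\S]*?)<\/p>/g, "$1\n")
    // Drop any remaining tags and normalize whitespace.
    .replace(/<[^>]+>/g, "")
    .replace(/\n{2,}/g, "\n")
    .trim();
}
```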
Cloudflare's approach: runtime readability extraction
Cloudflare offers a readability extraction feature that strips HTML to readable content at request time. It is based on Mozilla's Readability library and runs on Cloudflare's edge network.
The key difference is runtime versus build time. Cloudflare processes pages on every request. You do not control the exact output. The extraction algorithm decides what is content and what is noise using heuristics.
Build-time vs runtime: why it matters
| | agentmarkup (build-time) | Cloudflare (runtime) |
|---|---|---|
| When it runs | Once, during build | Every request |
| Output control | You see the .md files in your build output | Opaque, algorithm decides |
| Consistency | Deterministic, same output every build | May vary with algorithm updates |
| Performance cost | Zero runtime cost | Added latency per request |
| Works with SPAs | Yes, uses noscript fallback or pre-rendered HTML | Depends on SSR availability |
| Discovery | Link tag in HTML head + static .md URL | Special URL parameter or header |
| Vendor lock-in | None, output is static files | Requires Cloudflare |
| Customization | Choose which pages, preserve existing .md files | All or nothing |
Why build-time can be a good fit for your own content
Cloudflare's runtime extraction makes sense for consuming other people's content, like a reader mode. For your own website, build-time generation can be a better fit because:
- You control the output. If the markdown is wrong, you can debug it. You see the actual .md files in your build directory.
- It works with client-rendered apps. agentmarkup checks for noscript fallback content in SPAs and uses it when the rendered body is thin. Runtime extractors often get empty content from JavaScript-rendered pages.
- No vendor dependency. The markdown files are static. Deploy them anywhere. They work on Cloudflare Pages, Netlify, Vercel, S3, or any static host.
- Integrated with the rest of the stack. Markdown mirrors work alongside llms.txt, JSON-LD, and robots.txt. One config, one build, everything consistent.
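The noscript fallback mentioned above can be sketched roughly as follows. The threshold and function name are assumptions for illustration, not agentmarkup's actual values:

```typescript
// Sketch of the SPA fallback idea: if the rendered body text is too thin,
// use the prerendered <noscript> content instead.
const MIN_BODY_CHARS = 200; // hypothetical cutoff, not agentmarkup's real value

function pickContentSource(bodyText: string, noscriptHtml: string | null): string {
  if (bodyText.trim().length >= MIN_BODY_CHARS || !noscriptHtml) {
    return bodyText;
  }
  // Fall back to the noscript markup, stripped of tags.
  return noscriptHtml.replace(/<[^>]+>/g, "").trim();
}
```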
How agentmarkup reduces the downside
Public markdown mirrors do create tradeoffs. The main risks are duplicate fetches, indexing ambiguity, and output drift if the markdown becomes a second source of truth.
agentmarkup tries to keep those risks contained by generating the mirrors from the final built HTML, preserving HTML as the canonical page, and pointing each .md file's canonical reference back to its HTML route. If your raw HTML is already substantial, you can also keep llms.txt pointing at HTML by setting `llmsTxt.preferMarkdownMirrors` to `false`.
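For example, building on the config shown earlier (the exact shape of the `llmsTxt` block is assumed from the option name):

```ts
// Keep llms.txt linking to HTML even with markdown mirrors enabled.
agentmarkup({
  site: 'https://example.com',
  name: 'My Site',
  markdownPages: { enabled: true },
  llmsTxt: {
    preferMarkdownMirrors: false, // llms.txt entries stay on the HTML pages
  },
})
```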
What the output looks like
For a blog post with a title, description, headings, and paragraphs, the generated markdown looks like:
```md
# Why llms.txt matters

> LLMs answer questions by synthesizing web content. llms.txt gives them a structured overview.

Source: https://example.com/blog/why-llms-txt-matters/

## The shift from search engines to AI answers

For two decades, the path to online visibility was clear: optimize for Google...

## What is llms.txt?

llms.txt is a proposed standard that gives LLMs a structured overview of your website...
```

Clean, readable, no HTML artifacts. An AI agent reading this file understands the page quickly.
Getting started
Add `markdownPages: { enabled: true }` to your agentmarkup config when your raw HTML needs a cleaner machine-facing fetch path. On the next build, every HTML page in your output gets a companion .md file. When markdown mirrors are enabled, same-site page entries in llms.txt also default to the generated markdown URLs so cold agents discover the cleaner fetch path first. Check the llms.txt guide for the opt-out if you want HTML-first links instead.
If your site already serves rich raw HTML, you do not need to treat markdown mirrors as mandatory. They are a tactical option, not the whole product.
```sh
pnpm add -D @agentmarkup/vite # or @agentmarkup/astro
```