By Sebastian Cochinescu · July 2, 2026 · 6 min read

See your website the way AI crawlers do

Most tools that grade a website fetch it once, as a browser, and score the HTML. But the systems that increasingly decide whether your brand shows up in an answer (ChatGPT, Claude, Perplexity, Google's AI surfaces) do not arrive as a browser. They identify themselves with crawler user-agents and robots.txt tokens such as GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and Google-Extended, and they can get a very different response. @agentmarkup/audit shows you that response.

The blind spot

A page can look perfect in your browser and still be a poor citation target for AI. The homepage might be an empty JavaScript shell that only fills in after a framework hydrates, so a crawler that does not run JavaScript sees nothing. A CDN or WAF rule might treat a crawler user-agent differently than a browser. There might be no llms.txt, or a malformed one. The JSON-LD that powers rich results and AI summaries might be missing or broken. None of that is visible from a single browser fetch.
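
To make the empty-shell case concrete: a crawler that does not execute JavaScript only gets the text that is already in the raw HTML. One rough way to approximate that view has four steps. Drop the head, scripts, styles, and comments. Strip the remaining tags. Collapse whitespace. Measure what text is left. The sketch below follows those steps. `visibleTextLength`, `looksLikeEmptyShell`, and the 200-character threshold are illustrative assumptions, not how @agentmarkup/audit actually decides.

```typescript
// Hypothetical heuristic, not @agentmarkup/audit's real check:
// approximate what a non-JavaScript crawler can read from raw HTML.
function visibleTextLength(html: string): number {
  const text = html
    .replace(/<head\b[^>]*>[\s\S]*?<\/head>/gi, " ") // <title>, meta tags: not page content
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ") // JS source never renders as text
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, " ") // CSS is not content either
    .replace(/<!--[\s\S]*?-->/g, " ") // HTML comments
    .replace(/<[^>]+>/g, " ") // every remaining tag, including <!doctype>
    .replace(/&nbsp;/gi, " ")
    .replace(/\s+/g, " ")
    .trim();
  return text.length;
}

// Assumed threshold: fewer than 200 visible characters counts as an empty shell.
function looksLikeEmptyShell(html: string, minChars = 200): boolean {
  return visibleTextLength(html) < minChars;
}

// A typical client-rendered page before hydration.
const spaShell =
  '<!doctype html><html><head><title>App</title></head>' +
  '<body><div id="root"></div><script src="/assets/app.js"></script></body></html>';
console.log(looksLikeEmptyShell(spaShell)); // true: nothing readable before hydration
```

A regex pass like this is deliberately crude and will not cope with every malformed page. It is still enough to tell an empty `<div id="root"></div>` apart from server-rendered content.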

What the audit does

It fetches your URL once as a normal browser to establish a baseline, then again as each major AI crawler, and diffs the responses. On top of that it checks the machine-readable surface: robots.txt intent, Content-Signal, llms.txt, JSON-LD, and whether the raw HTML is actually readable without JavaScript.

# Audit any live URL as the major AI crawlers
npx @agentmarkup/audit https://example.com

# JSON for CI or comparisons
npx @agentmarkup/audit https://example.com --json

A run reads like this:

✓ OpenAI gptbot can reach the page
✓ Anthropic claudebot can reach the page
✓ Content is present without JavaScript
⚠ llms.txt is missing
✓ robots.txt does not block the expected AI crawlers
✓ JSON-LD structured data present

9/10 checks passed
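
The comparison described above fetches a browser baseline first, then each crawler, and diffs the results. It can be sketched in a few lines. This is a minimal illustration under stated assumptions. `Snapshot`, `diffAgainstBaseline`, and the "less than half the baseline body" threshold are all hypothetical. Fetching is left to the caller so the comparison itself stays pure.

```typescript
// Hypothetical shape: one snapshot per fetch (browser baseline or a crawler user-agent).
interface Snapshot {
  label: string; // e.g. "browser", "GPTBot"
  status: number; // final HTTP status
  bodyLength: number; // length of the response body
}

interface Difference {
  label: string;
  reasons: string[];
}

// Compare every crawler snapshot with the browser baseline. The 0.5 ratio is an
// assumed threshold: a body under half the baseline size is worth a closer look.
function diffAgainstBaseline(
  baseline: Snapshot,
  crawlers: Snapshot[],
  shrinkRatio = 0.5,
): Difference[] {
  const differences: Difference[] = [];
  for (const crawler of crawlers) {
    const reasons: string[] = [];
    if (crawler.status !== baseline.status) {
      reasons.push(`status ${crawler.status} vs baseline ${baseline.status}`);
    }
    if (crawler.bodyLength < baseline.bodyLength * shrinkRatio) {
      reasons.push(`body length ${crawler.bodyLength} vs baseline ${baseline.bodyLength}`);
    }
    if (reasons.length > 0) {
      differences.push({ label: crawler.label, reasons });
    }
  }
  return differences;
}

const baseline: Snapshot = { label: "browser", status: 200, bodyLength: 48000 };
const diffs = diffAgainstBaseline(baseline, [
  { label: "GPTBot", status: 200, bodyLength: 47500 },
  { label: "ClaudeBot", status: 403, bodyLength: 1200 },
]);
console.log(diffs); // only ClaudeBot differs: a new status and a much shorter body
```

In Node 18 or later, each snapshot could come from the built-in `fetch` with a `user-agent` header set to the crawler's string.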

Honest by design

Here is the part that makes the audit trustworthy rather than alarmist. The audit spoofs a crawler's user-agent from an ordinary IP address, whereas the real, verified bot connects from its operator's own network. So a 403 for a spoofed GPTBot user-agent is genuinely ambiguous. It could come from a user-agent WAF rule that also blocks the real GPTBot. It could equally come from IP allowlisting that lets the verified GPTBot through just fine. A spoofed request cannot tell those two cases apart. The audit therefore reports them as warnings with both explanations and the raw evidence, never as a bare "your site blocks AI" error.
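
One way to encode that rule is to make a blocked spoofed fetch produce a warning, never an error. The warning's message carries both explanations, and the raw status goes along as evidence. The finding shape and `classifySpoofedFetch` below are hypothetical; the audit's real `--json` output may be structured differently.

```typescript
// Hypothetical finding shape; the audit's real JSON output may differ.
type Severity = "pass" | "warning" | "error";

interface Finding {
  check: string;
  severity: Severity;
  message: string;
  evidence: string;
}

// A spoofed crawler user-agent sent from an ordinary IP cannot prove how the
// verified bot is treated, so any 4xx/5xx becomes a warning, never an error.
function classifySpoofedFetch(crawler: string, status: number, server?: string): Finding {
  const check = `${crawler.toLowerCase()}-reachable`;
  const evidence = `HTTP ${status}` + (server ? ` (server: ${server})` : "");
  if (status < 400) {
    return { check, severity: "pass", message: `${crawler} can reach the page`, evidence };
  }
  return {
    check,
    severity: "warning",
    message:
      `A spoofed ${crawler} user-agent got HTTP ${status}. Either a user-agent WAF rule ` +
      `also blocks the real ${crawler}, or the site allowlists verified ${crawler} IPs ` +
      `and only this spoofed request was refused.`,
    evidence,
  };
}

console.log(classifySpoofedFetch("GPTBot", 403, "cloudflare").severity); // warning
```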

Error-level findings, the ones that fail CI, are reserved for things provable from the response itself: a robots.txt that literally disallows the crawler, an empty JavaScript shell, or invalid llms.txt / JSON-LD. That is why the exit code is safe to gate a build on.
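
The robots.txt case is provable because the file states its intent directly. Below is a simplified matcher in the spirit of RFC 9309, the Robots Exclusion Protocol. It groups rules by user-agent and falls back to the `*` group. It then lets the longest matching path win, with Allow winning ties. The sketch also shows how an error-only exit code keeps warnings from failing CI. It is an illustration under stated assumptions. `isDisallowed` and `exitCodeFor` are hypothetical names, and `*`/`$` path wildcards are not supported.

```typescript
// Simplified robots.txt matching in the spirit of RFC 9309. Assumptions: exact,
// case-insensitive user-agent tokens and plain path prefixes (no * or $ wildcards).
type Severity = "pass" | "warning" | "error";

interface Rule {
  allow: boolean;
  path: string; // "" is an empty Allow/Disallow, which matches nothing
}

interface Group {
  agents: string[];
  rules: Rule[];
}

function parseGroups(robotsTxt: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; any rule line closes it.
      if (current === null || current.rules.length > 0) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
    } else if ((field === "allow" || field === "disallow") && current !== null) {
      current.rules.push({ allow: field === "allow", path: value });
    }
  }
  return groups;
}

// True when robots.txt disallows `path` for the crawler `token` (e.g. "GPTBot").
function isDisallowed(robotsTxt: string, token: string, path = "/"): boolean {
  const groups = parseGroups(robotsTxt);
  const wanted = token.toLowerCase();
  let matching = groups.filter((g) => g.agents.includes(wanted));
  if (matching.length === 0) matching = groups.filter((g) => g.agents.includes("*"));

  // Longest matching path wins; on a tie, Allow wins.
  let best: Rule | null = null;
  for (const group of matching) {
    for (const rule of group.rules) {
      if (rule.path === "" || !path.startsWith(rule.path)) continue;
      if (
        best === null ||
        rule.path.length > best.path.length ||
        (rule.path.length === best.path.length && rule.allow)
      ) {
        best = rule;
      }
    }
  }
  return best !== null && !best.allow;
}

// Only errors fail the build; warnings are reported but still exit 0.
function exitCodeFor(severities: Severity[]): number {
  return severities.includes("error") ? 1 : 0;
}

const robots = [
  "User-agent: GPTBot",
  "Disallow: /",
  "",
  "User-agent: ClaudeBot",
  "Disallow:",
  "",
  "User-agent: *",
  "Disallow: /admin",
].join("\n");
console.log(isDisallowed(robots, "GPTBot")); // true: a provable, error-level finding
```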

Where it fits

The agentmarkup adapters and the CLI generate machine-readable output at build time. @agentmarkup/audit verifies what a deployed site actually serves to AI crawlers. It is the command-line sibling of the hosted website checker: the checker is the quick browser lookup, and the audit is the scriptable, CI-friendly version.

Read the audit guide for the full check list, then use the llms.txt, JSON-LD, and AI crawlers guides to fix whatever it surfaces.
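
Of the failures those guides help fix, the JSON-LD one is easy to reproduce locally. JSON-LD lives in `<script type="application/ld+json">` elements, so "missing" means there is no such element and "invalid" means its contents do not parse as JSON. The sketch below rests on two assumptions. The first is that a regex is good enough to find the script tags. The second is that a missing `@context` counts as invalid, which is stricter than JSON-LD itself. `checkJsonLd` is a hypothetical name, not part of any agentmarkup package.

```typescript
// Hypothetical check, not part of any agentmarkup package. Assumes a regex is
// good enough to find <script type="application/ld+json"> elements.
type JsonLdStatus = "missing" | "invalid" | "valid";

interface JsonLdResult {
  status: JsonLdStatus;
  blocks: number;
  errors: string[];
}

function checkJsonLd(html: string): JsonLdResult {
  const pattern =
    /<script\b[^>]*type\s*=\s*["']?application\/ld\+json["']?[^>]*>([\s\S]*?)<\/script>/gi;
  const errors: string[] = [];
  let blocks = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    blocks += 1;
    try {
      const data: unknown = JSON.parse(match[1]);
      const items: unknown[] = Array.isArray(data) ? data : [data];
      for (const item of items) {
        // Stricter than JSON-LD itself: require @context so consumers know the vocabulary.
        if (typeof item !== "object" || item === null || !("@context" in item)) {
          errors.push(`block ${blocks}: missing @context`);
        }
      }
    } catch (err) {
      errors.push(`block ${blocks}: ${(err as Error).message}`); // JSON syntax error
    }
  }
  const status: JsonLdStatus =
    blocks === 0 ? "missing" : errors.length > 0 ? "invalid" : "valid";
  return { status, blocks, errors };
}

const page =
  '<script type="application/ld+json">' +
  '{"@context":"https://schema.org","@type":"Organization","name":"Example"}' +
  "</script>";
console.log(checkJsonLd(page).status); // valid
```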

Make your website machine-readable

agentmarkup is an open-source build-time toolkit for Vite, Astro, Next.js, and Nuxt, plus a framework-agnostic CLI. It generates llms.txt, injects JSON-LD structured data, manages AI crawler robots.txt rules, and patches optional Content-Signal and canonical mirror headers. When raw pages need a cleaner agent-facing fetch path, it can also create optional markdown mirrors from the final HTML. Everything is validated at build time, with zero runtime cost.


pnpm add -D @agentmarkup/vite  # or @agentmarkup/astro, @agentmarkup/next, @agentmarkup/nuxt, @agentmarkup/cli

Written by

Sebastian Cochinescu · Developer of agentmarkup

Builder of developer tools for machine-readable websites. Developer of agentmarkup. Founder of Anima Felix.
