# See your website the way AI crawlers do - agentmarkup

> Use @agentmarkup/audit to fetch any live URL as GPTBot, ClaudeBot, PerplexityBot, and other AI crawlers, diff each response against a browser, and catch machine-readability issues in CI.

Source: https://agentmarkup.dev/blog/audit-ai-crawler-access/

By [Sebastian Cochinescu](/authors/sebastian-cochinescu/) · July 2, 2026 · 6 min read

Most tools that grade a website fetch it once, as a browser, and score the HTML. But the systems that increasingly decide whether your brand shows up in an answer (ChatGPT, Claude, Perplexity, Google's AI surfaces) do not arrive as your browser. They arrive as GPTBot, ClaudeBot, PerplexityBot, and OAI-SearchBot, or answer to `robots.txt` tokens such as Google-Extended, and they can get a very different response. `@agentmarkup/audit` shows you that response.

## The blind spot

A page can look perfect in your browser and still be a poor citation target for AI. The homepage might be an empty JavaScript shell that only fills in after a framework hydrates, so a crawler that does not run JavaScript sees nothing. A CDN or WAF rule might treat a crawler user-agent differently than a browser. There might be no `llms.txt`, or a malformed one. The JSON-LD that powers rich results and AI summaries might be missing or broken. None of that is visible from a single browser fetch.
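The empty-shell case is the easiest one to picture in code. The sketch below is a hypothetical TypeScript illustration, not the audit's actual implementation: `visibleTextWithoutJs`, `looksLikeEmptyShell`, and the 200-character threshold are all assumptions, and regex stripping is only a rough stand-in for a real HTML parser.

```typescript
// Hypothetical helper (not part of @agentmarkup/audit): approximates the text
// a client would see in the raw HTML if it never executed any JavaScript.
function visibleTextWithoutJs(html: string): string {
  return html
    // Drop elements whose contents are not rendered as page text.
    .replace(/<(script|style|template)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Drop the remaining tags but keep the text between them.
    .replace(/<[^>]+>/g, " ")
    // Collapse runs of whitespace.
    .replace(/\s+/g, " ")
    .trim();
}

// Assumed threshold for "too little text to be a real page".
const MIN_READABLE_CHARS = 200;

function looksLikeEmptyShell(html: string): boolean {
  return visibleTextWithoutJs(html).length < MIN_READABLE_CHARS;
}

// A typical client-rendered shell: a mount point and a script tag.
const shell = `<!doctype html><html><head><title>App</title></head>
<body><div id="root"></div><script src="/assets/app.js"></script></body></html>`;

console.log(looksLikeEmptyShell(shell)); // true
```

The only text that survives in the shell is the page title, which is roughly what a crawler that does not run JavaScript has to work with.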

## What the audit does

It fetches your URL once as a normal browser to establish a baseline, then again as each major AI crawler, and diffs the responses. On top of that it checks the machine-readable surface: `robots.txt` intent, Content-Signal, `llms.txt`, JSON-LD, and whether the raw HTML is actually readable without JavaScript.

```
# Audit any live URL as the major AI crawlers
npx @agentmarkup/audit https://example.com

# JSON for CI or comparisons
npx @agentmarkup/audit https://example.com --json
```

A run reads like this:

```
✓ OpenAI gptbot can reach the page
✓ Anthropic claudebot can reach the page
✓ Content is present without JavaScript
⚠ llms.txt is missing
✓ robots.txt does not block the expected AI crawlers
✓ JSON-LD structured data present

9/10 checks passed
```
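To make the baseline-and-diff idea concrete, here is a minimal TypeScript sketch. It assumes each fetch has already been reduced to a small snapshot; the `Snapshot` and `Diff` shapes, the `diffAgainstBaseline` function, and the 50% text threshold are hypothetical and do not describe the tool's real internals or its JSON output.

```typescript
// Hypothetical shapes for one fetched response and its comparison result.
interface Snapshot {
  userAgent: string;
  status: number;
  textLength: number; // readable characters found in the raw HTML
}

interface Diff {
  userAgent: string;
  statusChanged: boolean;
  textRatio: number; // crawler text relative to the browser baseline
  suspicious: boolean;
}

// Compare one crawler response against the browser baseline.
function diffAgainstBaseline(baseline: Snapshot, crawler: Snapshot): Diff {
  const statusChanged = baseline.status !== crawler.status;
  const textRatio =
    baseline.textLength === 0 ? 1 : crawler.textLength / baseline.textLength;
  return {
    userAgent: crawler.userAgent,
    statusChanged,
    textRatio,
    // Assumed rule: a different status, or under half the text, is worth a look.
    suspicious: statusChanged || textRatio < 0.5,
  };
}

// Made-up sample data for illustration only.
const browser: Snapshot = { userAgent: "browser", status: 200, textLength: 4000 };
const crawlers: Snapshot[] = [
  { userAgent: "GPTBot", status: 403, textLength: 120 },
  { userAgent: "ClaudeBot", status: 200, textLength: 3900 },
];

for (const crawler of crawlers) {
  const d = diffAgainstBaseline(browser, crawler);
  console.log(
    d.userAgent,
    d.suspicious ? "differs from the browser baseline" : "matches the browser baseline",
  );
}
```

Comparing against a same-run browser fetch, rather than an absolute expectation, keeps ordinary site quirks from showing up as crawler-specific problems.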

## Honest by design

Here is the part that makes the audit trustworthy rather than alarmist. It spoofs a crawler's **user-agent** from an ordinary IP, which is not what the real, verified bot does. So a `403` for a spoofed `GPTBot` user-agent is genuinely ambiguous: it could be a user-agent WAF rule that also blocks the real GPTBot, or it could be IP allowlisting where the verified GPTBot is let through just fine. The audit cannot tell those apart from a spoofed request, so it reports them as **warnings with both explanations and the raw evidence**, never as a bare "your site blocks AI" error.
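One way to picture that reporting rule is a finding that carries both readings together with the evidence. The TypeScript below is a hypothetical sketch: the `SpoofWarning` shape and the `explainSpoofedBlock` function are illustrative names, not the audit's real output format.

```typescript
// Hypothetical finding shape: a warning with both explanations and raw evidence.
interface SpoofWarning {
  level: "warning";
  userAgent: string;
  explanations: string[];
  evidence: { status: number };
}

// A 403 to a spoofed user-agent cannot be resolved from the request alone,
// so it becomes a warning that lists both possibilities, never an error.
function explainSpoofedBlock(userAgent: string, status: number): SpoofWarning | null {
  if (status !== 403) return null;
  return {
    level: "warning",
    userAgent,
    explanations: [
      "A user-agent rule (CDN or WAF) may block this crawler, including the real one.",
      "The site may allowlist the crawler's verified IPs, so only the spoofed request is refused.",
    ],
    evidence: { status },
  };
}

console.log(JSON.stringify(explainSpoofedBlock("GPTBot", 403), null, 2));
```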

Error-level findings, the ones that fail CI, are reserved for things provable from the response itself: a `robots.txt` that literally disallows the crawler, an empty JavaScript shell, or invalid `llms.txt` / JSON-LD. That is why the exit code is safe to gate a build on.
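As an illustration of a finding that is provable from the response itself, here is a simplified TypeScript check for a `robots.txt` group that disallows everything for a given crawler, plus an exit code derived from the findings. All names (`robotsDisallowsAll`, `Finding`, `exitCode`) are hypothetical, the parser skips wildcards and longest-match precedence, and the assumption that an error maps to exit code `1` is mine, not a documented detail of the tool.

```typescript
interface Group {
  agents: string[];
  rules: { type: "allow" | "disallow"; path: string }[];
}

// Simplified check: does the group for this crawler (or the "*" fallback)
// contain "Disallow: /" and no Allow rules at all?
function robotsDisallowsAll(robotsTxt: string, crawlerToken: string): boolean {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise start a new one.
      if (!current || !lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((field === "allow" || field === "disallow") && current) {
      current.rules.push({ type: field, path: value });
      lastWasAgent = false;
    }
  }
  const token = crawlerToken.toLowerCase();
  const group =
    groups.filter((g) => g.agents.indexOf(token) !== -1)[0] ||
    groups.filter((g) => g.agents.indexOf("*") !== -1)[0];
  if (!group) return false;
  const blocksRoot = group.rules.some((r) => r.type === "disallow" && r.path === "/");
  const hasAllow = group.rules.some((r) => r.type === "allow");
  return blocksRoot && !hasAllow;
}

interface Finding {
  check: string;
  severity: "pass" | "warning" | "error";
}

// Assumed convention: only error-level findings produce a non-zero exit code.
function exitCode(findings: Finding[]): number {
  return findings.some((f) => f.severity === "error") ? 1 : 0;
}

const robots = ["User-agent: *", "Allow: /", "", "User-agent: GPTBot", "Disallow: /"].join("\n");
const findings: Finding[] = [
  { check: "robots.txt allows GPTBot", severity: robotsDisallowsAll(robots, "GPTBot") ? "error" : "pass" },
  { check: "llms.txt is present", severity: "warning" },
];

console.log(exitCode(findings)); // 1
```

Warnings alone leave the exit code at `0` in this sketch, which mirrors the split described above: ambiguous results inform, provable ones fail the build.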

## Where it fits

The agentmarkup adapters and the [CLI](https://www.npmjs.com/package/@agentmarkup/cli) *generate* machine-readable output at build time. [@agentmarkup/audit](https://www.npmjs.com/package/@agentmarkup/audit) *verifies* what a deployed site actually serves to AI crawlers. It is the command-line sibling of the hosted [website checker](/checker/): the checker is the quick browser lookup, the audit is the scriptable, CI-friendly version.

Read the [audit guide](/docs/audit/) for the full check list, then use the [llms.txt](/docs/llms-txt/), [JSON-LD](/docs/json-ld/), and [AI crawlers](/docs/ai-crawlers/) guides to fix whatever it surfaces.

## Make your website machine-readable

agentmarkup is an open-source build-time toolkit for Vite, Astro, Next.js, and Nuxt (plus a framework-agnostic CLI) that generates llms.txt, injects JSON-LD structured data, creates optional markdown mirrors from final HTML when raw pages need a cleaner agent-facing fetch path, manages AI crawler robots.txt rules, patches optional Content-Signal and canonical mirror headers, and validates everything at build time. Zero runtime cost.

```
pnpm add -D @agentmarkup/vite # or @agentmarkup/astro, @agentmarkup/next, @agentmarkup/nuxt, @agentmarkup/cli
```

Written by

[Sebastian Cochinescu](/authors/sebastian-cochinescu/) · Developer of agentmarkup

Builder of developer tools for machine-readable websites. Founder of Anima Felix.

