Audit your site the way AI crawlers see it

Most SEO tools fetch a page once, as a browser, and grade the HTML. @agentmarkup/audit fetches the same URL as GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and Google-Extended. It diffs each response against a control request made as a normal browser, and reports where AI systems get a different, often worse, view than your human visitors. It is the command-line companion to the website checker, built for local runs and CI.

Usage

# Audit any live URL
npx @agentmarkup/audit https://example.com

# JSON output for CI or league tables
npx @agentmarkup/audit https://example.com --json

# Bare domains are normalized to https://
npx @agentmarkup/audit example.com --timeout 15000

The audit is deterministic: each check reports pass, warn, or error, with no invented scores. The exit code is 0 when there are no error-level findings, 1 when at least one is present (so it works as a CI gate), and 2 on a usage error.
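The bare-domain handling in the last usage example can be pictured as follows. This is a minimal sketch under an assumption: `normalizeTarget` is a hypothetical name, not part of the package's documented API. It only illustrates the idea of defaulting to `https://` when no scheme is given.

```typescript
// Hypothetical helper (not an @agentmarkup/audit export): default bare
// domains to https:// and reject anything that is not http(s).
function normalizeTarget(input: string): URL {
  const trimmed = input.trim();
  // Treat input without an explicit "scheme://" prefix as a bare domain.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const url = new URL(hasScheme ? trimmed : `https://${trimmed}`); // throws on malformed input
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url;
}

console.log(normalizeTarget("example.com").href); // https://example.com/
```

Parsing through the standard `URL` constructor (rather than string checks alone) means malformed input fails early, which is the kind of case a CLI would report as a usage error.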
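The exit-code contract is small enough to express as a function. The sketch below models it with hypothetical `Finding` and `Severity` types; these are illustrations, not the package's exported types. Exit code 2 is left out because a usage error is detected before any check runs.

```typescript
// Illustrative types only; not the types exported by @agentmarkup/audit.
type Severity = "pass" | "warn" | "error";

interface Finding {
  check: string; // illustrative label, e.g. "json-ld"
  severity: Severity;
}

// 1 if any error-level finding exists, otherwise 0.
// Exit code 2 (usage error) happens before checks run, so it is not modeled here.
function exitCodeFor(findings: Finding[]): 0 | 1 {
  return findings.some((f) => f.severity === "error") ? 1 : 0;
}

const findings: Finding[] = [
  { check: "crawler-access", severity: "warn" },
  { check: "json-ld", severity: "pass" },
];
console.log(exitCodeFor(findings)); // 0: warnings alone never fail the run
```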

What it checks

Area	What it does
Crawler access	Fetches the page as each AI crawler user-agent and diffs the result against a browser control request. Flags bot challenges, differential blocks, rate limits, origin errors, and cases where an accessible crawler gets materially less content than a browser (JS-gated or cloaked pages).
JS dependence	Measures whether the raw, un-executed HTML actually contains content, or is an empty shell that only fills in after JavaScript runs.
robots.txt	Detects whether the crawlers you likely want are shadowed by a wildcard `Disallow`, and whether a canonical Content-Signal policy is present.
llms.txt	Fetches `/llms.txt` (guarding against HTML soft-404s), validates it, and checks whether the homepage links it for discovery.
JSON-LD	Extracts the JSON-LD blocks and flags only unparseable or type-less ones; parseable structured data, including `@graph`, passes.
Markdown mirror	Detects a fetchable markdown mirror or a `text/markdown` alternate link: the clean, low-noise version that agents prefer.
Sitemap	Checks for `/sitemap.xml`, a `Sitemap:` directive in robots.txt, or common non-standard sitemap paths.
Page metadata	Checks for a title, meta description, and canonical link that AI systems use to attribute the page.
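The robots.txt "shadowing" check rests on how the Robots Exclusion Protocol (RFC 9309) picks rules: a crawler obeys the group that names its user-agent token, and falls back to the `*` group only when no group names it. The sketch below is a simplified reader written for illustration, not the audit's own parser. `parseRobots` and `shadowedCrawlers` are hypothetical names, and the sketch only treats `Disallow: /` as a site-wide block.

```typescript
// Sketch only: a simplified robots.txt reader, not the audit's own parser.
interface Group {
  agents: string[]; // lowercased User-agent tokens
  disallow: string[];
  allow: string[];
}

function parseRobots(text: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; any other line ends the run.
      if (current === null || !lastWasAgent) {
        current = { agents: [], disallow: [], allow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (current === null) continue; // rules before any User-agent line are ignored
    if (field === "disallow" && value !== "") current.disallow.push(value);
    if (field === "allow" && value !== "") current.allow.push(value);
  }
  return groups;
}

// Crawlers with no group of their own fall back to "*". They are shadowed when
// the merged wildcard rules disallow the site root without re-allowing it.
function shadowedCrawlers(robotsTxt: string, crawlers: string[]): string[] {
  const groups = parseRobots(robotsTxt);
  const wildcard = groups.filter((g) => g.agents.includes("*"));
  const blocksRoot =
    wildcard.some((g) => g.disallow.includes("/")) &&
    !wildcard.some((g) => g.allow.includes("/"));
  if (!blocksRoot) return [];
  return crawlers.filter(
    (name) => !groups.some((g) => g.agents.includes(name.toLowerCase())),
  );
}

const robots = "User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nAllow: /\n";
console.log(shadowedCrawlers(robots, ["GPTBot", "ClaudeBot"])); // [ 'ClaudeBot' ]
```

In this example GPTBot has its own group and is unaffected, while ClaudeBot silently inherits the wildcard `Disallow: /`.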
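The JSON-LD row flags only blocks that fail to parse or carry no type. A minimal sketch of that rule follows. It assumes regex-based extraction of `<script type="application/ld+json">` tags and, as an assumption not stated by the audit, treats a `@graph` container as typed when it is non-empty and every node in it declares `@type`. `auditJsonLd` and `hasType` are hypothetical names.

```typescript
// Sketch only: regex extraction assumes reasonably well-formed <script> tags.
type JsonLdIssue = { index: number; problem: "unparseable" | "missing @type" };

const LD_JSON_BLOCK =
  /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;

// A node passes when it declares @type. Assumption: a @graph container passes
// when it is non-empty and every node inside it declares @type.
function hasType(node: unknown): boolean {
  if (Array.isArray(node)) return node.length > 0 && node.every(hasType);
  if (typeof node !== "object" || node === null) return false;
  const obj = node as Record<string, unknown>;
  if ("@type" in obj) return true;
  const graph = obj["@graph"];
  return Array.isArray(graph) && graph.length > 0 && graph.every(hasType);
}

function auditJsonLd(html: string): JsonLdIssue[] {
  const issues: JsonLdIssue[] = [];
  const bodies = [...html.matchAll(LD_JSON_BLOCK)].map((m) => m[1]);
  bodies.forEach((body, index) => {
    let data: unknown;
    try {
      data = JSON.parse(body);
    } catch {
      issues.push({ index, problem: "unparseable" });
      return;
    }
    if (!hasType(data)) issues.push({ index, problem: "missing @type" });
  });
  return issues;
}

const page =
  '<script type="application/ld+json">{"@graph":[{"@type":"WebSite"}]}</script>' +
  '<script type="application/ld+json">{"name":"Example"}</script>';
console.log(auditJsonLd(page)); // [ { index: 1, problem: 'missing @type' } ]
```

Valid but unusual structured data passes by design. Only blocks that a consumer could not use at all are reported.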
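The JS-dependence row and the "materially less content" part of the crawler-access row both compare how much visible text a response contains. The sketch below shows one plausible heuristic. The thresholds `MIN_TEXT_CHARS` and `MIN_CONTENT_RATIO` are hypothetical values chosen for illustration, because the audit's real cut-offs are not documented here. Tag stripping is regex-based and does not decode HTML entities.

```typescript
// Hypothetical thresholds; the audit's actual cut-offs are not documented here.
const MIN_TEXT_CHARS = 200;
const MIN_CONTENT_RATIO = 0.5;

// Rough visible-text extraction: drop non-rendered elements, comments and tags.
function visibleText(html: string): string {
  return html
    .replace(/<(script|style|noscript|template)\b[\s\S]*?<\/\1>/gi, " ")
    .replace(/<!--[\s\S]*?-->/g, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Raw, un-executed HTML with almost no text is likely an empty JS shell.
function looksLikeJsShell(rawHtml: string): boolean {
  return visibleText(rawHtml).length < MIN_TEXT_CHARS;
}

// A crawler gets "materially less" when its text is a small fraction of the browser's.
function crawlerGetsLess(crawlerHtml: string, browserHtml: string): boolean {
  const browserLen = visibleText(browserHtml).length;
  if (browserLen === 0) return false; // nothing to compare against
  return visibleText(crawlerHtml).length / browserLen < MIN_CONTENT_RATIO;
}

const shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>';
console.log(looksLikeJsShell(shell)); // true
```

Comparing ratios against the browser control, rather than absolute lengths alone, keeps short but legitimate pages from being flagged only for being short.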
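The llms.txt row mentions guarding against HTML soft-404s: a server that answers every unknown path with a 200 and an HTML error page would otherwise look like it has an llms.txt. A minimal sketch under stated assumptions follows. The `FetchedDoc` shape and `isRealLlmsTxt` are hypothetical. The H1 requirement comes from the llms.txt proposal, which makes a top-level `#` title the only required section.

```typescript
// Hypothetical shape of a fetched response; not an @agentmarkup/audit type.
interface FetchedDoc {
  status: number;
  contentType: string | null;
  body: string;
}

// Heuristic soft-404 guard (assumed rules, not the audit's exact logic).
function isRealLlmsTxt(doc: FetchedDoc): boolean {
  if (doc.status !== 200) return false;
  const type = (doc.contentType ?? "").toLowerCase();
  if (type.includes("text/html")) return false; // an HTML page served at /llms.txt
  const head = doc.body.trimStart().slice(0, 200).toLowerCase();
  if (head.startsWith("<!doctype html") || head.startsWith("<html")) return false;
  // The llms.txt proposal expects a Markdown file that opens with an H1 title.
  return /^# \S/.test(doc.body.trimStart());
}

console.log(
  isRealLlmsTxt({
    status: 200,
    contentType: "text/html; charset=utf-8",
    body: "<!DOCTYPE html><title>Not found</title>",
  }),
); // false
```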

An honest note on "blocked" crawlers

The audit spoofs a crawler's user-agent from an ordinary IP. That is exactly what a browser extension or a curious developer can do, and it is not what the real, verified bot does. So a 403 for a spoofed GPTBot user-agent is genuinely ambiguous:

- It can be a user-agent WAF rule, which also blocks the real GPTBot (a real problem), or
- It can be IP allowlisting, where the verified GPTBot, coming from OpenAI's published IP ranges, is let through (no problem at all).

From a spoofed request the tool cannot tell these apart, so it reports them as warnings with both explanations and the raw evidence, never as a bare "your site blocks AI" error. Error-level findings are reserved for things provable from the response itself: a robots.txt that literally disallows the crawler, an empty JavaScript shell, or invalid llms.txt / JSON-LD.
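The severity policy described above can be summarized as a small decision function: only facts provable from a response become errors, and a spoofed-user-agent 403 stays a warning that carries both explanations. This is a sketch. `CrawlerProbe`, `classifyAccess`, and the message wording are hypothetical and not the audit's output format.

```typescript
// Illustrative types; not the audit's report format.
type Severity = "pass" | "warn" | "error";

interface CrawlerProbe {
  crawler: string;
  robotsDisallowsRoot: boolean; // provable from robots.txt itself
  crawlerStatus: number; // status seen with the spoofed user-agent
  browserStatus: number; // status seen by the browser control request
}

interface Verdict {
  severity: Severity;
  message: string;
}

const isOk = (status: number): boolean => status >= 200 && status < 300;

function classifyAccess(p: CrawlerProbe): Verdict {
  // Provable from the response: robots.txt literally disallows this crawler.
  if (p.robotsDisallowsRoot) {
    return { severity: "error", message: `robots.txt disallows ${p.crawler}` };
  }
  if (isOk(p.crawlerStatus)) {
    return { severity: "pass", message: `${p.crawler} received ${p.crawlerStatus}` };
  }
  // Differential 403: ambiguous from a spoofed request, so report both readings.
  if (p.crawlerStatus === 403 && isOk(p.browserStatus)) {
    return {
      severity: "warn",
      message:
        `${p.crawler} user-agent got 403 while a browser got ${p.browserStatus}: ` +
        `either a user-agent WAF rule (also blocks the real bot) or IP allowlisting ` +
        `(the verified bot may be let through)`,
    };
  }
  return {
    severity: "warn",
    message: `${p.crawler} user-agent got ${p.crawlerStatus} (browser got ${p.browserStatus})`,
  };
}

console.log(
  classifyAccess({ crawler: "GPTBot", robotsDisallowsRoot: false, crawlerStatus: 403, browserStatus: 200 })
    .severity,
); // warn
```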

Use it as a CI gate

Because only provable errors produce exit code 1 (2 is reserved for usage errors), the audit is safe to run in CI without false failures from the ambiguous cases:

# .github/workflows/ci.yml (excerpt)
- run: npx @agentmarkup/audit https://example.com

Programmatic use

The same audit is available as a library:

import { audit, renderText } from '@agentmarkup/audit'

const report = await audit('https://example.com', {
  fetchedAt: new Date().toISOString(),
})

console.log(report.summary) // { pass, warn, error, checks, passed, worst }
process.stdout.write(renderText(report))

The exported analyzers (analyzeCrawlerAccess, analyzeRobots, analyzeJsDependence, analyzeMachineReadable) and the SSRF-safe safeFetch are available for building custom pipelines.

How it relates to the rest of agentmarkup

The build-time adapters and the CLI generate machine-readable output; @agentmarkup/audit verifies what a live site actually serves to AI crawlers. It pairs naturally with the llms.txt, JSON-LD, and AI crawler guides: use those to fix what the audit finds.

Frequently asked questions

Does a 403 for GPTBot mean my site blocks AI?

Not necessarily. The audit spoofs the user-agent from a generic IP, so a 403 can be a user-agent WAF rule (which does block the real bot) or IP allowlisting (where the verified bot, from the vendor's published IP ranges, is fine). The audit reports this as a warning with both explanations, not as a definitive block.

Is it safe to point at any URL?

Requests use an SSRF-safe fetch: localhost, private, loopback, link-local, CGNAT, and IPv6-bypass address forms are refused, redirects are followed manually and re-validated per hop, and responses are size- and time-bounded. The blocklist mirrors the one used by the hosted checker.
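The address classes listed above map onto fixed IPv4 ranges. The sketch below covers IPv4 only, plus the IPv4-mapped IPv6 form (`::ffff:127.0.0.1`) that is a common bypass. It fails closed on anything it cannot parse. `isBlockedIPv4` is a hypothetical name, not the exported `safeFetch`. In a real fetcher, this check would run on the DNS-resolved address of every redirect hop, because a public-looking hostname can resolve to a private address.

```typescript
// IPv4-only sketch of an SSRF blocklist; the real safeFetch also handles IPv6 forms.
function parseIPv4(addr: string): number[] | null {
  const parts = addr.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

function isBlockedIPv4(addr: string): boolean {
  // Unwrap IPv4-mapped IPv6 such as "::ffff:127.0.0.1".
  const v4 = addr.toLowerCase().startsWith("::ffff:") ? addr.slice(7) : addr;
  const o = parseIPv4(v4);
  if (o === null) return true; // fail closed on anything this sketch cannot classify
  const [a, b] = o;
  return (
    a === 0 || // "this network" 0.0.0.0/8
    a === 10 || // private 10.0.0.0/8
    a === 127 || // loopback 127.0.0.0/8
    (a === 100 && b >= 64 && b <= 127) || // CGNAT 100.64.0.0/10
    (a === 169 && b === 254) || // link-local 169.254.0.0/16
    (a === 172 && b >= 16 && b <= 31) || // private 172.16.0.0/12
    (a === 192 && b === 168) // private 192.168.0.0/16
  );
}

console.log(isBlockedIPv4("169.254.169.254")); // true
```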

How is this different from the website checker?

Both apply the same approach. The checker is the hosted, browser-based version for a quick lookup; @agentmarkup/audit is the command-line version for local runs, scripting, and CI, and it exits non-zero on provable errors.