Audit your site the way AI crawlers see it

Most SEO tools fetch a page once, as a browser, and grade the HTML. @agentmarkup/audit fetches the same URL as GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and Google-Extended, diffs each response against a normal browser request, and reports where AI systems get a different, often worse, view than your human visitors. It is the command-line companion to the website checker, built for local runs and CI.

Usage

# Audit any live URL
npx @agentmarkup/audit https://example.com

# JSON output for CI or league tables
npx @agentmarkup/audit https://example.com --json

# Bare domains are normalized to https://
npx @agentmarkup/audit example.com --timeout 15000

The audit is deterministic: every check resolves to pass, warn, or error, with no invented scores. The exit code is 1 when any error-level finding is present (which makes it usable as a CI gate), 0 otherwise, and 2 on a usage error.
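A wrapper script can branch on the three exit codes described above. The following is a minimal sketch, assuming a hypothetical helper named interpretExitCode that is not part of the package; only the meanings of 0, 1, and 2 come from this page.

```typescript
// Hypothetical helper, NOT part of @agentmarkup/audit. It maps the documented
// exit codes (0, 1, 2) to a decision a wrapper script could act on.
type AuditOutcome = 'ok' | 'error-findings' | 'usage-error' | 'unknown';

function interpretExitCode(code: number): AuditOutcome {
  switch (code) {
    case 0:
      return 'ok'; // no error-level findings (warnings may still be present)
    case 1:
      return 'error-findings'; // at least one error-level finding: fail the gate
    case 2:
      return 'usage-error'; // bad invocation: fix the command, not the site
    default:
      return 'unknown'; // outside the documented contract
  }
}
```

Keeping 1 and 2 distinct lets a pipeline tell "the site has a provable problem" apart from "the command was invoked incorrectly".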

What it checks

| Area | What it does |
| --- | --- |
| Crawler access | Fetches as each AI crawler user-agent and diffs against a browser control. Flags challenges, differential blocks, rate limits, origin errors, and when an accessible crawler gets materially less content than a browser (JS-gated or cloaked pages). |
| JS dependence | Measures whether the raw, un-executed HTML actually contains content, or is an empty shell that only fills in after JavaScript runs. |
| robots.txt | Detects whether the crawlers you likely want are shadowed by a wildcard Disallow, and whether a canonical Content-Signal policy is present. |
| llms.txt | Fetches /llms.txt (guarding against HTML soft-404s), validates it, and checks whether the homepage links it for discovery. |
| JSON-LD | Extracts the JSON-LD blocks and flags only unparseable or type-less ones; parseable structured data, including @graph, passes. |
| Markdown mirror | Detects a fetchable markdown mirror or a text/markdown alternate link, the clean low-noise version agents prefer. |
| Sitemap | Checks for /sitemap.xml, a Sitemap: directive in robots.txt, or common non-standard sitemap paths. |
| Page metadata | Checks for a title, meta description, and canonical link that AI systems use to attribute the page. |
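To make the JS dependence check described above more concrete, here is a rough sketch of one way to estimate how much readable text the raw HTML carries. It is an illustrative heuristic, not the tool's implementation: the function names and the 200-character threshold are assumptions, and regex stripping is cruder than a real HTML parser.

```typescript
// Illustrative heuristic only; names and threshold are assumptions,
// not the logic used by @agentmarkup/audit.

// Count characters of text left after removing scripts, styles,
// comments, and tags from un-executed HTML.
function visibleTextLength(html: string): number {
  const text = html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, ' ')
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, ' ')
    .replace(/<!--[\s\S]*?-->/g, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  return text.length;
}

// Treat a page as an "empty shell" when the raw HTML carries very little text.
function looksLikeEmptyShell(html: string, minChars: number = 200): boolean {
  return visibleTextLength(html) < minChars;
}

const shell =
  '<html><head><title>App</title></head>' +
  '<body><div id="root"></div><script>render()</script></body></html>';
const article = '<html><body><p>' + 'word '.repeat(100) + '</p></body></html>';

console.log(looksLikeEmptyShell(shell));   // true: only the title text survives
console.log(looksLikeEmptyShell(article)); // false: the content is in the raw HTML
```

A crawler that does not execute JavaScript sees only what this kind of measurement sees, which is why a client-rendered shell can look empty to it.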

An honest note on "blocked" crawlers

The audit spoofs a crawler's user-agent from an ordinary IP. That is exactly what a browser extension or a curious developer can do, and it is not what the real, verified bot does. So a 403 for a spoofed GPTBot user-agent is genuinely ambiguous:

  • it can be a user-agent WAF rule, which also blocks the real GPTBot (a real problem), or
  • it can be IP allowlisting, where the verified GPTBot, coming from OpenAI's published IP ranges, is let through just fine (no problem at all).

From a spoofed request the tool cannot tell these apart, so it reports them as warnings with both explanations and the raw evidence, never as a bare "your site blocks AI" error. Error-level findings are reserved for things provable from the response itself: a robots.txt that literally disallows the crawler, an empty JavaScript shell, or invalid llms.txt / JSON-LD.
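The severity rule in the paragraph above can be sketched as a small decision function. This is a simplified illustration under stated assumptions: the types, field names, and the fallback branch are hypothetical and are not the package's exported API. Only the split between "provable, so error" and "ambiguous, so warning" comes from this page.

```typescript
// Hypothetical types and function; NOT the API of @agentmarkup/audit.
type Severity = 'pass' | 'warn' | 'error';

interface CrawlerProbe {
  crawler: string;          // e.g. 'GPTBot'
  controlStatus: number;    // final HTTP status for the browser control request
  crawlerStatus: number;    // final HTTP status for the spoofed crawler user-agent
  robotsDisallows: boolean; // robots.txt literally disallows this crawler
}

interface Finding {
  severity: Severity;
  explanations: string[];
}

const isOk = (status: number): boolean => status >= 200 && status < 300;

function classifyProbe(p: CrawlerProbe): Finding {
  // Provable from the response itself, so it may be reported as an error.
  if (p.robotsDisallows) {
    return { severity: 'error', explanations: [`robots.txt disallows ${p.crawler}`] };
  }
  if (isOk(p.crawlerStatus)) {
    return { severity: 'pass', explanations: [] };
  }
  // Browser succeeds but the spoofed crawler gets a 403: ambiguous, so warn
  // and carry both possible explanations.
  if (isOk(p.controlStatus) && p.crawlerStatus === 403) {
    return {
      severity: 'warn',
      explanations: [
        'user-agent WAF rule: the real crawler would be blocked too',
        'IP allowlisting: the verified crawler may be let through',
      ],
    };
  }
  // Assumption of this sketch: any other failure is reported as a warning.
  return { severity: 'warn', explanations: ['non-2xx response, not classified by this sketch'] };
}
```

Keeping both explanations attached to the warning means the reader, who can check the WAF or allowlist configuration, makes the final call.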

Use it as a CI gate

Because the exit code is non-zero only on provable errors, the audit is safe to run in CI without false failures from the ambiguous cases:

# .github/workflows/ci.yml (excerpt)
- run: npx @agentmarkup/audit https://example.com

Programmatic use

The same audit is available as a library:

import { audit, renderText } from '@agentmarkup/audit'

const report = await audit('https://example.com', {
  fetchedAt: new Date().toISOString(),
})

console.log(report.summary) // { pass, warn, error, checks, passed, worst }
process.stdout.write(renderText(report))

The exported analyzers (analyzeCrawlerAccess, analyzeRobots, analyzeJsDependence, analyzeMachineReadable) and the SSRF-safe safeFetch are available for building custom pipelines.

How it relates to the rest of agentmarkup

The build-time adapters and the CLI generate machine-readable output; @agentmarkup/audit verifies what a live site actually serves to AI crawlers. It pairs naturally with the llms.txt, JSON-LD, and AI crawler guides: use those to fix what the audit finds.

Frequently asked questions

Does a 403 for GPTBot mean my site blocks AI?

Not necessarily. The audit spoofs the user-agent from a generic IP, so a 403 can be a user-agent WAF rule (which does block the real bot) or IP allowlisting (where the verified bot, from the vendor's published IP ranges, is fine). The audit reports this as a warning with both explanations, not as a definitive block.

Is it safe to point at any URL?

Requests use an SSRF-safe fetch: localhost, private, loopback, link-local, CGNAT, and IPv6-bypass address forms are refused, redirects are followed manually and re-validated per hop, and responses are size- and time-bounded. The blocklist mirrors the hosted checker.
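As an illustration of the address categories listed above, the sketch below checks an IPv4 literal against the loopback, private, link-local, and CGNAT ranges. It is not the package's safeFetch: the function names are hypothetical, it covers IPv4 dotted-quad literals only, and a real guard also has to resolve hostnames, validate the resolved address on every redirect hop, and handle IPv6 and alternative numeric encodings.

```typescript
// Hypothetical helpers for illustration; NOT the safeFetch from @agentmarkup/audit.

// Parse a dotted-quad IPv4 literal into four octets, or null if it is not one.
function parseIPv4(host: string): number[] | null {
  const parts = host.split('.');
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const n = Number(part);
    if (n > 255) return null;
    octets.push(n);
  }
  return octets;
}

// True when the literal falls in a range that should never be fetched.
function isRefusedIPv4(host: string): boolean {
  const octets = parseIPv4(host);
  if (octets === null) return false; // not an IPv4 literal; needs DNS resolution first
  const [a, b] = octets;
  return (
    a === 10 ||                           // 10.0.0.0/8 private
    a === 127 ||                          // 127.0.0.0/8 loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10 CGNAT
    (a === 169 && b === 254) ||           // 169.254.0.0/16 link-local
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12 private
    (a === 192 && b === 168)              // 192.168.0.0/16 private
  );
}
```

A check like this has to run against the address actually being connected to, which is why the audit re-validates after each redirect instead of trusting the first URL.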

How is this different from the website checker?

Both implement the same idea. The checker is the hosted, browser-based version for a quick lookup; @agentmarkup/audit is the command-line version for local runs, scripting, and CI, with a non-zero exit code on provable errors.