
69% of AI Crawlers Can't Read Your Website (And One Is Wearing a Disguise)

whyiq · 9 April 2026 · 11 min read

You spent six months building a SaaS landing page. Beautiful hero section. Social proof carousel. Pricing table that animates on scroll. You ship it. It looks great. Then ChatGPT visits your homepage, sees a white rectangle with a nav bar, and leaves.

That is not a metaphor. 69% of AI crawlers cannot execute JavaScript (searchVIU, 1.3 billion requests analyzed, November 2025). Your React app is a haunted house to them: they walk in, see nothing, and walk out. The nav bar survives. Congratulations.

This post covers what AI crawlers actually see on your site (not much), which one is impersonating a real browser to dodge your robots.txt, and why blocking them all will cost you 23% of your traffic while doing nothing to stop them citing you anyway.

The Uninvited Guests: Who Is Crawling Your Site Right Now?

You have more AI bots visiting your site than you have customers. Here is the roll call.

GPTBot, OpenAI's training crawler, grew 305% year-over-year and processes 569 million requests per month (Cloudflare, May 2025). PerplexityBot grew 157,490%. That is not a typo. ChatGPT-User, the bot that fires when someone tells ChatGPT to browse the web, grew 2,825%.

The part most people miss: OpenAI and Anthropic each run three separate bots. One collects training data. One builds a search index. One fetches pages live when a user asks for current information. Blocking one does nothing to the other two. You are not dealing with a single bot per company. You are dealing with a fleet.
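What "blocking the fleet" actually requires looks something like this robots.txt sketch. Each bot obeys only its own User-agent group, so one rule per company is not enough. The token names below are the ones the vendors document publicly; treat the list as a starting point to verify against current docs, not a complete inventory:

```txt
# Each crawler reads only the group matching its own user-agent token.
# Blocking GPTBot alone leaves the other two OpenAI bots untouched.

User-agent: GPTBot            # OpenAI: training data
Disallow: /

User-agent: OAI-SearchBot     # OpenAI: search index
Disallow: /

User-agent: ChatGPT-User      # OpenAI: live fetches on user request
Disallow: /

User-agent: ClaudeBot         # Anthropic: training data
Disallow: /

User-agent: Claude-SearchBot  # Anthropic: search index
Disallow: /

User-agent: Claude-User       # Anthropic: live fetches on user request
Disallow: /
```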

Then there is the crawl-to-referral ratio: the number of pages a bot crawls for every click it sends back to your site. Google's ratio is 3:1 to 30:1. OpenAI's is 3,700:1. Anthropic's is 25,000:1 to 100,000:1 (Cloudflare, 2025). They crawl aggressively and send almost nothing back.

25,000:1

Anthropic's crawl-to-referral ratio. That is 25,000 pages crawled for every 1 click sent back to your site. Cloudflare, 2025

So: at least nine bots, three companies, and most of them are functionally illiterate.

What Do AI Crawlers Actually See on Your JavaScript Site?

GPTBot fetches your HTML. Your HTML says "load React." GPTBot does not load React. GPTBot sees nothing.

It is like handing someone a book where the words only appear when you plug in a reading lamp. They do not have a reading lamp. They are very enthusiastic about reading. They just cannot do it.
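Concretely, here is a sketch of the raw HTML a non-rendering crawler gets back from a typical single-page React app (site and file names hypothetical). This is the entire "content" available to it:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Acme — Pricing</title>
  </head>
  <body>
    <nav>...</nav>                  <!-- the nav bar survives -->
    <div id="root"></div>           <!-- everything else appears here, after JS runs -->
    <script src="/static/js/main.abc123.js"></script>
  </body>
</html>
```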

searchVIU tested 32 AI crawlers across 1.3 billion requests. Only Googlebot, Google-Extended (Gemini), and Applebot render JavaScript. GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Meta-ExternalAgent, Amazonbot, Bytespider: all static HTML only. They fetch your JavaScript files, look at them, and leave without executing a single line.

searchVIU also ran a price extraction test across five AI systems: could each one find pricing data that only appeared via JavaScript? Gemini: 50% success. ChatGPT: 37.5%. Google AI Mode: 25%. Perplexity: 12.5%. Claude: 0%. Not low. Zero. Claude looked at the pricing page, saw nothing, and left.

0%

Claude's success rate extracting JavaScript-rendered pricing data. searchVIU, October 2025

If you are testing AI visibility with Googlebot, you are benchmarking against the one bot that can actually read your site. Every other AI crawler cannot. That is like testing whether your house is fireproof by pouring water on it. WhyIQ's AI SEO audit tests against GPTBot and ClaudeBot specifically, not Googlebot.

Which Content Is Invisible to AI Bots?

Here is everything GPTBot and ClaudeBot walk right past on your site.

| Your Content | What GPTBot Sees | Why |
| --- | --- | --- |
| Pricing table | Empty div | JS-rendered |
| Testimonial carousel | Empty div | Client-side rendered |
| G2/Capterra widget | Nothing | Third-party JS embed |
| Customer logo strip | Nothing | Client-side rendered |
| Interactive demo | Nothing | No crawlable output |
| Scroll-triggered CTA | Nothing | Never loaded |
| Video content | Nothing | Cannot process video |
| A/B test variant | Default only | Sees server default |

If you are counting, that is your pricing, your social proof, your conversion mechanism, your product demo, and your best customer evidence. Gone. The nav bar survived.

One ironic exception. Common Crawl, the open web crawl whose data feeds training sets at OpenAI, Google, and Anthropic, managed to scrape paywalled news content. Many subscription paywalls are applied via JavaScript: the full article sits in the raw HTML, and the gate renders on top of it client-side. The crawlers skip the JavaScript, skip the paywall, and see the whole article. The bots that cannot read anything accidentally read the things they were not supposed to.

Does Schema Markup Help with GPTBot JavaScript Rendering?

If your SEO consultant told you to add JSON-LD schema to help AI crawlers understand your site, here is what happened next: nothing.

searchVIU tested five AI systems (ChatGPT, Claude, Gemini, Perplexity, Google AI Mode) against eight schema scenarios. Could they extract information that existed only in JSON-LD with no visible HTML equivalent? Zero of five extracted it. Not one. Not partially. Zero.
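As a sketch of the kind of scenario tested: product pricing that exists only as JSON-LD in the page head, with no matching visible text (names and figures hypothetical). Per the study, data embedded only this way was extracted by none of the five systems:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Acme Pro Plan",
  "offers": {
    "@type": "Offer",
    "price": "49.00",
    "priceCurrency": "USD"
  }
}
```

If "$49/month" also appears as visible body text, the bots can read it. If it lives only in this block, it might as well not exist.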

Microdata and RDFa with no visible HTML equivalent: also ignored universally. Search Atlas ran a separate study (December 2024) and found no correlation between schema coverage and AI citation rates. Sites with comprehensive schema did not outperform sites with none.

One exception: Google-Extended, the token that governs Gemini's use of crawled data, rides on Googlebot's infrastructure, and Googlebot renders JavaScript fully. Schema helps Google AI Overviews through that pipeline. For ChatGPT, Claude, and Perplexity, where most AI search growth is happening, it does nothing.

The only thing AI bots reliably extract is visible HTML text. Not metadata. Not structured data. Text on the page. Like a human, except worse at it.

One of the Bots Is Wearing a Fake Moustache

This is where it gets weird.

Perplexity operates a second, undeclared crawler. It impersonates Chrome on macOS. The user-agent string is indistinguishable from a real person on a MacBook. It generates 3 to 6 million additional requests per day on top of PerplexityBot's declared 20 to 25 million. It rotates IP addresses across multiple autonomous systems to avoid detection.
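Illustrative user-agent strings for the two faces of Perplexity's crawling. The declared string follows the pattern Perplexity publishes; the stealth one, per Cloudflare's analysis, is a generic Chrome-on-macOS string (exact version numbers here are placeholders):

```txt
# Declared crawler — identifiable, blockable:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible;
  PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)

# Stealth crawler — indistinguishable from a person on a MacBook:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36
  (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
```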

The part that made Cloudflare publish an entire blog post about it: the stealth crawler ignores robots.txt. Completely. Cloudflare tested this by creating unpublished domains with explicit disallow-all robots.txt files and WAF blocking rules. Perplexity's stealth crawler showed up anyway. Then surfaced the content in its answers.

For comparison, ChatGPT respects robots.txt. When a site blocks GPTBot, ChatGPT stops crawling. Perplexity shows up to the party wearing a fake moustache and eats all the food.

3-6M

undeclared daily requests from Perplexity's stealth crawler, on top of its declared 20-25M. Cloudflare, 2025

You cannot opt out of Perplexity's data collection through conventional means. That is not an opinion. That is a Cloudflare-documented fact. Remember this for the next section.


"Just Block Them" (And Lose 23% of Your Traffic)

The obvious response: block them all. Add GPTBot to robots.txt. Take back control.

GPTBot is already the most blocked AI bot on the internet. Of roughly 140 million websites analyzed, 5.89% block it (Ahrefs, 2025). ClaudeBot's block rate is growing the fastest, up 32.67% in one year. Among news publishers, 79% block AI training bots.

What blocking actually does: stops future training data collection. That is it. Does not stop citations from data already in the training set. Does not stop live retrieval bots (ChatGPT-User, Claude-User). Does not stop Perplexity's stealth crawler, which ignores your robots.txt anyway.

The cost. Rutgers Business School and Wharton (December 2025) found that publishers who blocked AI crawlers experienced a 23.1% decline in total monthly visits and a 13.9% decline in human-only browsing. Not bot traffic. Real people. Separately, BuzzStream analyzed 4 million citations across 3,600 prompts and found blocking had "little measurable effect" on whether a publisher appears in AI-generated responses.

23.1%

decline in total monthly visits for publishers who blocked AI crawlers. Rutgers/Wharton, December 2025

So: you lose a quarter of your traffic, and the bots still cite you anyway. From data they already have. Using a crawler you cannot block. This is, objectively, hilarious.

What Actually Gets You Cited by AI?

The same data that reveals the blind spots also shows what works. The practice of optimizing for AI search engines has a name now: generative engine optimization, or GEO.

GEO is not traditional SEO. Traditional SEO optimizes for Googlebot, which renders JavaScript, follows links, and ranks pages. GEO optimizes for GPTBot, ClaudeBot, and PerplexityBot, which cannot render JavaScript, extract text from static HTML, and decide whether to cite you. Different crawlers, different rules, different game (Aggarwal et al., KDD 2024).

The Digital Bloom analyzed 2.6 billion AI citations. Brand search volume is the strongest predictor of whether AI cites you: 0.334 correlation. Backlinks, the traditional SEO golden metric, showed weak or neutral correlation. GEO runs on different rules than SEO.

Multi-platform presence matters. Sites mentioned on 4 or more distinct platforms (Reddit, YouTube, LinkedIn, forums) are 2.8x more likely to appear in ChatGPT responses. If you only exist on your own domain, you are a single-source claim. AI systems triangulate.

Content format matters. Comparative listicles account for 32.5% of all AI citations, the dominant format by a wide margin. Opinion and analysis pieces follow at 9.91%. Product descriptions: 4.73%.

- Statistics in content: +22% AI visibility (The Digital Bloom, 2025)
- Direct quotations: +37% AI visibility
- Citing sources within your content: +115.1% visibility increase
- Content published within the past year: 65% of AI bot traffic targets it
- Best paragraph length for AI snippet extraction: 40-60 words

Write self-contained units: one question answered completely per paragraph. If your last blog post was published in 2023, you are furniture. AI systems have moved on.
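A minimal sketch of how you might audit your own copy against that 40-60 word window. The window comes from the stats above; the script itself is just a word counter, not anyone's published tool:

```python
# Sketch: flag paragraphs outside the 40-60 word window suggested by
# the snippet-extraction data above. Stdlib only; thresholds are inputs.

def audit_paragraphs(text: str, lo: int = 40, hi: int = 60):
    """Return (word_count, in_window, excerpt) per non-empty paragraph."""
    results = []
    for para in text.split("\n\n"):
        words = para.split()
        if not words:
            continue  # skip blank paragraphs
        n = len(words)
        excerpt = " ".join(words[:6]) + ("..." if n > 6 else "")
        results.append((n, lo <= n <= hi, excerpt))
    return results
```

Feed it a post's plain text and rework anything flagged out of range into a self-contained 40-60 word answer.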

32.5%

of all AI citations come from comparative listicles. The dominant format. The Digital Bloom, 2025

The 5-Point Survival Checklist

None of this is complicated. It is just not what traditional SEO advice covers. This is GEO: generative engine optimization.

1. Audit your JavaScript rendering. Run `curl -s https://yoursite.com | head -100` on your own homepage. If the body is empty, every AI crawler except Googlebot sees empty too. Or run a free WhyIQ scan to see exactly what each AI crawler picks up.

2. Move core content to server-side rendered HTML. Headline, pricing description, key features, top three testimonials: rendered on the server, present in the initial HTML. Not hydrated. Not lazy-loaded. There on first fetch.

3. Add statistics and citations to your copy. Not for Google. For the AI that is going to extract a 40-word paragraph and present it as an answer. Stats increase AI visibility by 22%. Citing sources increases it by 115%.

4. Write comparative content. "X vs Y" formats drive a third of all AI citations. If you are not comparing, you are not getting cited. This is the single highest-leverage content format for AI visibility.

5. Publish something this month. 65% of AI traffic targets content from the past year. Freshness is not optional. If your blog has been quiet since last year, AI systems treat your site as static. You are a snapshot, not a source.
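To make step 1 of the checklist concrete, here is a small stdlib-only sketch that estimates how much visible text survives in a page's raw HTML, which is all a non-rendering crawler gets. The 200-character "empty shell" threshold is an arbitrary assumption, not a published cutoff:

```python
# Sketch: measure the text a non-rendering crawler can see in raw HTML.
# Strips script/style/noscript content; everything else counts as visible.

from html.parser import HTMLParser

class VisibleText(HTMLParser):
    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # >0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def crawler_visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    return " ".join(parser.chunks)

def looks_like_empty_shell(html: str, threshold: int = 200) -> bool:
    # Assumption: under ~200 chars of visible text, the page is a JS shell.
    return len(crawler_visible_text(html)) < threshold
```

Pipe the output of `curl -s https://yoursite.com` into `looks_like_empty_shell` and see whether anything but the nav bar survives.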

Your schema can wait. Your JSON-LD can wait. Making your actual content visible to the bots that are visiting 3,700 times per referral click cannot wait.

WhyIQ's AI SEO audit does not check what Googlebot sees. It checks what GPTBot sees. What ClaudeBot sees. What the bot wearing a fake moustache sees. Most SaaS sites fail this check without knowing it.

Check what AI crawlers see on your site

Frequently Asked Questions

Does GPTBot execute JavaScript?

No. GPTBot fetches JS files in 11.5% of requests but executes none (searchVIU, November 2025). Static HTML only. If your content renders via React, Vue, or Angular, GPTBot sees a blank page.

Can I stop AI bots from crawling my site?

Partially. GPTBot and ClaudeBot respect robots.txt. Perplexity runs a stealth crawler that ignores it entirely (Cloudflare, 2025). Blocking only stops future training data collection, not citations from existing data or live retrieval bots.

Why is my site not being cited by ChatGPT?

Brand search volume is the strongest citation predictor (0.334 correlation), not backlinks. Sites on 4+ platforms are 2.8x more likely to appear. 65% of AI traffic targets content from the past year.

Does schema markup help with AI search?

Not for ChatGPT, Claude, or Perplexity. searchVIU tested 5 AI systems against 8 schema scenarios: zero extracted JSON-LD-only data. Schema helps Google AI Overviews only. Visible HTML is the only reliable source.

How is Perplexity different from ChatGPT for indexing?

Perplexity retrieves content in real time for every query. ChatGPT answers 60% of queries from training data alone. Perplexity also runs a stealth crawler generating 3-6 million undeclared daily requests.

What content format gets cited by AI the most?

Comparative listicles: 32.5% of all AI citations (The Digital Bloom, 2025). Statistics boost AI visibility by 22%, direct quotations by 37%, and citing sources by 115.1%.

What is generative engine optimization (GEO)?

GEO is the practice of optimizing content for AI search engines like ChatGPT, Perplexity, and Gemini. Unlike traditional SEO, GEO focuses on static HTML visibility, source citations, statistical density, and self-contained answer sections that AI can extract and cite.