Why ChatGPT Cites Some Sources and Ignores Others

Type "best landing page analyzer" into ChatGPT. It hands back Hotjar, VWO, Wynter, Unbounce. The list does not include most of the tools that actually exist in that category. The user buys whoever was on the list.

To measure how often that gap shows up, we logged 482 AI search runs across Perplexity, ChatGPT, Claude, Gemini, and Google AI Overviews. The headline number: 0.6% of non-brand runs cited WhyIQ (WhyIQ citation panel, April 2026). Three runs out of 482. All three on Perplexity. All three on a phrase we had to invent before the engine had something to retrieve.

Brands in the top quartile by web mentions get cited about ten times more often than brands one quartile below (Digital Bloom, 2025, n=75K brands). Backlinks, the thing every SEO playbook still tells you to chase, barely register. The gap that finding sits inside has a name: the brand citation moat. The brand citation moat is the structural advantage in AI search that comes from third-party brand mentions across review sites, listicles, and Reddit, rather than from on-page SEO or backlinks. Almost nobody is building one. Including, until recently, us.

The brand citation moat. The thing the LLM trusts is the crowd, not the chain.

This post covers what the panel ran, what the data says about why most domains do not get cited, what actually predicts citation, and the five moves that close the gap. None of it is what traditional SEO advice covers. ChatGPT Search and Google Search retrieve from different indexes, weight different signals, and reward different content. The substrate underneath the search box has changed and most blog posts about "AI SEO" have not noticed.

What We Actually Ran: 482 Queries Across 5 Platforms

Imagine setting up an experiment specifically to measure your own irrelevance. That is what a citation panel is.

We built one. 56 buyer-intent queries, running weekly against Perplexity, ChatGPT Search, Claude Search, Google AI Overviews via DataForSEO, and Gemini grounding. The bank is segmented: 35% CRO agency queries, 20% marketing agency, 15% web designer, 30% founder. Stage split: 15% problem-aware, 25% solution-aware, 30% vendor-comparing, 30% ready-to-buy. Each query carries a target page, so we can score not just whether WhyIQ got cited, but whether the right page got cited. Full-win versus partial-win.
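The full-win versus partial-win distinction can be sketched in a few lines. This is a minimal illustration, not the panel's actual code: the `PanelQuery` class, the `score_run` function, and the example.com URLs in the usage note are all hypothetical, and the sketch assumes every URL is absolute (scheme included).

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class PanelQuery:
    text: str
    segment: str      # e.g. "founder"
    stage: str        # e.g. "vendor-comparing"
    target_url: str   # the page we hope the engine cites


def _host(url: str) -> str:
    """Lower-cased hostname without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def _path(url: str) -> str:
    """URL path with any trailing slash removed."""
    return urlparse(url).path.rstrip("/")


def score_run(query: PanelQuery, cited_urls: list[str]) -> str:
    """Return 'full-win', 'partial-win', or 'miss' for one run."""
    target_host = _host(query.target_url)
    target_path = _path(query.target_url)
    same_domain = [u for u in cited_urls if _host(u) == target_host]
    if any(_path(u) == target_path for u in same_domain):
        return "full-win"      # the right page was cited
    if same_domain:
        return "partial-win"   # the domain was cited, but another page
    return "miss"
```

A run that cites https://example.com/blog when the target was https://example.com/pre-traffic-cro scores as a partial-win: the engine found the domain, just not the page the query was meant to surface.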

Six weeks in, 482 runs deep. Three cites. All three on Perplexity. All three on a phrase we coined ourselves: "pre-traffic CRO." On every other platform, on every other phrase, our name was either absent or replaced by a competitor. ChatGPT Search: zero non-brand cites. Claude: zero. Gemini: zero. Google AIO: zero.
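For anyone checking the arithmetic, the headline rate is just cited runs over total runs. The sketch below uses only the counts stated above; the function name `cited_rate` and the dictionary layout are illustrative assumptions.

```python
def cited_rate(cited_runs: int, total_runs: int) -> float:
    """Share of runs that cited the tracked domain, as a percentage."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return 100 * cited_runs / total_runs


# Non-brand cites per platform, as reported in the text above.
cites_by_platform = {
    "Perplexity": 3,
    "ChatGPT Search": 0,
    "Claude": 0,
    "Gemini": 0,
    "Google AIO": 0,
}

total_cites = sum(cites_by_platform.values())
non_brand_rate = cited_rate(total_cites, 482)
cite_bearing = [name for name, n in cites_by_platform.items() if n > 0]

print(f"{non_brand_rate:.2f}%")   # 0.62%
print(cite_bearing)               # ['Perplexity']
```

Rounded to one decimal place, that is the 0.6% in the headline.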

Meanwhile, the brand-defence panel (queries that already contain a known brand name) cites that brand 94% of the time. People who already typed the name find the source. People searching the category never see it at all.

0.6%

Non-brand AI citation rate across 482 runs (3 of 482). Only 1 of 5 platforms cite-bearing. WhyIQ citation panel, Apr 2026

Comic panel: an exhausted founder stands in a dark control room holding a clipboard reading '482 RUNS / 3 CITES'. Behind him, five large CRT monitors are mounted on the wall labelled Perplexity, ChatGPT, Claude, Gemini, and Google AIO. Four monitors are dark or showing static. Only the Perplexity monitor glows cyan with the words 'pre-traffic CRO' on screen. — One green pixel on Perplexity. The rest of the wall is static.

The on-page citation signals scored fine in independent audits. The page is not the problem. The problem is the rest of the internet, which has not heard of us.

Why Is the Non-Brand Citation Rate So Low?

There are two kinds of AI search query. The kind that pretends to know you, and the kind that does not.

Brand-defence queries pretend. Type "Acme vs Hotjar" or "Acme pricing" and the engine cites Acme. Of course it does: the user just typed the name. The retrieval engine is confirming what the query already specified. That is not a signal. That is a comfort metric, like a horoscope written about you, by you. A 94% brand-defence cited rate tells you the engine knows you exist. It does not tell you anyone else does.

Non-brand queries are the ones that matter. "Landing page analyzer." "CRO audit tool." "How to test a landing page before launch." For an LLM to cite a specific domain on a query like that, the domain has to surface from third-party content that mentions it by name in the same context. If the brand is not in the third-party content, it is not in the answer. The user never learns it exists. They buy whoever was in the answer.

Once you see the gap, the pattern shows up everywhere. The top quartile of brands by web mentions earns 10x more AI citations than the next quartile, across a sample of 75,000 brands (Digital Bloom, 2025). It is not a small effect. It is the dominant effect. The brands that win AI search are the brands that are mentioned by other people. Most companies have no moat at all.

Key takeaway

Brand-defence cited rate (94%) and non-brand cited rate (0.6%) are two different problems. Brand-defence is solved by existing. Non-brand is solved by being mentioned by other people.

What Predicts AI Citation: Backlinks or Brand Mentions?

AI search engines do not read backlinks. They listen to gossip.

Two large public studies landed on the same finding. Brands in the top quartile by web mentions get cited about 10x more often than brands one quartile below, across 75,000 brands (Digital Bloom 2025; the underlying correlation is r=0.334-0.664 across the two studies, in case you wanted the receipts). Backlinks, the SEO-era proxy for authority, do not register at any meaningful level. Earned media, separately, drives 325% more AI citations than owned content (AuthorityTech, 2025). The fight isn't close. It isn't really a fight.

The reason is mechanical. AI retrieval systems were trained on text where entity X gets discussed alongside topic Y, in the wild, by other people. The training signal is "people writing about Y mention A by name." A backlink is one site choosing to acknowledge another, formally, in a footer. A brand mention is the broader conversation already including you, casually, mid-sentence. Models learned to trust the casual one. We are training search engines the way you train a dog: with the words we use most often around the things we mean.

10x

Brands in the top quartile of web mentions get cited about 10x more often than the next quartile. Backlinks barely register. Digital Bloom, n=75K, 2025

Mentions weigh ten times more than links. The scale is not subtle.

This is the brand citation moat. It is not a content strategy. It is a presence strategy. The companies winning AI search are not the ones with the best blog posts. They are the ones whose names appear in the same paragraphs as the topic, on sites they do not control, in conversations they did not start.

Where Do AI Citations Actually Come From?

AI engines visit three places more than they visit your homepage. None of them are your homepage.

Reddit accounts for 40.1% of all LLM citations and 46.7% of Perplexity citations specifically (Wellows, 2025). Comparative listicles, the "Top 10 X tools" format you probably think is dead SEO bait, drive 21.9-46% of AI citations across platforms. 80.9% of B2B SaaS citations come from third-party sources, not from the company's own site (Goodie, 2025). The thing you spent six months building is the smallest piece of why you do not get cited.

Review platforms are the next surprise. G2, Capterra, TrustRadius, AlternativeTo, GetApp. Both ChatGPT Search and Perplexity weight them disproportionately. Review-platform presence alone lifts cited rate from 1.8% to 4.6-6.3%, regardless of domain size (SE Ranking, 2025). A 100-employee SaaS without a G2 listing gets cited less than a 10-person company that claimed its listing and gathered 50 reviews. The G2 listing is doing more work for the smaller company than its entire engineering team.

Then there is the cross-platform problem. ChatGPT and Perplexity overlap on only 11% of cited domains. Each platform pulls from a different substrate. ChatGPT Search re-ranks Bing's index (87% citation overlap with Bing top 10). Perplexity is Reddit-heavy and recrawls every 48 hours. Google AI Overviews lean on top organic plus structured data, but 47% of cited pages sit below position 5 in regular results. Gemini grounds in Google Search plus the Knowledge Graph and leans toward listicles and affiliate sites.
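If you log cited domains per platform yourself, overlap is a set calculation. The cited studies do not say exactly how they define overlap, so this sketch assumes Jaccard overlap (shared domains divided by all distinct domains). The function name and the placeholder domains are made up for illustration.

```python
def domain_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard overlap: shared domains divided by all distinct domains."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Placeholder data, not real measurements.
platform_a = {"reviews.example", "vendor-one.example", "blog.example"}
platform_b = {"forum.example", "reviews.example", "vendor-two.example"}

overlap = domain_overlap(platform_a, platform_b)
print(f"{overlap:.0%}")  # 20%
```

One shared domain out of five distinct ones gives 20%. A different definition, such as shared domains divided by the smaller set, would produce a different number from the same data, which is worth remembering when comparing overlap figures across studies.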

"Optimizing for AI" without specifying which AI is hand-waving. The platforms do not agree with each other and have no plans to start.

40.1%

of all LLM citations come from Reddit. 46.7% of Perplexity citations specifically. Wellows, 2025

If you are not in any of these places, you are not in AI search. Your sitemap, schema, and lovingly hand-tuned URLs are doing the work of a guard dog at an empty house. The party is happening on someone else's lawn.

Why Does the First 30% of Your Page Matter So Much?

LLMs do not read your blog. They scan it for the first quotable sentence, take it, and leave.

44.2% of all LLM citation extractions come from the opening 30% of body text (SE Land, AirOps 548K-page study). The lede has stopped being a stylistic choice. It is the citation surface, the part of your page the model actually quotes. If the answer to the user's question lives in section 4 of an 8-section article, the LLM never gets there. It quotes whatever you put in your intro instead. "Here at Acme we believe customers come first" is not the data the engine wanted, but it is the data it can reach.

The fix is structural and a bit boring. The top of every page has to carry a self-contained, sourced, named answer in the first paragraph. Not a hook. Not a clever tease. The literal answer. The pattern looks like this: a specific stat with a number, source attribution, named concept. Write a sentence Wikipedia would let stand and the LLM has nothing to argue with. It just quotes you.
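A rough self-audit of that pattern can be automated. The sketch below is a heuristic, not how any engine actually extracts: it assumes the "opening 30%" is measured in words, and it treats a parenthetical containing a four-digit year as a source attribution. Both function names are hypothetical. Whether the paragraph carries a named concept still needs a human read.

```python
import re


def opening_slice(body: str, fraction: float = 0.30) -> str:
    """Return the first `fraction` of the body, measured in words."""
    words = body.split()
    if not words:
        return ""
    cutoff = max(1, round(len(words) * fraction))
    return " ".join(words[:cutoff])


def lede_checks(first_paragraph: str) -> dict[str, bool]:
    """Heuristic checks for a stat and a source in the first paragraph."""
    has_number = bool(re.search(r"\d", first_paragraph))
    # A parenthetical that contains a year, e.g. "(Wellows, 2025)".
    has_source = bool(
        re.search(r"\([^()]*\b(?:19|20)\d{2}\b[^()]*\)", first_paragraph)
    )
    return {"has_number": has_number, "has_source": has_source}
```

Running `lede_checks` on "Here at Acme we believe customers come first" returns False for both checks, which is the point.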

The numbers reinforce it. Statistics in body copy lift AI citation rates by 22%. Citing sources lifts them by 115% (Digital Bloom, 2025). Direct quotation of original research lifts them by 37%. None of these are about ranking. They are about whether the LLM trusts the sentence enough to put it in front of a human user.

The lede is no longer a stylistic choice. It is the citation surface.

Why Do AI Citations Decay Every Three Months?

Your top-ranking page from 2024 is quietly dying. Nobody is going to email you about it.

AI citations have roughly a three-month half-life. 93% of cited pages get re-shuffled by the next major model update. Pages refreshed quarterly are 3x less likely to lose their citations than pages left static for 12+ months (AirOps 548K-page study, 2025). 65% of AI traffic targets content from the past year (Cloudflare, 2025). Freshness is not a quality signal here. It is a retrieval cutoff. Old pages stop surfacing entirely.
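To get a feel for what a three-month half-life implies, here is a minimal sketch that assumes smooth exponential decay. That is a simplification: the figures above suggest citations actually move in steps around model updates. The function name `surviving_share` is hypothetical.

```python
def surviving_share(months: float, half_life_months: float = 3.0) -> float:
    """Expected fraction of citations still held after `months`,
    assuming exponential decay with the given half-life."""
    return 0.5 ** (months / half_life_months)


for months in (3, 6, 12):
    print(months, f"{surviving_share(months):.2%}")
# 3 50.00%
# 6 25.00%
# 12 6.25%
```

Under that assumption, a page left alone for a year keeps about one citation in sixteen.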

This breaks every habit traditional SEO trained you with. A 2018 post that ranked in 2018 can still rank in 2026 with minimal effort, and Google will keep sending it traffic for free. In AI search, that same post drops out of the citation pool the next time the model updates. Your dashboard does not flag it. The traffic just stops. Your only signal is silence.

The operational fix is boring. Pick the 8-12 pages you most want cited and refresh them on a calendar. Last-updated date in the byline, not just buried in metadata. Fresh stat. Fresh example. Fresh internal link. One hour per page, every quarter. The alternative is watching every citation you earned dissolve every 90 days and rebuilding from zero.
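The calendar itself can be a ten-line script. This sketch assumes you keep a mapping of page paths to last-updated dates somewhere; the function name, the 90-day constant, and the example paths in the usage note are illustrative.

```python
from datetime import date, timedelta

REFRESH_INTERVAL = timedelta(days=90)


def pages_due(last_updated: dict[str, date], today: date) -> list[str]:
    """Pages whose last refresh is at least 90 days old, oldest first."""
    due = [
        (updated, url)
        for url, updated in last_updated.items()
        if today - updated >= REFRESH_INTERVAL
    ]
    return [url for _, url in sorted(due)]
```

For example, with "/features" last updated on 2025-11-01, "/pricing" on 2026-01-01, and "/blog/a" on 2026-03-15, calling `pages_due` with a `today` of 2026-04-01 returns "/features" then "/pricing". The blog post is still inside its 90 days.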

What About llms.txt and Schema Markup?

Two things you almost certainly tried in 2024. Two things that did not work.

llms.txt was proposed as a robots.txt analog for LLM crawlers. The idea was that you could declare which content the model should pay attention to. Three independent studies tested it in 2025: Otterly, SE Ranking, and Generix, with combined sample sizes over 300,000 pages plus a 90-day field test. None found a measurable lift in AI citation rate from publishing one. The major LLM crawlers do not read it. The file sits on your server doing exactly nothing.

Schema markup is more nuanced and quietly worse. Structured JSON-LD helps on Google AI Overviews, which inherits Google's structured-data pipeline. A real, narrow win. But schema is invisible to ChatGPT Search, Perplexity, and Claude when those engines fetch your page directly. And worse: an incomplete or generic schema block carries an 18-point citation penalty in one large study (Growth Marshal, 730 cited pages analyzed). Bad schema is measurably worse than no schema. The "I will add some markup later" tab on your roadmap is not a half-finished improvement. It is an active liability. Either commit and finish it, or close the tab.
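If you do commit to schema, a completeness check before publishing is cheap. The study above does not spell out what counts as "incomplete", so the required-field list in this sketch is an illustrative assumption, not the study's definition. The function name is hypothetical, and the sketch assumes the JSON-LD block is a single object rather than an array.

```python
import json

# Assumption: an illustrative field list, not taken from the cited study.
REQUIRED_FIELDS = ("@context", "@type", "name", "description", "url")


def missing_fields(json_ld: str, required=REQUIRED_FIELDS) -> list[str]:
    """Return required fields that are absent or blank in a JSON-LD object."""
    data = json.loads(json_ld)
    return [
        field
        for field in required
        if not str(data.get(field, "")).strip()
    ]


block = """
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme"
}
"""
print(missing_fields(block))  # ['description', 'url']
```

A non-empty result is the "I will add some markup later" tab, in list form.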

The biggest LLM-only-content trap is mass-generated copy. AI-written articles published at high velocity show up to 60% factual inaccuracy in citation extractions, and the same pages tend to trigger spam-velocity flags that downrank the entire domain (ImageWorks, 2025). The blog you cannot write fast enough to be useful is also the blog AI engines stop trusting. The shortcut to citation eats the citation.

No measurable AI citation lift from publishing an llms.txt file (3 independent studies, n>300K pages). Otterly, SE Ranking, Generix, 2025

The Brand Citation Moat: 5-Point Checklist

Five moves. Zero paid spend. All of them are presence work that lives on other people's domains. This is what GEO looks like when you do it on purpose.

1. Claim and populate review-site listings. G2, Capterra, AlternativeTo, GetApp, SaaSHub. Even a thin profile lifts cited rate from 1.8% to 4.6%. Aim for 50+ reviews on the platforms your category lives in. This is the highest-leverage single move and the one most teams skip because it feels like homework. It is homework. Do the homework.

2. Show up in Reddit conversations in your category. Not as link-drop spam. As a genuine reply, with a stat, with your name attached. Reddit accounts for 46.7% of Perplexity citations, and the up-voted reply with your name in it is the unit of currency. Treat it like a 90-day warm-up, not a launch tactic. The first month feels useless. The second month is when the engine starts to notice.

3. Get included in third-party listicles. "Top 10 X tools," "Best X for Y." Yes, the format you thought was dead. Listicles drive 21.9-46% of AI citations across platforms, and 80.9% of B2B SaaS citations come from third-party sources rather than the company's own site. The cold pitch that works is a 100-word email to the author with one stat about your tool and one stat about a category competitor. It works because you are doing the writer's job for them.

4. Rewrite your top 10 pages so the first paragraph is the answer. Specific stat. Source. Named concept. 44.2% of LLM citation extractions come from the first 30% of body text. If your hero copy is "Welcome to Acme, where we believe customers come first," the LLM has nothing to quote and your competitor's first paragraph wins instead.

5. Refresh those same pages on a quarterly calendar. Last-updated date in the byline, not just in metadata. Fresh stat. Fresh example. Fresh internal link. Pages refreshed quarterly are 3x less likely to lose AI citations. One hour per page, every 90 days. Put it in your calendar like a dentist appointment. Less optional than the dentist.

Comic panel: a founder in a yellow hard hat kneels and lays a glowing cyan brick onto a half-built wall around a small softly-lit house labelled 'YOUR SITE'. The bricks are labelled with the five-point checklist: G2 REVIEW, REDDIT REPLY, LISTICLE, FIRST-PARAGRAPH STAT, QUARTERLY REFRESH. A pile of unused labelled bricks sits next to him. His speech bubble reads '...the moat isn't backlinks'. Banner above reads 'BUILDING THE MOAT'. — Five bricks. Half a wall. Nothing here is paid spend. All of it is presence work.

The citation panel that produced these numbers, including the KPI gates and the per-segment query bank, is documented in our public methodology. The broader research domain that this kind of work sits inside, running citation diagnostics on a page before it has any traffic at all, is what we call pre-traffic CRO.

The 482-run data set behind this post was a one-off snapshot. WhyIQ AI Radar runs the same experiment continuously for any tracked domain: see who AI cites in your category, weekly, across ChatGPT, Perplexity, Claude, Gemini, and Google AI. The cited rate, the competing third-party sources, and the gap between you and the brands that get named all become a tracked number, not a one-time finding.

Frequently asked questions

What predicts whether ChatGPT cites your site?

Brand mention volume across third-party sources. Brands in the top quartile by mentions get cited about 10x more often than brands in the next quartile (Digital Bloom 2025, n=75K). Backlinks, the thing the SEO industry has obsessed over for two decades, barely move the needle. Earned media drives 325% more AI citations than owned content (AuthorityTech, 2025). LLMs trust people writing about you, not people linking to you.

What is the brand citation moat?

The structural advantage in AI search that comes from third-party brand mentions across review sites, listicles, and Reddit, rather than from on-page SEO or backlinks. The top quartile of brands by web mentions earns 10x more AI citations than the next quartile (Digital Bloom, n=75K brands).

Where do AI citations actually come from?

Reddit alone is 40.1% of LLM citations and 46.7% of Perplexity citations (Wellows, 2025). Comparative listicles drive 21.9-46% of AI citations across platforms. 80.9% of B2B SaaS citations come from third-party sources, not from the company's own site (SE Land, Goodie 2025).

Does the first part of my page matter more for AI citation?

Yes. 44.2% of LLM citation extractions come from the opening 30% of body text (SE Land, AirOps 548K-page study). LLMs extract sentences, not full pages. The lede is no longer a stylistic choice. It is the citation surface.

Does llms.txt help with AI citation?

No. Three independent studies (Otterly, SE Ranking, Generix; combined n=300K+ pages) found no measurable lift in AI citation rate from publishing an llms.txt file. The major LLM crawlers do not read it.

How quickly do AI citations decay?

AI citations have roughly a 3-month half-life. 93% of cited pages get re-ranked by the next major model update. Pages refreshed quarterly are 3x less likely to lose their citations (AirOps 548K-page study, 2025).

Why is review-site presence so important for AI citation?

Review-platform presence alone lifts cited rate from 1.8% to 4.6-6.3%, regardless of domain size (SE Ranking, 2025). G2, Capterra, and TrustRadius are heavily weighted by ChatGPT Search and Perplexity. AI engines treat these as third-party corroboration signals.

What is generative engine optimization (GEO)?

GEO is the practice of optimizing for citation by AI search engines like ChatGPT, Perplexity, Claude, and Gemini, rather than for ranking on Google. It focuses on third-party brand presence, passage-level answer quality, statistical density, and refresh cadence. It is structurally different from traditional SEO.
