How AI search decides what to cite. And how we measure it the same way.

Ben Little, Founder, WhyIQ · Published 26 June 2026 · Last updated 26 June 2026 · 12 min read

AI search engines do not answer from memory. They answer by retrieving live web pages from a search index and reading them on the spot, a technique Google calls grounding and the field calls retrieval-augmented generation, or RAG. They also fan a single question out into several related sub-queries and pull sources for each, so a page can be cited for a question its author never literally wrote. Across a 57.2-million-citation study, brands' own domains earned only 10.15 percent of all AI citations; the rest went to third-party sources the engine retrieved and trusted (Foundation and AirOps, 2026). Here is how that retrieve-and-read step actually decides what gets cited, and how WhyIQ AI Radar measures it by running the real buyer queries through the real engines every week.

In June 2026, Google published an official guide to optimizing for its generative AI features. It is unusually blunt for a search company. AI Overviews and AI Mode, it says, are rooted in the core Search ranking systems, so the work of being cited by AI is, in Google's words, still SEO. There is no secret AI index. There is no magic file you can publish. The page that gets cited is the page that was retrievable, relevant, and worth quoting at the moment the engine ran the query.

That last clause matters more than it looks. A citation is not a property your page has. It is an event that happens when an engine runs a query and reads a result. Which means the only honest way to know whether you are cited is to do what the engine does: run the query and read what it cited. Most tools predict. We will get to why measuring is the harder, and the truer, thing to do.

How Do AI Search Engines Actually Find Sources?

The model does not browse. A retrieval layer does, and the model reads what it brings back.

When you ask ChatGPT, Perplexity, Claude, or Google AI a question that needs current information, the system first retrieves a set of web pages from a search index, then generates an answer grounded in what those pages say. Google describes its own version plainly: grounding improves the quality, accuracy, and freshness of AI responses by relying on its core Search ranking systems to retrieve relevant, up-to-date web pages, and the answer surfaces prominent, clickable links to them. That is RAG. The answer is assembled from retrieved sources, not recited from training data.
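To make the retrieve-then-generate order concrete, here is a minimal Python sketch. Everything in it is a hypothetical stand-in: the three-page `INDEX`, the word-overlap scoring in `retrieve`, and the `grounded_answer` template are illustrations, not how Google or any other engine ranks pages. What it does show is the shape of grounding: the answer is assembled only from pages retrieved for this query, and each of those pages comes back as a citation.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


def retrieve(index: list[Page], query: str, k: int = 2) -> list[Page]:
    """Toy retrieval: rank pages by how many query words they share, keep the top k."""
    query_terms = set(query.lower().split())
    scored = []
    for page in index:
        overlap = len(query_terms & set(page.text.lower().split()))
        if overlap > 0:  # a page with no overlap is never retrieved
            scored.append((overlap, page))
    # sort() is stable even with reverse=True, so ties keep their index order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [page for _, page in scored[:k]]


def grounded_answer(index: list[Page], query: str) -> dict:
    """Build the 'answer' only from retrieved pages and cite every one of them."""
    sources = retrieve(index, query)
    return {
        "answer": " ".join(page.text for page in sources),
        "citations": [page.url for page in sources],
    }


# Hypothetical index; a real engine retrieves from a web-scale search index.
INDEX = [
    Page("https://a.example/lawn", "kill lawn weeds with herbicide"),
    Page("https://b.example/bread", "bake sourdough bread at home"),
    Page("https://c.example/weeds", "prevent weeds in lawn naturally"),
]

result = grounded_answer(INDEX, "how to fix a lawn full of weeds")
print(result["citations"])  # ['https://a.example/lawn', 'https://c.example/weeds']
```

The bread page never reaches the answer because it was never retrieved, which is the point: in this model, a page that is not retrieved cannot be cited.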

The practical consequence is a prerequisite chain. To be a grounding source, your page has to be crawlable, indexed, and eligible to be quoted. There is no separate pipeline that scores you "for AI" while ignoring whether Google can read you. For AI Overviews specifically, the overlap with classic ranking is high: roughly 76 percent of the URLs cited in AI Overviews also rank in Google's top 10 organic results for the underlying query (Semrush, 2026). Standalone answer engines like Perplexity diverge more. The blunt version: if you cannot be retrieved, you cannot be cited, and retrievability is mostly the same work it always was.
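The first link in that chain, crawlability, is something you can check offline. The sketch below uses Python's standard-library `urllib.robotparser` to ask whether a given user agent may fetch a URL under a robots.txt file. The robots.txt rules and the example.com URLs are hypothetical. Note the limits: robots.txt only governs crawling. It says nothing about whether a page is indexed or eligible to be quoted in a snippet; those are separate checks.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everything is crawlable except the drafts folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt allows `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text instead of fetching
    return parser.can_fetch(user_agent, url)


print(is_crawlable(ROBOTS_TXT, "Googlebot", "https://example.com/blog/aeo"))    # True
print(is_crawlable(ROBOTS_TXT, "Googlebot", "https://example.com/drafts/aeo"))  # False
```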

Key takeaway

A citation starts with retrieval. Crawlable, indexed, snippet-eligible pages are the input to grounding. There is no AI-only index that rewards a page Google cannot read.

What Is Query Fan-Out, and Why Does It Matter?

You ask one question. The engine quietly asks five.

Query fan-out is the second half of how AI search decides what to cite. Instead of retrieving sources for the literal sentence a user typed, the engine generates several related sub-queries and pulls sources for each, then synthesizes one answer. Google's own example: the query "how to fix a lawn that's full of weeds" fans out into "best herbicides for lawns," "remove weeds without chemicals," and "how to prevent weeds in lawn." The user typed one thing. The engine searched several.

This is why exact-match keyword thinking quietly stopped working. A page that comprehensively covers a topic can be cited for sub-questions the author never wrote, because the engine generated those sub-questions itself and went looking. Depth and coverage beat phrase-matching. A thin page targeting one keyword competes against the fan-out; a thorough page answering the whole question cluster gets pulled in for several of the sub-queries at once.
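A minimal sketch of why coverage beats phrase-matching under fan-out. Assumptions, all hypothetical: the three sub-queries are hard-coded from Google's lawn example rather than generated by a model, the two pages are invented, and "retrieved" means sharing at least two content words with a sub-query, a crude stand-in for real ranking. The thorough page never contains the word "fix" from the literal question, yet it is pulled in for all three sub-queries, while the thin page that targets one phrase matches only one.

```python
import re
from collections import Counter

# Words ignored when comparing queries and pages (a tiny, hypothetical list).
STOPWORDS = {"a", "and", "for", "from", "how", "in", "of", "the", "to", "without"}

# Hard-coded fan-out for Google's example; a real engine generates these itself.
SUB_QUERIES = [
    "best herbicides for lawns",
    "remove weeds without chemicals",
    "how to prevent weeds in lawn",
]

PAGES = {
    "https://thin.example/lawn-weeds": "Fix lawn weeds fast",
    "https://deep.example/weed-guide": (
        "Compare herbicides for lawns, remove weeds without chemicals, "
        "and prevent weeds from returning to the lawn."
    ),
}


def content_terms(text: str) -> set[str]:
    """Lowercase words with punctuation stripped and stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def fan_out_hits(sub_queries: list[str], pages: dict[str, str], min_overlap: int = 2) -> Counter:
    """Count, per page, how many sub-queries would retrieve it."""
    hits: Counter = Counter()
    for sub_query in sub_queries:
        wanted = content_terms(sub_query)
        for url, text in pages.items():
            if len(wanted & content_terms(text)) >= min_overlap:
                hits[url] += 1
    return hits


hits = fan_out_hits(SUB_QUERIES, PAGES)
print(hits["https://deep.example/weed-guide"])  # 3
print(hits["https://thin.example/lawn-weeds"])  # 1
```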

It also reframes what a "good answer" is. The engine is not looking for the page that contains your keyword. It is looking for the page that best answers each sub-question it generated. Write for the question behind the query, in depth, and you become retrievable for a fan-out you never see.

So What Actually Decides Which Page Gets Cited?

Retrieval gets you into the room. Evidence density and third-party presence decide whether you get quoted.

Once a page is retrievable, the question becomes which retrieved sources the engine trusts enough to quote. The peer-reviewed answer is substance. In the Princeton study that named the field, Generative Engine Optimization (Aggarwal et al., KDD 2024), adding real statistics, source citations, and direct quotations to content lifted source visibility in AI answers by up to about 40 percent. Keyword stuffing, the SEO-era reflex, did not help. The engines reward content that carries verifiable evidence, not content that repeats a phrase.
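The kinds of evidence the GEO study rewarded (statistics, cited sources, direct quotations) are easy to spot mechanically. The sketch below is a deliberately crude, hypothetical regex counter, not the method used in the paper and not WhyIQ's AI Citability Index. It only illustrates that a sourced, quantified passage carries signals a keyword-stuffed one does not. Both sample passages are invented for the example.

```python
import re


def evidence_signals(passage: str) -> dict[str, int]:
    """Count rough markers of verifiable evidence in a passage (heuristic only)."""
    return {
        # numbers followed by % or "percent", e.g. "10.15 percent"
        "statistics": len(re.findall(r"\d+(?:\.\d+)?\s*(?:%|percent)", passage)),
        # parenthetical citations ending in a year, e.g. "(AirOps, 2026)"
        "citations": len(re.findall(r"\([^()]*\b(?:19|20)\d{2}\)", passage)),
        # text inside straight double quotes
        "quotations": len(re.findall(r'"[^"]+"', passage)),
    }


sourced = (
    "Brands earned 10.15 percent of citations (Foundation and AirOps, 2026). "
    "In Google's words, the work is \"still SEO.\""
)
stuffed = "best crm software best crm software best crm software"

print(evidence_signals(sourced))  # {'statistics': 1, 'citations': 1, 'quotations': 1}
print(evidence_signals(stuffed))  # {'statistics': 0, 'citations': 0, 'quotations': 0}
```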

Placement compounds it. Across a 548,000-page study, 44.2 percent of AI citation extractions came from the opening 30 percent of body text (AirOps, 2026). Engines quote the first self-contained, sourced sentence they can reach. Freshness compounds it again: AI citations have roughly a three-month half-life, and pages updated within three months are about three times more likely to be cited than stale ones (AirOps, 2026). A page that ranked in 2024 and was never touched quietly drops out of the citation pool.
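Both effects in that paragraph can be written down as simple checks. The sketch below is an illustration under stated assumptions, not AirOps's methodology: `in_lede` measures position in characters as a proxy for "the opening 30 percent of body text", and `freshness_weight` assumes that a three-month half-life means plain exponential decay, so the citation signal halves every three months. The separate "about three times more likely" figure is an observed result from the study and is not derived from this curve.

```python
def in_lede(body: str, sentence: str, share: float = 0.30) -> bool:
    """True if `sentence` starts inside the first `share` of `body` (by characters)."""
    start = body.find(sentence)
    if start == -1:
        raise ValueError("sentence not found in body")
    return start < share * len(body)


def freshness_weight(age_months: float, half_life_months: float = 3.0) -> float:
    """Assumed exponential decay: the weight halves every `half_life_months`."""
    return 0.5 ** (age_months / half_life_months)


body = "Short answer: yes, with a source. " + "Background detail. " * 10 + "Late answer."
print(in_lede(body, "Short answer"))  # True
print(in_lede(body, "Late answer."))  # False
print(freshness_weight(0), freshness_weight(3), freshness_weight(6))  # 1.0 0.5 0.25
```

Under this toy model, a page left untouched for 24 months would keep 0.5 ** 8, about 0.4 percent of its original weight.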

Then there is the part most on-page checklists miss: where the citation actually comes from. In a 57.2-million-citation analysis across ChatGPT, Gemini, Perplexity, and Google AI, only 10.15 percent of citations pointed to brands' own domains. Reddit alone accounted for 20.8 percent of external citations, rising to 30.9 percent on unbranded discovery queries; YouTube accounted for 13 percent, LinkedIn 11 percent, and G2 4 percent (Foundation and AirOps, 2026). On branded queries, 77.6 percent of citations were brand-owned; on unbranded discovery queries, only 2.2 percent. Brand mentions across third-party sources are the strongest single correlate of AI visibility, with a correlation of roughly 0.67 (Semrush, 2026). Translation: the engines mostly cite other people writing about you, not you writing about you.
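Shares like these are simple ratios over a list of cited URLs, and the denominator matters: brand-owned share is measured against all citations, while Reddit's 20.8 percent is a share of external citations only. A minimal sketch with hypothetical data (`acme.example` stands in for a brand, and the URLs are invented), using hostname suffix matching as an assumed way to group subdomains. The studies' exact methods may differ.

```python
from urllib.parse import urlparse


def host_matches(url: str, domain: str) -> bool:
    """True if the URL's host is `domain` or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def citation_breakdown(cited_urls: list[str], brand_domains: set[str], watch_domain: str):
    """Brand-owned share of all citations, plus one domain's share of external citations."""
    if not cited_urls:
        return None  # nothing cited, so there is no share to report
    is_owned = [any(host_matches(u, d) for d in brand_domains) for u in cited_urls]
    external = [u for u, owned in zip(cited_urls, is_owned) if not owned]
    watched = sum(host_matches(u, watch_domain) for u in external)
    return {
        "brand_owned_share": sum(is_owned) / len(cited_urls),
        "watch_share_of_external": watched / len(external) if external else 0.0,
    }


cited = [
    "https://www.acme.example/pricing",
    "https://www.reddit.com/r/crm/comments/abc123",
    "https://www.g2.com/products/acme",
    "https://acme.example/blog/aeo",
]
print(citation_breakdown(cited, {"acme.example"}, "reddit.com"))
# {'brand_owned_share': 0.5, 'watch_share_of_external': 0.5}
```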

44.2%

of AI citation extractions come from the first 30% of body text. The lede is the citation surface. (AirOps, 548K-page study, 2026)

What Does Not Work (Google's Own Myth-Busts)?

The most useful part of Google's guide is the list of things it tells you to stop doing.

Google's June 2026 guide is explicit about the tactics that do not move AI citation, and it is worth quoting, because here a search company contradicts the grey-hat playbook much of the category sells. It states that Google Search ignores llms.txt and other special AI files. Three independent studies across 300,000-plus pages point the same way: no measurable lift in citation rate from publishing one. It is a hygiene file, not a lever.

On structured data, Google is equally direct: it is not required to appear in AI features. Schema remains valuable for rich-results eligibility and entity clarity, and it helps most on Google AI Overviews, moderately on ChatGPT and Claude, and barely on Perplexity. But "add schema and AI will cite you" is not a claim the evidence or Google's own guidance supports. Same for "chunking" content into tiny fragments, which Google says is unnecessary because its systems understand multiple topics on a page and there is no ideal page length. And same for rewriting copy specifically for machines: the engines understand synonyms and meaning, so chasing long-tail keyword variants is wasted effort.

There is one nuance worth drawing sharply, because it is the line between a real strategy and spam. Google warns against seeking inauthentic mentions. That is not the same as building authentic third-party presence. The 57.2-million-citation data is unambiguous that Reddit threads, review-site listings, and real editorial coverage are where most citations come from. The honest version of the work is to earn real presence where buyers actually are. The dishonest version is to manufacture mentions, and that is the version Google's spam systems are built to catch. Build the first. Never the second.

0%

Measurable AI citation lift from publishing an llms.txt file. Google Search ignores it. (Google Search Central guidance; 3 independent studies, n > 300K, 2026)

Why Measuring Citation Is Harder Than Predicting It

AI engines are probabilistic. Ask the same question twice and you can get two different answers.

This is the part that breaks most measurement. A search ranking is roughly stable: the page at position three is at position three when you check again an hour later. An AI answer is generated fresh each time, from a fan-out that can vary, over a retrieval set that can shift. Run the identical prompt through the identical engine twice and the cited sources can differ. A single snapshot is a sample of one from a noisy distribution.
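Why a single snapshot misleads can be shown with a toy simulation. The model below is a simplifying assumption, not how any engine works: each candidate source is cited independently with a fixed probability on every run. Real variation comes from fan-out and retrieval shifts and is messier, but even this toy shows that individual runs disagree while a citation rate only emerges across many runs. The URLs and probabilities are hypothetical.

```python
import random


def simulated_run(pool: dict[str, float], rng: random.Random) -> list[str]:
    """One hypothetical engine run: each source is cited with its own probability."""
    return [url for url, p in pool.items() if rng.random() < p]


def citation_rate(pool: dict[str, float], runs: int, seed: int = 0) -> dict[str, float]:
    """Fraction of `runs` simulated runs in which each source was cited."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    counts = {url: 0 for url in pool}
    for _ in range(runs):
        for url in simulated_run(pool, rng):
            counts[url] += 1
    return {url: n / runs for url, n in counts.items()}


POOL = {
    "https://www.reddit.com/r/crm": 0.8,
    "https://acme.example/pricing": 0.4,
    "https://www.g2.com/products/acme": 0.6,
}

rng = random.Random(1)
print(simulated_run(POOL, rng))  # one run: a sample of one
print(simulated_run(POOL, rng))  # the same "prompt" again may cite a different set
print(citation_rate(POOL, runs=10_000))  # rates settle close to 0.8, 0.4 and 0.6
```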

That splits the tooling into two honest jobs, and they should never be confused. The first job is prediction: score a page for the on-page signals that make it retrievable and quotable, and estimate how citation-ready it is. WhyIQ's page-scan AI Citability Index does exactly this, and we are careful about what it claims. In our own words, on our methodology page: it is not a count of actual citations in production LLM responses. We do not claim it is. It predicts readiness from structural signals. It is a blueprint inspection, not a walk through the finished building.

The second job is measurement: actually run the buyer's real questions through the real engines and record what they cited. That is the only way to turn a probabilistic process into a number you can trust, and it is harder, because you cannot do it from the page alone. You have to ask the engines, repeatedly, and average out the noise. That is what the second product does.

Key takeaway

The page-scan AI Citability Index predicts readiness from on-page signals. WhyIQ AI Radar measures the outcome by running the real queries. The Index predicts; Radar measures. Two jobs, opposite sides of the line.

How WhyIQ Radar Measures It the Same Way the Engines Produce It

The engines retrieve and cite. Radar reads back exactly what they cited.

Every week, WhyIQ AI Radar runs your real buyer-intent prompts through all five engines (ChatGPT, Perplexity, Claude, Gemini, and Google's real AI Mode; English queries only, for now) and records the ordered list of sources each engine actually cited. It is the same retrieve-and-cite step the engines run to build an answer. We do not scrape a cache, query a proxy, or predict a likelihood. We ask the engine the buyer's real question and read back its real answer. The reading is the engine's own, not our estimate of it.
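The unit of measurement here is one engine's ordered list of citations for one prompt. A hypothetical way to represent that read, not Radar's actual schema: a small frozen dataclass whose `position_of` method returns where a domain first appears in the citation order, or `None` if the engine did not cite it. The engine name, prompt, and URLs are invented.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Read:
    """One engine's answer to one prompt, reduced to what it cited (hypothetical schema)."""
    engine: str
    prompt: str
    cited: tuple[str, ...]  # source URLs in the order the engine cited them

    def position_of(self, domain: str) -> Optional[int]:
        """1-based position of the first citation from `domain`, or None if absent."""
        for rank, url in enumerate(self.cited, start=1):
            host = (urlparse(url).hostname or "").lower()
            if host == domain or host.endswith("." + domain):
                return rank
        return None


read = Read(
    engine="perplexity",
    prompt="best crm for a five-person startup",
    cited=(
        "https://www.reddit.com/r/crm/comments/abc123",
        "https://acme.example/pricing",
    ),
)
print(read.position_of("acme.example"))  # 2
print(read.position_of("g2.com"))        # None
```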

Because the engines are probabilistic, a single read is a single sample. On the Agency tier, Radar runs every prompt three times per engine each week and averages the results, so the dashboard shows a confidence band (for example, cited in two of three passes) instead of presenting one noisy snapshot as the truth. Single-shot tracking treats that noise as signal. We treat it as the known measurement problem it is. When we genuinely cannot measure, the dashboard shows no row at all, never a hallucinated zero.
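The pass-averaging and the rule about showing no rows rather than a zero can be sketched together. In this hypothetical helper (not Radar's published code), each pass is either a list of cited URLs or `None` when the read failed. An empty list is a real measurement (the engine answered and cited nothing), so it counts as a miss; `None` is excluded, and if every pass failed the function returns `None` rather than reporting zero citations.

```python
from typing import Optional, Sequence


def cited_in_passes(
    passes: Sequence[Optional[Sequence[str]]], url: str
) -> Optional[tuple[int, int]]:
    """Return (passes citing url, passes measured), or None if nothing was measured."""
    measured = [p for p in passes if p is not None]  # drop failed reads
    if not measured:
        return None  # no data: show no row, never a fabricated zero
    hits = sum(url in p for p in measured)
    return hits, len(measured)


page = "https://acme.example/pricing"
print(cited_in_passes([[page], [page, "https://www.g2.com"], []], page))  # (2, 3)
print(cited_in_passes([None, None, None], page))                          # None
```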

This is where the bright line earns its keep. The AI Citability Index predicts whether a page is ready to be cited, from its on-page signals. Radar records whether it actually was, by which engines, and for which prompts. We never present readiness as a citation count, and we never present Radar's measured citations as a readiness guess. They answer different questions. Together they cover the whole arc: predict the page, then measure the outcome.

We ran the one-off version of this experiment first: 482 queries across five engines, documented in our post on why ChatGPT cites some sources and ignores others. Radar runs that same experiment continuously for any tracked domain. The full calibration and the per-segment query bank live in our public methodology.

The Bottom Line

The mechanic is simple once you see it. The implications are not.

AI search retrieves and reads live pages (grounding) and fans one question into many (fan-out), then cites the sources it trusts for each. So the page that gets cited is the page that was retrievable, carried real evidence in its first 30 percent, stayed fresh, and was backed by authentic third-party presence where buyers actually are. The tactics that do not work (llms.txt, chunking, rewriting for machines, manufactured mentions) fail because they do not touch any part of that mechanic, and Google says so in its own guide.

And because a citation is an event, not a property, the only honest way to know whether you are cited is to do what the engine does: run the real query and read what it cited. Readiness scores predict that outcome. Real-query tracking measures it. WhyIQ runs both, and keeps the line between them sharp, because the day you confuse a forecast with a measurement is the day your number stops meaning anything.

Frequently asked questions

How do AI search engines decide what to cite?

They retrieve live web pages from a search index and read them on the spot, a technique Google calls grounding (and the field calls retrieval-augmented generation, or RAG). Then they fan one question out into several sub-queries and pull sources for each. A page is cited if it surfaces as a relevant, trusted source for one of those sub-queries. Being indexed, ranking well, and being snippet-eligible are the prerequisites; there is no separate AI index to optimize for.

What is query fan-out?

Query fan-out is when an AI engine generates several related sub-queries from the one question a user typed, then retrieves and synthesizes sources for each. Google's own example: 'how to fix a lawn that's full of weeds' fans out into 'best herbicides for lawns,' 'remove weeds without chemicals,' and 'how to prevent weeds in lawn.' Because of fan-out, a page can be cited for a question its author never literally wrote, so topical depth beats exact-match keywords.

What is RAG or grounding in AI search?

Retrieval-augmented generation (RAG), which Google calls grounding, is the technique of improving an AI answer's accuracy and freshness by retrieving relevant, up-to-date web pages from a search index and reading them before generating the answer. The answer is built from sources the engine retrieved at that moment, not only from training data, which is why fresh, retrievable pages get cited.

Does structured data or schema get my page cited by AI?

Not on its own. Google's official guidance states structured data is not required to appear in its AI features; it remains valuable for rich-results eligibility and entity clarity. Schema helps most on Google AI Overviews, moderately on ChatGPT and Claude, and barely on Perplexity. Treat it as a hygiene and rich-results signal, not a citation lever.

Does llms.txt help with AI citation?

No. Google's own guidance is explicit that Google Search ignores llms.txt, and independent studies across 300,000-plus pages found no measurable lift in citation rate from publishing one. It is a hygiene file, not a citation lever.

How do you measure AI citation rather than predict it?

You run the real query through the real engine and read back what it actually cited. WhyIQ AI Radar does this weekly across ChatGPT, Perplexity, Claude, Gemini, and Google's AI Mode, recording the ordered list of sources each engine cited. The separate page-scan AI Citability Index predicts readiness from on-page signals; it does not count live citations. Different jobs: the Index predicts, Radar measures.

Why does the first part of my page matter so much for citation?

Across a 548,000-page study, 44.2 percent of AI citation extractions came from the opening 30 percent of body text (AirOps). Engines quote the first self-contained, sourced answer they can reach. If the answer lives in section four of an eight-section article, the engine often never gets there, so the lede is the citation surface.

Read next

For the category framing, see answer engine optimization and whether AEO and GEO replace SEO. For how WhyIQ measures the outcome continuously, see WhyIQ AI Radar.

Stop predicting your AI citations. Measure them.

WhyIQ AI Radar runs your real buyer questions through ChatGPT, Perplexity, Claude, Gemini, and Google's AI Mode every week, and reads back exactly what each engine cited. All 5 engines flat from $29/mo, no per-engine add-ons. One free check, no account.