What is the single most important GEO technical fix?

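Make sure the AI crawlers you care about are explicitly allowed in robots.txt. The main user agents are OpenAI's GPTBot, OAI-SearchBot, and ChatGPT-User; Perplexity's PerplexityBot and Perplexity-User; Anthropic's ClaudeBot, Claude-User, and Claude-SearchBot (the older anthropic-ai and claude-web tokens still appear in many robots.txt files); and Google's Google-Extended, which is a control token honored by Google's existing crawlers rather than a separate crawler. The vendors document robots.txt support for their automated crawlers, although user-triggered fetchers such as ChatGPT-User and Perplexity-User may not be bound by it in the same way. If your site uses a broad Disallow rule or blocks unknown bots by default, these crawlers may have no access to your content regardless of its quality. Check robots.txt first.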

How many schema types do I actually need to implement for GEO?

For most sites, five schema types cover the majority of GEO value: Organization (entity identity), WebSite (site identity), Article (for every piece of content), FAQPage (for pages with Q&A sections), and BreadcrumbList (for every page below the homepage). HowTo schema adds value on tutorial and step-by-step content. Product schema is essential for e-commerce. The remaining dozens of schema types are situational; do not add them unless they accurately describe real content on the page.

Does llms.txt actually help with AI search visibility?

An llms.txt file signals to AI crawlers which pages are your canonical resources and provides a structured entry point for LLM indexing tools. However, it is not a confirmed ranking signal for any major AI engine. Its primary value is as a crawl-prioritisation aid, telling AI systems which pages matter most when crawl budgets are limited. Most major platforms (OpenAI, Anthropic, Google) have not officially documented llms.txt support, though the file is widely adopted as industry practice. Implement it, but do not rely on it as a substitute for proper crawl access and content quality.
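As an illustration only, here is a minimal sketch that renders an llms.txt file, assuming the layout described in the public llms.txt proposal (a title line, a short summary, then sections of links). The site name, URLs, and the `build_llms_txt` helper are hypothetical, and no major engine is confirmed to consume the result.

```python
def build_llms_txt(site_name, summary, sections):
    """Render an llms.txt body: title, summary, then one link list per section."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site and pages, used only to show the shape of the file.
llms_txt = build_llms_txt(
    "Example Co",
    "Guides on generative engine optimization.",
    {
        "Guides": [
            ("GEO checklist", "https://example.com/guides/geo-checklist", "Implementation checklist"),
        ]
    },
)
print(llms_txt)
```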

How do I measure whether GEO improvements are working?

Measurement sits at three layers. First, citation tracking: manually query your target topics in ChatGPT, Perplexity, and Google AI Mode and check whether your domain appears as a cited source. Second, traffic attribution: use GA4 to segment sessions by LLM referrers (chatgpt.com and the older chat.openai.com, perplexity.ai, claude.ai). These are real, attributable traffic sources. Third, branded query trends: monitor Google Search Console for branded query impressions, which tend to correlate with AI-surface mentions. Dedicated tools like Profound and Peec AI are building systematic citation-share tracking.
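The referrer segmentation can be sketched outside GA4 as well. The snippet below is a rough illustration that classifies referrer URLs from an exported session list; the hostname set and the `is_llm_referrer` helper are assumptions, so adjust them to whatever your own referral data shows.

```python
from urllib.parse import urlparse

# Assumed hostnames for illustration; extend the set as new sources appear.
LLM_REFERRER_HOSTS = {"chatgpt.com", "chat.openai.com", "perplexity.ai", "claude.ai"}


def is_llm_referrer(referrer_url: str) -> bool:
    """True if the referrer host is a known LLM host or one of its subdomains."""
    host = (urlparse(referrer_url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in LLM_REFERRER_HOSTS)


# Hypothetical referrers from a session export; an empty string is direct traffic.
sessions = [
    "https://chatgpt.com/",
    "https://www.perplexity.ai/search?q=geo",
    "https://www.google.com/",
    "",
]
llm_sessions = [r for r in sessions if is_llm_referrer(r)]
print(f"{len(llm_sessions)} of {len(sessions)} sessions came from LLM referrers")
```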

Should every page on my site be optimised for GEO?

No. Prioritise pages that target informational queries where AI systems are most likely to synthesise an answer: guides, explainers, comparison pages, and FAQ resources. Product and transaction pages are rarely cited in AI-generated answers because AI systems do not typically recommend specific purchase URLs. Apply the three-pass approach: fix crawlability blockers across all pages first, then improve content structure on your top informational pages, then build a measurement and refresh cadence.

The GEO implementation checklist: crawlability, citability, structure, and governance

This is the checklist we would use if we were handed a site and asked to improve its AI-search readiness without wasting weeks on speculation.

It is not a list of magic tricks. It is a list of the conditions that make citations and supporting links more likely by reducing crawl friction, ambiguity, and trust gaps.

If you want the conceptual version first, read our complete guide to generative engine optimization.

How to use this checklist

Do not try to score all 100 items at once.

Use it in three passes:

Fix all blockers that keep important pages from being crawled, indexed, or surfaced.
Improve the structure and evidentiary quality of your top commercial and informational pages.
Build an update and measurement workflow so the work compounds instead of decaying.

Phase 1: crawlability and discovery

Robots and bot access

Check the following:

Your robots.txt file exists at the root and loads with a 200 response.
High-value directories are not accidentally blocked.
Search-specific bots are intentionally allowed or intentionally disallowed.
CDN, WAF, or bot-management rules are not stricter than the robots.txt policy you think you have.

For Google AI features, Google says Search access is managed through Googlebot. For ChatGPT Search, OpenAI says it is important to allow OAI-SearchBot. Anthropic separately documents ClaudeBot, Claude-User, and Claude-SearchBot.

A simple pattern looks like this:

User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

Sitemap: https://example.com/sitemap.xml

Only use allow rules you actually intend to support. The point is not to allow every bot blindly. The point is to make sure your policy is deliberate and consistent across infrastructure.
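To confirm that a policy does what you think it does, you can test it offline. This is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `check_access` helper, and the example URL are assumptions. The standard-library parser only approximates how each crawler interprets rules, and it says nothing about CDN or WAF blocking.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: two search bots allowed everywhere, everything else
# kept out of /private/.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""


def check_access(robots_txt, agents, url):
    """Return {agent: allowed} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


results = check_access(
    ROBOTS_TXT,
    ["OAI-SearchBot", "Claude-SearchBot", "SomeOtherBot"],
    "https://example.com/private/report",
)
for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running the same check against the live file and against what your edge infrastructure actually serves to each user agent is the quickest way to spot a mismatch.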

Sitemap health

Confirm:

the sitemap includes every page you actually want discovered
deleted or redirected pages are removed quickly
canonical URLs in the sitemap match your intended canonical targets
priority revenue pages are not buried in orphaned sections
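The first check in this list is easy to automate. A minimal sketch, assuming a standard sitemaps.org XML file and a hand-maintained list of priority URLs (the sample sitemap, the URLs, and both helper names are hypothetical):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content for illustration.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/guides/geo/</loc></url>
</urlset>"""


def sitemap_urls(xml_text):
    """Collect every <loc> value from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def missing_from_sitemap(xml_text, priority_urls):
    """Priority URLs that the sitemap does not list."""
    return sorted(set(priority_urls) - sitemap_urls(xml_text))


priority = ["https://example.com/guides/geo/", "https://example.com/pricing/"]
print(missing_from_sitemap(SITEMAP_XML, priority))
```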

If the site changes frequently, consider implementing IndexNow for participating engines so newly updated URLs are announced quickly.
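For reference, an IndexNow submission is a small JSON POST. The sketch below only builds the request and does not send it; the field names follow the public IndexNow protocol as we understand it, while the host, key, and `build_indexnow_request` helper are placeholders you would replace with your own values.

```python
import json
from urllib.request import Request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host, key, urls):
    """Build (but do not send) a bulk IndexNow submission for the given URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file is hosted at the site root under its own name.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Hypothetical host and key, for illustration only.
request = build_indexnow_request(
    "example.com", "abc123", ["https://example.com/guides/geo/"]
)
print(request.get_method(), request.full_url)
# To submit for real: urllib.request.urlopen(request)
```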

Internal discovery

Review:

orphan pages
thin hub pages with no meaningful contextual links
overly deep content trees
pagination or JavaScript patterns that make important links hard to crawl

Generative systems do not rescue weak architecture. If your own site barely explains how pages relate, you should expect retrieval to be inconsistent.
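Orphans and overly deep pages can both be found with a breadth-first walk of the internal link graph. A minimal sketch, assuming you already have a crawl export mapping each page to the pages it links to (the `links` data and the `click_depths` helper are hypothetical):

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search: minimum number of clicks from start to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical crawl export: page -> internal links found on that page.
links = {
    "/": ["/guides/", "/pricing/"],
    "/guides/": ["/guides/geo/"],
    "/guides/geo/": [],
    "/pricing/": [],
    "/old-landing/": ["/pricing/"],
}
depths = click_depths(links, "/")
orphans = sorted(set(links) - set(depths))  # known pages never reached from "/"
print(depths)
print(orphans)
```

Pages missing from `depths` are unreachable from the homepage; pages with a large depth value are candidates for better hub links.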

Phase 2: indexing and eligibility

Indexability

For every priority page, verify:

no accidental noindex
the canonical URL resolves cleanly
there are no soft-404 behaviors
the rendered page contains the actual answer content
the page is available without login or fragile client-side hydration

For Google's AI features, the page must be indexed and eligible to show a snippet. That makes basic Search eligibility non-negotiable.
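An accidental noindex can hide in either the HTML or the response headers. This is a simplified sketch that checks both; it assumes you have already fetched the rendered HTML and the `X-Robots-Tag` header value, and it ignores user-agent-scoped header forms such as `googlebot: noindex`. The class and function names are our own.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def is_noindex(html, x_robots_tag=""):
    """True if the meta tags or the X-Robots-Tag header value contain noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {d.strip().lower() for d in x_robots_tag.split(",") if d.strip()}
    return "noindex" in (parser.directives | header)


page = '<html><head><meta name="robots" content="NOINDEX, follow"></head></html>'
print(is_noindex(page))
```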

Source:

Google Search Central: AI features and your website

Preview and snippet controls

Audit:

nosnippet
data-nosnippet
max-snippet
max-image-preview
noindex

Do not assume these tags are harmless defaults. If you over-restrict previews, you may limit what Search and AI surfaces can show from the page.

Use strict preview controls only when you truly mean it.
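A small audit helper can flag the most restrictive settings. The sketch below parses the `content` value of a robots meta tag and reports directives that limit previews; the warning wording and both function names are our own, and the list of checks is deliberately short rather than exhaustive.

```python
def parse_robots_content(content):
    """Split 'a, b:1' style directives into {name: value or True}."""
    rules = {}
    for part in content.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition(":")
        rules[name.strip()] = value.strip() or True
    return rules


def preview_warnings(content):
    """List preview-limiting directives found in a robots meta content value."""
    rules = parse_robots_content(content)
    warnings = []
    if rules.get("nosnippet"):
        warnings.append("nosnippet blocks text snippets")
    if rules.get("max-snippet") == "0":
        warnings.append("max-snippet:0 is equivalent to nosnippet")
    if rules.get("max-image-preview") == "none":
        warnings.append("max-image-preview:none hides image previews")
    return warnings


print(preview_warnings("max-snippet:0, max-image-preview:large"))
```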

Duplicate and conflicting versions

Resolve:

HTTP versus HTTPS duplication
subdomain duplication
faceted URLs accidentally indexed
parameterized duplicates
near-identical versions competing for the same query space

AI retrieval is already probabilistic enough. Duplicating equivalent pages just gives the system more ways to misunderstand which URL matters.
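One way to surface these duplicates is to normalize every crawled URL and group the ones that collapse to the same key. A rough sketch follows; the normalization rules (force HTTPS, drop `www.`, strip common tracking parameters, ignore trailing slashes and ports) are assumptions that will not fit every site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; add your own faceting parameters if appropriate.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "gclid", "fbclid"}


def normalize_url(url):
    """Collapse protocol, www, tracking-parameter, and trailing-slash variants."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in TRACKING_PARAMS]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, urlencode(sorted(query)), ""))


def duplicate_groups(urls):
    """Group URLs that normalize to the same key; keep groups with 2+ members."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical crawl output.
crawled = [
    "http://www.example.com/guide/?utm_source=newsletter",
    "https://example.com/guide",
    "https://example.com/pricing",
]
print(duplicate_groups(crawled))
```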

Phase 3: structured understanding

Structured data

Implement the schema types that naturally fit the page:

Article or BlogPosting for editorial content
FAQPage for legitimate question-and-answer sections
Organization for company identity
BreadcrumbList for hierarchy
Product, Service, or SoftwareApplication where relevant

Guidelines:

make sure structured data matches visible content
prefer complete, accurate properties over bloated markup
validate with the Rich Results Test during development
use JSON-LD where possible, since Google recommends it

Do not add schema for content that is not really there. Bad structured data increases ambiguity rather than reducing it.
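Generating the markup from the same data that renders the page is one way to keep the two in sync. Below is a minimal sketch for Article markup; the property selection is a reasonable subset rather than a complete or required set, and the headline, names, dates, and helper functions are hypothetical.

```python
import json


def article_jsonld(headline, author_name, publisher_name,
                   date_published, date_modified, url):
    """Build a small schema.org Article object from fields shown on the page."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@type": "Organization", "name": publisher_name},
        "datePublished": date_published,
        "dateModified": date_modified,
        "mainEntityOfPage": url,
    }


def as_script_tag(data):
    """Serialize to a JSON-LD script tag, escaping '</' so the tag cannot close early."""
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical article details for illustration.
markup = article_jsonld(
    "The GEO implementation checklist", "Jane Doe", "Example Co",
    "2025-01-10", "2025-06-02", "https://example.com/guides/geo-checklist",
)
print(as_script_tag(markup))
```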

Entity clarity

Review whether every important page clearly states:

who wrote it
what product, service, or topic it is about
what company owns it
when it was published
when it was updated

If you have to infer the entity, so does the machine.

Helpful additions:

author bio with expertise
reviewer where appropriate
organization page
contact and policy pages that prove the site is real
consistent brand naming across the site

Phase 4: content architecture

Answer-first structure

Each priority page should pass this test:

could a reader identify the direct answer within the first screen?
is the scope obvious?
do major subheadings map to real follow-up questions?
are comparisons and definitions easy to extract?

Good GEO pages reduce the amount of inference required.

Heading hierarchy

Check for:

one clear H1
meaningful H2s that reflect subtopics or user questions
H3s used for drill-down, not decoration
headings that describe the section content honestly

Avoid vague headings like "Why this matters" if ten different pages on the site use them to mean ten different things.
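The structural checks (one H1, no skipped levels) can be automated; whether a heading is honest or meaningful still needs a human. A minimal sketch using the standard-library HTML parser, with class and function names of our own choosing:

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1-h6 tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_issues(html):
    """Report a missing or repeated h1 and any jump that skips a heading level."""
    collector = HeadingCollector()
    collector.feed(html)
    issues = []
    h1_count = collector.levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one h1, found {h1_count}")
    for prev, cur in zip(collector.levels, collector.levels[1:]):
        if cur > prev + 1:
            issues.append(f"h{prev} is followed by h{cur}, skipping a level")
    return issues


print(heading_issues("<h1>Guide</h1><h3>Details</h3>"))
```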

Text availability

Important details should exist in text, not only in:

images
tabs that never render server-side
accordions with inaccessible markup
PDFs without supporting HTML summaries
video-only explanations

Google explicitly advises making important content available in textual form.

Source:

Google Search Central: AI features and your website

FAQ blocks

Use FAQs when they are real.

Good FAQ sections:

answer common objections or clarifications
add precise wording users actually ask
reduce ambiguity around scope, pricing, implementation, or edge cases

Bad FAQ sections:

repeat the same keyword three ways
answer invented questions nobody asks
exist only to stuff markup onto the page
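When an FAQ section is real, the matching FAQPage markup can be built from the same question-and-answer pairs that appear on the page. A minimal sketch, with a hypothetical `faq_jsonld` helper and sample pair:

```python
import json


def faq_jsonld(pairs):
    """Build schema.org FAQPage markup from visible (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Hypothetical pair; use the exact wording that is visible on the page.
faq = faq_jsonld([
    ("Should every page be optimised for GEO?",
     "No. Prioritise pages that target informational queries."),
])
print(json.dumps(faq, indent=2))
```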

Phase 5: evidence and trust

Source quality

For every important page, ask:

Are there named sources?
Are dates provided?
Is the scope of each claim clear?
Is the page honest about what is sourced versus what is interpretation?

The Princeton GEO paper found that citations, statistics, and quotations can improve visibility in generative responses. That should not surprise anyone. Sourced claims are safer to quote than unsupported assertions.

Source:

GEO: Generative Engine Optimization

Originality

Check whether the page adds anything beyond a generic summary.

Stronger assets include:

internal process detail
firsthand examples
original screenshots
implementation checklists
unique comparisons
proprietary data
expert commentary tied to real experience

If the page could be replaced by a competent AI summary with no loss of value, your GEO problem is not technical. It is editorial.

Freshness and maintenance

For pages that compete on recency, confirm:

update timestamps are visible
stale references are replaced
broken citations are removed
major platform changes trigger review

Freshness is not universal, but decay is real. An unmaintained guide becomes a bad citation candidate over time.
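A review queue can start as a simple age check. The sketch below flags pages whose last update is older than a threshold; the 180-day default, the page data, and the `stale_pages` helper are assumptions, and the right threshold depends on how fast your topic moves.

```python
from datetime import date


def stale_pages(pages, today, max_age_days=180):
    """Return URLs whose last update is more than max_age_days before today."""
    return sorted(
        url for url, updated in pages.items()
        if (today - updated).days > max_age_days
    )


# Hypothetical last-updated dates taken from a CMS export.
pages = {
    "/guides/geo/": date(2025, 1, 10),
    "/guides/schema/": date(2025, 9, 1),
}
print(stale_pages(pages, today=date(2025, 10, 1)))
```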

Phase 6: commercial and local completeness

For product, service, and local-intent pages, verify:

pricing or pricing logic is clear where appropriate
business profile details are current
merchant or ecommerce data is current where relevant
contact information is consistent
location and service-area details are explicit

Google specifically calls out keeping Merchant Center and Business Profile information up to date for AI features.

Source:

Google Search Central: AI features and your website

Phase 7: measurement and operations

Search Console

Use Search Console to monitor:

clicks and impressions for priority pages
branded versus non-branded query changes
changes after major content rebuilds

Google says AI-feature traffic is included in Search Console's overall Web search reporting, so do not expect a separate neat bucket that solves attribution for you.

Source:

Google Search Central: AI features and your website

Citation tracking

Create a recurring prompt set for your most important categories, then track:

whether your brand appears
which page gets cited
what type of page wins
whether the answer includes a supporting link
how often the same competitor appears instead

You are not looking for perfection. You are looking for patterns.
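If each prompt run is logged with the domains that were cited, the patterns fall out of a short summary. A minimal sketch, assuming a hand-collected log with a `cited_domains` list per prompt (the record layout, domains, and `citation_summary` helper are hypothetical):

```python
from collections import Counter


def citation_summary(results, our_domain):
    """Share of prompts citing our domain, plus the most-cited other domains."""
    cited = [r for r in results if our_domain in r["cited_domains"]]
    competitors = Counter(
        domain
        for r in results
        for domain in r["cited_domains"]
        if domain != our_domain
    )
    return {
        "coverage": len(cited) / len(results) if results else 0.0,
        "top_competitors": competitors.most_common(3),
    }


# Hypothetical log from one manual tracking run.
results = [
    {"prompt": "what is geo", "cited_domains": ["example.com", "rival.com"]},
    {"prompt": "geo checklist", "cited_domains": ["rival.com"]},
    {"prompt": "llms.txt guide", "cited_domains": ["example.com"]},
    {"prompt": "ai crawler list", "cited_domains": ["rival.com", "other.org"]},
]
summary = citation_summary(results, "example.com")
print(summary)
```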

Governance

Assign owners for:

technical controls
source review
update cadence
high-value prompt library
reporting

GEO becomes unreliable fast when content, engineering, and analytics each assume someone else owns it.

The minimum viable GEO scorecard

If you need a simple starting scorecard, track:

percent of priority pages fully crawlable
percent of priority pages fully indexable
percent with relevant structured data
percent with author, update, and source sections
answer-engine citation coverage across priority prompts
assisted conversions on pages rebuilt for GEO

That is enough to move from opinion to operating discipline.
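The percentage rows of the scorecard reduce to counting pass/fail flags per priority page. A rough sketch, assuming each page has already been audited into boolean fields (the field names, sample data, and `scorecard` helper are our own):

```python
CHECKS = ["crawlable", "indexable", "has_schema", "has_author_update_sources"]


def scorecard(pages):
    """Percent of pages passing each check, rounded to one decimal place."""
    total = len(pages)
    if total == 0:
        return {check: 0.0 for check in CHECKS}
    return {
        check: round(100 * sum(1 for page in pages if page[check]) / total, 1)
        for check in CHECKS
    }


# Hypothetical audit results for four priority pages.
audit = [
    {"crawlable": True, "indexable": True, "has_schema": True, "has_author_update_sources": False},
    {"crawlable": True, "indexable": True, "has_schema": False, "has_author_update_sources": False},
    {"crawlable": True, "indexable": False, "has_schema": False, "has_author_update_sources": True},
    {"crawlable": False, "indexable": False, "has_schema": True, "has_author_update_sources": False},
]
print(scorecard(audit))
```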

Final reminder

Most GEO failures are not caused by a missing AI trick.

They are caused by:

blocked bots
unclear pages
weak sourcing
derivative content
no operational follow-through

Fix those first.
