Google SAGE research: what agentic AI reveals about your content strategy

Google published a research paper on January 26, 2026 that barely made a ripple in the mainstream SEO press. The paper is called SAGE, which stands for Steerable Agentic Data Generation for Deep Search with Execution Feedback. The title is academic jargon, the paper itself reads like a PhD thesis, and most SEO professionals I know scrolled past it. That is a mistake.

SAGE is not a new ranking factor. It is not a product announcement. It does not introduce any immediate change to how Google Search works. What it does is pull back the curtain on how Google is training the AI agents that will power the next generation of search. And if you pay attention to what those agents do, how they search, what they prioritize, where they struggle, and what makes a question easy or hard for them, you start to see a roadmap for content strategy that is very different from what most people are executing right now.

I have spent the last two months reading and re-reading this paper, discussing it with other SEO researchers, and testing some of the hypotheses it implies. Here is what I think it means for how you create and structure content.

The dual-agent system, explained without the jargon

At its core, SAGE is a system for creating test questions that are genuinely hard for AI agents to answer. Why does Google need hard test questions? Because the AI agents it is building, the ones that power deep research features in Gemini and Google Search, need to be trained on problems that require multiple steps, multiple searches, and real reasoning to solve.

The problem Google faced was that existing training datasets were too easy. Previous benchmark datasets like MuSiQue and HotpotQA required an average of 2.7 and 2.1 searches per question, respectively. That is not much. A question that takes two searches to answer does not test whether an AI agent can handle the kind of complex, multi-step research that humans actually need help with.

So Google built a dual-agent system. The first agent, which I will call the question-writer, generates complex questions that should require many steps and many searches to answer. The second agent, the search agent, tries to actually answer those questions using web search. The feedback loop between them is what makes SAGE interesting: the question-writer learns what kinds of questions the search agent finds hard, and it keeps generating harder ones.
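To make the loop concrete, here is a toy sketch in Python. It is my own illustration, not code from the paper: `write_question`, `run_search_agent`, and `generate_hard_questions` are hypothetical stand-ins, and the "shortcut" flag is a simplification of whatever the real agent discovers while searching.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    intended_searches: int  # searches the writer expects the question to need


def write_question(intended_searches: int, attempt: int) -> Question:
    # Hypothetical stand-in for the question-writer model.
    return Question(
        text=f"draft {attempt} aiming for {intended_searches} searches",
        intended_searches=intended_searches,
    )


def run_search_agent(question: Question, has_shortcut: bool) -> int:
    # Hypothetical stand-in for the search agent: a shortcut collapses the
    # question to a single search, otherwise it needs the full intended count.
    return 1 if has_shortcut else question.intended_searches


def generate_hard_questions(
    intended_searches: int, shortcut_per_draft: list[bool]
) -> list[Question]:
    kept = []
    for attempt, has_shortcut in enumerate(shortcut_per_draft):
        question = write_question(intended_searches, attempt)
        searches_used = run_search_agent(question, has_shortcut)
        # Feedback step: keep a draft only if the agent could not shortcut it.
        if searches_used >= question.intended_searches:
            kept.append(question)
    return kept


hard = generate_hard_questions(4, [True, False, True, False])
print(len(hard))  # 2
```

In this toy version, two of the four drafts survive because the agent needed all four searches to answer them. The other two are thrown out because the agent answered them in one search.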

This matters for SEO because the search agent in SAGE appears to use the same fundamental architecture as the AI agents that are likely to power Google's deep research products. Understanding what makes questions easy or hard for this agent tells you something meaningful about how AI-powered search evaluates and retrieves information from the web.

The four shortcuts that AI agents exploit (and what they mean for your content)

The most useful part of the SAGE paper, from a content strategy perspective, is its analysis of "shortcuts." These are the patterns that allow an AI agent to answer a seemingly complex question without actually doing complex reasoning. Google identified four primary shortcuts, and each one has direct implications for how you should think about content.

The first shortcut is information co-location. This happens when multiple pieces of information needed to answer a question exist on the same page. Instead of having to search multiple sources and combine information, the agent finds everything it needs in one place. From Google's testing perspective, this makes a question easier than intended. From a content strategy perspective, this is a signal that comprehensive, consolidated content is what AI agents prefer.

Think about what this means practically. If someone asks an AI agent "What is the average cost of a kitchen renovation in Portland and how long does it typically take?", the agent needs two pieces of information: cost and timeline. If a single page of yours covers both with clear, extractable numbers, the agent can answer the question in one search step using your content as the source. If cost data lives on one page and timeline data lives on a different page, the agent needs two searches and might pull from two different sources, neither of which is you.
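The kitchen renovation example can be modeled as a small coverage problem. The sketch below is my own simplification, not something from the paper: each page is reduced to the set of facts it contains, and the hypothetical `searches_needed` function greedily counts how many pages an agent would have to visit to collect everything.

```python
def searches_needed(required_facts: set[str], pages: list[set[str]]) -> int:
    """Greedy count of page visits needed to cover every required fact."""
    remaining = set(required_facts)
    visits = 0
    while remaining:
        # Pick the page that covers the most facts still missing.
        best = max(pages, key=lambda facts: len(facts & remaining), default=set())
        if not best & remaining:
            raise ValueError(f"no page covers: {sorted(remaining)}")
        remaining -= best
        visits += 1
    return visits


question = {"cost", "timeline"}

consolidated_site = [{"cost", "timeline"}]
split_site = [{"cost"}, {"timeline"}]

print(searches_needed(question, consolidated_site))  # 1
print(searches_needed(question, split_site))         # 2
```

The consolidated page answers the question in one visit. The split site needs two, and in the real world each extra visit is another chance for the agent to land on someone else's page.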

Information co-location is not a new idea. The SEO industry has talked about comprehensive content for years. But SAGE gives us a specific, mechanistic reason why it works: AI agents literally take shortcuts through comprehensive pages, which means those pages get used more often as sources.

The second shortcut is multi-query collapse. This occurs when a single search query returns enough information to answer multiple parts of a complex question. It is related to co-location but slightly different. Co-location is about information existing on the same page. Multi-query collapse is about a single search query being broad enough to surface that page.

For content strategy, this suggests that pages optimized around broad, topic-level queries rather than narrow, long-tail keywords are more likely to be used by AI agents. A page titled "Kitchen renovation costs, timelines, and planning guide" is more likely to be surfaced by a broad search than three separate pages about costs, timelines, and planning respectively. This runs somewhat counter to the hyper-specific, long-tail content strategy that has been popular in SEO for the past decade.

The third shortcut is superficial complexity, where a question looks complex because it is long or has multiple clauses, but actually has a direct answer findable in a single search step. I think this is less relevant for content strategy, but it does reinforce the value of directly answering common questions in clear, accessible language. If your content provides a direct, quotable answer to a question that seems complex, the AI agent treats it as a shortcut and you become the cited source.

The fourth shortcut is overly specific questions, where a question is so precise that a single targeted search retrieves the answer immediately. This one is interesting because it suggests that having very specific factual claims, data points, and statistics on your pages gives AI agents precise material to work with. If your page states "the average kitchen renovation in Portland costs $47,000 in 2026," that is a specific, citable fact that an agent can grab directly.

What this means for content architecture

Here is where I want to push beyond the obvious "create comprehensive content" takeaway that most people will extract from this paper.

SAGE tells us that AI agents, at least the ones Google is building, start their research with traditional search. They type a query, get search results, click into the top-ranked pages, and extract information. The paper specifically notes that agents typically pull from the top three ranked pages. This means that traditional search ranking still matters enormously for AI visibility. You cannot ignore traditional SEO fundamentals and expect to be cited by AI agents just because your content is comprehensive.

But SAGE also tells us something about what happens after the agent lands on your page. The agent evaluates whether the page contains enough information to answer its question or whether it needs to search again. Pages that reduce the number of additional searches the agent needs to perform are preferred, not by an explicit ranking signal, but by the simple mechanics of how the agent operates. If your page answers the question completely, the agent uses you and moves on. If your page only partially answers the question, the agent searches again and might find a better source.
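Here is a minimal sketch of that behavior, under my own assumptions. The agent reads only the top few ranked results, collects the facts it still needs, and stops as soon as nothing is missing. The function name `read_top_results`, the example paths, and the default cutoff of three are illustrative choices, not details taken from the paper's implementation.

```python
def read_top_results(
    needed: set[str],
    ranked_pages: list[tuple[str, set[str]]],
    top_k: int = 3,
) -> tuple[list[str], set[str]]:
    """Return (cited_urls, still_missing) after reading only the top_k results."""
    missing = set(needed)
    cited = []
    for url, facts in ranked_pages[:top_k]:
        found = facts & missing
        if found:
            cited.append(url)
            missing -= found
        if not missing:
            break  # question fully answered, no further search needed
    return cited, missing


# Hypothetical ranked results: (path, facts the page contains)
ranked = [
    ("/cost-guide", {"cost"}),
    ("/design-ideas", set()),
    ("/timeline-guide", {"timeline"}),
    ("/complete-guide", {"cost", "timeline", "permits"}),
]

cited, missing = read_top_results({"cost", "timeline", "permits"}, ranked)
print(cited)    # ['/cost-guide', '/timeline-guide']
print(missing)  # {'permits'}
```

Notice that the most complete page sits at position four and is never read. The agent cites two partial pages, still lacks the permit information, and has to search again.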

This has practical implications for how you structure your content. Topic hub pages that consolidate related information into a single, well-organized resource are more AI-agent-friendly than a scattered collection of narrowly focused posts. I am not saying you should not have focused pages. I am saying you should also have consolidation pages that bring together the key facts, figures, and answers from across your site into one comprehensive resource.

Think of it like an executive summary that an AI agent can use as a one-stop reference. If you write about personal finance, a page called "Complete guide to retirement planning in 2026" that covers account types, contribution limits, investment strategies, tax implications, and withdrawal rules in a single, well-structured document is more useful to an AI agent than five separate pages covering each topic individually. The agent finds your page, extracts what it needs, and cites you as the source. The alternative is the agent finding one of your five pages, getting a partial answer, and searching again to find a competitor's comprehensive page.

The ranking dependency most people are missing

I want to circle back to something I mentioned briefly because I think it is the most underappreciated finding in the SAGE paper. AI agents base their retrieval on traditional search results. The search agent in SAGE starts by querying a search engine and examining the top results. It does not have its own index. It does not bypass search rankings. It uses them.

This means that the popular narrative of "AI search will replace Google rankings" is, at least for now, backward. AI agents depend on Google rankings to decide where to look for information. If your page is not in the top three or so results for a relevant query, the AI agent probably never sees it. Content that is buried on page two of Google results is as invisible to AI agents as it is to human searchers.

I have talked to some content strategists who interpreted the rise of AI search as a reason to deprioritize traditional SEO. "Rankings don't matter because people will just ask AI." The SAGE paper suggests the opposite. Rankings matter more, because they determine not just human visibility but AI agent visibility. You need to rank well so that the AI agent finds your page, and then you need your page to be comprehensive enough that the agent does not need to look elsewhere.

This is a two-step optimization challenge. Step one: rank in the top positions through traditional SEO. Step two: once the AI agent lands on your page, give it everything it needs so it uses you as the primary source.

Practical changes I am making to my clients' content strategies

Based on my interpretation of SAGE, here are the concrete changes I have been implementing with my clients. I want to be transparent that some of these are hypotheses based on the paper's findings, not proven tactics. The paper itself does not prescribe SEO strategies. But the engineering principles it reveals are strong enough to guide informed bets.

I am creating "co-location layers" on key pages. For every important topic page, I audit what related questions an AI agent might need answered in the same session. If a page about commercial lease negotiation does not also mention typical lease terms, common negotiation mistakes, and market rate benchmarks for the relevant geography, I add those sections. Not because a human reader necessarily needs them all on one page, but because an AI agent researching commercial leases will benefit from finding all of that in one place.
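A rough version of that audit can be scripted. The sketch below is a simple keyword check, assuming you maintain your own list of related subtopics and trigger phrases for each page. The function name `missing_subtopics` and the sample phrases are hypothetical, and keyword matching is only a first pass before a human review.

```python
def missing_subtopics(page_text: str, subtopics: dict[str, list[str]]) -> list[str]:
    """Return the subtopics for which none of the trigger phrases appear on the page."""
    text = page_text.lower()
    return [
        name
        for name, phrases in subtopics.items()
        if not any(phrase.lower() in text for phrase in phrases)
    ]


page = (
    "Our guide to commercial lease negotiation covers typical lease terms "
    "and the most common negotiation mistakes tenants make."
)

related = {
    "lease terms": ["typical lease terms", "lease length"],
    "negotiation mistakes": ["negotiation mistakes"],
    "market benchmarks": ["market rate", "per square foot"],
}

print(missing_subtopics(page, related))  # ['market benchmarks']
```

Here the page already covers lease terms and negotiation mistakes, so the only gap flagged is the market rate benchmark section.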

I am front-loading specific, citable data. The SAGE paper shows that agents look for precise, extractable facts. I am making sure that key statistics, numbers, and factual claims appear early in content, are clearly stated in natural language, and are not buried in graphics or interactive elements that an AI agent cannot parse. A sentence like "commercial lease rates in downtown Denver averaged $32.50 per square foot in Q1 2026" is far more useful to an AI agent than a chart showing the same data.
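One quick way to check front-loading is to find where the first sentence containing a number appears in the page copy. This is a crude heuristic of my own, not a method from the paper: it assumes that a sentence with a digit is a candidate citable fact, and the sentence splitter is a simple regular expression rather than a real tokenizer.

```python
import re
from typing import Optional, Tuple


def first_citable_sentence(text: str) -> Optional[Tuple[int, str]]:
    """Return (sentence_index, sentence) for the first sentence containing a digit."""
    # Split on whitespace that follows ., ! or ? so decimals like 32.50 stay intact.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for index, sentence in enumerate(sentences):
        if re.search(r"\d", sentence):
            return index, sentence
    return None


copy = (
    "Leasing is complex. Rates vary by market. "
    "Commercial lease rates in downtown Denver averaged $32.50 per square foot in Q1 2026. "
    "Talk to a broker."
)

print(first_citable_sentence(copy))
# (2, 'Commercial lease rates in downtown Denver averaged $32.50 per square foot in Q1 2026.')
```

An index of 2 means two sentences of preamble come before the first hard number. On a long page, a large index is a hint that the data is buried and should be moved up.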

I am building cross-reference sections at the bottom of topic pages. These sections summarize related information that lives elsewhere on the site, giving the AI agent a map of where to find additional data without needing to search again. Think of it as a manually curated "related information" block that says "For tax implications of commercial leases, see our guide at [link]. For lease negotiation templates, see [link]." The AI agent may not follow these links, but the summarized context in the cross-reference section adds co-located information to the page.

I am rethinking internal linking around topic clusters. Traditional SEO internal linking advice is about distributing PageRank and creating crawl paths. SAGE suggests that internal linking should also be thought about in terms of information accessibility for AI agents. If your most important page links to supporting pages that contain key facts, and those supporting pages link back with context, you create a tight cluster that an AI agent can traverse efficiently.
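The "link back" half of that cluster is easy to check once you have a crawl export. The sketch below assumes the internal links are available as a mapping from each page to the set of pages it links to. The paths and the function name `missing_backlinks` are hypothetical.

```python
def missing_backlinks(hub: str, links: dict[str, set[str]]) -> list[str]:
    """Supporting pages the hub links to that do not link back to the hub."""
    return sorted(
        page
        for page in links.get(hub, set())
        if hub not in links.get(page, set())
    )


# Hypothetical internal link map: page -> pages it links to
site_links = {
    "/leases": {"/leases/tax", "/leases/templates"},
    "/leases/tax": {"/leases"},
    "/leases/templates": set(),
}

print(missing_backlinks("/leases", site_links))  # ['/leases/templates']
```

Any page in the output is a loose end in the cluster: the hub points to it, but it gives a visitor no path back to the hub.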

What SAGE does not tell us

I want to be responsible about the limitations of drawing SEO conclusions from a research paper. SAGE is about training data generation, not about search ranking. Google did not publish it as guidance for SEO practitioners. The paper does not say "structure your content this way and you will rank better." What it says is "here is how AI agents search and reason," and the SEO implications are my interpretation, not Google's prescription.

It is also worth noting that SAGE represents one approach to building AI search agents. Google is certainly working on multiple approaches simultaneously, and the agents that eventually power Google's consumer products may work differently from the ones described in this paper. Other AI search products like Perplexity and ChatGPT use entirely different architectures.

That said, I think the principles SAGE reveals are broadly applicable. The ideas that AI agents prefer consolidated information, that they start from traditional search results, that they exploit co-located data, and that they benefit from specific and citable facts are all consistent with what we observe empirically in how AI search products behave. The paper gives us a theoretical framework for patterns we were already seeing in practice.

The content strategy fork in the road

I see two diverging content strategies in the SEO industry right now. One camp is doubling down on producing high volumes of focused, keyword-targeted content. Write more pages, target more queries, cast a wider net. The other camp is consolidating content into fewer, more comprehensive resources that serve as authoritative topic references.

SAGE, by my reading, supports the second approach. Not exclusively, because you still need focused pages to rank for specific queries. But the consolidation approach aligns better with how AI agents search and retrieve information. Fewer, deeper pages that co-locate related information will be used more effectively by AI agents than a sprawl of thin pages that each answer one narrow question.

This does not mean you should merge all your content into one giant page. Usability still matters, for both humans and AI agents. What it means is that every page should aim to be the most complete single resource for its topic, not just the minimum viable answer to a specific keyword query.

I keep thinking about a phrase from the paper about how agents "bypass complex reasoning" when information is co-located. In content strategy terms, you want your page to be the bypass. You want the AI agent to land on your page and think, if an AI can think, "I found everything I need right here." That is the goal. Everything else is tactics.