Google’s token auction: When LLMs write the ads in real time

The world of PPC advertising is heading toward one of its most profound shifts. 

Until now, advertisers competed for slots on search results pages, placing ads that a platform simply displayed. 

But a new generation of large language models (LLMs) introduces a radical alternative: ads that aren’t chosen, but written in real time, based on an auction for the very words being generated.

That idea, which sounds like science fiction, is now backed by real research. 

Google Research and the University of Chicago published a paper outlining a theoretical and practical framework for how ad auctions could work in the age of generative AI.

The concept of embedding ads into AI-generated outputs isn’t new. 

Back in 2018, Google filed a patent titled “Using various AI entities as advertising mediums,” exploring how virtual assistants and chatbots could integrate sponsored messaging into conversations. 

But with today’s advanced LLMs, that early vision becomes far more dynamic: not just inserting ads into conversations, but letting the ad become the conversation.

Token auction: A generative ad model

In traditional Google Ads, advertisers bid on keywords. 

  • The system then selects a matching pre-written ad, based on price and quality. 
  • The creative is fixed, and the auction is about placement.

In the Token Auction model proposed by the research paper, the paradigm flips. 

Advertisers don’t bid for slots; they bid to shape the very words the LLM will generate.

Here’s how it works, according to the paper:

  • Each advertiser submits a single bid.
  • Alongside the bid, they provide a language model representing their brand’s voice, tone, and messaging preferences.
  • The system generates the response token by token, weighing the influence of each advertiser’s model according to their bid.

Rather than picking a winner, the system blends multiple advertisers’ influences into the output. The higher the bid, the more the generated language shifts toward that advertiser’s voice.
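
To make the token-by-token blending concrete, here is a minimal sketch in Python. It is not the paper's implementation: the two "advertiser models" are hypothetical hard-coded stand-ins for real LLMs, the vocabulary is invented, and the blend is assumed to be a simple bid-weighted average.

```python
import random

# Toy vocabulary (an assumption for illustration only).
VOCAB = ["consider", "try", "a museum", "a hike", "<end>"]
END_ONLY = [0.0, 0.0, 0.0, 0.0, 1.0]


def formal_model(context):
    """Hypothetical stand-in for a formal brand's LLM: next-token probabilities."""
    steps = [[0.9, 0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.8, 0.2, 0.0]]
    return steps[len(context)] if len(context) < len(steps) else END_ONLY


def casual_model(context):
    """Hypothetical stand-in for a casual brand's LLM."""
    steps = [[0.2, 0.8, 0.0, 0.0, 0.0], [0.0, 0.0, 0.3, 0.7, 0.0]]
    return steps[len(context)] if len(context) < len(steps) else END_ONLY


def blend(distributions, bids):
    """Bid-weighted average of the advertisers' next-token distributions."""
    total = sum(bids)
    return [
        sum(bid / total * dist[i] for dist, bid in zip(distributions, bids))
        for i in range(len(VOCAB))
    ]


def generate(models, bids, max_tokens=5, seed=0):
    """Sample a response one token at a time from the blended distribution."""
    rng = random.Random(seed)
    context = []
    for _ in range(max_tokens):
        mixed = blend([model(context) for model in models], bids)
        token = rng.choices(VOCAB, weights=mixed)[0]
        if token == "<end>":
            break
        context.append(token)
    return context


# The formal advertiser bids three times as much as the casual one.
print(" ".join(generate([formal_model, casual_model], bids=[3.0, 1.0])))
```

Raising one advertiser's bid relative to the other increases the weight of its distribution at every step, which is the sense in which the output "shifts toward that advertiser's voice."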

To aggregate the competing models, the researchers explore two strategies:

  • Linear aggregation: A weighted average that maintains incentive compatibility and bid responsiveness.
  • Log-linear aggregation: A more complex method that can break incentive alignment under certain conditions.
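
The two strategies can be sketched side by side. The snippet below is an illustrative reading, assuming linear aggregation is a weighted arithmetic mean of the advertisers' distributions and log-linear aggregation is a weighted geometric mean that is then renormalized. The token names and probabilities are invented, and the weights are assumed to be bid shares that sum to 1.

```python
import math


def linear_aggregate(dists, weights):
    """Weighted arithmetic mean of the distributions (weights sum to 1)."""
    return {
        tok: sum(w * d[tok] for d, w in zip(dists, weights))
        for tok in dists[0]
    }


def log_linear_aggregate(dists, weights):
    """Weighted geometric mean, renormalized so the result sums to 1.

    Assumes every probability is strictly positive (log(0) is undefined).
    """
    scores = {
        tok: math.exp(sum(w * math.log(d[tok]) for d, w in zip(dists, weights)))
        for tok in dists[0]
    }
    norm = sum(scores.values())
    return {tok: s / norm for tok, s in scores.items()}


# Two hypothetical advertisers with opposite preferences.
airline = {"flight": 0.8, "resort": 0.1, "hike": 0.1}
hotel = {"flight": 0.1, "resort": 0.8, "hike": 0.1}
weights = [0.5, 0.5]  # equal bid shares

print(linear_aggregate([airline, hotel], weights))
print(log_linear_aggregate([airline, hotel], weights))
```

With equal weights, the linear blend leaves the token neither advertiser cares about ("hike") at its original probability, while the log-linear blend gives it a larger share. That is one way to see that the two rules respond differently to the same bids.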

Understanding the difference is key: advertisers who don’t grasp which aggregation model is used might overspend with minimal impact, a costly mistake in a new kind of auction economy.

The token auction model architecture. Source: Google Research

Redefining brand preferences

In this framework, advertisers no longer submit static ad copy or landing pages. Instead, they “teach” an LLM to speak in their brand’s voice. 

This model acts as a dynamic representation of what the brand would say in any given context.

It’s no longer about crafting a single great headline; it’s about engineering a probabilistic system that reliably outputs brand-aligned language in real-time conversations.

Preserving privacy, reducing friction

Another key aspect of the proposed system is technical decoupling. 

The main generative model (e.g., the one responding to the user) does not directly access the internal logic of each advertiser’s LLM. 

Instead, each advertiser privately computes the probabilities of next tokens and submits them to the auction system.

This means brands can participate without revealing proprietary models or logic, while the central engine remains efficient and modular.
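
One way to picture this decoupling is as a narrow interface: the auction sees only next-token probabilities, never the model that produced them. The sketch below is hypothetical. The class names, the method name, and the toy advertiser are assumptions for illustration, not an API from the paper or from any Google product.

```python
from typing import Dict, List, Protocol


class AdvertiserEndpoint(Protocol):
    """Hypothetical contract: all the auction may ask of an advertiser."""

    def next_token_probs(self, context: List[str]) -> Dict[str, float]:
        ...


class ToyAdvertiser:
    """Stand-in for a brand's private model; its internals stay hidden."""

    def __init__(self, preferences: Dict[str, float]) -> None:
        self._preferences = preferences  # private, never sent to the auction

    def next_token_probs(self, context: List[str]) -> Dict[str, float]:
        # A real advertiser would run its own LLM on the context here.
        total = sum(self._preferences.values())
        return {tok: score / total for tok, score in self._preferences.items()}


def collect_submissions(
    advertisers: List[AdvertiserEndpoint], context: List[str]
) -> List[Dict[str, float]]:
    """The auction side: gather distributions without inspecting any model."""
    return [adv.next_token_probs(context) for adv in advertisers]


brand = ToyAdvertiser({"flight": 3.0, "resort": 1.0})
print(collect_submissions([brand], context=["book", "a"]))
```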


Business model: Paying for real influence

In a classic Vickrey (second-price) auction, only the winning bidder pays, and the price is set by the next-highest bid rather than by their own. 

A similar principle applies here: an advertiser pays only when their influence causes the system to generate a token they prefer over what would have otherwise been selected.

The measure of this influence? 

A statistical metric called total variation distance (TVD), which quantifies how much the final output deviates from the default due to a given advertiser’s input.
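
Total variation distance itself is simple to compute: it is half the sum of the absolute differences between two probability distributions. The sketch below shows only that distance on invented numbers. It does not reproduce the paper's payment rule, and the "baseline" and "with advertiser" distributions are hypothetical.

```python
def total_variation_distance(p, q):
    """Half the L1 distance between two distributions over tokens."""
    tokens = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tokens)


# Hypothetical next-token distributions without and with one advertiser.
baseline = {"flight": 0.5, "resort": 0.5}
with_advertiser = {"flight": 0.7, "resort": 0.3}

print(total_variation_distance(baseline, with_advertiser))
```

A value of 0 means the advertiser changed nothing, and a value of 1 means the two distributions put their probability on entirely different tokens.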

It’s a shift from clicks and impressions to token-level ROI. 

For the first time, brand impact can be measured at the granularity of individual words.

The simulation: It actually works

The researchers tested the model using Gemma 7B, an open-weight LLM, with two dummy advertisers: one formal, one casual. 

Each advertiser submitted different stylistic preferences through their model, and the results showed a clear correlation between higher bids and stronger influence over tone and wording.

Sample prompt: “What’s a good weekend activity?”

By adjusting the bid ratio between the two advertisers, the generated text shifted predictably toward one tone or the other. 

Graphs and tables in the paper illustrate how this influence can be modeled, tracked, and monetized.
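
The same kind of sweep can be imitated with toy numbers. The sketch below assumes a bid-weighted average of two invented distributions and tracks how likely a formal-sounding opener becomes as the formal advertiser's share of the total bid grows. These figures are illustrative and are not results from the paper.

```python
# Hypothetical probabilities that each advertiser's model opens with
# the formal word "consider" rather than the casual word "try".
FORMAL_MODEL = {"consider": 0.9, "try": 0.1}
CASUAL_MODEL = {"consider": 0.2, "try": 0.8}


def formal_token_probability(formal_share):
    """Probability of 'consider' when the formal advertiser holds
    `formal_share` (between 0 and 1) of the total bid."""
    return (
        formal_share * FORMAL_MODEL["consider"]
        + (1.0 - formal_share) * CASUAL_MODEL["consider"]
    )


shares = [0.0, 0.25, 0.5, 0.75, 1.0]
curve = [formal_token_probability(s) for s in shares]

for share, prob in zip(shares, curve):
    print(f"formal bid share {share:.2f} -> P('consider') = {prob:.3f}")
```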

Output generated by the two distribution aggregation functions, as a function of the relative weight of the bid by Alpha Airlines. Source: Google Research

This model doesn’t just apply to ads. It mirrors a broader shift from retrieval-based visibility (SEO) to generation-based visibility (GEO, or generative engine optimization).

In SEO, you optimize a page to appear when searched. In GEO, you optimize your language to appear as part of the generated answer.

What token auction suggests is that paid GEO may soon follow, where the response itself is shaped in real time by the brands that pay to participate.

A glimpse at the ad system of the future

It’s hard to predict exactly what this will look like, but here’s one possible workflow:

  • A brand fine-tunes its own LLM to reflect its messaging style.
  • The campaign manager sets objectives (e.g., “be mentioned when someone asks about romantic getaways”) and places a bid.
  • When a user query triggers a generative response, the auction evaluates, token by token, which advertisers have relevant models and active bids.
  • The final answer is written collaboratively by the system, shaped by the brands that influence its language.
  • In the advertiser dashboard, there are no clicks. Instead: token-level heatmaps, influence scores, and true cost per generated impression.

What it means for marketers

This isn’t about choosing which of your five ad versions performs best. It’s about influencing what the AI actually says.

  • No more static creatives: The system generates messaging live.
  • Language engineering beats copywriting: Success depends on probabilistic language modeling, not catchy taglines.
  • Multi-brand responses become standard: Multiple advertisers can appear in the same answer.
  • Presence over placement: Brands aim to shape the message, not just appear beside it.
  • A new model of ROI: Impact isn’t clicks or views; it’s influence over output.

Already starting: Google’s AI ads today

This isn’t just theoretical. 

In May 2025, Google began testing search and shopping ads inside AI Overviews and AI Mode. These sponsored messages now appear within generative results on both mobile and desktop.

More ads are now embedded inside AI answers than ever before, according to The Verge.

They’re still traditional in format (labeled, clickable, and visually distinct), but the direction is clear: ads are moving into the content.

The future of paid advertising

It’s not just Google. Meta has announced that by the end of 2026, it aims to roll out fully automated paid ad campaigns, where businesses won’t even write ads. 

The advertiser simply provides a goal (e.g., “sell green running shoes”), and the system handles creative, targeting, testing, and optimization.

Taken together, the shifts we’re seeing from both Google and Meta reveal a future where paid media is no longer just about targeting or placement. 

It’s about collaboration with machines that generate, optimize, and deliver brand messaging on the fly.

We’re moving from:

  • Pre-written ads to AI-generated responses.
  • Manual optimization to real-time probabilistic influence.
  • Click-through rates to token-level brand presence.

Whether through Google’s token-level auctions or Meta’s fully automated campaign flow, the common thread is clear: paid advertising is becoming generative.

For marketers, the path forward will require new skills, new strategies, and a deep understanding of how to shape AI outputs without ever writing a traditional ad.