What is Google-Extended and what is it for?

How this robots.txt token differs from Googlebot, and what allowing or blocking it means for Gemini

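Google-Extended is a standalone robots.txt token that Google introduced in September 2023. It lets site owners control whether content from their site may be used to train Google's AI models, including Gemini (formerly Bard). It is distinct from Googlebot, which crawls pages for Google Search indexing. According to Google's crawler documentation, Google-Extended has no HTTP user-agent string of its own: the crawling is done by Google's existing crawlers, and the token acts only as a control in robots.txt. This means a company can block Google-Extended without affecting its Google Search ranking, or allow it without any guarantee that its site will appear in Gemini's responses.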

The difference between Googlebot and Google-Extended

This distinction is one of the most important — and least known — in the AIO universe:

Aspect	Googlebot	Google-Extended
Purpose	Google Search indexing	Training Google AI models (Gemini, formerly Bard)
Impact if blocked	Site can no longer be crawled for Google Search and loses search visibility	Site content doesn't feed Gemini training
Impact on Gemini responses	Indirect (via the Search index used for real-time answers)	Direct (via training data, the model's internal knowledge)
Introduced	1990s	September 2023

Googlebot is still critical for appearing in Gemini with real-time search: when Gemini uses Google Search to answer questions with current information, it queries the Google index — which is fed by Googlebot. Therefore, blocking Googlebot affects Gemini visibility for real-time queries.

Google-Extended, in turn, affects the knowledge Gemini incorporates during training — which can influence responses that don't depend on real-time search.

Why Google created a separate user-agent

The separation was a response to growing demands from publishers and content creators who wanted to control how their content is used to train AIs — without needing to exit the Google Search index. With Google-Extended, a company can say: "you can index me for Google Search, but don't use my content to train Gemini."

This was especially relevant for:

- Press organizations and publishers concerned about copyright
- Subscription content platforms
- Companies that prefer to negotiate the use of their data directly with Google

How to configure in robots.txt

To allow Google-Extended (default recommendation for most companies):

User-agent: Google-Extended
Allow: /

To block Google-Extended without affecting Google Search:

User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
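
One way to sanity-check rules like these before publishing them is Python's standard-library urllib.robotparser. This is only a rough sketch: Python's parser approximates robots.txt matching and is not Google's own matcher. The is_allowed helper and the example.com URL are placeholders introduced here for illustration.

```python
from urllib.robotparser import RobotFileParser

# The "block Google-Extended, keep Google Search" rules from above.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""


def is_allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if `token` may access `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    page = "https://example.com/blog/"  # placeholder URL
    for token in ("Google-Extended", "Googlebot"):
        print(token, is_allowed(ROBOTS_TXT, token, page))
```

With these rules, the check should report Google-Extended as blocked and Googlebot as allowed for the same page.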

If robots.txt doesn't mention Google-Extended, the token falls back to the rules in the User-agent: * group. If that generic group allows a path, the path is also allowed for Google-Extended; if it disallows a path, Google-Extended is blocked there too.
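
The fallback can be illustrated with the same standard-library parser. The robots.txt content and the example.com URLs below are hypothetical, and Python's parser is only an approximation of how Google evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that never names Google-Extended.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# With no dedicated group, Google-Extended inherits the "*" rules.
blog_ok = parser.can_fetch("Google-Extended", "https://example.com/blog/post")
private_ok = parser.can_fetch("Google-Extended", "https://example.com/private/report")

print(blog_ok, private_ok)
```

Here the blog URL is allowed and the /private/ URL is blocked, purely because of the generic group.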

Does blocking Google-Extended affect Gemini responses?

Partially. The impact is mainly on Gemini's training knowledge — which affects responses based on the model's internal knowledge. For Gemini responses that use Google Search in real time, what matters is Googlebot and Google Search indexing.
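
The two channels described here can be summarized in a small sketch. The gemini_channels function and its key names are hypothetical labels for this article's distinction, not anything Google exposes, and the result only reflects what robots.txt permits, not whether Gemini will actually use or cite the page.

```python
from urllib.robotparser import RobotFileParser


def gemini_channels(robots_txt: str, url: str) -> dict[str, bool]:
    """Report which of the two paths into Gemini a robots.txt leaves open."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        # Real-time answers rely on the Search index, which Googlebot feeds.
        "real_time_search": parser.can_fetch("Googlebot", url),
        # Model knowledge relies on training data, governed by Google-Extended.
        "training_data": parser.can_fetch("Google-Extended", url),
    }


if __name__ == "__main__":
    block_training_only = "User-agent: Google-Extended\nDisallow: /\n"
    print(gemini_channels(block_training_only, "https://example.com/pricing"))
```

For a file that blocks only Google-Extended, the real-time search path stays open while the training path is closed, which matches the "Partially" answer above.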

For most companies, the recommendation is to allow Google-Extended — unless there's a specific reason (copyright, paid content, strategic decision). The cost of blocking is reduced Gemini knowledge about your business; the cost of allowing is practically zero for public content.

Examples of who might prefer to block:

News portal with subscription model and ongoing licensing negotiations with Google
Investment analysis platform with proprietary reports as the core product
Company with sensitive legal content that prefers to control distribution

Examples of who should allow:

Clinic with health blog — wants Gemini to learn from its specialized content
School with public educational material — wants to be cited as an educational reference
IT company with technical documentation — wants visibility in technical Gemini queries

FRT Digital includes Google-Extended analysis in the technical portion of its AIO Score audit. Learn about the AIO service.
