What is Google-Extended and what is it for?
A user-agent separate from Googlebot, and what allowing or blocking it means for Gemini
Google-Extended is a separate user-agent token that Google introduced in 2023 so site owners can control, through robots.txt, whether their content is used to train Google's AI models, including Gemini (formerly Bard). It is distinct from Googlebot, which crawls pages for Google Search indexing. This means a company can block Google-Extended without affecting its Google Search ranking, or allow Google-Extended without any guarantee that its site will appear in Gemini's responses.
The difference between Googlebot and Google-Extended
This distinction is one of the most important — and least known — in the AIO universe:
| | Googlebot | Google-Extended |
|---|---|---|
| Purpose | Google Search indexing | Training Google AI models (Gemini, formerly Bard) |
| Impact if blocked | Site disappears from Google Search | Site doesn't feed Gemini training |
| Impact on Gemini responses | Indirect (via indexing) | Direct (via training data) |
| Created | 1990s | September 2023 |
Googlebot is still critical for appearing in Gemini with real-time search: when Gemini uses Google Search to answer questions with current information, it queries the Google index — which is fed by Googlebot. Therefore, blocking Googlebot affects Gemini visibility for real-time queries.
Google-Extended, in turn, affects the knowledge Gemini incorporates during training — which can influence responses that don't depend on real-time search.
Why Google created a separate user-agent
The separation was a response to growing demands from publishers and content creators who wanted to control how their content is used to train AIs — without needing to exit the Google Search index. With Google-Extended, a company can say: "you can index me for Google Search, but don't use my content to train Gemini."
This was especially relevant for:
- Press organizations and publishers concerned about copyright
- Subscription content platforms
- Companies that prefer to negotiate the use of their data directly with Google
How to configure in robots.txt
To allow Google-Extended (default recommendation for most companies):
```
User-agent: Google-Extended
Allow: /
```
To block Google-Extended without affecting Google Search:
```
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
```
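One way to sanity-check a configuration like this before deploying it is to run it through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`, which approximates the matching rules but is not Google's own parser, so treat the result as indicative only. The helper name `is_allowed` and the `example.com` URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The blocking configuration, written out as robots.txt text.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL (assumption): substitute a real page from your own site.
page = "https://example.com/blog/post"

print("Google-Extended allowed:", is_allowed(ROBOTS_TXT, "Google-Extended", page))
print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

With this configuration the first check should report `False` and the second `True`, which matches the intent: excluded from AI training, still crawlable for Search.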
If robots.txt doesn't mention Google-Extended, it falls back to the rules under `User-agent: *`. If that generic group allows crawling, Google-Extended is allowed too.
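The fallback behavior can be illustrated with the same standard-library parser. This is a minimal sketch: the `/members/` path, the `example.com` URLs, and the helper name `can_fetch` are hypothetical, and Python's `urllib.robotparser` only approximates how Google evaluates the file.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate a robots.txt string for one user-agent and one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Case 1: only a generic group, with no mention of Google-Extended.
GENERIC_ONLY = """\
User-agent: *
Disallow: /members/
"""

# Case 2: the same file plus a dedicated Google-Extended group.
WITH_OWN_GROUP = GENERIC_ONLY + """
User-agent: Google-Extended
Disallow: /
"""

blog = "https://example.com/blog/"               # hypothetical public page
members = "https://example.com/members/report"   # hypothetical restricted page

# Without its own group, Google-Extended inherits the generic rules.
print(can_fetch(GENERIC_ONLY, "Google-Extended", blog))     # allowed by the * group
print(can_fetch(GENERIC_ONLY, "Google-Extended", members))  # blocked by the * group

# With its own group, the specific rules apply instead of the generic ones.
print(can_fetch(WITH_OWN_GROUP, "Google-Extended", blog))   # blocked by its own group
print(can_fetch(WITH_OWN_GROUP, "Googlebot", blog))         # still uses the * group
```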
Does blocking Google-Extended affect Gemini responses?
Partially. The impact is mainly on Gemini's training knowledge — which affects responses based on the model's internal knowledge. For Gemini responses that use Google Search in real time, what matters is Googlebot and Google Search indexing.
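The two channels described here can be written down as a small truth table. The sketch below restates the article's simplified model and is not a description of how Gemini works internally; the names `GeminiImpact` and `gemini_impact` are invented for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GeminiImpact:
    realtime_search: bool     # answers that rely on the Google Search index
    training_knowledge: bool  # answers that rely on the model's internal knowledge


def gemini_impact(googlebot_allowed: bool, google_extended_allowed: bool) -> GeminiImpact:
    """Map robots.txt permissions to the two channels, per the article's model."""
    return GeminiImpact(
        realtime_search=googlebot_allowed,           # fed by Googlebot indexing
        training_knowledge=google_extended_allowed,  # fed by training data
    )


# Print all four combinations of the two permissions.
for googlebot, extended in product([True, False], repeat=2):
    impact = gemini_impact(googlebot, extended)
    print(f"Googlebot={googlebot!s:<5} Google-Extended={extended!s:<5} -> {impact}")
```

Under this model, blocking only Google-Extended leaves `realtime_search` untouched, which is why the answer to the question above is "partially".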
For most companies, the recommendation is to allow Google-Extended — unless there's a specific reason (copyright, paid content, strategic decision). The cost of blocking is reduced Gemini knowledge about your business; the cost of allowing is practically zero for public content.
Examples of who might prefer to block:
- News portal with subscription model and ongoing licensing negotiations with Google
- Investment analysis platform with proprietary reports as the core product
- Company with sensitive legal content that prefers to control distribution
Examples of who should allow:
- Clinic with health blog: wants Gemini to learn from its specialized content
- School with public educational material: wants to be cited as an educational reference
- IT company with technical documentation: wants visibility in technical Gemini queries
FRT Digital includes Google-Extended analysis in the technical section of the AIO Score audit.