AIO · robots.txt · technical - 2026-02-11

What is robots.txt and how to configure it for AI bots?

The file that controls which systems can crawl your site — and how to ensure AI bots have access


robots.txt is a text file placed at the root of the site (yourdomain.com/robots.txt) that informs crawling robots (bots) which pages can or cannot be accessed. It's the first file any bot reads before crawling a site. For AIO, the correct robots.txt configuration is one of the most critical — and most frequently overlooked — steps: an error in this file can make the site completely invisible to ChatGPT, Perplexity, Gemini, or Google AI Overview, regardless of content quality.

How robots.txt works

The structure is simple: the file lists pairs of User-agent (which bot) and Disallow or Allow (what it can or cannot access). Basic example:

```
User-agent: *
Disallow: /admin/
Disallow: /checkout/

User-agent: Googlebot
Allow: /
```

The `*` matches any bot that has no group of its own in the file. When a group names a specific bot, that bot follows only that group — the generic `*` rules stop applying to it entirely, rather than being merged with it.
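One quick way to sanity-check this precedence is Python's standard `urllib.robotparser`, which implements the same matching logic. A minimal sketch, using the hypothetical rules from the example above:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules mirroring the example above
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/

User-agent: Googlebot
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot has its own group (Allow: /), so the generic
# Disallow lines do not apply to it:
print(rp.can_fetch("Googlebot", "/admin/settings"))  # True

# A bot with no group of its own falls back to the * group:
print(rp.can_fetch("GPTBot", "/admin/settings"))     # False
print(rp.can_fetch("GPTBot", "/blog/post"))          # True
```

Note that `GPTBot` is blocked from `/admin/` here only because it has no dedicated group — adding a `User-agent: GPTBot` group with `Allow: /` would change that.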

The AI bots that need to be allowed

For the site's content to be accessible to the main AI platforms, the following user-agents need to be allowed (or not blocked by a generic rule):

| Bot | Company | Purpose |
| --- | --- | --- |
| GPTBot | OpenAI | Crawling for ChatGPT training |
| OAI-SearchBot | OpenAI | ChatGPT real-time search (Browse) |
| PerplexityBot | Perplexity | Crawling for Perplexity responses |
| Bingbot | Microsoft | Bing indexing (ChatGPT Browse + Copilot) |
| BingPreview | Microsoft | Page previews in Bing |
| Google-Extended | Google | Gemini/Bard training |
| ClaudeBot | Anthropic | Crawling for Claude |
| anthropic-ai | Anthropic | Alternative Claude agent |
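Putting the table together: since a named group replaces the generic `*` rules for that bot, each agent needs its own explicit group if `*` carries any restriction. A sketch of a robots.txt that keeps a generic block on a private area while admitting every agent listed above (the `/admin/` path is just a placeholder — adapt the Disallow lines to your site):

```
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: BingPreview
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /
```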

The most common errors

Generic block of all bots:

```
User-agent: *
Disallow: /
```

This pattern blocks absolutely all bots — including Googlebot, Bingbot, and all AI bots. It's common in development environments accidentally published to production, or in migrations where the old robots.txt was preserved.

Block to prevent generic scraping: some companies add blocks for bots that have scraped their content, but end up blocking legitimate AI bots in the process. Periodically checking that GPTBot, PerplexityBot, and Bingbot aren't on the blacklist is a good practice.
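That periodic check can be automated. A sketch using Python's standard `urllib.robotparser`: the `blocked_ai_bots` helper and the `AI_BOTS` list are illustrative names (the list mirrors the table above), and in practice you would fetch the live file rather than hard-code it.

```python
from urllib.robotparser import RobotFileParser

# The AI user-agents from the table above
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "PerplexityBot", "Bingbot",
    "BingPreview", "Google-Extended", "ClaudeBot", "anthropic-ai",
]

def blocked_ai_bots(robots_txt: str, probe_path: str = "/") -> list:
    """Return the AI bots these rules forbid from fetching probe_path."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not rp.can_fetch(bot, probe_path)]

# A blacklist aimed at scrapers that also catches legitimate AI bots:
rules = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

print(blocked_ai_bots(rules))  # ['GPTBot', 'ClaudeBot']
```

Running this against your production robots.txt after every deploy or plugin update turns a silent visibility failure into an immediate alert.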

Outdated robots.txt from SEO plugins: SEO plugins like Yoast, Rank Math, and others generate robots.txt automatically. If plugin settings were changed without review, they may have created unintentional blocking rules.

Configuration examples by site type

E-commerce (allow AI bots, block only admin area):

```
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bingbot
Allow: /
```

Clinic site (allow AI bots, protect patient area):

```
User-agent: *
Disallow: /patient-portal/
Disallow: /test-results/
Disallow: /appointment/confirmation/

User-agent: GPTBot
Allow: /
Allow: /specialties/
Allow: /blog/

User-agent: PerplexityBot
Allow: /
```

Editorial content site (allow everything for AI bots):

```
User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```

How to check the current robots.txt

Access yourdomain.com/robots.txt directly in the browser. If the file doesn't exist, it will return a 404 error — which means all bots have unrestricted access (default behavior when the file doesn't exist).

To check how Googlebot interprets the file, Google Search Console offers a robots.txt report under Settings > robots.txt (it replaced the standalone robots.txt Tester).

FRT Digital audits robots.txt as part of the technical diagnosis of the AIO Score audit. It's one of the first items checked because a block here invalidates all other optimizations. Learn about the complete AIO service.
