Article
What is robots.txt and how to configure it for AI bots?
The file that controls which systems can crawl your site — and how to ensure AI bots have access
robots.txt is a text file placed at the root of the site (yourdomain.com/robots.txt) that informs crawling robots (bots) which pages can or cannot be accessed. It's the first file any bot reads before crawling a site. For AIO, the correct robots.txt configuration is one of the most critical — and most frequently overlooked — steps: an error in this file can make the site completely invisible to ChatGPT, Perplexity, Gemini, or Google AI Overview, regardless of content quality.
How robots.txt works
The structure is simple: the file lists pairs of User-agent (which bot) and Disallow or Allow (what it can or cannot access). Basic example:
```
User-agent: *
Disallow: /admin/
Disallow: /checkout/

User-agent: Googlebot
Allow: /
```
The * represents all unspecified bots. More specific rules (with the bot's name) override the generic rule for that bot.
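This override behavior can be checked with Python's standard-library `urllib.robotparser`, which applies the same group-matching rules. A minimal sketch using the example file above (`SomeBot` is a made-up name standing in for any unlisted crawler):

```python
from urllib.robotparser import RobotFileParser

# Same rules as the example above: * is blocked from /admin/ and
# /checkout/, but Googlebot has its own group that allows everything.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/

User-agent: Googlebot
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Unlisted bots fall under the * group; Googlebot uses its own group.
print(rp.can_fetch("SomeBot", "https://yourdomain.com/admin/page"))    # False
print(rp.can_fetch("Googlebot", "https://yourdomain.com/admin/page"))  # True
```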
The AI bots that need to be allowed
For the site's content to be accessible to the main AI platforms, the following user-agents need to be allowed (or not blocked by a generic rule):
| Bot | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Crawling for ChatGPT training |
| OAI-SearchBot | OpenAI | ChatGPT real-time search (Browse) |
| PerplexityBot | Perplexity | Crawling for Perplexity responses |
| Bingbot | Microsoft | Bing indexing (ChatGPT Browse + Copilot) |
| BingPreview | Microsoft | Page previews in Bing |
| Google-Extended | Google | Controls use of content for Gemini training |
| ClaudeBot | Anthropic | Crawling for Claude |
| anthropic-ai | Anthropic | Alternative Claude agent |
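A quick way to audit a robots.txt against this list is, again, Python's standard-library `urllib.robotparser`. The sketch below is illustrative: the bot names come from the table above, and `blocked_bots` is a hypothetical helper name, not an established API:

```python
from urllib.robotparser import RobotFileParser

# User-agents from the table above.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "Bingbot",
           "BingPreview", "Google-Extended", "ClaudeBot", "anthropic-ai"]

def blocked_bots(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI bots that cannot fetch `url` under these rules."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not rp.can_fetch(bot, url)]

sample = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

print(blocked_bots(sample))  # → ['GPTBot']
```

Running this against a site's real robots.txt before and after a change makes accidental blocks visible immediately.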
The most common errors
Generic block of all bots:

```
User-agent: *
Disallow: /
```

This pattern blocks absolutely all bots, including Googlebot, Bingbot, and all AI bots. It's common in development environments accidentally published to production, or in migrations where the old robots.txt was preserved.
Block to prevent generic scraping: some companies add blocks for bots that have scraped their content, but end up blocking legitimate AI bots in the process. Periodically checking that GPTBot, PerplexityBot, and Bingbot aren't on the blacklist is a good practice.
Outdated rules from SEO plugins: plugins like Yoast SEO, Rank Math, and others generate robots.txt automatically. If their settings were changed without review, they may have created unintentional blocking rules.
Configuration examples by site type
E-commerce (allow AI bots, block only private and transactional areas):

```
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bingbot
Allow: /
```
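As a sanity check, the e-commerce rules above can be replayed with Python's standard-library `urllib.robotparser` to confirm that the generic `/checkout/` block does not apply to the explicitly allowed GPTBot. A sketch, with `shop.example` and `RandomCrawler` as placeholder names:

```python
from urllib.robotparser import RobotFileParser

# The e-commerce example from the article (GPTBot group shown;
# PerplexityBot and Bingbot would behave the same way).
ecommerce_rules = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

User-agent: GPTBot
Allow: /
"""

rp = RobotFileParser()
rp.parse(ecommerce_rules.splitlines())

print(rp.can_fetch("RandomCrawler", "https://shop.example/checkout/"))   # False
print(rp.can_fetch("GPTBot", "https://shop.example/checkout/"))          # True
print(rp.can_fetch("RandomCrawler", "https://shop.example/product/42"))  # True
```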
Clinic site (allow AI bots, protect patient areas):

```
User-agent: *
Disallow: /patient-portal/
Disallow: /test-results/
Disallow: /appointment/confirmation/

User-agent: GPTBot
Allow: /
Allow: /specialties/
Allow: /blog/

User-agent: PerplexityBot
Allow: /
```
Editorial content site (allow everything for AI bots):

```
User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```
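The `Sitemap` line in the editorial example is also machine-readable: Python's `urllib.robotparser` exposes it via `site_maps()` (available since Python 3.8). A sketch using the same rules:

```python
from urllib.robotparser import RobotFileParser

# The editorial-site example from the article.
editorial_rules = """\
User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(editorial_rules.splitlines())

print(rp.can_fetch("GPTBot", "https://yoursite.com/any-article"))  # True
print(rp.site_maps())  # → ['https://yoursite.com/sitemap.xml']
```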
How to check the current robots.txt
Access yourdomain.com/robots.txt directly in the browser. If the file doesn't exist, the request returns a 404 error; in that case all bots have unrestricted access, since crawlers treat a missing robots.txt as permission to crawl everything.
To check how Google's crawlers interpret the file, Google Search Console provides a robots.txt report under Settings.
FRT Digital audits robots.txt as part of the technical diagnosis of the AIO Score audit. It's one of the first items checked because a block here invalidates all other optimizations. Learn about the complete AIO service.