Article
What is robots.txt and how to configure it for AI bots?
The file that controls which systems can crawl your site — and how to ensure AI bots have access
robots.txt is a text file placed at the root of the site (yourdomain.com/robots.txt) that informs crawling robots (bots) which pages can or cannot be accessed. It's the first file any bot reads before crawling a site. For AIO, the correct robots.txt configuration is one of the most critical — and most frequently overlooked — steps: an error in this file can make the site completely invisible to ChatGPT, Perplexity, Gemini, or Google AI Overview, regardless of content quality.
How robots.txt works
The structure is simple: the file lists pairs of User-agent (which bot) and Disallow or Allow (what it can or cannot access). Basic example:
```
User-agent: *
Disallow: /admin/
Disallow: /checkout/

User-agent: Googlebot
Allow: /
```
The * represents all unspecified bots. More specific rules (with the bot's name) override the generic rule for that bot.
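This override behavior can be checked with Python's standard-library `urllib.robotparser`, which applies the same group-matching rules. A minimal sketch using the example file above (`SomeBot` is a made-up name standing in for any unlisted crawler):

```python
from urllib.robotparser import RobotFileParser

# Same rules as the example above: * is blocked from /admin/ and
# /checkout/, but Googlebot has its own group that allows everything.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/

User-agent: Googlebot
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Unlisted bots fall under the * group; Googlebot uses its own group.
print(rp.can_fetch("SomeBot", "https://yourdomain.com/admin/page"))    # False
print(rp.can_fetch("Googlebot", "https://yourdomain.com/admin/page"))  # True
```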
The AI bots that need to be allowed
For the site's content to be accessible to the main AI platforms, the following user-agents need to be allowed (or not blocked by a generic rule):
| Bot | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Crawling for ChatGPT training |
| OAI-SearchBot | OpenAI | ChatGPT real-time search (Browse) |
| PerplexityBot | Perplexity | Crawling for Perplexity responses |
| Bingbot | Microsoft | Bing indexing (ChatGPT Browse + Copilot) |
| BingPreview | Microsoft | Page previews in Bing |
| Google-Extended | Google | Controls use of content for Gemini training |
| ClaudeBot | Anthropic | Crawling for Claude |
| anthropic-ai | Anthropic | Alternative Claude agent |
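A quick way to audit a robots.txt against this list is, again, Python's standard-library `urllib.robotparser`. The sketch below is illustrative: the bot names come from the table above, and `blocked_bots` is a hypothetical helper name, not an established API:

```python
from urllib.robotparser import RobotFileParser

# User-agents from the table above.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "Bingbot",
           "BingPreview", "Google-Extended", "ClaudeBot", "anthropic-ai"]

def blocked_bots(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI bots that cannot fetch `url` under these rules."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not rp.can_fetch(bot, url)]

sample = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

print(blocked_bots(sample))  # → ['GPTBot']
```

Running this against a site's real robots.txt before and after a change makes accidental blocks visible immediately.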
The most common errors
Generic block of all bots:

```
User-agent: *
Disallow: /
```

This pattern blocks absolutely all bots, including Googlebot, Bingbot, and all AI bots. It's common in development environments accidentally published to production, or in migrations where the old robots.txt was preserved.
Block to prevent generic scraping: some companies add blocks for bots that have scraped their content, but end up blocking legitimate AI bots in the process. Periodically checking that GPTBot, PerplexityBot, and Bingbot aren't on the blacklist is a good practice.
Outdated rules from SEO plugins: plugins like Yoast SEO, Rank Math, and others generate robots.txt automatically. If their settings were changed without review, they may have created unintentional blocking rules.
Configuration examples by site type
E-commerce (allow AI bots, block only private and transactional areas):

```
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bingbot
Allow: /
```
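As a sanity check, the e-commerce rules above can be replayed with Python's standard-library `urllib.robotparser` to confirm that the generic `/checkout/` block does not apply to the explicitly allowed GPTBot. A sketch, with `shop.example` and `RandomCrawler` as placeholder names:

```python
from urllib.robotparser import RobotFileParser

# The e-commerce example from the article (GPTBot group shown;
# PerplexityBot and Bingbot would behave the same way).
ecommerce_rules = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

User-agent: GPTBot
Allow: /
"""

rp = RobotFileParser()
rp.parse(ecommerce_rules.splitlines())

print(rp.can_fetch("RandomCrawler", "https://shop.example/checkout/"))   # False
print(rp.can_fetch("GPTBot", "https://shop.example/checkout/"))          # True
print(rp.can_fetch("RandomCrawler", "https://shop.example/product/42"))  # True
```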
Clinic site (allow AI bots, protect patient areas):

```
User-agent: *
Disallow: /patient-portal/
Disallow: /test-results/
Disallow: /appointment/confirmation/

User-agent: GPTBot
Allow: /
Allow: /specialties/
Allow: /blog/

User-agent: PerplexityBot
Allow: /
```
Editorial content site (allow everything for AI bots):

```
User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```
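The `Sitemap` line in the editorial example is also machine-readable: Python's `urllib.robotparser` exposes it via `site_maps()` (available since Python 3.8). A sketch using the same rules:

```python
from urllib.robotparser import RobotFileParser

# The editorial-site example from the article.
editorial_rules = """\
User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(editorial_rules.splitlines())

print(rp.can_fetch("GPTBot", "https://yoursite.com/any-article"))  # True
print(rp.site_maps())  # → ['https://yoursite.com/sitemap.xml']
```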
How to check the current robots.txt
Access yourdomain.com/robots.txt directly in the browser. If the file doesn't exist, the request returns a 404 error; in that case all bots have unrestricted access, since crawlers treat a missing robots.txt as permission to crawl everything.
To check how Google's crawlers interpret the file, Google Search Console provides a robots.txt report under Settings.
FRT Digital audits robots.txt as part of the technical diagnosis of the AIO Score audit. It's one of the first items checked because a block here invalidates all other optimizations. Learn about the complete AIO service.