What is GPTBot and should I allow it in robots.txt?
OpenAI's crawler that feeds ChatGPT — and the decision every company needs to make
GPTBot is OpenAI's web crawler: the agent that traverses public pages to collect content used in training GPT models. A sibling agent, OAI-SearchBot, handles ChatGPT's real-time searches when web browsing is active. If your site's robots.txt blocks these crawlers, ChatGPT has no access to your content to cite in responses, no matter how well-structured or relevant that content is. Whether to allow or block them depends on the type of content and the business's strategic objectives.
What GPTBot does exactly
OpenAI's crawling operates in two main contexts:
Training crawling: OpenAI uses GPTBot to collect public web data that feeds future versions of GPT models. Content collected during this phase may be incorporated into the model's internal knowledge — available in responses even without real-time search.
Real-time search (OAI-SearchBot): when the user activates web browsing in ChatGPT, or the model determines it needs current information, OAI-SearchBot (a separate OpenAI agent) searches via Bing and fetches pages in real time. For this function, Bingbot also needs to be allowed, because ChatGPT queries the Bing index rather than crawling the web directly.
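Taken together, a site that wants full ChatGPT visibility typically allows all three agents. A minimal robots.txt sketch (a hypothetical configuration, not an official recommendation):

```
# Training crawler for future GPT models
User-agent: GPTBot
Allow: /

# Real-time search crawler used by ChatGPT browsing
User-agent: OAI-SearchBot
Allow: /

# Bing's crawler, whose index ChatGPT's search relies on
User-agent: Bingbot
Allow: /
```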
Why most companies should allow GPTBot
For most businesses — especially those whose site has educational, informational, or public commercial content — allowing GPTBot is the decision that maximizes ChatGPT visibility. Examples:
Accounting firm: articles about tax regimes, business formation, and tax planning are exactly the type of content ChatGPT uses to answer small business questions. Blocking GPTBot means the firm doesn't appear when an entrepreneur asks ChatGPT about these topics.
Building materials store: guides on calculating cement quantities, comparisons between types of paint, or installation tutorials are highly citable content. Blocking GPTBot forfeits that visibility opportunity.
HR platform: articles about labor law, mandatory benefits, vacation and severance calculations are researched by managers in ChatGPT daily. Being present in those results has direct value for lead generation.
When it may make sense to block GPTBot
There are legitimate cases where blocking GPTBot is a strategic decision:
- Paid proprietary content: if the site has subscription-based courses, reports, or analyses, it may not want that content incorporated into OpenAI's training without compensation
- Sensitive data: even if pages with sensitive data are protected by login, it's a security best practice to explicitly exclude those sections in robots.txt
- Legal or privacy concerns: in some regulated industries (healthcare, finance), there are concerns about how content is reused by third-party systems
OpenAI allows site owners to block GPTBot without penalty — the company respects the robots.txt file. But the cost is ChatGPT invisibility.
How to configure in robots.txt
To allow GPTBot completely:

```
User-agent: GPTBot
Allow: /
```

To allow only public sections and block paid content:

```
User-agent: GPTBot
Allow: /blog/
Allow: /services/
Allow: /about/
Disallow: /exclusive-content/
Disallow: /reports/
```

To block completely:

```
User-agent: GPTBot
Disallow: /
```
How to check if GPTBot is currently being blocked
Visit yourdomain.com/robots.txt and look for a `User-agent: GPTBot` group containing a `Disallow` rule. If there is no GPTBot-specific group, check whether the generic `User-agent: *` group has `Disallow: /`, which blocks all bots, GPTBot included.
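This check can be automated with Python's standard-library robots.txt parser. A small sketch; `ROBOTS_TXT` is a hypothetical file standing in for the one you would fetch from yourdomain.com/robots.txt:

```python
from urllib import robotparser

# Hypothetical robots.txt content -- in practice, fetch the live file
# from https://yourdomain.com/robots.txt
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /exclusive-content/

User-agent: *
Disallow: /admin/
"""

def gptbot_can_fetch(robots_txt: str, path: str) -> bool:
    """Return True if the given robots.txt allows GPTBot to crawl `path`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", path)

print(gptbot_can_fetch(ROBOTS_TXT, "/blog/"))               # True
print(gptbot_can_fetch(ROBOTS_TXT, "/exclusive-content/"))  # False
```

Because the GPTBot group only disallows `/exclusive-content/`, everything else remains crawlable; a site-wide `User-agent: *` with `Disallow: /` would instead return False for every path.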
FRT Digital checks GPTBot and other AI bot status as part of the AIO Score audit. An unintentional block at this level eliminates all ChatGPT citation potential before any content optimization. Learn about the AIO service.