The robots.txt file is a plain-text file at the root of your domain that tells compliant crawlers which parts of your website they may access. With the rise of AI search, it has become even more important.
Basic robots.txt Structure
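A robots.txt file consists of one or more User-agent groups, each followed by its rules. In the example below, every crawler may access the whole site, while GPTBot is additionally kept out of /private/: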
User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /private/
Common Directives
- User-agent: Specifies which crawler the rules apply to
- Allow: Explicitly permits crawling of specified paths
- Disallow: Blocks crawling of specified paths
- Sitemap: Points to your sitemap location
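Putting these directives together, a minimal file might look like the following (the /admin/ path and sitemap URL are placeholders; substitute your own):

User-agent: *
Disallow: /admin/
Allow: /
Sitemap: https://yoursite.com/sitemap.xml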
AI Crawlers to Know
| Crawler | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | ChatGPT training & search |
| OAI-SearchBot | OpenAI | ChatGPT web search |
| PerplexityBot | Perplexity | Perplexity AI search |
| ClaudeBot | Anthropic | Claude AI |
| Google-Extended | Google | Gemini AI training |
| Applebot-Extended | Apple | Apple Intelligence |
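The table implies a useful distinction: some crawlers gather model-training data, while others fetch pages to answer live search queries. A site that wants AI search visibility but no model training could, for example, block Google-Extended (training only, per the table) while admitting the search-focused bots; note that per the table GPTBot serves both purposes, so blocking it affects ChatGPT search too. Verify current user-agent strings in each vendor's documentation:

User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /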
The AI Visibility Problem
Over 40% of websites accidentally block AI crawlers. If your robots.txt blocks these bots, your content cannot appear in AI search results at all.
Checking Your robots.txt
Visit yoursite.com/robots.txt to see your current configuration. Use tools like Rankwise's AI Visibility Checker to test whether AI crawlers can access your content.
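If you'd rather script the check, Python's standard-library urllib.robotparser can evaluate a live robots.txt against any user agent. A minimal sketch, where the domain and page URL are placeholders for your own:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_URL = "https://yoursite.com/robots.txt"    # placeholder domain
PAGE_URL = "https://yoursite.com/blog/some-post"  # hypothetical page to test

# User agents worth testing; confirm current strings in each vendor's docs.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot"]

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetches and parses the live robots.txt

for agent in AI_AGENTS:
    verdict = "allowed" if parser.can_fetch(agent, PAGE_URL) else "BLOCKED"
    print(f"{agent}: {verdict}")
```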
Best Practice
Unless you have specific reasons to block AI crawlers, allow them access:
User-agent: GPTBot
Allow: /
User-agent: PerplexityBot
Allow: /
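The same pattern extends to any crawler in the table above: add one User-agent group per bot you want to admit. If you maintain an XML sitemap, listing it also helps crawlers discover your content (the URL below is a placeholder):

Sitemap: https://yoursite.com/sitemap.xml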
Why this matters
robots.txt determines which crawlers can fetch your pages, and therefore what can be indexed and cited in both traditional and AI search. When robots.txt is handled consistently, crawlers get an unambiguous picture of your site, and visibility stays stable over time.
Common mistakes
- Applying robots.txt rules inconsistently across subdomains and environments (each host needs its own file)
- Ignoring how robots.txt interacts with canonical and noindex rules (a crawler cannot see a noindex tag on a page it is blocked from fetching)
- Failing to validate robots.txt after releases
- Over-restricting rules without checking which crawlers actually matter for your traffic
- Leaving outdated robots.txt rules in production (see the example below)
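A common form of the last mistake is a blanket block left over from a staging environment. A hypothetical leftover like this quietly removes the entire site from every compliant crawler, AI and traditional alike:

User-agent: *
Disallow: /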
How to check or improve robots.txt (quick checklist)
- Review your current robots.txt implementation on key templates.
- Validate robots.txt with Google Search Console's robots.txt report and a test crawl.
- Document standards for robots.txt so changes stay consistent.
- Monitor robots.txt for unintended changes and update it as intent shifts (a lightweight monitor is sketched below).
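One way to catch unintended changes is to hash the live file on a schedule and compare against the last known-good version. A minimal sketch using only the Python standard library (the URL and state-file path are placeholders):

```python
import hashlib
import urllib.request
from pathlib import Path

ROBOTS_URL = "https://yoursite.com/robots.txt"  # placeholder domain
STATE_FILE = Path("robots_hash.txt")            # stores the last seen hash

# Fetch the live robots.txt and hash its contents.
with urllib.request.urlopen(ROBOTS_URL) as resp:
    current = hashlib.sha256(resp.read()).hexdigest()

previous = STATE_FILE.read_text().strip() if STATE_FILE.exists() else None

if previous and previous != current:
    print("robots.txt changed since the last check; review it before crawlers do.")
elif previous is None:
    print("Recording baseline hash.")
else:
    print("No change detected.")

STATE_FILE.write_text(current)
```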
Examples
- Example 1: A site standardizes its robots.txt across hosts and sees more stable indexing.
- Example 2: A team audits its robots.txt and resolves hidden conflicts that were blocking AI crawlers.
FAQs
What is robots.txt?
robots.txt is a plain-text file at the root of your domain that tells crawlers which paths they may or may not fetch.
Why does robots.txt matter?
Because it gates crawler access: anything it blocks cannot be crawled, indexed, or cited in traditional or AI search results.
How do I improve robots.txt?
Use the checklist and verify changes across templates.
How often should I review robots.txt?
After major releases and at least quarterly for critical pages.
Related resources
- Guide: /resources/guides/robots-txt-for-ai-crawlers
- Template: /templates/definitive-guide
- Use case: /use-cases/saas-companies
- Glossary:
  - /glossary/canonical-url
  - /glossary/crawl-budget