
robots.txt

A text file placed in a website's root directory that instructs web crawlers which pages or sections of the site they can or cannot access, controlling how search engines and AI bots crawl your content.

Quick Answer

  • What it is: A text file in a website's root directory that tells web crawlers which pages or sections of the site they may access.
  • Why it matters: It controls how search engines and AI bots crawl your content; a misconfigured file can block them from your site entirely.
  • How to check or improve: Review the live file at yoursite.com/robots.txt and verify its crawling directives against your canonical tags and response codes.

When you'd use this

A well-configured robots.txt lets search engines and AI crawlers reach the content you want indexed while keeping them out of private sections, so your site can be crawled, indexed, and trusted at scale.

Example scenario

Hypothetical scenario (not a real company)

A team might turn to robots.txt when pages stop appearing in search or AI results: checking crawling directives, canonical tags, and response codes can reveal a leftover Disallow rule blocking a crawler, which the team then removes to restore access.

Common mistakes

  • Confusing robots.txt with Canonical URL: robots.txt controls whether crawlers can access a page, while the rel=canonical tag tells search engines which URL to index when duplicate or similar content exists.
  • Confusing robots.txt with Crawl Budget: robots.txt sets which paths crawlers may visit, while crawl budget is the number of pages a crawler will visit within a given timeframe, influenced by site size, server capacity, and content freshness.

How to measure or implement

  • Review the live file's crawling directives for each User-agent group
  • Cross-check directives against canonical tags and response codes so they don't conflict


The robots.txt file determines which crawlers can access which parts of your website. With the rise of AI search, it has become even more important.

Basic robots.txt Structure

# Allow every crawler to access the whole site by default
User-agent: *
Allow: /

# Block OpenAI's GPTBot from the /private/ section
User-agent: GPTBot
Disallow: /private/

Common Directives

  • User-agent: - Specifies which crawler the rule applies to
  • Allow: - Explicitly permits crawling of specified paths
  • Disallow: - Blocks crawling of specified paths
  • Sitemap: - Points to your sitemap location
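
Putting all four directives together, a complete file might look like this (a sketch; the /private/ path and sitemap URL are illustrative placeholders):

# Keep all crawlers out of /private/, allow everything else
User-agent: *
Disallow: /private/
Allow: /

# Tell crawlers where to find the sitemap
Sitemap: https://yoursite.com/sitemap.xml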

AI Crawlers to Know

Crawler           | Company    | Purpose
GPTBot            | OpenAI     | ChatGPT training & search
OAI-SearchBot     | OpenAI     | ChatGPT web search
PerplexityBot     | Perplexity | Perplexity AI search
ClaudeBot         | Anthropic  | Claude AI
Google-Extended   | Google     | Gemini AI training
Applebot-Extended | Apple      | Apple Intelligence
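
Because the table distinguishes search crawlers from training crawlers, you can treat them differently. For example, a site that wants to appear in AI search but opt out of model training might use rules like these (a sketch based on the purposes above; adjust to your own policy):

# Allow AI search crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Opt out of AI training crawlers
User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /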

The AI Visibility Problem

Over 40% of websites accidentally block AI crawlers. If your robots.txt blocks these bots, your content cannot appear in AI search results at all.

Checking Your robots.txt

Visit yoursite.com/robots.txt to see your current configuration. Use tools like Rankwise's AI Visibility Checker to test whether AI crawlers can access your content.
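
You can also test programmatically with Python's standard-library urllib.robotparser (a minimal sketch; the domain and page path are placeholders):

from urllib.robotparser import RobotFileParser

# Load and parse the live robots.txt
rp = RobotFileParser()
rp.set_url("https://yoursite.com/robots.txt")
rp.read()

# Check whether specific AI crawlers may fetch a given page
for agent in ("GPTBot", "PerplexityBot", "ClaudeBot"):
    allowed = rp.can_fetch(agent, "https://yoursite.com/blog/example-post")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")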

Best Practice

Unless you have specific reasons to block AI crawlers, allow them access:

# Allow OpenAI's GPTBot site-wide
User-agent: GPTBot
Allow: /

# Allow Perplexity's crawler site-wide
User-agent: PerplexityBot
Allow: /
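
One detail worth knowing: under the Robots Exclusion Protocol, a crawler that finds a group naming its own user agent follows only that group and ignores the User-agent: * rules, so each bot-specific group must include every rule that bot needs.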

Why this matters

robots.txt determines which of your pages crawlers can see at all, so it sits upstream of indexing, ranking, and AI citation. When the file is handled consistently, it removes ambiguity for crawlers and prevents accidental blocks from degrading performance over time.

Common mistakes

  • Applying robots.txt inconsistently across templates
  • Ignoring how robots.txt interacts with canonical or index rules
  • Failing to validate robots.txt after releases
  • Over-optimizing robots.txt without checking intent
  • Leaving outdated robots.txt rules in production

How to check or improve robots.txt (quick checklist)

  1. Review your current robots.txt implementation on key templates.
  2. Validate robots.txt using Search Console and a crawl.
  3. Document standards for robots.txt to keep changes consistent.
  4. Monitor performance and update robots.txt as intent shifts.
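
To catch regressions after releases, one option is a small script that compares the live file against a version-controlled baseline (a sketch under the assumption that you keep an expected copy at robots.expected.txt; the URL and filename are illustrative):

import urllib.request

LIVE_URL = "https://yoursite.com/robots.txt"  # placeholder domain
EXPECTED_FILE = "robots.expected.txt"         # committed baseline copy

# Fetch the live robots.txt and confirm it is served successfully
with urllib.request.urlopen(LIVE_URL) as resp:
    assert resp.status == 200, f"robots.txt returned HTTP {resp.status}"
    live = resp.read().decode("utf-8").strip()

# Compare against the baseline and flag any drift
with open(EXPECTED_FILE, encoding="utf-8") as f:
    expected = f.read().strip()

if live == expected:
    print("robots.txt matches the committed baseline")
else:
    print("robots.txt has drifted from the baseline; review before crawlers pick it up")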

Examples

  • Example 1: A site standardizes robots.txt across templates and sees more stable indexing.
  • Example 2: A team audits robots.txt and resolves hidden conflicts between its directives and canonical rules.

FAQs

What is robots.txt?

robots.txt is a plain-text file in your site's root directory that tells search engine and AI crawlers which pages or sections they may access.

Why does robots.txt matter?

Because it controls crawl access: a misplaced Disallow rule can keep pages out of search results and AI answers entirely.

How do I improve robots.txt?

Work through the checklist above, then verify the changes across templates with Search Console and a crawl.

How often should I review robots.txt?

After major releases and at least quarterly for critical pages.

  • Guide: /resources/guides/robots-txt-for-ai-crawlers
  • Template: /templates/definitive-guide
  • Use case: /use-cases/saas-companies
  • Glossary:
    • /glossary/canonical-url
    • /glossary/crawl-budget
