
Robots.txt Test

A robots.txt test validates that a website's robots.txt file correctly allows or blocks specific web crawlers from accessing pages, ensuring search engines and AI bots can crawl the content you want indexed.

Quick Answer

  • What it is: A check that your robots.txt file correctly allows or blocks specific crawlers, so search engines and AI bots can reach the content you want indexed.
  • Why it matters: A misconfigured robots.txt can block Google from crawling your entire site or prevent AI crawlers from citing your content. Testing catches these mistakes before they cost traffic.
  • How to check or improve: Review your file in Google Search Console's robots.txt report, paste it into a validator, and test specific URLs against specific user agents to confirm the rules work as intended.

When you'd use this

Run a robots.txt test after any change to the file, after a site migration, or whenever crawling or indexing behavior looks wrong. A misconfigured robots.txt can block Google from crawling your entire site or prevent AI crawlers from citing your content; testing catches these mistakes before they cost traffic.

Example scenario

Hypothetical scenario (not a real company)

A team might run a robots.txt test after a site migration: before the new file goes live, they paste it into a validator and check important URLs against specific user agents, such as Googlebot and GPTBot, to confirm the rules work as intended.

Common mistakes

  • Confusing a robots.txt test with robots.txt itself: robots.txt is the text file in a site's root directory that tells crawlers which pages they may or may not access; the test is the process of validating that file.
  • Confusing a robots.txt test with crawlability: crawlability is the broader question of whether search engines can discover and fetch your pages at all; robots.txt rules are only one factor in it.
  • Confusing a robots.txt test with indexability: indexability is whether a page can be added to a search engine's index, which also depends on robots meta directives, canonical tags, and other technical factors.

How to measure or implement

  • Review your file in Google Search Console's robots.txt report, paste it into a validator, and test specific URLs against specific user agents to confirm the rules work as intended

Updated Apr 8, 2026 · 4 min read

What Is a Robots.txt Test?

A robots.txt test checks whether your website's robots.txt file correctly instructs web crawlers on which pages they can and cannot access. It validates the file's syntax, confirms that important pages aren't accidentally blocked, and verifies that the rules apply correctly to specific crawlers like Googlebot, GPTBot, and PerplexityBot.

This matters because robots.txt errors are invisible. Your site looks fine to visitors, but search engines and AI tools may be blocked from crawling your content — silently destroying your search visibility.

How to Test Your Robots.txt

Method 1: Google Search Console robots.txt Report

The most reliable method for Google-specific validation. (Google retired the legacy robots.txt Tester in late 2023; the robots.txt report replaced it.)

  1. Open Google Search Console for your property
  2. Go to Settings and open the robots.txt report
  3. Review the fetched file, its fetch status, and any parsing problems flagged
  4. To test a specific URL, run it through the URL Inspection tool
  5. The inspection result shows whether the page is blocked by robots.txt

Method 2: Direct Browser Check

The simplest check — visit your robots.txt file directly:

  1. Open https://yoursite.com/robots.txt in a browser
  2. Verify the file loads (not a 404 or redirect)
  3. Read the rules — are important sections like /blog/ or /products/ allowed?
  4. Check for common mistakes: accidental Disallow: / blocking everything

Method 3: Command Line Check

For developers who prefer terminal:

curl -s https://yoursite.com/robots.txt

This shows the raw file content. Pipe to a validator or review manually.
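For repeatable checks, the same per-URL, per-agent tests can be scripted with Python's standard-library urllib.robotparser. A minimal sketch, using an illustrative file body and example URLs:

```python
from urllib import robotparser

# Sample file body; in practice, point the parser at your live file with
# rp.set_url("https://yoursite.com/robots.txt") followed by rp.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Test specific URL / user-agent pairs, mirroring what a tester UI does
for agent, url in [
    ("Googlebot", "https://example.com/blog/post"),
    ("Googlebot", "https://example.com/admin/login"),
    ("GPTBot", "https://example.com/blog/post"),
]:
    verdict = "Allowed" if rp.can_fetch(agent, url) else "Blocked"
    print(f"{agent:>10} {url} -> {verdict}")
```

Note that urllib.robotparser implements simple prefix matching and does not support Google's * and $ wildcard extensions, so treat it as a first-pass check rather than an exact replica of Googlebot's behavior.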

What to Check in a Robots.txt Test

Critical Checks

  • File exists: yoursite.com/robots.txt returns 200, not 404
  • Not blocking everything: no Disallow: / under User-agent: * (unless intentional)
  • Googlebot allowed: Googlebot can access all important sections
  • Sitemap declared: a Sitemap: directive points to your XML sitemap
  • No syntax errors: correct User-agent:, Allow:, and Disallow: formatting
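Several of these checks can be scripted directly against the file text. A minimal sketch, with deliberately simplified parsing (one directive per line, no group merging) and with the "returns 200" check omitted because it requires a live fetch:

```python
def audit_robots(text: str) -> dict:
    """Run basic static checks on a robots.txt body."""
    findings = {"has_sitemap": False, "blocks_everything": False, "errors": []}
    in_star_group = False  # inside a group that applies to all agents
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            findings["errors"].append(f"line {lineno}: missing ':'")
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            in_star_group = value == "*"
        elif field == "sitemap":
            findings["has_sitemap"] = True
        elif field == "disallow":
            if in_star_group and value == "/":
                findings["blocks_everything"] = True
        elif field not in ("allow", "crawl-delay"):
            findings["errors"].append(f"line {lineno}: unknown field {field!r}")
    return findings

# Example: a file with a typo'd directive and a blanket Disallow
report = audit_robots(
    "User-agent: *\nDisalow: /tmp/\nDisallow: /\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)
print(report)
```

The audit flags the misspelled "Disalow" line as an unknown field and reports that the file blocks everything for all agents, which is exactly the kind of silent failure a test is meant to catch.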

AI Crawler Checks

With AI search growing, verify these crawlers too:

  • GPTBot (OpenAI training and ChatGPT): allow for AI visibility
  • ChatGPT-User (ChatGPT real-time browsing): allow for citations
  • PerplexityBot (Perplexity search): allow for citations
  • Google-Extended (Gemini training): allow for AI visibility
  • ClaudeBot (Anthropic training): your choice
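Applied to a live file, an AI-friendly configuration might look like the sketch below. The sitemap URL is a placeholder, and each Allow is a policy choice, not a requirement: crawlers with no matching rules are allowed by default, so these lines mainly document intent.

```txt
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```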

See our full guide on robots.txt for AI crawlers for detailed configuration options.

Common Robots.txt Mistakes

Accidentally Blocking Your Entire Site

# WRONG — blocks all crawlers from everything
User-agent: *
Disallow: /

This removes your site from Google entirely. One of the most common and devastating mistakes.

Typos in Crawler Names

# WRONG — "GPT-Bot" doesn't match OpenAI's crawler
User-agent: GPT-Bot
Disallow: /

The correct name is GPTBot (no hyphen). Typos mean the rule is ignored.
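A suspected typo is easy to verify programmatically. In this sketch, Python's standard-library urllib.robotparser confirms that the misspelled rule never applies to the real crawler name:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("User-agent: GPT-Bot\nDisallow: /".splitlines())

# "GPT-Bot" never matches OpenAI's actual user agent, "GPTBot", so the
# Disallow rule is silently ignored and the crawler remains allowed.
print(rp.can_fetch("GPTBot", "https://example.com/page"))  # True
```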

Blocking CSS and JavaScript

# WRONG — prevents Google from rendering your pages
Disallow: /css/
Disallow: /js/

Google needs CSS and JS to render pages correctly. Blocking them causes indexing issues.

Missing Trailing Slash

# These are different
Disallow: /private    # blocks /private but also /privately, /privates
Disallow: /private/   # blocks only the /private/ directory

Be specific with your paths to avoid blocking unintended URLs.
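The difference is easy to confirm with Python's standard-library urllib.robotparser, which uses simple path-prefix matching:

```python
from urllib import robotparser

# "/private" is a bare prefix, so it matches more URLs than "/private/"
broad = robotparser.RobotFileParser()
broad.parse("User-agent: *\nDisallow: /private".splitlines())

narrow = robotparser.RobotFileParser()
narrow.parse("User-agent: *\nDisallow: /private/".splitlines())

print(broad.can_fetch("*", "https://example.com/privately"))        # False
print(narrow.can_fetch("*", "https://example.com/privately"))       # True
print(narrow.can_fetch("*", "https://example.com/private/report"))  # False
```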

FAQs

How often should I test my robots.txt?

Test after any change to the file, after site migrations, and at least quarterly as a routine check. Also test whenever you notice unexpected drops in search traffic or indexing issues in Search Console.

What happens if my robots.txt has errors?

Depends on the error. If it blocks Googlebot from important pages, those pages won't be indexed or ranked. If it blocks AI crawlers, your content won't be cited in AI search results. If it has syntax errors, crawlers may ignore the entire file or interpret rules incorrectly.

Does robots.txt affect page ranking?

Indirectly. Robots.txt controls crawling, not ranking. But if a page is blocked from crawling, it can't be indexed, and if it's not indexed, it can't rank. So a robots.txt blocking error effectively removes pages from search results.

Can robots.txt block AI from using my content?

Robots.txt can block AI crawlers from accessing your content going forward. However, AI models may already have your content from previous crawls or from other sources like Common Crawl. Robots.txt is a request, not enforcement — well-behaved crawlers respect it, but it's not a guarantee.
