What Is a Robots.txt Test?
A robots.txt test checks whether your website's robots.txt file correctly instructs web crawlers on which pages they can and cannot access. It validates the file's syntax, confirms that important pages aren't accidentally blocked, and verifies that the rules apply correctly to specific crawlers like Googlebot, GPTBot, and PerplexityBot.
This matters because robots.txt errors are invisible. Your site looks fine to visitors, but search engines and AI tools may be blocked from crawling your content — silently destroying your search visibility.
How to Test Your Robots.txt
Method 1: Google Search Console Robots.txt Tester
The most reliable method for Google-specific validation:
- Open Google Search Console for your property
- Go to the robots.txt Tester under Legacy tools and reports (newer versions of Search Console replace this with the robots.txt report under Settings)
- Your current robots.txt loads automatically
- Enter a URL in the test field and select a user agent
- Click Test — it reports "Allowed" or "Blocked"
Method 2: Direct Browser Check
The simplest check — visit your robots.txt file directly:
- Open `https://yoursite.com/robots.txt` in a browser
- Verify the file loads (not a 404 or redirect)
- Read the rules — are important sections like `/blog/` or `/products/` allowed?
- Check for common mistakes: an accidental `Disallow: /` blocking everything
Method 3: Command Line Check
For developers who prefer terminal:
```shell
curl -s https://yoursite.com/robots.txt
```
This shows the raw file content. Pipe to a validator or review manually.
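If you'd rather script the review, Python's standard-library `urllib.robotparser` applies the same allow/disallow logic a well-behaved crawler would. This is a minimal sketch: the rules and `yoursite.com` URLs below are placeholder examples, and in practice you would load your live file with `rp.set_url(...)` and `rp.read()` instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules — in practice, fetch your live file with
# rp.set_url("https://yoursite.com/robots.txt") and rp.read()
rules = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Check specific URL/crawler combinations
print(rp.can_fetch("Googlebot", "https://yoursite.com/blog/post"))  # True
print(rp.can_fetch("Googlebot", "https://yoursite.com/admin/"))     # False
print(rp.can_fetch("GPTBot", "https://yoursite.com/blog/post"))     # False
```

Because this runs the same checks for any user agent string, it's an easy way to batch-test a list of important URLs against every crawler you care about.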
What to Check in a Robots.txt Test
Critical Checks
| Check | What to Verify |
|---|---|
| File exists | yoursite.com/robots.txt returns 200, not 404 |
| Not blocking everything | No Disallow: / under User-agent: * (unless intentional) |
| Googlebot allowed | Googlebot can access all important sections |
| Sitemap declared | Sitemap: directive points to your XML sitemap |
| No syntax errors | Correct User-agent:, Allow:, Disallow: format |
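Taken together, a file that passes all of these checks might look like the following sketch. The paths and domain are placeholders — substitute your own:

```
User-agent: *
Disallow: /admin/
Disallow: /cart/

Sitemap: https://yoursite.com/sitemap.xml
```

Everything not explicitly disallowed is crawlable by default, so a short file like this is usually all a healthy site needs.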
AI Crawler Checks
With AI search growing, verify these crawlers too:
| Crawler | Purpose | Should You Allow? |
|---|---|---|
| GPTBot | OpenAI training and ChatGPT | Allow for AI visibility |
| ChatGPT-User | ChatGPT real-time browsing | Allow for citations |
| PerplexityBot | Perplexity search | Allow for citations |
| Google-Extended | Gemini training | Allow for AI visibility |
| ClaudeBot | Anthropic training | Your choice |
See our full guide on robots.txt for AI crawlers for detailed configuration options.
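As a starting point, the per-crawler decisions in the table above translate into user-agent blocks like these (shown allowing everything; swap in `Disallow` rules for any crawler you'd rather keep out):

```
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```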
Common Robots.txt Mistakes
Accidentally Blocking Your Entire Site
```
# WRONG — blocks all crawlers from everything
User-agent: *
Disallow: /
```
This blocks all crawling and, over time, removes your site from Google's index entirely. It's one of the most common and devastating mistakes.
Typos in Crawler Names
```
# WRONG — "GPT-Bot" doesn't match OpenAI's crawler
User-agent: GPT-Bot
Disallow: /
```
The correct name is GPTBot (no hyphen). Typos mean the rule is ignored.
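You can see the effect of a name mismatch with `urllib.robotparser`: the misspelled rule never matches the real crawler, so the default (allow everything) applies. The URL here is a placeholder.

```python
from urllib.robotparser import RobotFileParser

def blocked(rules, agent, url="https://yoursite.com/"):
    """Return True if `agent` is disallowed from `url` by `rules`."""
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return not rp.can_fetch(agent, url)

# Typo: "GPT-Bot" doesn't match the real crawler name, so GPTBot is NOT blocked
print(blocked("User-agent: GPT-Bot\nDisallow: /", "GPTBot"))  # False

# Correct name: the rule matches and GPTBot is blocked
print(blocked("User-agent: GPTBot\nDisallow: /", "GPTBot"))   # True
```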
Blocking CSS and JavaScript
```
# WRONG — prevents Google from rendering your pages
Disallow: /css/
Disallow: /js/
```
Google needs CSS and JS to render pages correctly. Blocking them causes indexing issues.
Missing Trailing Slash
```
# These are different
Disallow: /private    # blocks /private but also /privately, /privates
Disallow: /private/   # blocks only the /private/ directory
```
Be specific with your paths to avoid blocking unintended URLs.
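The difference is easy to verify with `urllib.robotparser`, which applies the same prefix matching that crawlers use (the domain below is a placeholder):

```python
from urllib.robotparser import RobotFileParser

def blocked(rules, path):
    """Return True if the default user agent is disallowed from `path`."""
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return not rp.can_fetch("*", "https://yoursite.com" + path)

# Without the trailing slash, the rule matches any path sharing the prefix
print(blocked("User-agent: *\nDisallow: /private", "/privately"))   # True

# With the trailing slash, only paths inside the directory are matched
print(blocked("User-agent: *\nDisallow: /private/", "/privately"))  # False
print(blocked("User-agent: *\nDisallow: /private/", "/private/x"))  # True
```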
FAQs
How often should I test my robots.txt?
Test after any change to the file, after site migrations, and at least quarterly as a routine check. Also test whenever you notice unexpected drops in search traffic or indexing issues in Search Console.
What happens if my robots.txt has errors?
Depends on the error. If it blocks Googlebot from important pages, those pages won't be indexed or ranked. If it blocks AI crawlers, your content won't be cited in AI search results. If it has syntax errors, crawlers may ignore the entire file or interpret rules incorrectly.
Does robots.txt affect page ranking?
Indirectly. Robots.txt controls crawling, not ranking. But if a page is blocked from crawling, it can't be indexed, and if it's not indexed, it can't rank. So a robots.txt blocking error effectively removes pages from search results.
Can robots.txt block AI from using my content?
Robots.txt can block AI crawlers from accessing your content going forward. However, AI models may already have your content from previous crawls or from other sources like Common Crawl. Robots.txt is a request, not enforcement — well-behaved crawlers respect it, but it's not a guarantee.