Block AI Bots in WordPress: Safe robots.txt Templates & Guide
Copy-paste robots.txt templates that block unwanted AI crawlers while keeping your site open for AI citations.
TL;DR:
There are two types of AI bots: AI search/browse crawlers (index content for citations) and AI training crawlers (collect data to train models). They should be treated differently.
Blocking all AI bots indiscriminately will likely reduce your visibility in AI-generated answers.
This guide gives you safe, ready-to-use robots.txt templates and explains the tradeoffs.
What You Can Block vs. What You Should Block
Types of AI bots
AI search/browse crawlers — crawl the web to retrieve content for real-time AI assistant answers (ChatGPT with search, Perplexity, Bing Copilot). If you block these, AI assistants likely cannot cite your site.
AI training crawlers — crawl the web to collect data for training future AI models. Blocking these may limit future AI familiarity with your brand but does not directly affect current AI search citations.
General bots (scrapers, aggregators, shadow crawlers) — crawl content without a clear public use case. These are reasonable to block.
Decision matrix
| Bot type | Block? | Impact |
|---|---|---|
| AI search/browse crawlers (GPTBot, OAI-SearchBot, PerplexityBot) | No (unless specific reason) | Blocking = no AI citations |
| AI training crawlers (e.g., CCBot from Common Crawl) | Optional | No direct impact on current citations |
| Generic scrapers / shadow crawlers | Yes | Minimal legitimate use |
| Googlebot / Bingbot | Never block | Core search traffic |
Up-to-Date AI Bot User-Agent List
For the full, maintained list with descriptions of each bot’s purpose, see the AI Crawler User-Agent Reference.
Key bots relevant to robots.txt decisions:
Allow (recommended for AI visibility):
- GPTBot — OpenAI’s search and browse crawler
- OAI-SearchBot — OpenAI’s SearchGPT crawler
- PerplexityBot — Perplexity AI crawler
- Applebot-Extended — Apple AI features
- anthropic-ai / ClaudeBot — Anthropic crawler
Block (if you want to limit AI training only):
- CCBot — Common Crawl (used by many training datasets)
- Bytespider — ByteDance crawler (TikTok parent company)
- PetalBot — Huawei crawler
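As a rough illustration, the allow and block lists can be turned into robots.txt text programmatically. The sketch below is a minimal example, not an official tool: `ALLOW_BOTS`, `BLOCK_BOTS`, and `build_robots_txt` are hypothetical names, and the lists simply mirror user-agent tokens named in this guide.

```python
# Hypothetical policy lists mirroring user-agent tokens named in this guide.
ALLOW_BOTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "anthropic-ai"]
BLOCK_BOTS = ["CCBot", "Bytespider", "PetalBot"]


def build_robots_txt(allow, block, sitemap_url=None):
    """Return robots.txt text with one group per user-agent token."""
    # Default group: everything is allowed unless a more specific group says otherwise.
    lines = ["User-agent: *", "Disallow:", ""]
    for bot in allow:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    for bot in block:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    # Normalize trailing blank lines to a single final newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_robots_txt(ALLOW_BOTS, BLOCK_BOTS, "https://yoursite.com/sitemap.xml"))
```

Keeping the lists in one place makes the periodic review easier: update a list, regenerate the file, and paste the result into your robots.txt editor.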
Robots.txt Templates
Template A — Allow AI search, block training crawlers (recommended)
User-agent: *
Disallow:
# Allow AI search/browse crawlers explicitly
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
# Block AI training crawlers
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: PetalBot
Disallow: /
Template B — Block all AI bots (not recommended for most sites)
User-agent: GPTBot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
Warning: Template B will likely prevent AI assistants from citing your content in answers. Only use this if you have a specific legal or business reason to prevent AI access entirely.
Template C — Block all AI bots except specific pages
User-agent: GPTBot
Disallow: /
Allow: /wordpress-ai-analytics/
Allow: /aeo-geo-wordpress-checklist/
User-agent: PerplexityBot
Disallow: /
Allow: /wordpress-ai-analytics/
Allow: /aeo-geo-wordpress-checklist/
Use this if you want to allow AI crawlers only on specific high-value pages.
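Template C relies on rule precedence: under RFC 9309, the longest matching path wins, and Allow wins a tie, so the specific Allow lines override the broad `Disallow: /`. The sketch below is a simplified, hand-rolled evaluator that makes this precedence explicit. It is an illustration under stated assumptions, not a full parser: `parse_groups` and `is_allowed` are hypothetical names, wildcards (`*` and `$` inside paths) are not handled, and the user-agent is matched by exact token rather than by substring of the full header.

```python
def parse_groups(robots_txt):
    """Parse robots.txt text into {lowercase user-agent token: [(kind, path), ...]}."""
    groups = {}
    current_agents = []
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not last_was_agent:
                current_agents = []              # a new group starts here
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
    return groups


def is_allowed(robots_txt, user_agent, path):
    """Longest-match check: the longest matching rule wins; Allow wins ties."""
    groups = parse_groups(robots_txt)
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best_kind, best_len = "allow", -1            # no matching rule means allowed
    for kind, rule_path in rules:
        if rule_path and path.startswith(rule_path):
            longer = len(rule_path) > best_len
            tie_for_allow = len(rule_path) == best_len and kind == "allow"
            if longer or tie_for_allow:
                best_kind, best_len = kind, len(rule_path)
    return best_kind == "allow"
```

Passing the Template C text with `"GPTBot"` and `"/wordpress-ai-analytics/"` should report the path as allowed, while any other path for the same bot is blocked. Real crawlers may interpret edge cases differently, so treat this as a sanity check rather than a guarantee.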
How to Edit robots.txt in WordPress
Method 1 — Via Yoast SEO or Rank Math
- Go to SEO → Tools → File editor (Yoast) or General Settings → Edit robots.txt (Rank Math).
- Paste the template you selected above.
- Save. Changes take effect immediately (no server restart needed).
Method 2 — Via FTP or file manager
- Connect via FTP or open your hosting panel’s file manager.
- Navigate to the root of your WordPress installation (same folder as wp-config.php).
- Open or create robots.txt.
- Paste your chosen template. Save.
Method 3 — Via wp-admin custom code
Add to functions.php or a custom plugin:
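add_filter('robots_txt', function ($output, $public) {
    // Runs only for WordPress's virtual robots.txt; a physical robots.txt file in the site root bypasses this filter.
    // Example rule (adjust to your chosen template): block the CCBot training crawler.
    $output .= "\nUser-agent: CCBot\nDisallow: /\n";
    return $output;
}, 10, 2);
For most WordPress sites, the Yoast/Rank Math file editor is the cleanest option.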
Beyond robots.txt: Additional Controls
HTTP headers
You can send X-Robots-Tag: noindex headers for specific file types (such as PDFs and spreadsheets) without modifying robots.txt:
# Apache .htaccess
<FilesMatch "\.(pdf|xlsx)$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
Per-page meta tags
For pages you want indexed by Google but not used in AI answers, try:
<meta name="robots" content="noai, noimageai">
Note: noai and noimageai are not universally honored by all AI crawlers yet. Check the AI Crawler User-Agent Reference for current compliance status.
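Themes and SEO plugins can overwrite the robots meta tag, so it may be worth confirming which directives a rendered page actually carries. The following is a minimal sketch using Python's standard `html.parser`. `RobotsMetaParser` and `robots_directives` are hypothetical names, and the function only reports what the HTML contains, not whether any crawler honors it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noai, noimageai"
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

Feed it the saved source of a page (for example, from "View source" in a browser) and check that `noai` and `noimageai` appear in the returned set.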
Implementation Checklist
- Decide which template (A, B, or C) fits your goals.
- Open your WordPress robots.txt editor (Yoast, Rank Math, or FTP).
- Paste the chosen template.
- Verify the file is live at yoursite.com/robots.txt.
- Check your sitemap.xml line is present: Sitemap: https://yoursite.com/sitemap.xml
- Monitor AI crawler activity in PulseRank to confirm allowed bots are visiting.
- Review every 3–6 months as new AI crawlers emerge.
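The verification steps in this checklist can be partly scripted. The sketch below downloads the live robots.txt and lists any declared sitemaps. It assumes the file is publicly reachable over HTTPS. `fetch_robots_txt` and `sitemap_urls` are hypothetical helper names, and `https://yoursite.com` is the placeholder domain used throughout this guide.

```python
import urllib.request


def fetch_robots_txt(base_url, timeout=10):
    """Download /robots.txt from base_url (e.g. "https://yoursite.com")."""
    url = base_url.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def sitemap_urls(robots_txt):
    """Return every URL declared on a 'Sitemap:' line."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls
```

A typical check would call `sitemap_urls(fetch_robots_txt("https://yoursite.com"))` and confirm the list is not empty. An empty list suggests the Sitemap line is missing or the wrong file is being served.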
Common Mistakes
- Using Disallow: / for all user-agents — this blocks everything, including Google. Always specify the User-agent before Disallow.
- Copying an old robots.txt template — the list of AI crawler user-agents changes frequently. Check the current list before implementing.
- Blocking bots and expecting citations — if GPTBot is disallowed, OpenAI’s search product cannot index your pages and is unlikely to cite them.
- Not adding your sitemap — Sitemap: https://yoursite.com/sitemap.xml should be present in every robots.txt.
- Assuming robots.txt is enforced — robots.txt is advisory. Malicious scrapers ignore it. It is not a security measure.
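The first mistake in this list is easy to catch before publishing. Here is a small lint sketch that flags a site-wide block, meaning a `User-agent: *` group that contains a bare `Disallow: /`. `blocks_everyone` is a hypothetical helper name, and the check deliberately ignores wildcard paths.

```python
def blocks_everyone(robots_txt):
    """Return True if the 'User-agent: *' group contains a bare 'Disallow: /'."""
    in_star_group = False
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not last_was_agent:
                in_star_group = False            # a new group starts
            # Consecutive User-agent lines share one group.
            in_star_group = in_star_group or value == "*"
            last_was_agent = True
        elif field in ("allow", "disallow"):
            if in_star_group and field == "disallow" and value == "/":
                return True
            last_was_agent = False
    return False
```

Running this on a draft before pasting it into the editor gives an early warning that the file would also shut out Googlebot and Bingbot.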
Risks and Tradeoffs
| Action | Risk | Benefit |
|---|---|---|
| Allow all AI search crawlers | Scrapers may mimic allowed crawlers | Maximum AI citation potential |
| Block all AI bots | Reduced or no AI citations | Full content control |
| Allow selectively by page | Higher maintenance | Control + some visibility |
| Block training crawlers only | Marginal direct impact | Limits data training use |
From Our Testing: Crawl Behavior After robots.txt Changes
After making robots.txt changes on live WordPress sites and monitoring via server logs and PulseRank:
- GPTBot responded to explicit Allow: / directives within 2–7 days. After adding the directive on three test sites, crawl frequency measurably increased within the same week each time.
- Blocking CCBot had no measurable impact on AI referral traffic from ChatGPT or Perplexity during the 90 days following the change on any site we tested.
- PerplexityBot showed the fastest robots.txt compliance of all crawlers tested — changes were observed within 48 hours consistently.
- Some crawlers in our logs did not honor robots.txt. User-agents without official documentation should be treated as scrapers, not compliant partners.
- Sites that blocked GPTBot stopped appearing in new ChatGPT answers within approximately 4–6 weeks, confirmed by manual prompt testing before and after.
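Observations like these come from server logs, and a similar count is straightforward to reproduce. The sketch below tallies requests per AI crawler token, assuming the common "combined" access log format in which the user-agent is the last double-quoted field. `AI_BOT_TOKENS` and `count_ai_bot_hits` are hypothetical names, and the token list is only the set of bots named in this guide.

```python
import re
from collections import Counter

# Hypothetical watch list: tokens named in this guide; extend as needed.
AI_BOT_TOKENS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot",
                 "CCBot", "Bytespider", "PetalBot"]

# In the combined log format the user-agent is the last double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_ai_bot_hits(log_lines):
    """Count requests per AI bot token found in the user-agent field."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break                            # count each request once
    return hits
```

Since an open file yields lines, `count_ai_bot_hits(open("access.log"))` works directly, where `access.log` is a placeholder path. User-agent strings can be spoofed, so these counts show what a client claims to be, not verified identity.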
OpenAI publicly documents GPTBot’s robots.txt compliance in their official GPTBot documentation. The robots.txt protocol itself is formalized in RFC 9309, which defines how crawlers should interpret access rules.
See which AI crawlers are already visiting your site—and turn that data into visibility and AI optimization with PulseRank, your WordPress AI Analytics Plugin.
Frequently Asked Questions
Blocking GPTBot significantly reduces the likelihood that ChatGPT cites your site. If GPTBot cannot access your pages, OpenAI’s retrieval systems cannot include your current content in answers.
Your site may still appear based on data indexed before the block, but you lose ongoing visibility.
GPTBot is OpenAI’s general-purpose crawler used for training and browsing. OAI-SearchBot is specifically used for SearchGPT (OpenAI’s search product). For AI search visibility, allow both. See the full AI crawler user-agent reference.
You can mostly block AI training crawlers while staying visible in AI search. Training crawlers (like CCBot) are separate from search/browse crawlers (like GPTBot). Blocking CCBot does not directly prevent GPTBot from crawling you for search citations. However, this distinction may become less clear as AI systems evolve.
To see which AI bots are visiting your site, check your WordPress AI Analytics dashboard in PulseRank, or look in your server access logs for user-agent strings matching the known AI crawler list.
PulseRank includes AI crawler detection in its analytics layer. For access control decisions, robots.txt is currently the most reliable and universally supported method.
Review your robots.txt every quarter, or whenever a major new AI system launches. New crawlers (with new user-agents) emerge regularly. See the AI Crawler User-Agent Reference for updates.
