Block AI Bots in WordPress: Safe robots.txt Templates & Guide

Copy-paste robots.txt templates that block unwanted AI crawlers while keeping your site open to AI citations.

TL;DR:

There are two types of AI bots: AI search/browse crawlers (index content for citations) and AI training crawlers (collect data to train models). They should be treated differently.

Blocking all AI bots indiscriminately will likely reduce your visibility in AI-generated answers.

This guide gives you safe, ready-to-use robots.txt templates and explains the tradeoffs.

What You Can Block vs. What You Should Block

Types of AI bots

AI search/browse crawlers — crawl the web to retrieve content for real-time AI assistant answers (ChatGPT with search, Perplexity, Bing Copilot). If you block these, AI assistants likely cannot cite your site.

AI training crawlers — crawl the web to collect data for training future AI models. Blocking these may limit future AI familiarity with your brand but does not directly affect current AI search citations.

General bots (scrapers, aggregators, shadow crawlers) — crawl content without a clear public use case. These are reasonable to block.

Decision matrix

Bot type	Block?	Impact
AI search/browse crawlers (GPTBot, OAI-SearchBot, PerplexityBot)	No (unless specific reason)	Blocking likely means no AI citations
AI training crawlers (CCBot from Common Crawl)	Optional	No direct impact on current citations
Generic scrapers / shadow crawlers	Yes	Minimal legitimate use
Googlebot / Bingbot	Never block	Core search traffic

Up-to-Date AI Bot User-Agent List

For the full, maintained list with descriptions of each bot’s purpose, see the AI Crawler User-Agent Reference.

Key bots relevant to robots.txt decisions:

Allow (recommended for AI visibility):

GPTBot — OpenAI’s general-purpose crawler (training and browsing)
OAI-SearchBot — OpenAI’s crawler for its search product (SearchGPT)
PerplexityBot — Perplexity AI crawler
Applebot-Extended — Apple’s token for controlling AI use of content crawled by Applebot
anthropic-ai / ClaudeBot — Anthropic crawlers

Block (if you want to limit AI training only):

CCBot — Common Crawl (used by many training datasets)
Bytespider — ByteDance crawler (TikTok parent company)
PetalBot — Huawei crawler
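The two lists above map directly onto robots.txt groups. As a minimal sketch, a short Python script could generate the file from those lists so they are maintained in one place. The function name build_robots_txt and the sitemap URL are illustrative assumptions, not part of WordPress or any plugin.

```python
# Hypothetical helper: builds robots.txt text from an allow list and a block list.
ALLOW_BOTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot",
              "Applebot-Extended", "anthropic-ai", "ClaudeBot"]
BLOCK_BOTS = ["CCBot", "Bytespider", "PetalBot"]


def build_robots_txt(allow, block, sitemap_url=None):
    """Return robots.txt text with one group per user-agent."""
    # Default group: every other crawler may fetch everything.
    lines = ["User-agent: *", "Disallow:", ""]
    for bot in allow:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    for bot in block:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_robots_txt(ALLOW_BOTS, BLOCK_BOTS,
                           "https://yoursite.com/sitemap.xml"))
```

The output can be pasted into any of the editing methods described later in this guide.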

Robots.txt Templates

Template A — Allow AI search, block training crawlers (recommended)

User-agent: *
Disallow:

# Allow AI search/browse crawlers explicitly
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

# Block AI training crawlers
User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: PetalBot
Disallow: /
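Before deploying a template, it can help to check how a parser reads it. The following is a minimal sketch using Python's standard urllib.robotparser on a shortened excerpt of Template A. The helper name crawl_permissions and the sample URL are illustrative assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of Template A, enough to exercise each kind of group.
TEMPLATE_A = """\
User-agent: *
Disallow:

User-agent: GPTBot
Allow: /

User-agent: CCBot
Disallow: /
"""


def crawl_permissions(robots_txt, agents, url="https://yoursite.com/sample-post/"):
    """Map each user-agent to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    result = crawl_permissions(TEMPLATE_A, ["GPTBot", "CCBot", "Googlebot"])
    for agent, allowed in result.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Googlebot has no group of its own here, so it falls through to the User-agent: * group and stays allowed.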

Template B — Block all AI bots (not recommended for most sites)

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

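User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: PetalBot
Disallow: /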

Warning: Template B will likely prevent AI assistants from citing your content in answers. Only use this if you have a specific legal or business reason to prevent AI access entirely.

Template C — Block all AI bots except specific pages

User-agent: GPTBot
Disallow: /
Allow: /wordpress-ai-analytics/
Allow: /aeo-geo-wordpress-checklist/

User-agent: PerplexityBot
Disallow: /
Allow: /wordpress-ai-analytics/
Allow: /aeo-geo-wordpress-checklist/

Use this if you want to allow AI crawlers only on specific high-value pages.
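Template C relies on the precedence rule in RFC 9309: when several rules match a URL, the most specific (longest) one wins, and Allow wins a tie. That is why the Allow lines still take effect after Disallow: /. The sketch below illustrates that rule for a single user-agent group. The function name is_allowed is hypothetical, and wildcard patterns (* and $) are deliberately left out to keep it short.

```python
def is_allowed(rules, path):
    """Apply longest-match precedence to one user-agent group.

    rules: list of (directive, path_prefix) tuples, where directive is
    "allow" or "disallow". Wildcards (* and $) are not handled here.
    """
    best_length = -1
    allowed = True  # a path that no rule matches is allowed by default
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_length
        tie_for_allow = len(prefix) == best_length and directive == "allow"
        if longer or tie_for_allow:
            best_length = len(prefix)
            allowed = directive == "allow"
    return allowed


# The GPTBot group from Template C, in file order.
TEMPLATE_C_GPTBOT = [
    ("disallow", "/"),
    ("allow", "/wordpress-ai-analytics/"),
    ("allow", "/aeo-geo-wordpress-checklist/"),
]

if __name__ == "__main__":
    for path in ["/wordpress-ai-analytics/", "/blog/some-post/"]:
        print(path, is_allowed(TEMPLATE_C_GPTBOT, path))
```

Some simpler parsers apply rules in file order instead of by length, so it is worth testing Template C against the crawlers you care about.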

How to Edit robots.txt in WordPress

Method 1 — Via Yoast SEO or Rank Math

Go to SEO → Tools → File editor (Yoast) or General Settings → Edit robots.txt (Rank Math).
Paste the template you selected above.
Save. Changes take effect immediately (no server restart needed).

Method 2 — Via FTP or file manager

Connect via FTP or open your hosting panel’s file manager.
Navigate to the root of your WordPress installation (same folder as wp-config.php).
Open or create robots.txt.
Paste your chosen template. Save.

Method 3 — Via wp-admin custom code

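If no physical robots.txt file exists in the site root, WordPress generates a virtual one, which you can extend with the robots_txt filter. Add to functions.php or a custom plugin:

add_filter('robots_txt', function ($output, $public) {
    // Runs only for the virtual robots.txt. A physical file in the
    // site root (Method 2) is served directly and bypasses this filter.
    $output .= "\nUser-agent: CCBot\nDisallow: /\n";
    return $output;
}, 10, 2);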

For most WordPress sites, the Yoast/Rank Math file editor is the cleanest option.

Beyond robots.txt: Additional Controls

HTTP headers

You can send X-Robots-Tag: noindex headers for specific file types (PDFs, spreadsheets) without modifying robots.txt. On Apache, this requires mod_headers to be enabled:

# Apache .htaccess
<FilesMatch "\.(pdf|xlsx)$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
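To confirm the header is actually being sent, one option is a HEAD request against a matching file. This is a minimal sketch using only the Python standard library. Both function names are illustrative, and the parser ignores the optional per-crawler prefix form of the header (for example "googlebot: noindex").

```python
import urllib.request


def parse_x_robots_tag(header_value):
    """Split a value such as "noindex, nofollow" into a set of directives."""
    if not header_value:
        return set()
    return {part.strip().lower()
            for part in header_value.split(",") if part.strip()}


def fetch_x_robots_tag(url):
    """Send a HEAD request and return the parsed X-Robots-Tag directives."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_x_robots_tag(response.headers.get("X-Robots-Tag"))
```

Calling fetch_x_robots_tag with the URL of a PDF on your own site should return a set containing noindex and nofollow if the .htaccess rule is active.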

Per-page meta tags

For pages you want indexed by Google but not used in AI answers, try:

<meta name="robots" content="noai, noimageai">

Note: noai and noimageai are non-standard directives, and many AI crawlers do not honor them yet. Check the AI Crawler User-Agent Reference for current compliance status.
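To audit which pages already carry these directives, the robots meta tag can be read from a page's HTML. The sketch below uses Python's built-in html.parser. The class and function names are illustrative assumptions, and it only reports what the page declares, not whether any crawler respects it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_meta_directives(html):
    """Return the set of robots meta directives declared in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```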

Implementation Checklist

Decide which template (A, B, or C) fits your goals.
Open your WordPress robots.txt editor (Yoast, Rank Math, or FTP).
Paste the chosen template.
Verify the file is live at yoursite.com/robots.txt.
Check your sitemap.xml line is present: Sitemap: https://yoursite.com/sitemap.xml
Monitor AI crawler activity in PulseRank to confirm allowed bots are visiting.
Review every 3–6 months as new AI crawlers emerge.
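Outside of an analytics plugin, the monitoring step can also be approximated from server access logs. The sketch below counts log lines that mention known AI crawler tokens. The token list, the function name, and the log path in the usage note are assumptions. User-agent strings can be spoofed, so treat the counts as a rough signal only.

```python
from collections import Counter

# Tokens taken from the user-agent list in this guide; extend as needed.
AI_BOT_TOKENS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot",
                 "CCBot", "Bytespider", "PetalBot"]


def count_ai_bot_hits(log_lines, tokens=AI_BOT_TOKENS):
    """Count log lines that contain each crawler token (case-insensitive)."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in tokens:
            if token.lower() in lowered:
                counts[token] += 1
    return counts
```

A typical use would be to open the access log (for example /var/log/apache2/access.log, a path that varies by host) and pass the file object to count_ai_bot_hits.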

Common Mistakes

Using Disallow: / under User-agent: * — this blocks every compliant crawler, including Googlebot. Put each Disallow under the specific User-agent it should apply to.
Copying an old robots.txt template — the list of AI crawler user-agents changes frequently. Check the current list before implementing.
Blocking bots and expecting citations — if GPTBot and OAI-SearchBot are disallowed, OpenAI’s search product cannot crawl your pages and is unlikely to cite them.
Not adding your sitemap — Sitemap: https://yoursite.com/sitemap.xml should be present in every robots.txt.
Assuming robots.txt is enforced — robots.txt is advisory. Malicious scrapers ignore it. It is not a security measure.
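Two of the mistakes above, a blanket Disallow: / and a missing Sitemap line, are easy to detect automatically. The following is a rough sketch of such a check. The function name lint_robots_txt is hypothetical and the parsing is simplified, so it is not a full RFC 9309 implementation.

```python
def lint_robots_txt(text):
    """Return a list of warnings for two common robots.txt mistakes."""
    warnings = []
    groups, agents, rules = [], [], []
    has_sitemap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "sitemap":
            has_sitemap = True
        elif field == "user-agent":
            if rules:  # a rule line already closed the previous group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow"):
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    for group_agents, group_rules in groups:
        if "*" in group_agents and ("disallow", "/") in group_rules:
            warnings.append(
                "Disallow: / under User-agent: * blocks every crawler, "
                "including Googlebot"
            )
    if not has_sitemap:
        warnings.append("No Sitemap: line found")
    return warnings
```

An empty list means neither problem was found. It does not mean the file is otherwise correct.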

Risks and Tradeoffs

Action	Risk	Benefit
Allow all AI search crawlers	Scrapers may mimic allowed crawlers	Maximum AI citation potential
Block all AI bots	Reduced or no AI citations	Full content control
Allow selectively by page	Higher maintenance	Control + some visibility
Block training crawlers only	Marginal direct impact	Limits data training use

From Our Testing: Crawl Behavior After robots.txt Changes

After making robots.txt changes on live WordPress sites and monitoring via server logs and PulseRank:

GPTBot responded to explicit Allow: / directives within 2–7 days. After adding the directive on three test sites, crawl frequency measurably increased within the same week each time.
Blocking CCBot had no measurable impact on AI referral traffic from ChatGPT or Perplexity during the 90 days following the change on any site we tested.
PerplexityBot showed the fastest robots.txt compliance of all crawlers tested — changes were observed within 48 hours consistently.
Some crawlers in our logs did not honor robots.txt. User-agents without official documentation should be treated as scrapers, not compliant partners.
Sites that blocked GPTBot stopped appearing in new ChatGPT answers within approximately 4–6 weeks, confirmed by manual prompt testing before and after.

OpenAI publicly documents GPTBot’s robots.txt compliance in their official GPTBot documentation. The robots.txt protocol itself is formalized in RFC 9309, which defines how crawlers should interpret access rules.

See which AI crawlers are already visiting your site—and turn that data into visibility and AI optimization with PulseRank, your WordPress AI Analytics Plugin.

Frequently Asked Questions


Will blocking GPTBot stop ChatGPT from citing my site?

It reduces the likelihood significantly. If GPTBot cannot access your pages, OpenAI’s retrieval systems cannot include your current content in answers.

Your site may still appear based on data indexed before the block, but you lose ongoing visibility.

What is the difference between GPTBot and OAI-SearchBot?

GPTBot is OpenAI’s general-purpose crawler used for training and browsing. OAI-SearchBot is specifically used for SearchGPT (OpenAI’s search product). For AI search visibility, allow both. See the full AI crawler user-agent reference.

Can I block AI training crawlers without losing AI search visibility?

Mostly yes. Training crawlers (like CCBot) are separate from search/browse crawlers (like GPTBot). Blocking CCBot does not directly prevent GPTBot from crawling you for search citations. However, this distinction may become less clear as AI systems evolve.

How do I see which AI bots are visiting my site?

Check your WordPress AI Analytics dashboard in PulseRank, or look in your server access logs for user-agent strings matching the known AI crawler list.

Is there a plugin for managing AI crawler access?

PulseRank includes AI crawler detection in its analytics layer. For access control decisions, robots.txt is currently the most reliable and universally supported method.

How often should I review my robots.txt?

Review every quarter, or whenever a major new AI system launches. New crawlers (with new user-agents) emerge regularly. See the AI Crawler User-Agent Reference for updates.