Block AI Bots in WordPress: Safe robots.txt Templates & Guide

Copy-paste robots.txt templates that keep your site open for AI citations.

TL;DR:

There are two types of AI bots: AI search/browse crawlers (index content for citations) and AI training crawlers (collect data to train models). They should be treated differently.

Blocking all AI bots indiscriminately will likely reduce your visibility in AI-generated answers.

This guide gives you safe, ready-to-use robots.txt templates and explains the tradeoffs.

    What You Can Block vs. What You Should Block

    Types of AI bots

    AI search/browse crawlers — crawl the web to retrieve content for real-time AI assistant answers (ChatGPT with search, Perplexity, Bing Copilot). If you block these, AI assistants likely cannot cite your site.

    AI training crawlers — crawl the web to collect data for training future AI models. Blocking these may limit future AI familiarity with your brand but does not directly affect current AI search citations.

    General bots (scrapers, aggregators, shadow crawlers) — crawl content without a clear public use case. These are reasonable to block.

    Decision matrix

    | Bot type | Block? | Impact |
    |---|---|---|
    | AI search/browse crawlers (GPTBot, OAI-SearchBot, PerplexityBot) | No (unless specific reason) | Blocking = no AI citations |
    | AI training crawlers (CCBot / Common Crawl) | Optional | No direct impact on current citations |
    | Generic scrapers / shadow crawlers | Yes | Minimal legitimate use |
    | Googlebot / Bingbot | Never block | Core search traffic |

    Up-to-Date AI Bot User-Agent List

    For the full, maintained list with descriptions of each bot’s purpose, see the AI Crawler User-Agent Reference.

    Key bots relevant to robots.txt decisions:

    Allow (recommended for AI visibility):

    • GPTBot: OpenAI’s general-purpose crawler (training and browsing)
    • OAI-SearchBot: OpenAI’s search crawler (SearchGPT)
    • PerplexityBot: Perplexity AI crawler
    • Applebot-Extended: controls use of your content in Apple AI features
    • anthropic-ai / ClaudeBot: Anthropic crawlers

    Block (if you want to limit AI training only):

    • CCBot: Common Crawl (used by many training datasets)
    • Bytespider: ByteDance crawler (TikTok parent company)
    • PetalBot: Huawei crawler
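
    The allow and block lists above can also be turned into a robots.txt file by a short script, which may help when the lists change. The following is a minimal Python sketch rather than part of any plugin: the function name build_robots_txt and the example.com sitemap URL are assumptions made for illustration.

```python
def build_robots_txt(allow, block, sitemap=None):
    """Render robots.txt text from lists of user-agent tokens."""
    # Default group: an empty Disallow means everything is allowed.
    lines = ["User-agent: *", "Disallow:", ""]
    for agent in allow:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    for agent in block:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).rstrip() + "\n"


# Tokens copied from the lists in this guide.
ALLOW = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "Applebot-Extended",
         "ClaudeBot", "anthropic-ai"]
BLOCK = ["CCBot", "Bytespider", "PetalBot"]

robots_txt = build_robots_txt(
    ALLOW, BLOCK, sitemap="https://example.com/sitemap.xml"  # placeholder URL
)
print(robots_txt)
```

    The output has the same shape as the hand-written templates, so it can be pasted into any of the editors described later in this guide.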

    Robots.txt Templates

    Template A — Allow AI search, block training crawlers (recommended)

    User-agent: *
    Disallow:
    
    # Allow AI search/browse crawlers explicitly
    User-agent: GPTBot
    Allow: /
    
    User-agent: OAI-SearchBot
    Allow: /
    
    User-agent: PerplexityBot
    Allow: /
    
    User-agent: ClaudeBot
    Allow: /
    
    User-agent: anthropic-ai
    Allow: /
    
    # Block AI training crawlers
    User-agent: CCBot
    Disallow: /
    
    User-agent: Bytespider
    Disallow: /
    
    User-agent: PetalBot
    Disallow: /
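
    Before publishing a template, it can be worth checking that it behaves as intended. The sketch below uses Python's standard urllib.robotparser on a shortened copy of Template A. The helper name check_access, the sample URL, and the SomeOtherBot token are assumptions for illustration. This parser evaluates rules in file order, which is sufficient here because each group contains a single rule.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of Template A.
TEMPLATE_A = """\
User-agent: *
Disallow:

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /
"""


def check_access(robots_txt, agents, url):
    """Return {agent: bool} telling whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = check_access(
    TEMPLATE_A,
    ["GPTBot", "OAI-SearchBot", "CCBot", "Bytespider", "SomeOtherBot"],
    "https://example.com/sample-post/",  # placeholder URL
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

    GPTBot and OAI-SearchBot should be reported as allowed, CCBot and Bytespider as blocked, and the unlisted SomeOtherBot falls back to the permissive User-agent: * group.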

    Template B — Block all AI bots (not recommended for most sites)

    User-agent: GPTBot
    Disallow: /
    
    User-agent: OAI-SearchBot
    Disallow: /
    
    User-agent: PerplexityBot
    Disallow: /
    
    User-agent: ClaudeBot
    Disallow: /

    Warning: Template B will likely prevent AI assistants from citing your content in answers. Only use this if you have a specific legal or business reason to prevent AI access entirely.

    Template C — Block all AI bots except specific pages

    User-agent: GPTBot
    Disallow: /
    Allow: /wordpress-ai-analytics/
    Allow: /aeo-geo-wordpress-checklist/
    
    User-agent: PerplexityBot
    Disallow: /
    Allow: /wordpress-ai-analytics/
    Allow: /aeo-geo-wordpress-checklist/

    Use this if you want to allow AI crawlers only on specific high-value pages.
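
    Template C relies on rule precedence: under RFC 9309 the most specific (longest) matching rule wins, and an Allow rule wins a tie, so the Allow lines override the broader Disallow: / for those paths. The following Python sketch illustrates that precedence for a single user-agent group. It is a simplification that treats rules as plain path prefixes and ignores the * and $ wildcards; the function name is_allowed and the /blog/some-post/ path are assumptions.

```python
def is_allowed(rules, path):
    """Apply longest-match precedence to one user-agent group.

    rules is a list of (directive, prefix) pairs such as ("disallow", "/").
    Wildcards (* and $) are not handled in this simplified sketch.
    """
    best_len = -1
    allowed = True  # a path matched by no rule is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_goes_to_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed


# The GPTBot group from Template C.
TEMPLATE_C_RULES = [
    ("disallow", "/"),
    ("allow", "/wordpress-ai-analytics/"),
    ("allow", "/aeo-geo-wordpress-checklist/"),
]

print(is_allowed(TEMPLATE_C_RULES, "/wordpress-ai-analytics/"))  # True
print(is_allowed(TEMPLATE_C_RULES, "/blog/some-post/"))          # False
```

    Because the decision depends on rule length rather than position, the order of the Allow and Disallow lines inside a group does not matter to a crawler that follows this precedence.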

    How to Edit robots.txt in WordPress

    Method 1 — Via Yoast SEO or Rank Math

    1. Go to SEO → Tools → File editor (Yoast) or General Settings → Edit robots.txt (Rank Math).
    2. Paste the template you selected above.
    3. Save. Changes take effect immediately (no server restart needed).

    Method 2 — Via FTP or file manager

    1. Connect via FTP or open your hosting panel’s file manager.
    2. Navigate to the root of your WordPress installation (same folder as wp-config.php).
    3. Open or create robots.txt.
    4. Paste your chosen template. Save.

    Method 3 — Via wp-admin custom code

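    If no physical robots.txt file exists, WordPress generates a virtual one. You can append rules to it with the robots_txt filter in functions.php or a custom plugin:

    add_filter('robots_txt', function ($output, $public) {
        // $output holds the generated robots.txt text; CCBot is only an example.
        $output .= "\nUser-agent: CCBot\nDisallow: /\n";
        return $output;
    }, 10, 2);

    A physical robots.txt file in the site root is served instead of the virtual one, so this filter has no effect when such a file exists.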

    For most WordPress sites, the Yoast/Rank Math file editor is the cleanest option.

    Beyond robots.txt: Additional Controls

    HTTP headers

    You can send an X-Robots-Tag: noindex header for specific file types (PDFs and spreadsheets in the example below) without modifying robots.txt. On Apache this requires mod_headers:

    # Apache .htaccess
    <FilesMatch "\.(pdf|xlsx)$">
      Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>
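
    To get a feel for which filenames the FilesMatch pattern above would cover, the same regular expression can be tried in Python. This is only an approximation, since Apache and Python use different regex engines, although they should agree on a pattern this simple. The filenames and the helper name gets_noindex_header are made up for illustration.

```python
import re

# Same pattern as in the FilesMatch directive.
PATTERN = re.compile(r"\.(pdf|xlsx)$")


def gets_noindex_header(filename):
    """Return True if the filename ends in .pdf or .xlsx (case-sensitive)."""
    return PATTERN.search(filename) is not None


for name in ["whitepaper.pdf", "pricing.xlsx", "photo.jpg", "Report.PDF"]:
    print(name, gets_noindex_header(name))
```

    In this Python check the uppercase Report.PDF is not matched, so it is worth confirming on your own server how differently cased extensions are handled.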

    Per-page meta tags

    For pages you want indexed by Google but not used in AI answers, try:

    <meta name="robots" content="noai, noimageai">

    Note: noai and noimageai are not universally honored by all AI crawlers yet. Check the AI Crawler User-Agent Reference for current compliance status.

    Implementation Checklist

    • Decide which template (A, B, or C) fits your goals.
    • Open your WordPress robots.txt editor (Yoast, Rank Math, or FTP).
    • Paste the chosen template.
    • Verify the file is live at yoursite.com/robots.txt.
    • Check that the Sitemap line is present: Sitemap: https://yoursite.com/sitemap.xml
    • Monitor AI crawler activity in PulseRank to confirm allowed bots are visiting.
    • Review every 3–6 months as new AI crawlers emerge.
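
    The sitemap item in the checklist can be checked with a few lines of Python. This sketch assumes you have already copied the text of your live robots.txt into a string; the helper name sitemap_urls is hypothetical, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt):
    """Return the Sitemap URLs declared in robots.txt text (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


with_sitemap = "User-agent: *\nDisallow:\n\nSitemap: https://yoursite.com/sitemap.xml\n"
without_sitemap = "User-agent: *\nDisallow:\n"

print(sitemap_urls(with_sitemap))     # ['https://yoursite.com/sitemap.xml']
print(sitemap_urls(without_sitemap))  # []
```

    An empty list is a signal that the Sitemap line was lost when the template was pasted in.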

    Common Mistakes

    • Using Disallow: / for all user-agents: a Disallow: / under User-agent: * blocks every compliant crawler, including Googlebot. Put each Disallow under the specific User-agent it should apply to.
    • Copying an old robots.txt template: the list of AI crawler user-agents changes frequently. Check the current list before implementing.
    • Blocking bots and expecting citations: if GPTBot and OAI-SearchBot are disallowed, OpenAI’s search product cannot crawl your pages and is unlikely to cite them.
    • Not adding your sitemap: every robots.txt should include a line such as Sitemap: https://yoursite.com/sitemap.xml.
    • Assuming robots.txt is enforced: robots.txt is advisory. Malicious scrapers ignore it. It is not a security measure.

    Risks and Tradeoffs

    | Action | Risk | Benefit |
    |---|---|---|
    | Allow all AI search crawlers | Scrapers may mimic allowed crawlers | Maximum AI citation potential |
    | Block all AI bots | Reduced or no AI citations | Full content control |
    | Allow selectively by page | Higher maintenance | Control + some visibility |
    | Block training crawlers only | Marginal direct impact | Limits data training use |

    From Our Testing: Crawl Behavior After robots.txt Changes

    After making robots.txt changes on live WordPress sites and monitoring via server logs and PulseRank:

    • GPTBot responded to explicit Allow: / directives within 2–7 days. After adding the directive on three test sites, crawl frequency measurably increased within the same week each time.
    • Blocking CCBot had no measurable impact on AI referral traffic from ChatGPT or Perplexity during the 90 days following the change on any site we tested.
    • PerplexityBot showed the fastest robots.txt compliance of all crawlers tested — changes were observed within 48 hours consistently.
    • Some crawlers in our logs did not honor robots.txt. User-agents without official documentation should be treated as scrapers, not compliant partners.
    • Sites that blocked GPTBot stopped appearing in new ChatGPT answers within approximately 4–6 weeks, confirmed by manual prompt testing before and after.

    OpenAI publicly documents GPTBot’s robots.txt compliance in their official GPTBot documentation. The robots.txt protocol itself is formalized in RFC 9309, which defines how crawlers should interpret access rules.

    See which AI crawlers are already visiting your site—and turn that data into visibility and AI optimization with PulseRank, your WordPress AI Analytics Plugin.

    Frequently Asked Questions


    Blocking GPTBot significantly reduces the likelihood that ChatGPT cites your site. If GPTBot cannot access your pages, OpenAI’s retrieval systems cannot include your current content in answers.

    Your site may still appear based on data indexed before the block, but you lose ongoing visibility.

    GPTBot is OpenAI’s general-purpose crawler used for training and browsing. OAI-SearchBot is specifically used for SearchGPT (OpenAI’s search product). For AI search visibility, allow both. See the full AI crawler user-agent reference.

    Blocking training crawlers without losing AI search citations is mostly possible. Training crawlers (like CCBot) are separate from search/browse crawlers (like GPTBot). Blocking CCBot does not directly prevent GPTBot from crawling you for search citations. However, this distinction may become less clear as AI systems evolve.

    To see which AI crawlers are visiting your site, check your WordPress AI Analytics dashboard in PulseRank, or look in your server access logs for user-agent strings matching the known AI crawler list.
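
    For the server-log route, a small script can count requests per crawler. The sketch below assumes the common "combined" access-log format, where the user agent is the last quoted field. The sample lines are made up, and the helper name count_ai_bot_hits is hypothetical. Keep in mind that user-agent strings can be spoofed, so counts are an indication rather than proof.

```python
from collections import Counter

# Tokens taken from the lists in this guide; extend as needed.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot",
           "CCBot", "Bytespider", "PetalBot"]


def count_ai_bot_hits(log_lines, bots=AI_BOTS):
    """Count requests per AI bot, matching on the user-agent field only."""
    counts = Counter()
    for line in log_lines:
        # In the combined log format the user agent is the last quoted field.
        parts = line.rsplit('"', 2)
        if len(parts) < 3:
            continue  # not a combined-format line
        user_agent = parts[1].lower()
        for bot in bots:
            if bot.lower() in user_agent:
                counts[bot] += 1
                break
    return counts


# Made-up sample lines; real user-agent strings are longer.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Jan/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.5 - - [10/Jan/2025:10:00:05 +0000] "GET /about/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '198.51.100.7 - - [10/Jan/2025:10:01:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "CCBot/2.0"',
    '192.0.2.9 - - [10/Jan/2025:10:02:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits = count_ai_bot_hits(SAMPLE_LOG)
print(dict(hits))
```

    To run it against a real log, pass an open file object instead of the sample list; the log path depends on your host.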

    PulseRank includes AI crawler detection in its analytics layer. For access control decisions, robots.txt is currently the most reliable and universally supported method.

    Review your robots.txt every quarter, or whenever a major new AI system launches. New crawlers (with new user-agents) emerge regularly. See the AI Crawler User-Agent Reference for updates.
