Every AI crawler that matters — and what each one does

Updated June 4, 2026 · 6 min read

Not all AI crawlers do the same job, and treating them the same is how businesses accidentally make themselves invisible. "Search" crawlers fetch live pages to build the answer a user sees right now; "training" crawlers build the background knowledge that makes a model recommend you unprompted. Here's who's who.

Search crawlers (these decide if you're cited today)

OAI-SearchBot — builds ChatGPT's web answers.
ChatGPT-User — fetches pages when a user asks ChatGPT about something specific.
PerplexityBot — indexes pages for Perplexity's answers.
Perplexity-User — fetches pages for live Perplexity answers.
Claude-SearchBot — indexes pages for Claude's search results.
Claude-User — fetches pages when a user's request to Claude calls for them.

Training crawlers (these build long-term brand knowledge)

GPTBot — OpenAI's model-training crawler.
ClaudeBot — Anthropic's training crawler.
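Google-Extended — a robots.txt token, not a separate crawler. It controls whether content Google has crawled is used for Gemini training and grounding; it does not affect Google Search, which is where AI Overviews come from.
Applebot-Extended — also a token rather than a crawler. It controls whether content fetched by Applebot is used to train the models behind Apple Intelligence.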
CCBot — Common Crawl, the open dataset many models train on.
Meta-ExternalAgent — trains and powers Meta AI.

How to allow them without thinking about it

The safest default for a business that wants to be found is to allow these user-agents in robots.txt and avoid blanket Disallow rules that catch them. If you want to be cited in live answers but stay out of training datasets, allow the search crawlers and selectively disallow the training ones — but know that doing so shrinks how well models know your brand over time.
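As a rough illustration of the "cited but not trained on" policy described above, here is a minimal Python sketch that writes out a robots.txt with one allowed group and one blocked group. The function name build_robots_txt is hypothetical, the crawler lists are copied from this article, and the output is a starting point to review rather than a recommended file.

```python
# Sketch only: generate robots.txt text that allows the search crawlers
# and blocks the training crawlers named in this article.

SEARCH_CRAWLERS = [
    "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
    "Perplexity-User", "Claude-SearchBot", "Claude-User",
]
TRAINING_CRAWLERS = [
    "GPTBot", "ClaudeBot", "Google-Extended",
    "Applebot-Extended", "CCBot", "Meta-ExternalAgent",
]


def build_robots_txt(allow: list[str], disallow: list[str]) -> str:
    """Return robots.txt text: one group of allowed agents, one of blocked agents."""
    groups = []
    if allow:
        # Several User-agent lines may share the rules that follow them.
        lines = [f"User-agent: {agent}" for agent in allow] + ["Allow: /"]
        groups.append("\n".join(lines))
    if disallow:
        lines = [f"User-agent: {agent}" for agent in disallow] + ["Disallow: /"]
        groups.append("\n".join(lines))
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(SEARCH_CRAWLERS, TRAINING_CRAWLERS))
```

To allow everything instead, pass both lists as the first argument and an empty list as the second.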

Check which ones can see you

A readiness audit tests each crawler against your robots.txt and tells you, engine by engine, who's allowed and who's blocked — so you can fix the accidental blocks and keep the deliberate ones.
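The core of such a check can be approximated with Python's standard-library robots.txt parser. The sketch below is not the audit described above: audit_robots is a hypothetical helper, example.com is a placeholder URL, and the sample robots.txt is made up. Real crawlers may interpret edge cases differently from urllib.robotparser, so treat the result as a first pass.

```python
# Sketch only: report which AI crawlers a given robots.txt allows.
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = [
    # Search crawlers
    "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
    "Perplexity-User", "Claude-SearchBot", "Claude-User",
    # Training crawlers and tokens
    "GPTBot", "ClaudeBot", "Google-Extended",
    "Applebot-Extended", "CCBot", "Meta-ExternalAgent",
]


def audit_robots(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Map each crawler name to True (allowed) or False (blocked) for `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


if __name__ == "__main__":
    # Made-up example: block GPTBot, allow everyone else.
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    for agent, allowed in audit_robots(sample).items():
        print(f"{agent:20} {'allowed' if allowed else 'blocked'}")
```

A blanket "User-agent: *" followed by "Disallow: /" shows up here as every crawler blocked, which is the accidental case worth catching first.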

Frequently asked questions

Should I block AI crawlers?

For most businesses that want customers to find them, no — blocking search crawlers removes you from AI answers. Publishers protecting paid content sometimes block training crawlers deliberately, but that's a different goal.

Is blocking a crawler in robots.txt enough to stop it?

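Usually, but not always. robots.txt is a voluntary convention, not an access control: it works only when a crawler chooses to honor it. The major engines' indexing and training crawlers say they do, which makes it the right first control. User-triggered fetchers are a partial exception; Perplexity, for example, documents that Perplexity-User generally ignores robots.txt because a person requested the page. If you need a hard block, enforce it on the server, for instance by filtering on user-agent or IP range.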