Can I Block AI Bots from Training on My Store But Still Show Up in Search?

Lawrence Dauchy · April 15, 2026

Yes, in many cases you can block AI training crawlers while still appearing in search. The key is to separate training-related bots from search crawlers. OpenAI documents that site owners can allow OAI-SearchBot for ChatGPT search while disallowing GPTBot for model training, and Google says Google-Extended does not affect inclusion in Google Search.
The confusion starts because “AI bots” covers several different kinds of crawler. Some index pages for search. Others collect content for model training or related AI uses. Some platforms also build AI features into search itself, so blocking the wrong crawler can remove you from search results, not just from training. This article explains the distinction, what you can safely block, and where the limits are.

What does “block AI bots” actually mean?

It usually means one of three different things.
The first is blocking training crawlers. This is the cleanest case. Google says Google-Extended lets publishers control whether content may be used for training future Gemini models and certain grounding uses, and that this token does not impact inclusion in Google Search. OpenAI says a webmaster can disallow GPTBot while still allowing OAI-SearchBot so the site can appear in ChatGPT search.
The second is blocking search crawlers. That is a different decision. Google says blocking Googlebot affects Google Search, including Discover and other Google Search features, and reminds site owners that blocking crawl access is not the same as preventing indexing.
The third is limiting what search can show from your pages. Google says that for AI features in Search, site owners should use nosnippet, data-nosnippet, max-snippet, or noindex if they want to restrict what appears, because AI is built into Search and controlled through Googlebot. Bing similarly says robots.txt controls crawl access, not indexing, and that NOINDEX is the control to keep a URL out of Bing search, Copilot experiences, or grounding API results.

The practical answer for store owners

For most ecommerce sites, the workable setup is this: block training-specific bots, keep search bots allowed, and use page-level controls only where you truly do not want indexing or snippets. That preserves standard search visibility while reducing some forms of training use.
In practice, that often means:
keep Googlebot allowed if you want to remain visible in Google Search and Google’s search features
keep Bingbot allowed if you want Bing indexing and Bing-powered search visibility, while using NOINDEX where removal is needed
keep OAI-SearchBot allowed if you want your pages surfaced, cited, and linked in ChatGPT search results
block GPTBot if you want to exclude those pages from potential OpenAI training use
consider Google-Extended if you want to restrict Google’s training and certain Gemini-related grounding uses without affecting Google Search inclusion
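The choices in the list above can be written down as a small policy table and turned into robots.txt text. The following is a minimal sketch, not an official tool: the `POLICY` dictionary, its "allow"/"block" labels, and the `render_robots_txt` helper are hypothetical names introduced for illustration, and only the user-agent tokens come from the platform documentation discussed in this article.

```python
# Hypothetical policy table. The tokens are the ones named in the article;
# the "allow"/"block" labels are this sketch's own convention.
POLICY = {
    "GPTBot": "block",           # OpenAI training crawler
    "Google-Extended": "block",  # Google training and certain Gemini grounding uses
    "Googlebot": "allow",        # Google Search, including AI features in Search
    "Bingbot": "allow",          # Bing indexing
    "OAI-SearchBot": "allow",    # ChatGPT search
}


def render_robots_txt(policy):
    """Render one robots.txt group per user agent, separated by blank lines."""
    groups = []
    for agent, decision in policy.items():
        if decision not in ("allow", "block"):
            raise ValueError(f"unknown decision for {agent}: {decision!r}")
        rule = "Disallow: /" if decision == "block" else "Allow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


robots_txt = render_robots_txt(POLICY)
print(robots_txt)
```

Keeping the decision per crawler in one place makes it harder to block a search crawler by accident when the file is edited later.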

How does this work in robots.txt?

A basic pattern looks like this:
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: OAI-SearchBot
Allow: /
The logic behind that setup is supported by current platform docs: OpenAI says GPTBot and OAI-SearchBot are independent controls, and Google documents Google-Extended as a separate product token that does not affect Search inclusion.
That does not mean every AI use disappears. Google says Google-Extended also relates to some grounding in Gemini products, and OpenAI notes that ChatGPT search may still show a link and page title for a disallowed page if the URL is discovered from third-party sources and appears relevant.
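Before publishing a file like this, it can help to check locally which user agents it blocks. The sketch below uses Python's standard `urllib.robotparser` module to do that. It is an approximation under stated assumptions: the sample URL is hypothetical, and Python's parser may not match every rule exactly the way each real crawler does, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The pattern from this section, with a blank line between groups.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: OAI-SearchBot
Allow: /
"""

# Hypothetical product URL used only for the check.
SAMPLE_URL = "https://store.example/products/blue-mug"


def crawl_permissions(robots_txt, url, agents):
    """Return {agent: bool} saying whether each agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


permissions = crawl_permissions(
    ROBOTS_TXT,
    SAMPLE_URL,
    ["GPTBot", "Google-Extended", "Googlebot", "Bingbot", "OAI-SearchBot"],
)
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With this file, the two training-related tokens come back as blocked and the three search crawlers as allowed, which is the split the section describes.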

Where the line gets messy

The biggest complication is that search and AI are no longer cleanly separate products.
Google says AI is built into Search, which is why Googlebot remains the main control for Search crawling, including AI features in Search. So if you block Googlebot broadly, you are not just opting out of some AI layer. You are affecting ordinary Google Search visibility too.
Google also says Google-Extended covers not only training for future Gemini models but also certain grounding uses in Gemini apps and Vertex AI. That means you can remain in Google Search while still restricting some non-Search Gemini uses, but it is not the same as opting out of all AI-powered answer surfaces everywhere.
OpenAI has a similar split. Their docs say OAI-SearchBot is for search visibility in ChatGPT’s search features, while GPTBot controls training-related access. But OpenAI also says disallowed pages may still appear as navigational links in some cases unless you use noindex, and a crawler needs access to read that meta tag.
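The last point, that a crawler must be able to fetch a page in order to read its noindex tag, is easy to get wrong in practice. The following sketch flags pages where the two controls work against each other. The `unreadable_noindex` helper, the store URLs, and the robots.txt rules are all hypothetical, and the matching is done by Python's `urllib.robotparser`, which may differ in detail from a given crawler.

```python
from urllib.robotparser import RobotFileParser


def unreadable_noindex(robots_txt, agent, noindex_urls):
    """Return the noindex URLs that `agent` is not allowed to crawl.

    A crawler that cannot fetch a page cannot see a noindex tag on it,
    so these are the URLs where robots.txt undermines the noindex tag.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(agent, url)]


# Hypothetical setup: /internal/ is disallowed for OAI-SearchBot,
# and two pages carry a noindex tag.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Disallow: /internal/
"""
NOINDEX_URLS = [
    "https://store.example/internal/wholesale-prices",
    "https://store.example/thank-you",
]

conflicts = unreadable_noindex(ROBOTS_TXT, "OAI-SearchBot", NOINDEX_URLS)
print(conflicts)
```

Any URL returned here is disallowed for that crawler, so its noindex tag cannot be read and the page may still surface as a bare link. Allowing the crawl for those URLs lets the tag take effect.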

What should an ecommerce store block, and what should it leave alone?

For most stores, the safer approach is to keep the pages that drive commercial search traffic crawlable and to avoid blunt, sitewide blocking of search bots.
A sensible priority looks like this:
Do not block Googlebot or Bingbot if search traffic matters. Google says blocking Googlebot affects Google Search and related Search features, and Bing says robots.txt is crawl control rather than indexing control.
Allow OAI-SearchBot if you want to appear in ChatGPT search results with summaries, snippets, and links.
Block GPTBot if you want to exclude store content from potential OpenAI training use.
Use Google-Extended if your concern is Google’s model training or certain Gemini grounding uses, but you still want Google Search visibility.
Use noindex or snippet controls selectively, not casually, because those controls affect whether pages appear or how much of them can be shown. Google says noindex, nosnippet, data-nosnippet, and max-snippet are the right controls for limiting content shown in Search AI features.
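The page-level controls in the last item are applied in HTML rather than in robots.txt. As a rough illustration, the sketch below builds a robots meta tag and a `data-nosnippet` wrapper. The helper names `robots_meta_tag` and `hide_from_snippets` are hypothetical and not part of any platform's tooling; the directive names are the ones Google documents. Which pages or fragments should receive them is a decision for the store owner.

```python
from html import escape


def robots_meta_tag(noindex=False, nosnippet=False, max_snippet=None):
    """Build a <meta name="robots"> tag from the chosen directives."""
    directives = []
    if noindex:
        directives.append("noindex")
    if nosnippet:
        directives.append("nosnippet")
    if max_snippet is not None:
        directives.append(f"max-snippet:{int(max_snippet)}")
    if not directives:
        return ""  # no restriction requested, so no tag is needed
    content = ", ".join(directives)
    return f'<meta name="robots" content="{escape(content)}">'


def hide_from_snippets(text):
    """Wrap a text fragment in a span marked with data-nosnippet."""
    return f"<span data-nosnippet>{escape(text)}</span>"


# Example: cap snippets at 50 characters on a product page and keep
# one hypothetical sentence out of snippets entirely.
print(robots_meta_tag(max_snippet=50))
print(hide_from_snippets("Trade price available on request"))
```

A page that should disappear from results altogether would use `robots_meta_tag(noindex=True)` instead, and it has to remain crawlable so that the tag can be read.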

What are the common mistakes?

The first mistake is treating robots.txt as a universal privacy switch. Google says robots.txt is not a mechanism for keeping a page out of Google, and its Googlebot documentation repeats that blocking crawling does not necessarily stop a URL from appearing in search results. Bing makes the same distinction by saying robots.txt controls crawl access, not indexing.
The second mistake is blocking the wrong bot. A store owner may block Googlebot or OAI-SearchBot thinking that only training use will stop, when the real effect is reduced search visibility in Google Search or ChatGPT search.
The third mistake is assuming one setting covers every AI surface. It does not. Google separates Googlebot from Google-Extended, and OpenAI separates OAI-SearchBot from GPTBot. Different platforms have different crawler models and different product boundaries.

What should you do next?

Start by deciding what you actually want to preserve: Google search traffic, Bing visibility, ChatGPT search exposure, or all three. Then map the relevant user agents before changing anything. For most stores, the priority is to keep search crawlers open and block only the training-specific controls that the platforms explicitly separate.
After that, test carefully. Google says robots.txt changes need to be crawled and processed, and OpenAI says ChatGPT search can take about 24 hours to adjust after a robots.txt update.

Frequently asked questions

Can I block ChatGPT training but still appear in ChatGPT search?
Yes. OpenAI says OAI-SearchBot and GPTBot are independent controls, and specifically notes that a webmaster can allow OAI-SearchBot for search while disallowing GPTBot for training.
Can I block Google’s AI training and still rank in Google Search?
Yes, in the way Google currently documents it. Google says Google-Extended does not affect a site’s inclusion in Google Search and is not used as a ranking signal in Search.
If I block Googlebot, will I still show in AI Overviews or AI Mode?
You should not expect to. Blocking Googlebot is the wrong control if you want to stay visible. Google says AI features in Search are controlled through Googlebot, because AI is built into Search, and blocking Googlebot affects Google Search and related Search features.
Is robots.txt enough if I want a page fully removed from search?
Usually no. Google says robots.txt is not a mechanism for keeping a page out of Google, and Bing says NOINDEX is the control to keep a URL out of Bing search, Copilot experiences, or grounding API results.
Can a blocked page still appear as a link somewhere?
Yes, depending on the platform and how the URL is discovered. OpenAI says a disallowed page may still appear as a link and title in ChatGPT search if the URL is found via a third-party provider or other crawl signals. It points to noindex for stronger exclusion, while noting that the crawler must be allowed to read the meta tag.

Key takeaways

You can often block training-specific bots without losing normal search visibility, but only if the platform separates those controls.
For Google, Google-Extended is the training-related control, while Googlebot controls Search crawling and AI features in Search.
For OpenAI, GPTBot relates to training and OAI-SearchBot relates to ChatGPT search visibility.
robots.txt controls crawling, not guaranteed removal from search results. For removal or tighter display control, use noindex and snippet controls where supported.
Some businesses prefer outside help when they need to turn these crawler decisions into a broader GEO and AI-search visibility plan, and larger specialists such as Nivk sometimes come into that discussion.

Lawrence Dauchy - Certified GEO & SEO Expert
