Last month, Cloudflare announced that they would be adding a new feature to their platform. As an extension of their existing AI bot blocking technology, they introduced a more robust service called “AI Scrapers and Crawlers”.
What is the AI Scrapers and Crawlers feature?
The AI Scrapers and Crawlers feature goes beyond the simpler, opt-out approach to blocking AI bots. AI crawlers visit sites and extract content for use in training data and AI services, and there are a few common ways to tell them you want your content left out. Many of the most popular AI services have agreed to respect a site’s robots.txt file as the signal that you want your content excluded from their training data. This is the same mechanism search engines have been honoring for quite some time.
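As a rough illustration of how the robots.txt opt-out works, the sketch below uses Python's standard `urllib.robotparser` module to check which crawlers a sample file would turn away. The file contents and the `is_allowed` helper are assumptions made for this example, and GPTBot and CCBot are used only as two well-known crawler tokens. It shows what a well-behaved crawler would conclude, not how any particular AI service actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: opt out of two AI crawlers,
# leave everything open to all other user agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        print(agent, is_allowed(agent, "https://example.com/articles/1"))
```

The check is purely advisory: nothing in the file prevents a crawler from requesting the page anyway, which is the gap described next.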
It’s also become clear that some AI services will ignore these instructions either because of a technical oversight, or through willful defiance. Standard blocking mechanisms have failed in recent months, and many enterprise companies have noticed an increase in bot crawling from AI services anyway.
This new feature from Cloudflare will not only block known bots, but will also try to block bots that circumvent the rules and crawl your site anyway. This includes AI bots that use techniques intended to disguise themselves from detection. The goal is broad protection against AI crawlers, whether or not they identify themselves.
With so much of the web’s traffic running through their infrastructure, Cloudflare is positioned well to take on this challenge. They are able to analyze hundreds of millions of web requests to identify and triage possible AI bots, and then automatically intervene on behalf of their customers.
What Does It Mean for Me?
Cloudflare has recognized a clear need from its customers: the bot-blocking option it introduced last year is already used by 85% of the customers who have access to it.
This move, however, addresses a problem for everyone operating a website at scale. AI bots can be persistent, and the traffic is spread across many different crawlers. For high-traffic websites, all of those bots can be a drag on server performance, especially at peak times. And unlike search engines such as Google and Bing, whose crawling can lead directly to more traffic, AI crawlers offer no clear incentive: they may use content from your site without any attribution or traffic boost at all.
Even if you are not in the direct business of content, you are almost certainly in the business of creating knowledge and generating your own intellectual property. A cautious approach may benefit enterprise companies in the long run. Custom solutions that can help are available today, but given how aggressive bots have become, they may not be enough.
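One common shape for such a custom solution is a User-Agent filter in front of the application. The sketch below is a minimal example under stated assumptions: the token list is illustrative and incomplete, and `is_declared_ai_crawler` and `status_for_request` are hypothetical helpers, not part of any framework or of Cloudflare's product. It also shows why this approach has limits, since it only catches crawlers that announce themselves honestly.

```python
from typing import Mapping, Optional

# Illustrative, incomplete list of tokens that some AI crawlers include
# in their User-Agent header. A real deployment would need to maintain this.
AI_CRAWLER_TOKENS = ("gptbot", "ccbot", "claudebot", "perplexitybot")


def is_declared_ai_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent string names a listed AI crawler."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)


def status_for_request(headers: Mapping[str, str]) -> int:
    """Hypothetical hook: 403 for declared AI crawlers, 200 otherwise."""
    if is_declared_ai_crawler(headers.get("User-Agent")):
        return 403
    return 200


if __name__ == "__main__":
    print(status_for_request({"User-Agent": "Mozilla/5.0 (compatible; GPTBot/1.0)"}))
    print(status_for_request({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}))
```

A crawler that sends a browser-like User-Agent passes straight through this filter, which is why detection based on traffic patterns rather than self-reported identity matters.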
Some hosting providers are already backed by Cloudflare technology, in which case this feature may already be available to you. It seems likely that other hosts and CDNs will roll out similar options. Comparable options have also appeared in some plugins, such as Yoast SEO.
Next Steps
The best place to start is to decide on your approach to AI crawling. There are reasons you may want to be included in training data, such as making sure that what AI services say about your company is at least somewhat accurate. And if AI services improve their attribution capabilities, inclusion could lead to more traffic in the long run.
However, you may also want to protect your IP from AI services. On top of the performance implications, they may misrepresent your content or brand, or use it in ways you never intended. They may also enable others to generate derivative content that competes directly with yours or leeches from your existing traffic.
In this uncertain territory, each case requires careful consideration. Ready to discuss a winning strategy? We are committed to collaborating with our clients on implementing tailored solutions that maximize long-term benefits.