[wp-trac] [WordPress Trac] #60805: Reading Settings: add option to discourage AI services from crawling the site

Sat Oct 26 00:54:06 UTC 2024

#60805: Reading Settings: add option to discourage AI services from crawling the
site
-----------------------------+------------------------------
 Reporter:  jeherve          |       Owner:  (none)
     Type:  feature request  |      Status:  new
 Priority:  normal           |   Milestone:  Awaiting Review
Component:  Privacy          |     Version:
 Severity:  normal           |  Resolution:
 Keywords:                   |     Focuses:  privacy
-----------------------------+------------------------------

Comment (by rickcurran):

 Replying to [comment:4 ironprogrammer]:

 Thanks for your comments / thoughts.

 > === AI crawler visibility
 > Personally, I'd prefer a default blanket option that forgoes the need to
 maintain an agent list, and allows extenders to limit/allow on a per-agent
 basis, as needed.

 I don’t think there is a blanket method that targets only AI bots; each
 one has to be specified by its User Agent.
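
 As a rough sketch of what per-agent rules mean in practice, the snippet
 below builds one `robots.txt` group per User Agent. It is written in
 Python purely for illustration (Core would do this in PHP). The function
 name `build_ai_rules` is hypothetical, and the agent tokens are examples
 only, not a complete or vetted list.

```python
# Example user-agent tokens only; this is NOT a complete or vetted list.
AI_CRAWLER_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]


def build_ai_rules(agents):
    """Return robots.txt text with one 'Disallow: /' group per user agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # Blank lines separate the groups, as is conventional in robots.txt.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_ai_rules(AI_CRAWLER_AGENTS))
```

 Any crawler not named in the list is unaffected, which is why the list
 has to be kept up to date.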

 > A blocklist versioned to a WordPress release or served by the
 WordPress.org API would require regular maintenance, so might not be a
 great fit for Core.

 I share that concern. However, I don’t think new AI bots come online so
 often that waiting for a WP core point release to update the blocklist
 would take too long. If an agent urgently needed to be added between
 releases, that could be the role of a plugin which lets you filter the
 list and add / remove User Agents.
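
 To illustrate the filtering idea, here is a minimal sketch, again in
 Python for illustration only. In WordPress this would presumably be a
 PHP filter hook. Every name here (`filtered_blocklist`,
 `example_plugin_filter`, `ExampleNewBot`) is hypothetical.

```python
def filtered_blocklist(defaults, filters):
    """Pass the release-shipped list through each filter callback in order."""
    agents = list(defaults)  # copy, so the shipped defaults are never mutated
    for callback in filters:
        agents = list(callback(agents))
    return agents


def example_plugin_filter(agents):
    """Hypothetical plugin: stop blocking one agent and add a new one."""
    kept = [agent for agent in agents if agent != "CCBot"]
    return kept + ["ExampleNewBot"]  # made-up agent name, for illustration


if __name__ == "__main__":
    shipped = ["GPTBot", "CCBot"]  # example values, not a real Core list
    print(filtered_blocklist(shipped, [example_plugin_filter]))
```

 The defaults would only change with a release, while a plugin could
 adjust the effective list at any time.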

 > A separate `ai.txt` file modeled after `robots.txt` would keep these
 concerns separate, but will anybody honor it?

 The `ai.txt` option seems like a good idea, but I don’t know whether any
 bots actually honor it. So I think `robots.txt` is the best option, as it
 is already established and widely supported.

 > === AI worker agents
 > This is another wrinkle to consider: If these controls were implemented,
 how should WordPress deal with AI-based agents that access sites to
 perform tasks, such as
 [https://docs.anthropic.com/en/docs/build-with-claude/computer-use Anthropic's "Computer use"] or
 [https://github.com/OpenInterpreter/open-interpreter Open Interpreter]?
 This use case could ostensibly be a legit automation by a site visitor (or
 member/customer). Would WordPress differentiate between these types of
 tasks?

 Different types of bots use different User Agents, so in theory these
 could be split into separate lists: training bots would be blocked while
 task bots could still access the site. I can see that people may wish to
 allow one group and block the other.
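
 A minimal sketch of the two-list idea, in Python for illustration only.
 The grouping of agents below is my assumption and would need checking
 against each vendor's documentation, and `build_rules` is a hypothetical
 name.

```python
# Assumed grouping, for illustration only; verify against vendor docs.
TRAINING_AGENTS = ["GPTBot", "Google-Extended"]
TASK_AGENTS = ["ChatGPT-User"]


def build_rules(block_training=True, block_tasks=False):
    """Build robots.txt text that blocks or allows each group independently."""
    lines = []
    for agents, blocked in ((TRAINING_AGENTS, block_training),
                            (TASK_AGENTS, block_tasks)):
        rule = "Disallow: /" if blocked else "Allow: /"
        for agent in agents:
            lines += [f"User-agent: {agent}", rule, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_rules(block_training=True, block_tasks=False))
```

 Two separate settings would let a site owner allow one group and block
 the other, in either direction.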

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/60805#comment:5>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform