[wp-trac] [WordPress Trac] #60805: Reading Settings: add option to discourage AI services from crawling the site
WordPress Trac
noreply at wordpress.org
Sat Oct 26 00:54:06 UTC 2024
#60805: Reading Settings: add option to discourage AI services from crawling the
site
-----------------------------+------------------------------
Reporter: jeherve | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Privacy | Version:
Severity: normal | Resolution:
Keywords: | Focuses: privacy
-----------------------------+------------------------------
Comment (by rickcurran):
Replying to [comment:4 ironprogrammer]:
Thanks for your comments / thoughts.
> === AI crawler visibility
> Personally, I'd prefer a default blanket option that forgoes the need to
> maintain an agent list, and allow extenders to limit/allow on a per-agent
> basis, as needed.
I don’t think there is any blanket method that can be used to target only
AI bots; each one needs to be specified by its User Agent.
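To illustrate what specifying each bot by its User Agent means in practice,
here is a minimal sketch that turns a list of agent tokens into `robots.txt`
groups. The function name and the agent tokens are illustrative assumptions,
not a vetted blocklist and not existing WordPress code.

```python
# Illustrative agent tokens only; not a vetted or complete blocklist.
AI_CRAWLER_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_rules(agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    groups = []
    for agent in agents:
        # One group per agent: a User-agent line followed by a blanket Disallow.
        groups.append(f"User-agent: {agent}\nDisallow: /\n")
    return "\n".join(groups)


print(build_robots_rules(AI_CRAWLER_AGENTS))
```

Each bot gets its own `User-agent` group, which is why the list has to be
maintained somewhere.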
> A blocklist versioned to a WordPress release or served by the
> WordPress.org API would require regular maintenance, so might not be a
> great fit for Core.
I have the same concern. However, I don’t think new AI bots come online so
frequently that waiting for a WP core point release to update the blocklist
would take too long. If a bot urgently needed adding between releases, that
could be the role of a plugin which allows you to filter the list and
add / remove User Agents.
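A rough sketch of that idea, assuming a default list shipped with a release
and plugin callbacks that adjust it. All names here (`DEFAULT_BLOCKED_AGENTS`,
`apply_agent_filters`, the callbacks, `ExampleNewBot`) are hypothetical and
only loosely modeled on how WordPress filters work.

```python
# Hypothetical default list, imagined as shipped with a core release.
DEFAULT_BLOCKED_AGENTS = ("GPTBot", "CCBot")


def apply_agent_filters(defaults, filters):
    """Pass a copy of the default list through each filter callback in order."""
    agents = list(defaults)
    for callback in filters:
        agents = list(callback(agents))
    return agents


# Hypothetical plugin callbacks: one adds a new agent, one removes an existing one.
def add_new_bot(agents):
    return agents + ["ExampleNewBot"]


def allow_ccbot(agents):
    return [a for a in agents if a != "CCBot"]


print(apply_agent_filters(DEFAULT_BLOCKED_AGENTS, [add_new_bot, allow_ccbot]))
```

The default list stays untouched, so a plugin can cover the gap between
releases without core having to ship an update.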
> A separate `ai.txt` file modeled after `robots.txt` would keep these
> concerns separate, but will anybody honor it?
The `ai.txt` option seems like a good idea, but I don’t know whether any
bots actually honor it. So I think `robots.txt` is the best option, as it is
an established mechanism that crawlers already check.
> === AI worker agents
> This is another wrinkle to consider: If these controls were implemented,
> how should WordPress deal with AI-based agents that access sites to
> perform tasks, such as
> [https://docs.anthropic.com/en/docs/build-with-claude/computer-use Anthropic's "Computer use"]
> or [https://github.com/OpenInterpreter/open-interpreter Open Interpreter]?
> This use case could ostensibly be a legit automation by a site visitor (or
> member/customer). Would WordPress differentiate between these types of
> tasks?
There are different User Agents for the different types of bots, so in
theory these could be split into separate lists, so that training bots are
blocked while task bots can still access the site. I can see that people may
wish to allow one group and block the other.
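As a sketch of how separate lists might work, the agents could be grouped by
purpose and rules emitted only for the blocked groups. The group names, the
placeholder agent names and the function are all assumptions for
illustration; real tokens would have to come from each vendor's documentation.

```python
# Hypothetical agent names grouped by purpose; placeholders, not real tokens.
AGENT_GROUPS = {
    "training": ["ExampleTrainingBot", "AnotherTrainingBot"],
    "task": ["ExampleTaskAgent"],
}


def rules_for_blocked_groups(groups, blocked):
    """Emit robots.txt Disallow groups only for agents in the blocked categories."""
    lines = []
    for category in blocked:
        for agent in groups.get(category, []):
            lines.append(f"User-agent: {agent}")
            lines.append("Disallow: /")
            lines.append("")
    return "\n".join(lines)


# Block training bots; task agents get no rule and so remain allowed.
print(rules_for_blocked_groups(AGENT_GROUPS, ["training"]))
```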
--
Ticket URL: <https://core.trac.wordpress.org/ticket/60805#comment:5>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform