[wp-trac] [WordPress Trac] #60805: Reading Settings: add option to discourage AI services from crawling the site
WordPress Trac
noreply at wordpress.org
Fri Oct 25 23:19:50 UTC 2024
#60805: Reading Settings: add option to discourage AI services from crawling the
site
-----------------------------+------------------------------
Reporter: jeherve | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Privacy | Version:
Severity: normal | Resolution:
Keywords: | Focuses: privacy
-----------------------------+------------------------------
Comment (by ironprogrammer):
Thanks for the ticket, @jeherve, and for continuing the discussion,
@rickcurran 🙌🏻
=== Search visibility status
Regarding precedent for informing users of search engine visibility, yes,
both the "At a Glance" dashboard widget and ''Site Health > Info >
WordPress'' section include notices to this effect:
[[Image(https://cldup.com/C3AdUCmIJV.png, 500px)]]
[[Image(https://cldup.com/yH9KgXuP9p.png)]]
I agree that both would be helpful indicators/reminders to couple with
this feature.
=== AI crawler visibility
Personally, I'd prefer a default blanket option that forgoes the need to
maintain an agent list, and allows extenders to limit/allow on a per-agent
basis, as needed. From what I've observed in the media, concern voiced
around AI companies scraping content seems quite separate from the
ability to show up in search results, which would rule out a blanket AI
"disallow" in `robots.txt`. A blocklist versioned to a WordPress release
or served by the WordPress.org API would require regular maintenance, so
might not be a great fit for Core.
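To illustrate the per-agent problem: `robots.txt` has no "AI" category, so
each crawler has to be named in its own `User-agent` group. Below is a
minimal Python sketch, not a proposed Core implementation. The agent names
in `AI_AGENTS` and the helpers `build_robots_txt()` and `can_fetch()` are
assumptions for illustration only. In WordPress itself, this kind of output
would presumably be appended through the existing `robots_txt` filter.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical blocklist: example agent names only, not a maintained list.
AI_AGENTS = ["GPTBot", "CCBot"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that blocks the named agents and nobody else."""
    lines = []
    for agent in blocked_agents:
        # One group per agent; the blank line closes the group.
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # All other crawlers, including search engines, remain allowed.
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


def can_fetch(robots_txt, agent, url="https://example.org/sample-post/"):
    """Check whether a well-behaved crawler would consider the URL allowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots = build_robots_txt(AI_AGENTS)
print(robots)
print("GPTBot allowed:", can_fetch(robots, "GPTBot"))
print("Googlebot allowed:", can_fetch(robots, "Googlebot"))
```

The sketch keeps search visibility intact while blocking only the listed
agents, which is exactly why the list itself becomes the maintenance burden:
any crawler not named in it falls through to the `User-agent: *` group.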
A separate `ai.txt` file modeled after `robots.txt` would keep these
concerns separate, but will anybody honor it? As mentioned by @rickcurran,
could WordPress lead by example here, by establishing a standard to be
used by 43% of sites?
With regard to a default of allowing or blocking AI crawlers, while a
default block would indeed send a powerful message, I don't know if all
WordPress users would necessarily agree to block on Day One when this
feature ships.
However, a one-time admin notice after update, and a persistent AI crawler
status on "At a Glance" could serve as reminders of this option.
=== AI worker agents
This is another wrinkle to consider: If these controls were implemented,
how should WordPress deal with AI-based agents that access sites to
perform tasks, such as
[https://docs.anthropic.com/en/docs/build-with-claude/computer-use Anthropic's "Computer use"]
or [https://github.com/OpenInterpreter/open-interpreter Open Interpreter]?
This use case could ostensibly be a legit automation by a site visitor (or
member/customer). Would WordPress differentiate between these types of
tasks? A commerce site might be fine with an automation to re-order toilet
paper, but a ticket site might not want bots gobbling up seats to an
event.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/60805#comment:4>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform