[wp-trac] [WordPress Trac] #60805: Reading Settings: add option to discourage AI services from crawling the site

WordPress Trac noreply at wordpress.org
Fri Oct 25 23:19:50 UTC 2024


#60805: Reading Settings: add option to discourage AI services from crawling the
site
-----------------------------+------------------------------
 Reporter:  jeherve          |       Owner:  (none)
     Type:  feature request  |      Status:  new
 Priority:  normal           |   Milestone:  Awaiting Review
Component:  Privacy          |     Version:
 Severity:  normal           |  Resolution:
 Keywords:                   |     Focuses:  privacy
-----------------------------+------------------------------

Comment (by ironprogrammer):

 Thanks for the ticket, @jeherve, and for continuing the discussion,
 @rickcurran 🙌🏻

 === Search visibility status
 Regarding precedent for informing users of search engine visibility, yes,
 both the "At a Glance" dashboard widget and ''Site Health > Info >
 WordPress'' section include notices to this effect:

  [[Image(https://cldup.com/C3AdUCmIJV.png, 500px)]]

  [[Image(https://cldup.com/yH9KgXuP9p.png)]]

 I agree that both would be helpful indicators/reminders to couple with
 this feature.

 === AI crawler visibility
 Personally, I'd prefer a default blanket option that forgoes the need to
 maintain an agent list, and that allows extenders to limit/allow on a
 per-agent basis, as needed. From what I've observed in the media, concern voiced
 around AI companies scraping content seems quite separated from the
 ability to show up in search results, which would rule out a blanket AI
 "disallow" in `robots.txt`. A blocklist versioned to a WordPress release
 or served by the WordPress.org API would require regular maintenance, so
 might not be a great fit for Core.
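
 To illustrate the per-agent approach, here is a minimal sketch rather than
 a proposal for Core. It assumes a short, purely illustrative list of AI
 crawler tokens (`GPTBot`, `CCBot`), hypothetical helper names
 (`build_robots_txt`, `can_fetch`), and a placeholder URL. It uses Python's
 standard `urllib.robotparser` to check the generated rules the way a
 well-behaved crawler would, showing that named agents can be disallowed
 while search crawlers remain unaffected:

```python
from urllib.robotparser import RobotFileParser

# Illustrative agent tokens only; not a vetted or maintained blocklist.
AI_AGENTS = ["GPTBot", "CCBot"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that blocks the listed agents and allows all others."""
    lines = []
    for agent in blocked_agents:
        # One group per blocked agent, each terminated by a blank line.
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # An empty Disallow means "nothing is disallowed" for every other agent.
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


def can_fetch(robots_txt, agent, url="https://example.org/a-post/"):
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(AI_AGENTS)
print(robots_txt)
print("Googlebot allowed:", can_fetch(robots_txt, "Googlebot"))
print("GPTBot allowed:", can_fetch(robots_txt, "GPTBot"))
```

 The sketch also makes the maintenance cost visible: the rules are only as
 good as the agent list passed in, and they only affect crawlers that
 choose to honor `robots.txt`.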

 A separate `ai.txt` file modeled after `robots.txt` would keep these
 concerns separate, but will anybody honor it? As mentioned by @rickcurran,
 could WordPress lead by example here, by establishing a standard to be
 used by 43% of sites?

 With regard to a default of allowing or blocking AI crawlers, while it
 would indeed send a powerful message, I don't know if all WordPress users
 would necessarily agree to block on Day One when this feature shipped.
 However, a one-time admin notice after update, and a persistent AI crawler
 status on "At a Glance" could serve as reminders of this option.

 === AI worker agents
 This is another wrinkle to consider: If these controls were implemented,
 how should WordPress deal with AI-based agents that access sites to
 perform tasks, such as
 [https://docs.anthropic.com/en/docs/build-with-claude/computer-use Anthropic's "Computer use"] or
 [https://github.com/OpenInterpreter/open-interpreter Open Interpreter]?
 This use case could ostensibly be a legit automation by a site visitor (or
 member/customer). Would WordPress differentiate between these types of
 tasks? A commerce site might be fine with an automation to re-order toilet
 paper, but a ticket site might not want bots gobbling up seats to an
 event.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/60805#comment:4>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

