[wp-trac] [WordPress Trac] #64696: Real-time collaboration effectively disables persistent post caches while anyone edits a post

Tue Mar 10 15:13:10 UTC 2026

#64696: Real-time collaboration effectively disables persistent post caches while
anyone edits a post
--------------------------------------+--------------------------
 Reporter:  peterwilsoncc             |       Owner:  (none)
     Type:  defect (bug)              |      Status:  new
 Priority:  normal                    |   Milestone:  7.0
Component:  Posts, Post Types         |     Version:  trunk
 Severity:  normal                    |  Resolution:
 Keywords:  has-patch has-unit-tests  |     Focuses:  performance
--------------------------------------+--------------------------

Comment (by mindctrl):

 Replying to [comment:48 peterwilsoncc]:

 I've been thinking about the schema proposal and whether a single table
 with an `event_type` column is the right approach, or whether awareness
 and content updates are different enough to warrant separate storage.

 The access patterns and lifecycles are different:

 - Content updates are append-only messages that must be reliably delivered
 to every client exactly once. They're ordered by cursor (auto-increment
 ID), compacted periodically, and may persist for minutes to hours during
 an editing session. Reliable delivery matters, as a missed or duplicated
 update means a diverged document.
 - Awareness state is per-client, last-write-wins data (cursor position,
 selection, user info). It's overwritten on every poll cycle (~250ms–1s),
 expires after a short period of inactivity, and losing it briefly just
 means a collaborator's cursor flickers. There's no ordering requirement;
 we just want the latest state per client. If awareness is expanded beyond
 the editor screen, as Matt mentioned elsewhere (I remember reading it but
 can't find where at the moment), the need to save awareness state will
 increase, and it will exacerbate any tradeoffs we make.
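
 To make the two lifecycles concrete, here is a rough in-memory sketch in
 Python (illustrative only, not proposed core code). `ContentLog`,
 `AwarenessStore`, and the 30-second expiry are names and values made up
 for this example; nothing here comes from the patch.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ContentLog:
    """Append-only update log; clients poll with the last ID they saw."""
    _rows: list = field(default_factory=list)  # (id, payload), ordered by id
    _next_id: int = 1

    def append(self, payload: bytes) -> int:
        row_id = self._next_id
        self._next_id += 1
        self._rows.append((row_id, payload))
        return row_id

    def poll(self, cursor: int) -> list:
        # Everything strictly after the cursor, in ID order.
        return [row for row in self._rows if row[0] > cursor]


@dataclass
class AwarenessStore:
    """Last-write-wins state per client, dropped after `ttl` seconds."""
    ttl: float = 30.0  # assumed expiry window, not taken from the ticket
    _state: dict = field(default_factory=dict)  # client_id -> (state, seen_at)

    def set(self, client_id, state, now=None):
        # Overwrites whatever this client wrote before; no history is kept.
        seen_at = time.monotonic() if now is None else now
        self._state[client_id] = (state, seen_at)

    def active(self, now=None):
        now = time.monotonic() if now is None else now
        return {
            cid: state
            for cid, (state, seen_at) in self._state.items()
            if now - seen_at <= self.ttl
        }
```

 The log only ever grows and is read by position, while the awareness
 store holds at most one entry per client and forgets stale ones, which is
 the difference the two bullets describe.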

 Combining them in one table means awareness writes (which are the highest-
 frequency operations involving every active client, every poll cycle)
 inflate the auto-increment ID space and add rows that the
 compaction/cleanup logic needs to work around. The indexing needs are also
 different: content updates need (room, id) for cursor-based polling, while
 awareness needs (room, client_id) for upsert-by-client lookups.
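
 For a rough sense of scale, the arithmetic below estimates how many IDs
 awareness alone would consume in a shared table. It assumes each poll
 cycle appends a new awareness row per client (rather than updating in
 place), and the five-editor session is just an illustrative number.

```python
def awareness_rows_per_hour(clients: int, poll_interval_s: float) -> int:
    """Rows written per hour if every client writes once per poll cycle."""
    return round(clients * 3600 / poll_interval_s)


# Five editors at the slow and fast ends of the ~250ms-1s poll range.
slow = awareness_rows_per_hour(clients=5, poll_interval_s=1.0)   # 18000
fast = awareness_rows_per_hour(clients=5, poll_interval_s=0.25)  # 72000
```

 Under those assumptions, a one-hour session burns through tens of
 thousands of auto-increment IDs before a single content update is counted.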

 Separate storage would let each be optimized for its actual workload.
 Content updates could stay in the `wp_collaboration` table with auto-
 increment cursors and compaction, and that would keep it lean and fast
 with no need to scan past expired or current awareness rows. Awareness
 could use a second table with an INSERT ... ON DUPLICATE KEY UPDATE keyed
 on (room, client_id). Each client only ever writes its own row, so
 concurrent updates can't overwrite each other, and it doesn't add write
 volume to the content updates table.
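
 A minimal sketch of the two-table layout, using Python's `sqlite3` so it
 runs anywhere. SQLite's `ON CONFLICT ... DO UPDATE` stands in for MySQL's
 INSERT ... ON DUPLICATE KEY UPDATE here. The `wp_collaboration_awareness`
 table name, the column names, and the `post:1` room key are all
 hypothetical; only `wp_collaboration` comes from the discussion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wp_collaboration (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- doubles as the poll cursor
    room    TEXT NOT NULL,
    payload BLOB NOT NULL
);
CREATE INDEX room_id ON wp_collaboration (room, id);

CREATE TABLE wp_collaboration_awareness (       -- hypothetical table name
    room       TEXT NOT NULL,
    client_id  TEXT NOT NULL,
    state      TEXT NOT NULL,
    updated_at REAL NOT NULL,
    PRIMARY KEY (room, client_id)
);
""")


def set_awareness(room, client_id, state, now):
    # SQLite spelling of an upsert; MySQL would use ON DUPLICATE KEY UPDATE.
    conn.execute(
        """
        INSERT INTO wp_collaboration_awareness (room, client_id, state, updated_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (room, client_id)
        DO UPDATE SET state = excluded.state, updated_at = excluded.updated_at
        """,
        (room, client_id, state, now),
    )


def add_update(room, payload):
    cur = conn.execute(
        "INSERT INTO wp_collaboration (room, payload) VALUES (?, ?)",
        (room, payload),
    )
    return cur.lastrowid
```

 With this shape, any number of `set_awareness()` calls between two
 `add_update()` calls leaves the content IDs consecutive, and the awareness
 table never holds more than one row per (room, client_id).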

 Replying to [comment:50 czarate]:

 > Both `client_id` and `type` are internal implementation details that are
 > specific to Yjs and the polling provider. They could change in the future.
 > Keeping the storage mechanism opaque to these implementation details
 > greatly reduces the risk of breaking changes.

 Regardless of the backing implementation (Yjs or otherwise), any
 collaborative editing system needs to identify its clients and distinguish
 between types of sync data. If the table is holding multiple types of
 data, we'll always need to know what type we have, and preferably query
 for only the data needed at any given time.

 - `client_id`: Any collaboration protocol needs to identify which
 participant originated an update. Without a client identifier, we can't
 filter a client's own updates out of poll responses, we can't do per-
 client awareness state, and we can't attribute changes. The concept of a
 client identity is inherent to multiuser editing.
 - `type`: The distinction between document updates and awareness/presence
 information exists in every collaboration system. These are conceptually
 different operations with different lifecycles. Whether we call it `type`,
 `event_type`, `channel`, or whatever, we need to distinguish them.
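
 As a small illustration of why both fields are needed at query time,
 here is a hedged Python sketch of a poll filter. `SyncRow`, `poll()`, and
 the `"update"`/`"awareness"` labels are invented for the example and say
 nothing about how Yjs or the polling provider names things.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncRow:
    id: int
    client_id: str
    type: str      # e.g. "update" or "awareness"; labels are illustrative
    payload: bytes


def poll(rows, cursor, own_client_id, wanted_type="update"):
    """Rows after `cursor` of one type, excluding the caller's own writes."""
    return [
        r for r in rows
        if r.id > cursor
        and r.type == wanted_type
        and r.client_id != own_client_id
    ]
```

 Without `client_id` the second condition can't be expressed, and without
 some kind of type discriminator every poll has to fetch and discard rows
 it doesn't need.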

 Optimizing tables for the use case we have, and know we will have soon,
 seems like a good win and doesn't saddle us with more schemas that are
 less than ideal. There seem to be several tradeoffs associated with
 trying to put both in a single table. Is there an actual mandate to limit
 this to one table? Is there a technical or deployment concern with having
 two new tables instead of one? Do we think there would be a higher rate of
 failure or some other kind of ecosystem fallout if we had two new tables?

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/64696#comment:51>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform