[wp-trac] [WordPress Trac] #64696: Real time collaboration effectively disables persistent post caches while anyone edits a post
WordPress Trac
noreply at wordpress.org
Tue Mar 10 15:13:10 UTC 2026
#64696: Real time collaboration effectively disables persistent post caches while
anyone edits a post
--------------------------------------+--------------------------
Reporter: peterwilsoncc | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: 7.0
Component: Posts, Post Types | Version: trunk
Severity: normal | Resolution:
Keywords: has-patch has-unit-tests | Focuses: performance
--------------------------------------+--------------------------
Comment (by mindctrl):
Replying to [comment:48 peterwilsoncc]:
I've been thinking about the schema proposal and whether a single table
with an `event_type` column is the right approach, or whether awareness
and content updates are different enough to warrant separate storage.
The access patterns and lifecycles are different:
- Content updates are append-only messages that must be reliably delivered
to every client exactly once. They're ordered by cursor (auto-increment
ID), compacted periodically, and may persist for minutes to hours during
an editing session. Reliable delivery matters, as a missed or duplicated
update means a diverged document.
- Awareness state is per-client, last-write-wins data (cursor position,
selection, user info). It's overwritten on every poll cycle (~250ms–1s),
expires after a short period of inactivity, and losing it briefly just
means a collaborator's cursor flickers. There's no ordering requirement;
we just want the latest state per client. If awareness is expanded beyond
the editor screen, as mentioned by Matt elsewhere (I remember reading it
but can't find where he said it at the moment), the need to save awareness
state will increase, and that will exacerbate any tradeoffs we make.
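To make the two lifecycles concrete, here is a minimal in-memory sketch in
Python. It is an illustration only, not the proposed implementation: the
class names, the `ttl` value, and the method names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ContentLog:
    """Append-only update log; readers poll with the last ID they saw."""
    rows: list = field(default_factory=list)  # (id, payload) pairs in ID order
    next_id: int = 1

    def append(self, payload: bytes) -> int:
        row_id = self.next_id
        self.rows.append((row_id, payload))
        self.next_id += 1
        return row_id

    def since(self, cursor: int) -> list:
        # Returns every update after the cursor, in ID order. A client sees
        # each update once as long as it advances its cursor afterwards.
        return [(i, p) for (i, p) in self.rows if i > cursor]


@dataclass
class AwarenessStore:
    """Last-write-wins state per client, dropped after `ttl` idle seconds."""
    ttl: float = 30.0  # hypothetical expiry window
    states: dict = field(default_factory=dict)  # client_id -> (state, seen_at)

    def put(self, client_id: str, state: dict, now: float) -> None:
        # Overwrites in place: no history and no ordering between clients.
        self.states[client_id] = (state, now)

    def active(self, now: float) -> dict:
        return {c: s for c, (s, t) in self.states.items() if now - t <= self.ttl}
```

The log only grows until compaction and is read by cursor, while the
awareness store never holds more than one entry per client.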
Combining them in one table means awareness writes (which are the highest-
frequency operations involving every active client, every poll cycle)
inflate the auto-increment ID space and add rows that the
compaction/cleanup logic needs to work around. The indexing needs are also
different: content updates need (room, id) for cursor-based polling, while
awareness needs (room, client_id) for upsert-by-client lookups.
Separate storage would let each be optimized for its actual workload.
Content updates could stay in the `wp_collaboration` table with auto-
increment cursors and compaction, and that would keep it lean and fast
with no need to scan past expired or current awareness rows. Awareness
could use a second table written with `INSERT ... ON DUPLICATE KEY UPDATE`,
keyed on (room, client_id). Each client only ever writes its own row, so
concurrent updates can't overwrite each other, and it doesn't add write
volume to the content updates table.
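A rough sketch of the split, using Python's built-in `sqlite3` so it runs
anywhere. Everything here is an assumption for illustration: the table
names `collab_updates` and `collab_awareness`, the column names, and the
helper functions are hypothetical, and SQLite's `ON CONFLICT ... DO UPDATE`
stands in for MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Content updates: append-only, polled by a (room, id) cursor.
    CREATE TABLE collab_updates (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        room    TEXT NOT NULL,
        payload BLOB NOT NULL
    );
    CREATE INDEX collab_updates_room_id ON collab_updates (room, id);

    -- Awareness: one row per (room, client_id), overwritten in place.
    CREATE TABLE collab_awareness (
        room       TEXT NOT NULL,
        client_id  TEXT NOT NULL,
        state      TEXT NOT NULL,
        updated_at REAL NOT NULL,
        PRIMARY KEY (room, client_id)
    );
""")


def write_awareness(room, client_id, state, now):
    # Upsert keyed on (room, client_id): a client only touches its own row.
    conn.execute(
        """
        INSERT INTO collab_awareness (room, client_id, state, updated_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (room, client_id)
        DO UPDATE SET state = excluded.state, updated_at = excluded.updated_at
        """,
        (room, client_id, state, now),
    )


def poll_updates(room, cursor):
    # Cursor-based read; awareness rows never appear in this table.
    return conn.execute(
        "SELECT id, payload FROM collab_updates"
        " WHERE room = ? AND id > ? ORDER BY id",
        (room, cursor),
    ).fetchall()
```

In this sketch, repeated awareness writes from one client leave a single
row behind and never consume IDs from the content updates table.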
Replying to [comment:50 czarate]:
> Both `client_id` and `type` are internal implementation details that are
> specific to Yjs and the polling provider. They could change in the future.
> Keeping the storage mechanism opaque to these implementation details
> greatly reduces the risk of breaking changes.
Regardless of the backing implementation (Yjs or otherwise), any
collaborative editing system needs to identify its clients and distinguish
between types of sync data. If the table is holding multiple types of
data, we'll always need to know what type we have, and preferably query
for only the data needed at any given time.
- `client_id`: Any collaboration protocol needs to identify which
participant originated an update. Without a client identifier, we can't
filter a client's own updates out of poll responses, we can't do
per-client awareness state, and we can't attribute changes. The concept of
a client identity is inherent to multiuser editing.
- `type`: The distinction between document updates and awareness/presence
information exists in every collaboration system. These are conceptually
different operations with different lifecycles. Whether we call it `type`,
`event_type`, `channel`, or whatever, we need to distinguish them.
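As a small illustration of why both fields are needed at read time, here
is a sketch of a poll filter in Python. The `SyncRow` shape, the `poll`
helper, and the `"update"`/`"awareness"` labels are hypothetical and not
taken from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncRow:
    id: int
    client_id: str
    type: str  # "update" or "awareness"; labels are hypothetical
    payload: bytes


def poll(rows, cursor, requester, wanted_type):
    """Rows newer than `cursor`, of one type, not written by the requester."""
    return [
        r for r in rows
        if r.id > cursor and r.type == wanted_type and r.client_id != requester
    ]
```

Without `client_id` the requester would get its own updates echoed back,
and without `type` it could not ask for document updates alone.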
Optimizing tables for the use case we have, and know we will have soon,
seems like a good win and doesn't saddle us with more schemas that are
less than ideal. There seem to be several tradeoffs associated with
trying to put both in a single table. Is there an actual mandate to limit
this to one table? Is there a technical or deployment concern with having
two new tables instead of one? Do we think there would be a higher rate of
failure or some other kind of ecosystem fallout if we had two new tables?
--
Ticket URL: <https://core.trac.wordpress.org/ticket/64696#comment:51>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform