[wp-trac] [WordPress Trac] #58998: Prevent races and stampedes when flush_rewrite_rules() is called
WordPress Trac
noreply at wordpress.org
Tue Aug 8 09:04:37 UTC 2023
#58998: Prevent races and stampedes when flush_rewrite_rules() is called
---------------------------+------------------------------
Reporter: iCaleb | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Rewrite Rules | Version: trunk
Severity: normal | Resolution:
Keywords: has-patch | Focuses: performance
---------------------------+------------------------------
Description changed by SergeyBiryukov:
Old description:
> This ticket is more or less a follow-up to
> https://core.trac.wordpress.org/ticket/11384.
>
> Currently when `flush_rewrite_rules()` is called,
> `WP_Rewrite->flush_rules()` will set the `rewrite_rules` option to an
> empty string first. And then afterwards it will go and generate the rules
> and save the results to the option.
>
> On a high traffic site there could be potentially hundreds of requests
> all coming in during this time period where the option is empty
> (considerably more-so if it either takes a particularly long time to
> generate, or if the site is utilizing replica databases). Each request
> during the time period will find `rewrite_rules` empty and attempt to
> generate and save the rules. This is the first of the stampedes that
> result from this behavior. You now have all these frontend requests for a
> period of time consuming more processing time than usual, and each having
> to connect to the primary database to perform a database write (which can
> further delay replication lag).
>
> Eventually that stampede should come to an end, but not before
> potentially triggering other stampedes. Since `rewrite_rules` is stored
> in alloptions, you have that cache key being invalidated and set quite
> frequently because all the requests are touching it. That introduces both
> some fun alloptions race conditions in general, but also makes it
> possible that the object cache (Memcached specifically) can enter into a
> stampede condition.
>
> In short, each memcached add() has to find a correctly-sized slab and
> starts to write it's data into memory. A few things happen in between and
> then at the end the item is linked to the hash table so it becomes the
> official record for that key. However if there is a large amount of add()
> requests all coming in at the same time all wanting to be apart of the
> same memory slab - each will essentially be evicting each other so nobody
> is the winner. This can very quickly cascade the site into severe
> instability when alloptions is not being cached. This is of course a
> problem in it's own right and rewrite_rules is not directly to blame, but
> thought it was worth mentioning to add more context.
>
> Lastly, because a request could be at any point in it's logic when
> `rewrite_rules` is made into an empty string, you open up to the
> possibility of just some weird race conditions with execution order and
> whatnot. A plugin could be doing something admittedly funky for example
> (`WP_Rewrite` leaves all it's properties open to being directly changed),
> resulting in a small percentage of requests possibly coming up with a
> different/partial result for `rewrite_rules` than others. I've seen this
> first-hand a number of times. It's not always obvious bugs either,
> sometimes just very particular load order / hooks / current cached state.
> At best you end up with a few 404s during the initial stampede. At worst,
> the last request to save was the faulty path and now you're stuck with
> incorrect rules until the next flush. And it's very tricky to narrow down
> what happened. We of course can't prevent all incorrect code, but we can
> give it less chances to mess things up.
>
> Ideally, and with a seemingly easy fix with limited to no downside, we
> can eliminate these problems and make the rewrite_rules generation more
> consistent and predictable.
New description:
This ticket is more or less a follow-up to #11384.
Currently when `flush_rewrite_rules()` is called,
`WP_Rewrite->flush_rules()` will set the `rewrite_rules` option to an
empty string first. And then afterwards it will go and generate the rules
and save the results to the option.
On a high traffic site there could be potentially hundreds of requests all
coming in during this time period where the option is empty (considerably
more-so if it either takes a particularly long time to generate, or if the
site is utilizing replica databases). Each request during the time period
will find `rewrite_rules` empty and attempt to generate and save the
rules. This is the first of the stampedes that result from this behavior.
You now have all these frontend requests for a period of time consuming
more processing time than usual, and each having to connect to the primary
database to perform a database write (which can further delay replication
lag).
Eventually that stampede should come to an end, but not before potentially
triggering other stampedes. Since `rewrite_rules` is stored in alloptions,
you have that cache key being invalidated and set quite frequently because
all the requests are touching it. That introduces both some fun alloptions
race conditions in general, but also makes it possible that the object
cache (Memcached specifically) can enter into a stampede condition.
In short, each memcached add() has to find a correctly-sized slab and
starts to write it's data into memory. A few things happen in between and
then at the end the item is linked to the hash table so it becomes the
official record for that key. However if there is a large amount of add()
requests all coming in at the same time all wanting to be apart of the
same memory slab - each will essentially be evicting each other so nobody
is the winner. This can very quickly cascade the site into severe
instability when alloptions is not being cached. This is of course a
problem in it's own right and rewrite_rules is not directly to blame, but
thought it was worth mentioning to add more context.
Lastly, because a request could be at any point in it's logic when
`rewrite_rules` is made into an empty string, you open up to the
possibility of just some weird race conditions with execution order and
whatnot. A plugin could be doing something admittedly funky for example
(`WP_Rewrite` leaves all it's properties open to being directly changed),
resulting in a small percentage of requests possibly coming up with a
different/partial result for `rewrite_rules` than others. I've seen this
first-hand a number of times. It's not always obvious bugs either,
sometimes just very particular load order / hooks / current cached state.
At best you end up with a few 404s during the initial stampede. At worst,
the last request to save was the faulty path and now you're stuck with
incorrect rules until the next flush. And it's very tricky to narrow down
what happened. We of course can't prevent all incorrect code, but we can
give it less chances to mess things up.
Ideally, and with a seemingly easy fix with limited to no downside, we can
eliminate these problems and make the rewrite_rules generation more
consistent and predictable.
--
--
Ticket URL: <https://core.trac.wordpress.org/ticket/58998#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list