[wp-hackers] GSoC Proposal: Caching in core
Thomas Bukowski
wordpress at neodude.net
Mon Mar 31 08:16:00 GMT 2008
Hi all,
I think this has been discussed here before, but I'm trying to
present the whole idea coherently to get more feedback, so I
hope you'll bear with another discussion.
Here's the idea: extend the caching API currently in core
(WP_Object_Cache and the wp_cache_* functions). Build an
object-caching solution that persists between requests and caps the
cache size by evicting the object with the oldest last-accessed date
to make room for a new one (i.e., LRU eviction). Maintain a lookup
table, probably as a cached array, of 'version' numbers for each
table. Key each cached object by the table it comes from and that
table's version at the time. When a table's data changes, bump its
version number; all stale data then becomes inaccessible
automatically, and is eventually removed by the last-accessed-date
eviction described above.
(Credit: the whole idea of versioned tags and tables is Andy's, from
http://andy.wordpress.com/2008/03/01/cache-with-versioned-tags/ )
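To make the versioning idea concrete, here's a minimal sketch in
Python (a stand-in for PHP purely for illustration; none of these
names are existing WP functions). It assumes table versions live in a
plain dict rather than a cached array, and uses an ordered dict so the
least recently accessed entry is the one evicted.

```python
from collections import OrderedDict


class VersionedLRUCache:
    """Hypothetical size-limited cache whose keys embed table versions."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.items = OrderedDict()  # least recently used first
        self.table_versions = {}    # assumption: plain dict, table -> version

    def _key(self, key, table):
        # The effective key includes the table's *current* version.
        version = self.table_versions.get(table, 0)
        return f"{table}:{version}:{key}"

    def get(self, key, table):
        full_key = self._key(key, table)
        if full_key not in self.items:
            return None
        self.items.move_to_end(full_key)  # mark as most recently accessed
        return self.items[full_key]

    def set(self, key, table, value):
        full_key = self._key(key, table)
        self.items[full_key] = value
        self.items.move_to_end(full_key)
        while len(self.items) > self.max_items:
            self.items.popitem(last=False)  # evict least recently accessed

    def bump_version(self, table):
        # Old entries become unreachable; eviction removes them later.
        self.table_versions[table] = self.table_versions.get(table, 0) + 1
```

Note that bumping a version never deletes anything directly: old
entries simply stop matching and age out through eviction.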
That's the API. It will need two extension points: where to store
the cache (database, memory, file, etc.), and what to cache (pages,
serialized objects, etc.).
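A rough sketch of those two extension points, again in Python with
hypothetical class names: a storage backend decides where data goes,
and a serializer decides how objects become storable data. JSON is
only a stand-in here for PHP's serialize().

```python
import json
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical extension point: where to cache (DB, memory, file...)."""

    @abstractmethod
    def read(self, key): ...

    @abstractmethod
    def write(self, key, data): ...


class MemoryBackend(StorageBackend):
    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key)

    def write(self, key, data):
        self._data[key] = data


class JsonSerializer:
    """Hypothetical extension point: what to cache and how to store it."""

    def dump(self, value):
        return json.dumps(value)

    def load(self, data):
        return json.loads(data)


class Cache:
    def __init__(self, backend, serializer):
        self.backend = backend
        self.serializer = serializer

    def set(self, key, value):
        self.backend.write(key, self.serializer.dump(value))

    def get(self, key):
        data = self.backend.read(key)
        return None if data is None else self.serializer.load(data)
```

Swapping MemoryBackend for a database or file backend wouldn't touch
the Cache class at all, which is the point of keeping the two apart.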
Next phase: on top of that, build a database-backed object cache
plugin, shipped with WP and enabled by default. Why? Because it's the
only solution that will work on every setup that can run WP (i.e., no
Apache/mod_rewrite, no directory write access, etc.). There are
serious performance considerations, which I try to address below.
Pre-fetch a set of common objects (perhaps pages could hint which
objects to fetch) to drastically cut down on round trips between the
database and the cache class. (Perhaps this should be a feature of
the API instead, though each storage backend should be able to switch
it off; memcached, for example, probably doesn't need prefetching.)
On each request, record what, if anything, was changed, and bump the
versions of the changed tables.
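Here's a rough sketch of the database-backed cache with prefetching.
It assumes SQLite as a stand-in for MySQL and a hypothetical
cache_items table; the point is that one IN (...) query replaces a
query per object, and that changed tables get their versions bumped
once, at the end of the request.

```python
import sqlite3


class DatabaseCache:
    """Hypothetical DB-backed cache; SQLite stands in for MySQL."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS cache_items "
            "(cache_key TEXT PRIMARY KEY, cache_value TEXT)"
        )
        self.local = {}              # values already loaded this request
        self.changed_tables = set()  # tables written to this request
        self.versions = {}           # table -> version (persisted elsewhere in practice)
        self.queries = 0             # database round trips, for illustration

    def put(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO cache_items (cache_key, cache_value) VALUES (?, ?)",
            (key, value),
        )
        self.local[key] = value

    def prefetch(self, keys):
        # One query for the whole hinted set instead of one per object.
        keys = list(keys)
        if not keys:
            return
        marks = ",".join("?" * len(keys))
        rows = self.conn.execute(
            f"SELECT cache_key, cache_value FROM cache_items WHERE cache_key IN ({marks})",
            keys,
        ).fetchall()
        self.queries += 1
        self.local.update(rows)

    def get(self, key):
        if key in self.local:
            return self.local[key]
        row = self.conn.execute(
            "SELECT cache_value FROM cache_items WHERE cache_key = ?", (key,)
        ).fetchone()
        self.queries += 1
        return row[0] if row else None

    def note_change(self, table):
        self.changed_tables.add(table)

    def end_request(self):
        # Bump each changed table once, then forget per-request state.
        for table in self.changed_tables:
            self.versions[table] = self.versions.get(table, 0) + 1
        self.changed_tables.clear()
        self.local.clear()
```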
(Another idea: perhaps the hinting data could be collected
automatically by the caching engine as it receives requests for data.
The first time each type of page is accessed it would be a little
slow, but fast from then on. The hints could then be stored in the
cache itself (or somewhere more permanent), and this solution would
adapt to plugins' data access automatically.)
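The automatic hinting might look roughly like this. All names are
hypothetical, and page types are assumed to be simple strings such as
"single" or "archive".

```python
from collections import defaultdict


class HintRecorder:
    """Hypothetical recorder that learns which keys each page type uses."""

    def __init__(self):
        self.hints = defaultdict(set)  # page type -> keys seen on past requests
        self._current_type = None
        self._seen = set()

    def start_request(self, page_type):
        self._current_type = page_type
        self._seen = set()
        # Keys to prefetch; empty the first time a page type is seen.
        return sorted(self.hints[page_type])

    def record_access(self, key):
        self._seen.add(key)

    def end_request(self):
        # Merge this request's accesses into the hints for its page type.
        self.hints[self._current_type] |= self._seen
```

In this sketch hints only ever grow, and keys tied to one specific
post would need generalizing before they help other pages of the same
type.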
I'm not sure whether versioning the caches by *tables* would be
feasible; I don't know how difficult it will be to trace which tables
modified data came from. Perhaps a different unit of versioning would
be better; using the 'groups' in WP_Object_Cache may be the logical
solution. Something to investigate, obviously.
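One possible way to trace which tables changed, sketched here under
loose assumptions, is to inspect each write query's SQL and pull out
its target table. The pattern below is hypothetical and deliberately
rough (it ignores multi-table UPDATEs, modifiers like LOW_PRIORITY,
subqueries, and so on). WordPress's `query` filter, which sees each
SQL string before $wpdb runs it, might be one place to hook something
like this.

```python
import re

# Hypothetical, rough pattern for the table a write statement targets.
# Ignores multi-table UPDATEs, modifiers, subqueries, odd quoting.
WRITE_TABLE = re.compile(
    r"^\s*(?:INSERT\s+(?:IGNORE\s+)?INTO|REPLACE(?:\s+INTO)?|UPDATE|DELETE\s+FROM)"
    r"\s+`?(\w+)`?",
    re.IGNORECASE,
)


def modified_table(sql):
    """Return the table a write query touches, or None for reads/unknown."""
    match = WRITE_TABLE.match(sql)
    return match.group(1) if match else None


def on_query(sql, versions):
    # Bump the version of whichever table this query writes to, if any.
    table = modified_table(sql)
    if table is not None:
        versions[table] = versions.get(table, 0) + 1
```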
With this, full-page caching becomes feasible: tag each full page's
cache entry with the tables (or whatever) it depends on. We could
take WP Super Cache, adapt it to this new API, and perhaps ship it
with WP, disabled by default, as a caching plugin. There's a lot more
to explore with full-page caching; there may be some merit in marking
sections of a cached page as dynamic (widgets?). Then all pages could
be cached, and only the truly dynamic data would need to be
regenerated by the server on each hit. I might have time to start
investigating this, but an implementation will probably be out of
scope for this particular GSoC proposal.
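A sketch of full-page entries tagged with their dependencies, with
dynamic sections filled in on each hit. The <!--dynamic:name-->
placeholder syntax and all names are hypothetical. Each entry records
the version of every table it depends on when stored, and is treated
as stale once any of them has been bumped.

```python
import re


class PageCache:
    """Hypothetical full-page cache keyed by URL, tagged with dependencies."""

    # Hypothetical marker for a section regenerated on every hit.
    PLACEHOLDER = re.compile(r"<!--dynamic:(\w+)-->")

    def __init__(self, versions):
        self.versions = versions  # shared table -> version map
        self.pages = {}           # url -> (html, {table: version when stored})

    def store(self, url, html, depends_on):
        snapshot = {t: self.versions.get(t, 0) for t in depends_on}
        self.pages[url] = (html, snapshot)

    def fetch(self, url, dynamic_parts):
        entry = self.pages.get(url)
        if entry is None:
            return None
        html, snapshot = entry
        # Stale if any dependency has changed since the page was stored.
        if any(self.versions.get(t, 0) != v for t, v in snapshot.items()):
            del self.pages[url]
            return None
        # Regenerate only the dynamic sections for this hit.
        return self.PLACEHOLDER.sub(lambda m: dynamic_parts[m.group(1)](), html)
```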
This idea involves a lot of infrastructure work and will need
community approval; it'll go slowly, since caching is tricky to get
right, and we, of course, want to get it right :)
What do you all think?
Thank you all,
thomas.