[wp-hackers] GSoC Proposal: Caching in core

Thomas Bukowski wordpress at neodude.net
Mon Mar 31 08:16:00 GMT 2008


Hi all,
Though I think this has been discussed here before, I'm trying to
present the whole idea coherently to solicit some more feedback, so I
hope you'll bear with another discussion.

Here's the idea: extend the caching API (WP_Object_Cache and  
wp_cache_* funcs) currently in the core. Build an object-caching
solution that persists between requests and limits the cache size by
invalidating the object with the oldest last-accessed date to make  
space for a new one. Maintain a lookup table, probably as a cached  
array, of 'version' numbers of each table. Hash cached objects with  
which table they come from and its version at the time. When a table  
gets its data changed, bump up the version number; all stale data will  
automatically be inaccessible, and hence removed automatically with  
the last-accessed-date invalidation as above.
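
A minimal sketch of the versioned-key idea, written in Python rather
than PHP purely for illustration. The class and method names
(VersionedCache, bump, and so on) are hypothetical and are not part of
WP_Object_Cache; the point is only to show how folding the table
version into the key makes stale entries unreachable, so that
last-accessed eviction cleans them up eventually.

```python
from collections import OrderedDict


class VersionedCache:
    """LRU cache whose keys embed the version of the table they depend on.

    Hypothetical illustration only; not the WP_Object_Cache API.
    """

    def __init__(self, max_items=1000):
        self.max_items = max_items
        self.items = OrderedDict()  # least recently used entry comes first
        self.versions = {}          # table name -> current version number

    def _key(self, table, name):
        # The current table version is part of the key, so entries written
        # under an older version can no longer be looked up.
        return (table, self.versions.get(table, 0), name)

    def get(self, table, name, default=None):
        key = self._key(table, name)
        if key not in self.items:
            return default
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, table, name, value):
        key = self._key(table, name)
        self.items[key] = value
        self.items.move_to_end(key)
        while len(self.items) > self.max_items:
            # Evict the entry with the oldest last access.
            self.items.popitem(last=False)

    def bump(self, table):
        # Call this whenever data in `table` changes.
        self.versions[table] = self.versions.get(table, 0) + 1
```

Nothing is deleted when a version is bumped; the old entries simply
stop being hit, drift to the cold end of the list, and are evicted
when space is needed.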

(Idea credit: the whole business with versioning tags and tables is
Andy's idea, from
http://andy.wordpress.com/2008/03/01/cache-with-versioned-tags/ )

That's the API. There'll need to be two ways you can extend it: where  
to cache things to (database, memory, file, etc), and what to cache  
(pages, serialized objects, etc).

Next phase: on top of that, build a database-backed object cache  
plugin, shipped with WP, enabled by default. Why? It's the only
solution that will work on any setup that will run WP (e.g., no
apache/mod_rewrite, no directory write access, etc). There are serious
performance considerations, which I attempt to address below.
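
To make the database-backed storage concrete, here is a rough sketch
in Python, with sqlite3 standing in for the MySQL database WP would
actually use. The table name (object_cache), its columns, and the
class name are all assumptions made up for this example. Values are
serialized before storage, and rows beyond a fixed limit are dropped
in order of oldest last access.

```python
import pickle
import sqlite3
import time


class DatabaseCacheStore:
    """Object cache persisted in a database table (hypothetical schema)."""

    def __init__(self, conn, max_rows=1000):
        self.conn = conn
        self.max_rows = max_rows
        conn.execute(
            "CREATE TABLE IF NOT EXISTS object_cache ("
            "cache_key TEXT PRIMARY KEY, value BLOB, last_accessed REAL)"
        )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT value FROM object_cache WHERE cache_key = ?", (key,)
        ).fetchone()
        if row is None:
            return default
        # Record the access so eviction can find the coldest rows.
        self.conn.execute(
            "UPDATE object_cache SET last_accessed = ? WHERE cache_key = ?",
            (time.time(), key),
        )
        self.conn.commit()
        return pickle.loads(row[0])

    def set(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO object_cache VALUES (?, ?, ?)",
            (key, pickle.dumps(value), time.time()),
        )
        # Keep only the max_rows most recently accessed rows.
        self.conn.execute(
            "DELETE FROM object_cache WHERE cache_key NOT IN ("
            "SELECT cache_key FROM object_cache "
            "ORDER BY last_accessed DESC, rowid DESC LIMIT ?)",
            (self.max_rows,),
        )
        self.conn.commit()


store = DatabaseCacheStore(sqlite3.connect(":memory:"), max_rows=2)
store.set("posts:v3:post:1", {"title": "Hello world"})
```

Note that every get here costs a SELECT plus an UPDATE, which is
exactly the kind of overhead the performance concerns refer to.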

Pre-fetch a set of common objects (maybe pages can hint which objects  
to fetch) to drastically cut down on back-and-forth between the  
database and the cache class. (Maybe this should be a feature in the
API instead, though the storage method should be able to switch it off:
storage method - memcached, for example, probably doesn't need  
prefetching.) On each request, remember what, if anything, was  
changed, and bump up the version of the changed tables.

(Another idea: perhaps the hinting data can be collected automatically  
by the caching engine as it gets requests for data. The first time  
each type of page is accessed it would be a little slow, but pretty  
fast from then on. The hints could then be stored in the cache itself  
(or somewhere else more permanent), and this solution would adapt to  
plugins' data access automatically.)
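
A small sketch of how automatic hint collection might look, again in
Python for illustration only. Everything here is an assumption: the
PrefetchingCache name, the notion of a "page type" string, and a
backend exposed as a fetch_many callable that returns a dict for a
list of keys. The first request for a page type pays one round trip
per object; later requests fetch all hinted objects in a single call.

```python
class PrefetchingCache:
    """Learns which keys each page type uses and prefetches them in bulk."""

    def __init__(self, fetch_many):
        self.fetch_many = fetch_many  # callable: list of keys -> {key: value}
        self.hints = {}               # page type -> set of keys seen so far
        self.local = {}               # objects already loaded for this request
        self.page_type = None

    def begin_request(self, page_type):
        self.page_type = page_type
        self.local = {}
        hinted = sorted(self.hints.get(page_type, ()))
        if hinted:
            # One round trip for everything this page type asked for before.
            self.local.update(self.fetch_many(hinted))

    def get(self, key):
        # Remember the request so the next hit on this page type prefetches it.
        self.hints.setdefault(self.page_type, set()).add(key)
        if key not in self.local:
            self.local.update(self.fetch_many([key]))
        return self.local.get(key)
```

In a real implementation the hints would have to be persisted between
requests, in the cache itself or somewhere more permanent; this sketch
keeps them in memory only.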

I'm not sure if versioning the caches with *tables* would be feasible  
- I don't know how difficult it will be to trace which tables modified  
data came from. Perhaps a different metric would be better; perhaps  
using the 'groups' in WP_Object_Cache is the logical solution.  
Something to investigate, obviously.

With this, full-page caching becomes feasible, namely by tagging each
full-page cache entry with the tables/whatever it depends on. Maybe
take WP Super Cache, modify it to fit this new API, maybe then ship it  
with WP, disabled by default, as a caching plugin. There's a lot more  
to fiddle with full-page caching; there may be some merit in marking  
sections of a cached page as dynamic (widgets?); then, all pages can  
be cached, and only the truly dynamic data will need to be regenerated  
per hit by the server. I think I might have time to start
investigating this, but an implementation will probably be out of
scope for this specific GSoC idea per se.
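
For what it's worth, tagging a full-page entry with its dependencies
could be as simple as folding the version of every table the page
depends on into its cache key. A Python sketch, where the function
name and the choice of md5 for the key are just assumptions for the
example:

```python
import hashlib


def page_cache_key(url, depends_on, versions):
    """Build a page cache key that changes when any dependency changes.

    `versions` maps table name -> current version number; tables that
    have never been bumped count as version 0.
    """
    parts = [url] + [f"{t}:{versions.get(t, 0)}" for t in sorted(depends_on)]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


versions = {"posts": 3, "comments": 7}
key_before = page_cache_key("/2008/03/hello/", ["posts", "comments"], versions)
versions["comments"] += 1  # a new comment arrives
key_after = page_cache_key("/2008/03/hello/", ["posts", "comments"], versions)
```

After the bump, key_after differs from key_before, so the old page is
never served again and ages out like any other stale entry.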

This idea involves a lot of infrastructure fiddling and will need
community approval; it'll go slow, since caching is tricky to get
right, and we, of course, want to get it right :)

What do you all think?

Thank you all,
thomas.

