[wp-hackers] GSoC Project: Performance Pre-application
Jacob Santos
wordpress at santosj.name
Wed Mar 12 04:30:31 GMT 2008
I've been thinking about it a lot, and one area I need to brush up on is
building high-performance PHP applications. I know a few things about
the subject, but I think I can learn a lot more from the mentors. I've
decided that if I'm accepted and you guys will have me, I'd like to take
up the Performance project for the GSoC.
Here are some notes that I came up with. I'd like to get a better
understanding of the project before going further. (The notes might also
be useful to another student who wants to do it, or if I'm not accepted.
However, the rules do say that no groups can work together.)
* A testing suite that measures performance of various components and
can be regularly run against new code.
I was contemplating this and I think the general solution is to build a
plugin which hooks into shutdown and grabs the Xdebug profiler output
file that is created (um, Xdebug would have to be enabled, of course). By
pulling this in and building a parser (how hard can that be?), it should
be possible to build a table of the work flow and find trouble areas
during normal visits to the site. This solution also makes it possible
to run Apache or PHP site tools to "visit" the WordPress pages, which is
needed since not all of the code is executed during any one run.
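To give a rough idea of how small such a parser could be, here is a minimal sketch in Python (chosen only to keep the example language-neutral; the real plugin would be PHP). It assumes the profiler writes the cachegrind-style text format, with "fn=" lines naming a function, cost lines of the form "line cost", and a "calls=" line followed by one line holding the cost of the called function. The names parse_self_costs and hottest are made up for illustration.

```python
import re
from collections import defaultdict

# Optional name compression: "(3) name" defines id 3, a bare "(3)" reuses it.
NAME_RE = re.compile(r"^\((\d+)\)\s*(.*)$")


def _resolve(token, names):
    """Return the function name for a raw "fn=" value."""
    match = NAME_RE.match(token)
    if not match:
        return token
    ident, name = match.group(1), match.group(2)
    if name:
        names[ident] = name
    return names.get(ident, token)


def parse_self_costs(text):
    """Sum the self cost per function from cachegrind-style profile text.

    Assumption: the cost line right after a "calls=" line belongs to the
    called function, so it is skipped here and only self cost is counted.
    """
    names = {}
    totals = defaultdict(int)
    current = None
    skip_next_cost = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current function block
        elif line.startswith("fn="):
            current = _resolve(line[3:], names)
            skip_next_cost = False
        elif line.startswith("calls="):
            skip_next_cost = True
        elif line[0].isdigit() and current is not None:
            if skip_next_cost:
                skip_next_cost = False
                continue
            parts = line.split()
            if len(parts) >= 2 and parts[1].isdigit():
                totals[current] += int(parts[1])
    return totals


def hottest(totals, limit=10):
    """Return (function, self cost) pairs, most expensive first."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Feeding the totals from many page visits into hottest() would give the table of trouble areas described above. Counting inclusive cost and call counts would need a little more bookkeeping.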
The main areas of concern are the Plugin API (not much you can optimize,
but who can tell?), Gettext, the Post API, and the Template APIs. Also,
it will be useful to find out which functions are called the most
(besides the Plugin API) to decide which functions to try to optimize
first. Another problem area is long-running functions (a relative term,
meaning a large percentage of the run time compared to everything else).
PHPUnit is good for manually running suspect functions to check their
performance, but building the custom plugin and system might serve
better than writing test suites for each and every function in WordPress.
* Measure the performance gain of having a build system which combines
all major WordPress code into one file.
Given a recent discussion, it might be reasonable to measure the
performance gains of having one (or two) WordPress library files and a
build system (probably separate from the official one) that handles it.
* Finding sections of the WordPress code base that particularly don't
scale well under load.
The first item would do the finding; the second part would be providing
patches which address any problems that come up.
* Review of the object caching system to natively support pre-fetching
and zero-query WordPress pages (without wp-cache)
Given the recent discussion on the preloading pattern, I think it might
be worth investigating which configuration terms are used the most often
and writing a plugin which writes them out as PHP defines (something
suggested during a recent discussion, unsure of who it was).
Some of what WP Super Cache does can be incorporated into the core,
namely writing HTML files to a directory and loading those files, with
hooks for plugins and whatever.
I've almost come to the conclusion that this item belongs in the
Integrated Caching Solutions GSoC project instead. I think doing tests
with the mysqlnd extension to see where SQL optimizations can be made is
fine, but the scope of this shouldn't extend to WordPress caching.
The API could be improved to set up theme sections which can be cached
for different lengths of time (if a plugin exists to implement the
method), which means that once a post has been through the filters and
plugins, it will be cached and pulled in by the API. It'll probably be
best for it to work like the Widget API and be as simple as possible. It
would probably be best in the other project also.
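As a rough illustration of theme sections cached for different lengths of time, here is a minimal sketch in Python (language-neutral on purpose, not WordPress code). The SectionCache class, its get() method and the in-memory dictionary are all assumptions made for the example. A real implementation would hand storage off to whatever cache plugin is installed.

```python
import time


class SectionCache:
    """Cache rendered theme sections, each with its own lifetime."""

    def __init__(self, clock=time.time):
        # The clock is injectable so the expiry logic can be tested.
        self._clock = clock
        self._store = {}  # section name -> (expires_at, html)

    def get(self, name, ttl, render):
        """Return cached HTML for a section, re-rendering once it expires.

        name   -- hypothetical section identifier, e.g. "sidebar"
        ttl    -- lifetime in seconds for this particular section
        render -- callable producing the HTML (filters already applied)
        """
        now = self._clock()
        hit = self._store.get(name)
        if hit is not None and hit[0] > now:
            return hit[1]
        html = render()
        self._store[name] = (now + ttl, html)
        return html
```

A theme could then ask for a "sidebar" section with a lifetime of an hour and a "recent comments" section with a lifetime of a minute, and the render callable would only run when the cached copy has expired.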
* If proper HTTP headers were sent from WordPress pages then external
squid-like proxy caches could be used to vastly improve performance.
Totally. I think "304 Not Modified" responses and "Last-Modified"
headers could improve the experience for sites which use WordPress as a
CMS. It will need hooks to allow dynamic content plugins to change the
headers, so that the browser will be updated with their content. I've
heard that caching is probably best left to the browsers (I'm unsure if
I agree, until I see more evidence).
It will also help with Search Engine Optimization, since it will tell
search engines whether or not a page they have visited has been updated.
That makes it useful for blogs also; however, the most gain will
probably be in pages.
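The "Not Modified" handshake can be sketched as follows. This is a minimal Python illustration of an HTTP conditional GET, not WordPress code. The function name conditional_response and the filters argument (standing in for the hooks that dynamic content plugins would use) are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(request_headers, last_modified, filters=()):
    """Return (status, headers) for a page last changed at last_modified.

    last_modified must be a timezone-aware datetime. Each callable in
    filters (a stand-in for plugin hooks) may return a newer timestamp,
    for example when a plugin injects dynamic content into the page.
    """
    for plugin_filter in filters:
        last_modified = plugin_filter(last_modified)

    # HTTP dates have one-second resolution and are expressed in GMT.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            since_dt = None  # unparseable date: fall through to a full response
        if since_dt is not None and since_dt.tzinfo is not None:
            if last_modified <= since_dt:
                return 304, headers  # client copy is current, send no body
    return 200, headers
```

A browser, proxy cache or search engine crawler that sends back the Last-Modified value it saw earlier gets the short 304 answer until the page, or a plugin's dynamic content, actually changes.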
* Currently WP loads all its code on every page, could a selective code
loading scheme improve performance in a meaningful way?
The answer is no. Any function which is defined conditionally will only
be compiled at run time, which hinders opcode caching. It should
generally be assumed that helping opcode caching extensions to cache
files and functions in the first stage is acceptable practice. The
pluggable functions work with the WordPress library because the code in
the library isn't executed (except in rare cases) until after the
function call is made, which is usually after the pluggable file has
been included and its functions made available.
The pluggable functions will still only be compiled at run time.
Therefore, what could be done to optimize the code for those who do not
use plugins is to use a Registry (I know I have the pattern wrong. What
is the pattern called which allows you to... Dependency Injection! I'm
unsure whether that is the method rather than the pattern name. Should
probably edit this out, but I could still be wrong). Regardless, what it
does is exactly how the Plugin API works, except that you can only have
one function registered at a time. So the pluggable functions will not
have conditional checks around them, which allows them to be compiled,
but the function can still be replaced by a plugin.
The Dependency Injection API will allow the pluggables to be compiled
before run time, allowing an opcode cache to save the functions during
the first stage. It won't break backwards compatibility if the functions
are renamed. A possible problem is that instead of the first plugin
loaded gaining control of a pluggable function, it will be the last one
which gains control of it. A possible solution is to run it through a
filter and only use the function which is returned.
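The registry idea can be sketched as follows, in Python for neutrality rather than as actual WordPress code. The class and method names are hypothetical. To keep the current "first plugin wins" behaviour, this sketch simply ignores later replacements. That is one possible policy and a different technique from the filter approach suggested above.

```python
class PluggableRegistry:
    """Hold one implementation per pluggable function name."""

    def __init__(self):
        self._defaults = {}   # core implementations, defined unconditionally
        self._overrides = {}  # at most one plugin replacement per name

    def define(self, name, func):
        """Register the core implementation (no conditional check needed)."""
        self._defaults[name] = func

    def replace(self, name, func):
        """Let a plugin take over a function; the first caller wins."""
        if name in self._overrides:
            return False  # an earlier plugin already claimed this name
        self._overrides[name] = func
        return True

    def call(self, name, *args, **kwargs):
        """Run the plugin's replacement if there is one, else the default."""
        func = self._overrides.get(name) or self._defaults[name]
        return func(*args, **kwargs)
```

Core code would call define() once for each pluggable function (wp_mail, say) at load time, a plugin would call replace() with its own version, and every caller would go through call(). The cost is one extra lookup per call, which would need measuring against the gain from opcode caching.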
* PerformancePress: WordPress PHP Extension
You have three months, what better way to spend some of that time than
writing C and C++ code? There has been one person who has said that if
an extension existed, he wouldn't mind working on it also. I'm unsure if
he or she will follow up on that (not that it matters, I haven't been
too keen on actually going through with writing it, so I'm the only one
to blame).
The Plugin API almost needs to be compiled in C/C++ in order to be fully
optimized. The Plugin API has been stable for a number of years and
barring any minor corrections, enhancements, and bug fixes I think it
would fit great in a PHP Extension.
That said, I think every internal WordPress function can be moved to the
PHP Extension. Also, except for any function which uses database or
external library calls, I think the most used functions (see item 1) can
be ported over to the extension. Many functions in WordPress are fairly
stable and should be fairly safe to move to a PHP Extension.
Furthermore, during my research, I've found many 1:1 PHP Extension areas
which work just like you would expect in PHP. To explain further, many
PHP functions are just a simple macro call away, and you can call
"userland" functions from the extension also. Technically, you can make
mysql calls or, more accurately, calls to PHP globals (which is where
the wpdb class resides). However, I'd like to keep the extension as
simple as possible during the Google Summer of Code, if this section of
the project is accepted. (Note: the short and simple of this long
paragraph is that in a lot of areas, porting functions over shouldn't be
that time consuming. Ha ha, just kidding, but seriously, it could be
worse.)
One disadvantage of using Zend Engine macros is that the performance
gained won't be as much as with native C++ STL-style lists or vectors.
The Zend Engine zval hash is pretty well geared to performance, but it
couldn't outperform a well-written list when a list should be used
instead of a hash. On the other hand, the macros make it possible for a
n00b to actually write PHP Extensions at all, which is a pretty major
advantage if you ask me.
Please note, the "PerformancePress" name is just for marketing. I will
probably call it WordPress Library PHP Extension.
I think this part will be a success if only the Plugin API is ported over.
--
Jacob Santos
http://www.santosj.name - blog
http://funcdoc.wordpress.com - WordPress Documentation Blog/Guide Licensed under GPLv2
Also known as darkdragon and santosj on WP trac.