[wp-hackers] GSoC Project: Performance Pre-application

Jacob Santos wordpress at santosj.name
Wed Mar 12 04:30:31 GMT 2008


I've been thinking about it a lot, and one area I need to brush up on
is building high-performance PHP applications. I know a few things about
the subject, but I think I can learn a lot more from the mentors. I've
decided that if I'm accepted and you guys will have me, I'd like to take
up the Performance project for GSoC.

Here are some notes that I came up with. I'd like to get a better
understanding of the project. (The notes might also be useful to another
student who wants to do it, if I'm not accepted. However, the rules do
say that students can't work on a project as a group.)

* A testing suite that measures performance of various components and 
can be regularly run against new code.

I was contemplating this, and I think the general solution is to build
a plugin that hooks into the shutdown action and grabs the profiler
output file that Xdebug creates (Xdebug would have to be enabled, of
course). By reading this file in and writing a parser for it (how hard
can that be?), it should be possible to build a table of the workflow
and find trouble spots during normal visits to the site. This approach
also makes it possible to use Apache or PHP tools to "visit" many
WordPress pages, which matters because not all of the code is executed
during any one run.
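A parser for the profiler output could start as small as the sketch below. It assumes the cachegrind text format that Xdebug's profiler writes (`fn=`, `cfn=` and `calls=` lines followed by `<line> <cost>` lines, plus the `(id)` name compression some Xdebug versions use) and reads only the first cost column. It is written in Python for brevity; the function names are hypothetical, and the real plugin would of course be PHP.

```python
import re
from collections import defaultdict

# "(12) name" defines compressed id 12; a bare "(12)" refers back to it.
_COMPRESSED = re.compile(r"^\((\d+)\)(?: (.*))?$")


def _resolve(value, names):
    """Expand cachegrind name compression, remembering newly defined ids."""
    match = _COMPRESSED.match(value)
    if not match:
        return value
    ident, name = match.group(1), match.group(2)
    if name is not None:
        names[ident] = name
        return name
    return names.get(ident, value)


def parse_cachegrind(lines):
    """Return (self_cost, call_counts) dictionaries keyed by function name.

    Assumes absolute line positions and uses only the first event column
    (e.g. Time). A cost line that follows a calls= line is the inclusive
    cost of that call, so it is not added to the caller's self cost.
    """
    self_cost = defaultdict(int)
    call_counts = defaultdict(int)
    names = {}  # fn= and cfn= share one compressed-name table
    current = callee = None
    after_calls = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("fn="):
            current = _resolve(line[3:], names)
        elif line.startswith("cfn="):
            callee = _resolve(line[4:], names)
        elif line.startswith("calls="):
            call_counts[callee] += int(line[6:].split()[0])
            after_calls = True
        elif line[:1].isdigit() and current is not None:
            parts = line.split()
            cost = int(parts[1]) if len(parts) > 1 else 0
            if after_calls:
                after_calls = False  # inclusive cost of the call just listed
            else:
                self_cost[current] += cost
    return dict(self_cost), dict(call_counts)
```

Feeding it the lines of one profiler file yields the raw numbers for the "table of the workflow"; summing the dictionaries across many files gives totals for a whole crawl of the site.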

The main areas of concern are the Plugin API (there isn't much you can
optimize, but who can tell?), Gettext, the Post API, and the Template
APIs. It will also be useful to find out which functions are called the
most (besides the Plugin API) to decide which functions to optimize
first. Another problem area is long-running functions ("long" being
relative: a large percentage of the total time compared to everything
else).
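Once per-function totals exist, choosing what to optimize first comes down to sorting them. The Python sketch below uses a hypothetical helper that takes self-cost and call-count dictionaries like those a profiler parser would produce. It leaves out names in an exclusion set (for example the Plugin API functions) and reports each remaining function's share of total self time, so "long-running" is measured relative to everything else.

```python
def hot_spots(self_cost, call_counts, exclude=frozenset(), top=5):
    """Rank functions by call count and by share of total self time.

    self_cost:   {function name: self cost in the profiler's unit}
    call_counts: {function name: number of calls}
    exclude:     names left out of both rankings (e.g. the Plugin API)
    Percentages are of the whole run, including excluded functions.
    """
    total = sum(self_cost.values()) or 1  # avoid dividing by zero
    most_called = sorted(
        ((name, n) for name, n in call_counts.items() if name not in exclude),
        key=lambda item: (-item[1], item[0]),
    )[:top]
    longest = sorted(
        ((name, 100.0 * cost / total)
         for name, cost in self_cost.items() if name not in exclude),
        key=lambda item: (-item[1], item[0]),
    )[:top]
    return most_called, longest
```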

PHPUnit is good for manually benchmarking suspect functions, but
building the custom plugin and system might serve better than writing
test suites for each and every function in WordPress.
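For the handful of suspect functions that do get a dedicated benchmark, the useful pattern is "best of several batches". The Python sketch below uses the standard `timeit` module as a stand-in for whatever PHP harness would actually be used. The helper name is hypothetical.

```python
import timeit


def best_time(func, *args, number=1000, repeat=5):
    """Fastest average seconds per call over `repeat` batches of `number` calls.

    Taking the minimum batch discards runs slowed down by other processes.
    """
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number
```

Timing an old and a new implementation with the same inputs gives a rough speed ratio; the absolute numbers depend on the machine.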

* Measure the performance gain of having a build system which combines
all major WordPress code into one file.

Given a recent discussion, it might be reasonable to measure the
performance gains of having one (or two) WordPress library files and a
build system (probably separate from the official one) that generates
them.
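A build step for this could be as simple as concatenating the library files in their include order. The Python sketch below uses a hypothetical helper. It assumes each file starts with a `<?php` open tag and contains no inline HTML, and it assumes the `require`/`include` statements between the merged files are removed or guarded separately. It strips each file's open tag and optional trailing `?>` so the pieces join into one PHP file.

```python
import re

_OPEN_TAG = re.compile(r"^\s*<\?php\b")
_CLOSE_TAG = re.compile(r"\?>\s*$")  # only a ?> at the very end of a file


def combine_php(sources):
    """Merge PHP sources, given in include order, into one file body.

    sources: list of (filename, code) pairs.
    Assumes pure-PHP files: one leading <?php and at most one trailing ?>.
    """
    parts = ["<?php"]
    for filename, code in sources:
        if not _OPEN_TAG.match(code):
            raise ValueError(f"{filename}: expected a leading <?php tag")
        body = _OPEN_TAG.sub("", code, count=1)
        body = _CLOSE_TAG.sub("", body, count=1)
        parts.append(f"// --- begin {filename} ---")
        parts.append(body.strip())
    return "\n".join(parts) + "\n"
```

Benchmarking a request that loads the combined file against one that loads the separate files, with and without an opcode cache, would give the measurement the item asks for.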

* Finding sections of the WordPress code base that scale particularly
poorly under load.

The first item would do the finding; the second part would be providing
patches that address any problems that come up.

* Review of the object caching system to natively support pre-fetching 
and zero-query WordPress pages (without wp-cache)

Given the recent discussion on the preloading pattern, I think it might
be worth investigating which configuration options are used most often
and writing a plugin that writes them out as PHP defines (this was
suggested during a recent discussion; I'm unsure who suggested it).
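The define-writing plugin would essentially be a small code generator that runs whenever an option changes. The Python sketch below shows the idea. The `WP_OPT_` constant prefix and the idea of writing the result to a file included early in the bootstrap are assumptions for illustration, not an existing WordPress convention. Values are emitted as single-quoted PHP strings, with backslashes and single quotes escaped.

```python
import re


def php_string(value):
    """Quote a value as a single-quoted PHP string literal."""
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def options_to_defines(options, prefix="WP_OPT_"):
    """Render {option_name: value} as PHP define() calls.

    Option names are upper-cased, and any character that is not valid in
    a constant name becomes an underscore. Values are written as strings.
    """
    lines = ["<?php", "// Generated file: do not edit."]
    for name in sorted(options):
        constant = prefix + re.sub(r"[^A-Za-z0-9_]", "_", name).upper()
        lines.append(f"define({php_string(constant)}, {php_string(options[name])});")
    return "\n".join(lines) + "\n"
```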

Some of what WP Super Cache does could be incorporated into the core:
writing HTML files to a directory and loading those files, with hooks
for plugins and so on.
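The core of that approach is mapping a request path to a file on disk and serving the file while it is fresh. The Python sketch below uses hypothetical names and a plain max-age check. It leaves out what a real implementation needs, such as cookies for logged-in users, query strings, invalidation when a post is edited, and the plugin hooks mentioned above.

```python
import hashlib
import os
import time


class PageCache:
    """Store rendered HTML pages as files, one per request path."""

    def __init__(self, directory, max_age=300):
        self.directory = directory
        self.max_age = max_age  # seconds before a cached page is stale
        os.makedirs(directory, exist_ok=True)

    def _path(self, request_path):
        # Hash the path so arbitrary URLs map to safe file names.
        digest = hashlib.sha1(request_path.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".html")

    def get(self, request_path, now=None):
        """Return cached HTML, or None if it is missing or stale."""
        path = self._path(request_path)
        now = time.time() if now is None else now
        try:
            if now - os.path.getmtime(path) > self.max_age:
                return None
            with open(path, encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def put(self, request_path, html):
        # Write to a temporary file, then rename, so readers never see half a page.
        path = self._path(request_path)
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(html)
        os.replace(tmp, path)

    def render(self, request_path, generate):
        """Serve from the cache, or call generate() and cache its output."""
        html = self.get(request_path)
        if html is None:
            html = generate()
            self.put(request_path, html)
        return html
```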

I've almost come to the conclusion that this item belongs in the
Integrated Caching Solutions GSoC project instead. I think running tests
with the mysqlnd extension to see where SQL optimizations can be made
fits here, but the scope of this project shouldn't extend to WordPress
caching.

The API could be improved to let themes set up sections that can be
cached for different lengths of time (if a plugin exists to implement
the caching). This means that once a post has been through the filters
and plugins, the output would be cached and pulled in by the API. It
would probably be best for it to work like the Widget API and be as
simple as possible. This probably belongs in the other project as well.
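A registration API in the spirit of the Widget API might let a theme declare a named section with a callback and a lifetime, then re-run the callback (with all its filters) only when the cached copy expires. The sketch below is Python with hypothetical names (`register_section`, `render_section`). An in-memory store stands in for whatever persistent cache a plugin would provide, and the clock is injectable so expiry is easy to test.

```python
import time


class SectionCache:
    """Cache rendered theme sections, each with its own lifetime."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.sections = {}  # name -> (callback, ttl in seconds)
        self.store = {}     # name -> (expires_at, html)

    def register_section(self, name, callback, ttl):
        self.sections[name] = (callback, ttl)

    def render_section(self, name):
        callback, ttl = self.sections[name]
        now = self.clock()
        cached = self.store.get(name)
        if cached is not None and cached[0] > now:
            return cached[1]
        html = callback()  # the expensive path: queries, filters, plugins
        self.store[name] = (now + ttl, html)
        return html
```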

* If proper HTTP headers were sent from WordPress pages then external 
squid-like proxy caches could be used to vastly improve performance.

Totally. I think sending "Last-Modified" headers and answering with
"304 Not Modified" when nothing has changed could improve the experience
for sites that use WordPress as a CMS. It will need hooks to allow
dynamic-content plugins to change the headers, so that browsers still
receive their updated content. I've heard that caching is probably best
left to the browsers (I'm unsure whether I agree until I see more
evidence).

It will also help with search engine optimization, since it tells
search engines whether a page they have already visited has been
updated. That makes it useful for blogs too; however, the biggest gain
will probably be for pages.
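The conditional-GET logic itself is small. The Python sketch below decides between a 200 response with a `Last-Modified` header and a bare `304 Not Modified`, based on the request's `If-Modified-Since` header. The `filters` argument is a hypothetical stand-in for a WordPress filter hook that lets dynamic-content plugins push the modification time forward. It is not an existing WordPress API.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, request_headers, filters=()):
    """Return (status, response_headers) for a page last changed at last_modified.

    last_modified:   timezone-aware datetime of the newest content on the page
    request_headers: dict of incoming HTTP headers
    filters:         callables that may return a later modification time
    """
    for f in filters:
        last_modified = max(last_modified, f(last_modified))
    # HTTP dates are UTC with one-second resolution, so normalise before comparing.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            if last_modified <= parsedate_to_datetime(since):
                return 304, headers
        except (TypeError, ValueError):
            pass  # unparseable or naive date: fall through to a full response
    return 200, headers
```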

* Currently WP loads all its code on every page, could a selective code 
loading scheme improve performance in a meaningful way?

The answer is no. Any function that is defined conditionally is only
compiled at runtime, which hinders opcode caching. It should generally
be assumed that helping opcode caching extensions cache files and
functions in the first (compile) stage is acceptable practice. As for
how the pluggable functions work with the rest of the WordPress library:
the library code that uses them isn't executed (except in rare cases)
until after the pluggable file has been included and its functions made
available.

The pluggable functions will still only be compiled at runtime, so what
could be done to optimize the code for those who don't use plugins is to
use a Registry (I'm not sure I have the pattern name right; it may be
Dependency Injection, although that might be the name of the technique
rather than the pattern). Regardless, it works just like the Plugin API,
except that only one function can be registered at a time. The pluggable
functions would then have no conditional checks around them, so they can
be compiled ahead of time, while still allowing a plugin to replace
them.

The Dependency Injection API would allow the pluggable functions to be
compiled before runtime, allowing an opcode cache to save them during
the first stage. It won't break backwards compatibility if the
functions are renamed. One possible problem is that instead of the first
plugin loaded gaining control of a pluggable function, the last one
loaded will. A possible solution is to run the function through a filter
and only use the function that is returned.
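To make the registry idea concrete, the Python sketch below (all names hypothetical; `wp_mail` is used only as a label) keeps one default implementation per pluggable name. At call time it resolves the function by passing the default through a filter chain and using whatever the chain returns. Filters run in load order, which gives "last plugin wins". A plugin that wants today's "first plugin wins" behaviour can return the incoming function unchanged when it has already been replaced. In PHP the defaults would be ordinary, unconditionally defined functions, which is what lets an opcode cache keep them.

```python
class Pluggables:
    """Registry of replaceable functions resolved through a filter chain."""

    def __init__(self):
        self.defaults = {}  # name -> default implementation
        self.filters = {}   # name -> list of callables, in registration order

    def define(self, name, default):
        self.defaults[name] = default

    def add_filter(self, name, callback):
        self.filters.setdefault(name, []).append(callback)

    def resolve(self, name):
        # Each filter receives the current choice and returns the one to use.
        func = self.defaults[name]
        for callback in self.filters.get(name, []):
            func = callback(func)
        return func

    def call(self, name, *args, **kwargs):
        return self.resolve(name)(*args, **kwargs)
```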

* PerformancePress: WordPress PHP Extension

You have three months; what better way to spend some of that time than
writing C and C++ code? One person has said that if an extension
existed, they wouldn't mind working on it too. I'm unsure whether they
will follow up on that (not that it matters; I haven't been too keen on
actually going through with writing it, so I'm the only one to blame).

The Plugin API almost needs to be compiled in C/C++ in order to be fully
optimized. The Plugin API has been stable for a number of years and,
barring minor corrections, enhancements, and bug fixes, I think it would
fit well in a PHP extension.
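For reference, the data structure such an extension would have to reproduce is small: a table of hook names, each holding callbacks grouped by priority and run in ascending priority order, with each filter receiving the previous one's return value. The Python sketch below models that behaviour. It follows the spirit of `add_filter()`/`apply_filters()` (default priority 10, registration order kept within a priority) but is not a port of WordPress's actual implementation.

```python
from collections import defaultdict


class Hooks:
    """Minimal model of a filter table: hook -> priority -> callbacks."""

    def __init__(self):
        self.table = defaultdict(lambda: defaultdict(list))

    def add_filter(self, hook, callback, priority=10):
        self.table[hook][priority].append(callback)

    def apply_filters(self, hook, value, *args):
        # Lower priorities run first; within a priority, registration order.
        # Extra args go to every callback (WordPress limits this with
        # accepted_args, which is not modelled here).
        for priority in sorted(self.table.get(hook, {})):
            for callback in self.table[hook][priority]:
                value = callback(value, *args)
        return value
```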

That said, I think every internal WordPress function could be moved to
the PHP extension. In particular, except for functions that make database
or external library calls, I think the most-used functions (see the
first item) can be ported over to the extension. Many functions in
WordPress are fairly stable and should be fairly safe to move to a PHP
extension.

Furthermore, during my research I've found many areas where the
extension API maps 1:1 onto what you would expect in PHP. Many PHP
functions are a simple macro call away, and you can call "userland"
functions from the extension as well. Technically, you can even make
MySQL calls or, more accurately, access PHP globals (where the $wpdb
object lives). However, I'd like to keep the extension as simple as
possible during the Google Summer of Code, if this section of the
project is accepted. (Note: the short version of this long paragraph is
that in a lot of areas, porting functions over shouldn't be that
time-consuming. Ha ha, just kidding; but seriously, it could be worse.)

There is one disadvantage to using the Zend Engine macros: the
performance gain won't be as large as with native C++ STL-style lists or
vectors. The Zend Engine's HashTable is well geared to performance, but
it can't outperform a well-written list where a list should be used
instead of a hash. On the other hand, the macros make it possible for a
n00b to write PHP extensions at all, which is a pretty major advantage
if you ask me.

Please note that the name "PerformancePress" is just for marketing. I
will probably call it the WordPress Library PHP Extension.

I'd consider this part a success even if only the Plugin API is ported
over.

-- 

Jacob Santos

http://www.santosj.name - blog
http://funcdoc.wordpress.com - WordPress Documentation Blog/Guide Licensed under GPLv2

Also known as darkdragon and santosj on WP trac.
