[wp-hackers] HTML Purifier
Edward Z. Yang
edwardzyang at thewritingpot.com
Mon Feb 12 21:35:58 GMT 2007
Hi, this is the author of HTML Purifier. I'm glad to see you discussing
HTML Purifier as a possible alternative to kses.
> The primary downside I see to this is the size/number of files. KSES
> is small and effective as a security filter, while HTML Purifier is
> bigger and can do a whole lot more.
Yes, HTML Purifier is guilty of having a large number of files, but kses
is by no means "effective"; see
<http://hp.jpsband.org/comparison.html#kses> for the problems I found in
it.
Would offering a "single huge monster file" help out in any way?
> One possible benefit, something we accomplished on wordpress.com with
> a customized csstidy library, would be the ability to sanitize inline
> CSS.
HTML Purifier actually already rolls its own CSS parser and validator,
so a separate library like csstidy isn't necessary.
> Does anyone have experience with integrating or profiling HTML Purifier?
> Per?
I actually hacked up a little plugin for WordPress. I'm not sure how
well it works (I don't use WordPress for my blog, although I have it
installed on my own machine), but it seems to be functional. Comments on
it would be appreciated:
<?php
/*
Plugin Name: HTML Purifier
Version: 1.0.0beta
Plugin URI: http://hp.jpsband.org/
Description: Sends blog posts through a standards-compliant HTML filter, HTMLPurifier. Standards-compliant output, guaranteed!
Author: Edward Z. Yang
*/
// include the library file
set_include_path(
    // change this to the path to your installation of HTML Purifier
    '/Documents and Settings/Edward/My Documents/My Webs/htmlpurifier/library'
    . PATH_SEPARATOR . get_include_path()
);
require_once 'HTMLPurifier.php';
function wordpress_htmlpurifier($text) {
    static $purifier = null;
    if ($purifier === null) $purifier = new HTMLPurifier();

    // ugly hack, since content_save_pre doesn't have strip-slashed content
    static $magic_quotes = null;
    if ($magic_quotes === null) $magic_quotes = get_magic_quotes_gpc();
    if ($magic_quotes) $text = stripslashes($text);

    // preserve magic comments
    $magic_comments = array('more', 'nextpage', 'noteaser');
    foreach ($magic_comments as $name) {
        $text = str_replace("<!--$name-->", "<br class=\"wp-$name\" />", $text);
    }

    // do our stuff
    $text = $purifier->purify($text);

    foreach ($magic_comments as $name) {
        $text = str_replace("<br class=\"wp-$name\" />", "<!--$name-->", $text);
    }

    if ($magic_quotes) $text = addslashes($text);

    // the original text is lost, which I don't like very much.
    // PreFormatted <http://vapourtrails.ca/wp-preformatted> might
    // be able to help you
    return $text;
}
add_filter('content_save_pre', 'wordpress_htmlpurifier', 100);
// if you're outputting data from the post_content_filtered data,
// you might want to use this
// add_filter('content_filtered_save_pre', 'wordpress_htmlpurifier', 100);
// disable client-side filtering. As a general rule, client-side
// filtering shouldn't be trusted, so we won't make the attempt at all.
function wordpress_mce_allow_all() { return '*[*]'; }
add_filter('mce_valid_elements', 'wordpress_mce_allow_all');
// disable balanceTags: tag balancing is core HTML Purifier functionality
remove_filter('content_save_pre', 'balanceTags');
// disable auto-paragraphing, since it can easily break advanced HTML
remove_filter('the_content', 'wpautop');
// as long as this filter runs before HTML Purifier, you could use it:
// add_filter('content_filtered_save_pre', 'wpautop');
// disable kses filtering: HTML Purifier is a kses replacement!
remove_filter('content_save_pre', 'wp_filter_post_kses');
remove_filter('content_filtered_save_pre', 'wp_filter_post_kses');
// We decided to keep some filters since they do things that
// HTML Purifier does and are fairly safe. But you may still want
// to replace them. Here they are:
// wptexturize:
// Applies typographic corrections and stray ampersand corrections.
// Much of it is redundant, though the dash conversions may be
// appreciated.
// remove_filter('the_content', 'wptexturize');
?>
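The magic-comment handling in the plugin is a general protect/filter/restore trick: anything the sanitizer would strip (here, HTML comments like <!--more-->) is swapped for an element it allows, then swapped back afterwards. Below is a minimal, library-independent sketch of just that round trip. filter_preserving_magic_comments() and strip_comments() are hypothetical names for illustration, and strip_comments() is only a stand-in for $purifier->purify(), not a description of what HTML Purifier actually does.

```php
<?php
/**
 * Hypothetical helper (not part of HTML Purifier or WordPress): hides
 * WordPress magic comments such as <!--more--> from a filter that would
 * strip HTML comments, then restores them after filtering.
 */
function filter_preserving_magic_comments(string $text, callable $filter): string
{
    $magic_comments = array('more', 'nextpage', 'noteaser');

    // Swap each magic comment for a placeholder element the filter keeps.
    foreach ($magic_comments as $name) {
        $text = str_replace("<!--$name-->", "<br class=\"wp-$name\" />", $text);
    }

    // Run the real filter (in the plugin: $purifier->purify($text)).
    $text = $filter($text);

    // Turn surviving placeholders back into the original comments.
    foreach ($magic_comments as $name) {
        $text = str_replace("<br class=\"wp-$name\" />", "<!--$name-->", $text);
    }
    return $text;
}

// Stand-in filter for demonstration only: drops every HTML comment,
// as an aggressive sanitizer might.
function strip_comments(string $html): string
{
    return preg_replace('/<!--.*?-->/s', '', $html);
}

$in = "<p>Intro</p><!--more--><p>Rest<!-- private --></p>";
echo filter_preserving_magic_comments($in, 'strip_comments'), "\n";
// prints <p>Intro</p><!--more--><p>Rest</p>
```

Note that the restore step uses plain str_replace(), so it only works if the filter emits the placeholder byte-for-byte as written; a filter that normalizes it differently (say, to <br class="wp-more"> without the slash) would leave the placeholder in place instead of the comment.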
--
Edward Z. Yang Personal: edwardzyang at thewritingpot.com
SN:Ambush Commander Website: http://www.thewritingpot.com/
GPGKey:0x869C48DA http://www.thewritingpot.com/gpgpubkey.asc
3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA