[wp-trac] [WordPress Trac] #20368: htmlspecialchars() returns empty string for non-UTF-8 input in PHP 5.4

WordPress Trac wp-trac at lists.automattic.com
Thu Apr 5 12:43:24 UTC 2012


#20368: htmlspecialchars() returns empty string for non-UTF-8 input in PHP 5.4
--------------------------+-----------------------------
 Reporter:  convissor     |      Owner:
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  General       |    Version:
 Severity:  major         |   Keywords:
--------------------------+-----------------------------
 The default value of the input `$encoding` parameter for
 `htmlspecialchars()` changed to UTF-8 in PHP 5.4.  The prior default was
 ISO-8859-1.  The function's UTF-8 handler checks the input, returning an
 empty string if the input isn't valid UTF-8.
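
 The behavior described above can be reproduced on any PHP version by
 passing the encoding explicitly.  This is a minimal sketch: the sample
 string and variable names are illustrative, not taken from WordPress.

```php
<?php
// "café <b>" encoded as ISO-8859-1: the é is the single byte 0xE9,
// which is not a valid UTF-8 sequence on its own.
$latin1 = "caf\xE9 <b>";

// Explicit 'UTF-8' mirrors the PHP 5.4 default: validation fails and
// the whole result is an empty string.
$as_utf8 = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');

// Explicit 'ISO-8859-1' mirrors the pre-5.4 default: the bytes pass
// through and only the markup characters are escaped.
$as_latin1 = htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1');

var_dump($as_utf8 === '');                     // bool(true)
var_dump($as_latin1 === "caf\xE9 &lt;b&gt;");  // bool(true)
```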

 WordPress will see the UTF-8 validator kicking in because most of the
 `htmlspecialchars()` calls don't use the `$encoding` parameter.  This will
 cause major problems for sites that have a `DB_CHARSET` other than `utf8`.

 [http://article.gmane.org/gmane.comp.php.devel/71783 Posting 58859 to
 php-internals] by Rasmus gives a clear example of the problem.  The
 [http://thread.gmane.org/gmane.comp.php.devel/71777 whole thread] starts
 with posting 58853.

 One approach to resolving this problem is to create two centralized
 functions.  This route is simpler and easier to maintain than adding the
 parameters to each `htmlspecialchars()` call throughout the code base.

 1. `wp_hsc_db()` for safely displaying database results.  Uses
 `DB_CHARSET` to calculate the appropriate `$encoding` parameter.  MySQL's
 character set names are not equivalent to the values PHP is looking for in
 the `$encoding` parameter.  Please see the `hsc_db()` method in the
 [http://plugins.svn.wordpress.org/login-security-solution/trunk/login-security-solution.php
 Login Security Solution plugin] for a mapping of the valid options.

 2. `wp_hsc_utf8()` for safely displaying strings known to be saved as
 UTF-8, such as error messages written in core.  Uses `UTF-8` as the
 `$encoding` parameter.

 Some calls in core use the `$flags` parameter, so these new functions will
 need the parameter too.  The default should be `ENT_COMPAT`, which works
 under PHP 5.2, 5.3 and 5.4.
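
 The two proposed functions, including the `$flags` parameter, might be
 sketched as follows.  Only the names `wp_hsc_db()` and `wp_hsc_utf8()`
 come from this ticket.  The helper `wp_hsc_charset_to_encoding()`, the
 partial charset map, and the `ISO-8859-1` fallback for unmapped charsets
 are assumptions made for illustration; they are not the plugin's mapping.

```php
<?php
// Stand-in for the constant that wp-config.php normally defines.
if (!defined('DB_CHARSET')) {
    define('DB_CHARSET', 'latin1');
}

// Hypothetical helper: translate a MySQL character set name into an
// encoding name that htmlspecialchars() accepts.  The map is partial.
function wp_hsc_charset_to_encoding($db_charset) {
    static $map = array(
        'utf8'    => 'UTF-8',
        'utf8mb4' => 'UTF-8',
        'latin1'  => 'cp1252',   // MySQL describes latin1 as cp1252
        'cp1251'  => 'cp1251',
        'koi8r'   => 'KOI8-R',
        'big5'    => 'BIG5',
        'gb2312'  => 'GB2312',
        'sjis'    => 'Shift_JIS',
        'ujis'    => 'EUC-JP',
    );
    $key = strtolower($db_charset);
    // Assumption: fall back to a single-byte encoding when unmapped.
    return isset($map[$key]) ? $map[$key] : 'ISO-8859-1';
}

// Escape a string that came from the database.
function wp_hsc_db($string, $flags = ENT_COMPAT) {
    $encoding = wp_hsc_charset_to_encoding(DB_CHARSET);
    return htmlspecialchars($string, $flags, $encoding);
}

// Escape a string known to be UTF-8, such as a message written in core.
function wp_hsc_utf8($string, $flags = ENT_COMPAT) {
    return htmlspecialchars($string, $flags, 'UTF-8');
}

echo wp_hsc_utf8('<a href="x">'), "\n";  // &lt;a href=&quot;x&quot;&gt;
```

 Keeping the charset lookup in one place means a newly supported MySQL
 character set needs a single new map entry rather than changes at every
 call site.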

 It may be suggested that WP use `htmlspecialchars()`'s auto-detection
 option (by passing an empty string to the `$encoding` parameter).  This is
 not advisable because it can produce inconsistent behavior.  Even the PHP
 manual says this route is not recommended.

-- 
Ticket URL: <http://core.trac.wordpress.org/ticket/20368>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software

