[wp-trac] [WordPress Trac] #22692: Quotes Are Messing Up
WordPress Trac
noreply at wordpress.org
Sat Nov 9 22:17:27 UTC 2013
#22692: Quotes Are Messing Up
--------------------------+------------------
Reporter: miqrogroove | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: 3.8
Component: Formatting | Version: 1.2
Severity: normal | Resolution:
Keywords: has-patch |
--------------------------+------------------
Comment (by azaozz):
Yeah, now we are getting somewhere...
PHP 5.4.16 on Windows 7:
{{{
var_dump( setlocale( LC_ALL, 0 ) ); // string(1) "C"
var_dump( preg_match( '/^\s$/', "\xA0" ) ); // int(0)
var_dump( setlocale( LC_ALL, '' ) ); // On Windows this sets it to the system default: string(19) "English_Canada.1252"
var_dump( preg_match( '/^\s$/', "\xA0" ) ); // int(1)
var_dump( preg_match( '/^\s$/u', "\xA0" ) ); // bool(false) (a lone \xA0 byte is not valid UTF-8)
setlocale( LC_ALL, 'C' );
var_dump( preg_match( '/^\s$/', "\xA0" ) ); // int(0)
var_dump( preg_match( '/^\s$/u', "\xA0" ) ); // bool(false)
}}}
PHP 5.3.1 on Mac OS X (note: 5.3.1 doesn't set PCRE_UCP with the `u` modifier):
{{{
var_dump( setlocale( LC_ALL, 0 ) ); // string(1) "C"
var_dump( preg_match( '/^\s$/', "\xA0" ) ); // int(0)
setlocale( LC_ALL, 'en_CA' ); // Also with 'en_CA.UTF-8', 'en_CA.ISO8859-1', 'en_CA.ISO8859-15', etc.
var_dump( preg_match( '/^\s$/', "\xA0" ) ); // int(1)
var_dump( preg_match( '/^\s$/u', "\xA0" ) ); // int(0)
setlocale( LC_ALL, 'C' ); // Also with 'en_CA.US-ASCII'
var_dump( preg_match( '/^\s$/', "\xA0" ) ); // int(0)
var_dump( preg_match( '/^\s$/u', "\xA0" ) ); // int(0)
}}}
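So when the locale is anything other than `C` (or a `US-ASCII` equivalent), `\s` matches `\xA0` in non-UTF-8 patterns. Also, the locale is maintained per process, not per thread, so on multithreaded servers like Apache on Windows setlocale() is "sticky": a call made by one script can change the locale seen by other scripts running in the same process, [http://php.net/manual/en/function.setlocale.php#refsect1-function.setlocale-notes more info].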
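To check this on a given server, a small probe could switch LC_CTYPE (the category that governs character classification), run the same non-UTF-8 match as above, and then restore the previous setting. This is only a sketch: the function name `hypothetical_s_matches_nbsp_byte` is made up for illustration, and the locale names in the loop are just the ones from the tests above, which will not all exist on every system.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress or PHP): reports whether the
 * non-UTF-8 pattern /^\s$/ matches the single byte \xA0 while LC_CTYPE is
 * set to $locale. Returns null if that locale is not available.
 */
function hypothetical_s_matches_nbsp_byte( $locale ) {
	// Remember the current character-classification locale.
	$saved = setlocale( LC_CTYPE, 0 );

	if ( false === setlocale( LC_CTYPE, $locale ) ) {
		return null; // Locale not installed; nothing changed, nothing to probe.
	}

	$matches = ( 1 === preg_match( '/^\s$/', "\xA0" ) );

	// Restore the previous locale: it is process-wide, so leaving it changed
	// would affect everything else running in this process.
	setlocale( LC_CTYPE, $saved );

	return $matches;
}

// Locale names taken from the tests above; unavailable ones report NULL.
foreach ( array( 'C', 'en_CA.ISO8859-1', 'English_Canada.1252' ) as $locale ) {
	var_dump( $locale, hypothetical_s_matches_nbsp_byte( $locale ) );
}
```

Note that the probe itself calls setlocale(), so on a threaded server it is subject to the same stickiness problem; it is meant for a quick diagnostic, not for production code.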
From the PCRE manual:
{{{
PCRE handles caseless matching, and determines whether characters are
letters, digits, or whatever, by reference to a set of tables, indexed
by character value. When running in UTF-8 mode, this applies only to
characters with codes less than 128.
}}}
It is not mentioned there, but it seems `\s` is also affected by these tables.
{{{
The internal tables can always be overridden by tables supplied by the
application that calls PCRE... External tables are built by calling
pcre_maketables()
}}}
PHP calls pcre_maketables() only if the locale is not 'C':
http://git.php.net/?p=php-src.git;a=blob;f=ext/pcre/php_pcre.c;h=7d34d9feb15a81b5e80973cf1aaa1c4936543173;hb=refs/heads/master#l392
So it seems that the unexpected behavior of `\s` comes from the locale-dependent character tables PCRE uses when a locale other than `C` is set.
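One way to sidestep the locale tables entirely is to spell out the ASCII whitespace characters instead of using `\s` in non-UTF-8 patterns. The sketch below assumes that space, tab, newline, carriage return and form feed are the characters the code actually cares about; it is not WordPress's fix, and `hypothetical_collapse_ascii_space` is a made-up name for illustration.

```php
<?php
/**
 * Hypothetical helper: collapses runs of ASCII whitespace into one space.
 * The explicit class [ \t\n\r\f] does not depend on the locale tables, so the
 * \xA0 byte (second half of the UTF-8 no-break space \xC2\xA0) is never
 * treated as whitespace, whatever setlocale() was called with.
 */
function hypothetical_collapse_ascii_space( $text ) {
	return preg_replace( '/[ \t\n\r\f]+/', ' ', $text );
}

// A UTF-8 no-break space followed by a run of ordinary whitespace.
$text = "a\xC2\xA0b \t c";

// With '/\s+/' and a non-C locale the lone \xA0 byte could be matched and the
// UTF-8 sequence split; the explicit class leaves it intact.
var_dump( hypothetical_collapse_ascii_space( $text ) === "a\xC2\xA0b c" ); // bool(true)
```

Wrapping the regex calls in setlocale( LC_ALL, 'C' ) and restoring the locale afterwards would also avoid the tables, but because setlocale() is process-wide and "sticky" on threaded servers, an explicit character class seems the safer choice.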
--
Ticket URL: <http://core.trac.wordpress.org/ticket/22692#comment:58>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software