[wp-trac] [WordPress Trac] #35022: WP allows Unicode 0x00a0 spaces in editor but shortcode parser can't handle them

WordPress Trac noreply at wordpress.org
Wed Apr 20 14:49:58 UTC 2016


#35022: WP allows Unicode 0x00a0 spaces in editor but shortcode parser can't handle
them
--------------------------+-----------------------------
 Reporter:  steevithak    |       Owner:
     Type:  defect (bug)  |      Status:  assigned
 Priority:  normal        |   Milestone:  Future Release
Component:  Shortcodes    |     Version:  4.4
 Severity:  normal        |  Resolution:
 Keywords:  needs-patch   |     Focuses:
--------------------------+-----------------------------

Comment (by gitlost):

 I discovered that one can't rely on PCRE being built with UTF-8 support.
 Also, the check for `U+00A0` should only happen when the blog charset is
 UTF-8. (Apart from that, the previous patch was fine.)

 So I think the simplest approach is to use PCRE in single-byte mode and
 to apply the extended check only when the blog charset is UTF-8, which
 the above patch does with conditional defines (it now needs two each, for
 a positive and for a negative match). One could add extra defines for
 particular legacy charsets like latin1 if wanted.
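 A rough sketch of the conditional-define idea, written in Python rather
 than PHP. Python's `re` applied to `bytes` works byte by byte, much like
 PCRE without the `u` modifier, so `\s` only covers ASCII whitespace. The
 function `space_patterns` and its `blog_charset` argument (standing in for
 WordPress's `blog_charset` option) are hypothetical, and the latin1 branch
 is only the optional extra define mentioned above, not part of the patch.

```python
import re

def space_patterns(blog_charset):
    """Return (positive, negative) byte-regex fragments for 'whitespace'.

    Hypothetical stand-ins for the patch's conditional defines. The
    patterns are bytes and use no Unicode mode, similar to PCRE without /u.
    """
    charset = blog_charset.lower()
    if charset in ("utf-8", "utf8"):
        # U+00A0 is the two-byte sequence C2 A0 in UTF-8, so it is matched
        # as an alternation rather than inside a character class.
        positive = rb"(?:\s|\xc2\xa0)"
        # One non-space unit: a C2 that does not start C2 A0, an A0 that
        # does not end C2 A0, or any other byte that is not ASCII whitespace.
        negative = rb"(?:\xc2(?!\xa0)|(?<!\xc2)\xa0|[^\s\xc2\xa0])"
    elif charset in ("iso-8859-1", "latin1"):
        # Optional legacy-charset define (assumption): NBSP is the byte A0.
        positive = rb"[\s\xa0]"
        negative = rb"[^\s\xa0]"
    else:
        # Any other charset: plain single-byte whitespace only.
        positive = rb"\s"
        negative = rb"\S"
    return positive, negative


if __name__ == "__main__":
    pos, neg = space_patterns("UTF-8")
    attrs = 'ids="1,2"\u00a0size="large"'.encode("utf-8")
    print(re.split(pos + rb"+", attrs))    # [b'ids="1,2"', b'size="large"']
    print(re.findall(neg + rb"+", attrs))  # [b'ids="1,2"', b'size="large"']
```

 With the fallback branch (`\s` only), the same `attrs` bytes do not split
 at all, which is the original bug. The negative fragment is written so it
 never starts a match on the A0 half of a C2 A0 pair, while other
 characters such as U+00A9 (C2 A9) still count as non-space.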

 (I spent quite a bit of time trying to track down where TinyMCE was adding
 the `U+00A0`s, and (as any schoolboy knows) it turns out it doesn't: it's
 the browser automatically adding `&nbsp;`s to `ContentEditable` divs,
 which TinyMCE then encodes into literal `U+00A0` characters. Chrome
 alternates "space `&nbsp;` space `&nbsp;`" etc., while Firefox does
 "`&nbsp;&nbsp;&nbsp;`... space". IE does something similar to Firefox,
 but more microsofty. So there you go...)
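 To make the two browser patterns concrete, here is a small Python model.
 `chrome_like_run`, `firefox_like_run` and `split_words` are hypothetical
 helpers that only mimic the patterns as described above, not the browsers'
 actual algorithms. The byte-level split shows how the leftover `U+00A0`
 bytes (C2 A0 in UTF-8) stay glued to the shortcode tokens when only ASCII
 whitespace is treated as a separator.

```python
import re

NBSP = "\u00a0"

def chrome_like_run(n):
    """n spaces as alternating space / NBSP (modeled on the description)."""
    return "".join(" " if i % 2 == 0 else NBSP for i in range(n))

def firefox_like_run(n):
    """n spaces as NBSPs followed by one ordinary space (assumes n >= 1)."""
    return NBSP * (n - 1) + " "

def split_words(text, handle_nbsp):
    """Split UTF-8 text on whitespace with a byte-level (non-Unicode) regex."""
    data = text.encode("utf-8")
    # Without NBSP handling only ASCII whitespace separates tokens.
    sep = rb"(?:\s|\xc2\xa0)+" if handle_nbsp else rb"\s+"
    return re.split(sep, data)


if __name__ == "__main__":
    for run in (chrome_like_run(3), firefox_like_run(3)):
        shortcode = "[gallery" + run + 'ids="1,2"]'
        print(split_words(shortcode, handle_nbsp=False))
        print(split_words(shortcode, handle_nbsp=True))
```

 In the Firefox-style case the plain split yields `b'[gallery\xc2\xa0\xc2\xa0'`
 as the first token, so the tag name no longer reads as `gallery`; in the
 Chrome-style case a stray `b'\xc2\xa0'` token appears between the tag and
 its attributes.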

--
Ticket URL: <https://core.trac.wordpress.org/ticket/35022#comment:22>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

