[wp-trac] [WordPress Trac] #33924: sanitize_html_class valid characters

WordPress Trac noreply at wordpress.org
Mon Sep 5 17:49:39 UTC 2022


#33924: sanitize_html_class valid characters
-------------------------------------------------+-------------------------
 Reporter:  m-e-h                                |       Owner:  (none)
     Type:  defect (bug)                         |      Status:  new
 Priority:  normal                               |   Milestone:  Future
                                                 |  Release
Component:  Formatting                           |     Version:  4.4
 Severity:  normal                               |  Resolution:
 Keywords:  has-patch 2nd-opinion has-unit-      |     Focuses:
  tests                                          |
-------------------------------------------------+-------------------------

Comment (by anrghg):

 Thank you for supporting Non-Latin scripts so everybody gets the same
 opportunity of using the slug as a class.

 As @peterwilsoncc wrote in #56504 — my apologies for opening a duplicate —
 today:

 > Raising the issue of non-latin alphabets is an excellent point.
 > I do agree that the function ought to be more permissive for valid
 > characters

 Since page slugs are used as class names, all scripts should be treated
 equally: Latin, Greek, Cyrillic, and all of the 160 or so (and growing)
 non-Latin scripts already supported by Unicode. This includes non-ASCII
 Latin, since Latin-script users can choose between simplified Latin slugs
 (accents removed) and real Latin ones.

 Currently, `sanitize_html_class()` provides security at the expense of
 usability, equity, internationalization and localization. By deleting all
 non-ASCII characters along with the non-alphanumeric ASCII characters
 (except hyphen and underscore), WordPress is throwing the baby out with
 the bathwater.
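
 To make the effect concrete, here is a minimal Python sketch of the
 filtering described above. It is not the PHP source of
 `sanitize_html_class()`: the name `legacy_sanitize_html_class` is
 hypothetical, and it models only the character deletion step (keep ASCII
 letters, digits, hyphen and underscore, drop everything else).

```python
import re

# Hypothetical model of the strict filter: anything outside
# A-Z, a-z, 0-9, "_" and "-" is deleted.
_NOT_ASCII_CLASS_CHAR = re.compile(r"[^A-Za-z0-9_-]")


def legacy_sanitize_html_class(classname: str) -> str:
    """Return classname with every character outside [A-Za-z0-9_-] removed."""
    return _NOT_ASCII_CLASS_CHAR.sub("", classname)


if __name__ == "__main__":
    # A Greek slug keeps only its two hyphens.
    print(legacy_sanitize_html_class("χαιρε-εν-αμπ"))  # --
    # A Latin slug passes through unchanged.
    print(legacy_sanitize_html_class("welcome-to-amp"))  # welcome-to-amp
```

 Under this model a Latin slug survives intact, while a non-Latin slug is
 reduced to its hyphens and can no longer identify the page.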

 That behavior surely breaks WordPress’ internationalization and inclusion
 policies.

 Test example of added body classes based on a page slug `/χαιρε-εν-αμπ/`
 (slashes for clarity, Greek transliteration intentional, quoted from
 https://anrghg.sunsite.fr/test-amp-compat/64-characters-%e2%96%b6-css-allows-all-non-ascii-%f0%9f%98%91/#1129-id-1164-%cf%87%ce%b1%ce%b9%cf%81%ce%b5-%ce%b5%ce%bd-%ce%b1%ce%bc%cf%80):

 * CSS spec conformant or permissive (simplified markup):

 {{{
 <body class="id-1164 χαιρε-εν-αμπ">
 }}}

 * Legacy aka strict:

 {{{
 <body class="id-1164 --">
 }}}

 These examples are made up to demonstrate levels of usability. In real
 life, all three classes (the ID class, the strict class and the permissive
 class) are added together, as in the full source at
 view-source:https://anrghg.sunsite.fr/test-amp-compat/%cf%87%ce%b1%ce%b9%cf%81%ce%b5-%ce%b5%ce%bd-%ce%b1%ce%bc%cf%80/ :
 {{{
 <body class="page-template-default page page-id-1164 logged-in wp-embed-responsive id-1164 _-- χαιρε-εν-αμπ">
 }}}

 On a side note: the double-hyphen class is invalid CSS, so it has a
 (configurable) underscore prepended as a more intuitive alternative to
 escaping the second hyphen in CSS: `.-\2D`. The goal is maximum
 intuitiveness and usability for users adding Custom CSS, and to keep them
 from breaking their styles.
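
 The prefixing step mentioned in this side note could look roughly like the
 following Python sketch. The function name `make_selector_friendly` and
 its `prefix` parameter are hypothetical, and only the double-hyphen case
 described above is handled.

```python
def make_selector_friendly(class_name: str, prefix: str = "_") -> str:
    """Prepend `prefix` to a class name that starts with two hyphens.

    Hypothetical helper: the prefix is configurable and defaults to an
    underscore, so the class can be selected without a CSS escape.
    """
    if class_name.startswith("--"):
        return prefix + class_name
    return class_name


if __name__ == "__main__":
    print(make_selector_friendly("--"))  # _--
    print(make_selector_friendly("χαιρε-εν-αμπ"))  # χαιρε-εν-αμπ
```

 With the default prefix, the selector becomes `._--`, which needs no
 escape sequence.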

 In Ukrainian, the equivalent CSS spec conformant or permissive class is
 (courtesy Google Translate):
 {{{
 <body class="id-1164 ласкаво-просимо-до-amp">
 }}}

 The derived legacy aka strict class is not very specific:
 {{{
 <body class="id-1164 ---amp">
 }}}
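
 For comparison, a permissive filter in the spirit of this ticket might be
 sketched in Python as below. This is an illustration, not a proposed
 patch: the name `permissive_sanitize_html_class` is hypothetical, and
 treating U+00A0 and above as allowed is an assumption based on the
 character range that CSS 2.1 permits in identifiers.

```python
import re

# Assumption: allow ASCII letters, digits, "_" and "-", plus every code
# point from U+00A0 upward. All other characters are deleted, including
# ASCII whitespace, quotes and angle brackets.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_\-\u00A0-\U0010FFFF]")


def permissive_sanitize_html_class(classname: str) -> str:
    """Return classname with only the disallowed ASCII characters removed."""
    return _DISALLOWED.sub("", classname)


if __name__ == "__main__":
    # Non-Latin slugs are preserved as-is.
    print(permissive_sanitize_html_class("ласкаво-просимо-до-amp"))
    print(permissive_sanitize_html_class("χαιρε-εν-αμπ"))
    # Characters that could break out of the attribute are still dropped.
    print(permissive_sanitize_html_class('a b"<c>'))  # abc
```

 The slug stays specific in every script, while the ASCII characters that
 matter for attribute safety are removed as before.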

 Using the built-in post ID selector (which is awkward, since it requires
 picking the right prefix among `postid-` and `page-id-`) is currently
 still the only option for non-Latin users. Using the convenient slug
 selector currently remains a privilege of Latin-script users.

 Thank you to everyone for striving to lift that limitation.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/33924#comment:18>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

