[wp-trac] [WordPress Trac] #56504: `sanitize_html_class()` is both too restrictive, and too permissive so it may return an invalid class name

WordPress Trac noreply at wordpress.org
Fri Sep 2 20:47:05 UTC 2022


--------------------------+-----------------------------
 Reporter:  anrghg        |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  General       |    Version:
 Severity:  normal        |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 `sanitize_html_class()` returns an invalid class name when its argument
 starts with a digit (0-9), or with a hyphen followed by a digit, per the
 CSS 2.1 specification (https://www.w3.org/TR/CSS21/syndata.html#characters).
 Brave/Chrome do not support these invalid class names: selectors using
 them, such as `.1col`, are invalid and do not work. Such a return value is
 not “sane”, i.e. it has not actually been sanitized.
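 A minimal sketch of the CSS 2.1 identifier rule cited above, assuming
 backslash escapes are out of scope. The helper `is_css21_identifier()` is
 hypothetical, not a WordPress function. It accepts an optional leading
 hyphen, then an underscore, an ASCII letter or any character from U+00A0
 up, followed by any of those plus digits and hyphens. Note that CSS 2.1
 also rejects a leading double hyphen, which the newer CSS Syntax Module
 Level 3 does accept.

```php
<?php
// Hypothetical helper: checks the CSS 2.1 grammar ident: -?{nmstart}{nmchar}*
// for plain strings. Backslash escapes such as "\31 " are NOT handled.
function is_css21_identifier( string $name ): bool {
	// nmstart: underscore, ASCII letter, or any character U+00A0 and above.
	$nmstart = '[_a-zA-Z\x{A0}-\x{10FFFF}]';
	// nmchar: everything in nmstart, plus ASCII digits and the hyphen.
	$nmchar = '[_a-zA-Z0-9\x{A0}-\x{10FFFF}-]';
	// The /u modifier matches UTF-8 code points; \z anchors at the true end.
	return 1 === preg_match( '/^-?' . $nmstart . $nmchar . '*\z/u', $name );
}

foreach ( [ 'col-1', '1col', '-1col', '--col', 'café', '_1col' ] as $name ) {
	echo $name, ': ', is_css21_identifier( $name ) ? 'valid' : 'invalid', "\n";
}
```

 With this check, `col-1`, `café` and `_1col` are valid, while `1col`,
 `-1col` and `--col` are not.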

 At the other end, `sanitize_html_class()` needlessly degrades class names
 that contain, or consist entirely of, non-ASCII characters. Any character
 from U+00A0 (no-break space) upward, including accented letters and
 emoji, is allowed in class names (and IDs) and works perfectly, provided
 it is not URL-encoded. Such characters may be backslash-escaped, but it is
 best to use them as plain Unicode in the document encoding (e.g. UTF-8),
 per https://www.w3.org/International/questions/qa-escapes
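 For comparison, a sketch of the two filtering steps in
 `sanitize_html_class()` as they appear in the WordPress source (an
 assumption about the version current at this ticket): strip
 percent-encoded octets, then strip every character outside
 `A-Za-z0-9_-`. The function name `approx_sanitize_html_class()` is
 hypothetical; the `$fallback` argument and the `sanitize_html_class`
 filter are omitted. It shows both problems at once: a leading digit
 passes through, while non-ASCII letters are removed byte by byte.

```php
<?php
// Hypothetical approximation of sanitize_html_class()'s two regex steps;
// the $fallback argument and the 'sanitize_html_class' filter are omitted.
function approx_sanitize_html_class( string $classname ): string {
	// Step 1: strip percent-encoded octets such as "%C3%A9".
	$sanitized = preg_replace( '|%[a-fA-F0-9][a-fA-F0-9]|', '', $classname );
	// Step 2: keep only ASCII letters, digits, underscore and hyphen.
	// No /u modifier, so each UTF-8 byte of "é" is removed separately.
	return preg_replace( '/[^A-Za-z0-9_-]/', '', $sanitized );
}

var_dump( approx_sanitize_html_class( '1col' ) );          // leading digit kept
var_dump( approx_sanitize_html_class( 'café' ) );          // "é" dropped
var_dump( approx_sanitize_html_class( '%C3%A9t%C3%A9' ) ); // encoded "été" dropped
```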

 A sanitizing function that conforms to the spec and provides a better user
 experience could be written, for example, like this:
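 {{{
 function anrghg_sanitize_html_id_class( $p_s_string, $p_s_prefix = '_', $p_b_decode = true ) {
         // Decode percent-encoded octets, or strip them when decoding is off.
         if ( $p_b_decode ) {
                 $p_s_string = urldecode( $p_s_string );
         } else {
                 $p_s_string = preg_replace( '/%[0-9A-Fa-f]{2}/', '', $p_s_string );
         }
         // Remove whitespace (except directly after a two-digit hex escape
         // such as "\e9 ") and ASCII punctuation not escaped by "\".
         $p_s_string = preg_replace( '/((?<!\\\\[0-9A-Fa-f]{2})\s|(?<!\\\\)[%^{}~@`\'"&#$()+[\]|\/*<>=?;:!,.])/', '', $p_s_string );
         // Prefix names CSS 2.1 rejects: a leading digit, a leading hyphen
         // followed by a digit, or two leading hyphens. This runs last so
         // that decoding/stripping cannot expose a leading digit, and the
         // anchored regex is safe on empty or one-character strings.
         if ( preg_match( '/^(?:[0-9]|-[-0-9])/', $p_s_string ) ) {
                 $p_s_string = $p_s_prefix . $p_s_string;
         }
         return $p_s_string;
 }
 }}}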
 Prepending an underscore probably provides a better UX than escaping the
 first digit.
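 To illustrate that trade-off: a class name like `1col` can still be
 targeted from CSS if the selector escapes the leading digit as a hex code
 point followed by a space (`.\31 col`), whereas a prefixed name is used
 as-is (`._1col`). The helper `css_selector_for_class()` is hypothetical
 and only handles the leading-digit cases; it is not a complete CSS
 escaping routine like the browser's `CSS.escape()`.

```php
<?php
// Hypothetical helper: builds a class selector, hex-escaping a leading
// digit (optionally after one hyphen). Other characters are not escaped.
function css_selector_for_class( string $class ): string {
	// Capture an optional hyphen, the digit, and the rest of the name.
	if ( preg_match( '/^(-?)([0-9])(.*)\z/s', $class, $m ) ) {
		// ord('1') is 49, dechex(49) is "31"; the space ends the escape.
		return '.' . $m[1] . '\\' . dechex( ord( $m[2] ) ) . ' ' . $m[3];
	}
	return '.' . $class;
}

echo css_selector_for_class( '1col' ), "\n";  // escaped selector
echo css_selector_for_class( '-1col' ), "\n"; // escaped after the hyphen
echo css_selector_for_class( '_1col' ), "\n"; // prefixed name, used as-is
```

 Authors writing stylesheets by hand are more likely to get `._1col`
 right than `.\31 col`, which supports prefixing as the default.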

 The `apply_filters()` call is omitted for brevity. The existing filter can
 be used to override the default processing, but this issue is less about
 customization than about conformance to the CSS specification.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/56504>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

