[wp-trac] [WordPress Trac] #32136: strip_invalid_text removes all russian utf8 chars

WordPress Trac noreply at wordpress.org
Sun Apr 26 01:11:18 UTC 2015


#32136: strip_invalid_text removes all russian utf8 chars
--------------------------+-----------------------------
 Reporter:  Fahrain       |      Owner:
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  General       |    Version:  4.2
 Severity:  normal        |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 WordPress has just been updated to 4.1.3.

 I have some custom tables inside the WordPress database and cannot insert
 data into them because the function strip_invalid_text removes all Russian
 characters from the input data arrays.

 The data array passed to $wpdb->insert is

 {{{
 array(5) {
   ["name"]=>
   string(15) "Земля Испытаний"
 ...
 }
 }}}

 The format array is
 {{{
  ('%s' ...)
 }}}


 When the function strip_invalid_text returns, the resulting data is
 {{{
 array(5) {
   ["name"]=>
   array(4) {
     ["value"]=>
     string(1) " "
     ["format"]=>
     string(2) "%s"
     ["charset"]=>
     string(4) "utf8"
     ["ascii"]=>
     bool(false)
   }
 ...
 }
 }}}

 The problem is in the regular expression inside this block
 {{{
 if ( 'utf8' === $charset || 'utf8mb3' === $charset || 'utf8mb4' === $charset ) {
 ...
         $value['value'] = preg_replace( $regex, '$1', $value['value'] );
 }
 }}}
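
 To illustrate how a byte-level pattern of this kind can empty out a
 non-UTF-8 string, here is a minimal sketch. The pattern is a simplified
 stand-in and an assumption, not the expression WordPress actually uses (that
 one is elided above), and the variable names are hypothetical. It keeps runs
 of ASCII and two-byte UTF-8 sequences and drops every other byte.

```php
<?php
// Simplified stand-in for the elided pattern (assumption, not WordPress code):
// group 1 captures runs of ASCII bytes or well-formed two-byte UTF-8
// sequences; any other single byte is matched by "." and replaced with
// an empty group 1, i.e. removed.
$regex = '/((?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF])+)|./s';

// "Земля 42" encoded as windows-1251: one byte per Cyrillic letter.
$cp1251 = "\xC7\xE5\xEC\xEB\xFF 42";

// The same text encoded as UTF-8: two bytes per Cyrillic letter.
$utf8 = "\xD0\x97\xD0\xB5\xD0\xBC\xD0\xBB\xD1\x8F 42";

$stripped_cp1251 = preg_replace( $regex, '$1', $cp1251 );
$stripped_utf8   = preg_replace( $regex, '$1', $utf8 );

var_dump( $stripped_cp1251 ); // only the space and digits survive
var_dump( $stripped_utf8 === $utf8 ); // valid UTF-8 passes through unchanged
```

 With the windows-1251 input, none of the Cyrillic bytes form a valid UTF-8
 sequence, so only the ASCII part of the string is left.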

 This preg_replace fails because the input value "name" is not in UTF-8
 encoding at all. It is in windows-1251 encoding.
 I have attached a file with an example. It is in windows-1251 encoding, so if
 you run iconv to UTF-8 on the input string, everything works fine, but if you
 remove the iconv call, the result contains only digits and ASCII characters.
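
 The iconv step described above could look roughly like the sketch below.
 The helper name to_utf8_from_cp1251 is hypothetical and not part of
 WordPress, and the sketch assumes that any input which is not valid UTF-8 is
 windows-1251, as in this report. The converted value would then be what gets
 passed to $wpdb->insert.

```php
<?php
// Hypothetical helper (not a WordPress function): return $value as UTF-8.
// Assumption: anything that is not already valid UTF-8 is windows-1251.
function to_utf8_from_cp1251( $value ) {
    // With the /u modifier, preg_match() returns 1 for valid UTF-8 and
    // false when the subject contains invalid UTF-8 byte sequences.
    if ( preg_match( '//u', $value ) === 1 ) {
        return $value;
    }
    return iconv( 'windows-1251', 'UTF-8', $value );
}

// "Земля" encoded as windows-1251 (5 bytes, one per letter).
$raw = "\xC7\xE5\xEC\xEB\xFF";

$converted = to_utf8_from_cp1251( $raw );

// Converting an already converted string must leave it unchanged.
$converted_twice = to_utf8_from_cp1251( $converted );

var_dump( strlen( $raw ), strlen( $converted ) );
```

 Checking validity first avoids double-encoding strings that are already
 UTF-8. It cannot tell windows-1251 apart from other single-byte encodings,
 so the source encoding still has to be known in advance.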

 For now I have removed {{{ 'utf8' === $charset || }}} from the if condition. It helps, but...

--
Ticket URL: <https://core.trac.wordpress.org/ticket/32136>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

