[wp-trac] [WordPress Trac] #54836: Huge error logs filled with "WordPress database error Illegal mix of collations" errors caused by spammers
    WordPress Trac 
    noreply at wordpress.org
       
    Sun May 21 11:02:05 UTC 2023
    
    
  
#54836: Huge error logs filled with "WordPress database error Illegal mix of
collations" errors caused by spammers
--------------------------+------------------------------
 Reporter:  jh20001       |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Database      |     Version:
 Severity:  critical      |  Resolution:
 Keywords:                |     Focuses:
--------------------------+------------------------------
Comment (by craigfrancis):
 The error_log from @jh20001 shows the database tables/fields using
 `latin1_general_ci`, and trying to compare against a value it believes is
 `utf8mb4_unicode_520_ci` - this might be from another table, or from the
 user value (e.g. it guesses this type due to characters such as "π").
 Likewise, the error_log from @varinupadhyay (ref #58133) uses
 `utf8_general_ci` (aka `utf8mb3_general_ci`), which does not support all
 unicode characters (maximum of three bytes per code point), so it has the
 same problem (in this case, characters like "π₯").
 ----
 You can test this with:
 {{{#!sql
 CREATE TABLE `test` (
   `value` TINYTEXT NOT NULL
 ) CHARACTER SET utf8 COLLATE utf8_general_ci;
 INSERT INTO test VALUES ("a"), ("ΓΌ"), ("ζ₯Ό");
 INSERT INTO test VALUES ("π₯");
 SELECT * FROM test WHERE value LIKE "%π₯%";
 }}}
 ----
 WordPress tries to setup new tables/fields with `utf8mb4_unicode_520_ci`
 or `utf8mb4_unicode_ci`, which supports characters like π and π₯, but
 older databases would not have used them.
 You can test your servers support by going to "WP-Admin > Tools > Site
 Health", and checking the result for "UTF8MB4 is supported" (which might
 be under "Passed Tests").
 I'd recommend changing the character-set in both of these databases (with
 the usual notes of making a backup, checking for encoding issues, and
 being aware that it can take time to alter big tables).
 ----
 As to solutions...
 It would be too difficult/dangerous/slow to check and make changes to
 every single SELECT/INSERT/UPDATE query - i.e. parsing the SQL,
 identifying every field and value, checking their character-set, and doing
 something (e.g. using CONVERT) to make them compatible.
 Personally I agree with @pbiron, I don't think core can/should do anything
 about this... unless someone can come up with a way to get the upgrade
 process to check/alter every single WP database/table/field to use utf8mb4
 without any issues.
 ----
 As an aside, `update.php` does have a `maybe_convert_table_to_utf8mb4()`
 function, but it only works when all fields are `utf8` or `utf8mb4`, it's
 only used when upgrading between certain versions of the database, and it
 can still be risky:
 {{{#!sql
 CREATE TABLE `test` (
   `a` TINYTEXT NOT NULL,
   `b` TINYTEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
 ) CHARACTER SET utf8 COLLATE utf8_general_ci;
 ALTER TABLE `test` CONVERT TO CHARACTER SET utf8mb4 COLLATE
 utf8mb4_unicode_ci;
 SHOW CREATE TABLE test;
 }}}
 Note how the Bin field becomes Case-Insensitive, and the fields change
 from TINYTEXT to TEXT.
 https://dev.mysql.com/doc/refman/8.0/en/alter-table.html#alter-table-
 character-set
 https://codex.wordpress.org/Converting_Database_Character_Sets
 https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/
-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/54836#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
    
    
More information about the wp-trac
mailing list