[wp-trac] [WordPress Trac] #54836: Huge error logs filled with "WordPress database error Illegal mix of collations" errors caused by spammers
WordPress Trac
noreply at wordpress.org
Sun May 21 11:02:05 UTC 2023
#54836: Huge error logs filled with "WordPress database error Illegal mix of
collations" errors caused by spammers
--------------------------+------------------------------
Reporter: jh20001 | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Database | Version:
Severity: critical | Resolution:
Keywords: | Focuses:
--------------------------+------------------------------
Comment (by craigfrancis):
The error_log from @jh20001 shows the database tables/fields using
`latin1_general_ci`, and trying to compare against a value it believes is
`utf8mb4_unicode_520_ci` - this might be from another table, or from the
user value (e.g. it guesses this type due to characters such as "π").
Likewise, the error_log from @varinupadhyay (ref #58133) uses
`utf8_general_ci` (aka `utf8mb3_general_ci`), which does not support all
unicode characters (maximum of three bytes per code point), so it has the
same problem (in this case, characters like "π₯").
----
You can test this with:
{{{#!sql
CREATE TABLE `test` (
`value` TINYTEXT NOT NULL
) CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO test VALUES ("a"), ("ΓΌ"), ("ζ₯Ό");
INSERT INTO test VALUES ("π₯");
SELECT * FROM test WHERE value LIKE "%π₯%";
}}}
----
WordPress tries to setup new tables/fields with `utf8mb4_unicode_520_ci`
or `utf8mb4_unicode_ci`, which supports characters like π and π₯, but
older databases would not have used them.
You can test your servers support by going to "WP-Admin > Tools > Site
Health", and checking the result for "UTF8MB4 is supported" (which might
be under "Passed Tests").
I'd recommend changing the character-set in both of these databases (with
the usual notes of making a backup, checking for encoding issues, and
being aware that it can take time to alter big tables).
----
As to solutions...
It would be too difficult/dangerous/slow to check and make changes to
every single SELECT/INSERT/UPDATE query - i.e. parsing the SQL,
identifying every field and value, checking their character-set, and doing
something (e.g. using CONVERT) to make them compatible.
Personally I agree with @pbiron, I don't think core can/should do anything
about this... unless someone can come up with a way to get the upgrade
process to check/alter every single WP database/table/field to use utf8mb4
without any issues.
----
As an aside, `update.php` does have a `maybe_convert_table_to_utf8mb4()`
function, but it only works when all fields are `utf8` or `utf8mb4`, it's
only used when upgrading between certain versions of the database, and it
can still be risky:
{{{#!sql
CREATE TABLE `test` (
`a` TINYTEXT NOT NULL,
`b` TINYTEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
) CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE `test` CONVERT TO CHARACTER SET utf8mb4 COLLATE
utf8mb4_unicode_ci;
SHOW CREATE TABLE test;
}}}
Note how the Bin field becomes Case-Insensitive, and the fields change
from TINYTEXT to TEXT.
https://dev.mysql.com/doc/refman/8.0/en/alter-table.html#alter-table-
character-set
https://codex.wordpress.org/Converting_Database_Character_Sets
https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/
--
Ticket URL: <https://core.trac.wordpress.org/ticket/54836#comment:6>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list