[wp-trac] [WordPress Trac] #38186: Database Collations Bypassed by determine_charset() in wp-db.php

WordPress Trac noreply at wordpress.org
Fri Feb 10 14:32:57 UTC 2017


#38186: Database Collations Bypassed by determine_charset() in wp-db.php
--------------------------+------------------------------
 Reporter:  natecf        |       Owner:
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Charset       |     Version:  4.6.1
 Severity:  major         |  Resolution:
 Keywords:                |     Focuses:
--------------------------+------------------------------
Changes (by cimatti):

 * severity:  normal => major


Comment:

 I think this is a deep issue with many potential consequences, because even
 if WordPress changes the defaults for $wpdb->charset and $wpdb->collate,
 the charset and collations in the databases of existing installations are
 not updated.

 Plugins are affected too, because they are expected to use $wpdb->charset
 and $wpdb->collate when creating tables. A plugin that created tables with
 collation utf8mb4_unicode_ci under an older WordPress version may now
 create new tables and columns with collation utf8mb4_unicode_520_ci.

 I have already seen this in an old WordPress installation: the WordPress
 columns remained on collation utf8mb4_unicode_ci, but a plugin created a
 table with utf8mb4_unicode_520_ci. I have a plugin that creates a temporary
 table and joins it to existing tables to do its task. This stopped working
 because the old tables use utf8mb4_unicode_ci and the new temporary table
 uses utf8mb4_unicode_520_ci.

 So the big problem is that a join, or any other operation that compares two
 columns with collations utf8mb4_unicode_ci and utf8mb4_unicode_520_ci,
 fails; MySQL reports an "Illegal mix of collations" error.
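
 One way to spot this situation before a join breaks is to list the
 collation of every text column and check whether more than one is in use.
 The sketch below is only an illustration: it assumes the rows have already
 been read from information_schema.COLUMNS (TABLE_NAME, COLUMN_NAME,
 COLLATION_NAME), and the function name and sample rows are made up.

```python
from collections import defaultdict


def find_mixed_collations(columns):
    """Group text columns by collation; return {} when only one is in use.

    `columns` is an iterable of (table_name, column_name, collation_name)
    tuples, for example rows selected from information_schema.COLUMNS.
    Columns without a collation (collation_name is None) are skipped.
    """
    by_collation = defaultdict(list)
    for table, column, collation in columns:
        if collation is None:  # numeric, date, binary columns, etc.
            continue
        by_collation[collation].append(f"{table}.{column}")
    # A single collation cannot clash with itself, so report nothing.
    return dict(by_collation) if len(by_collation) > 1 else {}


# Made-up sample rows; the table and column names are illustrative only.
sample_rows = [
    ("wp_posts", "post_title", "utf8mb4_unicode_ci"),
    ("wp_posts", "ID", None),
    ("tmp_plugin_data", "title", "utf8mb4_unicode_520_ci"),
]

mixed = find_mixed_collations(sample_rows)
for collation, cols in sorted(mixed.items()):
    print(collation, cols)
```

 A non-empty result means that a join between columns from different groups
 may fail unless the query forces a common collation.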

 The move from utf8 to utf8mb4 can be problematic as well, because MySQL
 normally limits a key to 1000 bytes. With utf8 (up to 3 bytes per
 character) a key can hold at most 333 characters; with utf8mb4 (up to 4
 bytes per character) the limit is 250. A key that is valid with utf8 may
 therefore be too long with utf8mb4.
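
 The arithmetic behind those numbers can be written out as a small sketch.
 It takes the 1000-byte limit quoted above as given (the real limit depends
 on the storage engine and its settings), and the helper names are
 hypothetical.

```python
# Limit quoted in this comment; treat it as an assumption, since the actual
# value depends on the storage engine and server configuration.
KEY_LIMIT_BYTES = 1000

# Maximum bytes MySQL reserves per character for each charset.
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def max_indexable_chars(charset, key_limit_bytes=KEY_LIMIT_BYTES):
    """Longest key, in characters, that fits within the byte limit."""
    return key_limit_bytes // MAX_BYTES_PER_CHAR[charset]


def key_fits(column_chars, charset, key_limit_bytes=KEY_LIMIT_BYTES):
    """True if a key over `column_chars` characters fits within the limit."""
    return column_chars * MAX_BYTES_PER_CHAR[charset] <= key_limit_bytes


print(max_indexable_chars("utf8"))     # 333
print(max_indexable_chars("utf8mb4"))  # 250
print(key_fits(255, "utf8"))           # True:  255 * 3 = 765 bytes
print(key_fits(255, "utf8mb4"))        # False: 255 * 4 = 1020 bytes
```

 So a key over a 255-character column fits with utf8 but not with utf8mb4,
 unless the key is shortened to a prefix.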

 Changing a collation can also be problematic on a column with a unique
 key, because values that were considered different before may be
 considered equal under the new collation.
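
 The mechanism can be illustrated with a rough sketch. The two key functions
 below are simplified stand-ins (exact comparison versus case-insensitive
 comparison) and do not reproduce MySQL's collation rules; a real check
 would have to be run in the database itself. All names are hypothetical.

```python
from collections import defaultdict


def find_collisions(values, collation_key):
    """Return groups of values that compare equal under `collation_key`."""
    groups = defaultdict(list)
    for value in values:
        groups[collation_key(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


def exact_key(value):
    # Stand-in for a collation that treats every distinct string as distinct.
    return value


def case_insensitive_key(value):
    # Stand-in for a collation that ignores case (NOT MySQL's actual rules).
    return value.casefold()


existing_values = ["Report", "report", "summary"]

print(find_collisions(existing_values, exact_key))
print(find_collisions(existing_values, case_insensitive_key))
```

 Any group reported under the new rule but not under the old one marks rows
 that would violate the unique key after the change.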

 I propose the following:
 - the default charset and collation should be chosen during installation,
 and the installation should stick with them afterwards
 - there should be a standard procedure for changing the collation, and
 plugins should implement a callback that changes it in their own tables
 when called
 - in any case, migrating to another collation should be discouraged; if it
 is necessary, it should first be tested on a copy of the site, and a
 backup is strongly recommended

--
Ticket URL: <https://core.trac.wordpress.org/ticket/38186#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

