[wp-trac] [WordPress Trac] #53623: MariaDB 10.6 renamed utf8 to utf8mb3

WordPress Trac noreply at wordpress.org
Mon Aug 22 15:20:18 UTC 2022


#53623: MariaDB 10.6 renamed utf8 to utf8mb3
--------------------------+---------------------
 Reporter:  skithund      |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  6.1
Component:  Database      |     Version:
 Severity:  normal        |  Resolution:
 Keywords:  has-patch     |     Focuses:
--------------------------+---------------------
Changes (by SergeyBiryukov):

 * keywords:  needs-patch => has-patch
 * milestone:  Future Release => 6.1


Comment:

 Replying to [comment:4 ayeshrajans]:
 > MySQL 8.0.26 also has related changes:
 https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-26.html
 >
 > ----
 >
 > These statements now report `utf8mb3` rather than `utf8` when writing
 character set names: `EXPLAIN`, `SHOW CREATE PROCEDURE`, `SHOW CREATE
 EVENT`.
 >
 > Stored program definitions retrieved from the data dictionary now report
 `utf8mb3` rather than `utf8` in character set references. This affects any
 output produced from those definitions, such as `SHOW CREATE` statements.
 >
 > This error message now reports `utf8mb3` rather than `utf8` when writing
 character set names: `ER_INVALID_CHARACTER_STRING`.

 Thanks! These changes are indeed related, but they don't appear to cause
 the test failures here.

 In my testing, the current tests still pass on MySQL up to and including
 version 8.0.29, which is
 [https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-29.html no
 longer available for download] but includes some more
 [https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-29.html#mysqld-8-0-29-charset
 character set support changes]. The tests start failing on MySQL 8.0.30,
 with the same six failures as listed in comment:6.

 From
 [https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-30.html#mysqld-8-0-30-charset
 MySQL 8.0.30 release notes]:
 > **Important Change:** A previous change renamed character sets having
 deprecated names prefixed with `utf8_` to use `utf8mb3_` instead. In this
 release, we rename the `utf8_` collations as well, using the `utf8mb3_`
 prefix; this is to make the collation names consistent with those of the
 character sets, not to rely any longer on the deprecated collation names,
 and to clarify the distinction between `utf8mb3` and `utf8mb4`. The names
 using the `utf8mb3_` prefix are now used exclusively for these collations
 in the output of `SHOW` statements such as `SHOW CREATE TABLE`, as well as
 in the values displayed in the columns of Information Schema tables
 including the `COLLATIONS` and `COLUMNS` tables.
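
 As a rough illustration of the rename described in the release notes
 above, here is a minimal Python sketch of how a test helper might compute
 the collation name it should expect from a given MySQL version. The names
 `RENAME_SINCE`, `parse_version()` and `expected_collation()` are
 hypothetical and not part of WordPress or MySQL, and the helper assumes a
 MySQL version string (MariaDB uses a different version scheme and would
 need separate handling).

```python
import re

# Assumption: collation names in SHOW / Information Schema output use the
# utf8mb3_ prefix from MySQL 8.0.30 onward, per the quoted release notes.
RENAME_SINCE = (8, 0, 30)


def parse_version(version):
    """Extract the leading numeric 'x.y.z' part of a server version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def expected_collation(name, mysql_version):
    """Return the collation name a test should expect the server to report."""
    renamed = parse_version(mysql_version) >= RENAME_SINCE
    if renamed and name.startswith("utf8_"):
        # Swap the deprecated utf8_ prefix for utf8mb3_.
        return "utf8mb3_" + name[len("utf8_"):]
    return name
```

 Only the `utf8_` prefix is rewritten, so `utf8mb4_*` names pass through
 unchanged on every version.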


 Replying to [comment:8 JavierCasares]:
 > - **utf8mb3**: is at
 [https://mariadb.com/docs/reference/mdb/character-sets/utf8mb3/ MariaDB 10.2]
 included.
 > - **utf8 -> utf8mb3**: forced at
 [https://mariadb.com/docs/reference/mdb/character-sets/utf8mb3/ MariaDB 10.6].
 >
 > Checking MySQL and MariaDB versions, all supported versions have support
 for utf8mb3, so we should update "utf8" for "utf8mb3" by default and do
 some testing.

 It is worth noting that WordPress already upgrades to `utf8mb4`
 automatically when possible; see comment:1:ticket:48285.

 Reading the MariaDB ticket [https://jira.mariadb.org/browse/MDEV-8334
 MDEV-8334 Rename utf8 to utf8mb3]:
 > In long terms we want the name `utf8` mean the full featured UTF-8.
 > We'll do a few preparatory steps:
 >
 > 1. Change the main name of the 3-byte character set from `utf8` to
 `utf8mb3` and make `utf8` alias for `utf8mb3`. This will change all `SHOW`
 and `INFORMATION_SCHEMA` output to display `utf8mb3` instead of `utf8`, as
 well as change `mysqldump` to dump `utf8mb3` instead of just `utf8`.
 > 2. Add a new server option, say `--utf8-is-utf8mb3`, which will be
 `true` by default, but the DBA will be able to change it to false and thus
 make `utf8` mean `utf8mb4`.
 > 3. A few releases later we'll change `--utf8-is-utf8mb3` to be `false`
 by default.
 >
 > Or
 >
 > 2. Do not add any new server options and
 > 3. Add a new `old_mode` value for reverting `utf8` to `utf8mb3` when the
 default will mean `utf8mb4`.

 The latter appears to be
 [https://mariadb.com/kb/en/mariadb-1061-release-notes/#character-sets
 implemented in MariaDB 10.6.1].

 Also reading the MySQL note on
 [https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html The
 utf8mb3 Character Set (3-Byte UTF-8 Unicode Encoding)]:
 > Historically, MySQL has used `utf8` as an alias for `utf8mb3`; beginning
 with MySQL 8.0.28, `utf8mb3` is used exclusively in the output of `SHOW`
 statements and in Information Schema tables when this character set is
 meant.
 >
 > At some point in the future `utf8` is expected to become a reference to
 `utf8mb4`. To avoid ambiguity about the meaning of `utf8`, consider
 specifying `utf8mb4` explicitly for character set references instead of
 `utf8`.
 >
 > You should also be aware that the `utf8mb3` character set is deprecated
 and you should expect it to be removed in a future MySQL release. Please
 use `utf8mb4` instead.

 If the long-term goal of both projects is to make `utf8` an alias for
 `utf8mb4` as mentioned above, it seems like `utf8mb3` is an intermediate
 step, and there is no need for WordPress to use that as the default
 charset at this time, since it already uses `utf8mb4` when possible.

 I believe the only changes required here would be:
 * Adding `utf8mb3_bin` and `utf8mb3_general_ci` to the list of safe
 collations recognized by `wpdb::check_safe_collation()`. This would be the
 only change for WordPress core.
 * Adding some conditional version checking for the expected test results
 as suggested in comment:1. This would only affect the unit tests.
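
 To make the first item concrete, here is a minimal Python sketch of an
 allowlist check with the two proposed names added. It only models the
 idea; it is not the `wpdb::check_safe_collation()` implementation, and the
 exact contents of `SAFE_COLLATIONS` as well as the function name
 `all_collations_safe()` are assumptions made for illustration.

```python
# Hypothetical allowlist for illustration only; the real list lives in
# wpdb::check_safe_collation() and may differ.
SAFE_COLLATIONS = frozenset({
    "utf8_bin",
    "utf8_general_ci",
    "utf8mb3_bin",         # proposed addition
    "utf8mb3_general_ci",  # proposed addition
    "utf8mb4_bin",
    "utf8mb4_general_ci",
})


def all_collations_safe(column_collations):
    """Return True if every collation in the iterable is on the allowlist."""
    return all(name in SAFE_COLLATIONS for name in column_collations)
```

 With the additions in place, a table whose columns report
 `utf8mb3_general_ci` is treated the same as one reporting
 `utf8_general_ci`, while any collation outside the list still fails the
 check.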

 See [attachment:"53623.diff"]. Tested on:
 * MariaDB 10.6.8
 * MySQL 8.0.25
 * MySQL 8.0.27
 * MySQL 8.0.28
 * MySQL 8.0.29
 * MySQL 8.0.30

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/53623#comment:11>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

