[wp-trac] [WordPress Trac] #53623: MariaDB 10.6 renamed utf8 to utf8mb3
WordPress Trac
noreply at wordpress.org
Mon Aug 22 15:20:18 UTC 2022
#53623: MariaDB 10.6 renamed utf8 to utf8mb3
--------------------------+---------------------
Reporter: skithund | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: 6.1
Component: Database | Version:
Severity: normal | Resolution:
Keywords: has-patch | Focuses:
--------------------------+---------------------
Changes (by SergeyBiryukov):
* keywords: needs-patch => has-patch
* milestone: Future Release => 6.1
Comment:
Replying to [comment:4 ayeshrajans]:
> MySQL 8.0.26 also has related changes:
https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-26.html
>
> ----
>
> These statements now report `utf8mb3` rather than `utf8` when writing
character set names: `EXPLAIN`, `SHOW CREATE PROCEDURE`, `SHOW CREATE
EVENT`.
>
> Stored program definitions retrieved from the data dictionary now report
`utf8mb3` rather than `utf8` in character set references. This affects any
output produced from those definitions, such as `SHOW CREATE` statements.
>
> This error message now reports `utf8mb3` rather than `utf8` when writing
character set names: `ER_INVALID_CHARACTER_STRING`.
Thanks! These changes are indeed related, but they don't appear to cause
the test failures here.
In my testing, the current tests still pass on MySQL up to and including
version 8.0.29, which is
[https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-29.html no
longer available for download], but has some more
[https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-29.html#mysqld-8-0-29-charset
character set support changes]. The tests start failing on MySQL 8.0.30,
with the same six failures as listed in comment:6.
From
[https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-30.html#mysqld-8-0-30-charset
MySQL 8.0.30 release notes]:
> **Important Change:** A previous change renamed character sets having
deprecated names prefixed with `utf8_` to use `utf8mb3_` instead. In this
release, we rename the `utf8_` collations as well, using the `utf8mb3_`
prefix; this is to make the collation names consistent with those of the
character sets, not to rely any longer on the deprecated collation names,
and to clarify the distinction between `utf8mb3` and `utf8mb4`. The names
using the `utf8mb3_` prefix are now used exclusively for these collations
in the output of `SHOW` statements such as `SHOW CREATE TABLE`, as well as
in the values displayed in the columns of Information Schema tables
including the `COLLATIONS` and `COLUMNS` tables.
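To illustrate the collation rename described in the release note above, here is a minimal Python sketch of comparing collation names across server versions by mapping the deprecated `utf8_` prefix to `utf8mb3_`. The function name `normalize_collation` is hypothetical and is not part of WordPress, MySQL, or MariaDB.

```python
def normalize_collation(name: str) -> str:
    """Map the deprecated `utf8_` collation prefix to `utf8mb3_`.

    Hypothetical helper for illustration only. Names that already use
    `utf8mb3_`, `utf8mb4_`, or any other prefix are returned unchanged.
    """
    old_prefix = "utf8_"
    if name.startswith(old_prefix):
        return "utf8mb3_" + name[len(old_prefix):]
    return name


# A server older than 8.0.30 and a newer one then compare equal:
print(normalize_collation("utf8_general_ci") == normalize_collation("utf8mb3_general_ci"))
```

With this normalization, output from `SHOW CREATE TABLE` on older and newer servers can be compared without depending on which of the two names the server prints.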
Replying to [comment:8 JavierCasares]:
> - **utf8mb3**: is at
[https://mariadb.com/docs/reference/mdb/character-sets/utf8mb3/ MariaDB 10.2]
included.
> - **utf8 -> utf8mb3**: forced at
[https://mariadb.com/docs/reference/mdb/character-sets/utf8mb3/ MariaDB 10.6].
>
> Checking MySQL and MariaDB versions, all supported versions have support
for utf8mb3, so we should update "utf8" for "utf8mb3" by default and do
some testing.
Note that WordPress already upgrades to `utf8mb4` automatically when
possible; see comment:1:ticket:48285.
Reading the MariaDB ticket [https://jira.mariadb.org/browse/MDEV-8334
MDEV-8334 Rename utf8 to utf8mb3]:
> In long terms we want the name `utf8` mean the full featured UTF-8.
> We'll do a few preparatory steps:
>
> 1. Change the main name of the 3-byte character set from `utf8` to
`utf8mb3` and make `utf8` alias for `utf8mb3`. This will change all `SHOW`
and `INFORMATION_SCHEMA` output to display `utf8mb3` instead of `utf8`, as
well as change `mysqldump` to dump `utf8mb3` instead of just `utf8`.
> 2. Add a new server option, say `--utf8-is-utf8mb3`, which will be
`true` by default, but the DBA will be able to change it to false and thus
make `utf8` mean `utf8mb4`.
> 3. A few releases later we'll change `--utf8-is-utf8mb3` to be `false`
by default.
>
> Or
>
> 2. Do not add any new server options and
> 3. Add a new `old_mode` value for reverting `utf8` to `utf8mb3` when the
default will mean `utf8mb4`.
The latter appears to be
[https://mariadb.com/kb/en/mariadb-1061-release-notes/#character-sets
implemented in MariaDB 10.6.1].
Also reading the MySQL note on
[https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html The
utf8mb3 Character Set (3-Byte UTF-8 Unicode Encoding)]:
> Historically, MySQL has used `utf8` as an alias for `utf8mb3`; beginning
with MySQL 8.0.28, `utf8mb3` is used exclusively in the output of `SHOW`
statements and in Information Schema tables when this character set is
meant.
>
> At some point in the future `utf8` is expected to become a reference to
`utf8mb4`. To avoid ambiguity about the meaning of `utf8`, consider
specifying `utf8mb4` explicitly for character set references instead of
`utf8`.
>
> You should also be aware that the `utf8mb3` character set is deprecated
and you should expect it to be removed in a future MySQL release. Please
use `utf8mb4` instead.
If the long-term goal of both projects is to make `utf8` an alias for
`utf8mb4` as mentioned above, it seems like `utf8mb3` is an intermediate
step, and there is no need for WordPress to use that as the default
charset at this time, since it already uses `utf8mb4` when possible.
I believe the only changes required here would be:
* Adding `utf8mb3_bin` and `utf8mb3_general_ci` to the list of safe
collations recognized by `wpdb::check_safe_collation()`. This would be the
only change for WordPress core.
* Adding some conditional version checking for the expected test results
as suggested in comment:1. This would only affect the unit tests.
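As a rough illustration of the first change, here is a Python model of an allow-list check with the two `utf8mb3` collations added. It is not the WordPress implementation, which is PHP in `wpdb::check_safe_collation()`. The names `SAFE_COLLATIONS` and `all_collations_safe` are hypothetical, and the pre-existing entries in the set, along with the skipping of columns that have no collation, are assumptions of this sketch.

```python
# Assumed allow-list: the existing utf8/utf8mb4 entries are an assumption;
# the two utf8mb3 entries are the additions proposed in this comment.
SAFE_COLLATIONS = frozenset({
    "utf8_bin",
    "utf8_general_ci",
    "utf8mb3_bin",
    "utf8mb3_general_ci",
    "utf8mb4_bin",
    "utf8mb4_general_ci",
})


def all_collations_safe(collations):
    """Return True if every non-empty collation is on the allow-list.

    Hypothetical helper. Columns without a collation (None or "") are
    skipped, which is an assumption made for this sketch.
    """
    return all(c.lower() in SAFE_COLLATIONS for c in collations if c)


print(all_collations_safe(["utf8mb3_general_ci", "utf8mb4_bin"]))
print(all_collations_safe(["utf8mb3_general_ci", "latin1_swedish_ci"]))
```

Without the two `utf8mb3` entries, a table that a newer server reports as `utf8mb3_general_ci` would fail the check even though it is the same collation that was previously reported as `utf8_general_ci`.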
See [attachment:"53623.diff"]. Tested on:
* MariaDB 10.6.8
* MySQL 8.0.25
* MySQL 8.0.27
* MySQL 8.0.28
* MySQL 8.0.29
* MySQL 8.0.30
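The conditional version checking for the unit tests could look roughly like the Python sketch below. The cutoffs are taken from this comment (MariaDB 10.6.1 for the rename, MySQL 8.0.30 as the first version where the tests fail) and may not match what the attached patch does. The function name `expected_utf8_prefix` is hypothetical, and the sketch assumes version strings shaped like `8.0.30` or `10.6.8-MariaDB`.

```python
import re


def expected_utf8_prefix(server_version: str) -> str:
    """Pick the 3-byte UTF-8 name a test should expect from the server.

    Hypothetical helper. Assumes the version string starts with
    major.minor.patch and contains "MariaDB" for MariaDB servers.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", server_version)
    if match is None:
        raise ValueError(f"Unrecognized server version: {server_version!r}")
    version = tuple(int(part) for part in match.groups())

    is_mariadb = "mariadb" in server_version.lower()
    # Assumed cutoffs, based on the versions discussed in this comment.
    threshold = (10, 6, 1) if is_mariadb else (8, 0, 30)
    return "utf8mb3" if version >= threshold else "utf8"


for v in ["10.6.8-MariaDB", "8.0.25", "8.0.29", "8.0.30"]:
    print(v, expected_utf8_prefix(v))
```

A test could then build its expected collation as `expected_utf8_prefix(version) + "_general_ci"` instead of hard-coding either name.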
--
Ticket URL: <https://core.trac.wordpress.org/ticket/53623#comment:11>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform