[wp-trac] [WordPress Trac] #64842: Upload problems with Umlauts in ID3 Tags
WordPress Trac
noreply at wordpress.org
Thu Mar 19 22:17:02 UTC 2026
#64842: Upload problems with Umlauts in ID3 Tags
-------------------------------------+------------------------------
Reporter: claireschlamm | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Upload | Version: 6.9.1
Severity: normal | Resolution:
Keywords: has-patch needs-testing | Focuses:
-------------------------------------+------------------------------
Changes (by abhishekfdd):
* keywords: => has-patch needs-testing
Comment:
I was able to reproduce this. Uploading the example MP3 via **Media > Add
Media File** fails with "Could not insert attachment into database."
However, uploading the same file inside a post using the Audio or File
block succeeds.
This difference points to the two different code paths:
- **Media Library upload** uses `media_handle_upload()` in
`wp-admin/includes/media.php`.
- **Block editor upload** uses the REST API (`/wp/v2/media`) via
`WP_REST_Attachments_Controller`, which handles metadata differently.
**Root cause:**
The ID3v1 specification mandates ISO-8859-1 encoding for tag values.
German umlauts like `äöüÄÖÜß` are valid ISO-8859-1 characters, but their
single-byte ISO-8859-1 encodings (0xC4–0xFC) are **not** valid UTF-8 byte
sequences.
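The byte-level difference can be shown with Python's standard codecs. Python is used here only as a neutral illustration; WordPress itself is PHP, and `is_valid_utf8()` is a hypothetical helper. In ISO-8859-1 each umlaut is one byte, which a strict UTF-8 decoder rejects, while UTF-8 needs two bytes for each of these characters.

```python
# Illustration only: how German umlauts are stored in ISO-8859-1 vs UTF-8.
umlauts = "äöüÄÖÜß"

latin1_bytes = umlauts.encode("iso-8859-1")  # one byte per character
utf8_bytes = umlauts.encode("utf-8")         # two bytes per character here

print(latin1_bytes.hex(" "))  # e4 f6 fc c4 d6 dc df
print(utf8_bytes.hex(" "))    # c3 a4 c3 b6 c3 bc c3 84 c3 96 c3 9c c3 9f


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(utf8_bytes))    # True
# 0xE4 announces a 3-byte sequence, but 0xF6 is not a continuation byte.
print(is_valid_utf8(latin1_bytes))  # False
```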
The `getID3` library (bundled in `wp-includes/ID3/`) is configured with
`$encoding = 'UTF-8'` and should convert ID3v1 tags from ISO-8859-1 to
UTF-8. In certain cases, however, particularly when a file has both ID3v1
and ID3v2 tags or when a tag editor writes a non-standard encoding, the
conversion does not happen correctly and the raw ISO-8859-1 bytes are
passed on unchanged.
In `wp_add_id3_tag_data()`, these potentially invalid-UTF-8 tag values are
passed through `wp_kses_post()`, which does not fix encoding issues. The
values then flow into `media_handle_upload()`:
1. `$meta['title']` is assigned directly to `$title` **without**
`sanitize_text_field()` (the filename-based title gets
`sanitize_text_field()`, but the ID3 title does not).
2. `$title`, `$meta['album']`, `$meta['artist']`, and `$meta['genre']` are
interpolated into `$content` via `sprintf()`.
3. Both `post_title` and `post_content` are passed to
`wp_insert_attachment()` → `wp_insert_post()`.
4. The database layer rejects the invalid UTF-8, and the insert fails.
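Steps 2 to 4 can be modelled with raw byte strings, which is how PHP holds these values. This is a hypothetical sketch: `build_content()` and `insert_attachment()` are made-up stand-ins rather than WordPress functions, the format string is assumed for illustration, and the UTF-8 check only approximates what the database write does.

```python
# Hypothetical model of the failing path; function names are illustrative only.

def build_content(title: bytes, album: bytes, artist: bytes) -> bytes:
    # Roughly mirrors the sprintf() interpolation in media_handle_upload()
    # (format string assumed). Invalid UTF-8 in any tag is copied verbatim.
    return b'"%s" from %s by %s.' % (title, album, artist)


def insert_attachment(post_title: bytes, post_content: bytes) -> bool:
    # Stand-in for the database write: reject any field that is not valid UTF-8.
    for field in (post_title, post_content):
        try:
            field.decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True


# ID3v1 values that reached WordPress still in ISO-8859-1 (not converted).
title = "Grüße".encode("iso-8859-1")
album = "Märchen".encode("iso-8859-1")
artist = "Müller".encode("iso-8859-1")

content = build_content(title, album, artist)
print(insert_attachment(title, content))  # False: the insert fails

# The same tags correctly converted to UTF-8 go through.
fixed = [s.decode("iso-8859-1").encode("utf-8") for s in (title, album, artist)]
print(insert_attachment(fixed[0], build_content(*fixed)))  # True
```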
**Patch:**
Attaching `64842.3.diff` which addresses this in three ways:
1. **Introduces `_wp_id3_ensure_utf8()`**: a private helper in
`media.php` that detects invalid UTF-8 and converts it from Windows-1252
(a superset of ISO-8859-1, the encoding the ID3v1 spec uses). This
preserves the actual umlaut characters rather than stripping them.
2. **Applies the conversion in `wp_add_id3_tag_data()`**: each tag value
is passed through `_wp_id3_ensure_utf8()` before `wp_kses_post()`, fixing
the encoding at the source.
3. **Applies `sanitize_text_field()` to the ID3 title** in
`media_handle_upload()`: currently the ID3-sourced title is assigned raw,
unlike the filename-based fallback.
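The detect-then-convert idea behind `_wp_id3_ensure_utf8()` can be sketched as follows. This is a Python approximation, not the patch itself. `ensure_utf8()` and `add_id3_tag_data()` are hypothetical names, and Python's `cp1252` codec stands in for `mb_convert_encoding()` with a `'Windows-1252'` source.

```python
# Hypothetical sketch of the detect-then-convert approach (not the actual patch).

def ensure_utf8(value: bytes) -> bytes:
    """Return `value` unchanged if valid UTF-8, else re-decode it as Windows-1252."""
    try:
        value.decode("utf-8")
        return value  # Already valid UTF-8 (including plain ASCII): leave untouched.
    except UnicodeDecodeError:
        # Treat the bytes as Windows-1252. Python's cp1252 codec leaves five
        # bytes undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D); those become U+FFFD.
        return value.decode("cp1252", errors="replace").encode("utf-8")


def add_id3_tag_data(tags: dict) -> dict:
    # Normalise every tag value before any further sanitising, as the patch
    # does ahead of wp_kses_post().
    return {key: ensure_utf8(value) for key, value in tags.items()}


tags = {
    "title": "Grüße aus Köln".encode("iso-8859-1"),  # raw ID3v1 bytes
    "artist": "Beyoncé".encode("utf-8"),             # already UTF-8
    "genre": b"Pop",                                 # plain ASCII
}
for key, value in add_id3_tag_data(tags).items():
    print(key, value.decode("utf-8"))
```

Checking for valid UTF-8 first means tags that getID3 already converted correctly are never re-encoded, which avoids double-encoding on normal uploads.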
I chose `mb_convert_encoding()` with `'Windows-1252'` as the source
encoding over `'ISO-8859-1'` because Windows-1252 is a strict superset: it
assigns printable characters to most bytes in `0x80–0x9F`, which
ISO-8859-1 leaves undefined, and it is what most real-world tag editors
actually use.
**Testing:**
1. Download the reporter's example file from
`https://cba.media/wp-content/uploads/example_with_umlaut.mp3`
2. Without patch: upload via Media > Add Media File → fails with "Could
not insert attachment into database"
3. With patch: upload succeeds; the attachment title and description
preserve the German umlauts correctly
4. Also verify that uploading the same file via the Audio/File block in
the editor still works (no regression)
5. Test with a file containing only ASCII ID3 tags to confirm no
regression on normal uploads
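Before and after testing, it may help to confirm that a sample file's ID3v1 tag really contains non-UTF-8 bytes. The sketch below is a hypothetical helper, not part of the patch. It relies only on the fixed ID3v1 layout (the last 128 bytes start with `TAG`, followed by 30-byte title, artist and album fields) and reports which of those fields fail strict UTF-8 decoding.

```python
import sys


def id3v1_fields(data: bytes) -> dict:
    """Extract title/artist/album from an ID3v1 tag in the last 128 bytes, or {}."""
    tag = data[-128:]
    if len(tag) < 128 or not tag.startswith(b"TAG"):
        return {}
    # Fixed layout: "TAG" (3) + title (30) + artist (30) + album (30) + ...
    # Fields are padded with NUL bytes (some taggers pad with spaces instead).
    return {
        "title": tag[3:33].rstrip(b"\x00 "),
        "artist": tag[33:63].rstrip(b"\x00 "),
        "album": tag[63:93].rstrip(b"\x00 "),
    }


def non_utf8_fields(data: bytes) -> list:
    """Names of ID3v1 fields that are not valid UTF-8."""
    bad = []
    for name, value in id3v1_fields(data).items():
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        print(non_utf8_fields(fh.read()))
```

Saved under an assumed name such as `check_id3v1.py`, it would be run as `python check_id3v1.py example_with_umlaut.mp3`; a non-empty list suggests the ID3v1 tag still holds raw ISO-8859-1 bytes.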
Note: The recent UTF-8 modernization work in #63863 (WordPress 6.9) adds
replacement-character handling to `wp_check_invalid_utf8()`, but that
function is designed for strings that are *nominally* UTF-8 with a few bad
bytes. Here the entire string is in a *different encoding* (ISO-8859-1),
so conversion, not replacement, is the correct approach.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/64842#comment:1>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform