[wp-trac] [WordPress Trac] #63864: Support RFC 2047 MIME-decoding / improve `wp_iso_descrambler()`
WordPress Trac
noreply at wordpress.org
Sat Aug 23 01:58:35 UTC 2025
#63864: Support RFC 2047 MIME-decoding / improve `wp_iso_descrambler()`
--------------------------------------+-----------------------------
Reporter: dmsnell | Owner: (none)
Type: enhancement | Status: new
Priority: low | Milestone: Future Release
Component: Formatting | Version: trunk
Severity: normal | Resolution:
Keywords: has-patch has-unit-tests | Focuses:
--------------------------------------+-----------------------------
Description changed by dmsnell:
New description:
The existing `wp_iso_descrambler()` supports an extremely limited subset
of MIME-encoded data. Specifically, it supports only the `Q` encoding, and
it copies the decoded bytes directly into its output instead of converting
them from the declared character set. It was added to fix an issue where
subjects from inbound emails were “scrambled.”

While this surely improved the situation in 2004, when many systems were
sending `latin1` and the system locale was `latin1`, it is insufficient
today. WordPress could benefit from fuller support for RFC 2047, enabling
it to properly decode things like email subjects containing emoji.
== Proposal

- Introduce `wp_decode_rfc2047()` to make the intent of the code explicit.
The clearer name communicates better to developers and provides more
opportunity to test and improve support for the function.
- Deprecate `wp_iso_descrambler()` and delegate its responsibility to
`wp_decode_rfc2047()`. The unclear and inaccurate name and description of
this function leave little room to substantially improve it.
- Require calling code to indicate how to handle parsing errors, so that
recovery is explicit.
RFC 2047 / MIME decoding is not too complicated on the “happy path.” Each
encoded-word declares the character set of the escaped bytes and whether
they are escaped by replacing certain bytes with their hex equivalent (the
`Q` encoding, similar to quoted-printable, where `_` also stands for a
space) or by replacing the whole byte sequence with its base64
representation (the `B` encoding).

What remains is //uncertainty in the path of invalid encodings//. There
may be standardized behaviors for handling parse errors, and it would be
ideal to incorporate them into this enhancement.
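A minimal sketch of the happy path for a single encoded-word, assuming the charset and encoding letter have already been split off. The function name `decode_encoded_text()` is hypothetical and is not part of WordPress or the attached patch. It returns `null` for malformed input instead of guessing, leaving recovery to the caller.

{{{#!php
<?php
// Hypothetical helper: decode the encoded-text of one RFC 2047 word.
// Returns raw bytes (still in the declared charset), or null if malformed.
function decode_encoded_text( string $encoding, string $text ): ?string {
	switch ( strtoupper( $encoding ) ) {
		case 'Q':
			// In the Q encoding "_" always stands for a space (0x20)...
			$text = str_replace( '_', ' ', $text );
			// ...and "=XX" is one byte written as two hex digits, so an
			// "=" not followed by two hex digits is malformed.
			if ( preg_match( '/=(?![0-9A-Fa-f]{2})/', $text ) ) {
				return null;
			}
			return preg_replace_callback(
				'/=([0-9A-Fa-f]{2})/',
				static fn( array $m ): string => chr( hexdec( $m[1] ) ),
				$text
			);

		case 'B':
			// The B encoding is base64; strict mode rejects stray characters.
			$bytes = base64_decode( $text, true );
			return false === $bytes ? null : $bytes;

		default:
			return null; // Unknown encoding letter.
	}
}

var_dump( decode_encoded_text( 'Q', 'Caf=C3=A9' ) ); // string(5) "Café"
var_dump( decode_encoded_text( 'B', 'SGVsbG8=' ) );  // string(5) "Hello"
}}}

Note that the output is still in the declared charset; converting it (for example from `ISO-8859-2` to UTF-8) is a separate step.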
=== What about `iconv_mime_decode()`?

PHP provides [https://www.php.net/manual/en/function.iconv-mime-decode.php iconv_mime_decode()],
whose purpose is the same as this ticket's. In supported environments it
//may// be useful, though it is not clear how it resolves parsing errors
or exactly what changes its options make.

If a PHP implementation is required anyway to support runtimes lacking
`iconv`, then it makes sense to lean into a custom solution where
WordPress defines the error-handling behaviors and can change them as
appropriate, retaining full control over the behavior and specification.
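For comparison, a hedged sketch of how a caller might try `iconv_mime_decode()` only where the extension is loaded. The wrapper name is hypothetical. `ICONV_MIME_DECODE_CONTINUE_ON_ERROR` is one of PHP's real mode flags, but this sketch makes no claim about exactly what it emits for malformed words, which is the open question above.

{{{#!php
<?php
// Hypothetical wrapper: try PHP's iconv_mime_decode() only where the
// iconv extension is loaded; return null when it is missing or fails.
function sketch_try_iconv_mime_decode( string $header ): ?string {
	if ( ! function_exists( 'iconv_mime_decode' ) ) {
		return null; // No iconv: a PHP fallback is needed anyway.
	}

	// CONTINUE_ON_ERROR asks iconv to keep going past malformed words
	// instead of failing the whole header.
	$decoded = iconv_mime_decode( $header, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8' );

	return false === $decoded ? null : $decoded;
}

var_dump( sketch_try_iconv_mime_decode( '=?UTF-8?B?4q2QIOKtkA==?=' ) );
}}}

The `null` branch is the case the paragraph above is concerned with: code that must run without `iconv` still needs its own decoder.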
=== A short background

For those unfamiliar, email systems were built on 7-bit ASCII interchange.
This posed challenges when communicating between systems that relied on
8-bit or multi-byte encodings. MIME encoding was introduced as a way to
incorporate other character sets within the existing supported domain of
7-bit US-ASCII. The syntax was chosen to minimize the chance of confusing
intended plaintext with encoded text.
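To illustrate the encoding direction, here is a sketch that wraps UTF-8 bytes in a single `B`-encoded word so the header contains only printable US-ASCII. The function name is hypothetical, and it does not split long input across several words as the 75-character limit would require.

{{{#!php
<?php
// Hypothetical helper: wrap UTF-8 bytes in one B-encoded RFC 2047 word.
// It does not split long input, so the result may exceed 75 characters.
function sketch_encode_b_word( string $utf8 ): string {
	return '=?UTF-8?B?' . base64_encode( $utf8 ) . '?=';
}

$word = sketch_encode_b_word( "\u{2B50} \u{2B50}" ); // "⭐ ⭐"
var_dump( $word ); // string(24) "=?UTF-8?B?4q2QIOKtkA==?="

// Every byte of the result is printable 7-bit US-ASCII.
var_dump( 1 === preg_match( '/^[\x20-\x7E]+$/', $word ) ); // bool(true)
}}}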
- Certain email headers may contain MIME-encoded strings (“encoded-words”
of the form `=?charset?encoding?encoded-text?=`).
- An encoded-word //MUST NOT// exceed 75 characters, but a single header
may contain multiple encoded-words.
- When unable to decode an encoded-word, it’s permitted to display its
raw text as-is.
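Putting these rules together, a sketch of a whole-header decoder producing UTF-8. It is not the patch attached to this ticket. The function name and its `$on_error` parameter (`'raw'` or `'replace'`) are hypothetical stand-ins for the proposal's caller-chosen error handling. It is lenient about whitespace between encoded-words and plain text, and it converts charsets other than UTF-8 and US-ASCII only when the `mbstring` extension is available.

{{{#!php
<?php
// Hypothetical sketch, not WordPress code: decode every RFC 2047
// encoded-word in a header value into UTF-8.
// $on_error = 'raw' leaves an undecodable word as-is (RFC 2047 permits
// showing it unchanged); 'replace' substitutes U+FFFD instead.
function sketch_decode_rfc2047( string $header, string $on_error = 'raw' ): string {
	// =?charset?encoding?encoded-text?= with no spaces or "?" inside.
	$word = '=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=';

	// Whitespace between two adjacent encoded-words is not displayed.
	$header = preg_replace( "/({$word})\s+(?={$word})/", '$1', $header );

	return preg_replace_callback(
		"/{$word}/",
		static function ( array $m ) use ( $on_error ): string {
			$fail = 'replace' === $on_error ? "\u{FFFD}" : $m[0];

			// An encoded-word may not be more than 75 characters long.
			if ( strlen( $m[0] ) > 75 ) {
				return $fail;
			}

			if ( 'B' === strtoupper( $m[2] ) ) {
				// B encoding: base64, rejecting characters outside its alphabet.
				$bytes = base64_decode( $m[3], true );
			} else {
				// Q encoding: "_" is a space and "=XX" is a hex-escaped byte.
				$text  = str_replace( '_', ' ', $m[3] );
				$bytes = preg_match( '/=(?![0-9A-Fa-f]{2})/', $text )
					? false
					: preg_replace_callback(
						'/=([0-9A-Fa-f]{2})/',
						static fn( array $h ): string => chr( hexdec( $h[1] ) ),
						$text
					);
			}
			if ( false === $bytes ) {
				return $fail;
			}

			// Drop an RFC 2231 "*language" suffix, then convert to UTF-8.
			$charset = strtoupper( explode( '*', $m[1] )[0] );
			if ( 'US-ASCII' === $charset && preg_match( '/[\x80-\xFF]/', $bytes ) ) {
				return $fail;
			}
			if ( 'UTF-8' !== $charset && 'US-ASCII' !== $charset ) {
				$known = function_exists( 'mb_list_encodings' )
					? array_map( 'strtoupper', mb_list_encodings() )
					: array();
				if ( ! in_array( $charset, $known, true ) ) {
					return $fail; // No converter available for this charset.
				}
				$bytes = mb_convert_encoding( $bytes, 'UTF-8', $charset );
			}

			// Only hand back well-formed UTF-8.
			return is_string( $bytes ) && 1 === preg_match( '//u', $bytes ) ? $bytes : $fail;
		},
		$header
	);
}

var_dump( sketch_decode_rfc2047( '=?UTF-8?Q?Caf=C3=A9?= and =?US-ASCII?B?SGVsbG8=?=' ) );
// string(15) "Café and Hello"
var_dump( sketch_decode_rfc2047( '=?UTF-8?Q?bad=ZZ?=', 'replace' ) );
// string(3) "�"
}}}

Defaulting to `'raw'` mirrors the permission to display undecodable text unchanged; a caller that prefers a visible marker can pass `'replace'`.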
== Examples

=== Before

{{{#!php
<?php
var_dump( wp_iso_descrambler( '=?ISO-8859-2?Q?=A3=F3d=BC?=' ) );
// string(4) "��d�"
var_dump( wp_iso_descrambler( '=?UTF-8?Q?Caf=C3=A9?= and =?US-ASCII?B?SGVsbG8=?=' ) );
// string(33) "Café?= and =?US-ASCII?B?SGVsbG8="
var_dump( wp_iso_descrambler( '=?UTF-8?B?4q2QIOKtkA==?=' ) );
// string(24) "=?UTF-8?B?4q2QIOKtkA==?="
}}}
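To see where the “Before” output comes from, here is a simplified, hypothetical sketch modeled on the behavior shown above; it is written for illustration and is not copied from core. Its greedy pattern requires `?Q?` and matches from the first `=?` to the last `?=`, and it emits decoded bytes without charset conversion, which accounts for all three results.

{{{#!php
<?php
// Hypothetical re-creation of the "Before" behavior, for illustration only:
// only a Q-encoded word is recognized and its bytes are returned unconverted.
function sketch_descramble_like_before( string $subject ): string {
	// Greedy captures: the match runs from the first "=?" to the LAST "?=".
	if ( ! preg_match( '#\=\?(.+)\?Q\?(.+)\?\=#i', $subject, $matches ) ) {
		return $subject; // No "?Q?" anywhere, e.g. a B-encoded word.
	}
	// The charset in $matches[1] is ignored entirely.
	$text = str_replace( '_', ' ', $matches[2] );
	// "=XX" becomes the raw byte 0xXX; no conversion to UTF-8 happens.
	return preg_replace_callback(
		'#\=([0-9a-f]{2})#i',
		static fn( array $m ): string => chr( hexdec( $m[1] ) ),
		$text
	);
}

var_dump( strlen( sketch_descramble_like_before( '=?ISO-8859-2?Q?=A3=F3d=BC?=' ) ) ); // int(4)
}}}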
=== After

{{{#!php
<?php
var_dump( wp_decode_rfc2047( '=?ISO-8859-2?Q?=A3=F3d=BC?=' ) );
// string(7) "Łódź"
var_dump( wp_decode_rfc2047( '=?UTF-8?Q?Caf=C3=A9?= and =?US-ASCII?B?SGVsbG8=?=' ) );
// string(15) "Café and Hello"
var_dump( wp_decode_rfc2047( '=?UTF-8?B?4q2QIOKtkA==?=' ) );
// string(7) "⭐ ⭐"
}}}
--
Ticket URL: <https://core.trac.wordpress.org/ticket/63864#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform