[wp-trac] [WordPress Trac] #63864: Support RFC 2047 MIME-decoding / improve `wp_iso_descrambler()`

WordPress Trac noreply at wordpress.org
Sat Aug 23 01:58:35 UTC 2025


#63864: Support RFC 2047 MIME-decoding / improve `wp_iso_descrambler()`
--------------------------------------+-----------------------------
 Reporter:  dmsnell                   |       Owner:  (none)
     Type:  enhancement               |      Status:  new
 Priority:  low                       |   Milestone:  Future Release
Component:  Formatting                |     Version:  trunk
 Severity:  normal                    |  Resolution:
 Keywords:  has-patch has-unit-tests  |     Focuses:
--------------------------------------+-----------------------------
Description changed by dmsnell:

Old description:

> The existing `wp_iso_descrambler()` supports an extremely limited subset
> of MIME-encoded data. Specifically, it supports only the `Q` encoding and
> directly reads bytes from the encoded string instead of converting those
> bytes. It was added to fix an issue where subjects from inbound emails
> were “scrambled.”
>
> While this surely improved the situation in 2004 when many systems were
> sending `latin1` and where the system locale was `latin1`, it’s pretty
> insufficient today. WordPress could benefit from improving its support
> for RFC 2047 enabling proper reading of things like email subjects
> containing emoji.
>
> == Proposal
>
>  - Introduce `wp_decode_rfc2047()` for focus and clarity around the
> intention of what is happening. This communicates more clearly to
> developers and provides more opportunity to test and improve support for
> the function.
>  - Deprecate `wp_iso_descrambler()` and delegate its responsibility to
> `wp_decode_rfc2047()`. The unclear and inaccurate naming and description
> of this function leaves little room to substantially improve it.
>  - Require calling-code to indicate how to handle parsing errors for
> explicit recovery.
>
> RFC 2047 / MIME decoding is not too complicated in the “happy path.” It
> indicates the encoding of the escaped bytes and whether the escaping is
> via replacing certain bytes with their hex equivalent (the `Q` encoding)
> or replacing the whole byte sequence with a base64 representation (the
> `B` encoding).
>
> What remains is //uncertainty in the path of invalid encodings//. There
> may be standardized behaviors for handling parse errors and that would be
> ideal to incorporate into this enhancement.
>
> === What about `iconv_mime_decode()`?
>
> PHP provides [https://www.php.net/manual/en/function.iconv-mime-decode.php
> iconv_mime_decode()] whose purpose is the same as for this ticket. In
> supported environments it //may// be useful, though it’s not clear how it
> resolves parsing errors or what changes exactly are made by its options.
>
> If a PHP implementation is going to be required anyway to support
> runtimes lacking the `iconv` support then it makes sense to lean into a
> custom solution where WordPress can define the error-handling behaviors
> and change them as is appropriate, retaining full control over the
> behavior and specification.
>
> === A short background
>
> For those unfamiliar, email systems were based on 7-bit ASCII
> interchange. This posed challenges when attempting to communicate between
> systems which relied on 8-bit or multi-byte encodings. MIME encoding was
> introduced as a way of incorporating other character sets within the
> existing supported domain of 7-bit US-ASCII. The syntax was chosen with
> an attempt to minimize the chance of conflating intended plaintext with
> encoded text.
>
>  - Certain email headers may contain MIME-encoded strings.
>  - Spans of encoded text //MUST// not exceed 75 characters, but a single
> header may contain multiple sections of encoded text.
>  - When unable to decode the spans, it’s permitted to display the raw
> text of the encoding.
>
> == Examples
>
> === Before
>
> {{{#!php
> <?php
> var_dump( wp_iso_descrambler( '=?ISO-8859-2?Q?=A3=F3d=BC?=' ) );
> string(4) "��d�"
>
> var_dump( wp_iso_descrambler( '=?UTF-8?Q?Caf=C3=A9?= and =?US-ASCII?B?SGVsbG8=?=' ) );
> string(33) "Café?= and =?US-ASCII?B?SGVsbG8="
>
> var_dump( wp_iso_descrambler( '=?UTF-8?B?4q2QIOKtkA==?=' ) );
> string(24) "=?UTF-8?B?4q2QIOKtkA==?="
> }}}
>
> === After
>
> {{{#!php
> <?php
> var_dump( rfc2047_decode( '=?ISO-8859-2?Q?=A3=F3d=BC?=' ) );
> string(6) "Łódź"
>
> var_dump( rfc2047_decode( '=?UTF-8?Q?Caf=C3=A9?= and =?US-ASCII?B?SGVsbG8=?=' ) );
> string(15) "Café and Hello"
>
> var_dump( rfc2047_decode( '=?UTF-8?B?4q2QIOKtkA==?=' ) );
> string(7) "⭐ ⭐"
> }}}

New description:

 The existing `wp_iso_descrambler()` supports an extremely limited subset
 of MIME-encoded data. Specifically, it supports only the `Q` encoding and
 directly reads bytes from the encoded string instead of converting those
 bytes. It was added to fix an issue where subjects from inbound emails
 were “scrambled.”

 While this surely improved the situation in 2004, when many systems were
 sending `latin1` and the system locale was also `latin1`, it’s
 insufficient today. WordPress could benefit from better support for
 RFC 2047, which would enable proper reading of things like email subjects
 containing emoji.

 == Proposal

  - Introduce `wp_decode_rfc2047()` for focus and clarity around the
 intention of what is happening. This communicates more clearly to
 developers and provides more opportunity to test and improve support for
 the function.
  - Deprecate `wp_iso_descrambler()` and delegate its responsibility to
 `wp_decode_rfc2047()`. The unclear and inaccurate naming and description
 of this function leaves little room to substantially improve it.
  - Require calling code to indicate how to handle parsing errors so that
 recovery is explicit.

 RFC 2047 / MIME decoding is not too complicated in the “happy path.” Each
 encoded word indicates the charset of the escaped bytes and whether the
 escaping is via replacing certain bytes with their hex equivalent (the
 “quoted” or `Q` encoding) or replacing the whole byte sequence with a
 base64 representation (the “binary” or `B` encoding).
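
 As a rough illustration of the happy path only, the following sketch
 decodes a single encoded word. The name `example_decode_encoded_word()` is
 hypothetical and not part of the patch; the sketch assumes the `mbstring`
 extension is available and returns `null` for anything it does not
 understand.

 {{{#!php
 <?php
 /**
  * Hypothetical helper, for illustration only: decodes one encoded word
  * such as "=?UTF-8?Q?Caf=C3=A9?=" into UTF-8, or returns null on failure.
  */
 function example_decode_encoded_word( string $word ): ?string {
     // An encoded word is "=?" charset "?" encoding "?" encoded-text "?=".
     if ( 1 !== preg_match( '/^=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=$/', $word, $m ) ) {
         return null;
     }
     list( , $charset, $encoding, $text ) = $m;

     if ( 'B' === strtoupper( $encoding ) ) {
         // The B encoding is base64; strict mode rejects invalid input.
         $bytes = base64_decode( $text, true );
         if ( false === $bytes ) {
             return null;
         }
     } else {
         // The Q encoding: "_" stands for a space and "=XX" for byte 0xXX.
         $bytes = preg_replace_callback(
             '/=([0-9A-Fa-f]{2})/',
             static function ( $hex ) {
                 return chr( hexdec( $hex[1] ) );
             },
             str_replace( '_', ' ', $text )
         );
     }

     // Convert from the declared charset instead of passing bytes through.
     try {
         $decoded = mb_convert_encoding( $bytes, 'UTF-8', $charset );
     } catch ( ValueError $e ) {
         return null; // PHP 8+ throws for an unknown charset name.
     }
     return is_string( $decoded ) ? $decoded : null;
 }
 }}}

 The charset conversion at the end is the step `wp_iso_descrambler()`
 lacks; returning `null` here simply leaves the recovery decision to the
 caller.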

 What remains is //uncertainty in the path of invalid encodings//. There
 may be standardized behaviors for handling parse errors and that would be
 ideal to incorporate into this enhancement.

 === What about `iconv_mime_decode()`?

 PHP provides [https://www.php.net/manual/en/function.iconv-mime-decode.php
 iconv_mime_decode()] whose purpose is the same as for this ticket. In
 supported environments it //may// be useful, though it’s not clear how it
 resolves parsing errors or what changes exactly are made by its options.

 If a PHP implementation is going to be required anyway to support runtimes
 lacking `iconv` support, then it makes sense to lean into a custom
 solution where WordPress can define the error-handling behaviors and
 change them as appropriate, retaining full control over the behavior
 and specification.
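
 For comparison, where the `iconv` extension is present, a call might look
 like the following. This is only a sketch: the wrapper name
 `example_iconv_decode()` is hypothetical, and falling back to the raw
 input is an assumption made for the example, not a behavior decided in
 this ticket.

 {{{#!php
 <?php
 /**
  * Hypothetical wrapper, for illustration only. Returns the input
  * unchanged when iconv is missing or when decoding fails.
  */
 function example_iconv_decode( string $header_value ): string {
     if ( ! function_exists( 'iconv_mime_decode' ) ) {
         return $header_value;
     }

     // ICONV_MIME_DECODE_CONTINUE_ON_ERROR asks iconv to keep going past
     // malformed input; exactly what it emits then is the open question.
     $decoded = @iconv_mime_decode(
         $header_value,
         ICONV_MIME_DECODE_CONTINUE_ON_ERROR,
         'UTF-8'
     );

     return false === $decoded ? $header_value : $decoded;
 }
 }}}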

 === A short background

 For those unfamiliar, email systems were based on 7-bit ASCII interchange.
 This posed challenges when attempting to communicate between systems which
 relied on 8-bit or multi-byte encodings. MIME encoding was introduced as a
 way of incorporating other character sets within the existing supported
 domain of 7-bit US-ASCII. The syntax was chosen with an attempt to
 minimize the chance of conflating intended plaintext with encoded text.

  - Certain email headers may contain MIME-encoded strings.
  - Spans of encoded text //MUST// not exceed 75 characters, but a single
 header may contain multiple sections of encoded text.
  - When unable to decode the spans, it’s permitted to display the raw text
 of the encoding.
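
 The points above can be sketched as a scan over a header value that
 decodes each encoded word it recognizes and leaves everything else,
 including spans it cannot decode, as raw text. The name
 `example_decode_header_value()` is hypothetical, the sketch assumes
 `mbstring`, and keeping failed spans untouched is just one possible
 recovery policy.

 {{{#!php
 <?php
 /**
  * Hypothetical sketch: decodes every RFC 2047 encoded word in a header
  * value to UTF-8, leaving undecodable spans as their raw text.
  */
 function example_decode_header_value( string $value ): string {
     $word = '=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=';

     // Whitespace that only separates two encoded words is not displayed.
     $value = preg_replace( "/({$word})\s+(?={$word})/", '$1', $value );

     return preg_replace_callback(
         "/{$word}/",
         static function ( $m ) {
             // Encoded words longer than 75 characters are invalid.
             if ( strlen( $m[0] ) > 75 ) {
                 return $m[0];
             }
             list( $charset, $encoding, $text ) = explode( '?', substr( $m[0], 2, -2 ), 3 );

             if ( 'B' === strtoupper( $encoding ) ) {
                 $bytes = base64_decode( $text, true );
             } else {
                 $bytes = preg_replace_callback(
                     '/=([0-9A-Fa-f]{2})/',
                     static function ( $hex ) {
                         return chr( hexdec( $hex[1] ) );
                     },
                     str_replace( '_', ' ', $text )
                 );
             }
             if ( ! is_string( $bytes ) ) {
                 return $m[0]; // Invalid base64: keep the raw span.
             }

             try {
                 $decoded = mb_convert_encoding( $bytes, 'UTF-8', $charset );
             } catch ( ValueError $e ) {
                 return $m[0]; // Unknown charset: keep the raw span.
             }
             return is_string( $decoded ) ? $decoded : $m[0];
         },
         $value
     );
 }
 }}}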

 == Examples

 === Before

 {{{#!php
 <?php
 var_dump( wp_iso_descrambler( '=?ISO-8859-2?Q?=A3=F3d=BC?=' ) );
 string(4) "��d�"

 var_dump( wp_iso_descrambler( '=?UTF-8?Q?Caf=C3=A9?= and =?US-ASCII?B?SGVsbG8=?=' ) );
 string(33) "Café?= and =?US-ASCII?B?SGVsbG8="

 var_dump( wp_iso_descrambler( '=?UTF-8?B?4q2QIOKtkA==?=' ) );
 string(24) "=?UTF-8?B?4q2QIOKtkA==?="
 }}}

 === After

 {{{#!php
 <?php
 var_dump( rfc2047_decode( '=?ISO-8859-2?Q?=A3=F3d=BC?=' ) );
 string(7) "Łódź"

 var_dump( rfc2047_decode( '=?UTF-8?Q?Caf=C3=A9?= and =?US-ASCII?B?SGVsbG8=?=' ) );
 string(15) "Café and Hello"

 var_dump( rfc2047_decode( '=?UTF-8?B?4q2QIOKtkA==?=' ) );
 string(7) "⭐ ⭐"
 }}}

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/63864#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

