[wp-trac] [WordPress Trac] #62940: wp_mail(): Address header parsing is not RFC-5322 complient and fails on quoted-string when including a "<", ">" or ", "

WordPress Trac noreply at wordpress.org
Tue Feb 11 05:41:21 UTC 2025


#62940: wp_mail(): Address header parsing is not RFC-5322 complient and fails on
quoted-string when including a "<", ">" or ","
--------------------------+-----------------------------
 Reporter:  bhujagendra   |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Mail          |    Version:  trunk
 Severity:  normal        |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 == Problem

 The following `From` field values result in an error, although both comply
 with RFC 2822:

 ==== **Case 1: quoted printable:**
    - Input: {{{From: "test<stage" <test at example.com>}}}
    - Result:
      - **$from_name**: `"test`
      - **$from_email**: `stage" <test at example.com`

 ==== **Case 2: multiple from addresses:**
    - Input: {{{From: test <test at example.com>, User <user at example.com>}}}
    - Result:
      - **$from_name**: `test`
      - **$from_email**: `test at example.com, User <user at example.com`

 ==== **Case 3: similar problem for `to`, `cc`, `bcc`, and `reply_to`
 fields! **
    - Input: {{{To: test <test at example.com>, "Joe, Doe"
 <joe.doe at example.com>}}}
    - Result:
      - **address 1**: `test <test at example.com>`
      - **address 2**: `"Joe`
      - **address 3**: `Doe" <joe.doe at example.com>`

 == Reason
 The [https://www.php.net/manual/en/function.mail.php php mail()
 documentation] does not explicitly state that the `From` header needs to
 comply with RFC 2822. But it does say so for the `To` header. Furthermore,
 the current implementation does already support the `name-
 addr`-[https://datatracker.ietf.org/doc/html/rfc2822#section-3.4 syntax]
 (i.e. `[display-name] angle-addr`), which was not allowed in the preceding
 [https://datatracker.ietf.org/doc/html/rfc822#section-4.4.1 RFC 822].

 However, `wp_mail` does ignore the facts
 - that `display-name` may include `<` (case !#1 above), and
 - that the From field may actually include multiple addresses.

 ==== RFC 2822 Specification

 - `From` field value is `mailbox-list`
 [https://datatracker.ietf.org/doc/html/rfc2822#section-3.6.2 section
 3.6.2]
 - `mailbox-list` is a single `mailbox`, or two or more comma-separated
 `mailbox, mailbox`
 [https://datatracker.ietf.org/doc/html/rfc2822#section-3.4 section 3.4]
 - `mailbox` is either just an `addr-spec` (e-mail address) or a `name-
 addr` [https://datatracker.ietf.org/doc/html/rfc2822#section-3.4 section
 3.4]
 - `display-name` is a `phrase`
 [https://datatracker.ietf.org/doc/html/rfc2822#section-3.4 section 3.4]
 - `phrase` is a `word` and hence a `quoted-string`
 [https://datatracker.ietf.org/doc/html/rfc2822#section-3.2.6 section
 3.2.6]
 - `quoted-string`
 [https://datatracker.ietf.org/doc/html/rfc2822#section-3.2.5 (i.e. a
 string in double quotes)] may include any US-ASCII prinatable character
 except for the double quote (`"`) and the backslash (`\`). And as such
 also the `<`.


 ==== Current Implementation

 The parsing of the field simply splits the two parts of the `name-addr` at
 the position of the first occurrence of `<`, see line 2 of the code in
 question [https://github.com/WordPress/wordpress-
 develop/blob/d71f29ff0899c6eb29bc037a7c521a966405cb35/src/wp-
 includes/pluggable.php#L301-L319 (pluggable.php:301-319)]:

 {{{#!php
 <?php
 case 'from':
         $bracket_pos = strpos( $content, '<' );
         if ( false !== $bracket_pos ) {
                 // Text before the bracketed email is the "From" name.
                 if ( $bracket_pos > 0 ) {
                         $from_name = substr( $content, 0, $bracket_pos );
                         $from_name = str_replace( '"', '', $from_name );
                         $from_name = trim( $from_name );
                 }

                 $from_email = substr( $content, $bracket_pos + 1 );
                 $from_email = str_replace( '>', '', $from_email );
                 $from_email = trim( $from_email );

                 // Avoid setting an empty $from_email.
         } elseif ( '' !== trim( $content ) ) {
                 $from_email = trim( $content );
         }
         break;

 }}}

 Similarly, for the `to`, `cc`, `bcc`, and `reply_to` fields:
     - [https://github.com/WordPress/wordpress-
 develop/blob/d71f29ff0899c6eb29bc037a7c521a966405cb35/src/wp-
 includes/pluggable.php#L336-L344 Splitting the field value at every comma]
 (which may be quoted)
     - [https://github.com/WordPress/wordpress-
 develop/blob/d71f29ff0899c6eb29bc037a7c521a966405cb35/src/wp-
 includes/pluggable.php#L445-L456 Ignoring possible quotes when extracting
 name and address from the different values]

 == Discussion

 Unfortunately there is no simple solution:

 - Correct parsing - according to the standard outlined in RFC 2822 - would
 probably require the use of a real parser (as opposed to searching the
 entire string for certain tokens).
   - `PHPMailer::parseAddresses()` [https://github.com/WordPress/wordpress-
 develop/blob/d71f29ff0899c6eb29bc037a7c521a966405cb35/src/wp-
 includes/PHPMailer/PHPMailer.php#L1222 method] cannot be used, as it lacks
 the same shortcoming, if the IMAP extension is not installed.
   - [https://github.com/zbateson/mail-mime-parser/ mail-mime-parser] could
 work as an alternative. But it's a whole other level of complexity and
 only supports php v8+
 - Even if correct parsing would be supported and `$from_name` as well as
 `$from_email` correctly assigned, this would only work for
 single-`mailbox` entries. Multiple from addresses are not supported by
   - the the `wp_mail_from`-[https://github.com/WordPress/wordpress-
 develop/blob/d71f29ff0899c6eb29bc037a7c521a966405cb35/src/wp-
 includes/pluggable.php#L397 filter] and
 `wp_mail_from_name`-[https://github.com/WordPress/wordpress-
 develop/blob/d71f29ff0899c6eb29bc037a7c521a966405cb35/src/wp-
 includes/pluggable.php#L406 filter]
   - the `PHPMailer\PHPMailer\PHPMailer`'s `createHeader`
 [https://github.com/WordPress/wordpress-
 develop/blob/d71f29ff0899c6eb29bc037a7c521a966405cb35/src/wp-
 includes/PHPMailer/PHPMailer.php#L2683 method]

 === Weak implementation

 It must be noted, that the current implementation
 [https://core.trac.wordpress.org/search?changeset=on&ticket=on&q=wp_mail+%3C&page=6&noquickjump=1
 has repeatedly had bugs and regressions] (although it has been quiet
 around this in the last years):
 - #23243
 - #19847
 - #18463
 - #17305
 - #1593


 === Pragmatic proposal

 - Introduce a new filter `wp_mail_from_raw` at the beginning of the `From`
 field parsing:
   {{{#!php
 <?php
 /**
  * Filters the raw "From" header field before parsing it.
  *
  * @since 7.6.2
  *
  * @param string $content Form header content.
  * @param string $header Full raw header.
  */
 $content = apply_filters( 'wp_mail_from_raw', $content, $header );
 }}}
   This would allow a user (or plugin author) to mitigate the issue, if it
 happens on a given site. But it would save them the effort of parsing the
 headers entirely (as would be required with `wp_mail`).
 - Alternatively, the filter could be called for all headers in
 [https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-
 includes/pluggable.php#L300 line 300]
   {{{#!php
 <?php

 $origName = $name; // used later for the default case in line 347
 $name = strtolower( $name );

 /**
  * Filters the raw header fields before parsing them.
  *
  * @since 7.6.2
  *
  * @param string $content Header field's content.
  * @param string $name Header field's name.
  * @param string $header Full raw header.
  */
 $content = apply_filters( 'wp_mail_raw_header', $content, $name, $header
 );

 /**
  * Filters the raw header fields before parsing them.
  *
  * @since 7.6.2
  *
  * @param string $content Header field's content.
  * @param string $name Header field's name.
  * @param string $header Full raw header.
  */
 $content = apply_filters( "wp_mail_header_$name", $content, $header );

 // ...

 // Line 347 needs to be updated:
 // $headers[ trim( $name ) ] = trim( $content );
 $headers[ $origName ] = $content; // both values are already trimmed in
 lines 296 & 297
 }}}
 - Potentially use a RFC-compliant parser to avoid future errors with
 quoted address names at least in the other address fields
 - Use an instance of an object implementing `ArrayAccess` and
 `Stringable`. This can be used to store one or a set of addresses/names in
 the `$from_email`/`wp_mail_from` and `$from_name`/`wp_mail_from_name`
 values and is sent through the respective filters. By this it is basically
 backwards compatible.
 - Parse the `Sender` header, add a filter and default it to the first
 address of a multi-address From field of the default wordpress from
 address.
 - Enhance the `PHPMailer::setFrom()` method to accept `$address` and
 `$name` to be arrays. And therefore also the corresponting class fields
 `$From` and `$FromName` (maybe through a PR to
 [https://github.com/PHPMailer/PHPMailer/ PHPMailer])

 Happy to help with a PR. I would just love to know what would be accepted
 in there.

 Thanks,
 Bhujagendra

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/62940>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list