[wp-trac] [WordPress Trac] #62483: maybe_serialize() does support double serialization, but does not inform the developer if doing so

WordPress Trac noreply at wordpress.org
Mon Mar 3 07:59:38 UTC 2025


#62483: maybe_serialize() does support double serialization, but does not inform
the developer if doing so
-------------------------+------------------------
 Reporter:  apermo       |       Owner:  audrasjb
     Type:  enhancement  |      Status:  reviewing
 Priority:  normal       |   Milestone:  6.8
Component:  General      |     Version:  3.6.1
 Severity:  normal       |  Resolution:
 Keywords:  has-patch    |     Focuses:
-------------------------+------------------------

Comment (by apermo):

 Replying to [comment:16 TimothyBlynJacobs]:
 > I agree, we should discourage developers from doing `add_option(
 serialize() )`. This does seem like something that could mostly be
 accomplished by a sniff, however.

 The problem is, the case where I stumbled upon it, wasn't a simple nested
 function call.

 It was rather similar to this:

 {{{#!php
 <?php
 $serialized_data = serialize( $data );
 //...
 update_post_meta( $post_id, 'some_key', $serialized_data );
 }}}

 And what I've learned from discussions with several members of the WPCS
 team, this is something that is not trivial to do.

 Especially in a scenario like this:
 {{{#!php
 <?php

 function my_store_alias( $data ) {
     update_post_meta( get_the_ID(), 'key', $data );
 }

 my_store_alias( serialize( $data ) );
 }}}

 One example is the `$wpdb->prepare()` sniff. It will only work if you
 nested it, if you move it out of your query function to improve
 readability, you'll end up with `// phpcs:ignore
 WordPress.DB.PreparedSQL.NotPrepared -- $query already prepared.` to
 silence the sniff. I've double checked this exact example with Juliette a
 few weeks ago.

 Replying to [comment:16 TimothyBlynJacobs]:
 > Firing an action is a bit better, but still seems heavy for code quality
 signal.

 I've had a thought about the performance. To better grasp my point made
 above I've added `maybe_serialize()` (with my suggested action) and
 `is_serialized()` here.

 Case 1: We enter `maybe_serialize()` with previously serialized data.
 * It's a string, so we'll go by the first check.
 * We go into `is_serialized()`
 * Depending on the exact string, we will go past 4-5 PHP core function
 calls, most likely including `preg_match()` once and 6+ `if`s and a
 `switch` statement.
 * Then we'll do my added `do_action()`

 Case 2: We enter `maybe_serialize()` with unserialized data.
 * If it's an array or object, we're already done and leave in the first
 condition.
 * For strings and other trivial types, we'll enter `is_serialized()`
 * For anything but strings we'll leave here.
 * No matter how the string is formed, at some point we'll leave the
 function, and my `do_action()` is never fired.

 So if the developer uses WordPress data storage in the way it was meant to
 be, the action is never fired.

 Replying to [comment:16 TimothyBlynJacobs]:
 > I'm still worried about developers taking it the wrong way and trying to
 remove all instances of double serialization "to increase their code
 quality".

 I think those developers who will find and use that action, will be
 capable enough not to break things. For those developers who might not
 even be aware of this issue, nothing will change.

 (Further points below)

 {{{#!php
 <?php
 /**
  * Serializes data, if needed.
  *
  * @since 2.0.5
  *
  * @param string|array|object $data Data that might be serialized.
  * @return mixed A scalar data.
  */
 function maybe_serialize( $data ) {
         if ( is_array( $data ) || is_object( $data ) ) {
                 return serialize( $data );
         }

         /*
          * Double serialization is required for backward compatibility.
          * See https://core.trac.wordpress.org/ticket/12930
          * Also the world will end. See WP 3.6.1.
          */
         if ( is_serialized( $data, false ) ) {
                 /**
                  * Allows the developer to track double serialization in
 their projects.
                  *
                  * While not broken, double serialization increases the
 complexity of data structures,
                  * pose a performance overhead, and increases the
 cognitive load when reading code and data.
                  *
                  * @since 6.8.0
                  *
                  * @param string $data Already serialized data.
                  */
                 do_action( 'maybe_serialize_data_is_serialized', $data );
                 return serialize( $data );
         }

         return $data;
 }
 }}}
 {{{#!php
 /**
  * Checks value to find if it was serialized.
  *
  * If $data is not a string, then returned value will always be false.
  * Serialized data is always a string.
  *
  * @since 2.0.5
  * @since 6.1.0 Added Enum support.
  *
  * @param string $data   Value to check to see if was serialized.
  * @param bool   $strict Optional. Whether to be strict about the end of
 the string. Default true.
  * @return bool False if not serialized and true if it was.
  */
 function is_serialized( $data, $strict = true ) {
         // If it isn't a string, it isn't serialized.
         if ( ! is_string( $data ) ) {
                 return false;
         }
         $data = trim( $data );
         if ( 'N;' === $data ) {
                 return true;
         }
         if ( strlen( $data ) < 4 ) {
                 return false;
         }
         if ( ':' !== $data[1] ) {
                 return false;
         }
         if ( $strict ) {
                 $lastc = substr( $data, -1 );
                 if ( ';' !== $lastc && '}' !== $lastc ) {
                         return false;
                 }
         } else {
                 $semicolon = strpos( $data, ';' );
                 $brace     = strpos( $data, '}' );
                 // Either ; or } must exist.
                 if ( false === $semicolon && false === $brace ) {
                         return false;
                 }
                 // But neither must be in the first X characters.
                 if ( false !== $semicolon && $semicolon < 3 ) {
                         return false;
                 }
                 if ( false !== $brace && $brace < 4 ) {
                         return false;
                 }
         }
         $token = $data[0];
         switch ( $token ) {
                 case 's':
                         if ( $strict ) {
                                 if ( '"' !== substr( $data, -2, 1 ) ) {
                                         return false;
                                 }
                         } elseif ( ! str_contains( $data, '"' ) ) {
                                 return false;
                         }
                         // Or else fall through.
                 case 'a':
                 case 'O':
                 case 'E':
                         return (bool) preg_match( "/^{$token}:[0-9]+:/s",
 $data );
                 case 'b':
                 case 'i':
                 case 'd':
                         $end = $strict ? '$' : '';
                         return (bool) preg_match(
 "/^{$token}:[0-9.E+-]+;$end/", $data );
         }
         return false;
 }
 }}}

 I've also just found a wrong type hint in `is_serialized()`, `$data`
 should be `mixed` and not `string`, I've fixed it in both approaches.

 And to anticipate:
 **Who will use it?**
 Well... I will, and I will spread the word to other professional
 teams/developers. So there won't be many from the start, but I have hopes
 that it will propagate and make people aware of this.

 **What other alternatives already exist to catch this?**

 We're talking about all meta tables and the option table.

 So I could hook into `update_{$meta_type}_metadata` (4x),
 `added_{$meta_type}_meta`(4x), `add_option` and `update_option` to achieve
 the same, this will likely work, but it won't be as precise. Though
 arguable, it should be a dev environment only tool, so if my arguments did
 not convince you, yes I can work with that if I must.

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/62483#comment:17>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list