[wp-trac] [WordPress Trac] #36456: oEmbeds containing Emojis aren't cached if the meta_value column's charset is utf8 and not utf8mb4

WordPress Trac noreply at wordpress.org
Sat Apr 9 10:56:03 UTC 2016


#36456: oEmbeds containing Emojis aren't cached if the meta_value column's charset
is utf8 and not utf8mb4
--------------------------+-----------------------------
 Reporter:  birgire       |      Owner:
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Embeds        |    Version:  4.4.2
 Severity:  normal        |   Keywords:
  Focuses:                |
--------------------------+-----------------------------
 oEmbeds (e.g. Instagram, Twitter, ...) that contain emojis aren't cached
 in the {{{wp_postmeta}}} table if the {{{meta_value}}} column's charset
 is reported as {{{utf8}}} and not {{{utf8mb4}}}.

 This means that {{{wp_oembed_get()}}} is called for every such embed on
 each page load.

 Here's an example of an Instagram post containing an emoji:

 https://www.instagram.com/p/_cj9NXvhrh/

 where only the:

 {{{
 _oembed_time_02ba39bcc45811ad672a288b772ca9a2
 }}}

 is written to the {{{wp_postmeta}}} table but not

 {{{
 _oembed_02ba39bcc45811ad672a288b772ca9a2
 }}}

 containing the cached oEmbed content.


 I noticed this on a site with a shared database server running MySQL
 version < 5.5.3, which predates {{{utf8mb4}}} support.

 In the {{{WP_Embed::shortcode()}}} method we have the following:

 {{{
 // Maybe cache the result
 if ( $html ) {
         update_post_meta( $post_ID, $cachekey, $html );
         update_post_meta( $post_ID, $cachekey_time, time() );
 } elseif ( ! $cache ) {
         update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
 }
 }}}

 This doesn't handle the case where {{{update_post_meta()}}} fails
 to write to the database.

 For new meta inserts it uses:

 {{{
 $result = $wpdb->insert( $table, array(
         $column => $object_id,
         'meta_key' => $meta_key,
         'meta_value' => $meta_value
     ) );
 }}}

 This is a wrapper for:

 {{{
 wpdb::_insert_replace_helper()
 }}}

 that contains this part:

 {{{
 $data = $this->process_fields( $table, $data, $format );
 if ( false === $data ) {
     return false;
 }
 }}}

 And similarly within {{{wpdb::process_fields()}}} we have:

 {{{
 $converted_data = $this->strip_invalid_text( $data );
 if ( $data !== $converted_data ) {
     return false;
 }
 }}}

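 and this returns {{{false}}} whenever {{{strip_invalid_text()}}} has to
 remove characters that the column's charset can't store, such as a 4-byte
 emoji in a {{{utf8}}} column.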
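
 To illustrate which strings run into this, here is a minimal standalone
 sketch. It is not the core implementation: {{{has_four_byte_chars()}}} is
 a hypothetical helper name, and the assumption is that the relevant
 difference between {{{utf8}}} and {{{utf8mb4}}} is that MySQL's
 {{{utf8}}} stores at most 3 bytes per character.

```php
<?php
// Hypothetical helper (not part of WordPress): returns true if $text
// contains a character above U+FFFF, i.e. one that needs 4 bytes in UTF-8
// and therefore does not fit into MySQL's 3-byte utf8 charset.
function has_four_byte_chars( string $text ): bool {
    return preg_match( '/[\x{10000}-\x{10FFFF}]/u', $text ) === 1;
}

// U+1F60D (the emoji from the example embed) takes 4 bytes in UTF-8.
var_dump( strlen( "\u{1F60D}" ) );
var_dump( has_four_byte_chars( "Smile \u{1F60D}" ) );
var_dump( has_four_byte_chars( 'Smile :)' ) );
```

 A string for which this helper returns {{{true}}} is the kind of value I
 would expect to be rejected on a {{{utf8}}} column.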

 == Example ==

 We can test this with a simple example:

 {{{
 update_post_meta( 123, 'test', 'Smile 😍' );
 }}}

 that returns {{{false}}} if:

 {{{
 $wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' )
 }}}

 returns {{{'utf8'}}} and not {{{'utf8mb4'}}}.

 In this case nothing is written to the {{{wp_postmeta}}} table.

 == Workaround ==

 Here's what I constructed as a temporary workaround:

 {{{
 /**
  * Encode Emojis in all oEmbed results, if the charset for the meta_value
  * column is utf8.
  */
 add_filter( 'oembed_result', function( $html, $url, $args ) {
         // Fetch the database object at call time instead of capturing it.
         global $wpdb;

         /**
          * For WordPress 4.2+, where the get_col_charset() method and the
          * wp_encode_emoji() function were introduced.
          * The charset check is based on
          * https://codex.wordpress.org/Function_Reference/wp_encode_emoji#Example
          */
         if ( method_exists( $wpdb, 'get_col_charset' )
                 && 'utf8' === $wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' )
         ) {
                 $html = wp_encode_emoji( $html );
         }

         return $html;
 }, 10, 3 );

 }}}

 == Suggestions ==

 I see two possibilities to rewrite this part:

 {{{
 // Maybe cache the result
 if ( $html ) {
         update_post_meta( $post_ID, $cachekey, $html );
         update_post_meta( $post_ID, $cachekey_time, time() );
 } elseif ( ! $cache ) {
         update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
 }
 }}}

 to fix the caching.

 '''Suggestion 1)'''

 {{{
 // Maybe cache the result
 if ( $html ) {

     $updated_cachekey = update_post_meta( $post_ID, $cachekey, $html );

     if( $updated_cachekey ) {
         update_post_meta( $post_ID, $cachekey_time, time() );
      } else {
         update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
      }

  } elseif ( ! $cache ) {
      update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
  }
 }}}

 This means that only the link will show up if {{{$updated_cachekey}}} is
 {{{false}}}, and the cache key for the time isn't written to the database
 either.
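
 The fallback branch can be simulated without WordPress. This is only a
 sketch under stated assumptions: {{{fake_update_meta()}}} is a
 hypothetical in-memory stand-in for {{{update_post_meta()}}} on a
 {{{utf8}}} column, {{{cache_embed()}}} and the two key names are made up
 for illustration, and only the non-empty {{{$html}}} branch is modelled.

```php
<?php
// Hypothetical stand-in for update_post_meta() on a utf8 column:
// refuses any value that contains a 4-byte UTF-8 character.
function fake_update_meta( array &$store, string $key, string $value ): bool {
    if ( preg_match( '/[\x{10000}-\x{10FFFF}]/u', $value ) === 1 ) {
        return false;
    }
    $store[ $key ] = $value;
    return true;
}

// Mirrors the branch structure of Suggestion 1 for a non-empty $html.
function cache_embed( array &$store, string $html ): void {
    if ( fake_update_meta( $store, '_oembed_key', $html ) ) {
        // The content was stored, so record the time as well.
        fake_update_meta( $store, '_oembed_time_key', (string) time() );
    } else {
        // The write failed: store the placeholder, skip the time key.
        fake_update_meta( $store, '_oembed_key', '{{unknown}}' );
    }
}

$store = array();
cache_embed( $store, "<blockquote>Smile \u{1F60D}</blockquote>" );
var_dump( $store );
```

 In this simulation the store ends up with the placeholder only, which
 corresponds to the "only the link will show up" behaviour described above.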

 We could also rewrite this to only contain a single
 {{{update_post_meta()}}} call.

 '''Suggestion 2)'''

 We could also extend 1) to encode the Emojis if needed:

 {{{
 // Maybe cache the result
 if ( $html ) {

     // Encode Emojis if the charset of the meta_value column is 'utf8'
     if ( method_exists( $wpdb, 'get_col_charset' )
          && 'utf8' === $wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' )
          && function_exists( 'wp_encode_emoji' )
     ) {
         $html = wp_encode_emoji( $html );
     }

     $updated_cachekey = update_post_meta( $post_ID, $cachekey, $html );

     if( $updated_cachekey ) {
         update_post_meta( $post_ID, $cachekey_time, time() );
      } else {
         update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
      }

  } elseif ( ! $cache ) {
      update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
  }

 }}}

 where we encode the Emojis when the meta_value column's charset is utf8.
 The method/function checks would be needed since they are only available
 in WP 4.2+.

 This should give us the cached content instead of only the link.
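
 To show why encoding helps, here is a rough standalone sketch of the
 idea. It is not how {{{wp_encode_emoji()}}} is implemented (as far as I
 know, the real function works from a list of known emoji);
 {{{encode_four_byte_chars()}}} is a hypothetical name, and it simply
 turns every 4-byte UTF-8 character into an ASCII-only hexadecimal HTML
 entity, which a {{{utf8}}} column can store.

```php
<?php
// Hypothetical helper (not part of WordPress): replaces each 4-byte UTF-8
// character with a hexadecimal HTML entity such as &#x1f60d;.
function encode_four_byte_chars( string $text ): string {
    return preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        function ( array $match ): string {
            // Decode the 4-byte sequence 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            // into its code point. unpack() keys start at 1.
            $b  = unpack( 'C4', $match[0] );
            $cp = ( ( $b[1] & 0x07 ) << 18 )
                | ( ( $b[2] & 0x3F ) << 12 )
                | ( ( $b[3] & 0x3F ) << 6 )
                | ( $b[4] & 0x3F );
            return '&#x' . dechex( $cp ) . ';';
        },
        $text
    );
}

echo encode_four_byte_chars( "Smile \u{1F60D}" ), "\n"; // Smile &#x1f60d;
```

 The result contains only ASCII characters, so the charset check should no
 longer reject it, and browsers render the entity as the original emoji.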

 PS: For the above case the Query Monitor plugin reports 4 HTTP GET
 requests for a single Instagram oEmbed URL that contains an emoji.

 Example:

 For https://www.instagram.com/p/_cj9NXvhrh/ they are:

 {{{
 https://api.instagram.com/oembed
 ?maxwidth=640
 &maxheight=960
 &url=https%3A%2F%2Fwww.instagram.com%2Fp%2F_cj9NXvhrh%2F
 &format=json

 https://api.instagram.com/publicapi/oembed/
 ?maxwidth=640
 &maxheight=960
 &url=https%3A%2F%2Fwww.instagram.com%2Fp%2F_cj9NXvhrh%2F
 &format=json

 https://www.instagram.com/publicapi/oembed/
 ?maxwidth=640
 &maxheight=960
 &url=https://www.instagram.com/p/_cj9NXvhrh/
 &format=json

 https://api.instagram.com/oembed/
 ?maxwidth=640
 &maxheight=960
 &url=https://www.instagram.com/p/_cj9NXvhrh/
 &format=json

 }}}

 I'm not sure why, but I only get a single request for a Twitter oEmbed
 containing emojis.

 PPS: Sorry for the long post.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/36456>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

