[wp-hackers] Escaping post meta values

Justas Butkus jbutkus at time.ly
Wed May 22 20:15:35 UTC 2013


Hello, Otto.

Thank you for taking the time to write this extensive response.

Actually, I was thinking along the lines of the "more complex case" that
you described, where complex data is generated by some PHP process. For
example, if I were to store some measurement values, I would choose an
array or an associative array: a structure that is simple and complex at
the same time.

I might pass it directly to the storage function, and I would get what I 
expect: array stored, array retrieved.
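
To make the "array stored, array retrieved" expectation concrete, here is
a minimal sketch in plain PHP. It only imitates what the meta functions
do for non-scalar values (a `serialize()` on the way in, an
`unserialize()` on the way out); the measurement array and its keys are
made up for illustration.

```php
<?php
// Hypothetical measurement data; keys and values are invented for this example.
$measurements = array(
    'sensor'   => 'probe-1',
    'unit'     => 'celsius',
    'readings' => array( 21.5, 21.7, 22.25 ),
);

// Stand-in for what happens beneath the meta storage function:
// non-scalar values are serialized to a string before being saved...
$stored = serialize( $measurements );

// ...and unserialized again when they are read back.
$restored = unserialize( $stored );

// The caller never sees the string form, only the array.
var_dump( is_string( $stored ) );
var_dump( $restored === $measurements );
```

Inside WordPress the same round trip would go through
`update_post_meta()` and `get_post_meta()` rather than direct calls.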

But then, beneath this, there is a `serialize()`.
It is great, as it allows callbacks to be implemented and can
serialize data of any complexity that PHP can handle.
On the other hand, because of this its performance is not that good
compared with `json_encode()`, for example, when serializing rather
simple structures (lists, hashes, etc.).

So, given that, I might use `json_encode()` and then pass the resulting
data to the meta storage function, where I expect it to be stored as-is
(as a string, which is what it appears to be).
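
A rough sketch of that idea, and of where "as-is" can go wrong, in plain
PHP. The assumption here is that the storage layer strips slashes from
string values before saving, which is what the WordPress meta functions
do; `stripslashes()` and `addslashes()` stand in for that behaviour and
for `wp_slash()`. The data is made up.

```php
<?php
// Hypothetical data; the quoted label forces json_encode() to emit backslashes.
$measurements = array(
    'label'    => 'probe "A"',
    'readings' => array( 21.5, 21.7, 22.25 ),
);

$json = json_encode( $measurements );

// Case 1: pass the JSON string straight in. The storage layer is assumed
// to unslash it, which removes the backslashes JSON needs.
$stored_raw = stripslashes( $json );
$broken     = json_decode( $stored_raw, true );

// Case 2: slash the string first, so that unslashing restores the original.
// In WordPress this would be wp_slash( $json ) before update_post_meta().
$stored_ok = stripslashes( addslashes( $json ) );
$restored  = json_decode( $stored_ok, true );

var_dump( $broken );                      // invalid JSON decodes to NULL
var_dump( $restored === $measurements );  // the slashed version survives
```

So the string is only stored unchanged if it is slashed before it is
handed to the storage function.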

That's just one use case I came up with.
It seems rather legitimate to me for a developer to build upon the
WordPress foundation and use some subset of its features, where
possible, to achieve better performance, even in small parts.

Closing note - thank you for sharing your view, that was really insightful.


-- 
Regards,
Justas

2013.05.22 20:51, Otto wrote:
> On Wed, May 22, 2013 at 12:06 PM, Justas Butkus <jbutkus at time.ly> wrote:
>> How do you feel about the performance aspect of this question?
>> I am not questioning the fundamental feature of WordPress (namely,
>> backwards compatibility), just asking whether this could be considered
>> when talking about such functions?
>
> To tackle the performance question, you have to examine the more
> common behavior being used.
>
> For the basic case, I'm storing information gained from some simple
> process in PHP. My data is generated or gathered from user
> input, and is plain text, say. In that case, the storage of the meta is
> simple, and has no issues.
>
> For a more complex case, I'm storing information gained from some
> complex PHP process. Say, an array, or an object. In this case, the
> data is serialized on saving to the DB, and unserialized when
> retrieving it. Again, no issues.
>
> The only question of performance comes from when the data is gathered
> from an external source. To pick an example, let's say I get data from
> the Flickr API.
>
> The data comes back from Flickr in a JSON format. Now, what am I doing
> with that data? This is the key question that makes the answer
> possible. There are two possibilities for usage of this Flickr JSON
> structure:
>
> 1) I'm decoding it and using pieces of it to display something in the post.
> 2) I'm passing some or all of the data on to a Javascript process, or
> making it otherwise available via an external API call.
>
> For the first case, then I can either json_decode it when I receive
> the data from Flickr, or every time I display the data in some manner.
> Obviously, decoding takes time and though that time is small, it makes
> more sense to decode it one time, get the data I actually need, and
> save that in the meta storage. Saving the whole blob means I'm a)
> saving data I may not use and b) having to decode it every time I need
> it. Performance is better if I convert to PHP variables first and
> discard the unnecessary pieces.
>
> For the second case, then it would indeed make more sense to store the
> raw data, if all that raw data is needed by the resulting final
> process using it. API calls often tend to return more than we actually
> need though, so in terms of space savings in the DB, it makes more
> sense to decode the data and pare it down to what I need to use, then
> pass that and only that data along later. We still have to json_encode
> the data later, but it's probably substantially less data. This
> tradeoff is difficult to measure in the general sense, and you'd need
> to profile your exact case to know the faster approach.
>
> Realistically, the first case is probably more common. The reason to
> get data from an external service is to use that data, generally
> speaking. It's rare that you pass that data on to some third system.
> And even if you are passing it to a browser via JS, it's better to
> pass small amounts of data instead of relaying large API responses.
>
> Storing whole API responses from external calls, unaltered, rarely
> makes sense from a performance viewpoint. Sometimes, yes. But not
> often. Better to decode the moment you receive the data, then
> manipulate it there, then store just the pieces you need. Smaller.
> Faster.
>
> -Otto
> _______________________________________________
> wp-hackers mailing list
> wp-hackers at lists.automattic.com
> http://lists.automattic.com/mailman/listinfo/wp-hackers
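
The "decode once, keep only what you need" approach from the quoted
message could be sketched roughly as follows. The JSON payload is
invented and is not the real Flickr response format; the choice of
fields to keep is likewise only an assumption about what a template
might display.

```php
<?php
// Made-up payload standing in for an external API response.
$response = '{"photo":{"id":"123","title":"Sunset","owner":"someone",'
          . '"tags":["sky","sea"],"exif":{"camera":"unknown"}}}';

// Decode once, at the moment the data is received.
$decoded = json_decode( $response, true );

// Keep only the pieces assumed to be needed for display.
$needed = array(
    'id'    => $decoded['photo']['id'],
    'title' => $decoded['photo']['title'],
);

// In WordPress, $needed would be passed to update_post_meta(), which
// serializes arrays itself; serialize() here just shows the stored size.
$stored = serialize( $needed );

printf( "raw: %d bytes, stored: %d bytes\n", strlen( $response ), strlen( $stored ) );
```

Whether this beats storing the raw blob still depends on how the data is
used later, so profiling the actual case remains the only reliable
answer.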