[wp-trac] [WordPress Trac] #40150: VARCHAR/TEXT inefficiencies

WordPress Trac noreply at wordpress.org
Tue Mar 14 00:45:12 UTC 2017


#40150: VARCHAR/TEXT inefficiencies
-------------------------+-----------------------------
 Reporter:  arjenlentz   |      Owner:
     Type:  enhancement  |     Status:  new
 Priority:  normal       |  Milestone:  Awaiting Review
Component:  Database     |    Version:  4.7.3
 Severity:  normal       |   Keywords:
  Focuses:               |
-------------------------+-----------------------------
 From conversation with Matty Cohen, a selection of structural improvement
 suggestions, for performance and future-proofing. Note that these aspects
 have recently been applied to WooCommerce already, resulting in a much
 optimised/modernised default table structure.

 Current some of the TEXT and VARCHAR use in WP Core appear to be based on
 old information - that is, restrictions and behaviour from MySQL versions
 long ago. This has consequences for performance.

 First, VARCHAR.

 In MySQL 4.0, VARCHAR used to be restricted to 255 max. In MySQL 4.1
 character sets such as UTF8 were introduced and MySQL 5.1 supports
 VARCHARs up to 64K-1 in byte length. Thus, any occurrence of VARCHAR(255)
 indicates some old style logic that needs to be reviewed.

 Why not just set the maximum length possible? Well...
 A VARCHAR is subject to the character set it's in, for UTF8 this means
 either 3 or 4 (utf8mb4) bytes per character can be used. So if one
 specifies VARCHAR(50) CHARSET utf8mb4, the actual byte length of the
 stored string can be up to 200 bytes.
 In stored row format, MySQL uses 1 byte for VARCHAR length when possible
 (depending on the column definition), and up to 2 bytes if necessary. So,
 specifying VARCHAR(255) unnecessarily means that the server has to use a 2
 byte length in the stored row.
 This may be viewed as nitpicking in the margin, however storage efficiency
 affects the number of rows that can fit on a data page and thus the amount
 of I/O required to manage a certain amount of rows. It all adds up, so
 having little unnecessary inefficiencies will cost - particularly for
 larger sites.

 Best practice is to set VARCHAR to the maximum necessary, not the maximum
 possible - otherwise, as per the above, the maximum possible is about
 16000 for utf8mb4, not 255 - and nobody would propose setting it to 16000,
 would they? But it's not much different, in stored row space a
 VARCHAR(255) requires a 2 byte length indicator just like VARCHAR(16000)
 would.

 So please review VARCHAR columns and set their definition to the maximum
 actually necessary, this is very unlikely to come out as 255. If 255, why
 not 300? Or rather 200? Or 60? Setting a proper number indicates that
 thought and data analysis has gone into the design. 255 looks sloppy.

 On to TEXT.

 TEXT (and LONGTEXT) columns are handled different in MySQL/MariaDB. First,
 a recap of some facts related to TEXT columns.

 The db server often needs to create a temporary table while processing a
 query. MEMORY tables cannot contain TEXT type columns, thus the temporary
 table created will be a disk-based one. Admittedly this will likely remain
 in the disk cache and never actually touch a disk, however it goes through
 file I/O functions and thus causes overhead - unnecessarily. Queries will
 be slower.

 InnoDB can store a TEXT column on a separate page, and only retrieve it
 when necessary (this also means that using SELECT * is needlessly
 inefficient - it's almost always better to specify only the columns that
 are required - this also makes code maintenance easier).

 A TEXT column can contain up to 64k-1 in byte length (4G for LONGTEXT). So
 essentially a TEXT column can store the same amount of data as a VARCHAR
 column (since MySQL 5.0), and we know that VARCHAR offers us benefits in
 terms of server behaviour. Thus, any instance of TEXT should be carefully
 reviewed and generally the outcome is to change to an appropriate VARCHAR.

 Using LONGTEXT is ok, if necessary. If the amount of data is not going to
 exceed say 16KB character length, using LONGTEXT is not warranted and
 again VARCHAR (not TEXT) is the most suitable column type.

 Applications don't need to care, so the db definition can be altered
 without any application impact.

 Summary:
 Particularly when combined with the best practice of not using SELECT *,
 using appropriately defined VARCHAR columns (rather than VARCHAR(255) or
 TEXT) can have a measurable and even significant performance impact on
 application environments. It is a worthwhile effort.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/40150>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform


More information about the wp-trac mailing list