[wp-trac] [WordPress Trac] #39963: MIME Alias Handling

WordPress Trac noreply at wordpress.org
Fri Mar 17 17:11:14 UTC 2017


#39963: MIME Alias Handling
-------------------------+------------------------------
 Reporter:  blobfolio    |       Owner:
     Type:  enhancement  |      Status:  new
 Priority:  normal       |   Milestone:  Awaiting Review
Component:  Media        |     Version:
 Severity:  normal       |  Resolution:
 Keywords:               |     Focuses:
-------------------------+------------------------------

Comment (by blobfolio):

 I put together a proof-of-concept patch:
 `wp-includes/functions.php` now contains:
   -- `wp_check_real_filetype()` (function and filter)
   -- `wp_check_mime_alias()` (function and filter)
   -- `wp_check_application_octet_stream` (filter)
   -- updated `wp_check_filetype_and_ext()` (see below)
 `wp-includes/media-mimes.php`:
   -- `wp_get_mime_aliases()` (function and filter)
 `tests/phpunit/tests/functions.php`:
   -- updated `big5.txt` test

 This demonstrates the benefits of MIME alias handling: more robust type
 matching, stricter upload validation (ALL files are subject to type
 evaluation when possible), and some UX improvements (ALL incorrectly named
 files, if otherwise valid, are renamed with the correct extension).

 `wp_check_real_filetype()` begins with a name-based approach (i.e.
 `wp_check_filetype()`). If that fails, the failure is passed on. If it
 succeeds, it attempts to evaluate the "real" type using EXIF (not yet
 implemented, waiting on #40017) or, failing that, FILEINFO. If either
 evaluation succeeds, the "real" type is compared against the known aliases
 for the file extension. If the "real" type is a known alias, the name-based
 type is returned (i.e. WordPress' hardcoded definitions take priority). If
 the "real" MIME does not match the extension, but it ''is'' whitelisted,
 that MIME and the ''correct'' extension are returned. If the "real" MIME is
 not whitelisted, `false` is returned. If no content-based evaluation can be
 performed, the name-based results are returned.
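
 The decision flow above can be sketched roughly as follows. This is a
 simplified illustration, not the code from the patch: the `sketch_*`
 function names, the one-extension-per-key `$allowed` map, the sample alias
 data, and the choice to signal failure with `ext`/`type` both set to
 `false` are all assumptions made for the example. EXIF is omitted, and the
 detected MIME is passed in as a parameter so the logic can be run without
 real files.

```php
<?php
// Hypothetical sketch of the described flow; not the patch's implementation.

// Content-based detection via fileinfo. Returns null when no evaluation is possible.
function sketch_detect_mime( $path ) {
    if ( ! function_exists( 'finfo_open' ) || ! is_readable( $path ) ) {
        return null;
    }
    $finfo = finfo_open( FILEINFO_MIME_TYPE );
    if ( ! $finfo ) {
        return null;
    }
    $mime = finfo_file( $finfo, $path );
    return $mime ? strtolower( $mime ) : null;
}

// $allowed: extension => canonical MIME (simplified whitelist, an assumption).
// $aliases: extension => list of alternate MIMEs (illustrative data only).
function sketch_check_real_filetype( $ext, $detected_mime, array $allowed, array $aliases ) {
    $failure = array( 'ext' => false, 'type' => false );
    $ext     = strtolower( $ext );

    // 1. Name-based check; a failure is passed straight through.
    if ( ! isset( $allowed[ $ext ] ) ) {
        return $failure;
    }
    $by_name = array( 'ext' => $ext, 'type' => $allowed[ $ext ] );

    // 2. No content-based evaluation available: keep the name-based result.
    if ( null === $detected_mime ) {
        return $by_name;
    }
    $detected_mime = strtolower( $detected_mime );

    // 3. Detected type is the canonical type or a known alias: name-based type wins.
    $known   = isset( $aliases[ $ext ] ) ? $aliases[ $ext ] : array();
    $known[] = $by_name['type'];
    if ( in_array( $detected_mime, $known, true ) ) {
        return $by_name;
    }

    // 4. Detected type is whitelisted under another extension: return that pair.
    $correct_ext = array_search( $detected_mime, $allowed, true );
    if ( false !== $correct_ext ) {
        return array( 'ext' => $correct_ext, 'type' => $detected_mime );
    }

    // 5. Detected type is not whitelisted at all.
    return $failure;
}

$allowed = array(
    'jpg' => 'image/jpeg',
    'mp3' => 'audio/mpeg',
    'ogg' => 'audio/ogg',
    'txt' => 'text/plain',
);
$aliases = array( 'mp3' => array( 'audio/mp3' ) );

print_r( sketch_check_real_filetype( 'mp3', 'audio/mp3', $allowed, $aliases ) );  // ext mp3, type audio/mpeg
print_r( sketch_check_real_filetype( 'mp3', 'audio/ogg', $allowed, $aliases ) );  // ext ogg, type audio/ogg
print_r( sketch_check_real_filetype( 'jpg', 'text/plain', $allowed, $aliases ) ); // ext txt, type text/plain
```

 The last call mirrors the `big5.txt` case: a text file named as a JPEG is
 reported as `txt` / `text/plain` rather than rejected. With a real file,
 the second argument would come from something like
 `sketch_detect_mime( $path )`.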

 `wp_check_mime_alias()` has automatic handling for temporary
 `x-subtype`/`subtype` variations (e.g. `application/font-woff` and
 `application/x-font-woff` are considered equivalent). By default it also
 soft-matches `application/octet-stream` against any extension, as that
 tends to be the response returned by a server when it doesn't know what a
 file is. That behavior can be overridden using the
 `wp_check_application_octet_stream` filter. All checks are case-
 insensitive and strip out invalid characters.
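
 As a rough illustration of the alias matching described above, here is a
 minimal sketch. It is not the patch's `wp_check_mime_alias()`: the
 `sketch_*` names, the set of characters treated as valid, the
 `$allow_octet_stream` argument (standing in for the
 `wp_check_application_octet_stream` filter), and the sample alias table are
 all assumptions.

```php
<?php
// Hypothetical sketch of alias matching; not the patch's implementation.

// Lowercase the MIME and drop characters outside an assumed "valid" set.
function sketch_sanitize_mime( $mime ) {
    return preg_replace( '/[^a-z0-9\/.+_-]/', '', strtolower( trim( $mime ) ) );
}

// Drop a leading "x-" from the subtype so x-subtype and subtype compare equal.
function sketch_strip_x_prefix( $mime ) {
    return preg_replace( '#^([^/]+)/x-#', '$1/', $mime );
}

function sketch_check_mime_alias( $ext, $mime, array $aliases, $allow_octet_stream = true ) {
    $ext  = strtolower( ltrim( trim( $ext ), '.' ) );
    $mime = sketch_sanitize_mime( $mime );

    if ( '' === $ext || '' === $mime ) {
        return false;
    }

    // Soft match: application/octet-stream is accepted for any extension.
    if ( $allow_octet_stream && 'application/octet-stream' === $mime ) {
        return true;
    }

    if ( ! isset( $aliases[ $ext ] ) ) {
        return false;
    }

    // Compare with the "x-" prefix stripped from both sides.
    $needle = sketch_strip_x_prefix( $mime );
    foreach ( $aliases[ $ext ] as $alias ) {
        if ( sketch_strip_x_prefix( sketch_sanitize_mime( $alias ) ) === $needle ) {
            return true;
        }
    }

    return false;
}

// Illustrative alias data only.
$aliases = array(
    'woff' => array( 'application/font-woff', 'font/woff' ),
    'mp3'  => array( 'audio/mpeg' ),
);

var_dump( sketch_check_mime_alias( 'woff', 'Application/X-Font-WOFF', $aliases ) );         // bool(true)
var_dump( sketch_check_mime_alias( 'mp3', 'audio/ogg', $aliases ) );                        // bool(false)
var_dump( sketch_check_mime_alias( 'mp3', 'application/octet-stream', $aliases ) );         // bool(true)
var_dump( sketch_check_mime_alias( 'mp3', 'application/octet-stream', $aliases, false ) );  // bool(false)
```

 Normalizing both sides before comparing keeps the alias table itself free
 of duplicate `x-` entries, which is one plausible reason to handle that
 variation automatically.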

 `wp_check_filetype_and_ext()` is updated to call
 `wp_check_real_filetype()` instead of `wp_check_filetype()`. This covers
 whitelist checks and type-based evaluation. Renaming is now applied to all
 files, not just the small subset of image types in the original version.
 `image/*` and `application/*` checks are removed as unnecessary (all
 content is evaluated now). The ultimate determinations are still
 filterable as before.
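
 The renaming step could look something like the sketch below. The helper
 name `sketch_proper_filename()` and its convention of returning `false`
 when no rename is needed are assumptions for illustration; the actual patch
 may build the corrected name differently.

```php
<?php
// Hypothetical sketch of extension correction; not the patch's implementation.

// Returns the corrected filename, or false when the extension is already right.
function sketch_proper_filename( $filename, $correct_ext ) {
    $correct_ext = strtolower( ltrim( $correct_ext, '.' ) );
    $current_ext = strtolower( pathinfo( $filename, PATHINFO_EXTENSION ) );

    if ( $current_ext === $correct_ext ) {
        return false;
    }

    // Keep everything before the last dot and append the correct extension.
    $base = pathinfo( $filename, PATHINFO_FILENAME );
    return $base . '.' . $correct_ext;
}

var_dump( sketch_proper_filename( 'song.mp3', 'ogg' ) );  // string(8) "song.ogg"
var_dump( sketch_proper_filename( 'photo.JPG', 'jpg' ) ); // bool(false)
```

 Fed with the extension returned by the content-based check, this covers
 any whitelisted type rather than a fixed list of image formats.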

 All original PHPUnit tests pass, with the exception of the test that passes
 `big5.txt` as a JPEG; because of the improvements, the file is now
 correctly identified and accepted as a `text/plain` file. ;) This patch
 updates the test accordingly.


 {{{
 $ phpunit tests/phpunit/tests/functions.php
 Installing...
 Running as single site... To run multisite, use -c tests/phpunit/multisite.xml
 Not running ajax tests. To execute these, use --group ajax.
 Not running ms-files tests. To execute these, use --group ms-files.
 Not running external-http tests. To execute these, use --group external-http.
 PHPUnit 5.4.6 by Sebastian Bergmann and contributors.

 ................................................................. 65 / 83 ( 78%)
 ..................                                                83 / 83 (100%)

 Time: 918 ms, Memory: 24.00Mb

 OK (83 tests, 565 assertions)
 }}}

 It is a lot to digest, I know. But the benefits are numerous. The patch
 mitigates the issues in #40175, and does so in a way that ''increases''
 upload security, and file integrity more generally (for example, users
 won't accidentally hear a screeching "MP3" because that MP3 is really an
 OGG).

 The MIME database (`media-mimes.php`) will need to be maintained
 indefinitely, as MIME data is always changing. That data, however, is
 rebuilt automatically and regularly, independently of WordPress. I propose
 we aim to update the data once per major release, a task I am more than
 happy to take on.

 Speaking of which, the MIME data in this patch has roughly 1750 entries.
 While that data will never be 100% complete, as-is it already improves
 WordPress' chances of correctly identifying a file by over 2000%.

--
Ticket URL: <https://core.trac.wordpress.org/ticket/39963#comment:14>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

