[wp-trac] [WordPress Trac] #60788: Content-Disposition support in download_url() seems broken
WordPress Trac
noreply at wordpress.org
Sat Mar 16 01:12:30 UTC 2024
#60788: Content-Disposition support in download_url() seems broken
--------------------------+-----------------------------
Reporter: siliconforks | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: HTTP API | Version: 6.4.3
Severity: normal | Keywords:
Focuses: |
--------------------------+-----------------------------
In https://core.trac.wordpress.org/changeset/51939 a change was made to
resolve ticket https://core.trac.wordpress.org/ticket/38231 to make
`download_url()` use the `Content-Disposition` header to specify the name
of the downloaded file. I realize I'm a bit late to the party here since
the ticket was opened more than 7 years ago and the change was made more
than 2 years ago, but I think that this change was flawed and needs to be
reconsidered (and possibly reverted entirely). The way it was implemented
seems fundamentally broken and has the effect of making `download_url()`
basically impossible to use reliably.
Consider the following simple PHP script named `echo.php`:
{{{#!php
<?php
header( 'Content-Type: text/plain' );
header( 'Content-Disposition: attachment; filename="echo.txt"' );
if ( isset( $_GET['echo'] ) ) {
echo $_GET['echo'];
}
}}}
This script just takes a query string parameter `echo` and outputs it, in
plain text format, with suggested filename `echo.txt`.
Now consider the following file, `download.php`, which is intended to be
used with WP-CLI:
{{{#!php
<?php
$foo = download_url( 'http://example.test/echo.php?echo=foo' );
echo file_get_contents( $foo );
echo "\n";
unlink( $foo );
$bar = download_url( 'http://example.test/echo.php?echo=bar' );
echo file_get_contents( $bar );
echo "\n";
unlink( $bar );
}}}
If you run this, it works as expected:
{{{
$ wp eval-file download.php
foo
bar
}}}
Now consider the following file, `download-bad.php`:
{{{#!php
<?php
$foo = download_url( 'http://example.test/echo.php?echo=foo' );
$bar = download_url( 'http://example.test/echo.php?echo=bar' );
echo file_get_contents( $foo );
echo "\n";
echo file_get_contents( $bar );
echo "\n";
unlink( $foo );
unlink( $bar );
}}}
If you try to run this, the results are not good:
{{{
$ wp eval-file download-bad.php
bar
bar
PHP Warning: unlink(...tmp/echo.txt): No such file or directory...
}}}
You might argue that the code for `download-bad.php` is flawed, and should
be rewritten to look more like `download.php`, and I would agree.
However, note that `download-bad.php` would have worked fine before the
change https://core.trac.wordpress.org/changeset/51939 was made.
Furthermore, even `download.php` is likely to display buggy behavior if
two users attempt to run it at the same time. I don't think there is any
reliable way to use `download_url()` in a manner that will protect against
concurrent access (short of adding some kind of external locking
mechanism). Consider the following PHP files, `foo.php`...
{{{#!php
<?php
$foo = download_url( 'http://example.test/echo.php?echo=foo' );
sleep( 10 );
echo file_get_contents( $foo );
echo "\n";
sleep( 10 );
@ unlink( $foo );
}}}
...and `bar.php`:
{{{#!php
<?php
sleep( 5 );
$bar = download_url( 'http://example.test/echo.php?echo=bar' );
sleep( 10 );
echo file_get_contents( $bar );
echo "\n";
@ unlink( $bar );
}}}
Now try running them both at the same time:
{{{
$ wp eval-file foo.php & wp eval-file bar.php
bar
bar
}}}
Obviously the output might vary because there's a race condition here, but
I've added `sleep()` calls in such a way that the output will usually be
`bar` and `bar`.
I'm not really sure what should be done with `download_url()`, but as it
currently stands it seems essentially impossible to use in a reliable
manner. Personally I would probably recommend just removing all `Content-
Disposition` support entirely. That would technically be a backward
compatibility break, but I'm not sure anyone is actually using this
functionality? To me it doesn't really even make any sense as currently
implemented. Why would you want a temporary file stored under a fixed
name?
It looks like the `Content-Disposition` feature was originally proposed as
a solution to the problem that `download_url()` uses `wp_tempnam()` in
such a way that it can create very long filenames for some URLs, as
described in https://core.trac.wordpress.org/ticket/34938 - but that
doesn't make much sense either:
1. The change to use the `Content-Disposition` header bears no resemblance
to the change that was originally requested in
https://core.trac.wordpress.org/ticket/34938
2. Adding support for the `Content-Disposition` header doesn't actually
fix the original problem (what if the URL being downloaded doesn't have a
`Content-Disposition` header?)
3. Even if the URL being downloaded has a `Content-Disposition` header,
the way the `Content-Disposition` support was implmented means that it
still creates a file based on the URL first, then renames it later, so the
first file could still have a filename which is too long
4. Another change was made in
https://core.trac.wordpress.org/changeset/37598 which should (mostly)
address the original problem
That having been said, the original patch
https://core.trac.wordpress.org/attachment/ticket/38231/38231.diff
actually makes a lot more sense than what was finally committed. The
original patch takes the `Content-Disposition` filename and feeds it as
input into `wp_tempnam()`, so you will get a random filename which should
prevent the issues in the examples I've given. Somehow, in the multiple
iterations of the patch, the call to `wp_tempnam()` got lost, so the
current code just uses the `Content-Disposition` filename directly instead
of using a random filename based on the `Content-Disposition` filename.
I suppose the behavior could be changed to use a random filename based on
the `Content-Disposition` filename, as was originally intended, but that
would also technically be a backward compatibility break (assuming anyone
is actually using this functionality and expecting the current behavior).
To sum up - I'm not sure exactly why this `Content-Disposition`
functionality was added in the first place, but it should probably just be
removed. If it can't be removed, then the behavior should be changed so
that it uses a random filename based on the `Content-Disposition` filename
rather than using the `Content-Disposition` filename directly. If that
can't be done either, perhaps there could be another argument added to the
`download_url()` function to turn the `Content-Disposition` functionality
on or off? Ideally the default would be off.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/60788>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform
More information about the wp-trac
mailing list