[wp-trac] [WordPress Trac] #22363: Accents in attachment filenames should be sanitized
WordPress Trac
noreply at wordpress.org
Thu Nov 14 20:23:53 UTC 2013
#22363: Accents in attachment filenames should be sanitized
----------------------------------------+------------------
Reporter: tar.gz | Owner:
Type: defect (bug) | Status: new
Priority: normal | Milestone: 3.8
Component: Upload | Version: 3.4
Severity: normal | Resolution:
Keywords: has-patch needs-unit-tests |
----------------------------------------+------------------
Comment (by p_enrique):
I've been experimenting with this issue. Now, I know I'm repeating most of
the things said above, but I'd like to confirm the following:
1. As for filenames:
- Non-ASCII filenames will break on Windows. This is fundamentally a PHP
bug.
- *nix filesystems allow any characters in filenames (minus the reserved
characters, of course).
- WordPress allows uploaded files to have any characters in their names
(minus some reserved characters).
- UTF-8 filenames work OK on WordPress on a *nix platform.
- Percent-encoded filenames work on any platform, but make the filename
six times longer in certain scripts, such as Cyrillic (the single character
'ж' becomes the six characters '%D0%B6', for example).
- remove_accents() is only a partial solution, since 1) it has no support
at all for non-Latin scripts and 2) it does nothing about curly quotes,
dashes, copyright symbols and similar characters (which are handled in
sanitize_title_with_dashes()).
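
To illustrate the two points above about percent-encoding and accent
removal, here is a minimal sketch in Python (not WordPress code). The
strip_accents() helper is a hypothetical stand-in that only drops combining
marks after Unicode decomposition; it is not the real remove_accents(),
which uses its own mapping table. The sample strings are made up for the
example.

```python
import unicodedata
from urllib.parse import quote


def percent_encode(name: str) -> str:
    """Percent-encode the UTF-8 bytes of a filename (letters, digits and '_.-~' stay as-is)."""
    return quote(name, safe="")


def strip_accents(name: str) -> str:
    """Hypothetical stand-in for accent removal: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# One Cyrillic character becomes six ASCII characters.
print(percent_encode("ж"), len(percent_encode("ж")))

# Latin accents are removed, but Cyrillic text and curly quotes pass through untouched.
print(strip_accents("café"))
print(strip_accents("жук"))
print(strip_accents("“x”"))
```

Under these assumptions, the accent-stripping step leaves Cyrillic names and
typographic punctuation exactly as they were, so some further encoding or
transliteration step would still be needed for those.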
2. As for URLs:
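- According to [http://tools.ietf.org/rfc/rfc3986.txt RFC3986], only
letters, digits and the characters "-._~" are unreserved; reserved
characters may appear unencoded only when used for their reserved purposes.
(The older RFC 1738 put it as ''only alphanumerics, the special characters
"$-_.+!*'(),", and reserved characters used for their reserved purposes''.)
Everything else should be encoded.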
- The attachment anchor href and img src attributes are '''not'''
encoded on WP.
- Not encoding the above may break things in some browsers.
- Not encoding a URI may break other software that recognizes pasted
URIs.
- An encoded anchor href is handled transparently by modern browsers.
For example, see [http://ru.wikipedia.org The Russian Wikipedia], where
hovering over a link shows Cyrillic text, but the URI in the source is
%-encoded.
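
As a rough sketch of what encoding an attachment URL for an href or src
attribute could look like, here is a short Python example (again, not
WordPress code). The encode_url_path() helper and the example.com URL are
hypothetical, and the sketch assumes the incoming URL has not been encoded
already (otherwise existing '%' signs would be encoded a second time).

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode the path of a URL, keeping '/' as the segment separator."""
    parts = urlsplit(url)
    # Assumes parts.path is raw (unencoded) text; '/' is left alone.
    encoded_path = quote(parts.path, safe="/")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


# Hypothetical attachment URL with a Cyrillic filename.
url = "http://example.com/wp-content/uploads/2013/11/жук.jpg"
encoded = encode_url_path(url)

print(encoded)                  # what would go into the page source
print(unquote(encoded) == url)  # decoding gives back the readable form
```

The decoded form is what a browser would typically show on hover, while the
encoded form is what sits in the markup, which matches the Wikipedia
behaviour described above.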
--
Ticket URL: <http://core.trac.wordpress.org/ticket/22363#comment:32>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software