[wp-trac] [WordPress Trac] #63140: Unicode Chars (Icons) in the URL are possible, but break WordPress

WordPress Trac noreply at wordpress.org
Fri Mar 21 08:56:03 UTC 2025


#63140: Unicode Chars (Icons) in the URL are possible, but break WordPress
--------------------------+------------------------------
 Reporter:  Stefan M.     |       Owner:  (none)
     Type:  defect (bug)  |      Status:  new
 Priority:  normal        |   Milestone:  Awaiting Review
Component:  Permalinks    |     Version:  6.7.2
 Severity:  minor         |  Resolution:
 Keywords:                |     Focuses:
--------------------------+------------------------------

Comment (by tusharaddweb):

 Replying to [ticket:63140 Stefan M.]:
 > My client used Unicode chars (icons) in the URL. WordPress doesn't seem
 to filter them.
 >
 > So they were saved. Immediately after, the page didn't work anymore. Even
 back in draft, the page delivered a white page and not the page content.
 > I did remove the icons. But the page was still broken.
 >
 > I needed to move the page content into a "new" page and save it to
 re-enable it. I added icons to the URL and got the same issue again.
 >
 > Why are Unicode icons not filtered from the URL? Can you please apply a
 filtering mechanism that allows only valid characters in the URL? Icons
 are not supposed to be in the URL, I think.

 In WordPress, Unicode characters (including icons and emojis) are not
 automatically filtered from URLs (post slugs) because:
 1. WordPress Allows Unicode in URLs for Internationalization

     WordPress supports multilingual slugs to accommodate non-English
 languages (e.g., Japanese, Arabic, Cyrillic).
     Unicode is essential for SEO and accessibility in non-Latin character-
 based languages.

 2. No Built-in Restriction on Special Unicode Characters

     WordPress sanitizes slugs with sanitize_title(), whose default
 callback, sanitize_title_with_dashes(), percent-encodes most non-ASCII
 characters instead of removing them.
     Only a short list of punctuation and symbols is stripped or converted,
 so icons and emojis pass through in encoded form.

 3. Some Unicode Characters Can Break URLs

     Certain Unicode characters (like icons or control characters) may
 cause issues with browsers, servers, or plugins.
     If a theme or plugin doesn’t properly handle encoded URLs, it could
 result in broken pages or white screens (as you experienced).
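
 As a rough illustration of what an "encoded URL" looks like in point 3,
 here is a minimal standalone PHP sketch. The slug is a made-up example,
 not the one from the ticket, and the snippet does not reproduce the
 reported white page. It only shows how an emoji turns into percent-encoded
 bytes and why code that compares the raw and encoded forms without
 decoding would see a mismatch.

```php
<?php
// Hypothetical example slug; not taken from the ticket.
$slug = 'launch-🚀';

// rawurlencode() percent-encodes each UTF-8 byte of the emoji.
$encoded = rawurlencode($slug);
echo $encoded, PHP_EOL; // launch-%F0%9F%9A%80

// A comparison that skips decoding sees two different strings.
var_dump($encoded === $slug);               // bool(false)
var_dump(rawurldecode($encoded) === $slug); // bool(true)
```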

 Solution: Apply a Custom Filter

 You can restrict unwanted Unicode characters in slugs by adding this
 custom function to your theme's functions.php:

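 function filter_unicode_from_slug($slug) {
     // Keep letters, numbers, whitespace, dashes, and underscores. '%' is
     // kept so that already percent-encoded slugs are left intact.
     $filtered = preg_replace('/[^\p{L}\p{N}\s%_-]+/u', '', $slug);
     // preg_replace() returns null on invalid UTF-8; fall back to the input.
     // Do not call sanitize_title() here: it applies this same filter and
     // would recurse without end.
     return null === $filtered ? $slug : $filtered;
 }
 // Priority 9 runs before core's sanitize_title_with_dashes() (priority 10),
 // which percent-encodes non-ASCII characters and turns spaces into dashes.
 add_filter('sanitize_title', 'filter_unicode_from_slug', 9);

 This keeps only letters, numbers, dashes, and underscores from newly
 entered text; core's own pass then converts spaces to dashes and drops
 stray percent signs. You can adjust the regex pattern to allow or disallow
 specific characters as needed.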
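
 To try out pattern adjustments outside WordPress, a small standalone
 sketch like the following may help. Both function names are hypothetical
 and exist only for this example. The first keeps letters, numbers,
 whitespace, dashes, and underscores. The second is a more permissive
 variant that, as an assumption about what "icons" covers, removes only
 characters in the Unicode "Symbol, other" category (most emoji and
 pictographs) plus the variation selector and zero-width joiner that often
 accompany them.

```php
<?php
// Hypothetical helper: keep letters, numbers, whitespace, dashes, underscores.
function strip_non_word_chars(string $text): string {
    // The ?? fallback returns the input if preg_replace() fails (null).
    return preg_replace('/[^\p{L}\p{N}\s_-]+/u', '', $text) ?? $text;
}

// Hypothetical, more permissive variant: remove only "Symbol, other"
// characters, the variation selector U+FE0F, and the zero-width joiner.
function strip_symbols_only(string $text): string {
    return preg_replace('/[\p{So}\x{FE0F}\x{200D}]+/u', '', $text) ?? $text;
}

// Non-Latin letters survive both variants; the emoji does not.
var_dump(strip_non_word_chars('Über uns ✅')); // "Über uns " (trailing space)
var_dump(strip_symbols_only('Q&A 🚀 2025'));   // "Q&A  2025" (ampersand kept)
var_dump(strip_non_word_chars('Q&A 🚀 2025')); // "QA  2025" (ampersand removed)
```

 The stricter pattern also removes ordinary punctuation such as "&", so
 the symbols-only variant may be the better starting point if only icons
 should go.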

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/63140#comment:2>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

