[wp-hackers] Non-ASCII characters in URLs and parsing those characters at the string level

Haluk Karamete halukkaramete at gmail.com
Tue Sep 9 22:03:47 UTC 2014


First off, let me show you which non-ASCII characters I'm talking about.

For instance, just type 'Slobodan Milosevic' into Google Search and follow
the first suggested Wikipedia link.

You will see that the URL contains very unusual characters that are well
outside the common ASCII set. I'm simply curious whether WordPress supports that.

Though this is not a feature I particularly like (to say the least), I do
confess that I find it quite interesting from an HTTP point of view.
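From the HTTP side, those characters are not sent as-is: the title is UTF-8 encoded and every non-ASCII byte is percent-escaped, while browsers typically show the decoded form in the address bar. A minimal Python sketch of that round trip, assuming Wikipedia's convention of underscores for spaces (`encode_title` is a hypothetical helper for illustration, not a WordPress or Wikipedia function):

```python
from urllib.parse import quote, unquote


def encode_title(title: str) -> str:
    """Turn a display title into a percent-encoded URL path segment."""
    # Assumption: Wikipedia-style slugs replace spaces with underscores.
    slug = title.replace(" ", "_")
    # quote() encodes the text as UTF-8, then percent-escapes every byte
    # outside the unreserved set (letters, digits, "_.-~").
    return quote(slug, encoding="utf-8")


encoded = encode_title("Slobodan Milošević")
print(encoded)           # Slobodan_Milo%C5%A1evi%C4%87
print(unquote(encoded))  # Slobodan_Milošević
```

So 'š' (U+0161) travels as the two bytes %C5%A1 and 'ć' (U+0107) as %C4%87. Per RFC 3986 the hex digits are case-insensitive, so %c5%a1 and %C5%A1 mean the same thing.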

But my real question (or, better put, my pain) is this:
say you are scraping that data, you come across that title with those
funny characters, and you want to create a tag out of it.

Is there a conversion function that I can pass that string to and get back
a version translated down to plain ASCII (code points below 128)?

So I pass in 'slobodan_milo%c5%a1evi%c4%87', and I get back the good old
'Slobodan Milosevic'.

Does such a function exist? Or how do you deal with that situation?
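One common approach, sketched here in Python rather than as any WordPress API: percent-decode the slug, split accented letters into a base letter plus combining marks with Unicode NFKD normalization, then drop everything that is not ASCII. `slug_to_ascii_title` is a hypothetical name, and the underscore-to-space and title-casing steps are assumptions about the desired output.

```python
import unicodedata
from urllib.parse import unquote


def slug_to_ascii_title(slug: str) -> str:
    """Convert a percent-encoded, underscore-separated slug to an ASCII title."""
    # Decode %xx escapes as UTF-8: %c5%a1 -> 'š', %c4%87 -> 'ć'.
    text = unquote(slug, encoding="utf-8")
    # Assumption: underscores stand for spaces, as in Wikipedia URLs.
    text = text.replace("_", " ")
    # NFKD decomposes 'š' into 's' + U+030C (combining caron)
    # and 'ć' into 'c' + U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with errors="ignore" drops the combining marks
    # (and any other non-ASCII character) but keeps the base letters.
    ascii_text = decomposed.encode("ascii", errors="ignore").decode("ascii")
    # Assumption: the tag should be title-cased, one capital per word.
    return " ".join(word.capitalize() for word in ascii_text.split())


print(slug_to_ascii_title("slobodan_milo%c5%a1evi%c4%87"))  # Slobodan Milosevic
```

This is lossy by design. Letters that have no Unicode decomposition, such as 'đ' or 'ß', are simply dropped rather than transliterated, so 'Đorđe' would come out as 'Ore'; handling those needs an explicit character mapping table.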

