[wp-hackers] Helping international users by fixing this code

Milan Dinić liste at srpski.biz
Fri Mar 20 16:38:04 GMT 2009


Hi everyone

I haven't posted to this list before, but since I believe this is the only
place where I can get a solution, I'll ask about one issue here.

I asked about this on the support forum
(http://wordpress.org/support/topic/249624) two weeks ago but didn't get any
replies, so here is a copy of that post, with a detailed description of the problem.

As is well known, when custom permalinks are used, the titles of posts,
categories and tags become part of the URL. For Latin scripts, WordPress has
functions that convert accented letters to plain ASCII ones (e.g. Šljivić
becomes sljivic in the URL).
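
For Latin text this is done by WordPress's remove_accents(). A minimal sketch of
the same idea, limited to the Serbian Latin letters with diacritics; the
function name strip_serbian_latin() and its small table are hypothetical and
cover far fewer characters than remove_accents() does:

```php
<?php
// Hypothetical helper: maps Serbian Latin letters with diacritics to plain
// ASCII, roughly what remove_accents() does for these particular characters.
function strip_serbian_latin(string $text): string
{
    $map = [
        'Č' => 'C', 'č' => 'c', 'Ć' => 'C', 'ć' => 'c',
        'Đ' => 'D', 'đ' => 'd', 'Š' => 'S', 'š' => 's',
        'Ž' => 'Z', 'ž' => 'z',
    ];
    // strtr() with an array replaces whole UTF-8 byte sequences, so
    // multi-byte characters are never split.
    return strtr($text, $map);
}

// The slug is the ASCII result, lowercased.
echo strtolower(strip_serbian_latin('Šljivić')), "\n"; // sljivic
```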

The problem is with non-Latin scripts. In Serbian, for example, we write in
Cyrillic but use Latin letters in URLs, so in WordPress we currently
transliterate slugs by hand: *Здраво свете* ("Hello world") becomes
zdravo-svete rather than здраво-свете, which is actually shown as
%D0%B7%D0%B4%D1%80%D0%B0%D0%B2%D0%BE-%D1%81%D0%B2%D0%B5%D1%82%D0%B5 . We want
this transliteration to happen automatically.
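
The percent-encoded string is just the UTF-8 bytes of здраво-свете, which PHP's
rawurlencode() reproduces. A minimal sketch of the automatic transliteration we
are after, assuming a Serbian Cyrillic table that drops diacritics (ђ to dj,
ж to z, ћ and ч to c, џ to dz, ш to s); SR_CYR_TO_LAT and sr_slug() are
hypothetical names, and a real slug function would also strip punctuation:

```php
<?php
// Hypothetical Serbian Cyrillic -> ASCII table (assumption: diacritics dropped).
const SR_CYR_TO_LAT = [
    'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G', 'Д' => 'D', 'Ђ' => 'Dj',
    'Е' => 'E', 'Ж' => 'Z', 'З' => 'Z', 'И' => 'I', 'Ј' => 'J', 'К' => 'K',
    'Л' => 'L', 'Љ' => 'Lj', 'М' => 'M', 'Н' => 'N', 'Њ' => 'Nj', 'О' => 'O',
    'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'Ћ' => 'C', 'У' => 'U',
    'Ф' => 'F', 'Х' => 'H', 'Ц' => 'C', 'Ч' => 'C', 'Џ' => 'Dz', 'Ш' => 'S',
    'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'ђ' => 'dj',
    'е' => 'e', 'ж' => 'z', 'з' => 'z', 'и' => 'i', 'ј' => 'j', 'к' => 'k',
    'л' => 'l', 'љ' => 'lj', 'м' => 'm', 'н' => 'n', 'њ' => 'nj', 'о' => 'o',
    'п' => 'p', 'р' => 'r', 'с' => 's', 'т' => 't', 'ћ' => 'c', 'у' => 'u',
    'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'c', 'џ' => 'dz', 'ш' => 's',
];

// Title -> slug: transliterate, lowercase (only ASCII is left), spaces to dashes.
function sr_slug(string $title): string
{
    $ascii = strtr($title, SR_CYR_TO_LAT);
    return str_replace(' ', '-', strtolower($ascii));
}

// The untransliterated slug as it appears in the address bar.
echo rawurlencode('здраво-свете'), "\n";
// %D0%B7%D0%B4%D1%80%D0%B0%D0%B2%D0%BE-%D1%81%D0%B2%D0%B5%D1%82%D0%B5

echo sr_slug('Здраво свете'), "\n"; // zdravo-svete
```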

Since this is not specific to Serbian, developers for other languages have
already worked on it and written several plugins, but all of them share one
problem that needs fixing.

Two plugins that come close to the desired behaviour, and produce the same
results despite slightly different code, are MK to Lat
(http://wordpress.org/extend/plugins/mk-to-lat/) and Rus-to-Lat
(http://mywordpress.ru/plugins/rustolat/).

With either plugin installed, Cyrillic letters in the titles of new
posts, pages, tags and categories are transliterated to Latin ones in their URL
slugs. The problem is that existing posts, pages, tags and categories whose
slugs contain Cyrillic letters now return 404 errors. (If we deactivate the
plugin, everything works again.)

Example:
Let's say we have a post titled *Један* ("One") with the URL example.com/један
(which appears in the address bar as
example.com/%D1%98%D0%B5%D0%B4%D0%B0%D0%BD ). This page opens without
problems. Now we install one of the two plugins above and create a new post
titled *Два* ("Two"). Instead of the URL example.com/два
(example.com/%D0%B4%D0%B2%D0%B0 ), we get what we wanted: example.com/dva .
So far everything looks fine. But if we try to open example.com/један, we get
a 404 error. If we deactivate the plugin, we can open that page (and the
second one too) without problems. If we activate the plugin again, the first
page is once more unreachable.

So the problem is with old slugs that contain Cyrillic letters.
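
A likely explanation, assuming WordPress also passes the slug from the
requested URL through sanitize_title() before looking the post up (as WP_Query
does for the name query variable): with the plugin active, the incoming
Cyrillic slug is transliterated too, so it no longer matches the stored,
percent-encoded one. The simulation below runs without WordPress;
$stored_slugs, make_slug() and find_post() are hypothetical stand-ins for the
database and the filter chain:

```php
<?php
// Abbreviated table covering only the letters in this example (assumption).
const TR_DEMO = ['ј' => 'j', 'е' => 'e', 'д' => 'd', 'а' => 'a', 'н' => 'n', 'в' => 'v'];

// Hypothetical stand-in for sanitize_title(): the plugin's filter runs first
// (if active), then any remaining non-ASCII bytes are percent-encoded.
function make_slug(string $title, bool $plugin_active): string
{
    if ($plugin_active) {
        $title = strtr($title, TR_DEMO);
    }
    return strtolower(rawurlencode($title));
}

// Slug saved before the plugin was installed.
$stored_slugs = [make_slug('један', false)]; // "%d1%98%d0%b5%d0%b4%d0%b0%d0%bd"

// Hypothetical lookup: the requested slug is sanitized the same way, then compared.
function find_post(array $stored, string $requested, bool $plugin_active): bool
{
    return in_array(make_slug($requested, $plugin_active), $stored, true);
}

var_dump(find_post($stored_slugs, 'један', false)); // bool(true)
var_dump(find_post($stored_slugs, 'један', true));  // bool(false): looks for "jedan", so 404
```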

How the plugin should work: new slugs should be transliterated as they are
now, but slugs created before the plugin was installed should be left untouched.
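
One way to get that behaviour, sketched under assumptions: before
transliterating, check whether the untransliterated slug already exists and, if
so, return it unchanged. This extends the idea MK to Lat already applies to
terms (reusing the slug of an existing term) to posts. The function
translit_unless_existing() and the $existing array, which stands in for the
post_name values in the database, are hypothetical; in a real plugin that
lookup would be a query built with $wpdb->prepare().

```php
<?php
// Abbreviated transliteration table for this example only (assumption).
const TR_FIX = ['ј' => 'j', 'е' => 'e', 'д' => 'd', 'а' => 'a', 'н' => 'n', 'в' => 'v'];

// Hypothetical filter body. $existing holds slugs already stored, in the
// percent-encoded, lowercase-hex form WordPress saves them in.
function translit_unless_existing(string $slug, array $existing): string
{
    // An old slug created before the plugin was installed: leave it untouched.
    if (in_array(strtolower(rawurlencode($slug)), $existing, true)) {
        return $slug;
    }
    // Anything else: transliterate, as the plugins do now.
    return strtr($slug, TR_FIX);
}

$existing = ['%d1%98%d0%b5%d0%b4%d0%b0%d0%bd']; // "један", created before the plugin

echo translit_unless_existing('један', $existing), "\n"; // један (old URL keeps working)
echo translit_unless_existing('два', $existing), "\n";   // dva
```

The cost is one extra lookup per call, and a new post whose title matches an
old Cyrillic slug would also keep the Cyrillic form.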

The code of both plugins is below (without the comment headers at the top of
the files; it is also available at http://wordpress.org/support/topic/249624 ).
If anything needs further explanation, I'll be glad to provide it. Thanks in advance.


MK to Lat:

function sanitize_term_translate ($title) {
	$mk2lat_table = array(
   "А"=>"A","Б"=>"B","В"=>"V","Г"=>"G","Д"=>"D",
   "Ѓ"=>"Gj","Е"=>"E","Ж"=>"Zh","З"=>"Z","Ѕ"=>"Dz",
   "И"=>"I","Ј"=>"J","К"=>"K","Л"=>"L","Љ"=>"Lj",
   "М"=>"M","Н"=>"N","Њ"=>"Nj","О"=>"O","П"=>"P",
   "Р"=>"R","С"=>"S","Т"=>"T","Ќ"=>"Kj","У"=>"U",
   "Ф"=>"F","Х"=>"H","Ц"=>"C","Ч"=>"Ch","Џ"=>"Dzh",
   "Ш"=>"Sh","а"=>"a","б"=>"b","в"=>"v","г"=>"g",
   "д"=>"d","ѓ"=>"gj","е"=>"e","ж"=>"zh","з"=>"z",
   "ѕ"=>"dz","и"=>"i","ј"=>"j","к"=>"k","л"=>"l",
   "љ"=>"lj","м"=>"m","н"=>"n","њ"=>"nj","о"=>"o",
   "п"=>"p","р"=>"r","с"=>"s","т"=>"t","ќ"=>"kj",
   "у"=>"u","ф"=>"f","х"=>"h","ц"=>"c","ч"=>"ch",
   "џ"=>"dzh","ш"=>"sh"
	);

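	global $wpdb;
	// If a term with this exact name already exists, reuse its slug.
	// $wpdb->prepare() escapes the title before it goes into the SQL.
	$term = $wpdb->get_var($wpdb->prepare(
		"SELECT slug FROM $wpdb->terms WHERE name = %s", $title));
	if ($term) return $term;
	// Otherwise transliterate Macedonian Cyrillic to Latin.
	return strtr($title, $mk2lat_table);
}

add_filter('sanitize_title', 'sanitize_term_translate', 0);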

Rus-to-Lat:

$tr = array(
   "Ґ"=>"G","Ё"=>"YO","Є"=>"E","Ї"=>"YI","І"=>"I",
   "і"=>"i","ґ"=>"g","ё"=>"yo","№"=>"#","є"=>"e",
   "ї"=>"yi","А"=>"A","Б"=>"B","В"=>"V","Г"=>"G",
   "Д"=>"D","Е"=>"E","Ж"=>"ZH","З"=>"Z","И"=>"I",
   "Й"=>"Y","К"=>"K","Л"=>"L","М"=>"M","Н"=>"N",
   "О"=>"O","П"=>"P","Р"=>"R","С"=>"S","Т"=>"T",
   "У"=>"U","Ф"=>"F","Х"=>"H","Ц"=>"TS","Ч"=>"CH",
   "Ш"=>"SH","Щ"=>"SCH","Ъ"=>"'","Ы"=>"YI","Ь"=>"",
   "Э"=>"E","Ю"=>"YU","Я"=>"YA","а"=>"a","б"=>"b",
   "в"=>"v","г"=>"g","д"=>"d","е"=>"e","ж"=>"zh",
   "з"=>"z","и"=>"i","й"=>"y","к"=>"k","л"=>"l",
   "м"=>"m","н"=>"n","о"=>"o","п"=>"p","р"=>"r",
   "с"=>"s","т"=>"t","у"=>"u","ф"=>"f","х"=>"h",
   "ц"=>"ts","ч"=>"ch","ш"=>"sh","щ"=>"sch","ъ"=>"'",
   "ы"=>"yi","ь"=>"","э"=>"e","ю"=>"yu","я"=>"ya"
  );

// Transliterate Cyrillic using the global $tr table above.
function sanitize_title_with_translit($title) {
	global $tr;
	return strtr($title, $tr);
}

add_filter('sanitize_title', 'sanitize_title_with_translit', 0);
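
Both plugins register their callback with priority 0. Assuming WordPress's
default slug sanitizer, sanitize_title_with_dashes(), stays at the default
priority 10, the transliteration runs first and the sanitizer only has to
lowercase and encode an ASCII string. The sketch below models that ordering
with hypothetical demo_add_filter() and demo_apply_filters() helpers; they are
not WordPress's own add_filter() and apply_filters():

```php
<?php
// Hypothetical, stripped-down model of a filter hook: callbacks are grouped
// by priority and lower numbers run first.
function demo_add_filter(array &$hooks, string $tag, callable $cb, int $priority = 10): void
{
    $hooks[$tag][$priority][] = $cb;
}

function demo_apply_filters(array $hooks, string $tag, string $value): string
{
    $byPriority = $hooks[$tag] ?? [];
    ksort($byPriority); // priority 0 runs before priority 10
    foreach ($byPriority as $callbacks) {
        foreach ($callbacks as $cb) {
            $value = $cb($value);
        }
    }
    return $value;
}

$hooks = [];
// Stand-in for the default sanitizer at priority 10: lowercase and
// percent-encode whatever non-ASCII characters are left.
demo_add_filter($hooks, 'sanitize_title', fn($t) => strtolower(rawurlencode($t)), 10);
// The plugin's transliteration at priority 0 (abbreviated table).
demo_add_filter($hooks, 'sanitize_title', fn($t) => strtr($t, ['д' => 'd', 'в' => 'v', 'а' => 'a']), 0);

echo demo_apply_filters($hooks, 'sanitize_title', 'два'), "\n"; // dva
```

Registered at a priority above 10 instead, the transliteration would see the
already encoded %d0%b4%d0%b2%d0%b0 and find nothing to replace.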

