2004-12-19

Transcription example.

As an example of the language-dependent transcription (or transliteration) I talked about a few posts ago, here is a PHP function that converts Unicode Russian text into ASCII, following the rules given on the Wikipedia page.

// Converts UTF-8 Russian text to ASCII: Cyrillic letters come out lowercase, spaces become hyphens.
// The source file must be saved as UTF-8 for the Cyrillic literals below to match.
function Russ2Asc($str) {
	$str = str_replace(' ', '-', $str);

	// Context-dependent rules run first, while the neighbouring letters are still Cyrillic.
	// Е/е -> "ye" unless it follows a consonant.
	$str = preg_replace('/(?<![БбВвГгДдЖжЗзЙйКкЛлМмНнПпРрСсТтФфХхЦцЧчШшЩщ])[Ее]/u', 'ye', $str);
	// Hard or soft sign before a vowel -> "y".
	$str = preg_replace('/[ЪъЬь](?=[АаОоУуЫыЭэЯяЁёЮюИи])/u', 'y', $str);
	// Word-final -ий / -ый -> "y".
	$str = preg_replace('/[ИиЫы][Йй](?=[.,\-;:!?]|\Z)/u', 'y', $str);

	// Remaining letters in alphabetical order; both cases map to lowercase.
	$str = str_replace(array('А','а'),'a',$str);
	$str = str_replace(array('Б','б'),'b',$str);
	$str = str_replace(array('В','в'),'v',$str);
	$str = str_replace(array('Г','г'),'g',$str);
	$str = str_replace(array('Д','д'),'d',$str);
	$str = str_replace(array('Е','е'),'e',$str);
	$str = str_replace(array('Ё','ё'),'yo',$str);
	$str = str_replace(array('Ж','ж'),'zh',$str);
	$str = str_replace(array('З','з'),'z',$str);
	$str = str_replace(array('И','и'),'i',$str);
	$str = str_replace(array('Й','й'),'y',$str);
	$str = str_replace(array('К','к'),'k',$str);
	$str = str_replace(array('Л','л'),'l',$str);
	$str = str_replace(array('М','м'),'m',$str);
	$str = str_replace(array('Н','н'),'n',$str);
	$str = str_replace(array('О','о'),'o',$str);
	$str = str_replace(array('П','п'),'p',$str);
	$str = str_replace(array('Р','р'),'r',$str);
	$str = str_replace(array('С','с'),'s',$str);
	$str = str_replace(array('Т','т'),'t',$str);
	$str = str_replace(array('У','у'),'u',$str);
	$str = str_replace(array('Ф','ф'),'f',$str);
	$str = str_replace(array('Х','х'),'kh',$str);
	$str = str_replace(array('Ц','ц'),'ts',$str);
	$str = str_replace(array('Ч','ч'),'ch',$str);
	$str = str_replace(array('Ш','ш'),'sh',$str);
	$str = str_replace(array('Щ','щ'),'shch',$str);
	$str = str_replace(array('Ъ','ъ','Ь','ь'),'',$str); // remaining hard/soft signs are dropped
	$str = str_replace(array('Ы','ы'),'y',$str);
	$str = str_replace(array('Э','э'),'e',$str);
	$str = str_replace(array('Ю','ю'),'yu',$str);
	$str = str_replace(array('Я','я'),'ya',$str);
	return $str;
}

Early testing shows that it correctly transliterates "Союз Советских Социалистических Республик" into a nice, URL-friendly "soyuz-sovetskikh-sotsialisticheskikh-respublik". I also have a similar function working, as far as I can tell, for Japanese hiragana and katakana, based on the Hepburn system.
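The long run of str_replace() calls can also be written as a lookup table applied in a single strtr() pass, which scans the string once and never re-scans text it has already replaced. What follows is a minimal sketch of that variant, assuming UTF-8 input and the same three context rules as Russ2Asc; the name russ2asc_table is hypothetical, not something from the original code.

```php
<?php
// Table-driven sketch of the Russian rules (hypothetical name russ2asc_table).
// Assumes UTF-8 input; the /u modifier makes PCRE treat each Cyrillic letter as one character.
function russ2asc_table(string $str): string
{
    // Built once: each two-letter key holds the upper- and lower-case form of a letter.
    static $map = null;
    if ($map === null) {
        $pairs = [
            'Аа' => 'a',  'Бб' => 'b',  'Вв' => 'v',  'Гг' => 'g',  'Дд' => 'd',
            'Ее' => 'e',  'Ёё' => 'yo', 'Жж' => 'zh', 'Зз' => 'z',  'Ии' => 'i',
            'Йй' => 'y',  'Кк' => 'k',  'Лл' => 'l',  'Мм' => 'm',  'Нн' => 'n',
            'Оо' => 'o',  'Пп' => 'p',  'Рр' => 'r',  'Сс' => 's',  'Тт' => 't',
            'Уу' => 'u',  'Фф' => 'f',  'Хх' => 'kh', 'Цц' => 'ts', 'Чч' => 'ch',
            'Шш' => 'sh', 'Щщ' => 'shch', 'Ъъ' => '', 'Ыы' => 'y',  'Ьь' => '',
            'Ээ' => 'e',  'Юю' => 'yu', 'Яя' => 'ya',
        ];
        $map = [];
        foreach ($pairs as $both => $latin) {
            // Split the key into its two UTF-8 characters and map each one.
            foreach (preg_split('//u', $both, -1, PREG_SPLIT_NO_EMPTY) as $letter) {
                $map[$letter] = $latin;
            }
        }
    }

    $str = str_replace(' ', '-', $str);

    // Context-dependent rules run first, while neighbouring letters are still Cyrillic.
    $str = preg_replace('/(?<![БбВвГгДдЖжЗзЙйКкЛлМмНнПпРрСсТтФфХхЦцЧчШшЩщ])[Ее]/u', 'ye', $str);
    $str = preg_replace('/[ЪъЬь](?=[АаОоУуЫыЭэЯяЁёЮюИи])/u', 'y', $str);
    $str = preg_replace('/[ИиЫы][Йй](?=[.,\-;:!?]|\Z)/u', 'y', $str);

    // Single pass over the remaining letters.
    return strtr($str, $map);
}

echo russ2asc_table('Союз Советских Социалистических Республик'), "\n";
// soyuz-sovetskikh-sotsialisticheskikh-respublik
```

Keeping the mapping in one array also makes it easy to swap in a different table later, which fits the idea of language-dependent transliteration.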

Update: I've found a very good resource for transliteration tables. However, as this is not a critical feature, I'll be delaying it until later in development.
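Russ2Asc only turns spaces into hyphens, so punctuation in a title (a question mark, a comma, a full stop) passes through unchanged and still has to be handled before the string goes into a URL. Below is a minimal sketch of that final step, assuming the input has already been transliterated to ASCII; make_slug is a hypothetical name, not part of any library.

```php
<?php
// Hypothetical post-processing step: make transliterated ASCII text safe for a URL path.
function make_slug(string $ascii): string
{
    $slug = strtolower($ascii);                        // ASCII-only lowercasing is enough here
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);  // collapse each run of other characters into one hyphen
    return trim($slug, '-');                           // drop leading and trailing hyphens
}

echo make_slug('Chto delat?'), "\n"; // chto-delat
```

Collapsing runs (rather than replacing character by character) avoids slugs like "chto--delat-" when several punctuation marks or spaces sit next to each other.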

1 Comment:

Anonymous -Zverik- said...

meh. not too effective. try using this:

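// Sample text: "Let's put all this stuff into Latin letters. Test one two three four five."
$text = 'Переведём-ка мы всё это хозяйство на латиницу. Тест раз два три четыре пять.';

function ru2Lat($string)
{
    // mb_strtolower() lowercases UTF-8 Cyrillic; plain strtolower() only handles single bytes.
    $string = mb_strtolower($string, 'UTF-8');

    // Special and multi-letter cases, then the one-to-one letters.
    // str_replace() with arrays matches whole UTF-8 sequences, whereas strtr()
    // with two plain strings maps byte by byte and garbles UTF-8 text.
    $rus = array(' ', 'ъ', 'ь', 'ж', 'ц', 'ч', 'ш', 'щ', 'ю', 'я',
                 'а', 'б', 'в', 'г', 'д', 'е', 'ё', 'з', 'и', 'й', 'к', 'л',
                 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ы', 'э');
    $lat = array('_', '', '', 'zh', 'ts', 'ch', 'sh', 'sh', 'yu', 'ya',
                 'a', 'b', 'v', 'g', 'd', 'e', 'e', 'z', 'i', 'j', 'k', 'l',
                 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'y', 'e');
    $string = str_replace($rus, $lat, $string);

    // Drop anything that is not a lowercase ASCII letter, digit, '_' or '-'.
    return preg_replace("/[^a-z0-9_-]/", "", $string);
}

echo ru2Lat($text); // perevedem-ka_my_vse_eto_hozyajstvo_na_latinitsu_test_raz_dva_tri_chetyre_pyat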


_________

this should work faster ;)

why are you interested in this in the first place, though?

8:55 PM  
