Transliterating Cyrillic (Slavic languages) to Latin with ICU4J in Java


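I need to do something fairly simple, but I would rather not hard-code it with a hash map.

I have a string s written in Cyrillic. I need an example of how to convert it to Latin characters using a custom filter. (To avoid confusion, here is a purely Latin example: if s = "sniff", I want to look up its characters, possibly in combinations, and change them to something else.)

I can see that ICU4J can do this kind of thing, but I don't know how to implement it, because I can't find any working examples (or I'm just being dense).

Any help would be appreciated.

Thanks.

Best regards,

PS: I need bulk conversion. I don't care about styled or on-the-fly conversion; I just need a basic example of what a bulk ICU4J transliterator looks like.

OK, I figured it out.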

import com.ibm.icu.text.Transliterator;


public class BulgarianToLatin {


    public static String BULGARIAN_TO_LATIN = "Bulgarian-Latin/BGN";

    public static void main(String[] args) {
        String bgString = "Джокович";

        Transliterator bulgarianToLatin = Transliterator.getInstance(BULGARIAN_TO_LATIN);
        String result1 = bulgarianToLatin.transliterate(bgString);
        System.out.println("Bulgarian to Latin:" + result1);

    }

}

A final edit: if you don't want to use the existing rule sets, or you want something custom, you can use rule-based transliteration instead.

import com.ibm.icu.text.Transliterator;

public class BulgarianToLatin {


    public static String BULGARIAN_TO_LATIN = "Bulgarian-Latin/BGN";

    public static void main(String[] args) {
        String bgString = "а б в г д е ж з и й к л м н о п р с т у ф х ц ч ш щ ю я  \n Юлиян Джокович";

        String rules="::[А-ЪЬЮ-ъьюяѢѣѪѫ];" +
        "Б > B;" +
        "б > b;" +
        "В > V;" +
        "ТС > TS;" +
        "Тс > Ts;" +
        "ч > ch;" +
        "ШТ > SHT;" +
        "Шт > Sht;" +
        "шт > sht;" +
        "{Ш}[[б-джзй-нп-тф-щь][аеиоуъюяѣѫ]] > Sh;" +
        "Я > YA;" +
        "я > ya;";
        Transliterator bulgarianToLatin = Transliterator.createFromRules("temp", rules, Transliterator.FORWARD);

        String result1 = bulgarianToLatin.transliterate(bgString);
        System.out.println("Bulgarian to Latin:" + result1);

    }

}
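
A note on why `ШТ > SHT` is listed ahead of the context rule for `Ш`: as far as I understand ICU's rule syntax, conversion rules are tried in order at each position, the first one that matches is applied, and characters that match no rule are left unchanged. The sketch below is not ICU4J code. It is a plain-JDK illustration of that ordering idea; the class name `OrderedRulesSketch` and its small rule table are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: mimics "first matching rule wins" with plain JDK classes.
final class OrderedRulesSketch {

    // LinkedHashMap keeps insertion order, so longer sources are registered first.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ШТ", "SHT");
        RULES.put("Шт", "Sht");
        RULES.put("шт", "sht");
        RULES.put("Ш", "Sh");
        RULES.put("ш", "sh");
        RULES.put("ч", "ch");
        RULES.put("Я", "YA");
        RULES.put("я", "ya");
    }

    static String apply(String input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Map.Entry<String, String> rule : RULES.entrySet()) {
                if (input.startsWith(rule.getKey(), pos)) {
                    out.append(rule.getValue());
                    pos += rule.getKey().length();
                    matched = true;
                    break; // first matching rule wins
                }
            }
            if (!matched) {
                out.append(input.charAt(pos)); // no rule applies: copy the character unchanged
                pos++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply("шт"));
        System.out.println(apply("Шт ш я"));
    }
}
```

If the single-letter `ш` entry were registered before `шт`, the two-letter rule would never fire, which is the reason for the ordering.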
1 Answer

I wrote a method that converts Cyrillic to Latin; maybe it will be useful to someone.
public static String transliterate(String message){
    char[] abcCyr =   {' ','а','б','в','г','д','е','ё', 'ж','з','и','й','к','л','м','н','о','п','р','с','т','у','ф','х', 'ц','ч', 'ш','щ','ъ','ы','ь','э', 'ю','я','А','Б','В','Г','Д','Е','Ё', 'Ж','З','И','Й','К','Л','М','Н','О','П','Р','С','Т','У','Ф','Х', 'Ц', 'Ч','Ш', 'Щ','Ъ','Ы','Ь','Э','Ю','Я','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
    String[] abcLat = {" ","a","b","v","g","d","e","e","zh","z","i","y","k","l","m","n","o","p","r","s","t","u","f","h","ts","ch","sh","sch", "","i", "","e","ju","ja","A","B","V","G","D","E","E","Zh","Z","I","Y","K","L","M","N","O","P","R","S","T","U","F","H","Ts","Ch","Sh","Sch", "","I", "","E","Ju","Ja","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"};
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < message.length(); i++) {
        for (int x = 0; x < abcCyr.length; x++ ) {
            if (message.charAt(i) == abcCyr[x]) {
                builder.append(abcLat[x]);
            }
        }
    }
    return builder.toString();
}

Works great for simple applications. Thanks! - Denis Kulagin
Your 'abcCyr' array has a typo: you wrote 'Б' where it should be 'Ь'. - 阿尔曼
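You can stop searching as soon as a match is found; otherwise you make many unnecessary comparisons. A HashMap should perform better than iterating over the same array again and again, especially for longer strings. If you do want to use arrays, there is no need to repeat the Latin characters in both of them: when no match is found, copy the original character to the StringBuilder. - Andrei Volgin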
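
The suggestions in the last comment could look roughly like the sketch below. It is one possible reading, not the answer author's code: the class name `MapTransliterator` is made up, the lowercase mappings are copied from the answer, and the uppercase entries are derived by capitalizing the first Latin letter (an assumption that happens to reproduce the answer's uppercase table). Characters that are not in the table, such as Latin letters, digits and punctuation, are copied through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical rework of the answer's method using a lookup table instead of nested loops.
final class MapTransliterator {

    private static final String CYR = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя";
    private static final String[] LAT = {
        "a", "b", "v", "g", "d", "e", "e", "zh", "z", "i", "y", "k", "l", "m", "n", "o", "p",
        "r", "s", "t", "u", "f", "h", "ts", "ch", "sh", "sch", "", "i", "", "e", "ju", "ja"
    };

    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        for (int i = 0; i < CYR.length(); i++) {
            char lower = CYR.charAt(i);
            TABLE.put(lower, LAT[i]);
            // Assumption: the uppercase mapping is the lowercase one with its first letter capitalized.
            TABLE.put(Character.toUpperCase(lower), capitalize(LAT[i]));
        }
    }

    private static String capitalize(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    static String transliterate(String message) {
        StringBuilder builder = new StringBuilder(message.length());
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            String replacement = TABLE.get(c);
            if (replacement != null) {
                builder.append(replacement);
            } else {
                builder.append(c); // not in the table: keep the original character
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("Джокович"));
        System.out.println(transliterate("Юлиян 2024!"));
    }
}
```

Each character now costs one map lookup instead of a scan over the whole array, and the table no longer needs the Latin alphabet or the space character.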
