我正在为ORM准备一些表名,并希望将复数表名转换为单个实体名称。我的唯一问题是找到一个可靠的算法来完成这个任务。以下是我目前正在做的:
- 如果一个单词以-ies结尾,我将结尾替换为-y
- 如果一个单词以-es结尾,我将此结尾删除。然而,这并不总是奏效 - 例如,它会将Types 替换为 Typ
- 否则,我只需删除尾随的-s
是否有人知道更好的算法?
我正在为ORM准备一些表名,并希望将复数表名转换为单个实体名称。我的唯一问题是找到一个可靠的算法来完成这个任务。以下是我目前正在做的:
是否有人知道更好的算法?
这些都是一般规则(而且很好),但英语并不是给胆小的人用的 :-)。
我个人更喜欢有一个转换引擎以及一组变换规则(惊讶吧),用于实际工作。你可以按照特定到一般的顺序运行这些变换规则,当找到匹配项时,应用规则到单词上并停止。
正则表达式是处理这个问题的理想方法,因为它们具有表达能力。以下是一个示例规则集:
1. If the word is fish, return fish.
2. If the word is sheep, return sheep.
3. If the word is "radii", return "radius".
4. If the word ends in "ii", replace that "ii" with "us" (octopii,virii).
5. If a word ends with -ies, replace the ending with -y
6. If a word ends with -es, remove it.
7. Otherwise, just remove any trailing -s.
请注意保持此转换集的最新状态。例如,假设有人添加了表名types
。这将被规则#6
捕获,您将得到错误的单数值typ
。
解决方案是在#6
之前的某个位置插入一个新规则,比如:
3.5: If the word is "types", return "type".
如果需要进行非常具体的转换,或者日后可以使其更加通用的话,你可能需要更新这个转换表格,以包含英语在过去几个世纪中所产生的所有令人惊叹的例外情况。
另一个可能性是根本不浪费时间去制定通用规则。
由于当前要求的用例仅仅是对表名进行单数化处理,而表名集合相对较小(至少与复数英语单词集合相比如此),因此可以创建另一个名为“singulars”的表格(或某种数据结构),将所有当前的复数表名(例如“employees”、“customers”)映射为相应的单数对象名(例如“employee”、“customer”)。
然后,每当添加一个新表时,请确保向“singulars”“表”中添加一条记录,以便您可以将其转换为单数形式。
Andrew Peters有一个名为Inflector.NET的类,它提供了复数转单数和单数转复数的方法。正如Tal所指出的那样,没有算法是万无一失的,但这个类可以涵盖相当数量的不规则英语名词。
NNS
以获得更准确的输出。$ cat test.txt
Types_NNS
Pies_NNS
Trees_NNS
Buses_NNS
Radii_NNS
Communities_NNS
Sheep_NNS
Fish_NNS
输出示例:
$ cat test.txt | ./morpha -c
Type
Pie
Tree
Bus
Radius
Community
Sheep
Fish
class BaseInflector
{
/**
* @var array the rules for converting a word into its plural form.
* The keys are the regular expressions and the values are the corresponding replacements.
*/
public static $plurals = [
'/([nrlm]ese|deer|fish|sheep|measles|ois|pox|media)$/i' => '\1',
'/^(sea[- ]bass)$/i' => '\1',
'/(m)ove$/i' => '\1oves',
'/(f)oot$/i' => '\1eet',
'/(h)uman$/i' => '\1umans',
'/(s)tatus$/i' => '\1tatuses',
'/(s)taff$/i' => '\1taff',
'/(t)ooth$/i' => '\1eeth',
'/(quiz)$/i' => '\1zes',
'/^(ox)$/i' => '\1\2en',
'/([m|l])ouse$/i' => '\1ice',
'/(matr|vert|ind)(ix|ex)$/i' => '\1ices',
'/(x|ch|ss|sh)$/i' => '\1es',
'/([^aeiouy]|qu)y$/i' => '\1ies',
'/(hive)$/i' => '\1s',
'/(?:([^f])fe|([lr])f)$/i' => '\1\2ves',
'/sis$/i' => 'ses',
'/([ti])um$/i' => '\1a',
'/(p)erson$/i' => '\1eople',
'/(m)an$/i' => '\1en',
'/(c)hild$/i' => '\1hildren',
'/(buffal|tomat|potat|ech|her|vet)o$/i' => '\1oes',
'/(alumn|bacill|cact|foc|fung|nucle|radi|stimul|syllab|termin|vir)us$/i' => '\1i',
'/us$/i' => 'uses',
'/(alias)$/i' => '\1es',
'/(ax|cris|test)is$/i' => '\1es',
'/s$/' => 's',
'/^$/' => '',
'/$/' => 's',
];
/**
* @var array the rules for converting a word into its singular form.
* The keys are the regular expressions and the values are the corresponding replacements.
*/
public static $singulars = [
'/([nrlm]ese|deer|fish|sheep|measles|ois|pox|media|ss)$/i' => '\1',
'/^(sea[- ]bass)$/i' => '\1',
'/(s)tatuses$/i' => '\1tatus',
'/(f)eet$/i' => '\1oot',
'/(t)eeth$/i' => '\1ooth',
'/^(.*)(menu)s$/i' => '\1\2',
'/(quiz)zes$/i' => '\\1',
'/(matr)ices$/i' => '\1ix',
'/(vert|ind)ices$/i' => '\1ex',
'/^(ox)en/i' => '\1',
'/(alias)(es)*$/i' => '\1',
'/(alumn|bacill|cact|foc|fung|nucle|radi|stimul|syllab|termin|viri?)i$/i' => '\1us',
'/([ftw]ax)es/i' => '\1',
'/(cris|ax|test)es$/i' => '\1is',
'/(shoe|slave)s$/i' => '\1',
'/(o)es$/i' => '\1',
'/ouses$/' => 'ouse',
'/([^a])uses$/' => '\1us',
'/([m|l])ice$/i' => '\1ouse',
'/(x|ch|ss|sh)es$/i' => '\1',
'/(m)ovies$/i' => '\1\2ovie',
'/(s)eries$/i' => '\1\2eries',
'/([^aeiouy]|qu)ies$/i' => '\1y',
'/([lr])ves$/i' => '\1f',
'/(tive)s$/i' => '\1',
'/(hive)s$/i' => '\1',
'/(drive)s$/i' => '\1',
'/([^fo])ves$/i' => '\1fe',
'/(^analy)ses$/i' => '\1sis',
'/(analy|diagno|^ba|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i' => '\1\2sis',
'/([ti])a$/i' => '\1um',
'/(p)eople$/i' => '\1\2erson',
'/(m)en$/i' => '\1an',
'/(c)hildren$/i' => '\1\2hild',
'/(n)ews$/i' => '\1\2ews',
'/(n)etherlands$/i' => '\1\2etherlands',
'/eaus$/' => 'eau',
'/^(.*us)$/' => '\\1',
'/s$/i' => '',
];
/**
* @var array the special rules for converting a word between its plural form and singular form.
* The keys are the special words in singular form, and the values are the corresponding plural form.
*/
public static $specials = [
'atlas' => 'atlases',
'beef' => 'beefs',
'brother' => 'brothers',
'cafe' => 'cafes',
'child' => 'children',
'cookie' => 'cookies',
'corpus' => 'corpuses',
'cow' => 'cows',
'curve' => 'curves',
'foe' => 'foes',
'ganglion' => 'ganglions',
'genie' => 'genies',
'genus' => 'genera',
'graffito' => 'graffiti',
'hoof' => 'hoofs',
'loaf' => 'loaves',
'man' => 'men',
'money' => 'monies',
'mongoose' => 'mongooses',
'move' => 'moves',
'mythos' => 'mythoi',
'niche' => 'niches',
'numen' => 'numina',
'occiput' => 'occiputs',
'octopus' => 'octopuses',
'opus' => 'opuses',
'ox' => 'oxen',
'penis' => 'penises',
'sex' => 'sexes',
'soliloquy' => 'soliloquies',
'testis' => 'testes',
'trilby' => 'trilbys',
'turf' => 'turfs',
'wave' => 'waves',
'Amoyese' => 'Amoyese',
'bison' => 'bison',
'Borghese' => 'Borghese',
'bream' => 'bream',
'breeches' => 'breeches',
'britches' => 'britches',
'buffalo' => 'buffalo',
'cantus' => 'cantus',
'carp' => 'carp',
'chassis' => 'chassis',
'clippers' => 'clippers',
'cod' => 'cod',
'coitus' => 'coitus',
'Congoese' => 'Congoese',
'contretemps' => 'contretemps',
'corps' => 'corps',
'debris' => 'debris',
'diabetes' => 'diabetes',
'djinn' => 'djinn',
'eland' => 'eland',
'elk' => 'elk',
'equipment' => 'equipment',
'Faroese' => 'Faroese',
'flounder' => 'flounder',
'Foochowese' => 'Foochowese',
'gallows' => 'gallows',
'Genevese' => 'Genevese',
'Genoese' => 'Genoese',
'Gilbertese' => 'Gilbertese',
'graffiti' => 'graffiti',
'headquarters' => 'headquarters',
'herpes' => 'herpes',
'hijinks' => 'hijinks',
'Hottentotese' => 'Hottentotese',
'information' => 'information',
'innings' => 'innings',
'jackanapes' => 'jackanapes',
'Kiplingese' => 'Kiplingese',
'Kongoese' => 'Kongoese',
'Lucchese' => 'Lucchese',
'mackerel' => 'mackerel',
'Maltese' => 'Maltese',
'mews' => 'mews',
'moose' => 'moose',
'mumps' => 'mumps',
'Nankingese' => 'Nankingese',
'news' => 'news',
'nexus' => 'nexus',
'Niasese' => 'Niasese',
'Pekingese' => 'Pekingese',
'Piedmontese' => 'Piedmontese',
'pincers' => 'pincers',
'Pistoiese' => 'Pistoiese',
'pliers' => 'pliers',
'Portuguese' => 'Portuguese',
'proceedings' => 'proceedings',
'rabies' => 'rabies',
'rice' => 'rice',
'rhinoceros' => 'rhinoceros',
'salmon' => 'salmon',
'Sarawakese' => 'Sarawakese',
'scissors' => 'scissors',
'series' => 'series',
'Shavese' => 'Shavese',
'shears' => 'shears',
'siemens' => 'siemens',
'species' => 'species',
'swine' => 'swine',
'testes' => 'testes',
'trousers' => 'trousers',
'trout' => 'trout',
'tuna' => 'tuna',
'Vermontese' => 'Vermontese',
'Wenchowese' => 'Wenchowese',
'whiting' => 'whiting',
'wildebeest' => 'wildebeest',
'Yengeese' => 'Yengeese',
];
/**
* @var array fallback map for transliteration used by [[transliterate()]] when intl isn't available.
*/
public static $transliteration = [
'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A', 'Æ' => 'AE', 'Ç' => 'C',
'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I',
'Ð' => 'D', 'Ñ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ő' => 'O',
'Ø' => 'O', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ű' => 'U', 'Ý' => 'Y', 'Þ' => 'TH',
'ß' => 'ss',
'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a', 'æ' => 'ae', 'ç' => 'c',
'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
'ð' => 'd', 'ñ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ő' => 'o',
'ø' => 'o', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ű' => 'u', 'ý' => 'y', 'þ' => 'th',
'ÿ' => 'y',
];
/**
* Shortcut for `Any-Latin; NFKD` transliteration rule. The rule is strict, letters will be transliterated with
* the closest sound-representation chars. The result may contain any UTF-8 chars. For example:
* `获取到 どちら Українська: ґ,є, Српска: ђ, њ, џ! ¿Español?` will be transliterated to
* `huò qǔ dào dochira Ukraí̈nsʹka: g̀,ê, Srpska: đ, n̂, d̂! ¿Español?`
*
* Used in [[transliterate()]].
* For detailed information see [unicode normalization forms](http://unicode.org/reports/tr15/#Normalization_Forms_Table)
* @see http://unicode.org/reports/tr15/#Normalization_Forms_Table
* @see transliterate()
* @since 2.0.7
*/
const TRANSLITERATE_STRICT = 'Any-Latin; NFKD';
/**
* Shortcut for `Any-Latin; Latin-ASCII` transliteration rule. The rule is medium, letters will be
* transliterated to characters of Latin-1 (ISO 8859-1) ASCII table. For example:
* `获取到 どちら Українська: ґ,є, Српска: ђ, њ, џ! ¿Español?` will be transliterated to
* `huo qu dao dochira Ukrainsʹka: g,e, Srpska: d, n, d! ¿Espanol?`
*
* Used in [[transliterate()]].
* For detailed information see [unicode normalization forms](http://unicode.org/reports/tr15/#Normalization_Forms_Table)
* @see http://unicode.org/reports/tr15/#Normalization_Forms_Table
* @see transliterate()
* @since 2.0.7
*/
const TRANSLITERATE_MEDIUM = 'Any-Latin; Latin-ASCII';
/**
* Shortcut for `Any-Latin; Latin-ASCII; [\u0080-\uffff] remove` transliteration rule. The rule is loose,
* letters will be transliterated with the characters of Basic Latin Unicode Block.
* For example:
* `获取到 どちら Українська: ґ,є, Српска: ђ, њ, џ! ¿Español?` will be transliterated to
* `huo qu dao dochira Ukrainska: g,e, Srpska: d, n, d! Espanol?`
*
* Used in [[transliterate()]].
* For detailed information see [unicode normalization forms](http://unicode.org/reports/tr15/#Normalization_Forms_Table)
* @see http://unicode.org/reports/tr15/#Normalization_Forms_Table
* @see transliterate()
* @since 2.0.7
*/
const TRANSLITERATE_LOOSE = 'Any-Latin; Latin-ASCII; [\u0080-\uffff] remove';
/**
* @var mixed Either a [[\Transliterator]], or a string from which a [[\Transliterator]] can be built
* for transliteration. Used by [[transliterate()]] when intl is available. Defaults to [[TRANSLITERATE_LOOSE]]
* @see http://php.net/manual/en/transliterator.transliterate.php
*/
public static $transliterator = self::TRANSLITERATE_LOOSE;
/**
* Converts a word to its plural form.
* Note that this is for English only!
* For example, 'apple' will become 'apples', and 'child' will become 'children'.
* @param string $word the word to be pluralized
* @return string the pluralized word
*/
public static function pluralize($word)
{
if (isset(static::$specials[$word])) {
return static::$specials[$word];
}
foreach (static::$plurals as $rule => $replacement) {
if (preg_match($rule, $word)) {
return preg_replace($rule, $replacement, $word);
}
}
return $word;
}
/**
* Returns the singular of the $word
* @param string $word the english word to singularize
* @return string Singular noun.
*/
public static function singularize($word)
{
$result = array_search($word, static::$specials, true);
if ($result !== false) {
return $result;
}
foreach (static::$singulars as $rule => $replacement) {
if (preg_match($rule, $word)) {
return preg_replace($rule, $replacement, $word);
}
}
return $word;
}
/**
* Converts an underscored or CamelCase word into a English
* sentence.
* @param string $words
* @param boolean $ucAll whether to set all words to uppercase
* @return string
*/
public static function titleize($words, $ucAll = false)
{
$words = static::humanize(static::underscore($words), $ucAll);
return $ucAll ? ucwords($words) : ucfirst($words);
}
/**
* Returns given word as CamelCased
* Converts a word like "send_email" to "SendEmail". It
* will remove non alphanumeric character from the word, so
* "who's online" will be converted to "WhoSOnline"
* @see variablize()
* @param string $word the word to CamelCase
* @return string
*/
public static function camelize($word)
{
return str_replace(' ', '', ucwords(preg_replace('/[^A-Za-z0-9]+/', ' ', $word)));
}
/**
* Converts a CamelCase name into space-separated words.
* For example, 'PostTag' will be converted to 'Post Tag'.
* @param string $name the string to be converted
* @param boolean $ucwords whether to capitalize the first letter in each word
* @return string the resulting words
*/
public static function camel2words($name, $ucwords = true)
{
$label = trim(strtolower(str_replace([
'-',
'_',
'.'
], ' ', preg_replace('/(?<![A-Z])[A-Z]/', ' \0', $name))));
return $ucwords ? ucwords($label) : $label;
}
/**
* Converts a CamelCase name into an ID in lowercase.
* Words in the ID may be concatenated using the specified character (defaults to '-').
* For example, 'PostTag' will be converted to 'post-tag'.
* @param string $name the string to be converted
* @param string $separator the character used to concatenate the words in the ID
* @param boolean|string $strict whether to insert a separator between two consecutive uppercase chars, defaults to false
* @return string the resulting ID
*/
public static function camel2id($name, $separator = '-', $strict = false)
{
$regex = $strict ? '/[A-Z]/' : '/(?<![A-Z])[A-Z]/';
if ($separator === '_') {
return trim(strtolower(preg_replace($regex, '_\0', $name)), '_');
} else {
return trim(strtolower(str_replace('_', $separator, preg_replace($regex, $separator . '\0', $name))), $separator);
}
}
/**
* Converts an ID into a CamelCase name.
* Words in the ID separated by `$separator` (defaults to '-') will be concatenated into a CamelCase name.
* For example, 'post-tag' is converted to 'PostTag'.
* @param string $id the ID to be converted
* @param string $separator the character used to separate the words in the ID
* @return string the resulting CamelCase name
*/
public static function id2camel($id, $separator = '-')
{
return str_replace(' ', '', ucwords(implode(' ', explode($separator, $id))));
}
/**
* Converts any "CamelCased" into an "underscored_word".
* @param string $words the word(s) to underscore
* @return string
*/
public static function underscore($words)
{
return strtolower(preg_replace('/(?<=\\w)([A-Z])/', '_\\1', $words));
}
/**
* Returns a human-readable string from $word
* @param string $word the string to humanize
* @param boolean $ucAll whether to set all words to uppercase or not
* @return string
*/
public static function humanize($word, $ucAll = false)
{
$word = str_replace('_', ' ', preg_replace('/_id$/', '', $word));
return $ucAll ? ucwords($word) : ucfirst($word);
}
/**
* Same as camelize but first char is in lowercase.
* Converts a word like "send_email" to "sendEmail". It
* will remove non alphanumeric character from the word, so
* "who's online" will be converted to "whoSOnline"
* @param string $word to lowerCamelCase
* @return string
*/
public static function variablize($word)
{
$word = static::camelize($word);
return strtolower($word[0]) . substr($word, 1);
}
/**
* Converts a class name to its table name (pluralized)
* naming conventions. For example, converts "Person" to "people"
* @param string $className the class name for getting related table_name
* @return string
*/
public static function tableize($className)
{
return static::pluralize(static::underscore($className));
}
/**
* Returns a string with all spaces converted to given replacement,
* non word characters removed and the rest of characters transliterated.
*
* If intl extension isn't available uses fallback that converts latin characters only
* and removes the rest. You may customize characters map via $transliteration property
* of the helper.
*
* @param string $string An arbitrary string to convert
* @param string $replacement The replacement to use for spaces
* @param boolean $lowercase whether to return the string in lowercase or not. Defaults to `true`.
* @return string The converted string.
*/
public static function slug($string, $replacement = '-', $lowercase = true)
{
$string = static::transliterate($string);
$string = preg_replace('/[^a-zA-Z0-9=\s—–-]+/u', '', $string);
$string = preg_replace('/[=\s—–-]+/u', $replacement, $string);
$string = trim($string, $replacement);
return $lowercase ? strtolower($string) : $string;
}
/**
* Returns transliterated version of a string.
*
* If intl extension isn't available uses fallback that converts latin characters only
* and removes the rest. You may customize characters map via $transliteration property
* of the helper.
*
* @param string $string input string
* @param string|\Transliterator $transliterator either a [[Transliterator]] or a string
* from which a [[Transliterator]] can be built.
* @return string
* @since 2.0.7 this method is public.
*/
public static function transliterate($string, $transliterator = null)
{
if (static::hasIntl()) {
if ($transliterator === null) {
$transliterator = static::$transliterator;
}
return transliterator_transliterate($transliterator, $string);
} else {
return strtr($string, static::$transliteration);
}
}
/**
* @return boolean if intl extension is loaded
*/
protected static function hasIntl()
{
return extension_loaded('intl');
}
/**
* Converts a table name to its class name. For example, converts "people" to "Person"
* @param string $tableName
* @return string
*/
public static function classify($tableName)
{
return static::camelize(static::singularize($tableName));
}
/**
* Converts number to its ordinal English form. For example, converts 13 to 13th, 2 to 2nd ...
* @param integer $number the number to get its ordinal value
* @return string
*/
public static function ordinalize($number)
{
if (in_array($number % 100, range(11, 13))) {
return $number . 'th';
}
switch ($number % 10) {
case 1:
return $number . 'st';
case 2:
return $number . 'nd';
case 3:
return $number . 'rd';
default:
return $number . 'th';
}
}
/**
* Converts a list of words into a sentence.
*
* Special treatment is done for the last few words. For example,
*
* ```php
* $words = ['Spain', 'France'];
* echo Inflector::sentence($words);
* // output: Spain and France
*
* $words = ['Spain', 'France', 'Italy'];
* echo Inflector::sentence($words);
* // output: Spain, France and Italy
*
* $words = ['Spain', 'France', 'Italy'];
* echo Inflector::sentence($words, ' & ');
* // output: Spain, France & Italy
* ```
*
* @param array $words the words to be converted into an string
* @param string $twoWordsConnector the string connecting words when there are only two
* @param string $lastWordConnector the string connecting the last two words. If this is null, it will
* take the value of `$twoWordsConnector`.
* @param string $connector the string connecting words other than those connected by
* $lastWordConnector and $twoWordsConnector
* @return string the generated sentence
* @since 2.0.1
*/
public static function sentence(array $words, $twoWordsConnector = ' and ', $lastWordConnector = null, $connector = ', ')
{
if ($lastWordConnector === null) {
$lastWordConnector = $twoWordsConnector;
}
switch (count($words)) {
case 0:
return '';
case 1:
return reset($words);
case 2:
return implode($twoWordsConnector, $words);
default:
return implode($connector, array_slice($words, 0, -1)) . $lastWordConnector . end($words);
}
}
}
这里有一个例子。
echo "Inflector Test";
require('PhInflector.php');
echo "<hr>";
echo PhInflector::slug('Höäpeäöäich Médsui27:;;,.1! *"29p');
echo "<hr>";
echo PhInflector::slug('HIJO"$(/&T §!"(/&T"§:;;,.1! *"29p');
echo "<hr>";
echo PhInflector::slug('38917 jiodj d ! *"29p');
echo "<hr>";
echo PhInflector::slug('каи циефле ///!!!');
点击这里以转到Github链接。
我打算尝试这个MorphAdorner: http://morphadorner.northwestern.edu/morphadorner/download/ (Java)。 它是一个包含不同类型的NLP处理工具的集合,你可以通过在线示例来测试它们。 对于你的问题(也是我的问题),有一个复数工具:http://morphadorner.northwestern.edu/morphadorner/pluralizer/example/
我刚刚遇到这个问题,并在10分钟内开发出了解决方案。
我认为@paxdiablo提供了一个好的思路来构建转换引擎并添加规则。我制定了一个字典规则和三个通用规则。字典规则到字典文件中查找异常情况,而三个通用规则分别处理“ies”、“es”和“s”。
然而,将所有例外情况添加到字典可能需要太多时间,例如pies/trees/bus等。我做的一项改进是确保可以将其转换回来。
例如,如果我们错误地将“trees”应用于删除“es”规则并将其转换为“tre”,当尝试添加复数形式时,您将获得“tres”,它不等于原始的“tree”,因此您知道不应该应用“es”规则。这种方法可以解决上述例外情况,而无需将它们添加到字典文件中。
最终,我得到了一个包含42个真正特殊单词的字典文件,并且它可以处理大多数情况。