Is there a way to convert number words to integers?

104

I need to convert one into 1, two into 2, and so on.

Is there a way to do this with a library, a class, or anything else?


3
See also: https://dev59.com/LnVD5IYBdhLWcg3wJIIK - tzot
Maybe this will help: http://pastebin.com/WwFCjYtt - alvas
5
If anyone is still looking for an answer, I took inspiration from all the answers below and created a Python package: https://github.com/careless25/text2digits - stackErr
1
I used the examples below to develop and extend this approach, and for future reference I ported it to Spanish: https://github.com/elbaulp/text2digits_es - Alejandro Alcalde
1
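For anyone not looking for a Python solution, here is the parallel C# question: Convert words (string) to Int, and here is the Java one: Convert words to numbers in Java - Tomerikoo
19 Answers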

1

Made a change so that text2int(scale) returns the correct conversion. For example, text2int("hundred") => 100.

import re

numwords = {}


def text2int(textnum):

    if not numwords:

        units = [ "zero", "one", "two", "three", "four", "five", "six",
                "seven", "eight", "nine", "ten", "eleven", "twelve",
                "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
                "eighteen", "nineteen"]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", 
                "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion", 
                'quadrillion', 'quintillion', 'sextillion', 'septillion', 
                'octillion', 'nonillion', 'decillion' ]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 
            'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]
    current = result = 0
    tokens = re.split(r"[\s-]+", textnum)
    for word in tokens:
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]

        if scale > 1:
            current = max(1, current)

        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current
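
A minimal sketch of how the (scale, increment) pairs accumulate in the loop above. The trimmed-down table `NUMWORDS` and the helper `trace` are hypothetical names introduced only for illustration; they are not part of the answer, and the table covers just the handful of words used here.

```python
# Hypothetical trimmed lookup table: word -> (scale, increment)
NUMWORDS = {
    "and": (1, 0),
    "one": (1, 1), "two": (1, 2), "six": (1, 6),
    "forty": (1, 40),
    "hundred": (100, 0), "thousand": (1000, 0),
}


def trace(text):
    """Return the final value plus the (word, current, result) state after each word."""
    current = result = 0
    steps = []
    for word in text.replace("-", " ").split():
        scale, increment = NUMWORDS[word]
        if scale > 1:
            current = max(1, current)  # a bare "hundred" counts as one hundred
        current = current * scale + increment
        if scale > 100:  # "thousand" and larger close the current group
            result += current
            current = 0
        steps.append((word, current, result))
    return result + current, steps


value, steps = trace("two thousand one hundred and forty-six")
for step in steps:
    print(step)
print(value)  # 2146
```

Here `current` holds the group being built (up to 999) and `result` holds the groups already closed by "thousand" or a larger scale word, which is why "hundred" multiplies in place while "thousand" flushes into `result`.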

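I believe the correct English spelling of 100 is "one hundred". - recursive
@recursive You're absolutely right, but the advantage of this code is that it handles "hundredth" (perhaps that is what Dawa was trying to highlight). From the description, other similar code requires "one hundredth", which isn't always the commonly used term (e.g. "she picked out the hundredth item to discard"). - Neil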

1
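I was looking for a library that would support all of the above plus more edge cases, such as ordinals (first, second), larger numbers, operators, and so on, and I found numwords-to-nums.
You can install it as follows.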
pip install numwords_to_nums

Here is a basic example:
from numwords_to_nums.numwords_to_nums import NumWordsToNum
num = NumWordsToNum()
   
result = num.numerical_words_to_numbers("twenty ten and twenty one")
print(result)  # Output: 2010 and 21
   
eval_result = num.evaluate('Hey calculate 2+5')
print(eval_result) # Output: 7

result = num.numerical_words_to_numbers('first')
print(result) # Output: 1st

1
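A quick solution is to use inflect.py to generate a translation dictionary.
inflect.py has a number_to_words() function that converts a number (e.g. 2) to its word form (e.g. 'two'). Unfortunately, the reverse (which would make the translation dictionary unnecessary) isn't provided. Nevertheless, you can use that function to build the translation dictionary: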
>>> import inflect
>>> p = inflect.engine()
>>> word_to_number_mapping = {}
>>>
>>> for i in range(1, 100):
...     word_form = p.number_to_words(i)  # 1 -> 'one'
...     word_to_number_mapping[word_form] = i
...
>>> print(word_to_number_mapping['one'])
1
>>> print(word_to_number_mapping['eleven'])
11
>>> print(word_to_number_mapping['forty-three'])
43

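If you're willing to spend some time, you could examine the inner workings of inflect.py's `number_to_words()` function and build your own code to do the conversion dynamically (I haven't tried this).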
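
The same inversion idea can be sketched without the dependency. The helper `small_number_to_words` below is a hypothetical stand-in for inflect's `number_to_words()`, written only for this illustration and valid for 0 to 99; the names `UNITS`, `TENS` and `word_to_number` are likewise assumptions, not part of inflect.

```python
UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def small_number_to_words(n):
    """Hypothetical stand-in for number_to_words(); handles 0..99 only."""
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    return TENS[tens] if unit == 0 else TENS[tens] + "-" + UNITS[unit]


# Invert the number -> word function into a word -> number lookup table.
word_to_number = {small_number_to_words(i): i for i in range(100)}

print(word_to_number["forty-three"])  # 43
```

Like the dictionary built with inflect, this only recognizes the exact spellings the generator produces (hyphenated compounds, no "and"), so input usually needs normalizing before lookup.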

0

This program handles Indian-style numbers, some fractions, combinations of digits and words, and addition.

def words_to_number(words):
    numbers = {"zero":0, "a":1, "half":0.5, "quarter":0.25, "one":1,"two":2,
               "three":3, "four":4,"five":5,"six":6,"seven":7,"eight":8,
               "nine":9, "ten":10,"eleven":11,"twelve":12, "thirteen":13,
               "fourteen":14, "fifteen":15,"sixteen":16,"seventeen":17,
               "eighteen":18,"nineteen":19, "twenty":20,"thirty":30, "forty":40,
               "fifty":50,"sixty":60,"seventy":70, "eighty":80,"ninety":90}

    groups = {"hundred":100, "thousand":1_000, 
              "lac":1_00_000, "lakh":1_00_000, 
              "million":1_000_000, "crore":10**7, 
              "billion":10**9, "trillion":10**12}
    
    split_at = ["and", "plus"]
    
    n = 0
    skip = False
    words_array = words.split(" ")
    for i, word in enumerate(words_array):
        if not skip:
            if word in groups:
                n*= groups[word]
            elif word in numbers:
                n += numbers[word]
            elif word in split_at:
                skip = True
                remaining = ' '.join(words_array[i+1:])
                n+=words_to_number(remaining)
            else:
                try:
                    n += float(word)
                except ValueError as e:
                    raise ValueError(f"Invalid word {word}") from e
    return n

Tests:

print(words_to_number("a million and one"))
>> 1000001

print(words_to_number("one crore and one"))
>> 10000001

print(words_to_number("0.5 million one"))
>> 500001.0

print(words_to_number("half million and one hundred"))
>> 500100.0

print(words_to_number("quarter"))
>> 0.25

print(words_to_number("one hundred plus one"))
>> 101

I ran some more tests: "seventeen hundred" gives 1700 and "one thousand and seven hundred" also gives 1700, but "one thousand seven hundred" is read as (one thousand and seven) hundred, i.e. 1007 * 100 = 100700. Is it technically correct to say "one thousand seven hundred" rather than "one thousand AND seven hundred"?! - Hemant Hegde
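
One possible way around the behaviour described in this comment is to keep a separate running group, so that "hundred" only multiplies the words since the last large scale word. This is a sketch under stated assumptions, not the answer's method: `words_to_number_grouped`, `NUMBERS` and `GROUPS` are hypothetical names, "and" is simply skipped rather than treated as addition, and fractions and digit strings are left out.

```python
UNITS = ("zero one two three four five six seven eight nine ten eleven twelve "
         "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

NUMBERS = {word: i for i, word in enumerate(UNITS)}
NUMBERS.update({word: (i + 2) * 10 for i, word in enumerate(TENS)})

# Scale words that close a group (Indian and Western styles side by side).
GROUPS = {"thousand": 1_000, "lakh": 1_00_000, "million": 1_000_000,
          "crore": 10**7, "billion": 10**9}


def words_to_number_grouped(words):
    total = current = 0
    for word in words.lower().replace("-", " ").split():
        if word == "and":
            continue  # assumption: "and" is a filler word, not addition
        if word in NUMBERS:
            current += NUMBERS[word]
        elif word == "hundred":
            current = max(1, current) * 100  # scales only the open group
        elif word in GROUPS:
            total += max(1, current) * GROUPS[word]
            current = 0  # start a fresh group after the scale word
        else:
            raise ValueError(f"Invalid word {word}")
    return total + current


print(words_to_number_grouped("one thousand seven hundred"))  # 1700
print(words_to_number_grouped("seventeen hundred"))           # 1700
print(words_to_number_grouped("one crore and one"))           # 10000001
```

The trade-off is that descending scale words are assumed; an input such as "one hundred thousand" still works, but this sketch does not validate word order.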

0
I took @recursive's logic and converted it to Ruby. I also hard-coded the lookup table, so it's not as cool, but it may help a newcomer understand what is going on.
WORDNUMS = {"zero"=> [1,0], "one"=> [1,1], "two"=> [1,2], "three"=> [1,3],
            "four"=> [1,4], "five"=> [1,5], "six"=> [1,6], "seven"=> [1,7], 
            "eight"=> [1,8], "nine"=> [1,9], "ten"=> [1,10], 
            "eleven"=> [1,11], "twelve"=> [1,12], "thirteen"=> [1,13], 
            "fourteen"=> [1,14], "fifteen"=> [1,15], "sixteen"=> [1,16], 
            "seventeen"=> [1,17], "eighteen"=> [1,18], "nineteen"=> [1,19], 
            "twenty"=> [1,20], "thirty" => [1,30], "forty" => [1,40], 
            "fifty" => [1,50], "sixty" => [1,60], "seventy" => [1,70], 
            "eighty" => [1,80], "ninety" => [1,90],
            "hundred" => [100,0], "thousand" => [1000,0], 
            "million" => [1000000, 0]}

def text_2_int(string)
  numberWords = string.gsub('-', ' ').split(/ /) - %w{and}
  current = result = 0
  numberWords.each do |word|
    scale, increment = WORDNUMS[word]
    current = current * scale + increment
    if scale > 100
      result += current
      current = 0
    end
  end
  return result + current
end

I wanted to handle strings like two thousand one hundred and forty-six.


0
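This is a cool solution, so I took @recursive's Python code and, with ChatGPT's help, converted it to C#, then simplified, formatted and compacted it.
Yes, I had to give ChatGPT a lot of instructions. It took me a while, but now it's done.
I believe this version makes the code, and how the algorithm works, clearer and easier to understand.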
using System.Collections.Generic;

public class Parser
{
    public static int ParseInt(string s)
    {
        Dictionary<string, (int scale, int increment)> numwords = new Dictionary<string, (int, int)>
        {
            {"and", (1, 0)}, {"zero", (1, 0)}, {"one", (1, 1)}, {"two", (1, 2)}, {"three", (1, 3)},
            {"four", (1, 4)}, {"five", (1, 5)}, {"six", (1, 6)}, {"seven", (1, 7)}, {"eight", (1, 8)},
            {"nine", (1, 9)}, {"ten", (1, 10)}, {"eleven", (1, 11)}, {"twelve", (1, 12)}, {"thirteen", (1, 13)},
            {"fourteen", (1, 14)}, {"fifteen", (1, 15)}, {"sixteen", (1, 16)}, {"seventeen", (1, 17)}, {"eighteen", (1, 18)},
            {"nineteen", (1, 19)}, {"twenty", (1, 20)}, {"thirty", (1, 30)}, {"forty", (1, 40)}, {"fifty", (1, 50)},
            {"sixty", (1, 60)}, {"seventy", (1, 70)}, {"eighty", (1, 80)}, {"ninety", (1, 90)}, {"hundred", (100, 0)},
            {"thousand", (1000, 0)}, {"million", (1000000, 0)}, {"billion", (1000000000, 0)}
        };

        int current = 0;
        int result = 0;

        foreach (string word in s.Replace("-", " ").Split())
        {
            var (scale, increment) = numwords[word];

            current = current * scale + increment;

            if (scale > 100)
            {
                result += current;
                current = 0;
            }
        }

        return result + current;
    }
}

-1

I found a faster way:

Da_Unità_a_Cifre = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11,
 'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15, 'sixteen': 16, 'seventeen': 17, 'eighteen': 18, 'nineteen': 19}

Da_Lettere_a_Decine = {"tw": 20, "th": 30, "fo": 40, "fi": 50, "si": 60, "se": 70, "ei": 80, "ni": 90, }

elemento = input("insert the word:")
Val_Num = 0
try:
    # str methods return new strings, so the result has to be reassigned
    elemento = elemento.lower().strip()

    if elemento in Da_Unità_a_Cifre:
        # plain unit or teen, e.g. "five" or "thirteen"
        Val_Num = Da_Unità_a_Cifre[elemento]
    elif elemento == "onehundred":
        Val_Num = 100
    elif elemento[-1] == "y":
        # bare tens word, e.g. "twenty": the first two letters pick the tens
        Val_Num = Da_Lettere_a_Decine[elemento[:2]]
    else:
        # tens followed by a unit, e.g. "twentyfive": the unit follows "ty"
        Unità = elemento[elemento.find("ty")+2:].lstrip("- ")
        Val_Num = Da_Lettere_a_Decine[elemento[:2]] + Da_Unità_a_Cifre[Unità]
    print(Val_Num)
except (KeyError, IndexError):
    print("invalid input")

-2

This code works on a series of data:

import pandas as pd
mylist = pd.Series(['one','two','three'])
mylist1 = []
for x in range(len(mylist)):
    mylist1.append(w2n.word_to_num(mylist[x]))
print(mylist1)

What is w2n? It isn't defined anywhere. - Tomerikoo

-3

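This code only works for numbers up to 99. Converting other numbers would take another 10-20 lines of code and some simple logic. This is just simple code for beginners: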

num = input("Enter the number you want to convert : ")
mydict = {'1': 'One', '2': 'Two', '3': 'Three', '4': 'Four', '5': 'Five','6': 'Six', '7': 'Seven', '8': 'Eight', '9': 'Nine', '10': 'Ten','11': 'Eleven', '12': 'Twelve', '13': 'Thirteen', '14': 'Fourteen', '15': 'Fifteen', '16': 'Sixteen', '17': 'Seventeen', '18': 'Eighteen', '19': 'Nineteen'}
mydict2 = ['', '', 'Twenty', 'Thirty', 'Forty', 'Fifty', 'Sixty', 'Seventy', 'Eighty', 'Ninety']

if num.isdigit():
    if(int(num) < 20):
        print(" :---> " + mydict[num])
    else:
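        var1 = int(num) % 10
        var2 = int(num) // 10  # integer division gives the index of the tens word
        # a zero unit digit (20, 30, ...) has no word of its own
        print(" :---> " + mydict2[var2] + mydict.get(str(var1), ''))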
else:
    num = num.lower()
    dict_w = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11, 'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15, 'sixteen': 16, 'seventeen': 17, 'eighteen': 18, 'nineteen': 19}
    mydict2 = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    divide = num[num.find("ty")+2:]
    if num:
        if(num in dict_w.keys()):
            print(" :---> " + str(dict_w[num]))
        elif divide == '' :
            for i in range(0, len(mydict2)):  # include the last entry so the final tens word is matched
                if mydict2[i] == num:
                    print(" :---> " + str(i * 10))
        else :
            str3 = 0
            str1 = num[num.find("ty")+2:]
            str2 = num[:-len(str1)]
            for i in range(0, len(mydict2)):
                if mydict2[i] == str2:
                    str3 = i
            if str2 not in mydict2:
                print("----->Invalid Input<-----")                
            else:
                try:
                    print(" :---> " + str((str3*10) + dict_w[str1]))
                except:
                    print("----->Invalid Input<-----")
    else:
        print("----->Please Enter Input<-----")

1
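Please explain what this code does and how it does it. That makes your answer more valuable to people who know less about programming. - Luuklag
If the user enters a number, the program returns the corresponding word, and vice versa. For example, 5->five and Five->5. The program works for numbers below 100, but it can be extended to any range by adding just a few lines of code. - Shriram Jadhav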
