What is the difference between unicodedata.digit and unicodedata.numeric?

Question


python python-3.x unicode cpython python-module-unicodedata


unicodedata.digit(chr[, default]) Returns the digit value assigned to the character chr as an integer. If no such value is defined, default is returned, or, if not given, ValueError is raised.

unicodedata.numeric(chr[, default]) Returns the numeric value assigned to the character chr as a float. If no such value is defined, default is returned, or, if not given, ValueError is raised.
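As a small sketch of the default argument described in the two signatures above (the sample characters and default values are chosen here for illustration only):

```python
import unicodedata

# With a default, neither function raises for a character lacking the value.
print(unicodedata.digit('¼', -1))      # '¼' has no digit value, so the default comes back
print(unicodedata.numeric('x', None))  # 'x' has no numeric value, so the default comes back

# Without a default, a missing value raises ValueError.
try:
    unicodedata.digit('¼')
except ValueError as exc:
    print('ValueError:', exc)
```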

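Can someone explain the difference between these two functions?

The implementation of both functions can be read here, but a quick look does not make the difference obvious to me, because I am not familiar with the CPython implementation.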

EDIT 1:

An example that shows the difference would be nice.

EDIT 2:

Here are some useful examples that complement the comments and @user2357112's answer:

import unicodedata

print(unicodedata.digit('1'))  # DIGIT ONE
print(unicodedata.digit('١'))  # ARABIC-INDIC DIGIT ONE
try:
    print(unicodedata.digit('¼'))  # Not a digit, so this raises ValueError.
except ValueError as exc:
    print('ValueError:', exc)  # "not a digit"

print(unicodedata.numeric('Ⅱ'))  # ROMAN NUMERAL TWO
print(unicodedata.numeric('¼'))  # VULGAR FRACTION ONE QUARTER
- user1785721


I believe numeric is for numeric characters other than the Arabic digits, such as DEVANAGARI DIGIT ONE and so on. - cs95


Judging by the types and descriptions, digit is for actual digits, whereas numeric can handle things like fractions (e.g. ¾). - weirdan


@gsi-frank, they accept the same input, but they differ in what they return. - weirdan


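@weirdan's ¾ example seems like a good one: it is a single Unicode character (code point U+00BE) with a numeric value of 3/4, but it has no digit value. - Peter DeGlopper

@PeterDeGlopper You are right. unicodedata.numeric('¼') and unicodedata.digit('¼') are examples that show this clearly. Thanks to everyone who patiently answered this question. - user1785721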


1 Answer


- user2357112 · Accepted Answer

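Short answer:

If a character represents a decimal digit, such as 1, ¹ (SUPERSCRIPT ONE), ① (CIRCLED DIGIT ONE), or ١ (ARABIC-INDIC DIGIT ONE), unicodedata.digit returns the digit that the character represents as an int (so 1 for all of these examples).

If the character represents any numeric value, such as ⅐ (VULGAR FRACTION ONE SEVENTH) or any of the decimal digit examples, unicodedata.numeric gives that character's numeric value as a float.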
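The two statements above can be checked with a short sketch. The helper name describe is hypothetical and introduced only for this illustration; it passes None as the default so that a missing value does not raise.

```python
import unicodedata

# The characters named in the short answer: four digit-like ones and one fraction.
samples = ['1', '¹', '①', '١', '⅐']

def describe(ch):
    """Return (digit, numeric) for ch, with None where a value is undefined."""
    return unicodedata.digit(ch, None), unicodedata.numeric(ch, None)

for ch in samples:
    digit, numeric = describe(ch)
    print(f'{unicodedata.name(ch)}: digit={digit!r}, numeric={numeric!r}')
```

The four digit-like characters report an int digit value and a float numeric value, while the fraction reports only a numeric value.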

For technical reasons, more recent digit characters such as U+1F10C (DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO) may raise ValueError from unicodedata.digit.
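A quick way to observe this, assuming the interpreter's bundled Unicode database is recent enough to include the character (it was added in Unicode 7.0), is the sketch below. The result depends on unicodedata.unidata_version, so it is printed first.

```python
import unicodedata

ch = '\U0001F10C'  # DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO

print(unicodedata.unidata_version)    # Unicode version of the bundled database
print(unicodedata.numeric(ch, None))  # the numeric value, if the database knows the character
print(unicodedata.digit(ch, None))    # None when no digit value is assigned
```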

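Long answer:

Every Unicode character has a Numeric_Type property. This property can have 4 possible values: Numeric_Type=Decimal, Numeric_Type=Digit, Numeric_Type=Numeric, or Numeric_Type=None.

Quoting the Unicode Standard, Version 10.0.0, Section 4.6,

The Numeric_Type=Decimal property value (which is correlated with the General_Category=Nd property value) is limited to those numeric characters that are used in decimal-radix numbers and for which a full set of digits has been encoded in a contiguous range, with ascending order of Numeric_Value, and with the digit zero as the first code point in the range.

Numeric_Type=Decimal characters are thus decimal digits that meet a few specific technical requirements.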

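Decimal digits, as defined in the Unicode Standard by these property assignments, exclude some characters, such as the CJK ideographic digits (see the first ten entries in Table 4-5), which are not encoded in a contiguous sequence. Decimal digits also exclude the compatibility subscript and superscript digits to prevent simplistic parsers from misinterpreting their values in context. (For more information on superscripts and subscripts, see Section 22.4, Superscript and Subscript Symbols.) Traditionally, the Unicode Character Database has given these sets of noncontiguous or compatibility digits the value Numeric_Type=Digit, to recognize the fact that they consist of digit values but do not necessarily meet all the criteria for Numeric_Type=Decimal. However, the distinction between Numeric_Type=Digit and the more generic Numeric_Type=Numeric has proven not to be useful in implementations. As a result, future sets of digits which may be added to the standard and which do not meet the criteria for Numeric_Type=Decimal will simply be assigned the value Numeric_Type=Numeric.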
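The point about simplistic parsers can be seen in Python itself. The sketch below is not part of the original answer: it uses the built-in str.isdecimal, str.isdigit, and str.isnumeric methods together with int(), and the helper parses_as_int is a hypothetical name used only here.

```python
def parses_as_int(text):
    """Return True if int() accepts text, False if it raises ValueError."""
    try:
        int(text)
    except ValueError:
        return False
    return True

# DIGIT TWO, ARABIC-INDIC DIGIT TWO, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF
for ch in ['2', '٢', '²', '½']:
    print(repr(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric(), parses_as_int(ch))

# int() accepts decimal digits from other scripts, but not superscript digits.
print(int('٢٣'))
```

Because the superscript two is not a decimal digit, int() rejects it, so a string such as an exponent written with superscripts is not silently read as an ordinary number.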

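Numeric_Type=Digit was historically used for other digits that do not meet the technical requirements of Numeric_Type=Decimal, but the distinction was judged not to be useful, and digit characters that do not meet the Numeric_Type=Decimal requirements have simply been assigned Numeric_Type=Numeric since Unicode 6.3.0. For example, U+1F10C (DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO), introduced in Unicode 7.0, has Numeric_Type=Numeric.

Numeric_Type=Numeric is for all characters that represent numbers and do not fit in the other categories, and Numeric_Type=None is for characters that do not represent numbers (or at least do not under normal usage).

All characters with a non-None Numeric_Type property have a Numeric_Value property representing their numeric value. unicodedata.digit returns that value as an int for characters with Numeric_Type=Decimal or Numeric_Type=Digit, and unicodedata.numeric returns that value as a float for characters with any non-None Numeric_Type.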
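As a rough sketch of how the four property values map onto the module, the helper below infers a character's Numeric_Type from which lookups succeed. It relies on unicodedata.decimal, a standard-library function not mentioned in the answer, and the name numeric_type is hypothetical. It reflects whatever Unicode version the interpreter bundles, so treat it as an approximation rather than an authoritative property lookup.

```python
import unicodedata

def numeric_type(ch):
    """Guess the Numeric_Type of ch from the unicodedata lookups that succeed."""
    if unicodedata.decimal(ch, None) is not None:
        return 'Decimal'   # has a decimal value
    if unicodedata.digit(ch, None) is not None:
        return 'Digit'     # has a digit value but no decimal value
    if unicodedata.numeric(ch, None) is not None:
        return 'Numeric'   # has only a numeric value
    return 'None'          # no numeric value at all

for ch in ['1', '١', '¹', '①', '¼', 'Ⅱ', 'A']:
    print(repr(ch), numeric_type(ch))
```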