Convert Unicode minus sign (from Matplotlib tick labels)

Question

14

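I have run into a problem with the Text objects that matplotlib uses to represent tick labels. For testing purposes, I need to check the values of the tick labels created in a plot. If a label is a string or a positive number there is no problem: a Unicode string is returned, I test it (or convert it to a number, as the case requires), and everything is fine. If the label is a negative number, however, I get back a garbled Unicode string for reasons I cannot understand. Consider this example code: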

import pylab as plt
fig, ax = plt.subplots(1)
ax.plot([-1, 0, 1, 2], range(4))
labels = ax.get_xticklabels()

Now, if I ask for the text of the second label (i.e. 0), I get an ordinary Unicode string:

labels[1].get_text()
# u'0.0'

But the first label (-1) has a strange Unicode value:

labels[0].get_text()
# u'\u22121'

This string prints correctly in the terminal, but here I need to compare it with a numeric value, and every conversion fails, whether with int or float.
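A minimal sketch of the failure, assuming Python 3 string semantics (the original session was Python 2.7). The helper name try_convert is hypothetical and exists only for this illustration:

```python
# U+2212 MINUS SIGN followed by the digit 1, as returned by get_text()
text = u'\u22121'


def try_convert(converter, value):
    """Return converter(value), or None if the conversion raises ValueError."""
    try:
        return converter(value)
    except ValueError:
        return None


print(try_convert(int, text))    # None: int() does not accept U+2212
print(try_convert(float, text))  # None: neither does float()
print(try_convert(int, u'-1'))   # -1: the ASCII hyphen-minus is accepted
```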

I tried encoding it as a UTF-8 string before converting, but that fails as well.

text = labels[0].get_text()
text.encode('utf8')
# '\xe2\x88\x921'

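So the string still prints correctly and still raises an error on conversion. I also looked at the unicodedata module, but it seems to convert only single characters, so it is of no use here. I also tried normalizing the string with unicodedata.normalize in every available form, again without success.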
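A short sketch of why normalization does not help, assuming Python 3: U+2212 has no decomposition mapping in the Unicode database, so all four normalization forms return the string unchanged.

```python
import unicodedata

text = u'\u22121'  # MINUS SIGN followed by the digit 1

# An empty decomposition means normalization has nothing to rewrite.
print(repr(unicodedata.decomposition(u'\u2212')))

# Check each normalization form against the original string.
unchanged = {
    form: unicodedata.normalize(form, text) == text
    for form in ('NFC', 'NFD', 'NFKC', 'NFKD')
}
print(unchanged)
```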

I then turned to the PyPI module unidecode (as suggested in "Python and character normalization"), but again without success.

from unidecode import unidecode
unidecode(text)
# '[?]1'

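I tried the font fix suggested in "Non-ASCII characters in Matplotlib", but the result is the same (and I am not sure this should be a visualization problem at all...). The question "Accented characters in Matplotlib" is in the same situation, since it concerns visualization rather than the value itself.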

I'm starting to feel a bit lost... I know Python 2.7 has some "difficulties" with Unicode, but usually I manage to work around them one way or another.

I know the problem is the minus sign, because I can avoid it by brute-force replacing the offender:

text.replace(u'\u2212', '-')
# u'-1'

But this is more of a hack than a solution, and I'm almost certain that it's not stable across different systems, so I would like something closer to a solution.

I'm working with:

python 2.7.3
matplotlib 1.2.0
pylab 1.7.0
IPython 0.13.1

on Kubuntu 12.10.

Thank you very much for your help!

EDIT:

Corrected the order of the plot, as I got the x and y inverted, sorry.

EDIT2:

Similar information is available at this link: http://www.coniferproductions.com/2012/12/17/unicode-character-dump-in-python/ At the end, it shows how some books use a more aesthetically pleasing minus sign that the Python interpreter does not recognize as a valid character.

EDIT3:

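Mystery solved. The character returned by Matplotlib is "MINUS SIGN" (U+2212), the typographically correct symbol for minus. The character produced by the keyboard is actually "HYPHEN-MINUS" (U+002D), which is commonly used but not typographically correct. See the Wikipedia explanation at http://en.wikipedia.org/wiki/Hyphen-minus.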
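The two characters can be told apart by their code points and Unicode names. A small sketch, assuming Python 3; the helper name describe is hypothetical:

```python
import unicodedata


def describe(char):
    """Return the code point and Unicode name of a single character."""
    return 'U+%04X %s' % (ord(char), unicodedata.name(char))


print(describe(u'\u2212'))  # U+2212 MINUS SIGN
print(describe(u'-'))       # U+002D HYPHEN-MINUS
```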

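So the simple replacement I used is actually the correct practical approach, but "morally" speaking this is a bug in Python (2.7 and 3.x alike), which does not recognize the correct representation of the minus sign.

See the bug tracker entry at http://bugs.python.org/issue6632.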
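The replacement can be wrapped in a small helper for use in tests. This is only a sketch: the names MINUS_SIGN and label_to_float are hypothetical, and it assumes the labels are plain decimal numbers (no scientific-notation offsets or LaTeX markup).

```python
MINUS_SIGN = u'\u2212'  # the character Matplotlib emits for negative ticks


def label_to_float(text):
    """Convert a tick label to float, accepting U+2212 as the minus sign."""
    return float(text.replace(MINUS_SIGN, u'-'))


print(label_to_float(u'\u22121'))  # -1.0
print(label_to_float(u'0.0'))      # 0.0
```

Labels that already use the ASCII hyphen-minus pass through unchanged, so the helper should behave the same whether or not the Unicode minus is enabled.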

EDIT4:

To disable this behavior there is a simple solution on the Matplotlib side: just change the rcParams, either in the matplotlibrc file or programmatically.

import matplotlib as mpl
mpl.rcParams['axes.unicode_minus']=False
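A sketch of the effect of this setting, assuming Python 3, a recent Matplotlib, and the non-interactive Agg backend (none of which match the original 2.7 session). The helper name tick_texts is hypothetical; rc_context is used here only to keep the setting local to each call.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt


def tick_texts(unicode_minus):
    """Return the x tick label strings drawn with the given setting."""
    with matplotlib.rc_context({"axes.unicode_minus": unicode_minus}):
        fig, ax = plt.subplots()
        ax.plot([-1, 0, 1, 2], range(4))
        fig.canvas.draw()  # labels are only filled in once the figure is drawn
        texts = [label.get_text() for label in ax.get_xticklabels()]
        plt.close(fig)
    return texts


print(tick_texts(True))   # negative labels contain U+2212
print(tick_texts(False))  # negative labels contain the ASCII '-'
```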

- EnricoGiampieri

6

Your last edit fixed my problem with minus signs not being displayed, thank you. - Mark

2

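I suggest you turn your EDIT4 into an answer, so that people searching for it can find it more easily! - Konstantin

I only ran into this problem when saving as pdf with the agg backend and the Arial font family. The png output was fine. Any idea why? Did you also hit the problem because of the pdf format? (But your EDIT4 worked for me, thanks!) - aseagram

Your last edit should really be an answer. - MERose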

2 Answers

1

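All valid Unicode characters have a name. We can inspect that name for recognized digit words (DIGIT.keys()) and, on that basis, substitute "normal" digit characters (DIGIT.values()) for the characters of the given Unicode label.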

import matplotlib.pyplot as plt
import unicodedata as UD

DIGIT = {
    'MINUS': u'-',
    'ZERO': u'0',
    'ONE': u'1',
    'TWO': u'2',
    'THREE': u'3',
    'FOUR': u'4',
    'FIVE': u'5',
    'SIX': u'6',
    'SEVEN': u'7',
    'EIGHT': u'8',
    'NINE': u'9',
    'STOP': u'.'
    }

def guess(unistr):
    # Translate each character to ASCII by matching a keyword in its Unicode name
    return ''.join([value for u in unistr
                    for key, value in DIGIT.items()
                    if key in UD.name(u)])

fig, ax = plt.subplots(1)
ax.plot([-1, 0, 1, 2], range(4))
plt.savefig('/tmp/test.png')
labels = ax.get_xticklabels()
for label in labels:
    label = label.get_text()
    print(guess(label))

yields

-1.0
-0.5
0.0
0.5
1.0
1.5
2.0

- unutbu

- unutbu · Accepted Answer

Use plt.xticks() instead of ax.get_xticklabels():

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1)
ax.plot([-1, 0, 1, 2], range(4))
plt.savefig('/tmp/test.png')
loc, labels = plt.xticks()
print(type(loc))
# <type 'numpy.ndarray'>
print(loc)
# [-1.  -0.5  0.   0.5  1.   1.5  2. ]
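The same numeric positions are available from the Axes object through ax.get_xticks(), which avoids parsing label text altogether. A sketch assuming Python 3, a recent Matplotlib, and the Agg backend; the exact tick positions depend on the Matplotlib version and axis margins, so no output is shown:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([-1, 0, 1, 2], range(4))
fig.canvas.draw()

# Tick positions come back as numbers, so negative values compare directly.
locs = ax.get_xticks()
negative = [float(x) for x in locs if x < 0]
print(negative)

plt.close(fig)
```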