My input:
??AAAAT 66.5939
??AAAAW 63.3312
??AAAAZ 63.3312
??AAAĄB 58.0579
??AAAĄD 81.3312
??AAAĄF 87.3312
??AAAĄG 64.5562
??AAAĄH 63.3687
??AAAĄK 81.3312
??AAAĄL 81.3312
??AAAĄM 81.3312
??AAAĄN 79.3312
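I have a script that computes the mean of the values in the second column, subtracts that mean from each original value, and saves the modified column to another file: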
import numpy as np

def calculateAverage():
    '''real values of leaves should be averaged over all possible leaves'''
    values = np.loadtxt("input/leaves.txt", usecols=(1,))
    leaves = np.loadtxt("input/leaves.txt", dtype='str', usecols=(0,))
    values -= np.mean(values)
    outputFile = open("output/leaves.txt", 'w')
    for i, elem in enumerate(leaves):
        outputFile.write('%s %f\n' % (leaves[i], values[i]))
    outputFile.close()
Now I am trying to do the same thing with a record array:
import numpy as np

def calculateAverage1():
    '''real values of leaves should be averaged over all possible leaves'''
    values = np.loadtxt("input/leaves.txt", dtype=[('key', 'S8'), ('val', 'f8')])
    values['val'] -= np.mean(values['val'])
    np.savetxt("output/leaves.txt", values, fmt='%s %f')
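When opened in emacs or another editor, the output of the first script looks identical to the input. The output of the second script, however, is displayed by default as escaped UTF-8 bytes rather than as characters: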
??AAAAT -11.730239
??AAAAW -14.992939
??AAAAZ -14.992939
??AAA\304\204B -20.266239
??AAA\304\204D 3.007061
??AAA\304\204F 9.007061
??AAA\304\204G -13.767939
??AAA\304\204H -14.955439
??AAA\304\204K 3.007061
??AAA\304\204L 3.007061
??AAA\304\204M 3.007061
??AAA\304\204N 1.007061
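I have to explicitly select utf-8 as the encoding in the editor for them to display correctly.
How can I force numpy to save a file so that it is encoded as utf-8? Is this a numpy issue, or could it be related to the operating system? I am using Ubuntu 14.04, Python 2.7.6, and numpy 1.8.1.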
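One way to narrow down whether the difference comes from the bytes numpy writes or from the editor's encoding guess might be to read the output file in binary mode and try to decode it as UTF-8. The sketch below is only a diagnostic under that assumption; the helper name `is_valid_utf8` is hypothetical and not part of numpy.

```python
import io


def is_valid_utf8(path):
    """Return True if the raw bytes of the file decode cleanly as UTF-8."""
    # Read raw bytes so that no implicit decoding happens on the way in.
    with io.open(path, 'rb') as f:
        raw = f.read()
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True
```

Calling `is_valid_utf8("output/leaves.txt")` on the output of each script would show whether both files are already well-formed UTF-8 on disk. If both return True, the bytes are probably fine and the display difference more likely lies in how the editor detects the encoding.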