Histogram of data from Python's split() function

Question

4

I am trying to make a histogram from a text file that contains floats:

import matplotlib.pyplot as plt

c1_file = open('densEst1.txt','r')
c1_data =  c1_file.read().split()    
c1_sum = float(c1_data.__len__())

plt.hist(c1_data)
plt.show()

The output of c1_data.__len__() looks fine, but hist() raises an error:

C:\Python27\python.exe "C:/x.py"
Traceback (most recent call last):
  File "C:/x.py", line 7, in <module>
    plt.hist(c1_data)
  File "C:\Python27\lib\site-packages\matplotlib\pyplot.py", line 2958, in hist
    stacked=stacked, data=data, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\__init__.py", line 1812, in inner
    return func(ax, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\axes\_axes.py", line 5995, in hist
    if len(xi) > 0:
TypeError: len() of unsized object

- nik.yan

What does your data look like? - n1tk

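Computing a histogram of a single number makes no sense. Please provide a complete problem description, including what you want to achieve. See [ask] and [mcve]. - ImportanceOfBeingErnest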

1

@ImportanceOfBeingErnest Why do you think it is a single number? - MSeifert

As I said, it is a text file containing floats separated by spaces :) - nik.yan

2 Answers

1

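Here is an example using numpy; the code is below. pandas would also work: it can split the data and set the dtype while reading (even for columnar data), and it can also read the data as a vector, depending on the data size.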

#!/usr/bin/env python
# %matplotlib inline  # IPython/Jupyter magic; uncomment only in a notebook

import numpy as np
import matplotlib.pyplot as plt

# It may be better to read the file with numpy, since the data are floats:
# a = np.fromfile(open('from_file', 'r'), sep='\n')

from_file = np.array([1, 2, 2.5])  # sample data
c1_data = from_file.astype(float)  # convert the data to float

plt.hist(c1_data)  # plt.hist passes its arguments to np.histogram
plt.title("Histogram without 'auto' bins")
plt.show()

plt.hist(c1_data, bins='auto')  # plt.hist passes its arguments to np.histogram
plt.title("Histogram with 'auto' bins")
plt.show()
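
As a minimal sketch of the commented-out numpy read in the code above, the snippet below parses a whitespace-separated text file straight into a float array. The temporary file and its values are made up for illustration and only stand in for the real data file.

```python
import os
import tempfile

import numpy as np

# Write a small stand-in for the real data file (values are made up).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("0.5 1.25 3.0\n2.75 0.5\n")
    path = tmp.name

# A non-empty sep makes numpy treat the file as text; a space in sep
# matches any run of whitespace, including newlines.
data = np.fromfile(path, dtype=float, sep=" ")
os.remove(path)

print(data.dtype, data)
```

The resulting array is already numeric, so it could be passed to plt.hist without any further conversion.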

- n1tk

Content provided by Stack Overflow.

- MSeifert · Accepted Answer

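The main reason the plt.hist call fails is that the argument c1_data is a list of strings. When you open a file and read it, the result is a single string containing the file's contents: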

To read a file's contents, call f.read(size), which reads some quantity of data and returns it as a string (in text mode) or bytes object (in binary mode).

(Emphasis mine.)

When you then split this long string, you get a list of strings:

Return a list of the words in the string, using sep as the delimiter string.
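
A small sketch of what read() followed by split() produces. The text below is a made-up stand-in for the file's contents, not the actual data:

```python
# Stand-in for c1_file.read(): one string holding the whole file (made-up values).
text = "0.5 1.25 3.0\n2.75 0.5\n"

# With no sep argument, split() breaks on any run of whitespace,
# so spaces and newlines are handled alike.
tokens = text.split()

print(tokens)           # the pieces look like numbers ...
print(type(tokens[0]))  # ... but each one is still a str
```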

However, a list of strings is not valid input for plt.hist:

>>> import matplotlib.pyplot as plt
>>> plt.hist(['1', '2'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
      1 import matplotlib.pyplot as plt
----> 2 plt.hist(['1', '2'])

C:\...\lib\site-packages\matplotlib\pyplot.py in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, data, **kwargs)
   3079                       histtype=histtype, align=align, orientation=orientation,
   3080                       rwidth=rwidth, log=log, color=color, label=label,
-> 3081                       stacked=stacked, data=data, **kwargs)
   3082     finally:
   3083         ax._hold = washold

C:\...\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
   1895                     warnings.warn(msg % (label_namer, func.__name__),
   1896                                   RuntimeWarning, stacklevel=2)
-> 1897             return func(ax, *args, **kwargs)
   1898         pre_doc = inner.__doc__
   1899         if pre_doc is None:

C:\...\lib\site-packages\matplotlib\axes\_axes.py in hist(***failed resolving arguments***)
   6178             xmax = -np.inf
   6179             for xi in x:
-> 6180                 if len(xi) > 0:
   6181                     xmin = min(xmin, xi.min())
   6182                     xmax = max(xmax, xi.max())

TypeError: len() of unsized object

Solution:

You can simply convert it to an array of floats:

>>> import numpy as np
>>> plt.hist(np.array(c1_data, dtype=float))
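
To check the conversion without opening a plot window, here is a sketch that uses np.histogram, the function that does the binning. The input string and the choice of 5 bins are assumptions made purely for illustration:

```python
import numpy as np

# Made-up stand-in for the tokens produced by c1_file.read().split().
tokens = "0.5 1.25 3.0\n2.75 0.5\n".split()

# numpy parses each string into a float when dtype=float is requested.
values = np.array(tokens, dtype=float)

# Bin the numeric data; bins=5 is an arbitrary choice for this sketch.
counts, edges = np.histogram(values, bins=5)

print(values.dtype)
print(counts)
print(edges)
```

Once the data is a float array like values, plt.hist(values) followed by plt.show() should draw the histogram as intended.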