How to make a histogram from a list of strings

Question

44

I have a list of strings:

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

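I want to make a histogram that shows the frequency distribution of the letters. I can build a list containing the count of each letter with the following code: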

from itertools import groupby
b = [len(list(group)) for key, group in groupby(a)]

How do I make the histogram? My list a may contain a million such elements.

- Gray

9 Answers

40

Here is a concise, all-pandas approach:

import pandas as pd

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
pd.Series(a).value_counts(sort=False).plot(kind='bar')

- drammock

14

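As @notconfusing pointed out, this can be solved with Pandas and Counter. If for any reason you cannot use Pandas, you can get by with matplotlib alone, using the function in the following code: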

from collections import Counter
import numpy as np
import matplotlib.pyplot as plt

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)

def plot_bar_from_counter(counter, ax=None):
    """
    Create a bar plot from a counter.

    :param counter: a Counter object, i.e. a dictionary with the item as the key
     and the frequency as the value
    :param ax: a matplotlib axis
    :return: the axis with the bar plot drawn on it
    """

    if ax is None:
        fig = plt.figure()
        ax = fig.add_subplot(111)

    # Materialize the dict views as lists so they can be indexed on Python 3
    frequencies = list(counter.values())
    names = list(counter.keys())

    x_coordinates = np.arange(len(counter))
    ax.bar(x_coordinates, frequencies, align='center')

    ax.xaxis.set_major_locator(plt.FixedLocator(x_coordinates))
    ax.xaxis.set_major_formatter(plt.FixedFormatter(names))

    return ax

plot_bar_from_counter(letter_counts)
plt.show()

This produces a bar chart with one bar per letter.

- Heberto Mayorquin

11

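Don't use groupby() (which requires the input to be sorted); use collections.Counter() instead. That way you don't have to create intermediate lists just to count the input: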

from collections import Counter

counts = Counter(a)
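
To make the groupby() caveat concrete, here is a small sketch. The list unsorted_letters is a hypothetical example (the same letters as in the question, shuffled by hand), not data from the original post:

```python
from collections import Counter
from itertools import groupby

# Hypothetical unsorted input: same letters as in the question, in mixed order.
unsorted_letters = ['e', 'a', 'c', 'a', 'e', 'b', 'a', 'e',
                    'c', 'd', 'e', 'a', 'b', 'c', 'e']

# groupby() only merges *consecutive* equal items, so on unsorted input
# each letter appears as several short runs instead of one total.
run_lengths = [(key, len(list(group))) for key, group in groupby(unsorted_letters)]

# Sorting first restores the per-letter totals, at the cost of a sort.
sorted_totals = {key: len(list(group))
                 for key, group in groupby(sorted(unsorted_letters))}

# Counter tallies in a single pass and does not depend on the order.
counts = Counter(unsorted_letters)

print(len(run_lengths))  # number of runs, not number of distinct letters
print(sorted_totals)
print(dict(counts))
```

For a list with a million elements, the single pass of Counter avoids both the sort and the intermediate lists.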

You didn't specify what you mean by a 'histogram'. Assuming you want to do this in a terminal:

width = 120  # Adjust to desired width
longest_key = max(len(key) for key in counts)
graph_width = width - longest_key - 2
widest = counts.most_common(1)[0][1]
scale = graph_width / float(widest)

for key, size in sorted(counts.items()):
    print('{}: {}'.format(key, int(size * scale) * '*'))

Demo:

>>> from collections import Counter
>>> a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
>>> counts = Counter(a)
>>> width = 120  # Adjust to desired width
>>> longest_key = max(len(key) for key in counts)
>>> graph_width = width - longest_key - 2
>>> widest = counts.most_common(1)[0][1]
>>> scale = graph_width / float(widest)
>>> for key, size in sorted(counts.items()):
...     print('{}: {}'.format(key, int(size * scale) * '*'))
... 
a: *********************************************************************************************
b: **********************************************
c: **********************************************************************
d: ***********************
e: *********************************************************************************************************************
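
If you need this in more than one place, the snippet above could be wrapped in a small helper. This is only a sketch: the function name text_histogram and its symbol parameter are my own, and it scales with integer arithmetic rather than a float factor so that the longest bar is exactly graph_width characters wide:

```python
from collections import Counter

def text_histogram(counts, width=120, symbol='*'):
    """Return one 'key: ****' line per key, scaled to fit in `width` columns."""
    if not counts:
        return []
    longest_key = max(len(str(key)) for key in counts)
    graph_width = width - longest_key - 2  # space left after 'key: '
    widest = max(counts.values())
    # Integer scaling: the most frequent key gets exactly graph_width symbols.
    return ['{}: {}'.format(key, symbol * (size * graph_width // widest))
            for key, size in sorted(counts.items())]

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
lines = text_histogram(Counter(a), width=40)
print('\n'.join(lines))
```

Returning a list of lines instead of printing directly makes the helper easy to test or to write to a log file.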

More sophisticated tools are available in the numpy.histogram() and matplotlib.pyplot.hist() functions. Both do the tallying for you, and matplotlib.pyplot.hist() also gives you graphical output.

- Martijn Pieters

5

Using NumPy

With NumPy 1.9 or later:

import numpy as np
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
labels, counts = np.unique(a,return_counts=True)

The result can be plotted with:

import matplotlib.pyplot as plt 
ticks = range(len(counts))
plt.bar(ticks,counts, align='center')
plt.xticks(ticks, labels)
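
Since the question asks about a frequency distribution, the counts from np.unique can also be turned into relative frequencies before plotting. A minimal sketch, assuming NumPy is installed; the variable name frequencies is my own:

```python
import numpy as np

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

# np.unique returns the distinct labels in sorted order plus their counts.
labels, counts = np.unique(a, return_counts=True)

# Divide by the total to get each label's share of the list.
frequencies = counts / counts.sum()

for label, count, share in zip(labels, counts, frequencies):
    print('{}: {} ({:.1%})'.format(label, count, share))
```

Passing frequencies instead of counts to plt.bar gives the same bars with a normalized y-axis.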

- G M

1

Take a look at matplotlib.pyplot.bar. There is also numpy.histogram, which is more flexible if you want wider bins.

- tommyo

1

A simple and effective way to make a character histogram in Python

import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

filename = input("Enter file name: ")  # use raw_input() on Python 2

# Count every character in the file, one line at a time
num = Counter()
with open(filename, 'r') as f:
    for line in f:
        num.update(line)

x = list(num.values())  # frequencies
y = list(num.keys())    # characters

x_coordinates = np.arange(len(num))
plt.bar(x_coordinates, x)
plt.xticks(x_coordinates, y)
plt.show()
print(x, y)

- Mitul Panchal

1

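Technically, this is a frequency count of discrete categorical values.
- A histogram is the frequency distribution of continuous numeric values.
This can be done by passing the list to the x= or y= parameter of seaborn.countplot or seaborn.histplot, or to sns.displot with kind='hist'.
- Seaborn is a high-level API for Matplotlib.
See "How to add value labels on a bar chart" for annotating the tops of the bars with value labels.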

import seaborn as sns

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

ax = sns.countplot(x=a)

ax = sns.countplot(y=a)

ax = sns.histplot(x=a)

g = sns.displot(kind='hist', x=a)
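
To illustrate the distinction drawn at the top of this answer without any plotting library, here is a rough sketch. The measurements list and the bin width are made-up values for illustration only:

```python
from collections import Counter

# Categorical data: every distinct value is its own bar.
letters = ['a', 'a', 'b', 'c', 'c', 'c']
category_counts = Counter(letters)

# Continuous data (hypothetical measurements): values are first grouped
# into equal-width bins, and the bins are what gets counted.
measurements = [0.2, 0.7, 1.1, 1.9, 2.4, 2.5, 3.8]
bin_width = 1.0
bin_counts = Counter(int(value // bin_width) for value in measurements)

print(dict(category_counts))
for index in sorted(bin_counts):
    low = index * bin_width
    print('[{:.1f}, {:.1f}): {}'.format(low, low + bin_width, bin_counts[index]))
```

In the categorical case there is nothing to bin, which is why a count plot or bar chart is the natural fit for a list of strings.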

- Trenton McKinney

1

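This question was asked a long time ago and I'm not sure you still need help, but others might, so here I am. If you can use Matplotlib, I think there is a much simpler solution!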

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

import matplotlib.pyplot as plt
plt.hist(a) #gives you a histogram of your array 'a'
plt.show() #finishes out the plot

This should give you a nice histogram! There are also more tweaks you can make to clean up the plot if you like.

- sofia


- notconfusing · Accepted Answer

It's easy with Pandas.

import pandas
from collections import Counter
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)
df = pandas.DataFrame.from_dict(letter_counts, orient='index')
df.plot(kind='bar')

Note that Counter is already producing a frequency count, so our plot type should be 'bar' rather than 'hist'.

Figure: histogram of letter counts
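
A Counter does not sort its keys, so the bars appear in first-seen order. If you want control over the bar order before handing the data to a plotting call, something like the following sketch may help; the variable names are my own:

```python
from collections import Counter

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)

# Alphabetical order: sort the (letter, count) pairs by letter.
labels, heights = zip(*sorted(letter_counts.items()))

# Frequency order: most_common() yields pairs from most to least frequent.
labels_by_freq, heights_by_freq = zip(*letter_counts.most_common())

print(labels, heights)
print(labels_by_freq, heights_by_freq)
```

Either pair of tuples can be passed as the x positions' labels and the bar heights of a bar chart.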