Count letter occurrences in words and convert to a pandas DataFrame

Question

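I have a pandas DataFrame of words, with the words in the first column. I want to add columns to the same DataFrame that hold the number of times each letter occurs in each word.

The DataFrame should look like this: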

Word    A    B    C    D    E  ...  
BED     0    1    0    1    1

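Is there a simple way to do this, and to update it when new words are added to the DataFrame? If the column for a letter does not exist yet, it should be created.

I tried this: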

for i in range(len(df)):
   u = df.iat[i, 0]
   for j in u:
      df.iat[i, j] = u.count(j)

It doesn't work...

- Rajat Patil

1 Answer

- Chris Adams · Accepted Answer

You can use collections.Counter in a list comprehension, then reindex with string.ascii_uppercase:

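import pandas as pd
from collections import Counter
from string import ascii_uppercase

# Count the letters of each upper-cased word, then reindex to A-Z so every letter gets a column
df = df[['Word']].join(pd.DataFrame([Counter(word) for word in df['Word'].str.upper()])
                       .reindex(list(ascii_uppercase), axis=1).fillna(0).astype(int))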

[out]

print(df)

  Word  A  B  C  D  E  F  G  H  I  ...  Q  R  S  T  U  V  W  X  Y  Z
0  BED  0  1  0  1  1  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0

[1 rows x 27 columns]
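
The question also asks how to keep the counts current when new words are added, with a column created only once a letter actually appears. One possible sketch of that variant follows; it is not part of the answer above. It recomputes the counts from the Word column instead of writing cells one at a time (the loop in the question fails because iat accepts only integer positions, not a letter such as 'B'). The function name add_letter_counts and its word_col parameter are hypothetical, and the sketch assumes the word column contains only strings and that the index has no duplicates.

```python
from collections import Counter

import pandas as pd


def add_letter_counts(df, word_col="Word"):
    """Return the word column joined with one integer column per letter seen.

    Hypothetical helper for illustration; assumes word_col holds only strings.
    """
    words = df[word_col].str.upper()
    # One Counter per word; non-alphabetic characters are skipped.
    counts = pd.DataFrame(
        [Counter(ch for ch in word if ch.isalpha()) for word in words],
        index=df.index,
    )
    # A letter absent from a word shows up as NaN; treat it as zero.
    counts = counts.fillna(0).astype(int)
    # Order the letter columns alphabetically for a stable layout.
    counts = counts[sorted(counts.columns)]
    return df[[word_col]].join(counts)


df = add_letter_counts(pd.DataFrame({"Word": ["BED"]}))

# Append a new word, then recompute so any new letters get their own column.
df = pd.concat([df[["Word"]], pd.DataFrame({"Word": ["Fade"]})], ignore_index=True)
df = add_letter_counts(df)
print(df)
```

Recomputing every row is simple and should be fine for small frames; for a large word list it may be cheaper to count only the new rows and concatenate them, filling the missing letters with 0.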