Python - Count the occurrences of words in a file

12
I am trying to count how many times each word occurs in a file. I have a text file (TEST.txt) with the following content:
ashwin programmer india
amith programmer india
The result I expect is:
{'ashwin': 1, 'programmer': 2, 'india': 2, 'amith': 1}
The code I am using is:
for line in open('TEST.txt','r'):
    word = Counter(line.split())
    print word
The result I get is:
Counter({'ashwin': 1, 'programmer': 1,'india':1})
Counter({'amith': 1, 'programmer': 1,'india':1})

Can anyone help me? Thanks in advance.

6 Answers

18
Use Counter's update method. Example:
from collections import Counter

data = '''\
ashwin programmer india
amith programmer india'''

c = Counter()
for line in data.splitlines():
    c.update(line.split())
print(c)

Output:

Counter({'india': 2, 'programmer': 2, 'amith': 1, 'ashwin': 1})

3
Exactly what I was going to post - this makes good use of the dedicated Counter.update method, and does not need to read the whole file into memory... - Jon Clements
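
The memory point in the comment above can be sketched against a real file rather than an in-memory string. This is only an illustration: the helper name `count_words` and the UTF-8 encoding are assumptions, not part of the answer.

```python
from collections import Counter


def count_words(path):
    """Count whitespace-separated words in a text file, one line at a time."""
    counts = Counter()
    # Assumption: the file is UTF-8 encoded text.
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            # update() adds this line's words to the running totals
            # instead of starting a new Counter for every line.
            counts.update(line.split())
    return counts
```

For the question's file, `count_words('TEST.txt')` would give a count of 2 for `programmer` and `india` and 1 for `ashwin` and `amith`. Wrapping the result in `dict(...)` yields a plain dictionary if the `Counter(...)` wrapper is unwanted.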

8
from collections import Counter

cnt = Counter()

# Iterate over the file line by line and count each word.
with open('TEST.txt', 'r') as f:
    for line in f:
        for word in line.split():
            cnt[word] += 1

print(cnt)

5
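You are iterating over each line and creating a new Counter every time. You want a single Counter to run over the whole file. Try the following code: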
from collections import Counter

with open("TEST.txt", "r") as f:
    # Read the whole file at once and split it into a list of words.
    contents = f.read().split()
print(Counter(contents))

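@jadkik94 If he is going to process every line inside that block anyway, why would it make a difference? - Anorov
2
@Anorov What happens if you have a 50GB file to count (one that happens to contain only 3 unique words)?... - Jon Clements
@JonClements I was going to say the same, even though it is unlikely to be the case here. But best practice is best practice... - jadkik94
Yes, you are right. I forgot about the default generator behavior. - Anorov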
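
A minimal sketch of the lazy approach discussed in these comments: iterating a file object yields one line at a time, so a generator expression can feed `Counter` without building the full word list first. The function name `count_words_lazily` is hypothetical.

```python
from collections import Counter


def count_words_lazily(lines):
    """Count words from any iterable of lines (an open file, a list, ...)."""
    # The generator expression hands Counter one word at a time, so only
    # the current line and the running counts need to be kept around.
    return Counter(word for line in lines for word in line.split())
```

With a file this would be used as `with open("TEST.txt") as f: print(count_words_lazily(f))`. For a huge file with few distinct words, memory use then depends mainly on the number of distinct words rather than on the file size.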

1
Using defaultdict:
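from collections import defaultdict

def read_file(fname):
    # Missing keys start at int() == 0, so += 1 works the first time a word is seen.
    words_dict = defaultdict(int)

    # The with block closes the file once reading is done.
    with open(fname, 'r') as fp:
        for line in fp:
            # split() with no argument splits on any whitespace and drops the
            # trailing newline; split(' ') would count 'india\n' as its own word.
            for word in line.split():
                words_dict[word] += 1

    return words_dict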

0
FILE_NAME = 'file.txt'

wordCounter = {}

with open(FILE_NAME,'r') as fh:
  for line in fh:
    # Remove commas, apostrophes and periods, then convert the line to lower case.
    # split() then splits the line into a list of words.
    word_list = line.replace(',','').replace('\'','').replace('.','').lower().split()
    for word in word_list:
      # Add the word to the wordCounter dictionary.
      if word not in wordCounter:
        wordCounter[word] = 1
      else:
        # If the word is already in the dictionary, increment its count.
        wordCounter[word] = wordCounter[word] + 1

print('{:15}{:3}'.format('Word','Count'))
print('-' * 18)

# Print each word and its number of occurrences.
for (word, occurrence) in wordCounter.items():
  print('{:15}{:3}'.format(word, occurrence))
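
The chained `replace` calls above handle three punctuation characters. As a sketch of one way to generalize that step, `str.translate` can delete every ASCII punctuation character in one pass. The names `_STRIP_PUNCT` and `normalized_counts` are hypothetical. Note that this variant also removes hyphens, which the answer's code does not.

```python
import string
from collections import Counter

# Translation table that deletes every character in string.punctuation.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalized_counts(lines):
    """Count words after stripping ASCII punctuation and lower-casing."""
    counts = Counter()
    for line in lines:
        # Assumption: deleting punctuation outright is acceptable, so
        # "World's" becomes "worlds" and "non-overlapping" becomes
        # "nonoverlapping".
        counts.update(line.translate(_STRIP_PUNCT).lower().split())
    return counts
```

The result can be printed with the same `'{:15}{:3}'` format. Iterating over `normalized_counts(fh).most_common()` lists the most frequent words first.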

0
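# Read the whole file, convert it to lower case, and split it into words.
with open('input.txt', 'r') as f:
    data = f.read().lower()
list1 = data.split()

# Start every distinct word at zero.
d = {}
for i in set(list1):
    d[i] = 0

# Every word is already a key in d, so its count can be incremented directly.
for i in list1:
    d[i] = d[i] + 1
print(d)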

Explanation: open the file, read it, convert it to lower case, and split it into a list of all the words. Create an empty dictionary and add every distinct word from the list to it with a value of zero. Then go through the list and increment the value for each word. Sample file content (input.txt): "Return all non-overlapping matches of pattern return pattern". Program output: {'non-overlapping': 1, 'of': 1, 'matches': 1, 'return': 2, 'pattern': 2, 'all': 1} - Karthic Kannan
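
As an alternative sketch, a single pass with `dict.get` gives the same counts without pre-filling the dictionary with zeros. The function name `count_with_dict` is hypothetical.

```python
def count_with_dict(text):
    """Count lower-cased, whitespace-separated words using a plain dict."""
    counts = {}
    for word in text.lower().split():
        # get() returns 0 for a word that has not been seen yet.
        counts[word] = counts.get(word, 0) + 1
    return counts
```

For the sample sentence in the comment above, this returns 2 for `return` and `pattern` and 1 for every other word.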
