How do I count occurrences of specified words from an array in Python?

Question



I'm writing a small program where the user enters some text, and I want to check how many times given words appear in that input.

import sys

# Read user input
print("Input your code: \n")

# read everything until EOF (Ctrl-D on Unix, Ctrl-Z then Enter on Windows)
user_input = sys.stdin.read()
print(user_input)

For example, the text I enter into the program is:

a=1
b=3
if (a == 1):
    print("A is a number 1")
elif(b == 3):
    print ("B is 3")
else: 
    print("A isn't 1 and B isn't 3")

The words to look for are specified in a list:

wordsToFind = ["if", "elif", "else", "for", "while"]

I want to print how many times "if", "elif" and "else" appear in the user input.

How can I count the occurrences of words such as "if", "elif", "else", "for" and "while" in the user's input string?

- Sisimośki


Convert the strings into a regular expression with word boundaries. Then you can find all the matches and count them with collections.Counter(). - Barmar

You need word boundaries so that if doesn't match inside elif. - Barmar

So the regular expression should be \b(if|elif|else|for|while)\b. - Barmar
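A minimal sketch of the approach described in these comments, assuming the input is already held in a string. The helper name count_words and the sample text are illustrative, not from the original post. re.escape guards against words that contain regex metacharacters, and the raw strings keep \b as a regex word boundary.

```python
import re
from collections import Counter

wordsToFind = ["if", "elif", "else", "for", "while"]

def count_words(text, words):
    """Count whole-word occurrences of each entry in `words` (hypothetical helper)."""
    # Build \b(if|elif|else|for|while)\b; raw strings keep \b a word boundary
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    # findall returns the captured group, i.e. the matched word itself
    found = Counter(pattern.findall(text))
    # Report every requested word, including those that never appear (Counter gives 0)
    return {w: found[w] for w in words}

# Sample input taken from the question
user_input = '''a=1
b=3
if (a == 1):
    print("A is a number 1")
elif(b == 3):
    print ("B is 3")
else:
    print("A isn't 1 and B isn't 3")
'''

print(count_words(user_input, wordsToFind))
# {'if': 1, 'elif': 1, 'else': 1, 'for': 0, 'while': 0}
```

In the question's program, the same call would be made on the text returned by sys.stdin.read(). Note that this plain-text approach also counts words that appear inside string literals or comments.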

@Barmar I wrote a few 'if' statements, used the findall() method and printed the length, but it shows 0: print(len(re.findall(regexp, user_input))) - Sisimośki

It sounds like your regular expression is wrong. Don't forget to use a raw string so that \b is interpreted correctly. - Barmar
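One likely cause of the 0 result (an assumption, since the full code isn't shown) is a non-raw pattern: in an ordinary Python string literal, \b is the backspace character \x08, not a regex word boundary. This small comparison shows the difference; the sample text is illustrative.

```python
import re

text = "if a:\n    pass\nif b:\n    pass\n"

plain = "\bif\b"   # Python turns each \b into a backspace character (\x08)
raw = r"\bif\b"    # the regex engine receives \b, i.e. a word boundary

print(plain == "\x08if\x08")          # True
print(len(re.findall(plain, text)))   # 0: the text contains no backspace characters
print(len(re.findall(raw, text)))     # 2
print("\\bif\\b" == raw)              # True: doubling the backslash also works
```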

1 Answer


- Jorge Morgado · Accepted Answer

I think the best option is Python's built-in tokenize module:

# Let's say this is tokens.py
import sys
from collections import Counter
from io import BytesIO
from tokenize import tokenize

# Get input from stdin
code_text = sys.stdin.read()

# Tokenize the input as python code
tokens = tokenize(BytesIO(code_text.encode("utf-8")).readline)

# Filter the ones in wordsToFind
wordsToFind = ["if", "elif", "else", "for", "while"]
words = [token.string for token in tokens if token.string in wordsToFind]

# Count the occurrences
counter = Counter(words)

print(counter)
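To see what the filter in this answer operates on, it can help to print the token stream for a small snippet. This is an illustrative sketch; tok_name from the standard token module maps numeric token types to readable names, and the snippet is made up for the example.

```python
from io import BytesIO
from token import tok_name
from tokenize import tokenize

snippet = 'if (a == 1):\n    print("if")\n'

# Each TokenInfo has .type (an int) and .string (the exact source text)
pairs = [(tok_name[tok.type], tok.string)
         for tok in tokenize(BytesIO(snippet.encode("utf-8")).readline)]
for name, text in pairs:
    print(f"{name:10} {text!r}")
```

The keyword appears as a NAME token whose string is 'if', while the "if" passed to print() arrives as a single STRING token whose string includes the quotes, so a test like token.string in wordsToFind only matches the keyword.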

Test

Suppose you have a file named test.py:

a=1
b=3
if (a == 1):
    print("A is a number 1")
elif(b == 3):
    print ("B is 3")
else: 
    print("A isn't 1 and B isn't 3")

Then you run:

cat test.py | python tokens.py

Output:

Counter({'if': 1, 'elif': 1, 'else': 1})

Advantages

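The input is processed as Python tokens rather than searched as plain text.
You only count Python keywords, not every occurrence of "if" in the code text. For example, you might have a line like: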

a = "if inside str"

and I don't think that if should be counted.
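To illustrate this advantage, the sketch below runs a word-boundary regex (the approach suggested in the comments) and the tokenize approach side by side on a snippet that contains if in a string literal and in a comment. The function names count_with_regex and count_with_tokenize are hypothetical.

```python
import re
from collections import Counter
from io import BytesIO
from tokenize import tokenize

wordsToFind = ["if", "elif", "else", "for", "while"]

def count_with_regex(text, words):
    # Plain-text matching: also hits words inside string literals and comments
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    return Counter(pattern.findall(text))

def count_with_tokenize(text, words):
    # Token-based matching: a string literal or a comment is a single token,
    # so its .string never equals a bare keyword such as "if"
    tokens = tokenize(BytesIO(text.encode("utf-8")).readline)
    return Counter(tok.string for tok in tokens if tok.string in words)

code = 'a = "if inside str"\nif a:  # if in a comment\n    pass\n'
print(count_with_regex(code, wordsToFind))     # Counter({'if': 3})
print(count_with_tokenize(code, wordsToFind))  # Counter({'if': 1})
```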