Find the average word length in a string

Question



def word_count(x: str) -> None:
    words = x.split()
    characters = len(x)
    # Average over the whitespace-separated tokens (punctuation is still counted here)
    average = sum(len(w) for w in words) / len(words)
    print('Characters: ' + str(characters) + '\n' + 'Words: ' + str(len(words)) + '\n' + 'Avg word length: ' + str(average) + '\n')

This code works for ordinary strings, but not for a string like this one:

'***The ?! quick brown cat:  leaps over the sad boy.'

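How can I change the code so that symbols like "***" and "?!" are not counted? The average word length for the sentence above should be 3.888889, but my code gives a different number.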

- Ramon Hallan

You need to be more precise about what you want to filter out. But the basic idea is to remove the rejected "words" from x.split() and use that reduced list. - Scott Hunter

If the problem is removing unwanted characters from some of the words, you have to say so explicitly. - Scott Hunter

Using re to filter out what you don't want (e.g. double spaces, special characters) is a fairly simple way to do this. - Demian Brecht

I believe everything except actual letters should be filtered out of the average calculation. - Ramon Hallan

4 Answers


Try this:

import re

def avrg_count(x):
    # Count only letters and digits
    total_chars = len(re.sub(r'[^a-zA-Z0-9]', '', x))
    # Drop everything except letters, digits and whitespace, then split into words
    num_words = len(re.sub(r'[^a-zA-Z0-9\s]', '', x).split())
    print("Characters:{0}\nWords:{1}\nAverage word length: {2}".format(total_chars, num_words, total_chars / num_words))


phrase = '***The ?! quick brown cat:  leaps over the sad boy.'

avrg_count(phrase)

Output:

Characters:34
Words:9
Average word length: 3.7777777777777777

- flamenco


import re

full_sent = '***The ?! quick brown cat:  leaps over the sad boy.'
alpha_sent = re.findall(r'\w+',full_sent)
print(alpha_sent)

This prints:

['The', 'quick', 'brown', 'cat', 'leaps', 'over', 'the', 'sad', 'boy']

To get the average, you can do this:

average = sum(len(word) for word in alpha_sent)/len(alpha_sent)

In Python 3 this gives 3.7777777777777777 (about 3.78).

- Leb

I'm having trouble working this into my function. Would you mind briefly plugging it into my code above? - Ramon Hallan

If you mean the rest of the printing, you don't need to merge anything: at that point word is just len(alpha_sent) and char is sum(len(word) for word in alpha_sent). - Leb
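Putting Leb's suggestion into the question's function might look roughly like the sketch below. Returning the three numbers as well as printing them, and returning 0.0 for a string with no words, are assumptions added here rather than part of either post. Note that \w+ also matches digits and underscores, not only letters.

```python
import re


def word_count(x: str) -> tuple:
    # \w+ matches runs of letters, digits and underscores, so "***", "?!", ":" and "." are skipped
    words = re.findall(r'\w+', x)
    characters = sum(len(word) for word in words)
    # Guard against ZeroDivisionError when the string contains no words at all
    average = characters / len(words) if words else 0.0
    print(f'Characters: {characters}\nWords: {len(words)}\nAvg word length: {average}\n')
    return characters, len(words), average


word_count('***The ?! quick brown cat:  leaps over the sad boy.')
```

For the example sentence this reports 34 characters and 9 words, i.e. 34 / 9.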


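You should be able to strip every non-alphanumeric character from each word and then keep the word only if its length is still greater than 0. The first solution I found uses a regular expression, but you may be able to find other ways to do it.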

Stripping everything but alphanumeric chars from a string in Python
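One non-regex way to do the per-word stripping described above is sketched below, using str.isalnum. The function name average_word_length is hypothetical. Keep in mind that str.isalnum is Unicode-aware, so accented letters and non-ASCII digits are kept, and a token such as "don't" becomes the single word "dont".

```python
def average_word_length(text: str) -> float:
    # Strip every non-alphanumeric character from each whitespace-separated token
    cleaned = (''.join(ch for ch in token if ch.isalnum()) for token in text.split())
    # Keep a token only if something is left after stripping (so "?!" disappears entirely)
    words = [word for word in cleaned if len(word) > 0]
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


print(average_word_length('***The ?! quick brown cat:  leaps over the sad boy.'))
```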

- Andrew Shirley


- thebjorn · Accepted Answer

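If you know every character you want to remove, you can delete them with the string's .translate() method (in Python 3, build the deletion table with str.maketrans):

>>> "***foo ?! bar".translate(str.maketrans("", "", "*?!"))
'foo  bar'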
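Extending the .translate() idea to the full average could look like the sketch below. It assumes that the symbols to ignore are exactly the ASCII characters in string.punctuation; non-ASCII punctuation such as curly quotes would survive, and apostrophes or hyphens inside words are deleted, merging "don't" into "dont". The names REMOVE_PUNCT and average_word_length are hypothetical.

```python
import string

# Deletion table for every ASCII punctuation character (assumed to be the only unwanted symbols)
REMOVE_PUNCT = str.maketrans('', '', string.punctuation)


def average_word_length(text: str) -> float:
    # Delete punctuation first, then split on whitespace; split() discards empty strings
    words = text.translate(REMOVE_PUNCT).split()
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


print(average_word_length('***The ?! quick brown cat:  leaps over the sad boy.'))
```

Because split() with no arguments collapses runs of whitespace, the "word" left behind by "?!" simply vanishes instead of counting as an empty word.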