I'm using nltk to generate n-grams, first removing the given stop words. However, on my CPU (an Intel i7), nltk.pos_tag() is extremely slow, taking up to 0.6 seconds per sentence. Output:
['The first time I went, and was completely taken by the live jazz band and atmosphere, I ordered the Lobster Cobb Salad.']
0.620481014252
["It's simply the best meal in NYC."]
0.640982151031
['You cannot go wrong at the Red Eye Grill.']
0.644664049149
The code:
import time
import nltk
from nltk import word_tokenize
from nltk.util import ngrams

for sentence in source:
    if stop_words is not None:
        # POS-tag the sentence, then drop tokens whose tag is in stop_words
        start = time.time()
        sentence_pos = nltk.pos_tag(word_tokenize(sentence))
        print(time.time() - start)
        filtered_words = [word for (word, pos) in sentence_pos if pos not in stop_words]
        nltk_ngrams = ngrams(filtered_words, n)
    else:
        nltk_ngrams = ngrams(sentence.split(), n)
Is it really this slow, or am I doing something wrong here?