Generating bigrams with NLTK

Question


25

I am trying to produce a list of bigrams for a given sentence. For example, if I type:

    To be or not to be

I want the program to generate:

     to be, be or, or not, not to, to be

I tried the code below, but it just gives me:

<generator object bigrams at 0x0000000009231360>

This is my code:

    import nltk
    bigrm = nltk.bigrams(text)
    print(bigrm)

So what do I need to do to get what I want? I want a list of word combinations like the one above (to be, be or, or not, not to, to be).

- Nikhil Raghavendra

1

Try: list(bigrm) - alvas

1

Just because I love coding: here is a neat, NLTK-independent one-liner for generating bigrams. - patrick
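The one-liner that comment linked to is not reproduced on this page. As a rough sketch of what an NLTK-free version could look like (this is a common idiom, not necessarily the linked code, and the variable names are mine), each word can be paired with its successor using `zip`:

```python
text = "to be or not to be"
words = text.split()

# Pair each word with the one after it. zip stops at the shorter input,
# so the final word is never paired with anything.
bigram_strings = [" ".join(pair) for pair in zip(words, words[1:])]

print(", ".join(bigram_strings))  # to be, be or, or not, not to, to be
```

This only splits on whitespace, so punctuation stays attached to words; a real tokenizer would handle that better.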

3 Answers

12

The following code generates and prints the bigrams of a given sentence.

>>> import nltk
>>> from nltk.tokenize import word_tokenize
>>> text = "to be or not to be"
>>> tokens = word_tokenize(text)
>>> bigrm = nltk.bigrams(tokens)
>>> print(*map(' '.join, bigrm), sep=', ')
to be, be or, or not, not to, to be

- Ashok Kumar Jayaraman

1

Quite late, but here is another way.

>>> from nltk.util import ngrams
>>> text = "I am batman and I like coffee"
>>> _1gram = text.split(" ")
>>> _2gram = [' '.join(e) for e in ngrams(_1gram, 2)]
>>> _3gram = [' '.join(e) for e in ngrams(_1gram, 3)]
>>> 
>>> _1gram
['I', 'am', 'batman', 'and', 'I', 'like', 'coffee']
>>> _2gram
['I am', 'am batman', 'batman and', 'and I', 'I like', 'like coffee']
>>> _3gram
['I am batman', 'am batman and', 'batman and I', 'and I like', 'I like coffee']

- Shashwat


- Ilja Everilä · Accepted Answer

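nltk.bigrams() returns an iterator of bigrams (a generator, specifically). If you want a list, pass the iterator to list(). It also expects a sequence of items to generate bigrams from, so you have to split the text before passing it (if you have not already done so):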

bigrm = list(nltk.bigrams(text.split()))
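To illustrate why the split matters, here is a small sketch. It assumes NLTK is installed; the `except` branch is only a stand-in added so the snippet still runs without it, and is not NLTK code. A string is itself a sequence of characters, so passing it unsplit gives character pairs rather than word pairs:

```python
try:
    from nltk import bigrams
except ImportError:
    # Hypothetical stand-in, not part of NLTK: used only if NLTK is missing.
    def bigrams(sequence):
        items = list(sequence)
        return zip(items, items[1:])

text = "to be or not to be"

# Unsplit string: the items are single characters.
char_pairs = list(bigrams(text))
print(char_pairs[:3])  # [('t', 'o'), ('o', ' '), (' ', 'b')]

# Split into words first: the items are words.
word_pairs = list(bigrams(text.split()))
print(word_pairs)
# [('to', 'be'), ('be', 'or'), ('or', 'not'), ('not', 'to'), ('to', 'be')]
```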

To print them out separated by commas, you could do this in Python 3:

print(*map(' '.join, bigrm), sep=', ')

If you are on Python 2, then for example:

print ', '.join(' '.join((a, b)) for a, b in bigrm)

Note that for printing alone you do not need to build a list; just use the iterator.
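One caveat that may be worth keeping in mind: an iterator can be consumed only once, so if the bigrams are needed again after printing, they have to be regenerated or stored in a list. A minimal sketch, assuming NLTK is installed (the `except` branch is a stand-in of mine so the snippet runs without it, not NLTK code):

```python
try:
    from nltk import bigrams
except ImportError:
    # Hypothetical stand-in, not part of NLTK: used only if NLTK is missing.
    def bigrams(sequence):
        items = list(sequence)
        return zip(items, items[1:])

tokens = "to be or not to be".split()

bigrm = bigrams(tokens)

# First pass: consume the iterator directly, no list needed.
first_pass = ", ".join(" ".join(pair) for pair in bigrm)
print(first_pass)  # to be, be or, or not, not to, to be

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = list(bigrm)
print(second_pass)  # []
```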