Using WordNet to look up synonyms, definitions, and example sentences

15

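I need to take a text file containing a word as input, then use WordNet to find the lemma_names, definition, and examples of that word's synsets. I have read "Python Text Processing with NLTK 2.0 Cookbook" and "Natural Language Processing using NLTK" to help me with this. I understand how to do it in the terminal, but I cannot get it working from a script in a text editor.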

For example, if the input text contains the word "flabbergasted", the output should look like this:

flabbergasted (verb) flabbergast, boggle, bowl over - overcome with amazement; "This boggles the mind!" (adjective) dumbfounded, dumfounded, flabbergasted, stupefied, thunderstruck, dumbstruck, dumbstricken - as if struck dumb with astonishment and surprise; "a circle of policement stood dumbfounded by her denial of having seen the accident"; "the flabbergasted aldermen were speechless"; "was thunderstruck by the news of his promotion"

The synsets, definitions, and example sentences come directly from WordNet!

I have the following code:


from __future__ import division
import nltk
from nltk.corpus import wordnet as wn


tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = open("inpsyn.txt")
data = fp.read()

#to tokenize input text into sentences

print '\n-----\n'.join(tokenizer.tokenize(data))# splits text into sentences

#to tokenize the tokenized sentences into words

tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]  
print words     #to print the tokens

for a in words:
    print a
    # look up the synsets inside the loop so every word is processed,
    # not only the last one
    syns = wn.synsets(a)
    print "synsets:", syns

    for s in syns:
        for l in s.lemmas:
            print l.name
        print s.definition
        print s.examples

I get the following output:


flabbergasted

['flabbergasted']
flabbergasted
synsets: [Synset('flabbergast.v.01'), Synset('dumbfounded.s.01')]
flabbergast
boggle
bowl_over
overcome with amazement
['This boggles the mind!']
dumbfounded
dumfounded
flabbergasted
stupefied
thunderstruck
dumbstruck
dumbstricken
as if struck dumb with astonishment and surprise
['a circle of policement stood dumbfounded by her denial of having seen the accident', 'the flabbergasted aldermen were speechless', 'was thunderstruck by the news of his promotion']
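Is there a way to retrieve the part of speech along with the group of lemma names?

1
If you log back in to SO, you should accept Andrey's answer, especially since he not only answered your question but also replied to your comments to help you solve the problem. - Ram Narasimhan
4 Answers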

22
def synset(word):
    wn.synsets(word)

This does not return anything, so by default you get None.

You should write

def synset(word):
    return wn.synsets(word)
To extract the lemma names (NLTK 2.x syntax, where lemmas and name are attributes):

>>> from nltk.corpus import wordnet
>>> syns = wordnet.synsets('car')
>>> syns[0].lemmas[0].name
'car'
>>> [s.lemmas[0].name for s in syns]
['car', 'car', 'car', 'car', 'cable_car']
>>> [l.name for s in syns for l in s.lemmas]
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car']
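The question also asks for the part of speech next to each group of lemma names. The following is a minimal sketch of one way to assemble that output line. It assumes the NLTK 3 API, where pos(), lemma_names(), definition() and examples() are methods on a synset. The names POS_NAMES, describe and describe_with_wordnet are hypothetical helpers introduced here for illustration; they are not part of NLTK.

```python
# WordNet POS tags mapped to readable names; 's' marks a satellite adjective.
POS_NAMES = {'n': 'noun', 'v': 'verb', 'a': 'adjective',
             's': 'adjective', 'r': 'adverb'}


def describe(word, synsets):
    """Build one line: word, then '(pos) lemmas - definition; "example"' per synset."""
    parts = [word]
    for s in synsets:
        tag = s.pos()
        pos = POS_NAMES.get(tag, tag)  # fall back to the raw tag if unknown
        # WordNet joins multi-word lemmas with underscores, e.g. 'bowl_over'
        lemmas = ', '.join(n.replace('_', ' ') for n in s.lemma_names())
        entry = '(%s) %s - %s' % (pos, lemmas, s.definition())
        examples = '; '.join('"%s"' % e for e in s.examples())
        if examples:
            entry += '; ' + examples
        parts.append(entry)
    return ' '.join(parts)


def describe_with_wordnet(word):
    """Look the word up in WordNet (assumes NLTK and its 'wordnet' corpus are installed)."""
    from nltk.corpus import wordnet as wn  # imported lazily so describe() works without NLTK
    return describe(word, wn.synsets(word))
```

Calling describe_with_wordnet('flabbergasted') should produce a line shaped like the one requested in the question, with one "(pos) lemmas - definition" group per synset.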

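Is there a way to extract just the words from a synset and pass them as arguments? For example, for the word flabbergasted you get Synset('flabbergast.v.01') and Synset('dumbfounded.s.01'). How can I pass these as arguments to the lemma_name function? - aks
1
from nltk.corpus import wordnet; syns = wordnet.synsets('car'); [s.lemmas[0].name for s in syns]
['car', 'car', 'car', 'car', 'cable_car']
- Andrey Sboev
Thank you very much! I have updated the code and got the output. Is there a way to retrieve the part of speech and the lemma groups separately? For example, flabbergast, boggle, and bowl over are all verbs. Is there a way to include their part-of-speech information in the output? - aks
To get the part of speech, use [l.synset.pos for s in syns for l in s.lemmas]. - D-Nice
print [l.name() for s in syns for l in s.lemmas()] - bob90937
How can I use my own training data? - Sahil Desai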

5

I have created a module here that is easy to use (import): pass it a string and it returns all the lemma names for that string.

Module:

#!/usr/bin/python2.7
''' Pass a string to this function (e.g. 'car') and it will give you a list of
words related to it, i.e. the lemma names of every synset of that word. '''
from nltk.corpus import wordnet as wn

# collect the lemma names of every synset of a word
def lemmalist(word):
    syn_set = []
    for synset in wn.synsets(word):
        # NLTK 2.x attribute; in NLTK 3 call synset.lemma_names() instead
        for item in synset.lemma_names:
            syn_set.append(item)
    return syn_set

Usage:

Note: the module file is named lemma.py, so it is imported with "from lemma import lemmalist".

>>> from lemma import lemmalist
>>> lemmalist('car')
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car']
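The list above repeats 'car' once per synset. If duplicates are not wanted, one option is to filter them while keeping the first-seen order. This is a small sketch; unique_lemmas is a hypothetical helper name, not part of the module above or of NLTK.

```python
def unique_lemmas(names):
    """Drop repeated lemma names while keeping first-seen order."""
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique
```

It could be applied to the module's result as unique_lemmas(lemmalist('car')). A plain set() would also remove duplicates, but it would lose the synset order that WordNet returns.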

Cheers!

I get an error: ImportError: No module named lemma - Sahil Desai

1
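from nltk.corpus import wordnet

synonyms = []
for syn in wordnet.synsets("car"):
    for l in syn.lemmas():
        synonyms.append(l.name())
print(synonyms)

Please edit your answer to include more information. Code-only and "try this" answers are discouraged because they contain no searchable content and do not explain why someone should "try this". - BrokenBinary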

0
In NLTK 3.0, lemma_names changed from an attribute to a method. So if you get the error:
TypeError: 'method' object is not iterable

You can fix it with:

>>> from nltk.corpus import wordnet as wn
>>> [item for synset in wn.synsets('car') for item in synset.lemma_names()]

This outputs:

[
    'car', 'auto', 'automobile', 'machine', 'motorcar', 'car',
    'railcar', 'railway_car', 'railroad_car', 'car', 'gondola',
    'car', 'elevator_car', 'cable_car', 'car'
]
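For code that has to run against both NLTK 2.x and NLTK 3, one possible approach is to check whether the member is callable before using it. The sketch below illustrates that idea only; attr_or_call and all_lemma_names are hypothetical helper names and are not part of NLTK.

```python
def attr_or_call(obj, name):
    """Return obj.<name>, calling it first when it is a method (NLTK 3 style)."""
    value = getattr(obj, name)
    return value() if callable(value) else value


def all_lemma_names(synsets):
    """Flatten the lemma names of every synset into one list."""
    return [n for s in synsets for n in attr_or_call(s, 'lemma_names')]
```

With NLTK installed, all_lemma_names(wn.synsets('car')) should give the same flat list in either version, since a plain list (NLTK 2.x) is not callable while a bound method (NLTK 3) is.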
