You can also try the wrapper around NLTK's WordNetLemmatizer in the pywsd package, specifically https://github.com/alvations/pywsd/blob/master/pywsd/utils.py#L129
Install:
pip install -U nltk
python -m nltk.downloader popular
pip install -U pywsd
Usage:
>>> from pywsd.utils import lemmatize_sentence
>>> lemmatize_sentence('These are foo bar sentences.')
['these', 'be', 'foo', 'bar', 'sentence', '.']
>>> lemmatize_sentence('These are foo bar sentences running.')
['these', 'be', 'foo', 'bar', 'sentence', 'run', '.']
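NLTK's WordNetLemmatizer treats every word as a noun unless it is given a part of speech, so on its own it would leave "are" and "running" unchanged. The "be" and "run" results above suggest the wrapper POS-tags the sentence first and converts each Penn Treebank tag to a WordNet POS before lemmatizing. The conversion step can be sketched as below; `penn_to_wordnet` is a hypothetical helper for illustration, not pywsd's own code, and it assumes a simple prefix mapping.

```python
from typing import Optional

# Hypothetical helper, not taken from pywsd.
# Penn Treebank tag prefix -> WordNet POS letter.
# 'n', 'v', 'a', 'r' are the values of nltk.corpus.wordnet.NOUN/VERB/ADJ/ADV.
_PENN_PREFIX_TO_WORDNET = {
    "NN": "n",  # nouns: NN, NNS, NNP, NNPS
    "VB": "v",  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "a",  # adjectives: JJ, JJR, JJS
    "RB": "r",  # adverbs: RB, RBR, RBS
}


def penn_to_wordnet(penn_tag: str, default: Optional[str] = "n") -> Optional[str]:
    """Map a Penn Treebank tag (e.g. from nltk.pos_tag) to a WordNet POS letter.

    Tags outside the four prefixes above (DT, IN, '.', ...) fall back to `default`,
    which is the noun POS, matching WordNetLemmatizer's own default.
    """
    return _PENN_PREFIX_TO_WORDNET.get(penn_tag[:2], default)
```

With NLTK installed, each `(word, tag)` pair from `nltk.pos_tag(nltk.word_tokenize(text))` would then be passed as `WordNetLemmatizer().lemmatize(word, pos=penn_to_wordnet(tag))`, so "running" tagged VBG is lemmatized as a verb.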
For your problem (lemmatize file.txt line by line and write the results to outputfile.txt):
from __future__ import print_function
from pywsd.utils import lemmatize_sentence

# Lemmatize each line of file.txt and write it, space-joined, to outputfile.txt
with open('file.txt') as fin, open('outputfile.txt', 'w') as fout:
    for line in fin:
        print(' '.join(lemmatize_sentence(line.strip())), file=fout)