I have downloaded the en_core_web_lg model and am trying to compute the similarity between two sentences:
import spacy

nlp = spacy.load('en_core_web_lg')
search_doc = nlp("This was very strange argument between american and british person")
main_doc = nlp("He was from Japan, but a true English gentleman in my eyes, and another one of the reasons as to why I liked going to school.")
print(main_doc.similarity(search_doc))
This returns a very strange value:
0.9066019751888448
These two sentences should not be 90% similar; their meanings are quite different.
Why is this happening? Do I need to add some extra vocabulary to make the similarity result more reasonable?
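From what I gather, a spaCy Doc's vector is just the average of its token vectors, and Doc.similarity is the cosine of those averaged vectors, so two sentences that share common words can score high even when their meanings differ. A minimal NumPy sketch of that averaging effect (the word vectors here are made-up toy values, not real spaCy vectors):

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d "word vectors" (hypothetical values, purely illustrative).
vecs = {
    "strange":   np.array([0.9, 0.1, 0.1]),
    "argument":  np.array([0.8, 0.2, 0.1]),
    "english":   np.array([0.2, 0.9, 0.2]),
    "gentleman": np.array([0.1, 0.8, 0.2]),
    "school":    np.array([0.1, 0.7, 0.3]),
}

# Each "document" vector is the mean of its word vectors,
# mimicking how a Doc vector is built from token vectors.
doc1 = np.mean([vecs["strange"], vecs["argument"], vecs["english"]], axis=0)
doc2 = np.mean([vecs["english"], vecs["gentleman"], vecs["school"]], axis=0)

# Averaging smears the per-word differences, so the cosine stays high.
print(cosine(doc1, doc2))
```

The averaging washes out word-level distinctions, which may be why the two example sentences score above 0.9 despite meaning different things.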