After reading the docs and working through the tutorial, I decided to build a small demo. It turns out my model doesn't train. Here is the code:
import spacy
import random
import json
TRAINING_DATA = [
    ["My little kitty is so special", {"KAT": True}],
    ["Dude, Totally, Yeah, Video Games", {"KAT": False}],
    ["Should I pay $1,000 for the iPhone X?", {"KAT": False}],
    ["The iPhone 8 reviews are here", {"KAT": False}],
    ["Noa is a great cat name.", {"KAT": True}],
    ["We got a new kitten!", {"KAT": True}]
]
nlp = spacy.blank("en")
category = nlp.create_pipe("textcat")
nlp.add_pipe(category)
category.add_label("KAT")
# Start the training
nlp.begin_training()
# Loop for 100 iterations
for itn in range(100):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)
    losses = {}
    # Batch the examples and iterate over them
    for batch in spacy.util.minibatch(TRAINING_DATA, size=2):
        texts = [text for text, entities in batch]
        annotations = [{"textcat": [entities]} for text, entities in batch]
        nlp.update(texts, annotations, losses=losses)
    if itn % 20 == 0:
        print(losses)
When I run this, the output suggests that very little is being learned.
{'textcat': 0.0}
{'textcat': 0.0}
{'textcat': 0.0}
{'textcat': 0.0}
{'textcat': 0.0}
This feels wrong. There should be either an error or a meaningful label. The predictions confirm it.
for text, d in TRAINING_DATA:
    print(text, nlp(text).cats)
# Dude, Totally, Yeah, Video Games {'KAT': 0.45303162932395935}
# The iPhone 8 reviews are here {'KAT': 0.45303162932395935}
# Noa is a great cat name. {'KAT': 0.45303162932395935}
# Should I pay $1,000 for the iPhone X? {'KAT': 0.45303162932395935}
# We got a new kitten! {'KAT': 0.45303162932395935}
# My little kitty is so special {'KAT': 0.45303162932395935}
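It feels like my code is missing something, but I can't figure out what.
Change {"textcat": [entities]} to {"cats": entities} (if you are passing annotation dicts, see the documentation for the expected keys). When the text classifier is updated, it looks for the key "cats", but finds only "textcat". So you are essentially updating the text classifier with nothing, and you end up with just the randomly initialized weights from nlp.begin_training. - Ines Montani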
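To make the fix concrete, here is a minimal sketch of how a batch could be built with the "cats" key. It assumes the same nlp.update(texts, annotations, losses=losses) call style used in the question; the helper name make_batch is hypothetical and only there for illustration, and the sketch deliberately leaves out spaCy itself so it runs on its own.

```python
# Two rows from the question's training data, enough to show the shape.
TRAINING_DATA = [
    ["My little kitty is so special", {"KAT": True}],
    ["Dude, Totally, Yeah, Video Games", {"KAT": False}],
]


def make_batch(batch):
    """Split (text, cats) pairs into texts and annotation dicts.

    make_batch is a hypothetical helper, not part of spaCy.
    """
    texts = [text for text, cats in batch]
    # Per the answer above, the text classifier looks for the "cats" key,
    # so the label dict goes there directly (no "textcat" key, no list).
    annotations = [{"cats": cats} for text, cats in batch]
    return texts, annotations


texts, annotations = make_batch(TRAINING_DATA)
print(annotations)

# Inside the question's training loop, these would then be passed as:
#     nlp.update(texts, annotations, losses=losses)
```

With the labels under "cats", the update step has something to learn from, so the reported loss should no longer be stuck at 0.0.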