How to solve the nlp.update problem in spaCy 3.0 using Example

21

I am trying to train a model on my own data with spaCy v3.0, but apparently nlp.update no longer accepts tuples. Here is my code:

import spacy
import random
import json
nlp = spacy.blank("en")
ner = nlp.create_pipe("ner")
nlp.add_pipe('ner')
ner.add_label("label")
# Start the training
nlp.begin_training()
# Loop for 40 iterations
for itn in range(40):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)
    losses = {}
# Batch the examples and iterate over them
    for batch in spacy.util.minibatch(TRAINING_DATA, size=2):
        texts = [text for text, entities in batch]
        annotations = [entities for text, entities in batch]
# Update the model
        nlp.update(texts, annotations, losses=losses, drop=0.3)
    print(losses)

I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-79-27d69961629b> in <module>
     18         annotations = [entities for text, entities in batch]
     19 # Update the model
---> 20         nlp.update(texts, annotations, losses=losses, drop=0.3)
     21     print(losses)

~\Anaconda3\lib\site-packages\spacy\language.py in update(self, examples, _, drop, sgd, losses, component_cfg, exclude)
   1086         """
   1087         if _ is not None:
-> 1088             raise ValueError(Errors.E989)
   1089         if losses is None:
   1090             losses = {}

ValueError: [E989] `nlp.update()` was called with two positional arguments. This may be due to a backwards-incompatible change to the format of the training data in spaCy 3.0 onwards. The 'update' function should now be called with a batch of Example objects, instead of `(text, annotation)` tuples. 

This is how I build my training data:

TRAINING_DATA = []
for entry in labeled_data:
    entities = []
    for e in entry['labels']:
        entities.append((e[0], e[1],e[2]))
    spacy_entry = (entry['text'], {"entities": entities})
    TRAINING_DATA.append(spacy_entry)

My training data looks like this:

[('Part List', {'entities': []}),
 ('pending', {'entities': []}),
 ('3D Printing', {'entities': [(0, 11, 'Process')]}),
 ('Recommended to use a FDM 3D printer with PLA material.', {'entities': [(25, 36, 'Process'), (41, 44, 'Material')]}),
 ('', {'entities': []}),
 ('No need supports or rafts.', {'entities': []}),
 ('Resolution: 0.20mm', {'entities': []}),
 ('Fill density 20%', {'entities': []}),
 ('As follows from the analysis, part of the project is devoted to 3D', {'entities': [(64, 66, 'Process')]}),
 ('printing, as all static components were created using 3D modelling and', {'entities': [(54, 66, 'Process')]}),
 ('subsequent printing.', {'entities': []}),
 ('', {'entities': []}),
 ('In our project, we created several versions of the', {'entities': []}),
 ('model during modelling, which we will describe and document in the', {'entities': []}),
 ('following subchapters. As a tool for 3D modelling, we used the Sketchup', {'entities': [(37, 49, 'Process')]}),
 ('Make tool, version from 2017. The main reason was the high degree of', {'entities': []}),
 ('intuitiveness and simplicity of the tool, as we had not encountered 3D', {'entities': [(68, 70, 'Process')]}),
 ('modelling before and needed a relatively flexible and efficient tool to', {'entities': []}),
 ('guarantee the desired result. with zero previous experience.', {'entities': []}),
 ('In this version, which is shown in the figures Figure 13 - Version no. 2 side view and Figure 24 - Version no. 2 - front view, for the first time, the specific dimensions of the infuser were clarified and', {'entities': []}),
 ('modelled. The details of the lower servo attachment, the cable hole in', {'entities': []}),
 ('the main mast, the winding cylinder mounting, the protrusion on the', {'entities': [(36, 44, 'Process')]}),
 ('winding cylinder for holding the tea bag, the preparation for fitting', {'entities': []}),
 ('the wooden and aluminium plate and the shape of the cylinder end that', {'entities': [(15, 25, 'Material')]}),
 ('exactly fit the servo were also reworked.', {'entities': []}),
 ('After the creation of this', {'entities': []}),
 ('version of the model, this model was subsequently officially consulted', {'entities': []}),
 ('and commented on for the first time.', {'entities': []}),
 ('In this version, which is shown in the figures Figure 13 - Version no. 2 side view and Figure 24 - Version no. 2 - front view, for the first time, the specific dimensions of the infuser were clarified and', {'entities': []}),
 ('modelled. The details of the lower servo attachment, the cable hole in', {'entities': []}),
 ('the main mast, the winding cylinder mounting, the protrusion on the', {'entities': [(36, 44, 'Process')]})]

As a new contributor, I would really appreciate your help. Thank you very much!


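If you read the error message and check the spaCy 3.0 documentation (https://spacy.io/api/language), you will see that the input to `nlp.update` has changed in spaCy 3.0: you need to build `Example` objects and pass those in. That page has some examples you can follow. - TYZ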
4 Answers

39

You didn't provide your TRAIN_DATA, so I can't reproduce the problem. However, you should try something like this:

from spacy.training.example import Example

for batch in spacy.util.minibatch(TRAINING_DATA, size=2):
    for text, annotations in batch:
        # create Example
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotations)
        # Update the model
        nlp.update([example], losses=losses, drop=0.3)

1
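Hey, thank you very much for the suggestion. I have added my training data to the question and tried creating Examples, but I get an error saying that the input to doc = nlp.make_doc(texts) should be a string, whereas I have a list. - TanrCans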
2
I edited the answer. It now passes a single string when building each Example. - krisograbek

6
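from spacy.training import Example

# `batches` is assumed to come from spacy.util.minibatch(TRAINING_DATA, size=2)
for batch in batches:
    texts, annotations = zip(*batch)

    # Build one Example per (text, annotation) pair in the batch
    examples = []
    for text, annotation in zip(texts, annotations):
        doc = nlp.make_doc(text)
        examples.append(Example.from_dict(doc, annotation))

    # Update the model with the whole batch
    nlp.update(examples, drop=0.5, losses=losses)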

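This code runs successfully on spaCy 3. Note that texts here is a tuple of strings, which is why the loop is needed. If you only have a single string, you do not need the for loop.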
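The fragment above assumes that nlp, batches and losses already exist. For reference, here is a minimal end-to-end sketch of what a full v3-style loop might look like. It is not the asker's exact setup: it assumes spaCy 3.x is installed, uses three items copied from the question's data as a stand-in for the full TRAINING_DATA, and runs an arbitrary 5 iterations. It also uses `nlp.initialize()`, which is the v3 name for `begin_training`, and registers labels on the component returned by `nlp.add_pipe("ner")`.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Stand-in data: three items copied from the question's training data
TRAINING_DATA = [
    ("3D Printing", {"entities": [(0, 11, "Process")]}),
    ("Fill density 20%", {"entities": []}),
    ("No need supports or rafts.", {"entities": []}),
]

nlp = spacy.blank("en")
# In v3, add_pipe takes the component name and returns the component itself
ner = nlp.add_pipe("ner")

# Register every label that occurs in the annotations
for _text, annotations in TRAINING_DATA:
    for _start, _end, label in annotations["entities"]:
        ner.add_label(label)

# initialize() sets up the model weights and returns an optimizer
optimizer = nlp.initialize()

for itn in range(5):  # iteration count chosen arbitrarily for the sketch
    random.shuffle(TRAINING_DATA)
    losses = {}
    for batch in minibatch(TRAINING_DATA, size=2):
        # Convert each (text, annotations) tuple into an Example object
        examples = [
            Example.from_dict(nlp.make_doc(text), annotations)
            for text, annotations in batch
        ]
        nlp.update(examples, sgd=optimizer, losses=losses, drop=0.3)
    print(losses)
```

The key difference from the question's code is that `nlp.update` receives a single list of `Example` objects as its only positional argument, so the E989 check is never triggered.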


5

As of spaCy 3.0, they have moved away from the old "simple training style" to using Example objects.

from spacy.training import Example

example = Example.from_dict(nlp.make_doc(text), annotations)
nlp.update([example])

You can refer to this page on the official spaCy website: https://spacy.io/usage/training
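Since `Example.from_dict` works from character offsets, it may be worth checking that each (start, end, label) span selects the text you intended before training. Below is a small pure-Python sketch of such a check. `check_offsets` is a hypothetical helper, not part of spaCy, and the two sample items are copied from the question's data. Spans that do not line up with token boundaries, for instance because of trailing whitespace, can cause alignment warnings or errors in spaCy, so they seem worth flagging.

```python
def check_offsets(data):
    """Return (text, start, end, label, problem) tuples for suspicious spans."""
    problems = []
    for text, annotations in data:
        for start, end, label in annotations["entities"]:
            span = text[start:end]
            if not (0 <= start < end <= len(text)):
                problems.append((text, start, end, label, "out of range"))
            elif span != span.strip():
                problems.append(
                    (text, start, end, label, "leading/trailing whitespace")
                )
    return problems


# Two items copied from the question's training data
sample = [
    ("3D Printing", {"entities": [(0, 11, "Process")]}),
    (
        "Recommended to use a FDM 3D printer with PLA material.",
        {"entities": [(25, 36, "Process"), (41, 44, "Material")]},
    ),
]

for problem in check_offsets(sample):
    print(problem)
```

On this sample, the span (25, 36) in the second text is flagged, because it selects "3D printer " including the trailing space; (25, 35) would select "3D printer" exactly.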

0
I think you are still trying to use the 2.x approach. You can try this example, which works with the current version of spaCy.
import spacy
from spacy.training.example import Example

nlp = spacy.load("en_core_web_sm")

# Training data as (text, annotations) tuples; text1, start1, end1, ... are placeholders
train_data = [
    (text1, {"entities": [(start1, end1, "LABEL1"), (start2, end2, "LABEL2")]}),
    (text2, {"entities": [(start3, end3, "LABEL1"), (start4, end4, "LABEL3")]})
]

# Convert the training data into a batch of Example objects
examples = []
for text, annotations in train_data:
    example = Example.from_dict(nlp.make_doc(text), annotations)
    examples.append(example)

# Update the model with the batch of Example objects
nlp.update(examples, drop=0.5, losses={})
