Predicting over an entire dataset with Transformers

Question

python deep-learning nlp huggingface-transformers huggingface-datasets

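I'm trying to run zero-shot classification over a dataset of 5,000 records. Right now I'm using a plain Python loop, but it is very slow. Is there a way to speed this up using the Transformers or Datasets structures? Here is my code: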

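from transformers import pipeline

# reduced_dataset is assumed to be a pandas DataFrame with a "text" column
classifier = pipeline("zero-shot-classification", model='cross-encoder/nli-roberta-base')

# Create prediction list
candidate_labels = ["Self-direction: action", "Achievement", "Security: personal", "Security: societal", "Benevolence: caring", "Universalism: concern"]
predictions = []

# One pipeline call per row: this is the slow part
for index, row in reduced_dataset.iterrows():
    res = classifier(row["text"], candidate_labels)
    # Note: res["scores"] is sorted from highest to lowest and lines up with
    # res["labels"], not with the order of candidate_labels
    partial_prediction = []
    for score in res["scores"]:
        if score >= 0.5:
            partial_prediction.append(1)
        else:
            partial_prediction.append(0)

    # Progress report every 100 rows
    if index % 100 == 0:
        print(index)
    predictions.append(partial_prediction)

partial_prediction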

- ignacioct

1 Answer

- Jindřich · Accepted Answer

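Batching processes the sentences in parallel, which makes inference more efficient. According to the documentation, instead of a single input sentence you can pass a list of sentences (or, more precisely, an Iterable), and the pipeline automatically takes care of all the hassle connected with batching (padding the sentences to the same length, estimating a batch size that fits in memory, etc.) and returns an Iterable of predictions.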

The documentation even suggests using a dataset object as the input to the pipeline.
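
A minimal sketch of what the batched call could look like for the code in the question, assuming the texts fit in memory as a plain list. The helper names `build_classifier` and `predict_multi_hot` are made up for illustration, and `batch_size=16` is an arbitrary value that would need tuning for the available hardware. Because the pipeline returns labels and scores sorted by score, the sketch maps each score back to its label so the 0/1 flags follow the order of `candidate_labels`.

```python
def build_classifier(model_name="cross-encoder/nli-roberta-base"):
    """Create the zero-shot pipeline used in the question (downloads the model)."""
    # Imported lazily so predict_multi_hot can be used without loading the model.
    from transformers import pipeline

    return pipeline("zero-shot-classification", model=model_name)


def predict_multi_hot(classifier, texts, candidate_labels, threshold=0.5, batch_size=16):
    """Classify all texts in one pipeline call and return one 0/1 list per text.

    Hypothetical helper: threshold and batch_size are illustrative defaults.
    """
    # A single call with the whole list; batching happens inside the pipeline.
    results = classifier(
        list(texts),
        candidate_labels=list(candidate_labels),
        batch_size=batch_size,
    )
    # Some versions return a bare dict when only one text is passed.
    if isinstance(results, dict):
        results = [results]

    predictions = []
    for res in results:
        # Scores are sorted by confidence, so pair them with their labels first.
        score_by_label = dict(zip(res["labels"], res["scores"]))
        predictions.append(
            [1 if score_by_label[label] >= threshold else 0 for label in candidate_labels]
        )
    return predictions
```

With the DataFrame from the question, this might be called as `predict_multi_hot(build_classifier(), reduced_dataset["text"].tolist(), candidate_labels)`. The pipeline documentation also shows wrapping a Datasets object with `KeyDataset` from `transformers.pipelines.pt_utils` and iterating over the results; that variant is not shown here.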