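I am trying to run zero-shot classification on a dataset of 5,000 records. Right now I am using a plain Python loop, but it is very slow. Is there a way to speed this up using Transformers or Datasets structures? Here is my code: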
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model='cross-encoder/nli-roberta-base')
candidate_labels = ["Self-direction: action", "Achievement", "Security: personal", "Security: societal", "Benevolence: caring", "Universalism: concern"]
# Create prediction list
predictions = []
# reduced_dataset is a pandas DataFrame with a "text" column
for index, row in reduced_dataset.iterrows():
    res = classifier(row["text"], candidate_labels)
    partial_prediction = []
    for score in res["scores"]:
        if score >= 0.5:
            partial_prediction.append(1)
        else:
            partial_prediction.append(0)
    if index % 100 == 0:
        print(index)
    predictions.append(partial_prediction)
partial_prediction
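
For reference, here is a minimal sketch of the kind of batched version I have in mind. It rests on assumptions, not on anything confirmed for my setup: that the pipeline accepts a list of texts plus a `batch_size` argument at call time, and that a GPU is available for `device=0`. The helper names `to_multi_hot` and `classify_texts` are hypothetical. One deliberate difference from the loop above: the pipeline returns labels sorted by descending score, so the sketch looks scores up by label to keep each output vector in `candidate_labels` order.

```python
from typing import Any, Callable, Dict, List, Sequence


def to_multi_hot(result: Dict[str, Any], candidate_labels: Sequence[str],
                 threshold: float = 0.5) -> List[int]:
    """Threshold one pipeline result, reordered to match candidate_labels."""
    # Results come back sorted by score, so look each score up by its label.
    score_by_label = dict(zip(result["labels"], result["scores"]))
    return [1 if score_by_label[label] >= threshold else 0
            for label in candidate_labels]


def classify_texts(classifier: Callable[..., Any], texts: List[str],
                   candidate_labels: Sequence[str], batch_size: int = 16,
                   threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical helper: one pipeline call for all texts, not one per row."""
    results = classifier(texts, list(candidate_labels), batch_size=batch_size)
    if isinstance(results, dict):  # defensive: a single text may yield a bare dict
        results = [results]
    return [to_multi_hot(r, candidate_labels, threshold) for r in results]


# Intended usage with the objects from my code above (not executed here):
# classifier = pipeline("zero-shot-classification",
#                       model="cross-encoder/nli-roberta-base",
#                       device=0)  # assumption: a CUDA GPU is available
# predictions = classify_texts(classifier,
#                              reduced_dataset["text"].tolist(),
#                              candidate_labels)
```

The `batch_size` of 16 is an arbitrary starting point. Each text is paired with every candidate label, so the number of sequences the model actually sees is the number of texts times the number of labels.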