Pre-trained model for text classification

Question

python machine-learning keras text-classification pre-trained-model

7

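I have a few unlabeled words that I need to classify into 4-5 categories. I have no training data, but I could use a pre-trained model to classify these words. Which model is suitable for this case, and on which dataset was it trained?

Thanks.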

- scifi_bot

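Which categories? - Viktoriya Malyasova

Categories like artifacts, animals, food, and birds (if possible). - scifi_bot

Unless your data comes from a well-known dataset, you are unlikely to find a model pre-trained for your data and categories. You could try clustering, but it won't produce the exact categories you expect. - Erwan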

1 Answer

- David Alami · Accepted Answer

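The task being discussed is called zero-shot topic classification: predicting topics the model was not trained on. This paradigm is supported by the Hugging Face library, and you can read more about it here. The most common pre-trained model is Bart Large MNLI, the bart-large checkpoint after training on the MNLI dataset. Here is a simple example that classifies the phrase "I like hot dogs" without any preliminary training: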

First of all, please install the transformers library:
```
pip install --upgrade transformers
```

Then import and initialize the pipeline:

```
from transformers import pipeline

classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
```

Enter our toy dataset:

```
labels = ["artifacts", "animals", "food", "birds"]
hypothesis_template = 'This text is about {}.'
sequence = "I like hot dogs"
```
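For context on what the template does: as far as I understand the zero-shot pipeline, each candidate label is substituted into `hypothesis_template` to form one NLI hypothesis, and the model then judges whether the input sequence (the premise) entails it. A minimal sketch of that expansion in plain Python, with no model involved (`hypotheses` is just an illustrative name, not something the library exposes):

```python
labels = ["artifacts", "animals", "food", "birds"]
hypothesis_template = 'This text is about {}.'

# One hypothesis per candidate label; each is paired with the input
# sequence (the premise) and scored for entailment.
hypotheses = [hypothesis_template.format(label) for label in labels]

for hypothesis in hypotheses:
    print(hypothesis)
```

Rewording the template (for example, 'This word refers to {}.' for single-word inputs) may change the scores, so it is worth trying a couple of phrasings.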

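Predict the label:

```
# multi_label=True scores each label independently; older transformers
# releases named this argument multi_class, which is now deprecated.
prediction = classifier(sequence, labels, hypothesis_template=hypothesis_template, multi_label=True)

print(prediction)
```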

The output will look similar to:

```
{'sequence': 'i like hot dogs',
 'labels': ['food', 'animals', 'artifacts', 'birds'],
 'scores': [0.9971900582313538, 0.00529429130256176, 0.0020991512574255466, 0.00023589911870658398]}
```

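This can be read as follows: the model assigns the highest probability (0.997...) to the label "food", which is the correct answer. Because each label is scored independently in this mode, the scores need not sum to 1.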
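Since the question is about assigning each word to one category, a small helper can reduce the result dict to a single label. The sketch below works on the sample output shown above, pasted as a literal so that it runs without downloading the model. `top_label` and the `min_score` cutoff of 0.5 are hypothetical, chosen for illustration; they are not part of transformers.

```python
def top_label(prediction, min_score=0.5):
    """Return the best-scoring label, or None if no score reaches min_score."""
    # Pair labels with scores and take the highest-scoring pair, so the
    # helper does not depend on the lists already being sorted.
    label, score = max(
        zip(prediction["labels"], prediction["scores"]),
        key=lambda pair: pair[1],
    )
    return label if score >= min_score else None


# Sample output from the answer above, pasted as a literal.
prediction = {
    "sequence": "i like hot dogs",
    "labels": ["food", "animals", "artifacts", "birds"],
    "scores": [0.9971900582313538, 0.00529429130256176,
               0.0020991512574255466, 0.00023589911870658398],
}

print(top_label(prediction))  # food
```

Returning `None` below the cutoff leaves room to flag words that fit none of the categories instead of forcing them into the nearest one.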