HuggingFace Transformers Question Answering Confidence Score

How do we get the answer confidence score from the Hugging Face Transformers question-answering example code? I can see that the pipeline does return a score, but can the core code below also return a confidence score?
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

text = r"""
 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""

questions = [
    "How many pretrained models are available in Transformers?",
    "What does Transformers provide?",
    "Transformers provides interoperability between which frameworks?",
]

for question in questions:
    inputs = tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="tf")
    input_ids = inputs["input_ids"].numpy()[0]

    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    answer_start_scores, answer_end_scores = model(inputs)

    answer_start = tf.argmax(
        answer_start_scores, axis=1
    ).numpy()[0]  # Get the most likely beginning of answer with the argmax of the score
    answer_end = (
        tf.argmax(answer_end_scores, axis=1) + 1
    ).numpy()[0]  # Get the most likely end of answer with the argmax of the score
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

    print(f"Question: {question}")
    print(f"Answer: {answer}\n")

The code was taken from https://huggingface.co/transformers/usage.html

1 Answer

The score is the product of the probability of the answer start token and the probability of the answer end token, obtained by applying the softmax function to the respective logits. Have a look at the example below.

With the pipeline:
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

text = r"""
 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""

question = "How many pretrained models are available in Transformers?"

question_answerer = pipeline("question-answering", model=model, tokenizer=tokenizer)

print(question_answerer(question=question, context=text))

Output:

{'score': 0.5254509449005127, 'start': 256, 'end': 264, 'answer': 'over 32+'}
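
As a side note (not part of the original example), the pipeline can also return several candidate answers together with their scores. A small sketch, assuming a recent transformers version in which the parameter is called top_k (older releases named it topk):

# Ask the pipeline for the three highest-scoring candidate answers.
for candidate in question_answerer(question=question, context=text, top_k=3):
    print(candidate)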

Without the pipeline:

inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="pt")
outputs = model(**inputs)

First, we create a mask that is 1 for every context token and 0 otherwise (i.e., for the question tokens and the special tokens). We use the BatchEncoding.sequence_ids method for that:

# sequence_ids() returns None for special tokens, 0 for the question
# and 1 for the context; everything except context tokens maps to 0.
non_answer_tokens = [x if x in [0, 1] else 0 for x in inputs.sequence_ids()]
non_answer_tokens = torch.tensor(non_answer_tokens, dtype=torch.bool)
non_answer_tokens

Output:

tensor([False, False, False, False, False, False, False, False, False, False,
        False, False, False,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True, False])

We use this mask to set the logits of the special tokens and the question tokens to negative infinity and apply the softmax afterwards (the negative infinity prevents these tokens from influencing the softmax result):
from torch.nn.functional import softmax

potential_start = torch.where(non_answer_tokens, outputs.start_logits, torch.tensor(float('-inf'),dtype=torch.float))
potential_end = torch.where(non_answer_tokens, outputs.end_logits, torch.tensor(float('-inf'),dtype=torch.float))

potential_start = softmax(potential_start, dim = 1)
potential_end = softmax(potential_end, dim = 1)
potential_start

Output:

tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 1.0567e-04, 9.7031e-05, 1.9445e-06, 1.5849e-06, 1.2075e-07,
         3.1704e-08, 4.7796e-06, 1.8712e-07, 6.2977e-08, 1.5481e-07, 8.0004e-08,
         3.7896e-07, 1.6438e-07, 9.7762e-08, 1.0898e-05, 1.6518e-07, 5.6349e-08,
         2.4848e-07, 2.1459e-07, 1.3785e-06, 1.0386e-07, 1.8803e-07, 8.1887e-08,
         4.1088e-07, 1.5618e-07, 2.5624e-06, 1.8526e-06, 2.6710e-06, 6.8466e-08,
         1.7953e-07, 3.6242e-07, 2.2788e-07, 2.3384e-06, 1.2147e-05, 1.6065e-07,
         3.3257e-07, 2.6021e-07, 2.8140e-06, 1.3698e-07, 1.1066e-07, 2.8436e-06,
         1.2171e-07, 9.9341e-07, 1.1684e-07, 6.8935e-08, 5.6335e-08, 1.3314e-07,
         1.3038e-07, 7.9560e-07, 1.0671e-07, 9.1864e-08, 5.6394e-07, 3.0210e-08,
         7.2176e-08, 5.4452e-08, 1.2873e-07, 9.2636e-08, 9.6012e-07, 7.8008e-08,
         1.3124e-07, 1.3680e-06, 8.8716e-07, 8.6627e-07, 6.4750e-06, 2.5951e-07,
         6.1648e-07, 8.7724e-07, 1.0796e-05, 2.6633e-07, 5.4644e-07, 1.7553e-07,
         1.6015e-05, 5.0054e-07, 8.2263e-07, 2.6336e-06, 2.0743e-05, 4.0008e-07,
         1.9330e-06, 2.0312e-04, 6.0256e-01, 3.9638e-01, 3.1568e-04, 2.2009e-05,
         1.2485e-06, 2.4744e-06, 1.0092e-05, 3.1047e-06, 1.3597e-04, 1.5105e-06,
         1.4960e-06, 8.1164e-08, 1.6534e-06, 4.6181e-07, 8.7354e-08, 2.2356e-07,
         9.1145e-07, 8.8194e-06, 4.4202e-07, 1.9238e-07, 2.8077e-07, 1.4117e-05,
         2.0613e-07, 1.2676e-06, 8.1317e-08, 2.2337e-06, 1.2399e-07, 6.1745e-08,
         3.4725e-08, 2.7878e-07, 4.1457e-07, 0.0000e+00]],
       grad_fn=<SoftmaxBackward>)

These probabilities can now be used to extract the start and end tokens of the answer and to calculate the answer score:

answer_start = torch.argmax(potential_start)
answer_end = torch.argmax(potential_end)
answer = tokenizer.decode(inputs.input_ids.squeeze()[answer_start:answer_end+1])

print(potential_start.squeeze()[answer_start])
print(potential_end.squeeze()[answer_end])
print(potential_start.squeeze()[answer_start] * potential_end.squeeze()[answer_end])
print(answer)

Output:

tensor(0.6026, grad_fn=<SelectBackward>)
tensor(0.8720, grad_fn=<SelectBackward>)
tensor(0.5255, grad_fn=<MulBackward0>)
over 32 +

Please keep in mind that this answer does not cover any special cases (e.g., an end token that appears before the start token).
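
As a minimal sketch (not part of the original answer), one way to handle that case is to score every valid (start, end) pair jointly and pick the best one, which is roughly what the pipeline does internally. It reuses the potential_start, potential_end, inputs, and tokenizer variables from above:

# Joint probability of every (start, end) pair via the outer product.
pair_scores = potential_start.squeeze().unsqueeze(-1) * potential_end.squeeze().unsqueeze(0)

# Keep only pairs with end >= start (upper triangle, diagonal included);
# all pairs below the diagonal are zeroed out.
pair_scores = torch.triu(pair_scores)

# Pick the highest-scoring valid pair.
best = torch.argmax(pair_scores)
answer_start = best // pair_scores.size(1)
answer_end = best % pair_scores.size(1)

print(pair_scores[answer_start, answer_end])
print(tokenizer.decode(inputs.input_ids.squeeze()[answer_start:answer_end + 1]))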


@cronoik, thanks for the useful code. You are right that the cases not covered here are handled in the pipeline. Also, if you paste, say, 500 nonsense tokens before the context, the pipeline may still find the correct answer, while this technique may fail. There are other factors as well, e.g. the pipeline has dropout etc. In that case, do you know how to access the actual intermediate pipeline values, such as the softmax of the potential starts and ends? This is just one example; I would really like to be able to do this for any pipeline. - Sam A.
I think the following change brings the score closer to what the pipeline predicts. Basically, find answer_start, then set every potential_end value at or before that index to -inf, since those positions are impossible. Now you have P(end = j | start = i), so P(end = j & start = i) is P(start = i) times that conditional probability. non_answer_tokens2 = torch.tensor([xx if ii > answer_start else 0 for ii, xx in enumerate(non_answer_tokens.tolist())], dtype=torch.bool); potential_end = torch.where(non_answer_tokens2, outputs.end_logits, torch.tensor(float('-inf'), dtype=torch.float)) - Sam A.
https://huggingface.co/course/chapter6/3b shows code for how to implement this (though if you change the question it no longer works exactly, so apparently this is not exactly what happens). They show that instead of taking the argmax independently, only the softmax values after each chosen start index should be considered. In that case the overall best result is not always the argmax of the two independent choices, but the argmax after taking the product over all possible valid pairs. Note that, for some reason, they do not normalize the second vector (by dividing by its sum) to make it a probability. - Sam A.
@SamA. Sorry, I haven't had time yet to reply to all of your comments. I will take care of them on the weekend. But I think some of them already go beyond what should be discussed under an answer. Maybe create a separate question for your issues. - cronoik
