PyTorch model summary (torchinfo) does not work with a HuggingFace model

6
I need to download a PyTorch model from HuggingFace and get a summary of it. Am I doing something wrong here?
from torchinfo import summary
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
summary(model, input_size=(16, 512))

This produces the following error:
---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/torchinfo/torchinfo.py in forward_pass(model, x, batch_dim, cache_forward_pass, device, **kwargs)
    257             if isinstance(x, (list, tuple)):
--> 258                 _ = model.to(device)(*x, **kwargs)
    259             elif isinstance(x, dict):

11 frames

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
   1530             output_hidden_states=output_hidden_states,
-> 1531             return_dict=return_dict,
   1532         )

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1070 
-> 1071         result = forward_call(*input, **kwargs)
   1072         if _global_forward_hooks or self._forward_hooks:

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    988             inputs_embeds=inputs_embeds,
--> 989             past_key_values_length=past_key_values_length,
    990         )

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1070 
-> 1071         result = forward_call(*input, **kwargs)
   1072         if _global_forward_hooks or self._forward_hooks:

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    214         if inputs_embeds is None:
--> 215             inputs_embeds = self.word_embeddings(input_ids)
    216         token_type_embeddings = self.token_type_embeddings(token_type_ids)

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1070 
-> 1071         result = forward_call(*input, **kwargs)
   1072         if _global_forward_hooks or self._forward_hooks:

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py in forward(self, input)
    159             input, self.weight, self.padding_idx, self.max_norm,
--> 160             self.norm_type, self.scale_grad_by_freq, self.sparse)
    161 

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2042         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2043     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   2044 

RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)


The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)

<ipython-input-8-4f70d4e6fa82> in <module>()
      5 else:
      6     # Can't get this working
----> 7     summary(model, input_size=(16, 512)) #, device='cpu')
      8     #print(model)

/usr/local/lib/python3.7/dist-packages/torchinfo/torchinfo.py in summary(model, input_size, input_data, batch_dim, cache_forward_pass, col_names, col_width, depth, device, dtypes, row_settings, verbose, **kwargs)
    190     )
    191     summary_list = forward_pass(
--> 192         model, x, batch_dim, cache_forward_pass, device, **kwargs
    193     )
    194     formatting = FormattingOptions(depth, verbose, col_names, col_width, row_settings)

/usr/local/lib/python3.7/dist-packages/torchinfo/torchinfo.py in forward_pass(model, x, batch_dim, cache_forward_pass, device, **kwargs)
    268             "Failed to run torchinfo. See above stack traces for more details. "
    269             f"Executed layers up to: {executed_layers}"
--> 270         ) from e
    271     finally:
    272         if hooks is not None:

RuntimeError: Failed to run torchinfo. See above stack traces for more details. Executed layers up to: []
1 Answer

7

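There is a bug (also reported by others) in the torchinfo library, in the last line of the torchinfo.py excerpt shown here. When dtypes is None, it creates torch.float tensors by default, whereas the forward method of the BERT model uses torch.nn.Embedding, which only accepts int/long tensors as indices.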

def process_input(
    input_data: Optional[INPUT_DATA_TYPE],
    input_size: Optional[INPUT_SIZE_TYPE],
    batch_dim: Optional[int],
    device: Union[torch.device, str],
    dtypes: Optional[List[torch.dtype]] = None,
) -> Tuple[CORRECTED_INPUT_DATA_TYPE, Any]:
    """Reads sample input data to get the input size."""

    if input_size is not None:
        if dtypes is None:
            dtypes = [torch.float] * len(input_size)

If you change that line to the following, it runs correctly.

dtypes = [torch.int] * len(input_size)
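
The root cause can be reproduced without torchinfo or BERT. The following is a minimal sketch, assuming only that PyTorch is installed; the embedding size and input shape are arbitrary illustrative values, not taken from the question. It feeds an embedding layer the kind of float tensor that torchinfo generates when dtypes is None, then the same tensor cast to an integer type.

```python
import torch
from torch import nn

# Arbitrary illustrative sizes: 10 possible token IDs, 4-dimensional vectors.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)

# A random float tensor, like the default input torchinfo builds from input_size.
float_ids = torch.rand(2, 3)

try:
    embedding(float_ids)
    float_failed = False
except RuntimeError as err:
    # Embedding lookups need integer indices, so float input is rejected.
    float_failed = True
    print("float indices rejected:", err)

# Casting to long truncates values in [0, 1) to 0, which is a valid index.
long_ids = float_ids.type(torch.long)
out = embedding(long_ids)
print(out.shape)
```

The cast produces only zeros, which is fine for computing a summary because only shapes and parameter counts matter there, not the actual token IDs.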

Edit (a direct solution that does not require changing torchinfo's internal code):

from torchinfo import summary
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
summary(model, input_size=(2, 512), dtypes=['torch.IntTensor'])

Alternative:

For a simple overview, you can use print(model) instead of the summary function.
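
print(model) lists the layers but not the parameter counts. If those are the main thing needed, they can be computed with plain PyTorch. This is a minimal sketch: count_parameters is a hypothetical helper name, and the toy model is only there so the example runs without downloading BERT.

```python
from torch import nn


def count_parameters(model):
    """Return (total, trainable) parameter counts for a module."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable


# Toy model: embedding (100 * 8 weights) plus linear layer (8 * 2 weights + 2 biases).
toy = nn.Sequential(nn.Embedding(100, 8), nn.Linear(8, 2))

# Freeze the embedding so the two counts differ.
toy[0].weight.requires_grad_(False)

total, trainable = count_parameters(toy)
print(total, trainable)
```

The same helper can be called on the model loaded with from_pretrained. It does not report per-layer output shapes, which still require a forward pass.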


1
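First, print(model) does not contain the information that summary provides. Is your suggested solution to modify the PyTorch library code? - James Hirschorn
Is this a confirmed bug, or is that your own assessment? If I understand correctly, it still needs to be reviewed by the PyTorch developers. - James Hirschorn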
1
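Yes, this is a bug in torchinfo. For other models that take float inputs, dtypes probably already works, but it looks like they have not tested it with HuggingFace models such as BERT. They may need to add some extra checks to resolve this error. By the way, I have reported this bug on their repository. - kkgarg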
1
Yes, we can track it there. - kkgarg
2
The edited answer looks like the right solution. Unfortunately, I cannot verify it because a new error appears: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument index in method wrapper_index_select). Still investigating... - James Hirschorn