Is one concatenated matrix multiplication faster than several separate matrix multiplications? If so, why?

Question


tensorflow matrix lstm pytorch gpu


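The definition of an LSTM cell involves 4 matrix multiplications with the input and 4 matrix multiplications with the output (the hidden state). We can simplify the expression by concatenating the 4 small matrices and using a single matrix multiplication (the resulting matrix is 4 times larger).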

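My question is: does this improve the efficiency of the matrix multiplication? If so, why? Is it because we can place the matrices in contiguous memory? Or is it only because the code becomes more concise?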

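Whether or not we concatenate the matrices, the number of terms we multiply does not change (so the complexity should not change). So I am wondering why we do this.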
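To make the premise concrete, here is a minimal NumPy sketch checking that one multiplication by the stacked matrix gives the same numbers as four separate multiplications, and that the multiply-add count is the same either way. The sizes and variable names are illustrative assumptions, not taken from any library. It only checks equivalence; it does not measure speed.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size = 3, 5  # illustrative sizes (assumption)

# Four separate gate matrices, each of shape (hidden_size, input_size).
W_i, W_f, W_g, W_o = (
    rng.standard_normal((hidden_size, input_size)) for _ in range(4)
)
x = rng.standard_normal(input_size)

# Variant 1: four small matrix-vector products, results joined afterwards.
separate = np.concatenate([W_i @ x, W_f @ x, W_g @ x, W_o @ x])

# Variant 2: one product with the row-wise concatenation,
# which has shape (4 * hidden_size, input_size).
W_cat = np.concatenate([W_i, W_f, W_g, W_o], axis=0)
fused = W_cat @ x

# Both variants produce the same vector of length 4 * hidden_size.
same_result = np.allclose(separate, fused)

# Multiply-add counts are identical, as the question argues.
macs_separate = 4 * (hidden_size * input_size)
macs_fused = (4 * hidden_size) * input_size
```

So any difference in running time would have to come from how the work is scheduled and laid out in memory, not from the amount of arithmetic.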

Here is an excerpt from the PyTorch documentation for torch.nn.LSTM(*args, **kwargs). W_ii, W_if, W_ig, W_io are concatenated.

weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size x input_size)

weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size x hidden_size)

bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)

bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
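As a rough illustration of how parameters with the shapes listed above can be used, the following NumPy sketch computes one LSTM step with two matrix multiplications and then splits the result into the four gates in the documented i|f|g|o order. The helper name `lstm_cell_step`, the sizes, and the random parameters are assumptions made for this example; this is not PyTorch's actual implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h, c, weight_ih, weight_hh, bias_ih, bias_hh):
    """One LSTM step with concatenated (i|f|g|o) parameters (hypothetical helper)."""
    # Two matrix-vector products yield all four gate pre-activations at once.
    gates = weight_ih @ x + bias_ih + weight_hh @ h + bias_hh  # (4*hidden_size,)
    # Split into four equal chunks in the documented order i, f, g, o.
    i, f, g, o = np.split(gates, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_next = f * c + i * g          # new cell state
    h_next = o * np.tanh(c_next)    # new hidden state
    return h_next, c_next

rng = np.random.default_rng(0)
input_size, hidden_size = 5, 3  # illustrative sizes (assumption)
weight_ih = rng.standard_normal((4 * hidden_size, input_size))
weight_hh = rng.standard_normal((4 * hidden_size, hidden_size))
bias_ih = rng.standard_normal(4 * hidden_size)
bias_hh = rng.standard_normal(4 * hidden_size)

x = rng.standard_normal(input_size)
h0 = np.zeros(hidden_size)
c0 = np.zeros(hidden_size)
h1, c1 = lstm_cell_step(x, h0, c0, weight_ih, weight_hh, bias_ih, bias_hh)
```

Splitting after the multiplication is what lets the four gates share a single weight matrix per input.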

- aerin

1 Answer


- Leo Lee · Answer 1

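The structure of the LSTM is not meant to improve multiplication efficiency; it is mostly meant to get around the vanishing/exploding gradient problem (https://stats.stackexchange.com/questions/185639/how-does-lstm-prevent-the-vanishing-gradient-problem). There is ongoing research on mitigating the effects of vanishing gradients, and GRU/LSTM cells with peepholes are a few of those attempts.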