Why does the output tensor of a Keras Conv1D layer not have the input dimension?

Question

8

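According to the Keras documentation (https://keras.io/layers/convolutional/), the output tensor of a Conv1D layer has shape (batch_size, new_steps, filters), while the input tensor has shape (batch_size, steps, input_dim). I don't understand how this can be, because it implies that if you pass a 1D input of length 8000 with batch_size = 1 and steps = 1 (I've heard that steps means the number of channels in the input), then the layer's output has shape (1, 1, X), where X is the number of filters in the Conv layer. But what happens to the input dimension? Since the X filters in the layer are applied across the entire input dimension, shouldn't one of the output dimensions be 8000 (or less, depending on padding), giving something like (1, 1, 8000, X)? I checked, and Conv2D layers behave in a way that makes more sense: their output_shape is (samples, filters, new_rows, new_cols), where new_rows and new_cols are the dimensions of the input image, adjusted for padding. If Conv2D layers preserve their input dimensions, why don't Conv1D layers? Am I missing something?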

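I want to visualize the activations of the 1D convolutional layers in my CNN, but most of the online tools I found seem to work only for 2D convolutional layers, so I decided to write my own code. I have a good understanding of how it should work. Here is the code I have so far: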

# all the model's activation layer output tensors
activation_output_tensors = [layer.output for layer in model.layers if type(layer) is keras.layers.Activation]

# make a function that computes activation layer outputs
activation_comp_function = K.function([model.input, K.learning_phase()], activation_output_tensors)

# 0 means learning phase = False (i.e. the model isn't learning right now)
activation_arrays = activation_comp_function([training_data[0,:-1], 0])

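This code is based on julienr's first comment in this thread, with some modifications for the current version of Keras. However, when I use it, all the activation arrays have shape (1, 1, X). I spent all of yesterday trying to figure out why, with no luck. Any help is greatly appreciated.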

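UPDATE: It turns out I misunderstood the meaning of the input dimension and the steps dimension. This was mostly because the architecture I used came from another group, who built their model in Mathematica. In Mathematica, passing an input shape of (X, Y) to a Conv1D layer means X "channels" (an input dimension of X) and Y steps. Thanks to gionni for helping me realize this and for explaining so well how the "input dimension" becomes the "filters" dimension.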
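To make the shape bookkeeping concrete, here is a minimal pure-Python sketch of how the documented Conv1D output shape follows from a channels-last input. The helper name `conv1d_output_shape` and the values `filters=16` and `kernel_size=3` are made up for illustration, and the sketch assumes no dilation.

```python
import math

def conv1d_output_shape(input_shape, filters, kernel_size, strides=1, padding="valid"):
    """Output shape of a Conv1D layer for a channels-last input (batch, steps, input_dim).

    Hypothetical helper for illustration; assumes dilation_rate = 1.
    """
    batch, steps, _input_dim = input_shape  # input_dim is consumed by the kernel
    if padding == "valid":
        new_steps = (steps - kernel_size) // strides + 1
    elif padding == "same":
        new_steps = math.ceil(steps / strides)
    else:
        raise ValueError(f"unsupported padding: {padding!r}")
    return (batch, new_steps, filters)

# A mono signal of length 8000: 8000 steps with 1 value per step.
print(conv1d_output_shape((1, 8000, 1), filters=16, kernel_size=3))  # (1, 7998, 16)

# The layout from the question: 1 step carrying 8000 values.
print(conv1d_output_shape((1, 1, 8000), filters=16, kernel_size=1))  # (1, 1, 16)
```

Under these assumptions, the 8000 samples only show up in the output when they sit on the steps axis. Reshaping a single sample to (1, 8000, 1) rather than (1, 1, 8000) is what keeps the length visible.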

- profPlum

2 Answers

0

Thanks, very useful.

Here is the same code adapted to a recent version of tensorflow + keras, stacking along axis 0 to build the 4D input.

# %%
from tensorflow.keras.layers import Conv1D, Conv2D
from tensorflow.keras.backend import eval
import tensorflow as tf
import numpy as np

# %%
# create a 3D input in BLC format (Batch, Layer, Channel); "Layer" is the steps axis
batch = 10
layers = 3
channels = 5
kernel = 2

val3D = np.random.randint(0, 100, size=(batch, layers, channels))
x = tf.Variable(val3D.astype('float32'))

# %%
# 1D convolution. Initialize the kernels to ones so that it's easier to compute the result by hand / compare
conv1d = Conv1D(filters=layers, kernel_size=kernel, kernel_initializer='ones')(x)

# %%
# 2D convolution that replicates the 1D one

# Conv2D expects 4D inputs, so stack three copies of the 3D input along a new axis 0.
# With `channels_last`, the new axis is the Conv2D batch, the old batch axis becomes
# the rows, and each row is convolved as an independent 1D sequence.
val4D = np.stack([val3D,val3D,val3D], axis=0)
x1 = tf.Variable(val4D.astype('float32'))

# %%
# 2D convolution. Set the first kernel dimension to 1 so that it replicates the Conv1D
conv2d = Conv2D(filters=layers, kernel_size=(1, kernel), kernel_initializer='ones')(x1)

# %%
# evaluate and print the outputs

print(eval(conv1d))
print('---------------------------------------------')
# display only one of the stacked copies
print(eval(conv2d)[0])

- Stephane

- gionni · Accepted Answer

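I used to have the same problem with 2D convolutions. The thing is that when you apply a convolutional layer, the kernel you apply is not of size (kernel_size, 1) but actually (kernel_size, input_dim).

If you think about it, if this weren't the case, a 1D convolutional layer with kernel_size = 1 would do nothing to the inputs it receives.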

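Instead, it computes a weighted sum of the input features at each time step, using the same weights for every time step (although each filter uses a different set of weights). I think it helps to picture input_dim as the number of channels in a 2D convolution of an image, where the same reasoning applies (in that case it is the channels that "get lost" and are turned into the number of filters).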
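The weighted-sum description can be sketched in NumPy. This is a naive illustration rather than how Keras implements the layer: the function name `conv1d_valid` is made up, and it assumes channels-last data, stride 1, "valid" padding and a kernel laid out as (kernel_size, input_dim, filters).

```python
import numpy as np

def conv1d_valid(x, kernel, bias=None):
    """Naive channels-last 1D convolution with stride 1 and 'valid' padding.

    x:      (batch, steps, input_dim)
    kernel: (kernel_size, input_dim, filters)
    returns (batch, new_steps, filters), with new_steps = steps - kernel_size + 1
    """
    batch, steps, input_dim = x.shape
    kernel_size, kernel_in, filters = kernel.shape
    assert kernel_in == input_dim, "kernel must span every input feature"
    new_steps = steps - kernel_size + 1
    out = np.zeros((batch, new_steps, filters), dtype=float)
    for t in range(new_steps):
        window = x[:, t:t + kernel_size, :]  # (batch, kernel_size, input_dim)
        # Sum over window positions and over input_dim, once per filter.
        out[:, t, :] = np.einsum("bkc,kcf->bf", window, kernel)
    if bias is not None:
        out += bias
    return out

# 4 steps with 5 features per step, values 5..24 laid out row by row.
channels, steps, filters = 5, 4, 3
x = np.array([list(range(i * channels, (i + 1) * channels))
              for i in range(1, steps + 1)], dtype=float)[np.newaxis]
kernel = np.ones((1, channels, filters))  # kernel_size=1, all weights set to one

y = conv1d_valid(x, kernel)
print(y.shape)     # (1, 4, 3)
print(y[0, :, 0])  # per-step sums of the five features: 35, 60, 85, 110
```

With all-ones weights, each filter simply adds up the five features of a step, so input_dim disappears from the output shape and only the steps and filters axes remain.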

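To convince yourself of this, you can reproduce the 1D convolution with a 2D convolution layer that uses kernel_size=(1D_kernel_size, input_dim) and the same number of filters. Here is an example: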

from keras.layers import Conv1D, Conv2D
import keras.backend as K
import numpy as np

# create an input with 4 steps and 5 channels/input_dim
channels = 5
steps = 4
filters = 3
val = np.array([list(range(i * channels, (i + 1) * channels)) for i in range(1, steps + 1)])
val = np.expand_dims(val, axis=0)
x = K.variable(value=val)

# 1D convolution. Initialize the kernels to ones so that it's easier to compute the result by hand

conv1d = Conv1D(filters=filters, kernel_size=1, kernel_initializer='ones')(x)

# 2D convolution that replicates the 1D one

# need to add a dimension to your input since conv2d expects 4D inputs. I add it at axis 3 (the last axis) since my keras is setup with `channel_last`
val1 = np.expand_dims(val, axis=3)
x1 = K.variable(value=val1)

conv2d = Conv2D(filters=filters, kernel_size=(1, 5), kernel_initializer='ones')(x1)

# evaluate and print the outputs

print(K.eval(conv1d))
print(K.eval(conv2d))

As I said, it took me a while to understand this as well, I think mainly because no tutorial explains it clearly.
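For readers without Keras at hand, the same equivalence can be checked in plain NumPy with random weights. This is only a sketch: the sizes and variable names are arbitrary, and it assumes stride 1, "valid" padding, a Conv1D kernel laid out as (kernel_size, input_dim, filters) and a Conv2D kernel laid out as (rows, cols, in_channels, filters).

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, input_dim, filters, kernel_size = 2, 6, 5, 3, 2

x = rng.normal(size=(batch, steps, input_dim))
w1d = rng.normal(size=(kernel_size, input_dim, filters))  # Conv1D kernel layout

# 1D view: slide along steps, contracting window positions and input_dim.
new_steps = steps - kernel_size + 1
y1d = np.stack(
    [np.einsum("bkc,kcf->bf", x[:, t:t + kernel_size, :], w1d)
     for t in range(new_steps)],
    axis=1,
)  # (batch, new_steps, filters)

# 2D view: treat (steps, input_dim) as a one-channel image and use a kernel
# of shape (kernel_size, input_dim, 1, filters) that covers the full width,
# so there is exactly one horizontal position.
x2d = x[..., np.newaxis]        # (batch, steps, input_dim, 1)
w2d = w1d[:, :, np.newaxis, :]  # (kernel_size, input_dim, 1, filters)
y2d = np.stack(
    [np.einsum("bhwc,hwcf->bf", x2d[:, t:t + kernel_size, :, :], w2d)
     for t in range(new_steps)],
    axis=1,
)[:, :, np.newaxis, :]          # (batch, new_steps, 1, filters)

print(y1d.shape, y2d.shape)                # (2, 5, 3) (2, 5, 1, 3)
print(np.allclose(y1d, y2d[:, :, 0, :]))   # True
```

The only difference between the two results is the extra width axis of size 1 in the 2D output, which is why Conv1D can drop it.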