Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks

Question

Tags: machine-learning, deep-learning, signal-processing, conv-neural-network, convolution

198

Can anyone clearly explain the difference between 1D, 2D, and 3D convolutions in convolutional neural networks (in deep learning), with examples?

- xlax

3

I'm voting to close this question because machine learning (ML) theory questions are off-topic on Stack Overflow; it is a candidate for migration to Cross-Validated. - Daniel F

4 Answers

23

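Following the answer from @runhani, I am adding a few more details to make the explanation clearer, and I will try to explain this in more depth (with examples from TF1 and TF2, of course).

The main additional bits I'm including are:

Emphasis on applications
Usage of tf.Variable
Clearer explanation of inputs/kernels/outputs for 1D/2D/3D convolution
The effects of stride/padding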

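1D Convolution

Here's how you might do 1D convolution using TF 1 and TF 2.

To be specific, my data has the following shapes:

1D vector - [batch size, width, in channels] (e.g. 1, 5, 1)
Kernel - [width, in channels, out channels] (e.g. 5, 1, 4)
Output - [batch size, width, out channels] (e.g. 1, 5, 4)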
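
To make the shape bookkeeping above concrete, here is a minimal NumPy sketch of what such a 1D convolution computes. It assumes stride 1 and zero 'SAME' padding, and the helper name `conv1d_same_sketch` is hypothetical (it is not a TensorFlow function). As is usual in deep learning, the kernel is not flipped.

```python
import numpy as np

def conv1d_same_sketch(x, kernel):
    """Naive 1D convolution with stride 1 and zero 'SAME' padding.

    x:      [batch, width, in_channels]
    kernel: [kernel_width, in_channels, out_channels]
    returns [batch, width, out_channels]
    """
    batch, width, _ = x.shape
    k, _, out_channels = kernel.shape
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    # Pad only the width axis with zeros.
    padded = np.pad(x, ((0, 0), (pad_left, pad_right), (0, 0)))
    out = np.zeros((batch, width, out_channels))
    for i in range(width):
        window = padded[:, i:i + k, :]  # [batch, k, in_channels]
        # Sum over the kernel width and the input channels.
        out[:, i, :] = np.tensordot(window, kernel, axes=([1, 2], [0, 1]))
    return out

x = np.arange(5, dtype=np.float64).reshape(1, 5, 1)  # [1, 5, 1]
kernel = np.ones((5, 1, 4))                          # [5, 1, 4]
out = conv1d_same_sketch(x, kernel)
print(out.shape)     # (1, 5, 4)
print(out[0, :, 0])  # [ 3.  6. 10. 10.  9.]
```

The all-ones kernel is only an illustrative choice; with it, each output value is the sum of the (zero-padded) window around that position.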

TF1 example

import tensorflow as tf
import numpy as np

inp = tf.placeholder(shape=[None, 5, 1], dtype=tf.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME')

with tf.Session() as sess:
  tf.global_variables_initializer().run()
  print(sess.run(out, feed_dict={inp: np.array([[[0],[1],[2],[3],[4]],[[5],[4],[3],[2],[1]]])}))

TF2 example

import tensorflow as tf
import numpy as np

inp = np.array([[[0],[1],[2],[3],[4]],[[5],[4],[3],[2],[1]]]).astype(np.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME')
print(out)

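It takes far less work with TF2, since TF2 does not need a Session or a variable initializer, for example.

What might this look like in real life?

Let's understand what this is doing using a signal smoothing example. On the left is the original signal, and on the right is the output of a 1D convolution with 3 output channels.

What do multiple channels mean?

Multiple channels are basically multiple feature representations of an input. In this example, you have three representations obtained by three different filters. The first channel is the equally weighted smoothing filter. The second is a filter that weights the middle of the filter more than the boundaries. The final filter does the opposite of the second. So you can see how these different filters bring about different effects.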
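
As a rough illustration of the three smoothing filters described above, the sketch below applies three kernels to one toy signal and stacks the results as output channels. The signal and the exact filter weights are assumptions chosen for illustration; the answer does not give the values it used.

```python
import numpy as np

# Toy signal (assumed for illustration).
signal = np.array([0.0, 0.0, 4.0, 0.0, 0.0, 4.0, 4.0, 0.0])

# Three hypothetical smoothing filters, one per output channel.
filters = {
    "uniform": np.array([1 / 3, 1 / 3, 1 / 3]),  # equal weights
    "center_heavy": np.array([0.25, 0.5, 0.25]), # middle weighted more
    "edge_heavy": np.array([0.4, 0.2, 0.4]),     # boundaries weighted more
}

# One output channel per filter: shape [width, out_channels].
channels = np.stack(
    [np.convolve(signal, f, mode="same") for f in filters.values()],
    axis=-1,
)
print(channels.shape)  # (8, 3)
print(channels[2])     # response of each filter at the first spike
```

Each column of `channels` is a differently smoothed view of the same input, which is all that "multiple output channels" means here.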

Deep learning applications of 1D convolution

1D convolution has been used successfully for the sentence classification task.

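2D Convolution

On to 2D convolution. If you are a deep learning person, the chance that you haven't come across 2D convolution is, well, about zero. It is used in CNNs for image classification, object detection, etc., as well as in NLP problems that involve images (e.g. image caption generation).

Let's try an example. I have a convolution kernel with the following filters:

Edge detection kernel (3x3 window)
Blur kernel (3x3 window)
Sharpen kernel (3x3 window)

To be specific, my data has the following shapes:

Image (black and white) - [batch size, height, width, 1] (e.g. 1, 340, 371, 1)
Kernel (aka filters) - [height, width, in channels, out channels] (e.g. 3, 3, 1, 3)
Output (aka feature maps) - [batch size, height, width, out channels] (e.g. 1, 340, 371, 3)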
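
Before the TensorFlow versions, here is a small NumPy sketch of the same idea: three 3x3 filters (edge detection, blur, sharpen, with the same weights as the `kernel_init` used in this answer) applied to a single-channel image, giving one feature map per filter. It assumes stride 1 and zero 'SAME' padding; the helper name `conv2d_same_sketch` and the flat toy image are hypothetical.

```python
import numpy as np

def conv2d_same_sketch(image, kernels):
    """image: [height, width]; kernels: [kh, kw, out_channels].
    Returns [height, width, out_channels] (stride 1, zero 'SAME' padding)."""
    h, w = image.shape
    kh, kw, n_out = kernels.shape
    ph, pw = (kh - 1) // 2, (kw - 1) // 2
    padded = np.pad(image, ((ph, kh - 1 - ph), (pw, kw - 1 - pw)))
    out = np.zeros((h, w, n_out))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + kh, j:j + kw]  # [kh, kw]
            # Apply every filter to the same patch.
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1], [0, 1]))
    return out

edge = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=np.float64)
blur = np.full((3, 3), 1.0 / 9)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float64)
kernels = np.stack([edge, blur, sharpen], axis=-1)  # [3, 3, 3]

image = np.full((6, 7), 10.0)  # a flat grey toy image
features = conv2d_same_sketch(image, kernels)
print(features.shape)  # (6, 7, 3)
print(features[3, 3])  # interior pixel: approximately [0, 10, 10]
```

On a flat image the edge filter responds with zero in the interior, while blur and sharpen leave the value unchanged, which is a quick sanity check that the kernels behave as named.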

TF1 example

import tensorflow as tf
import numpy as np
from PIL import Image

im = np.array(Image.open(<some image>).convert('L'))#/255.0

kernel_init = np.array(
    [
     [[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
     [[[-1, 1.0/9, -1]],[[8, 1.0/9,5]],[[-1, 1.0/9,-1]]],
     [[[-1, 1.0/9,0]],[[-1, 1.0/9,-1]],[[-1, 1.0/9, 0]]]
     ])

image_height, image_width = im.shape  # height and width of the grayscale image
inp = tf.placeholder(shape=[None, image_height, image_width, 1], dtype=tf.float32)
kernel = tf.Variable(kernel_init, dtype=tf.float32)
out = tf.nn.conv2d(inp, kernel, strides=[1,1,1,1], padding='SAME')

with tf.Session() as sess:
  tf.global_variables_initializer().run()
  res = sess.run(out, feed_dict={inp: np.expand_dims(np.expand_dims(im,0),-1)})

TF2 example

import tensorflow as tf
import numpy as np
from PIL import Image

im = np.array(Image.open(<some image>).convert('L'))#/255.0
# Cast to float32 so the input dtype matches the float32 kernel
x = np.expand_dims(np.expand_dims(im,0),-1).astype(np.float32)

kernel_init = np.array(
    [
     [[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
     [[[-1, 1.0/9, -1]],[[8, 1.0/9,5]],[[-1, 1.0/9,-1]]],
     [[[-1, 1.0/9,0]],[[-1, 1.0/9,-1]],[[-1, 1.0/9, 0]]]
     ])

kernel = tf.Variable(kernel_init, dtype=tf.float32)

out = tf.nn.conv2d(x, kernel, strides=[1,1,1,1], padding='SAME')

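What might this look like in real life?

Here you can see the output produced by the above code. The first image is the original, and going clockwise you have the outputs of the 1st filter, 2nd filter, and 3rd filter.

What do multiple channels mean?

Multiple channels are much easier to understand in the context of 2D convolution. Say you are doing face recognition. You can think of each filter as representing an eye, a mouth, a nose, etc. (this is a very unrealistic simplification, but it gets the point across). Each feature map would then be a binary representation of whether that feature is present in the image you provided. I don't think I need to stress that, for a face recognition model, those are very valuable features. More information in this article.

This is an illustration of what I'm trying to articulate.

Deep learning applications of 2D convolution

2D convolution is very prevalent in the realm of deep learning.

CNNs (Convolutional Neural Networks) use the 2D convolution operation for almost all computer vision tasks (e.g. image classification, object detection, video classification).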

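3D Convolution

It becomes increasingly difficult to illustrate what's going on as the number of dimensions increases. But with a good understanding of how 1D and 2D convolution work, it is straightforward to generalize that understanding to 3D convolution. So here goes.

To be specific, my data has the following shapes:

3D data (LIDAR) - [batch size, height, width, depth, in channels] (e.g. 1, 200, 200, 200, 1)
Kernel - [height, width, depth, in channels, out channels] (e.g. 5, 5, 5, 1, 3)
Output - [batch size, height, width, depth, out channels] (e.g. 1, 200, 200, 200, 3)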
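
The same shape logic can be checked on a small volume with a naive NumPy sketch. It assumes stride 1 and zero 'SAME' padding; the helper name `conv3d_same_sketch` is hypothetical, and the 6x6x6 volume and all-ones weights are chosen only to keep the loop fast and the numbers easy to verify.

```python
import numpy as np

def conv3d_same_sketch(x, kernel):
    """x:      [batch, height, width, depth, in_channels]
    kernel: [kh, kw, kd, in_channels, out_channels]
    Returns [batch, height, width, depth, out_channels]."""
    b, h, w, d, _ = x.shape
    kh, kw, kd, _, n_out = kernel.shape
    # Zero-pad only the three spatial axes.
    spatial_pads = [((k - 1) // 2, k - 1 - (k - 1) // 2) for k in (kh, kw, kd)]
    padded = np.pad(x, [(0, 0)] + spatial_pads + [(0, 0)])
    out = np.zeros((b, h, w, d, n_out))
    for i in range(h):
        for j in range(w):
            for l in range(d):
                block = padded[:, i:i + kh, j:j + kw, l:l + kd, :]
                out[:, i, j, l, :] = np.tensordot(
                    block, kernel, axes=([1, 2, 3, 4], [0, 1, 2, 3])
                )
    return out

x = np.ones((1, 6, 6, 6, 1))
kernel = np.ones((3, 3, 3, 1, 2))
out = conv3d_same_sketch(x, kernel)
print(out.shape)           # (1, 6, 6, 6, 2)
print(out[0, 3, 3, 3, 0])  # 27.0 (a full 3x3x3 window of ones)
print(out[0, 0, 0, 0, 0])  # 8.0  (corner: only 2x2x2 of the window is inside)
```

The kernel slides along height, width, and depth, so the output keeps all three spatial axes and only the channel axis changes size.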

TF1 example

import tensorflow as tf
import numpy as np

tf.reset_default_graph()

inp = tf.placeholder(shape=[None, 200, 200, 200, 1], dtype=tf.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5,5,5,1,3]), dtype=tf.float32)
out = tf.nn.conv3d(inp, kernel, strides=[1,1,1,1,1], padding='SAME')

with tf.Session() as sess:
  tf.global_variables_initializer().run()
  res = sess.run(out, feed_dict={inp: np.random.normal(size=(1,200,200,200,1))})

TF2 example

import tensorflow as tf
import numpy as np

# Cast to float32 so the input dtype matches the float32 kernel
x = np.random.normal(size=(1,200,200,200,1)).astype(np.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5,5,5,1,3]), dtype=tf.float32)
out = tf.nn.conv3d(x, kernel, strides=[1,1,1,1,1], padding='SAME')

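Deep learning applications of 3D convolution

3D convolution has been used when developing machine learning applications involving LIDAR (Light Detection and Ranging) data, which is three-dimensional in nature.

What... more jargon?: Stride and padding

Alright, you're nearly there, so hold on. Let's see what stride and padding are. They are quite intuitive if you think about them.

If you stride across a corridor, you get there faster, in fewer steps. But it also means that you observe less of your surroundings than if you walked across the room. Let's now reinforce our understanding with a pretty picture too, using 2D convolution.

Understanding stride

When you use tf.nn.conv2d, for example, you need to set the stride as a vector of 4 elements. There's no reason to be intimidated by this. It just contains the strides in the following order.

2D Convolution - [batch stride, height stride, width stride, channel stride]. Here, you just set the batch stride and channel stride to one (I've been implementing deep learning models for 5 years and have never had to set them to anything other than one). That leaves you only 2 strides to set.
3D Convolution - [batch stride, height stride, width stride, depth stride, channel stride]. Here you only need to worry about the height/width/depth strides.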
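
The effect of the spatial stride is easiest to see in one dimension. The sketch below is plain Python with no padding; the helper name `strided_conv1d` and the toy signal are hypothetical.

```python
def strided_conv1d(signal, kernel, stride=1):
    """1D convolution without padding; the window advances `stride` positions per step."""
    k = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]

signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 1]

print(strided_conv1d(signal, kernel, stride=1))  # [3, 5, 7, 9, 11, 13, 15]
print(strided_conv1d(signal, kernel, stride=2))  # [3, 7, 11, 15]
```

With stride 2 the window visits every other position, so the output is roughly half as long: fewer steps, but less of the input is observed at each scale.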

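Understanding padding

Now, you will notice that no matter how small your stride is (i.e. 1), an unavoidable dimension reduction happens during convolution (e.g. after convolving an image 4 units wide without padding, the width becomes 3). This is undesirable, especially when building deep convolutional neural networks. This is where padding comes to the rescue. There are two most commonly used padding types.

SAME and VALID

Below you can see the difference.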
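
The difference can also be expressed as output sizes. The sketch below follows the output-size rules that TensorFlow documents for 'SAME' and 'VALID' padding along one spatial dimension; the helper name `output_size` is hypothetical, and the kernel width of 2 in the first two calls is an assumption chosen to reproduce the "4 becomes 3" example.

```python
import math

def output_size(input_size, kernel_size, stride, padding):
    """Output length along one spatial dimension for 'SAME' or 'VALID' padding."""
    if padding == "SAME":
        # Zero padding is added so that only the stride reduces the size.
        return math.ceil(input_size / stride)
    if padding == "VALID":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(output_size(4, 2, 1, "VALID"))    # 3
print(output_size(4, 2, 1, "SAME"))     # 4
print(output_size(200, 5, 2, "SAME"))   # 100
print(output_size(200, 5, 2, "VALID"))  # 98
```

With 'SAME' padding and stride 1 the spatial size is preserved, which is why the TF examples in this answer return outputs with the same height and width as their inputs.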

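Final word: If you are very curious, you might be wondering: we just made a big deal about avoiding automatic dimension reduction, and now we are talking about different strides, which also reduce dimensions. But the best thing about stride is that you control when, where, and how the dimensions get reduced.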

- thushv89

5

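In summary, in a 1D CNN the kernel moves in 1 direction. The input and output data of a 1D CNN are 2-dimensional. It is mostly used on time-series data.

In a 2D CNN, the kernel moves in 2 directions. The input and output data of a 2D CNN are 3-dimensional. It is mostly used on image data.

In a 3D CNN, the kernel moves in 3 directions. The input and output data of a 3D CNN are 4-dimensional. It is mostly used on 3D image data (MRI, CT scans).

You can find more details here: https://medium.com/@xzz201920/conv1d-conv2d-and-conv3d-8a59182c4d6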

- zz x

1

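Perhaps it is important to mention that in CNN architectures, intermediate layers will often have 2D outputs even if the input is only 1D to begin with. - dmedine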

1

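CNN 1D, 2D, or 3D refers to the convolution direction, rather than the input or filter dimension.
For a 1-channel input, CNN2D is equivalent to CNN1D when the kernel length equals the input length. (1 convolution direction)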

- Jerry Liu


- runhani · Accepted Answer

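I want to explain this with pictures from C3D.

In a nutshell, the convolution direction and the output shape are what matter!

↑↑↑↑↑ 1D Convolutions - Basic ↑↑↑↑↑

just 1 direction (time axis) to calculate the convolution
input = [W], filter = [k], output = [W]
e.g. input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1] (away from the borders; with zero 'SAME' padding the two end values come out as 0.75)
the output shape is a 1D array
example: graph smoothing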
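
The smoothing example in the list above can be reproduced in a few lines of plain Python. This is a sketch assuming stride 1 and zero padding that keeps the output the same length as the input; the helper name `smooth_same` is hypothetical.

```python
def smooth_same(signal, kernel):
    """1D convolution, stride 1, zero-padded so the output length equals the input length."""
    k = len(kernel)
    left = (k - 1) // 2
    padded = [0.0] * left + list(signal) + [0.0] * (k - 1 - left)
    return [
        sum(padded[start + i] * kernel[i] for i in range(k))
        for start in range(len(signal))
    ]

print(smooth_same([1, 1, 1, 1, 1], [0.25, 0.5, 0.25]))
# [0.75, 1.0, 1.0, 1.0, 0.75]
```

The interior values stay at 1 because the filter weights sum to 1; the two end values drop to 0.75 because one tap of the filter falls on the zero padding there.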

tf.nn.conv1d code toy example

import tensorflow as tf
import numpy as np

sess = tf.Session()

ones_1d = np.ones(5)
weight_1d = np.ones(3)
strides_1d = 1

in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)

in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])

input_1d   = tf.reshape(in_1d, [1, in_width, 1])
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print(sess.run(output_1d))

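↑↑↑↑↑ 2D Convolutions - Basic ↑↑↑↑↑

2 directions (x, y) to calculate the convolution
the output shape is a 2D matrix
input = [W, H], filter = [k, k], output = [W, H]
example: Sobel edge filter

tf.nn.conv2d - toy example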

ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]

in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)

in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])

filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])

input_2d   = tf.reshape(in_2d, [1, in_height, in_width, 1])
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))

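↑↑↑↑↑ 3D Convolutions - Basic ↑↑↑↑↑

3 directions (x, y, z) to calculate the convolution
the output shape is a 3D volume
input = [W, H, L], filter = [k, k, d], output = [W, H, M]
d < L is important! It is what makes the output a volume
example: C3D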
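
The role of the condition d < L is easier to see without padding, where the output depth is M = L - d + 1. The NumPy sketch below assumes stride 1 and no padding (the TF toy example in this answer uses 'SAME' padding instead); the helper name `conv3d_valid_sketch` is hypothetical.

```python
import numpy as np

def conv3d_valid_sketch(volume, kernel):
    """volume: [W, H, L]; kernel: [k, k, d]. No padding, stride 1.
    Returns [W - k + 1, H - k + 1, L - d + 1]."""
    kw, kh, kd = kernel.shape
    ow, oh, od = (s - k + 1 for s, k in zip(volume.shape, kernel.shape))
    out = np.zeros((ow, oh, od))
    for i in range(ow):
        for j in range(oh):
            for l in range(od):
                out[i, j, l] = np.sum(volume[i:i + kw, j:j + kh, l:l + kd] * kernel)
    return out

volume = np.ones((5, 5, 5))
print(conv3d_valid_sketch(volume, np.ones((3, 3, 3))).shape)  # (3, 3, 3): d < L, a volume
print(conv3d_valid_sketch(volume, np.ones((3, 3, 5))).shape)  # (3, 3, 1): d == L, depth collapses
```

When the filter depth equals the input depth there is no room to slide along the third axis, so the result is effectively a 2D map. That is the situation in the "2D convolution with 3D input" case.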

tf.nn.conv3d - toy example

ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])

input_3d   = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
print(sess.run(output_3d))

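↑↑↑↑↑ 2D Convolutions with 3D input - LeNet, VGG, ..., ↑↑↑↑↑

even though the input is 3D, e.g. 224x224x3, 112x112x32
the output shape is not a 3D volume, but a 2D matrix
because the filter depth = L must match the input channels = L
2 directions (x, y) to calculate the convolution! Not 3D
input = [W, H, L], filter = [k, k, L], output = [W, H]
the output shape is a 2D matrix
what if we want to train N filters (N is the number of filters)?
then the output shape is (stacked 2D) 3D = 2D x N matrix.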
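
A naive NumPy sketch of this case: each filter spans all L input channels, so it can only slide in x and y, and N filters give N stacked 2D maps. It assumes stride 1 and zero 'SAME' padding; the helper name `conv2d_multichannel_sketch` is hypothetical, and the sizes mirror the toy code in this answer (5x5 input, 32 channels, 3x3 filters).

```python
import numpy as np

def conv2d_multichannel_sketch(x, filters):
    """x: [W, H, L]; filters: [k, k, L, N]. Returns [W, H, N] (stride 1, zero 'SAME' padding)."""
    w, h, _ = x.shape
    k, _, _, n = filters.shape
    p = (k - 1) // 2
    padded = np.pad(x, ((p, k - 1 - p), (p, k - 1 - p), (0, 0)))
    out = np.zeros((w, h, n))
    for i in range(w):
        for j in range(h):
            patch = padded[i:i + k, j:j + k, :]  # [k, k, L]
            # Each filter consumes the whole channel depth at once.
            out[i, j, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return out

x = np.ones((5, 5, 32))
one_filter = np.ones((3, 3, 32, 1))
many_filters = np.ones((3, 3, 32, 64))

single = conv2d_multichannel_sketch(x, one_filter)
stacked = conv2d_multichannel_sketch(x, many_filters)
print(single[..., 0].shape)  # (5, 5): one filter gives one 2D map
print(stacked.shape)         # (5, 5, 64): N filters give N stacked 2D maps
print(single[2, 2, 0])       # 288.0 = 3 * 3 * 32
```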

conv2d - LeNet, VGG, ... for 1 filter

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shape with in_channels
weight_3d = np.ones((3,3,in_channels)) 
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))

conv2d - LeNet, VGG, ... for N filters

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shape x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))

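↑↑↑↑↑ Bonus: 1x1 conv in CNN - GoogLeNet, ..., ↑↑↑↑↑

1x1 conv is confusing when you think of it as a 2D image filter like Sobel.
For 1x1 conv in a CNN, the input has a 3D shape, as in the picture above.
It calculates depth-wise filtering.
input = [W, H, L], filter = [1, 1, L], output = [W, H].
the stacked output shape is 3D = 2D x N matrix.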
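
One way to see what a 1x1 convolution does: at every pixel it mixes the L input channels into N output channels, which is just a matrix multiplication over the channel axis. The NumPy sketch below uses random values and sizes (5x5, L = 32, N = 64) as illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5, 32))      # [W, H, L]
weights = rng.normal(size=(32, 64))  # a [1, 1, L, N] kernel with the two 1x1 axes dropped

# 1x1 convolution over the whole image in one step.
out = x @ weights                    # [5, 5, 64]

# The same computation written pixel by pixel.
pixelwise = np.zeros((5, 5, 64))
for i in range(5):
    for j in range(5):
        pixelwise[i, j] = x[i, j] @ weights

print(out.shape)                     # (5, 5, 64)
print(np.allclose(out, pixelwise))   # True
```

No neighbouring pixels are involved, so a 1x1 convolution changes the number of channels without touching the spatial layout.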

tf.nn.conv2d - special case 1x1 conv

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shape x number of filters = 4D; its spatial size is 1x1 for a 1x1 conv
weight_4d = np.ones((1,1,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))

Animation (2D Conv with 3D inputs)

Original link: link
Author: Martin Görner
Twitter: @martin_gorner
Google+: plus.google.com/+MartinGorne

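Bonus: 1D Convolutions with 2D input

↑↑↑↑↑ 1D Convolutions with 1D input ↑↑↑↑↑

↑↑↑↑↑ 1D Convolutions with 2D input ↑↑↑↑↑

even though the input is 2D, e.g. 20x14
the output shape is not 2D, but a 1D matrix
because the filter height = L must match the input height = L
1 direction (x) to calculate the convolution! Not 2D
input = [W, L], filter = [k, L], output = [W]
the output shape is a 1D matrix
what if we want to train N filters (N is the number of filters)?
then the output shape is (stacked 1D) 2D = 1D x N matrix.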
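
The same argument in code, as a NumPy sketch: the filter covers the full second axis (L), so it can only slide along W. For brevity this version uses no padding, so the output length is W - k + 1 rather than W; the helper name `conv1d_over_2d_sketch` and the all-ones values are hypothetical.

```python
import numpy as np

def conv1d_over_2d_sketch(x, filters):
    """x: [W, L]; filters: [k, L, N]. Slides along W only, no padding.
    Returns [W - k + 1, N]."""
    w, _ = x.shape
    k, _, n = filters.shape
    out = np.zeros((w - k + 1, n))
    for i in range(w - k + 1):
        # Each filter consumes the whole L axis, so there is nothing to slide over there.
        out[i] = np.tensordot(x[i:i + k, :], filters, axes=([0, 1], [0, 1]))
    return out

x = np.ones((20, 14))
single = conv1d_over_2d_sketch(x, np.ones((3, 14, 1)))
stacked = conv1d_over_2d_sketch(x, np.ones((3, 14, 8)))
print(single[:, 0].shape)  # (18,): one filter gives a 1D output
print(stacked.shape)       # (18, 8): N filters give N stacked 1D outputs
print(single[0, 0])        # 42.0 = 3 * 14
```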

Bonus: C3D

in_channels = 32 # 3, 32, 64, 128, ... 
out_channels = 64 # 3, 32, 64, 128, ... 
ones_4d = np.ones((5,5,5,in_channels))
weight_5d = np.ones((3,3,3,in_channels,out_channels))
strides_3d = [1, 1, 1, 1, 1]

in_4d = tf.constant(ones_4d, dtype=tf.float32)
filter_5d = tf.constant(weight_5d, dtype=tf.float32)

in_width = int(in_4d.shape[0])
in_height = int(in_4d.shape[1])
in_depth = int(in_4d.shape[2])

filter_width = int(filter_5d.shape[0])
filter_height = int(filter_5d.shape[1])
filter_depth = int(filter_5d.shape[2])

input_4d   = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])

output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
print(sess.run(output_4d))

sess.close()
