Getting the output shape of a deconvolution layer using tf.nn.conv2d_transpose in TensorFlow

Question

neural-network tensorflow deep-learning conv-neural-network

5

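According to this paper, the output shape is N + H - 1, where N is the input height or width and H is the kernel height or width. This is the obvious inverse of convolution. This tutorial gives a formula for the output shape of a convolution: (W−F+2P)/S+1, where W is the input size, F is the filter size, P is the padding size, and S is the stride. But TensorFlow contains test cases such as the following: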

  strides = [1, 2, 2, 1]

  # Input, output: [batch, height, width, depth]
  x_shape = [2, 6, 4, 3]
  y_shape = [2, 12, 8, 2]

  # Filter: [kernel_height, kernel_width, output_depth, input_depth]
  f_shape = [3, 3, 2, 3]

So, using the formula (W−F+2P)/S+1 with y_shape, f_shape and x_shape, we can solve for the padding size P. From (12 - 3 + 2P) / 2 + 1 = 6 we get P = 0.5, which is not an integer. How, then, does deconvolution work in TensorFlow?
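
The arithmetic above can be checked with a minimal Python sketch. The helper name `solve_padding` is hypothetical (it is not a TensorFlow function), and the sketch assumes the symmetric padding P that the tutorial formula uses.

```python
from fractions import Fraction

def solve_padding(w, f, s, out):
    """Solve (w - f + 2p) / s + 1 = out for p (same padding on both sides)."""
    # Rearranged: p = ((out - 1) * s - w + f) / 2
    return Fraction((out - 1) * s - w + f, 2)

# Height: W = 12 (from y_shape), F = 3, S = 2, convolution output = 6 (from x_shape)
p_height = solve_padding(12, 3, 2, 6)
# Width: W = 8 (from y_shape), F = 3, S = 2, convolution output = 4 (from x_shape)
p_width = solve_padding(8, 3, 2, 4)

print(p_height, p_width)  # 1/2 1/2
```

Both dimensions give P = 1/2, so no symmetric integer padding satisfies the formula for these shapes.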

- Xiuyi Yang

3 Answers

3

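The output-size formula in this tutorial assumes that the padding P is the same before and after the image (left and right, or top and bottom). The number of positions in which you place the kernel is then: W (size of the image) - F (size of the kernel) + P (padding added before) + P (padding added after).

But TensorFlow also handles the case where you need to pad more pixels on one side than on the other so that the kernel fits correctly. You can read more about the strategies for choosing the padding ("SAME" and "VALID") in the docs. The test you are talking about uses the "VALID" method.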

- sygi

1

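This discussion is very helpful. Just adding some extra information: padding='SAME' can also give the bottom and right sides one extra padded pixel, according to the TensorFlow documentation and the test case below.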

strides = [1, 2, 2, 1]
# Input, output: [batch, height, width, depth]
x_shape = [2, 6, 4, 3]
y_shape = [2, 12, 8, 2]

# Filter: [kernel_height, kernel_width, output_depth, input_depth]
f_shape = [3, 3, 2, 3]

This test uses padding='SAME'. We can interpret padding='SAME' as:

(W−F+pad_along_height)/S+1 = out_height,
(W−F+pad_along_width)/S+1 = out_width.

So from (12-3+pad_along_height)/2+1=6 we get pad_along_height=1. Then pad_top=pad_along_height/2=1/2=0 (integer division) and pad_bottom=pad_along_height-pad_top=1.
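
The same calculation can be sketched in plain Python, following the 'SAME' rule as described in the TensorFlow documentation for convolution. The function name `same_padding` is hypothetical and exists only for this illustration.

```python
import math

def same_padding(in_size, filter_size, stride):
    """Return (out_size, pad_before, pad_after) for padding='SAME' along one dimension."""
    out_size = math.ceil(in_size / stride)
    # Total padding needed so the last kernel position still fits
    pad_along = max((out_size - 1) * stride + filter_size - in_size, 0)
    pad_before = pad_along // 2          # integer division: the smaller half goes first
    pad_after = pad_along - pad_before   # any extra pixel goes to the bottom/right
    return out_size, pad_before, pad_after

# Height: in = 12, filter = 3, stride = 2
print(same_padding(12, 3, 2))  # (6, 0, 1)
# Width: in = 8, filter = 3, stride = 2
print(same_padding(8, 3, 2))   # (4, 0, 1)
```

For both dimensions the single padding pixel lands on the bottom or right side, which is why the symmetric formula yields P = 0.5.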

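As for padding='VALID', as the name suggests, padding is used only when it is appropriate. We first assume that the number of padded pixels is 0; if that does not work, we add zero padding for any value outside the original input image region. For example, consider the test case below: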

strides = [1, 2, 2, 1]

# Input, output: [batch, height, width, depth]
x_shape = [2, 6, 4, 3]
y_shape = [2, 13, 9, 2]

# Filter: [kernel_height, kernel_width, output_depth, input_depth]
f_shape = [3, 3, 2, 3]

The output shape of conv2d is:

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
           = ceil(float(13 - 3 + 1) / float(2)) = ceil(11/2) = 6
           = (W−F)/S + 1.

Since (W-F)/S+1 = (13-3)/2+1 = 6 is an integer, we do not need to add zero pixels around the image border, and pad_top=1/2 and pad_left=1/2 from the padding='VALID' section of the TensorFlow documentation are both 0.
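
As a quick check of the 'VALID' case, here is a minimal Python sketch of the conv2d output-size formula quoted above. The function name `valid_output_size` is hypothetical.

```python
import math

def valid_output_size(in_size, filter_size, stride):
    """conv2d output size along one dimension for padding='VALID' (no padding added)."""
    return math.ceil((in_size - filter_size + 1) / stride)

# y_shape = [2, 13, 9, 2], f_shape = [3, 3, 2, 3], strides = [1, 2, 2, 1]
out_height = valid_output_size(13, 3, 2)  # ceil(11 / 2)
out_width = valid_output_size(9, 3, 2)    # ceil(7 / 2)
print(out_height, out_width)  # 6 4
```

The result matches the height and width in x_shape = [2, 6, 4, 3].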

- Hui Xu

2

The answers deal with tf.nn.conv2d, so how does the padding mode work for tf.nn.conv2d_transpose? tf.nn.conv2d_transpose makes the output tensor larger than the input. - gaussclb


- Vinoj John Hosan · Accepted Answer

For deconvolution,

output_size = strides * (input_size-1) + kernel_size - 2*padding

strides, input_size, kernel_size and padding are all integers.

For 'valid', padding is zero.
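
A minimal Python sketch of this formula, applied to the 'VALID' test case from the thread. The function name `conv_transpose_output_size` is hypothetical; it only evaluates the formula and does not call TensorFlow.

```python
def conv_transpose_output_size(input_size, kernel_size, stride, padding=0):
    """output_size = strides * (input_size - 1) + kernel_size - 2 * padding"""
    return stride * (input_size - 1) + kernel_size - 2 * padding

# x_shape = [2, 6, 4, 3], f_shape = [3, 3, 2, 3], strides = [1, 2, 2, 1]
out_height = conv_transpose_output_size(6, 3, 2)  # 2 * 5 + 3
out_width = conv_transpose_output_size(4, 3, 2)   # 2 * 3 + 3
print(out_height, out_width)  # 13 9
```

With padding set to zero, this reproduces y_shape = [2, 13, 9, 2] from the 'VALID' test case.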