Splitting an image tensor into small patches

Question


I have an image of shape (466, 394, 1) that I want to split into 7x7 patches.

image = tf.placeholder(dtype=tf.float32, shape=[1, 466, 394, 1])

Using

image_patches = tf.extract_image_patches(image, [1, 7, 7, 1], [1, 7, 7, 1], [1, 1, 1, 1], 'VALID')
# shape (1, 66, 56, 49)

image_patches_reshaped = tf.reshape(image_patches, [-1, 7, 7, 1])
# shape (3696, 7, 7, 1)

Unfortunately, this does not work in practice, because image_patches_reshaped mixes up the pixel order (if you view image_patches_reshaped, you only see noise).
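For reference, the shapes in the comments above follow from the usual VALID-padding arithmetic: no padding is added and partial patches at the border are dropped. A minimal sketch in plain Python; the helper name `valid_patch_grid` is made up for illustration and is not part of TensorFlow:

```python
def valid_patch_grid(height, width, patch, stride):
    """Rows and columns of patches under VALID padding (hypothetical helper)."""
    rows = (height - patch) // stride + 1
    cols = (width - patch) // stride + 1
    return rows, cols

rows, cols = valid_patch_grid(466, 394, patch=7, stride=7)
print(rows, cols, rows * cols)  # 66 56 3696
```

Only 462 of the 466 rows and 392 of the 394 columns are covered, so the last 4 rows and 2 columns do not appear in any patch.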

So my new approach is to use tf.split:

image_hsplits = tf.split(1, 4, image_resized)
# [<tf.Tensor 'split_255:0' shape=(462, 7, 1) dtype=float32>,...]

image_patches = []

for split in image_hsplits:
    image_patches.extend(tf.split(0, 66, split))

image_patches
# [<tf.Tensor 'split_317:0' shape=(7, 7, 1) dtype=float32>, ...]

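This does preserve the pixel order of the image, but unfortunately it creates a lot of ops, which is not very good.

How can I split an image into small patches with fewer ops?

Update 1:

I ported the answer from this question from NumPy to TensorFlow: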

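import math

def image_to_patches(image, image_height, image_width, patch_height, patch_width):
    # Round the image size up to the next multiple of the patch size.
    height = math.ceil(image_height/patch_height)*patch_height
    width = math.ceil(image_width/patch_width)*patch_width

    # Pad (or crop) to the rounded size, then drop the channel axis.
    image_resized = tf.squeeze(tf.image.resize_image_with_crop_or_pad(image, height, width))
    # Split each axis into (block index, offset inside the block).
    image_reshaped = tf.reshape(image_resized, [height // patch_height, patch_height, -1, patch_width])
    # Bring both block indices to the front, then flatten them into one patch axis.
    image_transposed = tf.transpose(image_reshaped, [0, 2, 1, 3])
    return tf.reshape(image_transposed, [-1, patch_height, patch_width, 1])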

But I think there is still room for improvement.
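The reshape/transpose idea in Update 1 can be checked in plain NumPy, where the values are easy to inspect. This is a sketch under the assumption that both image dimensions are already multiples of the patch size; the function name `image_to_patches_np` is illustrative only:

```python
import numpy as np

def image_to_patches_np(image, patch_height, patch_width):
    """Split a 2-D array into non-overlapping patches, row-major over the patch grid."""
    height, width = image.shape
    rows, cols = height // patch_height, width // patch_width
    # (rows, patch_height, cols, patch_width): each axis becomes (block index, offset)
    blocks = image.reshape(rows, patch_height, cols, patch_width)
    # Move both block indices to the front: (rows, cols, patch_height, patch_width)
    blocks = blocks.transpose(0, 2, 1, 3)
    return blocks.reshape(-1, patch_height, patch_width)

image = np.arange(14 * 14).reshape(14, 14)
patches = image_to_patches_np(image, 7, 7)
print(patches.shape)                                  # (4, 7, 7)
print(np.array_equal(patches[1], image[0:7, 7:14]))   # True
```

Patch 1 is the top-right 7x7 block, so the patches come out in row-major order over the grid.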

Update 2:

This converts the patches back into the original image.

def patches_to_image(patches, image_height, image_width, patch_height, patch_width):
    # Size of the padded image the patches were cut from.
    height = math.ceil(image_height/patch_height)*patch_height
    width = math.ceil(image_width/patch_width)*patch_width

    # Arrange the patches on a (rows, cols) grid.
    image_reshaped = tf.reshape(tf.squeeze(patches), [height // patch_height, width // patch_width, patch_height, patch_width])
    # Interleave block index and in-block offset again for each axis.
    image_transposed = tf.transpose(image_reshaped, [0, 2, 1, 3])
    image_resized = tf.reshape(image_transposed, [height, width, 1])
    # Crop (or pad) back to the original size.
    return tf.image.resize_image_with_crop_or_pad(image_resized, image_height, image_width)
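A quick way to gain confidence in the inverse is a round trip in NumPy. The sketch below assumes the image has already been brought to a multiple of the patch size (462 x 392 here), and the names `to_patches` and `from_patches` are illustrative:

```python
import numpy as np

def to_patches(image, ph, pw):
    rows, cols = image.shape[0] // ph, image.shape[1] // pw
    return image.reshape(rows, ph, cols, pw).transpose(0, 2, 1, 3).reshape(-1, ph, pw)

def from_patches(patches, height, width):
    ph, pw = patches.shape[1:]
    rows, cols = height // ph, width // pw
    # Put the patches back on a (rows, cols) grid, then undo the transpose.
    grid = patches.reshape(rows, cols, ph, pw)
    return grid.transpose(0, 2, 1, 3).reshape(height, width)

image = np.arange(462 * 392).reshape(462, 392)
restored = from_patches(to_patches(image, 7, 7), 462, 392)
print(np.array_equal(image, restored))  # True
```

The permutation (0, 2, 1, 3) swaps the two middle axes, so applying it twice gives back the original layout.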

- bodokaiser

1 Answer

Content provided by Stack Overflow.

- saeta · Accepted Answer

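I think your problem lies elsewhere. I wrote the following code (using a smaller 14x14 image so that all values can be checked by hand) and confirmed that your initial code does the right thing: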

import tensorflow as tf
import numpy as np

IMAGE_SIZE = [1, 14, 14, 1]
PATCH_SIZE = [1, 7, 7, 1]

# Fill the image with 0..195 so that every value encodes its pixel position.
input_image = np.reshape(np.array(range(14*14)), IMAGE_SIZE)
image = tf.placeholder(dtype=tf.int32, shape=IMAGE_SIZE)
# Non-overlapping 7x7 patches: the strides equal the patch size.
image_patches = tf.extract_image_patches(
    image, PATCH_SIZE, PATCH_SIZE, [1, 1, 1, 1], 'VALID')
image_patches_reshaped = tf.reshape(image_patches, [-1, 7, 7, 1])

sess = tf.Session()

(output, output_reshaped) = sess.run(
    (image_patches, image_patches_reshaped),
    feed_dict={image: input_image})

print("Output (shape: %s):" % (output.shape,))
print(output)

print("Reshaped (shape: %s):" % (output_reshaped.shape,))
print(output_reshaped)

This outputs:

python resize.py 
Output (shape: (1, 2, 2, 49)):
[[[[  0   1   2   3   4   5   6  14  15  16  17  18  19  20  28  29  30  31
     32  33  34  42  43  44  45  46  47  48  56  57  58  59  60  61  62  70
     71  72  73  74  75  76  84  85  86  87  88  89  90]
   [  7   8   9  10  11  12  13  21  22  23  24  25  26  27  35  36  37  38
     39  40  41  49  50  51  52  53  54  55  63  64  65  66  67  68  69  77
     78  79  80  81  82  83  91  92  93  94  95  96  97]]

  [[ 98  99 100 101 102 103 104 112 113 114 115 116 117 118 126 127 128 129
    130 131 132 140 141 142 143 144 145 146 154 155 156 157 158 159 160 168
    169 170 171 172 173 174 182 183 184 185 186 187 188]
   [105 106 107 108 109 110 111 119 120 121 122 123 124 125 133 134 135 136
    137 138 139 147 148 149 150 151 152 153 161 162 163 164 165 166 167 175
    176 177 178 179 180 181 189 190 191 192 193 194 195]]]]
Reshaped (shape: (4, 7, 7, 1)):
[[[[  0]
   [  1]
   [  2]
   [  3]
   [  4]
   [  5]
   [  6]]

  [[ 14]
   [ 15]
   [ 16]
   [ 17]
   [ 18]
   [ 19]
   [ 20]]

  [[ 28]
   [ 29]
   [ 30]
   [ 31]
   [ 32]
   [ 33]
   [ 34]]

  [[ 42]
   [ 43]
   [ 44]
   [ 45]
   [ 46]
   [ 47]
   [ 48]]

  [[ 56]
   [ 57]
   [ 58]
   [ 59]
   [ 60]
   [ 61]
   [ 62]]

  [[ 70]
   [ 71]
   [ 72]
   [ 73]
   [ 74]
   [ 75]
   [ 76]]

  [[ 84]
   [ 85]
   [ 86]
   [ 87]
   [ 88]
   [ 89]
   [ 90]]]


 [[[  7]
   [  8]
   [  9]
   [ 10]
   [ 11]
   [ 12]
   [ 13]]

  [[ 21]
   [ 22]
   [ 23]
   [ 24]
   [ 25]
   [ 26]
   [ 27]]

  [[ 35]
   [ 36]
   [ 37]
   [ 38]
   [ 39]
   [ 40]
   [ 41]]

  [[ 49]
   [ 50]
   [ 51]
   [ 52]
   [ 53]
   [ 54]
   [ 55]]

  [[ 63]
   [ 64]
   [ 65]
   [ 66]
   [ 67]
   [ 68]
   [ 69]]

  [[ 77]
   [ 78]
   [ 79]
   [ 80]
   [ 81]
   [ 82]
   [ 83]]

  [[ 91]
   [ 92]
   [ 93]
   [ 94]
   [ 95]
   [ 96]
   [ 97]]]


 [[[ 98]
   [ 99]
   [100]
   [101]
   [102]
   [103]
   [104]]

  [[112]
   [113]
   [114]
   [115]
   [116]
   [117]
   [118]]

  [[126]
   [127]
   [128]
   [129]
   [130]
   [131]
   [132]]

  [[140]
   [141]
   [142]
   [143]
   [144]
   [145]
   [146]]

  [[154]
   [155]
   [156]
   [157]
   [158]
   [159]
   [160]]

  [[168]
   [169]
   [170]
   [171]
   [172]
   [173]
   [174]]

  [[182]
   [183]
   [184]
   [185]
   [186]
   [187]
   [188]]]


 [[[105]
   [106]
   [107]
   [108]
   [109]
   [110]
   [111]]

  [[119]
   [120]
   [121]
   [122]
   [123]
   [124]
   [125]]

  [[133]
   [134]
   [135]
   [136]
   [137]
   [138]
   [139]]

  [[147]
   [148]
   [149]
   [150]
   [151]
   [152]
   [153]]

  [[161]
   [162]
   [163]
   [164]
   [165]
   [166]
   [167]]

  [[175]
   [176]
   [177]
   [178]
   [179]
   [180]
   [181]]

  [[189]
   [190]
   [191]
   [192]
   [193]
   [194]
   [195]]]]

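From the reshaped output you can see that it is a 4x7x7x1 tensor whose first patch holds the values [0-7), [14-21), [28-35), [42-49), [56-63), [70-77), and [84-91), which correspond to the top-left 7x7 grid. Perhaps you could explain further what happens when it does not work properly?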
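The ordering can also be cross-checked without TensorFlow. The sketch below emulates the (1, 2, 2, 49) layout in NumPy, assuming the depth axis flattens each patch row by row, as the printed output suggests. It covers only this single-channel, non-overlapping case:

```python
import numpy as np

image = np.arange(14 * 14).reshape(1, 14, 14, 1)

# Emulate the (batch, rows, cols, patch_h * patch_w) layout for
# non-overlapping 7x7 patches of a single-channel image.
blocks = image.reshape(1, 2, 7, 2, 7)  # (batch, row, dy, col, dx)
flat = blocks.transpose(0, 1, 3, 2, 4).reshape(1, 2, 2, 49)

# Same reshape as in the question.
reshaped = flat.reshape(-1, 7, 7, 1)

print(flat[0, 0, 0, :8])  # [ 0  1  2  3  4  5  6 14]
print(np.array_equal(reshaped[0, :, :, 0], image[0, 0:7, 0:7, 0]))    # True
print(np.array_equal(reshaped[3, :, :, 0], image[0, 7:14, 7:14, 0]))  # True
```

Under that assumption, reshaping the 49-element depth axis to 7x7 recovers each patch with its pixels in the original order.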