TensorFlow Records with float NumPy arrays

Question

24

I want to create TensorFlow Records to feed my model. So far, I use the following code to store uint8 numpy arrays in TFRecord format:

def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def _bytes_feature(value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _floats_feature(value):
  return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


def convert_to_record(name, image, label, map):
    filename = os.path.join(params.TRAINING_RECORDS_DATA_DIR, name + '.' + params.DATA_EXT)

    writer = tf.python_io.TFRecordWriter(filename)

    image_raw = image.tostring()
    map_raw   = map.tostring()
    label_raw = label.tostring()

    example = tf.train.Example(features=tf.train.Features(feature={
        'image_raw': _bytes_feature(image_raw),
        'map_raw': _bytes_feature(map_raw),
        'label_raw': _bytes_feature(label_raw)
    }))        
    writer.write(example.SerializeToString())
    writer.close()

I read it back using this sample code:

features = tf.parse_single_example(example, features={
  'image_raw': tf.FixedLenFeature([], tf.string),
  'map_raw': tf.FixedLenFeature([], tf.string),
  'label_raw': tf.FixedLenFeature([], tf.string),
})

image = tf.decode_raw(features['image_raw'], tf.uint8)
image.set_shape(params.IMAGE_HEIGHT*params.IMAGE_WIDTH*3)
image = tf.reshape(image, (params.IMAGE_HEIGHT,params.IMAGE_WIDTH,3))

map = tf.decode_raw(features['map_raw'], tf.uint8)
map.set_shape(params.MAP_HEIGHT*params.MAP_WIDTH*params.MAP_DEPTH)
map = tf.reshape(map, (params.MAP_HEIGHT,params.MAP_WIDTH,params.MAP_DEPTH))

label = tf.decode_raw(features['label_raw'], tf.uint8)
label.set_shape(params.NUM_CLASSES)

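So far this works fine. Now I want to do the same thing with my array "map", but it is a float numpy array instead of uint8, and I could not find an example of how to do it. I tried the function _floats_feature, which works if I pass a scalar to it, but not with arrays. With uint8, the data can be serialized with the tostring() method.

How can I serialize a float numpy array, and how can I read it back?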

- bfra

6 Answers

5

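I will expand on Yaroslav's answer.

Int64List, BytesList and FloatList expect an iterator of the underlying elements (they are repeated fields). In your case you can use a list as the iterator.

You mentioned: "it works if I pass a scalar, but not with arrays". That is expected: when you pass a scalar, your _floats_feature wraps it in a list with a single float element, which is what FloatList wants. But when you pass an array, you create a list of arrays and pass it to a function that expects a list of floats.

So just remove the list construction from your function: float_list=tf.train.FloatList(value=value)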
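If one helper should accept both scalars and arrays, a minimal sketch could look like the following. The name floats_feature is hypothetical, and the code assumes the TensorFlow 2.x package layout. The call to tolist() is an extra precaution so that FloatList only ever receives plain Python floats.

```python
import numpy as np
import tensorflow as tf


def floats_feature(value):
    """Wrap a float scalar or a float array of any shape in a tf.train.Feature.

    Hypothetical helper, not part of TensorFlow.
    """
    # np.asarray turns scalars, lists and ndarrays into an ndarray;
    # reshape(-1) flattens it to 1-D and tolist() yields plain Python floats.
    flat = np.asarray(value, dtype=np.float64).reshape(-1).tolist()
    return tf.train.Feature(float_list=tf.train.FloatList(value=flat))


scalar_feature = floats_feature(0.5)
array_feature = floats_feature(np.ones((2, 3)))

# Number of floats stored in each feature.
print(len(scalar_feature.float_list.value))
print(len(array_feature.float_list.value))
```

The original shape is lost in this step, so the reader has to know it or it has to be stored separately.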

- Salvador Dali

3

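I stumbled across this while working on a similar problem. Since part of the original question was how to read the float32 feature back from tfrecords, I will leave this here in case it helps:

If a map of shape [x, y, z] was passed to _floats_feature via map.ravel(), it can be read back like this: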

features = {
    ...
    'map': tf.FixedLenFeature([x, y, z], dtype=tf.float32)
    ...
}
parsed_example = tf.parse_single_example(serialized=serialized, features=features)
map = parsed_example['map']
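To check that the flattened floats really come back in the original layout, here is a small in-memory round trip. It is only a sketch: X, Y and Z are hypothetical stand-ins for x, y and z, and the calls use the TensorFlow 2.x names (tf.io.parse_single_example, tf.io.FixedLenFeature) in eager mode rather than the TF 1.x names used above.

```python
import numpy as np
import tensorflow as tf

# Hypothetical dimensions standing in for x, y, z.
X, Y, Z = 2, 3, 4
original = np.arange(X * Y * Z, dtype=np.float32).reshape(X, Y, Z)

# Write side: flatten the array and store it as a FloatList.
example = tf.train.Example(features=tf.train.Features(feature={
    'map': tf.train.Feature(
        float_list=tf.train.FloatList(value=original.ravel().tolist())),
}))
serialized = example.SerializeToString()

# Read side: FixedLenFeature reshapes the flat floats to [X, Y, Z].
parsed = tf.io.parse_single_example(
    serialized,
    features={'map': tf.io.FixedLenFeature([X, Y, Z], dtype=tf.float32)})
restored = parsed['map'].numpy()

print(restored.shape)
print(np.array_equal(restored, original))
```

This only works when every record has the same, known shape, because the shape is part of the parsing spec and not of the record.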

- prouast

1

Yaroslav's example failed when the input was an nd array:

numpy_arr = np.ones((3,3)).astype(np.float64)

I found that it works when I use numpy_arr.ravel() as the input. But is there a better way?
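One possible alternative, mirroring the bytes approach the question already uses for uint8, is to store the raw float32 buffer in a BytesList and decode it with tf.float32 on the read side. The sketch below assumes TensorFlow 2.x in eager mode, a little-endian machine (the default assumed by tf.io.decode_raw), and a hypothetical feature key 'map_raw'. tobytes() is the current NumPy name for tostring().

```python
import numpy as np
import tensorflow as tf

original = np.arange(9, dtype=np.float32).reshape(3, 3)

# Write side: store the raw float32 buffer as a single bytes value.
example = tf.train.Example(features=tf.train.Features(feature={
    'map_raw': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[original.tobytes()])),
}))
serialized = example.SerializeToString()

# Read side: parse the bytes, decode them as float32, restore the shape.
parsed = tf.io.parse_single_example(
    serialized, features={'map_raw': tf.io.FixedLenFeature([], tf.string)})
flat = tf.io.decode_raw(parsed['map_raw'], tf.float32)
restored = tf.reshape(flat, original.shape).numpy()

print(np.array_equal(restored, original))
```

The dtype passed to decode_raw has to match the dtype of the array that was serialized. A float64 array decoded as tf.float32 would produce twice as many meaningless values.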

- Tsuan

1

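Yaroslav mentioned that you need a list of floats. numpy_arr is not a list, so you need to flatten it first and then restore its shape before passing it to the model. - bantmen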
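When the shape is not the same for every record, one way to restore it is to store the shape next to the flattened values. This is a sketch under assumptions: the feature keys 'map' and 'map_shape' are hypothetical, the rank is fixed at 3, and the code uses TensorFlow 2.x in eager mode.

```python
import numpy as np
import tensorflow as tf

original = np.arange(12, dtype=np.float32).reshape(2, 2, 3)

# Write side: flattened values plus the shape as an Int64List.
example = tf.train.Example(features=tf.train.Features(feature={
    'map': tf.train.Feature(
        float_list=tf.train.FloatList(value=original.ravel().tolist())),
    'map_shape': tf.train.Feature(
        int64_list=tf.train.Int64List(value=list(original.shape))),
}))
serialized = example.SerializeToString()

# Read side: the value count is unknown up front, so parse it as VarLenFeature.
parsed = tf.io.parse_single_example(serialized, features={
    'map': tf.io.VarLenFeature(tf.float32),
    'map_shape': tf.io.FixedLenFeature([3], tf.int64),
})
flat = tf.sparse.to_dense(parsed['map'])  # SparseTensor -> dense 1-D tensor
restored = tf.reshape(flat, parsed['map_shape']).numpy()

print(restored.shape)
```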

0

Use tfrmaker, a TFRecord utility package. You can install the package with pip:

pip install tfrmaker

Then you can create tfrecords like this:

from tfrmaker import images

# mapping label names with integer encoding.
LABELS = {"bishop": 0, "knight": 1, "pawn": 2, "queen": 3, "rook": 4}

# specifying data and output directories.
DATA_DIR = "datasets/chess/"
OUTPUT_DIR = "tfrecords/chess/"

# create tfrecords from the images present in the given data directory.
info = images.create(DATA_DIR, LABELS, OUTPUT_DIR)

# info contains a list of information (path: relative path, size: number of images in the tfrecord) about the created tfrecords
print(info)

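The package also has some cool features, such as:

dynamic resizing
splitting tfrecords into optimal shards
splitting tfrecords into training, validation and test sets
counting the number of images in tfrecords
asynchronous tfrecord creation

Note: the package currently supports image datasets organized with class names as the sub-directory names.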

- Basil C Sunny

0

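First of all, many thanks to Yaroslav and Salvador for their enlightening answers.

In my experience, their method only works when the input is a 1D NumPy array of size (n, ). When the input is a NumPy array with more than one dimension, the code below raises the TypeError shown after it: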

def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float64)
example = tf.train.Example(features=tf.train.Features(feature={"bytes": 
_float_feature(numpy_arr)}))
print(example)


TypeError: array([[0., 1., 2.],
   [3., 4., 5.]]) has type numpy.ndarray, but expected one of: int, long, float

So I would like to expand on Tsuan's answer, that is, flatten the input before feeding it to a TF example. The modified code is as follows:

def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float64).flatten()
example = tf.train.Example(features=tf.train.Features(feature={"bytes": 
_floats_feature(numpy_arr)}))
print(example)

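Also, I find flatten() more suitable than np.ravel() here: flatten() always returns a copy, whereas ravel() returns a view of the original array whenever it can.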
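The difference between the two calls can be checked with NumPy alone. The snippet below is a small illustration, assuming a contiguous array, which is the case where ravel() can return a view.

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(2, 2, 3)

flat_copy = arr.flatten()  # always a new array
flat_view = arr.ravel()    # a view here, because arr is contiguous

# Check whether each result shares memory with arr, and that the
# values are identical either way.
print(np.shares_memory(arr, flat_copy))
print(np.shares_memory(arr, flat_view))
print(np.array_equal(flat_copy, flat_view))
```

For building a FloatList either call yields the same values, since they are copied into the protocol buffer anyway. The distinction matters mostly if the flattened array is modified afterwards.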

- Ruochen Li

Content provided by Stack Overflow.

- Yaroslav Bulatov · Accepted Answer

FloatList and BytesList expect an iterable, so you need to pass a list of floats. Remove the extra brackets in your _floats_feature, i.e.

def _floats_feature(value):
  return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.ones((3,)).astype(np.float64)
example = tf.train.Example(features=tf.train.Features(feature={"bytes": _floats_feature(numpy_arr)}))
print(example)

features {
  feature {
    key: "bytes"
    value {
      float_list {
        value: 1.0
        value: 1.0
        value: 1.0
      }
    }
  }
}
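For completeness, the same Example can be written to a TFRecord file and read back through tf.data. This is a sketch assuming TensorFlow 2.x in eager mode. The file name, the temporary directory and the feature key 'floats' are hypothetical choices for illustration, not taken from the question.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

original = np.ones((3,), dtype=np.float32)
example = tf.train.Example(features=tf.train.Features(feature={
    'floats': tf.train.Feature(
        float_list=tf.train.FloatList(value=original.tolist())),
}))

# Hypothetical output location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'example.tfrecord')
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())


def parse(serialized):
    """Parse one serialized Example into a dict with a length-3 float tensor."""
    return tf.io.parse_single_example(
        serialized,
        features={'floats': tf.io.FixedLenFeature([3], tf.float32)})


# Read every record in the file and convert the parsed tensors to NumPy.
dataset = tf.data.TFRecordDataset(path).map(parse)
restored = [record['floats'].numpy() for record in dataset]

print(len(restored))
```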