I have a multi-class classification problem. I want to compile my model:
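```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, DenseFeatures

# feature_columns is a list of tf.feature_column objects defined elsewhere
feature_layer = DenseFeatures(feature_columns)  # a layer that produces a dense Tensor
model = Sequential([
    feature_layer,
    Dense(32, activation='relu'),
    Dense(3, activation='softmax'),  # one output per class
])
```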
So I use the categorical cross-entropy loss:
```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_ds,
          validation_data=val_ds,
          epochs=10)
```
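`categorical_crossentropy` expects one-hot encoded labels. I know about the `to_categorical` method, of course, but it doesn't accept a `BatchDataset` as an argument, and `train_ds` and `val_ds` are both of type `BatchDataset`.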
How should I proceed?
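A side note on the loss choice, offered as a hedged sketch rather than a definitive fix: `categorical_crossentropy` needs one-hot targets, whereas Keras's `sparse_categorical_crossentropy` loss computes the same quantity directly from integer class indices. With integer labels, the one-hot step could therefore be skipped. The plain-Python functions below (`categorical_ce` and `sparse_categorical_ce` are hypothetical names, not Keras APIs) only illustrate why the two agree: with a one-hot target, every term except the true class's term is multiplied by zero.

```python
import math

def categorical_ce(one_hot, probs):
    # Cross-entropy with a one-hot target: -sum_k t_k * log(p_k)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_ce(label, probs):
    # Same loss with an integer target: only the true class's term survives
    return -math.log(probs[label])

probs = [0.2, 0.7, 0.1]    # hypothetical softmax output for 3 classes
label = 1                  # integer class index
one_hot = [0.0, 1.0, 0.0]  # the same label, one-hot encoded

print(categorical_ce(one_hot, probs))
print(sparse_categorical_ce(label, probs))
```

For the model above, this would mean passing `loss='sparse_categorical_crossentropy'` to `model.compile`, with integer labels (0, 1, 2) in the dataset and no `map` call. The labels still have to be integers, not strings.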
Update: I tried the following:
```python
import tensorflow as tf

def df_to_dataset(df, shuffle=True, batch_size=32):
    df = df.copy()
    labels = df.pop('class')
    ds = tf.data.Dataset.from_tensor_slices((dict(df), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(df))
    # Batch, then one-hot encode the labels of each batch
    ds = ds.batch(batch_size).map(lambda x, y: (x, tf.one_hot(y, depth=3)))
    return ds

# train, val and test are pandas DataFrames defined elsewhere
batch_size = 32
train_ds = df_to_dataset(train, batch_size=batch_size)  # error
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
```
It fails with:
```
Value passed to parameter 'indices' has DataType string not in list of allowed values: uint8, int32, int64
```
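My `class` column holds string values (whether the object is a star, a galaxy, or a quasar; the other columns are int/float), but I have already popped it from the features: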
```python
def df_to_dataset(df, shuffle=True, batch_size=32):
    df = df.copy()
    labels = df.pop('class')
    ds = tf.data.Dataset.from_tensor_slices((dict(df), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(df))
    return ds

df_to_dataset(df)
```
The resulting dataset (note that the label element has dtype `tf.string`):
```
<ShuffleDataset shapes: ({objid: (), ra: (), dec: (), u: (), g: (), r: (), i: (), z: (), run: (), rerun: (), camcol: (), field: (), specobjid: (), redshift: (), plate: (), mjd: (), fiberid: ()}, ()), types: ({objid: tf.float64, ra: tf.float64, dec: tf.float64, u: tf.float64, g: tf.float64, r: tf.float64, i: tf.float64, z: tf.float64, run: tf.int64, rerun: tf.int64, camcol: tf.int64, field: tf.int64, specobjid: tf.float64, redshift: tf.float64, plate: tf.int64, mjd: tf.int64, fiberid: tf.int64}, tf.string)>
```
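The error message names the parameter `indices`, which is the first argument of `tf.one_hot`. That argument only accepts integer dtypes, while the `class` labels above are `tf.string`. Popping the column removes it from the features, but the labels themselves remain strings. One possible workaround, sketched here under assumptions, is to map the class names to integer codes in pandas before calling `df_to_dataset`. The helper `encode_labels` and the label spellings `GALAXY`/`QSO`/`STAR` are hypothetical; the real values in the `class` column may differ.

```python
import pandas as pd

def encode_labels(df, class_names, column="class"):
    """Return a copy of df whose string labels in `column` are replaced by
    integer codes 0..len(class_names)-1 (hypothetical helper)."""
    df = df.copy()
    mapping = {name: idx for idx, name in enumerate(class_names)}
    # Labels missing from the mapping become NaN, so astype("int64") raises
    df[column] = df[column].map(mapping).astype("int64")
    return df

# Hypothetical rows; the label spellings are assumptions
toy = pd.DataFrame({"redshift": [0.1, 1.5, 0.0, 0.2],
                    "class": ["GALAXY", "QSO", "STAR", "GALAXY"]})

# Build the vocabulary once so every split (train/val/test) gets the same codes
class_names = sorted(toy["class"].unique())
encoded = encode_labels(toy, class_names)
print(class_names)                # ['GALAXY', 'QSO', 'STAR']
print(encoded["class"].tolist())  # [0, 1, 2, 0]
```

Applied to the real data, `class_names` would be computed once from the full DataFrame, and `train`, `val` and `test` would each be passed through `encode_labels` with that same list. The labels then reach `tf.one_hot(y, depth=3)` as int64. Computing the vocabulary once, rather than per split, keeps the codes consistent even if a split happens to lack one of the classes. Recent TensorFlow 2.x releases also provide `tf.keras.layers.StringLookup` for doing this string-to-index step inside the data pipeline.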