学习如何使用bert-base-cased和分类模型... 模型的代码如下:
def mao_func(input_ids, masks, labels):
return {'input_ids':input_ids, 'attention_mask':masks}, labels
dataset = dataset.map(mao_func)
BATCH_SIZE = 32
dataset = dataset.shuffle(100000).batch(BATCH_SIZE)
split = .8
ds_len = len(list(dataset))
train = dataset.take(round(ds_len * split))
val = dataset.skip(round(ds_len * split))
from transformers import TFAutoModel
bert = TFAutoModel.from_pretrained('bert-base-cased')
模型: "tf_bert_model"
层 (类型) 输出形状 参数数量
bert (TFBertMainLayer) 多个 108310272
================================================================= 总参数量: 108,310,272 可训练参数量: 108,310,272 不可训练参数量: 0
接着是神经网络的构建:
input_ids = tf.keras.layers.Input(shape=(50,), name='input_ids', dtype='int32')
mask = tf.keras.layers.Input(shape=(50,), name='attention_mask', dtype='int32')
embeddings = bert(input_ids, attention_mask=mask)[0]
X = tf.keras.layers.GlobalMaxPool1D()(embeddings)
X = tf.keras.layers.BatchNormalization()(X)
X = tf.keras.layers.Dense(128, activation='relu')(X)
X = tf.keras.layers.Dropout(0.1)(X)
X = tf.keras.layers.Dense(32, activation='relu')(X)
y = tf.keras.layers.Dense(3, activation='softmax',name='outputs')(X)
model = tf.keras.Model(inputs=[input_ids, mask], outputs=y)
model.layers[2].trainable = False
模型摘要如下:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_ids (InputLayer) [(None, 50)] 0 []
attention_mask (InputLayer) [(None, 50)] 0 []
tf_bert_model (TFBertModel) TFBaseModelOutputWi 108310272 ['input_ids[0][0]',
thPoolingAndCrossAt 'attention_mask[0][0]']
tentions(last_hidde
n_state=(None, 50,
768),
pooler_output=(Non
e, 768),
past_key_values=No
ne, hidden_states=N
one, attentions=Non
e, cross_attentions
=None)
global_max_pooling1d (GlobalMa (None, 768) 0 ['tf_bert_model[0][0]']
xPooling1D)
batch_normalization (BatchNorm (None, 768) 3072 ['global_max_pooling1d[0][0]']
alization)
dense (Dense) (None, 128) 98432 ['batch_normalization[0][0]']
dropout_37 (Dropout) (None, 128) 0 ['dense[0][0]']
dense_1 (Dense) (None, 32) 4128 ['dropout_37[0][0]']
outputs (Dense) (None, 3) 99 ['dense_1[0][0]']
==================================================================================================
Total params: 108,416,003
Trainable params: 104,195
Non-trainable params: 108,311,808
__________________________________________________________________________________________________
最终模型的拟合结果是
optimizer = tf.keras.optimizers.Adam(0.01)
loss = tf.keras.losses.CategoricalCrossentropy()
acc = tf.keras.metrics.CategoricalAccuracy('accuracy')
model.compile(optimizer,loss=loss, metrics=[acc])
history = model.fit(
train,
validation_data = val,
epochs=140
)
第7行出现执行错误->模型拟合(...):
ValueError: Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 50), found shape=(None, 1, 512)
请问有没有好心人能够帮我看看我哪里做错了,以及为什么会错。非常感谢:)
更新:这是包含代码的 git 链接 https://github.com/CharlieArreola/OnlinePosts
shape=(1,512)
。如果这个方法可行,请告诉我。否则,你可能需要让我知道你正在使用什么数据集 ;) 另一个值得尝试的事情是在你的输入之前使用Flatten()-层,这应该会给你一个输出为(None,512)的Flatten层。 - Fabian