How to implement a simple neural network using Keras


I am using the following libraries:

import keras
import tensorflow as tf
from keras.datasets import mnist
from keras.utils.np_utils import to_categorical
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten

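I need to create a simple network with the following structure:

The neural network should take as input a tensor with the same shape as our digits. The first hidden layer should output a 300-dimensional vector and use sigmoid as its activation function. The second hidden layer should also output a 300-dimensional vector and use relu as its activation function. The third layer (the output layer) should output the prediction size and use a softmax activation at the end for multi-class classification.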

So far, I have:

model = None

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(5, input_shape=(300,)))
model.add(Activation(___))
model.add(tf.keras.layers.Dense(6))
model.add(Activation(___))
model.add(tf.keras.layers.Dense(7))
model.add(Activation(___))

I'm not sure what needs to go into Dense. I just took it from https://keras.io/api/models/sequential/.

I know that the first activation function should be sigmoid, where sigmoid is 1/(1 + np.exp(-x))

I also know that relu is max(0.0, x), and softmax is

import numpy as np

def softmax(vector):
    e = np.exp(vector)
    return e / e.sum()

However, I'm not sure how to put all of this together to create the neural network and get the expected output (shown as an image in the original post).

If anyone is willing to help, I would be very grateful, as this is my first time trying to initialize a neural network. Thanks!


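There are many examples on the Keras website and on GitHub; have you looked at them? Or read the documentation? Also note that Keras already implements these activation functions. - Dr. Snoopy
Check the activation functions in the Keras documentation - you will see tf.keras.activations.sigmoid(x), tf.keras.activations.softmax(x, axis=-1), and tf.keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0.0) - furas
In the activations documentation you can also see Dense(..., activation='softmax'), Dense(..., activation='relu'), etc. - furas
What is the size of digits? In the first layer you should use input_shape=(digit_size,) - furas
1 Answer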


In the documentation for Keras activations (tf.keras.activations) you can see:

tf.keras.activations.sigmoid(x)
tf.keras.activations.softmax(x, axis=-1) 
tf.keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0.0)
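As a rough illustration of what these three functions compute, here is a NumPy sketch based on the formulas from the question. It is not Keras's own implementation, and the max-subtraction in softmax is an added assumption for numerical stability (it does not change the result).

```python
import numpy as np

def sigmoid(x):
    # Squashes every element into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Keeps positive values and replaces negative ones with 0.
    return np.maximum(0.0, x)

def softmax(x, axis=-1):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

x = np.array([[-1.0, 0.0, 2.0]])  # hypothetical input, one row of 3 values
print(sigmoid(x))
print(relu(x))
print(softmax(x), softmax(x).sum())  # the softmax row sums to 1
```

In a model you would not write these yourself; the Keras versions listed above are the ones to use.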

Using these, the code could be:

import tensorflow as tf

model = tf.keras.Sequential(name='sequential_2')

model.add(tf.keras.layers.Dense(300, name='dense_5', input_shape=(784,)))
model.add(tf.keras.layers.Activation(tf.keras.activations.sigmoid, name='activation_5'))

model.add(tf.keras.layers.Dense(300, name='dense_6'))
model.add(tf.keras.layers.Activation(tf.keras.activations.relu, name='activation_6'))

model.add(tf.keras.layers.Dense(10, name='dense_7'))
model.add(tf.keras.layers.Activation(tf.keras.activations.softmax, name='activation_7'))

model.summary()

Result:

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_5 (Dense)             (None, 300)               235500    
                                                                 
 activation_5 (Activation)   (None, 300)               0         
                                                                 
 dense_6 (Dense)             (None, 300)               90300     
                                                                 
 activation_6 (Activation)   (None, 300)               0         
                                                                 
 dense_7 (Dense)             (None, 10)                3010      
                                                                 
 activation_7 (Activation)   (None, 10)                0         
                                                                 
=================================================================
Total params: 328,810
Trainable params: 328,810
Non-trainable params: 0
_________________________________________________________________
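The numbers in the Param # column can be checked by hand: a Dense layer has one weight per (input, unit) pair plus one bias per unit. A minimal sketch of that arithmetic, where `dense_params` is a helper name made up for this illustration:

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit.
    return n_inputs * n_units + n_units

# Input size followed by the number of units in each Dense layer.
layer_sizes = [784, 300, 300, 10]

counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts)       # [235500, 90300, 3010]
print(sum(counts))  # 328810
```

The Activation layers show 0 parameters because they only apply a fixed function to their input.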

Using different imports reduces the amount of code while still giving the same summary:

#import tensorflow as tf  # not needed at the moment
from keras import Sequential
from keras.layers import Dense, Activation
from keras.activations import sigmoid, relu, softmax

model = Sequential(name='sequential_2')

model.add(Dense(300, name='dense_5', input_shape=(784,)))
model.add(Activation(sigmoid, name='activation_5'))

model.add(Dense(300, name='dense_6'))
model.add(Activation(relu, name='activation_6'))

model.add(Dense(10, name='dense_7'))
model.add(Activation(softmax, name='activation_7'))

model.summary()

But you can also reduce the code further and get the same model:

#import tensorflow as tf  # not needed at the moment
from keras import Sequential
from keras.layers import Dense

model = Sequential(name='sequential_2')

model.add(Dense(300, name='dense_5', activation='sigmoid', input_shape=(784,)))
model.add(Dense(300, name='dense_6', activation='relu'))
model.add(Dense(10,  name='dense_7', activation='softmax'))

model.summary()

But the summary will also be shorter:

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_5 (Dense)             (None, 300)               235500    
                                                                 
 dense_6 (Dense)             (None, 300)               90300     
                                                                 
 dense_7 (Dense)             (None, 10)                3010      
                                                                 
=================================================================
Total params: 328,810
Trainable params: 328,810
Non-trainable params: 0
_________________________________________________________________

You can also omit name= to shorten the code.

#import tensorflow as tf  # not needed at the moment
from keras import Sequential
from keras.layers import Dense

model = Sequential()

model.add(Dense(300, activation='sigmoid', input_shape=(784,)))
model.add(Dense(300, activation='relu'))
model.add(Dense(10,  activation='softmax'))

model.summary()

By the way:

If you want to use MNIST, whose images have shape (28,28), you can add a Flatten layer to automatically convert shape (28,28) into shape (784,):

#import tensorflow as tf  # not needed at the moment
from keras import Sequential
from keras.layers import Dense, Flatten

model = Sequential()

#model.add(Flatten(input_shape=(1,28,28)))
model.add(Flatten(input_shape=(28,28)))
model.add(Dense(300, activation='sigmoid'))
model.add(Dense(300, activation='relu'))
model.add(Dense(10,  activation='softmax'))

model.summary()

Result:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 300)               235500    
                                                                 
 dense_1 (Dense)             (None, 300)               90300     
                                                                 
 dense_2 (Dense)             (None, 10)                3010      
                                                                 
=================================================================
Total params: 328,810
Trainable params: 328,810
Non-trainable params: 0
_________________________________________________________________

If you call summary() after fit(), you can even skip input_shape=(28,28) in Flatten(), because fit() will set it.
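What Flatten does to each batch can be pictured with a plain NumPy reshape. This is only a sketch with made-up zero-filled arrays, not the Keras layer itself: the batch axis is kept and the remaining axes are merged into one.

```python
import numpy as np

# 5 hypothetical images of shape (28, 28), stacked into one batch.
batch = np.zeros((5, 28, 28))

# Keep the batch axis and merge the remaining axes: (5, 28, 28) -> (5, 784).
flat = batch.reshape(len(batch), -1)
print(flat.shape)  # (5, 784)

# A single image has no batch axis, so one has to be added
# before it has the same layout as a batch of size 1.
single = batch[0]                       # shape (28, 28)
single_batch = single[np.newaxis, ...]  # shape (1, 28, 28)
print(single_batch.shape)  # (1, 28, 28)
```

This is also why the shapes in the summary start with None: that position is the batch size, which is not fixed by the model.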


EDIT:

Full working code.

Tested with tensorflow 2.8.0, Python 3.10, Linux Mint 21.0 (based on Ubuntu 22.04).

import tensorflow as tf
from keras import Sequential
from keras.layers import Dense, Flatten
from keras.utils.np_utils import to_categorical
#from keras.losses import categorical_crossentropy
from keras.datasets import mnist
import numpy as np

print('\n--- version ---\n')

print('tensorflow:', tf.__version__)
#print('tensorflow:', tf.version.VERSION)
#print('keras:', keras.__version__)  # needs `import keras`

print('\n--- data ---\n')

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print('train.shape  :', x_train.shape)
print('test.shape   :', x_test.shape)

print('image.shape  :', x_train[0].shape)
print('image.flatten:', x_train[0].flatten().shape)

y_train = to_categorical(y_train)  # `Y` will have shape `(10)` and last layer will also need `Dense(10)`
y_test  = to_categorical(y_test)   # `Y` will have shape `(10)` and last layer will also need `Dense(10)`

print('\n--- model ---\n')

model = Sequential()

#model.add(Flatten(input_shape=(1,28,28)))  # if `summary()` used before `fit()` then you have to set `input_shape`
model.add(Flatten())                        # if `summary()` used after  `fit()` then you can skip `input_shape` (because `fit` will set it)
model.add(Dense(300, activation='sigmoid'))
model.add(Dense(300, activation='relu'))
model.add(Dense(10,  activation='softmax'))

#model.compile(loss=categorical_crossentropy, optimizer='adam', metrics=['accuracy'])  # function `categorical_crossentropy`
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])  # string `'categorical_crossentropy'`

print('\n--- fit/train ---\n')

batch_size = 20
epochs = 5

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=True, validation_data=(x_test, y_test))

print('\n--- evaluate/test ---\n')

score = model.evaluate(x_test, y_test, verbose=True)
print(score)

print('\n--- predict ---\n')

# for N-elements it needs shape `(N, 28, 28)`
# for single element it needs shape `(1, 28, 28)`, not `(28, 28)`
#items = x_test[0]                          # WRONG
#items = x_test[0][np.newaxis, ...]         # OK
#items = x_test[0][None, ...]               # OK
#items = x_test[0:1]                        # OK
#items = np.array( [x_test[0]] )            # OK
#items = np.expand_dims(x_test[0], axis=0)  # OK
#items = x_test[0].reshape(-1,28,28)        # OK
items = x_test[0].reshape(1,28,28)          # OK
print('items.shape:', items.shape)

results = model.predict(items)   # it needs `numpy.array` even for single element (shape: `(1, 28, 28)`)

for item in results:
    print('predict:', item)
    print('max    :', np.max(item))
    print('argmax :', np.argmax(item))    
    
#print('result:', results[0])
#print('max:', np.max(results[0]), np.argmax(results[0]))

print('\n--- summary ---\n')

model.summary() 

#print('\n--- show image ---\n')

#import matplotlib.pyplot as plt
#plt.imshow(x_test[0])
#plt.show()

Result:

--- version ---

tensorflow: 2.8.0

--- data ---

train.shape  : (60000, 28, 28)
test.shape   : (10000, 28, 28)
image.shape  : (28, 28)
image.flatten: (784,)

--- model ---

--- fit/train ---

Epoch 1/5
3000/3000 [==============================] - 26s 8ms/step - loss: 0.4459 - accuracy: 0.8600 - val_loss: 0.3425 - val_accuracy: 0.8907
Epoch 2/5
3000/3000 [==============================] - 31s 10ms/step - loss: 0.3133 - accuracy: 0.9011 - val_loss: 0.2753 - val_accuracy: 0.9128
Epoch 3/5
3000/3000 [==============================] - 22s 7ms/step - loss: 0.2702 - accuracy: 0.9153 - val_loss: 0.2291 - val_accuracy: 0.9271
Epoch 4/5
3000/3000 [==============================] - 22s 7ms/step - loss: 0.2482 - accuracy: 0.9218 - val_loss: 0.2142 - val_accuracy: 0.9346
Epoch 5/5
3000/3000 [==============================] - 21s 7ms/step - loss: 0.2246 - accuracy: 0.9292 - val_loss: 0.2061 - val_accuracy: 0.9362

--- evaluate/test ---

313/313 [==============================] - 2s 6ms/step - loss: 0.2061 - accuracy: 0.9362
[0.20612983405590057, 0.9362000226974487]

--- predict ---

items.shape: (1, 28, 28)
predict: [2.2065036e-08 3.6986236e-09 2.2580415e-04 2.4656517e-06 6.2174437e-12
 6.3476205e-07 1.2726037e-15 9.9974352e-01 1.7498225e-07 2.7370381e-05]
max    : 0.9997435
argmax : 7

--- summary ---

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 300)               235500    
                                                                 
 dense_1 (Dense)             (None, 300)               90300     
                                                                 
 dense_2 (Dense)             (None, 10)                3010      
                                                                 
=================================================================
Total params: 328,810
Trainable params: 328,810
Non-trainable params: 0
_________________________________________________________________
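The label handling in the full code (to_categorical before training, np.argmax on the prediction) can be illustrated with NumPy alone. This is a rough sketch of the idea, not the Keras implementation, and `one_hot` is a helper name made up for this example:

```python
import numpy as np

def one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([7, 2, 1])   # hypothetical digit labels
encoded = one_hot(labels, 10)
print(encoded.shape)           # (3, 10)
print(encoded[0])              # 1.0 at index 7, 0.0 elsewhere

# argmax turns one-hot rows (or softmax outputs) back into class indices.
print(np.argmax(encoded, axis=1))  # [7 2 1]
```

Each label becomes a vector of length 10, which is why the last layer is Dense(10) and why the predicted digit is read off with argmax.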

Hello! Thank you very much for such a detailed answer! - user19825372
I added full working code - including fit(), evaluate(), predict() (for a single item). - furas
