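I'm a bit confused about why GPU and CPU speeds are similar for small networks (with the CPU sometimes faster), while the GPU is faster for larger networks. The code below takes 103.7 seconds to run on an i7-6700k, but only 29.5 seconds when using tensorflow-gpu.
However, when I train a network with only 100 hidden neurons, instead of the 1000 in the example below, I get a run time of about 20 seconds on the GPU and only 15 seconds on the CPU.
I read in another Stack Overflow answer that CPU -> GPU transfers take a long time; I assume this refers to loading the data samples onto the GPU.
Can someone explain why this happens, and possibly suggest some changes to the code that I could make to maximize speed?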
import numpy as np
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.utils import to_categorical
from keras.layers import Dense, Activation, Dropout
from sklearn.preprocessing import normalize
## Importing the MNIST dataset using Keras
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# flatten each 28x28 image into a 784-vector and L2-normalize every row
N, x, y = X_train.shape
X_train = normalize(np.reshape(X_train, (N, x * y)))
N, x, y = X_test.shape
X_test = normalize(np.reshape(X_test, (N, x * y)))
# one-hot encoding
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential()
model.add(Dense(750, input_dim=784))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(150))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy'])
fit = model.fit(X_train, y_train, batch_size=128, epochs=10, verbose=0)
## Evaluating on the test set: score[0] is the loss, score[1] is the accuracy metric requested in model.compile above
score = model.evaluate(X_test, y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
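
To put the timings above in perspective, here is a rough back-of-the-envelope sketch (not a benchmark) that counts the multiply-accumulate operations per sample in the Dense layers and divides the quoted wall-clock times by the number of training batches. Several things here are assumptions on my part: the layer widths are read from the posted code (whose first hidden layer has 750 units, not 1000), the "100 hidden neurons" variant is assumed to shrink only that first hidden layer, MNIST is taken to have 60,000 training images, and the quoted times are treated as if they were all spent in `model.fit`, so the per-batch figures are upper bounds.

```python
import math


def dense_macs(layer_sizes):
    """Multiply-accumulates per sample for a stack of Dense layers (biases ignored)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def batches_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to cover the data once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


# Layer widths read from the posted model: 784 inputs, then 750/150/50/50/10 units.
posted = [784, 750, 150, 50, 50, 10]
# ASSUMPTION: the smaller variant only changes the first hidden layer to 100 units.
small = [784, 100, 150, 50, 50, 10]

# 10 epochs over the 60,000 MNIST training images with batch_size=128.
n_batches = 10 * batches_per_epoch(60000, 128)

# Wall-clock times quoted in the question, in seconds (whole script, not just fit).
timings = {
    ("posted", "cpu"): 103.7,
    ("posted", "gpu"): 29.5,
    ("small", "cpu"): 15.0,
    ("small", "gpu"): 20.0,
}

print("MACs per sample, posted:", dense_macs(posted))
print("MACs per sample, small: ", dense_macs(small))
print("compute ratio posted/small: %.2f" % (dense_macs(posted) / dense_macs(small)))
for (net, device), seconds in sorted(timings.items()):
    # Upper bound on time per batch, since the timing also covers setup and evaluate.
    print("%-6s %s: at most %.2f ms per batch" % (net, device, 1000 * seconds / n_batches))
```

Under those assumptions the posted network does roughly 6.8 times as much multiply-accumulate work per sample as the small one. The CPU time grows by about the same factor (103.7 s vs 15 s), while the GPU time grows by only about 1.5x (29.5 s vs 20 s). That would be consistent with a roughly fixed per-batch cost dominating the GPU runs at this batch size, though it does not show where that cost comes from.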