使用PyTorch的nn.Sequential模型,我无法学习所有四个XOR布尔值的表示。
学习后:
我已经尝试使用几个随机种子运行相同的代码,但它无法学习XOR表示。如果没有PyTorch,我可以轻松地训练一个具有自定义导数函数的模型,并手动执行反向传播,请参见https://www.kaggle.io/svf/2342536/635025ecf1de59b71ea4fa03eb84f9f9/results.html#After-some-enlightenment。为什么使用PyTorch的2层MLP无法学习XOR表示?
[out]:
但是添加
它只在100次中的18次中成功学习了XOR表示... -_-|||
import numpy as np
import torch
from torch import nn
from torch.autograd import Variable
from torch import FloatTensor
from torch import optim
use_cuda = torch.cuda.is_available()
X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = xor_output = np.array([[0,1,1,0]]).T
# Converting the X to PyTorch-able data structure.
X_pt = Variable(FloatTensor(X))
X_pt = X_pt.cuda() if use_cuda else X_pt
# Converting the Y to PyTorch-able data structure.
Y_pt = Variable(FloatTensor(Y), requires_grad=False)
Y_pt = Y_pt.cuda() if use_cuda else Y_pt
hidden_dim = 5
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
nn.Linear(hidden_dim, output_dim),
nn.Sigmoid())
criterion = nn.L1Loss()
learning_rate = 0.03
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
num_epochs = 10000
for _ in range(num_epochs):
predictions = model(X_pt)
loss_this_epoch = criterion(predictions, Y_pt)
loss_this_epoch.backward()
optimizer.step()
print([int(_pred > 0.5) for _pred in predictions], list(map(int, Y_pt)), loss_this_epoch.data[0])
学习后:
for _x, _y in zip(X_pt, Y_pt):
prediction = model(_x)
print('Input:\t', list(map(int, _x)))
print('Pred:\t', int(prediction))
print('Ouput:\t', int(_y))
print('######')
[输出]:
Input: [0, 0]
Pred: 0
Ouput: 0
######
Input: [0, 1]
Pred: 1
Ouput: 1
######
Input: [1, 0]
Pred: 0
Ouput: 1
######
Input: [1, 1]
Pred: 0
Ouput: 0
######
我已经尝试使用几个随机种子运行相同的代码,但它无法学习XOR表示。如果没有PyTorch,我可以轻松地训练一个具有自定义导数函数的模型,并手动执行反向传播,请参见https://www.kaggle.io/svf/2342536/635025ecf1de59b71ea4fa03eb84f9f9/results.html#After-some-enlightenment。为什么使用PyTorch的2层MLP无法学习XOR表示?
PyTorch中的模型如何:
hidden_dim = 5
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
nn.Linear(hidden_dim, output_dim),
nn.Sigmoid())
这个与手写的使用导数和手动编写反向传播和优化器步骤的内容https://www.kaggle.com/alvations/xor-with-mlp不同吗?
它们是相同的隐藏层感知器网络吗?
已更新
奇怪的是,在nn.Linear
层之间添加nn.Sigmoid()
并没有起作用:
hidden_dim = 5
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
nn.Sigmoid(),
nn.Linear(hidden_dim, output_dim),
nn.Sigmoid())
criterion = nn.L1Loss()
learning_rate = 0.03
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
num_epochs = 10000
for _ in range(num_epochs):
predictions = model(X_pt)
loss_this_epoch = criterion(predictions, Y_pt)
loss_this_epoch.backward()
optimizer.step()
for _x, _y in zip(X_pt, Y_pt):
prediction = model(_x)
print('Input:\t', list(map(int, _x)))
print('Pred:\t', int(prediction))
print('Ouput:\t', int(_y))
print('######')
[out]:
Input: [0, 0]
Pred: 0
Ouput: 0
######
Input: [0, 1]
Pred: 1
Ouput: 1
######
Input: [1, 0]
Pred: 1
Ouput: 1
######
Input: [1, 1]
Pred: 1
Ouput: 0
######
但是添加
nn.ReLU()
会:model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
nn.ReLU(),
nn.Linear(hidden_dim, output_dim),
nn.Sigmoid())
...
for _x, _y in zip(X_pt, Y_pt):
prediction = model(_x)
print('Input:\t', list(map(int, _x)))
print('Pred:\t', int(prediction))
print('Ouput:\t', int(_y))
print('######')
[输出]:
Input: [0, 0]
Pred: 0
Ouput: 0
######
Input: [0, 1]
Pred: 1
Ouput: 1
######
Input: [1, 0]
Pred: 1
Ouput: 1
######
Input: [1, 1]
Pred: 1
Ouput: 0
######
对于非线性激活函数,sigmoid不够吗?
我知道ReLU
适用于布尔输出的任务,但是Sigmoid
函数难道不能产生相同/类似的效果吗?
更新2
运行相同的训练100次:
from collections import Counter
import random
random.seed(100)
import torch
from torch import nn
from torch.autograd import Variable
from torch import FloatTensor
from torch import optim
use_cuda = torch.cuda.is_available()
all_results=[]
for _ in range(100):
hidden_dim = 2
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
nn.ReLU(), # Does the sigmoid has a build in biased?
nn.Linear(hidden_dim, output_dim),
nn.Sigmoid())
criterion = nn.MSELoss()
learning_rate = 0.03
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
num_epochs = 3000
for _ in range(num_epochs):
predictions = model(X_pt)
loss_this_epoch = criterion(predictions, Y_pt)
loss_this_epoch.backward()
optimizer.step()
##print([float(_pred) for _pred in predictions], list(map(int, Y_pt)), loss_this_epoch.data[0])
x_pred = [int(model(_x)) for _x in X_pt]
y_truth = list([int(_y[0]) for _y in Y_pt])
all_results.append([x_pred == y_truth, x_pred, loss_this_epoch.data[0]])
tf, outputsss, losses__ = zip(*all_results)
print(Counter(tf))
它只在100次中的18次中成功学习了XOR表示... -_-|||
nn.Linear
之间是否有一个“预设”的 sigmoid 函数? - alvasmodel = Sequential(Linear, Sigmoid, Linear, Sigmoid, Linear, Sigmoid)
。我看到你设置了hidden_dim = 2
。对于Sigmoid,我会增加它,因为Sigmoid函数与ReLU的行为不同。 - Scratch'N'Purr