How does inverted dropout compensate for the effect of dropout and keep the expected value unchanged?

Question

machine-learning neural-network deep-learning regularized dropout

7

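I am learning about regularization in neural networks from the deeplearning.ai course. In the section on dropout regularization, the professor says that if dropout is applied, the computed activation values will be smaller than when dropout is not applied (at test time). So we need to scale the activations in order to keep the testing phase simpler.

I understand this fact, but I don't understand how the scaling is done. Here is a code sample that implements inverted dropout.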

import numpy as np

keep_prob = 0.8   # probability of keeping a unit, 0 < keep_prob <= 1
# this code is only for layer 3; a3 is assumed to already hold that layer's activations
# entries whose random number is below keep_prob are kept: about 80% stay, 20% are dropped
d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob

a3 = np.multiply(a3, d3)   # zero out the dropped units, keep the rest

# scale a3 up so that the expected value of the output is not reduced
# (ensures that the expected value of a3 remains the same) - to solve the scaling problem
a3 = a3 / keep_prob

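In the code above, why are the activations divided by 0.8, that is, by the probability of keeping a node in the layer (keep_prob)? Any numerical example would help.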

- Kaushal28

2 Answers

1

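Another way of looking at this could be:

TL;DR: Even though dropout leaves us with fewer active neurons, we want the neurons to contribute the same amount to the output as when all of them were present.

With dropout = 0.20, we are "shutting off 20% of the neurons", which is the same as "keeping 80% of the neurons".

Say the number of neurons is x. "Keeping 80%" concretely means 0.8 * x. Dividing that by keep_prob "scales it back" to the original value, since (0.8 * x) / 0.8 = x: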

x = 0.8 * x # x is 80% of what it used to be
x = x/0.8   # x is scaled back up to its original value
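
As a rough numerical check of the idea above, here is a minimal NumPy sketch. The layer of 100,000 units that all output 1.0 and the fixed seed are assumptions made up for illustration; they are not part of the course code.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
keep_prob = 0.8

# Hypothetical layer: 100,000 units that all output 1.0, so the exact mean is 1.0
a = np.ones(100_000)

# Each unit is kept with probability keep_prob, otherwise it is zeroed
mask = rng.random(a.shape) < keep_prob

dropped = a * mask               # plain dropout, no rescaling
inverted = dropped / keep_prob   # inverted dropout: rescale the survivors

print("mean without dropout:", a.mean())
print("mean after dropout:  ", dropped.mean())    # close to keep_prob
print("mean after rescaling:", inverted.mean())   # close to 1.0
```

The mean after plain dropout should land near 0.8, and dividing by keep_prob should bring it back to roughly 1.0. For any single mask the match is only approximate; it is the expected value that is preserved.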

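Now, the purpose of the inverting is to ensure that the Z value is not affected by the reduction of W. (Coursera).

When we scale down a3 by keep_prob, we inadvertently also scale down the value of z4 (since z4 = W4 * a3 + b4). To compensate for this, we divide a3 by keep_prob to scale it back up. (Stack Overflow).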

# keep 80% of the neurons
keep_prob = 0.8 
d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob
a3 = np.multiply(a3, d3)

# Scale it back up
a3 = a3 / keep_prob  

# this way the expected value of z4 is not affected
z4 = np.dot(W4, a3) + b4  # matrix product of weights and activations, not element-wise
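
To see the effect on z4 more concretely, the following sketch averages z4 over many random dropout masks and compares it with the z4 computed without dropout. The layer sizes, the random weights, the zero bias and the number of trials are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
keep_prob = 0.8

# Hypothetical shapes: layer 3 has 50 units, layer 4 has 10 units, batch of 32
a3 = rng.random((50, 32))
W4 = rng.standard_normal((10, 50))
b4 = np.zeros((10, 1))

z4_full = W4 @ a3 + b4           # reference value: no dropout at all

# Average z4 over many independent dropout masks
n_trials = 2000
z4_plain = np.zeros_like(z4_full)
z4_inverted = np.zeros_like(z4_full)
for _ in range(n_trials):
    d3 = rng.random(a3.shape) < keep_prob
    a3_dropped = a3 * d3
    z4_plain += W4 @ a3_dropped + b4                   # dropout without rescaling
    z4_inverted += W4 @ (a3_dropped / keep_prob) + b4  # inverted dropout
z4_plain /= n_trials
z4_inverted /= n_trials

# Mean absolute gap between the averaged z4 and the no-dropout z4
err_plain = np.abs(z4_plain - z4_full).mean()
err_inverted = np.abs(z4_inverted - z4_full).mean()
print("gap without rescaling:", err_plain)
print("gap with rescaling:   ", err_inverted)
```

Under these assumptions the gap with rescaling should be much smaller than the gap without it, because without rescaling the averaged z4 settles near keep_prob times the no-dropout value.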

What happens if you don't scale?

With scaling:
-------------
Cost after iteration 0: 0.6543912405149825
Cost after iteration 10000: 0.061016986574905605
Cost after iteration 20000: 0.060582435798513114

On the train set:
Accuracy: 0.9289099526066351
On the test set:
Accuracy: 0.95


Without scaling:
-------------
Cost after iteration 0: 0.6634619861891963
Cost after iteration 10000: 0.05040089794130624
Cost after iteration 20000: 0.049722351029060516

On the train set:
Accuracy: 0.933649289099526
On the test set:
Accuracy: 0.95

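This is just a single example with one dataset, though, and I am not sure whether scaling makes a major difference in a shallow neural network. Perhaps it matters more for deeper architectures.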

- user4109800

- Kaushal28 · Accepted Answer

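After spending some time on understanding inverted dropout, I figured out the answer myself. Here is the intuition:

We keep each neuron in a layer with probability keep_prob. Say keep_prob = 0.6. This means that about 40% of the neurons are shut down. If the original output of the layer was x before shutting down 40% of the neurons, then after applying 40% dropout it is reduced, on average, by 0.4 * x. So it becomes x - 0.4x = 0.6x.

To maintain the original output (expected value), we need to divide the output by keep_prob (0.6 here).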
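
The same intuition can be written as a short expected-value calculation. The notation here is introduced only for this sketch: a_i is the activation of unit i, d_i is its random keep/drop mask entry, and p stands for keep_prob.

```latex
\begin{align}
d_i &\sim \mathrm{Bernoulli}(p), \qquad p = \texttt{keep\_prob} = 0.6 \\
\mathbb{E}[d_i\, a_i] &= p\, a_i = 0.6\, a_i \\
\mathbb{E}\!\left[\frac{d_i\, a_i}{p}\right] &= \frac{p\, a_i}{p} = a_i
\end{align}
```

For example, if a unit outputs 10, its expected output under dropout alone is 0.6 * 10 = 6, and dividing by 0.6 restores the expected output to 10.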