Seq2Seq model and loss function (in Keras)

Question

python tensorflow keras keras-layer loss-function

3

I have a seq2seq model that works fine in some cases, but in others it returns only the end token as its result.

For example:

For the given vector:
[2, #start token
3,
123,
1548, #end token
1548,
1548,
1548,
1548,
1548,
1548]

The model predicts:
[1548, 
1548,
1548,
1548,
1548,
1548,
1548,
1548,
1548,
1548]

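I tried using Keras's SaveModel callback to monitor "loss", but the result was still the same. So I thought I might need my own loss function. Keras provides simple loss functions such as: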

def mean_absolute_error(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)

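Both y_true and y_pred are TensorFlow objects (we only get a pointer to the actual array), so to build any logic we need to fetch the array from the GPU or upload our own array to the GPU.

The loss function I want: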

def mean_absolute_error(y_true, y_pred):
    sum = 0
    for y , _y in zip(y_true , y_pred):
         if (y == _y) and (y == self.startToken or y == self.endToken):
              continue
         else:
              sum += abs(y - _y)
    return sum

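I tried y_true.eval(), which should bring the array to the CPU as a numpy object, but it fails with: Cannot evaluate tensor using eval(): No default session is registered.

I couldn't find how to upload my own array into TensorFlow.

If you have a solution or any suggestion, I'll be more than happy to hear it.

Thanks.

(Not that important, but...)

The model is based on https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html, but with one-hot (two-dimensional [matrix]) output.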

- Ori Yampolsky

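In the link you provided, they stop predicting once they see the end token; specifically, in the decode_sequence function: # Exit condition: either hit max length or find stop character. They also pre-populate the model's output array with the start character: # Populate the first character of target sequence with the start character. Are you using something similar to their decode_sequence function? - vasilyrud

No need for that, it can be done more simply with model.predict([x,x]). - Ori Yampolsky

That may be the cause of the problem. Try following their "decode_sequence" function exactly and see whether that works first. It may be the only way to do seq-to-seq prediction in Keras. - vasilyrud

I've checked... it's the same. - Ori Yampolsky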

1 Answer

- Daniel Möller · Accepted Answer

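Using K.eval or an if inside a loss function is not a good idea. The whole idea of tensors is that they have an internal connection managed by tensorflow/keras, through which gradients and other things can be calculated. If you use eval and work on numpy values, you break that connection and spoil the model. Use eval only to inspect results, not to create functions.

Using an if on tensor values will not work either, because the tensor's values are not available at that point. There are, however, keras functions such as K.switch, K.greater, K.less, etc., all listed in the backend documentation, and you can recreate your function with them.

But honestly, I think you should go for "masking" or "class weighting" instead.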
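To illustrate replacing a Python if with tensor functions, here is a minimal sketch. It assumes TensorFlow 2 in eager mode and uses tf.equal and tf.where directly rather than the K.* backend functions named above; the token values come from the question's example.

```python
import tensorflow as tf

END_TOKEN = 1548  # end-token index taken from the question

# token indices for one sequence, as in the question's example
tokens = tf.constant([2, 3, 123, 1548, 1548])

# elementwise comparison; yields a boolean tensor instead of a Python bool
is_end = tf.equal(tokens, END_TOKEN)

# tensor-level "if": 0.0 where the token is the end token, 1.0 elsewhere
weights = tf.where(is_end, 0.0, 1.0)

print(weights.numpy())
```

The comparison and the selection both stay inside the graph, so nothing has to be pulled out to numpy to make the decision.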

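Masking (solution 1)

If you're using embedding layers, you can intentionally reserve zero values for "nothing after the end", then use mask_zero=True in the embedding layers and set up the inputs like this: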

[2, #start token
3,
123,
1548, #end token
0, #nothing, value to be masked
0,
0,
0,
0,
0]

Another option is to have no "end token" at all and use "zero" instead.
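As a rough sketch of what mask_zero=True does with such an input, the snippet below asks an Embedding layer for the mask it would pass downstream. The vocabulary size (1549, i.e. indices 0 to 1548) and the embedding width of 8 are assumptions for illustration only.

```python
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 1549  # assumption: indices 0..1548, with 0 reserved for padding

# the padded sequence from the example above, with a batch dimension added
padded = np.array([[2, 3, 123, 1548, 0, 0, 0, 0, 0, 0]])

# output_dim=8 is an arbitrary choice for this sketch
embedding = tf.keras.layers.Embedding(
    input_dim=VOCAB_SIZE, output_dim=8, mask_zero=True
)

# boolean mask: True for real tokens, False for the zero padding
mask = np.array(embedding.compute_mask(padded))
print(mask)
```

Layers that support masking (for example recurrent layers) receive this mask and skip the padded timesteps.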

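Class weighting (solution 2)

Since your desired outputs probably contain far more end tokens than anything else, you can reduce the relevance of the end token.

Count the occurrences of each class in your outputs and compute a ratio for the end token. For example:

- Compute the mean of the occurrence counts of all other classes
- Count the occurrences of the end token
- ratio = mean of the other classes / occurrences of the end token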
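The three steps above could be sketched with numpy as follows. The toy vocabulary (6 classes, with 5 standing in for the end token) and the target array are made up for illustration; with real data you would use your own index arrays and the real end-token index.

```python
import numpy as np

TOTAL_TOKENS = 6  # hypothetical toy vocabulary: classes 0..5
END_TOKEN = 5     # hypothetical end-token index for this toy data

# made-up target sequences, stored as class indices
targets = np.array([
    [1, 2, 3, 5, 5, 5],
    [1, 4, 4, 2, 5, 5],
])

# occurrences of every class, including classes that never appear
counts = np.bincount(targets.ravel(), minlength=TOTAL_TOKENS)

# mean count over all classes except the end token
other_mean = np.delete(counts, END_TOKEN).mean()

# occurrences of the end token
end_count = counts[END_TOKEN]

ratio = float(other_mean / end_count)
print(ratio)
```

Note that classes with zero occurrences pull the mean down; whether to include them is a judgment call this sketch does not settle.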

Then, in the "fit" method, use:

class_weight = {0:1, 1:1, 2:1, ...., 1548:ratio, 1549:1,1550:1,...}

This can be done easily:

class_weight = {i:1. for i in range(totalTokens)}
class_weight[1548] = ratio
model.fit(...,...,....., class_weight = class_weight,...)

(Make sure you have 0 as a possible class in this case, or shift the indices by 1)

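A similar loss function (solution 3)

Note that y_pred will never be "equal" to y_true.

y_pred is variable, continuous and differentiable.
y_true is exact and constant.

For comparing, you should take the "argmax", which is very similar to (if not exactly) a class index.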

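def mean_absolute_error(y_true, y_pred):
    #note: self.startTokenAsIndex / self.endTokenAsIndex below assume this
    #function is defined where such an object is in scope (e.g. inside a method)

    #for comparing, let's take exact values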
    y_true_max = K.argmax(y_true)
    y_pred_max = K.argmax(y_pred)

    #compare with a proper tensor function
    equal_mask = K.equal(y_true_max,y_pred_max)
    is_start = K.equal(y_true_max, self.startTokenAsIndex)
    is_end = K.equal(y_true_max, self.endTokenAsIndex)

    #cast to float for multiplying and summing
    equal_mask = K.cast(equal_mask, K.floatx()) 
    is_start = K.cast(is_start, K.floatx())
    is_end = K.cast(is_end, K.floatx())
        #these are tensors with 0 (false) and 1 (true) as float
    
    #entire condition as you wanted
    condition = (is_start + is_end) * equal_mask
        # sum acts as "or", multiply acts as "and"
        # we don't have to worry about the sum resulting in 2
            # because you will never have startToken == endToken

    #reverse condition:
    condition = 1 - condition

    #result
    return condition * K.mean(K.abs(y_pred - y_true), axis=-1)
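
One way to avoid relying on self is to build the loss with a small factory function that closes over the token indices. The following is only a sketch of that idea, assuming TensorFlow 2 in eager mode and using tf.* ops in place of the K.* calls above; make_masked_mae, the toy vocabulary of 4 classes, and the indices 0 (start) and 3 (end) are all hypothetical.

```python
import tensorflow as tf

def make_masked_mae(start_index, end_index):
    """Return a loss that zeroes out correctly predicted start/end tokens."""
    def masked_mae(y_true, y_pred):
        # class indices of the one-hot targets and of the predictions
        true_idx = tf.argmax(y_true, axis=-1)
        pred_idx = tf.argmax(y_pred, axis=-1)

        correct = tf.equal(true_idx, pred_idx)
        special = tf.logical_or(tf.equal(true_idx, start_index),
                                tf.equal(true_idx, end_index))

        # 0.0 where a start/end token was predicted correctly, 1.0 elsewhere
        keep = 1.0 - tf.cast(tf.logical_and(correct, special), y_pred.dtype)
        return keep * tf.reduce_mean(tf.abs(y_pred - y_true), axis=-1)
    return masked_mae

# toy check: one sequence of three steps (start, ordinary token, end)
loss_fn = make_masked_mae(start_index=0, end_index=3)
y_true = tf.constant([[[1., 0., 0., 0.],
                       [0., 1., 0., 0.],
                       [0., 0., 0., 1.]]])
y_pred = tf.constant([[[0.7, 0.1, 0.1, 0.1],    # start predicted correctly
                       [0.1, 0.7, 0.1, 0.1],    # ordinary token, correct
                       [0.1, 0.6, 0.1, 0.2]]])  # end token missed
loss = loss_fn(y_true, y_pred).numpy()
print(loss)
```

The function returned by the factory could then be passed to model.compile(loss=...). Only the correctly predicted start token is zeroed out here; the ordinary token and the missed end token keep their mean absolute error.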