TensorFlow: Implementing a cross-entropy loss weighted by class weights?

Question


Tags: machine-learning, tensorflow, computer-vision, deep-learning, image-segmentation


Suppose that after applying median frequency balancing to the images used for segmentation, we obtained the following class weights:

class_weights = {0: 0.2595,
                 1: 0.1826,
                 2: 4.5640,
                 3: 0.1417,
                 4: 0.9051,
                 5: 0.3826,
                 6: 9.6446,
                 7: 1.8418,
                 8: 0.6823,
                 9: 6.2478,
                 10: 7.3614,
                 11: 0.0}
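For readers unfamiliar with median frequency balancing, here is a minimal NumPy sketch. It assumes the common definition: a class's frequency is its pixel count divided by the total number of pixels in the images where that class appears, and its weight is the median of those frequencies divided by its own frequency. It also assumes classes that never appear get weight 0. The function name `median_frequency_weights` is hypothetical, and this is not claimed to be how the table above was produced.

```python
import numpy as np

def median_frequency_weights(label_maps, num_classes):
    """Hypothetical helper: median frequency balancing weights.

    label_maps: int array [num_images, height, width] of class ids.
    Assumption: classes absent from every image get weight 0.
    """
    pixel_counts = np.zeros(num_classes)  # pixels of class c over all images
    image_pixels = np.zeros(num_classes)  # total pixels of images containing c
    for lm in label_maps:
        counts = np.bincount(lm.ravel(), minlength=num_classes)
        pixel_counts += counts
        image_pixels[counts > 0] += lm.size
    present = image_pixels > 0
    freq = np.zeros(num_classes)
    freq[present] = pixel_counts[present] / image_pixels[present]
    median = np.median(freq[present])
    weights = np.zeros(num_classes)
    weights[present] = median / freq[present]  # rare classes get weights > 1
    return weights
```

Frequent classes end up with weights below 1 and rare classes with weights above 1, which matches the spread of values in the table.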

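The idea is to create a weight mask that can be multiplied with the cross-entropy output for the classes. To create this weight mask, we can broadcast the weight values based on either the ground_truth labels or the predictions. Some of the math in my implementation: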

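1. Both the labels and the logits have shape [batch_size, height, width, num_classes].
2. The weight mask has shape [batch_size, height, width, 1].
3. The weight mask is broadcast over the num_classes channels of the softmax cross-entropy between the logits and the labels, producing an output of shape [batch_size, height, width, num_classes]. Here num_classes is 12.
4. reduce_sum is applied to each example in the batch, then reduce_mean over all examples, to obtain a single scalar loss.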
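The four steps above can be sketched in NumPy so the shapes and broadcasting are easy to check. This is a sketch under assumptions, not the question's TensorFlow code: the weights multiply the per-channel cross-entropy terms (rather than the logits), and the `mask_from` parameter is a hypothetical switch that builds the mask from the argmax of either the labels or the logits, which is exactly the choice the question asks about.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the class (last) axis
    z = x - x.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def weighted_pixel_ce(logits, onehot_labels, class_weights, mask_from="labels"):
    # Step 1: logits and onehot_labels are [batch, height, width, num_classes]
    class_weights = np.asarray(class_weights, dtype=np.float64)
    source = onehot_labels if mask_from == "labels" else logits
    # Step 2: weight mask [batch, height, width, 1] from the argmax class of the source
    weight_mask = class_weights[np.argmax(source, axis=-1)][..., np.newaxis]
    # Step 3: mask broadcast over the channels of -labels * log_softmax(logits)
    per_channel = -onehot_labels * log_softmax(logits) * weight_mask
    # Step 4: sum per example, then mean over the batch -> scalar loss
    return per_channel.sum(axis=(1, 2, 3)).mean()
```

To use the table above, pass `[class_weights[i] for i in range(12)]` as the weight list. With `mask_from="labels"` a pixel's weight depends only on its true class; with `mask_from="logits"` it changes with whatever the network currently predicts.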

In this case, should the weight mask be built from the predictions or from the ground truth?

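If we build it from the ground_truth, then no matter what the predicted pixel labels are, the pixels are penalized according to their actual class labels, which does not seem to guide training in a sensible way.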

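However, if we build it from the predictions, then for any logits produced, if the predicted label (obtained by taking the argmax of the logits) is a dominant class, all of that pixel's logit values will be scaled down by a significant amount.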

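The largest logit still stays the largest, because all the logits in the 12 channels are scaled by the same value. However, the final softmax probability of the predicted label (the label itself is the same before and after scaling) will be lower than it was before scaling, which gives a lower loss.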
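A quick NumPy check of the claim above: multiplying every logit of a pixel by the same weight w < 1 keeps the argmax unchanged but flattens the softmax, so the probability of the predicted class drops. The logits here are hypothetical, and w is the weight of class 0 from the table.

```python
import numpy as np

def softmax(x):
    # Stable softmax for a single pixel's logits
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical logits for one pixel
w = 0.2595                          # weight of class 0 in the table above

p = softmax(logits)
p_scaled = softmax(w * logits)

print(np.argmax(p), np.argmax(p_scaled))  # both 0: the predicted class is unchanged
print(p.max(), p_scaled.max())            # the max probability shrinks when w < 1
```

Scaling by a weight above 1 (for example 9.6446) has the opposite effect and makes the softmax sharper.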

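But here is the problem: if this weighting yields a lower loss, doesn't that contradict the idea that predicting a dominant label should give a larger loss?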

My overall impression of this approach is:

1. Dominant labels are penalized and rewarded much less.
2. Minority labels are highly rewarded when predicted correctly but heavily penalized when predicted wrongly.
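The two points can be made concrete under ground-truth weighting, where each pixel's loss is -w_y * log(p_y) for its true class y. This is an assumption for illustration; the question's own code scales the logits by the predicted class's weight instead. The probabilities are hypothetical; the weights are those of classes 3 and 6 in the table.

```python
import numpy as np

# Hypothetical probabilities assigned to the TRUE class of a pixel
p_correct, p_wrong = 0.9, 0.1
w_dominant, w_minor = 0.1417, 9.6446  # classes 3 and 6 in the table above

def weighted_ce(p_true, w):
    # Ground-truth-weighted pixel loss: -w_y * log(p_y)
    return -w * np.log(p_true)

for name, w in [("dominant", w_dominant), ("minor", w_minor)]:
    print(name, weighted_ce(p_correct, w), weighted_ce(p_wrong, w))
```

Because the weight multiplies the whole per-pixel term, it also scales that pixel's gradient with respect to the logits, so a mistake on class 6 pushes the network roughly 68 times harder than the same mistake on class 3.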

So how does this help solve the class imbalance problem? I don't quite follow the logic here.

Implementation: below is my current implementation for computing the weighted cross-entropy loss, although I'm not sure it is correct.

import tensorflow as tf

def weighted_cross_entropy(logits, onehot_labels, class_weights):
    if not logits.dtype == tf.float32:
        logits = tf.cast(logits, tf.float32)

    if not onehot_labels.dtype == tf.float32:
        onehot_labels = tf.cast(onehot_labels, tf.float32)

    #Obtain the logit label predictions and form a skeleton weight mask with the same shape as it
    logit_predictions = tf.argmax(logits, -1) 
    weight_mask = tf.zeros_like(logit_predictions, dtype=tf.float32)

    #Obtain the number of class weights to add to the weight mask
    num_classes = logits.get_shape().as_list()[3]

    #Form the weight mask mapping for each pixel prediction
    for i in range(num_classes):
        binary_mask = tf.equal(logit_predictions, i) #Get only the positions for class i predicted in the logits prediction
        binary_mask = tf.cast(binary_mask, tf.float32) #Convert boolean to ones and zeros
        class_mask = tf.multiply(binary_mask, class_weights[i]) #Multiply only the ones in the binary mask with the specific class_weight
        weight_mask = tf.add(weight_mask, class_mask) #Add to the weight mask

    #Multiply the logits with the scaling based on the weight mask then perform cross entropy
    weight_mask = tf.expand_dims(weight_mask, 3) #Expand the fourth dimension to 1 for broadcasting
    logits_scaled = tf.multiply(logits, weight_mask)

    # TF 1.x API; in TF 2.x this is tf.compat.v1.losses.softmax_cross_entropy
    return tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits_scaled)

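Could someone verify whether my understanding of the weighted loss is correct, and whether my implementation is correct? This is my first time working with a dataset with imbalanced classes, so I would really appreciate it if someone could check.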

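Test results: after some testing, I found that the implementation above produces a larger loss. Is this expected? That is, does it make training harder but ultimately produce a more accurate model?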
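One possible contributor to the larger loss (a hypothesis, not a diagnosis of the reported test): when a pixel's predicted class has a large weight, the question's implementation multiplies all of its logits by that weight, which sharpens the softmax. If that prediction is wrong, the cross-entropy for the true class grows sharply. The logits below are hypothetical; the weight is that of class 6 from the table.

```python
import numpy as np

def softmax_ce(logits, true_class):
    # Cross-entropy of a single pixel from raw logits (stable log-softmax)
    z = logits - logits.max()
    return -(z[true_class] - np.log(np.exp(z).sum()))

logits = np.array([1.0, 0.5])  # hypothetical: class 0 is predicted, true class is 1
w_pred = 9.6446                # class 6's weight, used here as the predicted-class weight

plain = softmax_ce(logits, true_class=1)
scaled = softmax_ce(w_pred * logits, true_class=1)
print(plain, scaled)  # scaling by the large weight inflates the loss of the wrong prediction
```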

Similar topics:

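Note that I have already looked at a similar post here: How to implement weighted cross entropy loss in tensorflow using sparse_softmax_cross_entropy_with_logits, but it seems TF only supports per-sample weights, not per-class weights.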

Many thanks, everyone.

- kwotsin

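"If we build it from the ground_truth, then no matter what the predicted pixel labels are, they are penalized according to their actual class labels, which does not seem to guide training in a sensible way." Why would that be the case? - P-Gn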

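If a pixel [x, y] should be labeled 1, but the prediction could be any value from 0 to 11, then the scaling applied to that pixel's logits is always the same, regardless of what the logits predict. I find this odd, because we want to penalize the predicted label adaptively. What are your thoughts on this? - kwotsin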

1 Answer

Content provided by Stack Overflow.

- Jessica Alan · Accepted Answer

Here is my implementation in Keras with the TensorFlow backend:

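import pickle
import tensorflow as tf

def class_weighted_pixelwise_crossentropy(target, output):
    # Clip probabilities so that log() never receives 0
    output = tf.clip_by_value(output, 10e-8, 1. - 10e-8)
    # Class weights: a plain Python list indexed like the one-hot vector
    with open('class_weights.pickle', 'rb') as f:
        weight = pickle.load(f)
    # weight broadcasts over the last (class) axis of target
    return -tf.reduce_sum(target * weight * tf.math.log(output))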

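Here weight is just a standard Python list whose indices match the corresponding classes in the one-hot vector. I store the weights in a pickle file to avoid recomputing them. This is an adaptation of Keras's categorical cross-entropy loss function. The first line of the function just clips the values to make sure we never take the log of 0.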

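I'm not sure why you would compute the weights from the predictions rather than from the ground truth; if you can explain further, I can update my answer.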

Edit: try running this NumPy code to see how it works. Also review the definition of cross-entropy.

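import numpy as np

# Class weights, indexed like the one-hot vector
weights = np.array([1, 2])

# Two images of two pixels each, one-hot targets over two classes
target = np.array([ [[0.0,1.0],[1.0,0.0]],
                    [[0.0,1.0],[1.0,0.0]]])

# Predicted class probabilities (already softmax-ed)
output = np.array([ [[0.5,0.5],[0.9,0.1]],
                    [[0.9,0.1],[0.4,0.6]]])

# Per-pixel cross-entropy, shape (2, 2), and its total over all pixels
crossentropy_matrix = -np.sum(target * np.log(output), axis=-1)
crossentropy = -np.sum(target * np.log(output))

# Class-weighted versions: weights broadcast over the last (class) axis
weighted_crossentropy_matrix = -np.sum(target * weights * np.log(output), axis=-1)
weighted_crossentropy = -np.sum(target * weights * np.log(output))

print(crossentropy_matrix, crossentropy)
print(weighted_crossentropy_matrix, weighted_crossentropy)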