Trying to understand the cross-entropy loss function in PyTorch


This is a very beginner question, but I'm trying to understand the cross-entropy loss function in Torch, so I wrote the following code:

import torch

x = torch.FloatTensor([[1., 0., 0.],
                       [0., 1., 0.],
                       [0., 0., 1.]])

print(x.argmax(dim=1))

y = torch.LongTensor([0,1,2])
loss = torch.nn.functional.cross_entropy(x, y)

print(loss)

This outputs the following:

tensor([0, 1, 2])
tensor(0.5514)

What I don't understand is: since my input matches the expected output, why is the loss not 0?

3 Answers

This is because what you are feeding into the cross-entropy function is not probabilities but logits, which are converted to probabilities internally with a softmax:

probas = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)

So, in your case, PyTorch will effectively work with the following probability matrix:

[0.5761168847658291, 0.21194155761708547, 0.21194155761708547]
[0.21194155761708547, 0.5761168847658291, 0.21194155761708547]
[0.21194155761708547, 0.21194155761708547, 0.5761168847658291]
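
You can verify those numbers and the 0.5514 value with a few lines of NumPy (a quick sketch, not part of the original answer):

import numpy as np

logits = np.array([[1., 0., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.]])

# row-wise softmax: the true class in each row ends up at ~0.5761
probas = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)

# cross-entropy = average of -log(probability assigned to the true class)
targets = np.array([0, 1, 2])
loss = -np.log(probas[np.arange(3), targets]).mean()
print(loss)  # ~0.5514, matching torch.nn.functional.cross_entropy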

From a mathematical point of view, it is rather the OP who would need to convert y into a probability distribution. - Xatyrian
Yes, and if we change the input to something like x = torch.FloatTensor([[10.,0.,0.], [0.,10.,0.], [0.,0.,10.]]), the result of F.cross_entropy gets close to zero. So F.cross_entropy rewards making the gap between the ground-truth class and the other classes as large as possible, rather than a ground-truth logit of 1 being optimal. - Wade Wang
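
Running the input from that comment (a minimal sketch, not part of the original answers) confirms the near-zero loss:

import torch
import torch.nn.functional as F

x = torch.FloatTensor([[10., 0., 0.],
                       [0., 10., 0.],
                       [0., 0., 10.]])
y = torch.LongTensor([0, 1, 2])

# each row's softmax is now roughly [0.9999, 0.00005, 0.00005],
# so -log(p_true) is close to zero
print(F.cross_entropy(x, y))  # roughly 9e-05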


The torch.nn.functional.cross_entropy function combines log_softmax (softmax followed by a logarithm) and nll_loss (negative log-likelihood loss) in a single function, i.e. it is equivalent to F.nll_loss(F.log_softmax(x, 1), y).

Code:

import torch
import torch.nn.functional as F

x = torch.FloatTensor([[1., 0., 0.],
                       [0., 1., 0.],
                       [0., 0., 1.]])
y = torch.LongTensor([0, 1, 2])

print(torch.nn.functional.cross_entropy(x, y))

print(F.softmax(x, 1).log())
print(F.log_softmax(x, 1))

print(F.nll_loss(F.log_softmax(x, 1), y))

Output:

tensor(0.5514)
tensor([[-0.5514, -1.5514, -1.5514],
        [-1.5514, -0.5514, -1.5514],
        [-1.5514, -1.5514, -0.5514]])
tensor([[-0.5514, -1.5514, -1.5514],
        [-1.5514, -0.5514, -1.5514],
        [-1.5514, -1.5514, -0.5514]])
tensor(0.5514)
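
To make the nll_loss step explicit: it picks, for each row, the log-probability at the target index and averages the negated values. A minimal sketch of that last step, reusing x and y from the code above:

logp = F.log_softmax(x, 1)

# take the entry for the true class in each row (the diagonal here, all -0.5514)
picked = logp[torch.arange(3), y]
print(-picked.mean())  # tensor(0.5514), same as F.cross_entropy(x, y)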

"最初的回答":请点击此处,了解有关torch.nn.functional.cross_entropy损失函数的更多信息。


A complete, copy-and-paste-able example showing the categorical cross-entropy loss calculation via:

- paper + pencil + calculator
- NumPy
- PyTorch

Apart from small rounding differences, all 3 results are the same:

import torch
import torch.nn.functional as F

import numpy as np

def main():

    ### paper + pencil + calculator calculation #################

    """
    predictions before softmax:
                  columns
               (4 categories)
        rows     1, 4, 1, 1
    (3 samples)  5, 1, 2, 1
                 1, 2, 5, 1

    ground truths (NOT one hot encoded)
          1, 0, 2

    preds softmax calculation:
    (e^1/(e^1+e^4+e^1+e^1)), (e^4/(e^1+e^4+e^1+e^1)), (e^1/(e^1+e^4+e^1+e^1)), (e^1/(e^1+e^4+e^1+e^1))
    (e^5/(e^5+e^1+e^2+e^1)), (e^1/(e^5+e^1+e^2+e^1)), (e^2/(e^5+e^1+e^2+e^1)), (e^1/(e^5+e^1+e^2+e^1))
    (e^1/(e^1+e^2+e^5+e^1)), (e^2/(e^1+e^2+e^5+e^1)), (e^5/(e^1+e^2+e^5+e^1)), (e^1/(e^1+e^2+e^5+e^1))

    preds after softmax:
    0.04332, 0.87005, 0.04332, 0.04332
    0.92046, 0.01686, 0.04583, 0.01686
    0.01686, 0.04583, 0.92046, 0.01686

    categorical cross-entropy loss calculation:
    (-ln(0.87005) + -ln(0.92046) + -ln(0.92046)) / 3 = 0.10166

    Note the loss ends up relatively low because all 3 predictions are correct
    """


    ### calculation via NumPy ###################################

    # predictions from model (just made up example data in this case)
    # rows = 3 samples, cols = 4 categories
    preds = np.array([[1, 4, 1, 1],
                      [5, 1, 2, 1],
                      [1, 2, 5, 1]], dtype=np.float32)

    # ground truths, NOT one hot encoded
    gndTrs = np.array([1, 0, 2], dtype=np.int64)

    preds = softmax(preds)

    loss = calcCrossEntropyLoss(preds, gndTrs)

    print('\n' + 'NumPy loss = ' + str(loss) + '\n')

    ### calculation via PyTorch #################################

    # predictions from model (just made up example data in this case)
    # rows = 3 samples, cols = 4 categories
    preds = torch.tensor([[1, 4, 1, 1],
                          [5, 1, 2, 1],
                          [1, 2, 5, 1]], dtype=torch.float32)

    # ground truths, NOT one hot encoded
    gndTrs = torch.tensor([1, 0, 2], dtype=torch.int64)

    loss = F.cross_entropy(preds, gndTrs)

    print('PyTorch loss = ' + str(loss) + '\n')
# end function

def softmax(x: np.ndarray) -> np.ndarray:
    numSamps = x.shape[0]

    for i in range(numSamps):
        x[i] = np.exp(x[i]) / np.sum(np.exp(x[i]))
    # end for

    return x
# end function

def calcCrossEntropyLoss(preds: np.ndarray, gndTrs: np.ndarray) -> float:
    assert len(preds.shape) == 2
    assert len(gndTrs.shape) == 1
    assert preds.shape[0] == gndTrs.shape[0]

    numSamps = preds.shape[0]

    mySum = 0.0
    for i in range(numSamps):
        # Note: in numpy, "log" is actually natural log (ln)
        mySum += -1 * np.log(preds[i, gndTrs[i]])
    # end for

    crossEntLoss = mySum / numSamps
    return crossEntLoss
# end function

if __name__ == '__main__':
    main()

Program output:

NumPy loss = 0.10165966302156448

PyTorch loss = tensor(0.1017)
