Convert a list of tensors to a PyTorch tensor

Question



I have a list of tensors, and each tensor has a different size. How can I convert this list of tensors into a single tensor using PyTorch?

For example,

x[0].size() == torch.Size([4, 8])
x[1].size() == torch.Size([4, 7])  # different shapes!

Doing this:

torch.tensor(x)

fails with the error:

ValueError: only one element tensors can be converted to Python scalars

- Omar Abdelaziz


Please provide more code. - Fábio Perez

for item in features: x.append(torch.tensor((item))) - Omar Abdelaziz

This gives me a list of tensors, but each tensor has a different size, so when I try torch.stack(x) I get the same error @FábioPerez - Omar Abdelaziz

If you are interested in the case where the lengths are the same, there is a solution here: https://dev59.com/EVQJ5IYBdhLWcg3wNjO4#66036075 - Charlie Parker

I believe your question also has a solution on the PyTorch forums: https://discuss.pytorch.org/t/nested-list-of-variable-length-to-a-tensor/38699/21 - Charlie Parker


3 Answers


A Tensor in PyTorch is not like a Python list: it cannot hold objects of variable length.

In PyTorch, you can convert a fixed-length (rectangular) array to a Tensor:

>>> torch.Tensor([[1, 2], [3, 4]])
tensor([[1., 2.],
        [3., 4.]])

but not a ragged one:

>>> torch.Tensor([[1, 2], [3, 4, 5]])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-809c707011cc> in <module>
----> 1 torch.Tensor([[1, 2], [3, 4, 5]])

ValueError: expected sequence of length 2 at dim 1 (got 3)

The same goes for torch.stack.
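A minimal sketch of that torch.stack behaviour, using the shapes from the question. The helper `try_stack` is a hypothetical name for illustration; since the exact error text may differ between PyTorch versions, the sketch only relies on torch.stack raising a RuntimeError when the shapes differ.

```python
import torch

def try_stack(tensors):
    """Hypothetical helper: return the stacked tensor, or None if the shapes differ."""
    try:
        return torch.stack(tensors)
    except RuntimeError as err:
        # torch.stack requires every tensor to have exactly the same shape
        print(f"stack failed: {err}")
        return None

same = [torch.zeros(4, 8), torch.zeros(4, 8)]
different = [torch.zeros(4, 8), torch.zeros(4, 7)]

print(try_stack(same).shape)  # torch.Size([2, 4, 8])
print(try_stack(different))   # prints the error message, then None
```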

- cloudyyyyy

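Hi cloudyy, thanks for your answer, it's very helpful... but if I have a list of tensors of different lengths that I need to stack into one tensor, is there any solution? - Omar Abdelaziz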

Not sure if this is useful, but here is a solution from the PyTorch forums: https://discuss.pytorch.org/t/nested-list-of-variable-length-to-a-tensor/38699/21 - Charlie Parker
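One common workaround for the follow-up question is to pad every tensor to the same size and only then stack. The sketch below assumes, as in the question, 2-D tensors that agree in dim 0 and differ only in the last dim; `pad_and_stack` is a hypothetical helper, and zero-padding plus a boolean mask is just one possible convention.

```python
import torch
import torch.nn.functional as F

def pad_and_stack(tensors, pad_value=0.0):
    """Hypothetical helper: right-pad 2-D tensors along the last dim, then stack.

    Assumes all tensors share the same size in dim 0. Returns the stacked
    tensor and a boolean mask that is True for real (non-padding) entries.
    """
    max_len = max(t.size(-1) for t in tensors)
    padded, masks = [], []
    for t in tensors:
        missing = max_len - t.size(-1)
        # F.pad's tuple starts at the last dim: (left, right)
        padded.append(F.pad(t, (0, missing), value=pad_value))
        mask = torch.zeros(t.size(0), max_len, dtype=torch.bool)
        mask[:, : t.size(-1)] = True
        masks.append(mask)
    return torch.stack(padded), torch.stack(masks)

x = [torch.randn(4, 8), torch.randn(4, 7)]
out, mask = pad_and_stack(x)
print(out.shape)   # torch.Size([2, 4, 8])
print(mask[1, 0])  # the last position is False for the shorter tensor
```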


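For a nested list of tensors, assuming it would pass np.array(x) (i.e. all leaf tensors have the same shape, the lists are not ragged, etc.), here is an efficient solution: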

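import torch

def nested_list_to_tensor(x, out_shape, out, top_level=True):
    # Recurse until reaching the innermost list of tensors
    if isinstance(x[0], list):
        for i in range(out_shape[0]):
            nested_list_to_tensor(x[i], out_shape[1:], out, top_level=False)
    else:
        # Collect the leaf tensors in one flat Python list
        out.extend(x)
    if top_level:
        # A single stack over all leaves, then restore the nested shape
        return torch.stack(out).reshape(*out_shape)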

x = [[[torch.randn(2, 8) for _ in range(4)] for _ in range(3)] for _ in range(9)]
out = nested_list_to_tensor(x, (9, 3, 4, 2, 8), [])
print(out.shape)

torch.Size([9, 3, 4, 2, 8])

Note: depending on how you use it, you may need to call `.contiguous()` on `out`.
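A quick way to sanity-check the function is to compare it against plain nested torch.stack calls, which are simpler but launch one stack per list. The check below copies the answer's function so that it runs on its own; `reference_stack` is a hypothetical helper name. It also prints whether the result is contiguous in this particular example.

```python
import torch

def nested_list_to_tensor(x, out_shape, out, top_level=True):
    # Copied from the answer: flatten the leaf tensors into `out`, then stack once
    if isinstance(x[0], list):
        for i in range(out_shape[0]):
            nested_list_to_tensor(x[i], out_shape[1:], out, top_level=False)
    else:
        out.extend(x)
    if top_level:
        return torch.stack(out).reshape(*out_shape)

def reference_stack(x):
    """Hypothetical helper: recursively torch.stack each nesting level."""
    if isinstance(x, list):
        return torch.stack([reference_stack(item) for item in x])
    return x

x = [[[torch.randn(2, 8) for _ in range(4)] for _ in range(3)] for _ in range(9)]
fast = nested_list_to_tensor(x, (9, 3, 4, 2, 8), [])
slow = reference_stack(x)
print(torch.equal(fast, slow))  # True
print(fast.is_contiguous())     # True
```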

If `out_shape` is unknown:

def get_out_shape(x):
    out_shape = []
    while True:
        if isinstance(x, list):
            out_shape.append(len(x))
            x = x[0]
        else:
            out_shape.extend(list(x.shape))
            break
    return out_shape

out = nested_list_to_tensor(x, get_out_shape(x), [])
print(out.shape)

Alternative

This is slower (especially on GPU), but may be more flexible:

def nested_list_to_tensor(x, out):
    if isinstance(x[0], list):
        for i in range(len(out)):
            out[i] = nested_list_to_tensor(x[i], out[i])
    else:
        out = torch.stack(x)
    return out

xt = x
while isinstance(xt, list):
    xt = xt[0]
out = torch.zeros(get_out_shape(x), dtype=xt.dtype, layout=xt.layout, 
                  device=xt.device)

out = nested_list_to_tensor(x, out)
print(out.shape)

This version should not need .contiguous().

Why the performance difference?

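Preallocation is usually good, but many repeated indexed assignments are expensive, especially on GPU, because each one is a separate small copy. Appending to a Python list is much cheaper, and a single stack merges all of those copies into one large operation.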
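A rough way to observe this difference on your own hardware is a small timing sketch like the one below. The function names are hypothetical, the leaf count simply mirrors the 9 × 3 × 4 example above, and the actual numbers depend heavily on device, tensor sizes and PyTorch version, so treat it as an experiment rather than a benchmark.

```python
import time
import torch

def append_then_stack(leaves):
    # Collect references in a Python list, then do one large stack
    out = []
    out.extend(leaves)
    return torch.stack(out)

def preallocate_then_assign(leaves):
    # Allocate once, then copy each leaf with its own indexed assignment
    out = torch.empty((len(leaves),) + tuple(leaves[0].shape),
                      dtype=leaves[0].dtype, device=leaves[0].device)
    for i, leaf in enumerate(leaves):
        out[i] = leaf
    return out

device = "cuda" if torch.cuda.is_available() else "cpu"
leaves = [torch.randn(2, 8, device=device) for _ in range(9 * 3 * 4)]

for fn in (append_then_stack, preallocate_then_assign):
    if device == "cuda":
        torch.cuda.synchronize()  # make sure earlier GPU work is finished
    start = time.perf_counter()
    for _ in range(100):
        result = fn(leaves)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the timed kernels to complete
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f} s")
```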

- OverLordGoldDragon

Page content provided by Stack Overflow.

- Separius · Accepted Answer

You are probably looking for cat.

However, a tensor cannot hold variable-length data.

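For example, here is a list of two tensors whose sizes differ in the last dimension (dim=2). To build one larger tensor that holds all of their data, we can concatenate them along that dimension with cat.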

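Also note that, at the time of writing, you cannot cat half-precision tensors on the CPU, so you should convert them to float, concatenate, and then convert back to half.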

import torch

a = torch.arange(8).reshape(2, 2, 2)
b = torch.arange(12).reshape(2, 2, 3)
my_list = [a, b]
my_tensor = torch.cat([a, b], dim=2)
print(my_tensor.shape) #torch.Size([2, 2, 5])
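Applied to the shapes from the question, the same idea looks like the sketch below: cat joins the tensors along an existing dimension (every other dimension must match), and if you record the original widths you can recover the pieces with torch.split. Keeping the widths around is an assumption about what you need later, not something cat does for you.

```python
import torch

# Shapes from the question: same size in dim 0, different sizes in dim 1
x = [torch.randn(4, 8), torch.randn(4, 7)]

# cat joins along an existing dim; all other dims must match
joined = torch.cat(x, dim=1)
print(joined.shape)  # torch.Size([4, 15])

# Record the original widths so the pieces can be recovered later
widths = [t.size(1) for t in x]
pieces = torch.split(joined, widths, dim=1)
print([p.shape for p in pieces])  # [torch.Size([4, 8]), torch.Size([4, 7])]
```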

You didn't explain your goal, so another option is to use pad_sequence, like this:

import torch
from torch.nn.utils.rnn import pad_sequence
a = torch.ones(25, 300)
b = torch.ones(22, 300)
c = torch.ones(15, 300)
# batch_first=False by default: (max_len, batch, features)
pad_sequence([a, b, c]).size() #torch.Size([25, 3, 300])
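pad_sequence pads along the first dimension of each tensor, while in the question the sizes differ in the last dimension. A sketch under that assumption: transpose each tensor so the variable dim comes first, pad with batch_first=True, transpose back, and build a mask from the original lengths. Zero as the padding value is an arbitrary choice here.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

x = [torch.randn(4, 8), torch.randn(4, 7)]

# Move the variable-length dim (the last one here) to the front
padded = pad_sequence([t.T for t in x], batch_first=True, padding_value=0.0)
print(padded.shape)  # torch.Size([2, 8, 4])

# Move the padded dim back to the end: (batch, 4, max_len)
padded = padded.transpose(1, 2)
print(padded.shape)  # torch.Size([2, 4, 8])

lengths = torch.tensor([t.size(1) for t in x])
# Boolean mask of shape (batch, max_len): True for real data, False for padding
mask = torch.arange(padded.size(-1)) < lengths[:, None]
print(mask)
```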

Edit: in this particular case, you can use torch.cat([x.float() for x in sequence], dim=1).half()
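A minimal sketch of that edit with half-precision tensors shaped like the question's example. Newer PyTorch releases may be able to cat float16 tensors on the CPU directly; the float round trip shown here is the workaround from the answer, and it is lossless because every float16 value is exactly representable in float32.

```python
import torch

# Half-precision tensors shaped like the question's example
sequence = [torch.randn(4, 8).half(), torch.randn(4, 7).half()]

# Upcast to float32, concatenate along dim=1, then cast back to float16
result = torch.cat([x.float() for x in sequence], dim=1).half()
print(result.shape, result.dtype)  # torch.Size([4, 15]) torch.float16
```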