Pandas - Combining column values into a list in a new column

Question

71

I have a Python Pandas dataframe df:

import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A','B','C','D'])

It looks like this:

print(t)
         A         B     C        D
0    hello         1  GOOD  long.kw
1      1.2  chipotle   NaN    bingo
2  various       NaN  3000  123.456

I am trying to create a new column whose values are lists made up of the values in A, B, C, and D. So it should look like this:

t['combined']                                             

Out[125]: 
0        [hello, 1, GOOD, long.kw]
1        [1.2, chipotle, nan, bingo]
2        [various, nan, 3000, 123.456]
Name: combined, dtype: object

I am trying this code:

t['combined'] = t.apply(lambda x: list([x['A'],
                                        x['B'],
                                        x['C'],
                                        x['D']]),axis=1)

This returns the following error:

ValueError: Wrong number of items passed 4, placement implies 1

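What confuses me is that my code works if I remove one of the columns from the list (or if I add another column to the dataframe without adding it to the list).

For example, running this code: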

t['combined'] = t.apply(lambda x: list([x['A'],
                                        x['B'],
                                        x['D']]),axis=1)

returns exactly what I would want if I only needed three columns:

print(t)
         A         B     C        D                 combined
0    hello         1  GOOD  long.kw      [hello, 1, long.kw]
1      1.2  chipotle   NaN    bingo   [1.2, chipotle, bingo]
2  various       NaN  3000  123.456  [various, nan, 123.456]

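I do not understand at all why asking for every column in the dataframe to be combined into the "combined" list raises an error, while selecting all but one of the columns works fine.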

- clg4

3 Answers

4

Another approach is to call list() on the underlying numpy array.

t['combined_arr'] = list(t.values)

Note that this produces a slightly different column than .tolist(). As the following shows, tolist() creates a nested list, whereas list() creates a list of arrays.

t['combined_list'] = t[['A', 'B']].values.tolist()
t['combined_arr'] = list(t[['A', 'B']].values)

t.iloc[0, 4]  # ['hello', 1]
t.iloc[0, 5]  # array(['hello', 1], dtype=object)

Depending on the use case, keeping the ndarray type can sometimes be useful.
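The difference between the two columns can be checked directly. The following is a minimal sketch on a two-column frame; the names `nested` and `arrays` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Two-column frame reusing values from the question.
t = pd.DataFrame({"A": ["hello", 1.2, "various"],
                  "B": [1, "chipotle", np.nan]})

nested = t[["A", "B"]].values.tolist()  # list of plain Python lists
arrays = list(t[["A", "B"]].values)     # list of 1-D ndarrays, one per row

t["combined_list"] = nested
t["combined_arr"] = arrays

# Each cell keeps the type of the object it was built from.
print(type(t["combined_list"].iloc[0]))
print(type(t["combined_arr"].iloc[0]))
```

Cells that hold an ndarray support vectorized operations such as boolean masking, while plain lists are often easier to serialize.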

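If you want to combine the columns without the NaN values, the fastest way is to check for NaN while looping over the rows. Since NaN != NaN, the fastest check is to test whether a value equals itself.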

t['combined'] = [[e for e in row if e==e] for row in t.values.tolist()]


         A     B     C        D                     combined
0    hello   1.0  GOOD  long.kw  [hello, 1.0, GOOD, long.kw]
1      1.2  10.0   NaN    bingo           [1.2, 10.0, bingo]  <-- no NaN
2  various   NaN  3000  123.456     [various, 3000, 123.456]  <-- no NaN
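To see why the self-equality test works, here is a small sketch applied to the dataframe from the question. The variable name `combined` is illustrative; the result is kept in a plain list so it can be inspected before being assigned to a column.

```python
import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# NaN is the only value here that does not compare equal to itself.
nan = float('nan')
print(nan == nan)  # False

# Keep only the elements that equal themselves, i.e. everything but NaN.
combined = [[e for e in row if e == e] for row in t.values.tolist()]
t['combined'] = combined
print(t['combined'])
```

Strings and ordinary numbers always equal themselves, so only the NaN entries are filtered out.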

A more thorough check uses the isnan function from the built-in math module.

import math
t['combined'] = [[e for e in row if not (isinstance(e, float) and math.isnan(e))] for row in t.values.tolist()]
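Both checks above only drop floating-point NaN. If the data may also contain None or pd.NA, one possible variant (not from the original answer) is to test each scalar with pd.isna instead. The sample frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing several kinds of missing values.
t = pd.DataFrame([['a', None, 1.0],
                  [pd.NA, 'b', np.nan]],
                 columns=['A', 'B', 'C'])

# pd.isna returns True for None, NaN and pd.NA when given a scalar.
combined = [[e for e in row if not pd.isna(e)]
            for row in t.values.tolist()]
t['combined'] = combined
print(t['combined'])
```

This is likely slower than the e==e test because pd.isna is called once per element, so it is worth using only when those other missing markers can actually occur.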

To combine the non-NaN values of specific columns only, select those columns first:

cols = ['A', 'B']
t['combined'] = [[e for e in row if e==e] for row in t[cols].values.tolist()]

- cottontail

1

Here is a way that keeps the NaN values:

t.assign(combined = pd.Series(d))

Output:

         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1      1.2  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000  123.456  [various, nan, 3000, 123.456]

Here is a way that drops the NaN values:

t.assign(combined = t.stack().groupby(level=0).agg(list))

Output:

         A         B     C        D                   combined
0    hello         1  GOOD  long.kw  [hello, 1, GOOD, long.kw]
1      1.2  chipotle   NaN    bingo     [1.2, chipotle, bingo]
2  various       NaN  3000  123.456   [various, 3000, 123.456]
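The approach above relies on stack() discarding NaN, which may depend on the pandas version in use. The sketch below is a variant that drops the missing values explicitly with dropna(), so it does not depend on that default; the explicit dropna() call is an addition and not part of the original answer.

```python
import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# stack() turns the frame into one long Series indexed by (row, column).
# dropna() removes the missing entries, then each row's values are
# gathered back into a list by grouping on the row level of the index.
combined = t.stack().dropna().groupby(level=0).agg(list)

result = t.assign(combined=combined)
print(result)
```

Note that a row consisting entirely of NaN would have no entries left after dropna(), so its combined cell would be NaN rather than an empty list.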

- rhug123

- Steven G · Accepted Answer

Try this:

t['combined']= t.values.tolist()

t
Out[50]: 
         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1     1.20  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000   123.46  [various, nan, 3000, 123.456]
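Because t.values covers every column currently in the frame, running the assignment a second time would also pull the existing combined column into the new lists. A minimal sketch that avoids this by naming the source columns explicitly follows; the `cols` variable is illustrative, and to_numpy() is used here as an equivalent of .values.

```python
import math

import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# Name the source columns so the result does not depend on
# whatever other columns the frame happens to contain.
cols = ['A', 'B', 'C', 'D']
t['combined'] = t[cols].to_numpy().tolist()

# Safe to re-run: 'combined' is not among the selected columns.
t['combined'] = t[cols].to_numpy().tolist()
print(t)
```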