Pandas: replace NaN values in specific columns with a list

Question

4

I have a DataFrame with two rows:

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

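After appending 8 rows in which every column except group is null, it looks like this:

group, size = 'c', 8  # values assumed from the output below
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, pd.DataFrame({'group': group}, index=[0] * size)])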

  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN

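What I want:

Replace the NaN values in the sequence columns (seq_col, seq_col_2, seq_col_3, etc.) with lists that I supply.

Notes:

This data has only 2 sequence columns, but there could be more.
Lists that already exist in a column must not be replaced, only the NaN values.

I could not find a solution that replaces NaN with user-supplied list values provided in a dictionary.

Pseudocode: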

for each key, value in dict,
   for each column in df
       if column matches key in dict
         # here matches means the 'seq_col_n' key of dict matched the df 
         # column named 'seq_col_n'
         replace NaN with value in seq_col_n (which is a list of numbers)

I tried the code below. It handles the first column you pass, but it does not work correctly for the second column, which is strange.

 df.loc[df['seq_col'].isnull(),['seq_col']] = df.loc[df['seq_col'].isnull(),'seq_col'].apply(lambda m: fill_values['seq_col'])

The approach above does solve the problem for the first column, but trying it again on seq_col_2 gives strange results.

Expected output, given this input:

my_dict = {'seq_col': [1,2,3], 'seq_col_2': [6,7,8]}

# after executing the code from pseudo code given, it should look like
  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]

- annonymous_guy

1

Can you show the expected output? Also, what result does your code give? - harvpan

1

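Nice, finally someone who posts at least an executable code example! Unfortunately I can't help you, but I'll upvote your question. As Harv mentioned, though, providing the expected output would be very helpful. - JE_Muc

Are you basically trying to turn the 10 values in those two lists into 10 individual values, one per row, in those columns? If so, what do you want to do with the columns that have no list? - ALollz

This link may help: https://dev59.com/RFYM5IYBdhLWcg3w7jc1#48197300 - BENY

Is this what you are looking for? https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.fillna.html - xyzjayne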

1 Answer

- jpp · Accepted Answer

With input arrays, you can use pd.DataFrame.loc together with pd.Series.isnull:

import pandas as pd, numpy as np

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, pd.DataFrame({'group': ['c']*8}, index=[0] * 8)])

L1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
L2 = np.array([10, 11, 12, 13, 14, 15, 16, 17])

df.loc[df['seq_col'].isnull(), 'seq_col'] = L1
df.loc[df['seq_col_2'].isnull(), 'seq_col_2'] = L2

print(df[['seq_col', 'seq_col_2']])

           seq_col        seq_col_2
0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0                0               10
0                1               11
0                2               12
0                3               13
0                4               14
0                5               15
0                6               16
0                7               17

If you need list values in the series, you can explicitly convert to a Series before assignment:

df.loc[df['seq_col'].isnull(), 'seq_col'] = pd.Series([[1, 2, 3]]*len(df))
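
The same idea can be extended to the dictionary from the question. The following is only a minimal sketch, not a definitive implementation: the helper name fill_nan_with_lists is hypothetical, the dictionary keys are assumed to be column names, and the sequence columns are assumed to hold either Python lists or NaN. It builds each column value by value instead of assigning through loc, so it does not depend on index alignment even though the index here contains duplicate labels.

```python
import pandas as pd

df = pd.DataFrame({'group': ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1, 2, 3, 4, 5]] * 2,
                   'seq_col_2': [[1, 2, 3, 4, 5]] * 2,
                   'grp_count': [2] * 2})
# pd.concat is used because DataFrame.append no longer exists in pandas 2.x.
df = pd.concat([df, pd.DataFrame({'group': ['c'] * 8}, index=[0] * 8)])

my_dict = {'seq_col': [1, 2, 3], 'seq_col_2': [6, 7, 8]}


def fill_nan_with_lists(frame, fill_values):
    """Hypothetical helper: return a copy with NaN cells replaced by lists."""
    out = frame.copy()
    for col, fill in fill_values.items():
        if col not in out.columns:
            continue  # ignore keys that do not name a column
        # Keep existing lists; give every NaN cell its own copy of the fill list.
        filled = [list(fill) if not isinstance(v, list) and pd.isna(v) else v
                  for v in out[col]]
        # Reuse the frame's own index so no realignment is needed.
        out[col] = pd.Series(filled, index=out.index, dtype=object)
    return out


result = fill_nan_with_lists(df, my_dict)
print(result[['seq_col', 'seq_col_2']])
```

Each missing cell receives a separate copy of the list (list(fill)), so later in-place changes to one row do not leak into the others. Columns that are not named in the dictionary, such as grp_count, keep their NaN values.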