Pandas: replace NaN values in specific columns with a list


I have a dataframe with two rows:

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

I then append 8 rows of null values, so it looks like this:

df = df.append(pd.DataFrame({'group': ['c'] * 8}, index=[0] * 8))

  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN

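What I want:

Replace the NaN values in the sequence columns (seq_col, seq_col_2, seq_col_3, etc.) with a list that I provide myself.

Note:

  • In this data there are only 2 sequence columns, but there could be more.
  • Lists that already exist in a column must not be replaced, only the NaN values.

I couldn't find a solution for replacing NaN with user-supplied list values provided in a dictionary.

Pseudocode: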

for each key, value in dict,
   for each column in df
       if column matches key in dict
         # here matches means the 'seq_col_n' key of dict matched the df 
         # column named 'seq_col_n'
         replace NaN with value in seq_col_n (which is a list of numbers)

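I tried the code below. It works for the first column you pass in, but not for the second one, which is strange.

df.loc[df['seq_col'].isnull(),['seq_col']] = df.loc[df['seq_col'].isnull(),'seq_col'].apply(lambda m: fill_values['seq_col'])

The above does the job for seq_col, but trying it again on seq_col_2 gives strange results.

Expected output, given this input parameter:

my_dict = {'seq_col': [1,2,3], 'seq_col_2': [6,7,8]}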

# after executing the code from pseudo code given, it should look like
  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]

Can you show the expected output? Also, what result does your code give? - harvpan
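Great, finally someone who posts at least an executable code example! Unfortunately I can't help you, but I'll upvote your question. As Harv mentioned, though, providing an expected output would be very helpful. - JE_Muc
Do you basically want to turn the 10 values in those two lists into 10 separate values, one per row, in those columns? If so, what do you want to do for the columns that have no list? - ALollz
This link might help: https://dev59.com/RFYM5IYBdhLWcg3w7jc1#48197300 - BENY
Is this what you are looking for? https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.fillna.html - xyzjayne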
1 Answer


With input arrays, you can use pd.DataFrame.loc together with pd.Series.isnull:

import pandas as pd, numpy as np

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
df = pd.concat([df, pd.DataFrame({'group': ['c']*8}, index=[0] * 8)])

L1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
L2 = np.array([10, 11, 12, 13, 14, 15, 16, 17])

df.loc[df['seq_col'].isnull(), 'seq_col'] = L1
df.loc[df['seq_col_2'].isnull(), 'seq_col_2'] = L2

print(df[['seq_col', 'seq_col_2']])

           seq_col        seq_col_2
0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0                0               10
0                1               11
0                2               12
0                3               13
0                4               14
0                5               15
0                6               16
0                7               17

If you need list values in the series, you can explicitly convert them to a series before assigning:
df.loc[df['seq_col'].isnull(), 'seq_col'] = pd.Series([[1, 2, 3]]*len(df))
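
The dictionary-driven loop from the question's pseudocode could be sketched roughly as follows. This is only an illustration, not the method shown above: the helper name fill_nan_with_lists is hypothetical, the sample frame is trimmed to the relevant columns, and it is built with pd.concat on the assumption of a pandas version where df.append is no longer available. The column is rebuilt positionally, so the duplicated index labels play no role.

```python
import pandas as pd

def fill_nan_with_lists(df, fill_values):
    """Return a copy of df where NaN cells in the named columns hold a list.

    fill_values maps a column name to the list used for that column's NaN cells.
    """
    out = df.copy()
    for col, fill in fill_values.items():
        if col not in out.columns:
            continue  # skip dictionary keys that have no matching column
        is_missing = out[col].isna()
        # Rebuild the column row by row: existing lists are kept untouched,
        # and each NaN cell receives its own copy of the fill list.
        out[col] = [list(fill) if missing else value
                    for value, missing in zip(out[col], is_missing)]
    return out

# Sample data: two rows with lists, then eight rows where the list columns are NaN.
df = pd.DataFrame({'group': ['c'] * 2,
                   'seq_col': [[1, 2, 3, 4, 5]] * 2,
                   'seq_col_2': [[1, 2, 3, 4, 5]] * 2})
df = pd.concat([df, pd.DataFrame({'group': ['c'] * 8}, index=[0] * 8)])

my_dict = {'seq_col': [1, 2, 3], 'seq_col_2': [6, 7, 8]}
result = fill_nan_with_lists(df, my_dict)
print(result[['seq_col', 'seq_col_2']])
```

Because the helper only iterates over the dictionary, adding a seq_col_3 key would cover a third sequence column without changing the function.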
