Pandas: Fill NaN values row by row within each group ID

3
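I want to fill NaN values row by row within each group ID. I tried fillna with both the forward-fill and backward-fill options, but it does not fill the DataFrame group by group. I also want to make sure the company matches before a NaN value is filled. In this case, a plain forward fill would replace the data for "Pear" with the data for "Banana".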

appended = appended.sort_values(by=['Company','Intro'],na_position='last')
appended = appended.reset_index(drop=True)

for i in appended.index:

    if i==0:
        pass
    else:
        if appended.at[i,'Company']==appended.at[i-1,'Company']:
            appended.fillna(method='ffill',inplace=True)
        else:
            pass

The appended DataFrame:

Company    Intro          Categories         Headquarters  Founded Date   Funding Stage

 Apple       xyz       Healthcare, Big Data     New York       2018           Series A

 Apple       NaN              NaN                NaN           NaN             NaN

 Apple       NaN              NaN                NaN           NaN             NaN

 Banana     Lier           Government           Europe        2010           Series B

 Pear        NaN              NaN                NaN           NaN             NaN

I expect the following result:

Expected Result

Company    Intro          Categories         Headquarters  Founded Date   Funding Stage

 Apple       xyz       Healthcare, Big Data     New York       2018           Series A

 Apple       xyz       Healthcare, Big Data     New York       2018           Series A

 Apple       xyz       Healthcare, Big Data     New York       2018           Series A

 Banana      Lier        Government             Europe        2010           Series B

 Pear         NaN              NaN                NaN           NaN             NaN

Is NaaN just a typo for NaN, or is it something different? oO - meissner_
1
@meissner_ Sorry, that was a typo for NaN. - terencetch
2 Answers

3
Use groupby with ffill.
df.groupby(['Company']).ffill()

  Company Intro            Categories Headquarters  Founded Date Funding Stage
0   Apple   xyz  Healthcare, Big Data     New York        2018.0      Series A
1   Apple   xyz  Healthcare, Big Data     New York        2018.0      Series A
2   Apple   xyz  Healthcare, Big Data     New York        2018.0      Series A
3  Banana  Lier            Government       Europe        2010.0      Series B
4    Pear   NaN                   NaN          NaN           NaN           NaN

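This did not work for me. The return value of this statement does not contain the full DataFrame. - Abhilash
Side note: to create or update a single column this way, you can simply set df['Founded Date'] = df.groupby(['Company']).ffill()['Founded Date'] - elPastor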
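
Regarding the comment about the result not containing the full DataFrame: in recent pandas versions the grouping column is typically left out of what groupby(...).ffill() returns. A minimal sketch of one way around that, assuming the sample data from the question, is to forward-fill only the value columns and assign them back, so Company is never dropped. The names value_cols and filled are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "Company": ["Apple", "Apple", "Apple", "Banana", "Pear"],
    "Intro": ["xyz", np.nan, np.nan, "Lier", np.nan],
    "Categories": ["Healthcare, Big Data", np.nan, np.nan, "Government", np.nan],
    "Headquarters": ["New York", np.nan, np.nan, "Europe", np.nan],
    "Founded Date": [2018, np.nan, np.nan, 2010, np.nan],
    "Funding Stage": ["Series A", np.nan, np.nan, "Series B", np.nan],
})

# Every column except the grouping key (illustrative helper name).
value_cols = [col for col in df.columns if col != "Company"]

# Forward-fill inside each company only, then write the filled columns
# back into a copy so the Company column is preserved.
filled = df.copy()
filled[value_cols] = df.groupby("Company")[value_cols].ffill()

print(filled)
```

Because the fill happens inside each group, the Pear row stays NaN instead of inheriting the values from Banana.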

0
import pandas as pd
from io import StringIO

# sample data
df = pd.read_fwf(StringIO("""
Company    Intro                 Categories   Headquarters  Founded_Date   Funding_Stage
 Apple       xyz       Healthcare, Big Data     New York       2018           Series A
 Apple       NaN              NaN                NaN           NaN             NaN
 Apple       NaN              NaN                NaN           NaN             NaN
 Banana     Lier           Government           Europe        2010           Series B
 Pear        NaN              NaN                NaN           NaN             NaN"""), header=1)


# Create the summary level - assumes repeat data comes first
df_summary = df.groupby("Company").head(1)

# Join the result
df_result = df[['Company']].merge(df_summary, on="Company")

#  Company Intro            Categories Headquarters  Founded_Date Funding_Stage
#0   Apple   xyz  Healthcare, Big Data     New York        2018.0      Series A
#1   Apple   xyz  Healthcare, Big Data     New York        2018.0      Series A
#2   Apple   xyz  Healthcare, Big Data     New York        2018.0      Series A
#3  Banana  Lier            Government       Europe        2010.0      Series B
#4    Pear   NaN                   NaN          NaN           NaN           NaN
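
The snippet above assumes the populated row comes first in each group. If that cannot be guaranteed, a possible alternative (a different technique from the one in this answer) is to fill each NaN with the first non-null value of its group via groupby(...).transform("first"). This is only a sketch: the data below is made up so that the populated Apple row sits in the middle, and value_cols, group_first and filled are illustrative names.

```python
import numpy as np
import pandas as pd

# Made-up variant of the sample data: the populated Apple row is not first.
df = pd.DataFrame({
    "Company": ["Apple", "Apple", "Apple", "Banana", "Pear"],
    "Intro": [np.nan, "xyz", np.nan, "Lier", np.nan],
    "Headquarters": [np.nan, "New York", np.nan, "Europe", np.nan],
    "Founded_Date": [np.nan, 2018, np.nan, 2010, np.nan],
})

value_cols = [col for col in df.columns if col != "Company"]

# First non-null value per column within each company, broadcast to every row.
group_first = df.groupby("Company")[value_cols].transform("first")

# Fill only the missing cells; values that are already present are left alone.
filled = df.copy()
filled[value_cols] = df[value_cols].fillna(group_first)

print(filled)
```

Unlike a forward fill, this also fills rows that come before the populated row of their group, and groups with no data at all (Pear here) remain NaN.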
