How do I do a custom pandas DataFrame merge?

Question

3

Suppose I have:

import pandas as pd

data = [['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]]
df = pd.DataFrame(data, columns=['Name', 'Low-Age', 'High-Age'])
print(df)

   Name  Low-Age  High-Age
0   tom       10        20
1  nick       15        30
2  juli       14        40

Then I have another table:

data = [[10, 'school'], [30, 'college']]
edu = pd.DataFrame(data, columns=['Age', 'Education'])
print(edu)

   Age Education
0   10    school
1   30   college

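How can I get a table in which edu['Age'] is matched against either df['Low-Age'] or df['High-Age']? Where they match, I want to append edu['Education'] to df. (Assume that either the low age or the high age can match, but not both at once.)

So I would like the output to look like this:

   Name  Low-Age  High-Age Education
0   tom       10        20    school
1  nick       15        30   college
2  juli       14        40       NaN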

- Denis

4 Answers

3

Use `map` together with `combine_first`

mapper = edu.set_index('Age')['Education']
df['Education'] = df['Low-Age'].map(mapper).combine_first(df['High-Age'].map(mapper))

   Name  Low-Age  High-Age Education
0   tom       10        20    school
1  nick       15        30   college
2  juli       14        40       NaN

- Vaishali

It may be worth pointing out: if the "Age" values in the `edu` DataFrame are not unique, this error is raised: pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects. - Epion
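
Following up on the comment above, here is a minimal sketch of one way to sidestep that error: drop the duplicate `Age` rows before building the mapper. Keeping the first occurrence is an assumption made purely for illustration, and the extra `[10, 'primary']` row is made-up sample data; the right rule depends on what the duplicates mean in your data.

```python
import pandas as pd

df = pd.DataFrame(
    [['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]],
    columns=['Name', 'Low-Age', 'High-Age'],
)
# Hypothetical edu table in which Age 10 appears twice
edu = pd.DataFrame(
    [[10, 'school'], [10, 'primary'], [30, 'college']],
    columns=['Age', 'Education'],
)

# Keep only the first row per Age so the mapper's index is unique
mapper = edu.drop_duplicates(subset='Age').set_index('Age')['Education']

# Look up Low-Age first, then fill the remaining gaps from High-Age
df['Education'] = df['Low-Age'].map(mapper).combine_first(df['High-Age'].map(mapper))
print(df)
```

If the duplicates carry different meanings, aggregating them explicitly (for example with a `groupby` on `Age`) may be a better fit than silently keeping the first row.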

2

Use Series.map and pd.concat:

edu2=edu.set_index('Age')
s=pd.concat([df['Low-Age'].map(edu2['Education']),df['High-Age'].map(edu2['Education'])])
df['Education']=s[s.notna()].reindex(index=df.index)
print(df)

   Name  Low-Age  High-Age Education
0   tom       10        20    school
1  nick       15        30   college
2  juli       14        40       NaN

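Alternatively, you can fill the missing values with empty strings and concatenate the two mapped columns (rows without a match then hold an empty string instead of NaN):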

edu2=edu.set_index('Age')
df['Education']= ( df['High-Age'].map(edu2['Education']).fillna('')+
                  df['Low-Age'].map(edu2['Education']).fillna('') )

Or

edu2=edu.set_index('Age')
df['Education']= df[['High-Age','Low-Age']].apply(lambda x: x.map(edu2['Education']).fillna('')).sum(axis=1)

print(df)

   Name  Low-Age  High-Age Education
0   tom       10        20    school
1  nick       15        30   college
2  juli       14        40

- ansev
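
The string-concatenation variants in the answer above leave an empty string where nothing matched, whereas the question asks for NaN. A minimal sketch of one way to restore the missing values, assuming that an empty string never occurs as a real `Education` value:

```python
import pandas as pd

df = pd.DataFrame(
    [['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]],
    columns=['Name', 'Low-Age', 'High-Age'],
)
edu = pd.DataFrame([[10, 'school'], [30, 'college']], columns=['Age', 'Education'])

edu2 = edu.set_index('Age')
# Concatenate the two lookups; unmatched cells contribute ''
combined = (df['High-Age'].map(edu2['Education']).fillna('')
            + df['Low-Age'].map(edu2['Education']).fillna(''))

# Turn the '' placeholders back into missing values
df['Education'] = combined.mask(combined == '')
print(df)
```

Note that if both ages matched, the two strings would be glued together, so this relies on the question's assumption that at most one of them matches.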

1

When working with large datasets, this approach can produce results faster. It is implemented with the apply() function.

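# Build an Age -> Education lookup once
edu_lookup = dict(zip(edu['Age'], edu['Education']))

def match(row):
    # Try Low-Age first, then High-Age; returns None when neither matches
    return edu_lookup.get(row['Low-Age'], edu_lookup.get(row['High-Age']))

# Apply over the rows of df (not edu) so each result lines up with its own row
df['Education'] = df.apply(match, axis=1)
print(df)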

- Sai Kiran


- piRSquared · Accepted Answer

`stack` -> `map`

edu_dict = dict(zip(edu.Age, edu.Education))

Education = df[['Low-Age', 'High-Age']].stack().map(edu_dict).groupby(level=0).first()
df.assign(Education=Education)

   Name  Low-Age  High-Age Education
0   tom       10        20    school
1  nick       15        30   college
2  juli       14        40       NaN
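
To make the chain above easier to follow, here is the same idea broken into named steps, as a self-contained sketch on the question's sample data. The intermediate variable names (`stacked`, `mapped`, `education`, `result`) are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]],
    columns=['Name', 'Low-Age', 'High-Age'],
)
edu = pd.DataFrame([[10, 'school'], [30, 'college']], columns=['Age', 'Education'])

edu_dict = dict(zip(edu['Age'], edu['Education']))

# 1. stack: one long Series indexed by (row label, column name)
stacked = df[['Low-Age', 'High-Age']].stack()

# 2. map: translate each age to an education level (missing when absent)
mapped = stacked.map(edu_dict)

# 3. groupby(level=0).first(): first non-null value per original row
education = mapped.groupby(level=0).first()

# assign returns a new DataFrame and leaves df itself unchanged
result = df.assign(Education=education)
print(result)
```

Because `assign` does not modify `df` in place, the result has to be bound to a name (or assigned back to `df`) to be kept.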
