How do I merge two DataFrames based on whether a value falls within a given range?

3

Suppose I have the following:

import pandas as pd

edu_data = [['school', 5, 18], ['college', 19, 23], ['grad-school', 24, 28]]
edu = pd.DataFrame(edu_data, columns=['Education', 'Low-Age', 'High-Age'])
print(edu)
     Education  Low-Age  High-Age
0       school        5        18
1      college       19        23
2  grad-school       24        28

Then I have another table that lists people's ages:

data = [['tom', 5], ['nick', 28], ['juli', 14], ['jack', 30]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
   Name  Age
0   tom    5
1  nick   28
2  juli   14
3  jack   30

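How can I get a table in which df['Age'] is matched against the range between edu['Low-Age'] and edu['High-Age']? If df['Age'] falls within a range, I want to add the corresponding edu['Education'] value to df. The output I want is:
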
   Name  Age    Education
0   tom    5       school
1  nick   28  grad-school
2  juli   14       school
3  jack   30          NaN

2 Answers

4

Use pd.cut. This relies on the age ranges being contiguous, as they are here:

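# Bin edges: the first lower bound, then every upper bound -> [5, 18, 23, 28]
bins = sorted([edu['Low-Age'][0]] + edu['High-Age'].to_list())

# Bins are right-closed; include_lowest=True also keeps the lowest edge (age 5)
df['Education'] = pd.cut(df.Age, bins=bins,
                         include_lowest=True,
                         labels=edu.Education)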

Output:

   Name  Age    Education
0   tom    5       school
1  nick   28  grad-school
2  juli   14       school
3  jack   30          NaN
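
To make the boundary behaviour of pd.cut easier to see, here is a minimal sketch that runs the same bins over a few edge ages. The `ages` and `result` names are illustrative only and not part of the original answer, and the sketch assumes the ranges are contiguous integers, as in the question.

```python
import pandas as pd

edu = pd.DataFrame(
    [["school", 5, 18], ["college", 19, 23], ["grad-school", 24, 28]],
    columns=["Education", "Low-Age", "High-Age"],
)

# Bin edges: the first lower bound followed by every upper bound.
bins = [edu["Low-Age"].iloc[0]] + edu["High-Age"].tolist()

# Hypothetical probe values sitting on each boundary, plus one out of range.
ages = pd.Series([5, 18, 19, 23, 24, 28, 30])

# Right-closed bins: (5, 18], (18, 23], (23, 28]; include_lowest adds age 5.
labels = pd.cut(
    ages,
    bins=bins,
    include_lowest=True,
    labels=edu["Education"].tolist(),
)

# Convert from categorical to plain objects so missing values show up as NaN.
result = labels.astype(object).tolist()
print(result)
```

Because only the upper bounds are used as interior edges, a non-integer age such as 18.5 would be labelled 'college' even though it lies between the two stated ranges.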

pd.cut is a nice solution :) +1 - Andy L.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html - gosuto

2
Use IntervalIndex with map:

edu = edu.set_index(pd.IntervalIndex.from_arrays(edu['Low-Age'], edu['High-Age'], closed='both'))

df['Education'] = df.Age.map(edu.Education)

In [488]: df
Out[488]:
   Name  Age    Education
0   tom    5       school
1  nick   28  grad-school
2  juli   14       school
3  jack   30          NaN
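
As a rough sketch of what the interval lookup does, IntervalIndex.get_indexer returns the position of the interval containing each value, or -1 when no interval contains it. The `ages`, `pos`, and `matched` names are illustrative and not from the original answer, and the ranges are assumed not to overlap.

```python
import pandas as pd

edu = pd.DataFrame(
    [["school", 5, 18], ["college", 19, 23], ["grad-school", 24, 28]],
    columns=["Education", "Low-Age", "High-Age"],
)

# Closed on both sides, so 5 and 18 both belong to 'school'.
intervals = pd.IntervalIndex.from_arrays(
    edu["Low-Age"], edu["High-Age"], closed="both"
)

# Hypothetical probe values: 18.5 falls in the gap between 18 and 19.
ages = pd.Series([5, 18, 18.5, 28, 30])

# Position of the containing interval for each age; -1 means no match.
pos = intervals.get_indexer(ages)

# Look up the label by position, then blank out the unmatched rows.
matched = pd.Series(edu["Education"].to_numpy()[pos], index=ages.index)
matched = matched.where(pos >= 0)
print(matched)
```

Unlike bins built from contiguous edges, this lookup leaves values that fall in a gap between ranges unmatched.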
