Python Pandas: Match VLOOKUP columns based on header values

Question

9

I have the following dataframe df:

Customer_ID | 2015 | 2016 |2017 | Year_joined_mailing
ABC            5      6     10     2015
BCD            6      7     3      2016        
DEF            10     4     5      2017
GHI            8      7     10     2016

I want to look up each customer's value for the year they joined the mailing list and save it in a new column.

The output would be:

Customer_ID | 2015 | 2016 |2017 | Year_joined_mailing | Purchases_1st_year
ABC            5      6     10     2015                       5
BCD            6      7     3      2016                       7       
DEF            10     4     5      2017                       5
GHI            8      7     10     2016                       7

I have found some solutions for matching a vlookup in Python, but none that use the headers of other columns.

- jeangelj

1

The columns to look up are 2015, 2016 and 2017. - jeangelj

3 Answers

3

You can use apply on each row.

df.apply(lambda x: x[x['Year_joined_mailing']],axis=1)
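
As a self-contained illustration of the row-wise approach, here is a minimal sketch. It rebuilds the question's frame under the assumption that the year columns are labelled with integers, and it uses `.loc` on each row only to make the label-based lookup explicit; the one-liner above expresses the same idea.

```python
import pandas as pd

# Rebuild the question's frame; integer column labels are an assumption.
df = pd.DataFrame(
    {
        "Customer_ID": ["ABC", "BCD", "DEF", "GHI"],
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2017, 2016],
    }
)

# For each row, read the value stored under the label held in
# that row's Year_joined_mailing field.
df["Purchases_1st_year"] = df.apply(
    lambda row: row.loc[row["Year_joined_mailing"]], axis=1
)

print(df["Purchases_1st_year"].tolist())
```

Because `apply` with `axis=1` calls the lambda once per row, this is easy to read but may be slow on large frames.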

- galaxyan

2

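Assuming the column headers and Year_joined_mailing are the same data type, and that all Year_joined_mailing values are valid columns, I would do it like this. If the data types differ, you can convert by adding str() or int() where appropriate.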

df['Purchases_1st_year'] = [df[df['Year_joined_mailing'][i]][i] for i in df.index]

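We iterate over the indexes in the dataframe to get the 'Year_joined_mailing' field for each index, use it to get the column we want, and select that same index from the column. Everything is pushed to a list, which is then assigned to the new column 'Purchases_1st_year'. If your 'Year_joined_mailing' column will not always hold a valid column name, then try: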

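from numpy import nan

new_col = []
for i in df.index:
    try:
        # Select the column named by this row's join year, then this row's value
        new_col.append(df[df['Year_joined_mailing'][i]][i])
    except KeyError:
        # The join year is not one of df's column labels
        new_col.append(nan)  # or whatever null value you want here
df['Purchases_1st_year'] = new_col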

This longer code snippet accomplishes the same thing, but it will not raise an error if a 'Year_joined_mailing' value is not in df.columns.
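
To see the fallback in action, here is a hedged sketch that swaps the try/except for an explicit membership check. The helper name `lookup_or_nan`, the made-up join year 2018, and the choice of Customer_ID as the index are all assumptions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        # 2018 is a made-up value with no matching column.
        "Year_joined_mailing": [2015, 2016, 2018, 2016],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)


def lookup_or_nan(frame, row_label, col_label):
    """Hypothetical helper: value at (row_label, col_label), or NaN if the column is missing."""
    if col_label not in frame.columns:
        return np.nan
    return frame.at[row_label, col_label]


# Build the whole list first, then assign it as the new column.
df["Purchases_1st_year"] = [
    lookup_or_nan(df, i, df.at[i, "Year_joined_mailing"]) for i in df.index
]

print(df["Purchases_1st_year"].tolist())
```

Note that the presence of NaN makes the new column floating point, so the matched values come back as floats rather than integers.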

- Jeremy Barnes

Thank you very much - this one worked too, so I upvoted it. - jeangelj

- piRSquared · Accepted Answer

Deprecation notice: lookup was deprecated in v1.2.0

Use pd.DataFrame.lookup. Note that I am assuming Customer_ID is the index.

df.lookup(df.index, df.Year_joined_mailing)

array([5, 7, 5, 7])

df.assign(
    Purchases_1st_year=df.lookup(df.index, df.Year_joined_mailing)
)

             2015  2016  2017  Year_joined_mailing  Purchases_1st_year
Customer_ID                                                           
ABC             5     6    10                 2015                   5
BCD             6     7     3                 2016                   7
DEF            10     4     5                 2017                   5
GHI             8     7    10                 2016                   7

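However, you have to be careful when comparing possible strings in the column names with integers in the year-joined column...

Here is the nuclear option to ensure the type comparisons are respected.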

df.assign(
    Purchases_1st_year=df.rename(columns=str).lookup(
        df.index, df.Year_joined_mailing.astype(str)
    )
)

             2015  2016  2017  Year_joined_mailing  Purchases_1st_year
Customer_ID                                                           
ABC             5     6    10                 2015                   5
BCD             6     7     3                 2016                   7
DEF            10     4     5                 2017                   5
GHI             8     7    10                 2016                   7
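
Since `lookup` is deprecated (and, as far as I know, no longer available from pandas 2.0 onward), here is a minimal sketch of a vectorized substitute built from `pd.factorize`, `reindex`, and NumPy integer indexing. It assumes integer column labels and Customer_ID as the index; it is one possible replacement, not the method used in the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2017, 2016],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)

# codes[i] is the position of row i's join year within `years`,
# where `years` holds the unique join years in order of appearance.
codes, years = pd.factorize(df["Year_joined_mailing"])

# Keep only the year columns, ordered like `years`, as a plain array.
values = df.reindex(columns=years).to_numpy()

# Pick one value per row: row i, column codes[i].
picked = values[np.arange(len(df)), codes]

result = df.assign(Purchases_1st_year=picked)
print(result["Purchases_1st_year"].tolist())
```

If the column labels are strings while the join years are integers, casting both sides first (for example with `df.rename(columns=str)` and `.astype(str)`, as in the answer above) should let the same pattern work.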