Passing a variable to the apply() function in pandas

Question

3

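I'm running into a syntax problem when applying a function to a DataFrame. I'm trying to create a new column by joining the strings from two other columns with a separator, but I get this error: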

TypeError: ("apply_join() missing 1 required positional argument: 'sep'", 'occurred at index cases')

If I add sep to the apply_join() call, it fails as well:

  File "unite.py", line 37, in unite
    tibble_extra = df[cols].apply(apply_join, sep)
NameError: name 'sep' is not defined

import pandas as pd
from io import StringIO

tibble3_csv = """country,year,cases,population
Afghanistan,1999,745,19987071
Afghanistan,2000,2666,20595360
Brazil,1999,37737,172006362
Brazil,2000,80488,174504898
China,1999,212258,1272915272
China,2000,213766,1280428583"""
with StringIO(tibble3_csv) as fp:
    tibble3 = pd.read_csv(fp)
print(tibble3)

def str_join_elements(x, sep=""):
    assert type(sep) is str
    return sep.join((str(xi) for xi in x))

def unite(df, cols, new_var, combine=str_join_elements):
    def apply_join(x, sep):
        joinstr = str_join(x, sep)
        return pd.Series({new_var[i]:s for i, s in enumerate(joinstr)})
  
    fixed_vars = df.columns.difference(cols)
    tibble = df[fixed_vars].copy()
    tibble_extra = df[cols].apply(apply_join)
  
    return pd.concat([tibble, tibble_extra], axis=1) 
table3_again = unite(tibble3, ['cases', 'population'], 'rate', combine=lambda x: str_join_elements(x, "/"))
print(table3_again)

- cumin

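I have modified the program so that apply_join successfully returns a list, which should then be converted to a DataFrame and concatenated with the other DataFrame. I think I need to post a separate question about the failure to convert the list to a DataFrame and concatenate it; is that right? - cumin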

2 Answers

1

You just need to add it to the apply call:

tibble_extra = df[cols].apply(apply_join, sep=...)

You should also specify the axis. It may work without it, but building the habit helps avoid errors:

tibble_extra = df[cols].apply(apply_join, sep=..., axis=1)  # axis=1 ('columns') calls the function once per row; axis=0 ('index', the default) once per column
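
To make the keyword form concrete, here is a minimal sketch. The small DataFrame and the "/" separator are assumptions chosen for illustration; the `...` above has to be replaced by an actual string, since `apply()` simply forwards extra keyword arguments to the function.

```python
import pandas as pd

# Illustrative data only; column names follow the question
df = pd.DataFrame({"cases": [745, 2666], "population": [19987071, 20595360]})

def apply_join(x, sep):
    # With axis=1, x is one row of the DataFrame (a Series)
    return sep.join(str(xi) for xi in x)

# Keyword arguments that apply() does not recognise are passed on to apply_join
rate = df.apply(apply_join, axis=1, sep="/")
print(rate.tolist())
```

Passing a real string such as "/" matters here: a literal `...` is Python's Ellipsis object, which has no `join` method.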

- Evan Nowak

File "unite.py", line 32, in apply_join: joinstr = str_join_elements(x, sep) File "unite.py", line 27, in str_join_elements: return sep.join((str(xi) for xi in x)) AttributeError: ("'ellipsis' object has no attribute 'join'", 'occurred at index 0') - cumin

- Bharath M Shetty · Accepted Answer

5

When you have multiple arguments, use a lambda, for example:

df[cols].apply(lambda x: apply_join(x,sep),axis=1)

Alternatively, you can pass the extra arguments through the args parameter, for example:

 df[cols].apply(apply_join,args=[sep],axis=1)
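
A small sketch comparing the two forms side by side may help. The data and the value of `sep` are assumptions for illustration; in both cases `sep` must already be defined in the scope where `apply()` is called, otherwise Python raises a NameError.

```python
import pandas as pd

# Illustrative data only; column names follow the question
df = pd.DataFrame({"cases": [745, 2666], "population": [19987071, 20595360]})
sep = "/"  # assumed separator; must exist before apply() is called

def apply_join(x, sep):
    # x is one row (a Series) because axis=1 is used below
    return sep.join(str(xi) for xi in x)

# Form 1: a lambda closes over sep
via_lambda = df.apply(lambda x: apply_join(x, sep), axis=1)

# Form 2: args supplies the extra positional arguments as a tuple
via_args = df.apply(apply_join, args=(sep,), axis=1)

print(via_lambda.equals(via_args))
```

The pandas documentation describes args as a tuple, so `args=(sep,)` is used in this sketch.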

- Bharath M Shetty

tibble_extra = df[cols].apply(apply_join, args=[sep], axis=1) NameError: name 'sep' is not defined - cumin

Please define your separator, for example sep = ',' or sep = '_'. - Bharath M Shetty

I don't know in advance what it will be when the method is called; in this example it is called as unite(tibble3, ['cases', 'population'], 'rate', combine=lambda x: str_join_elements(x, "/")). print(table3_again) - cumin

You can pass ',' or '/' in place of sep. - Bharath M Shetty

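By passing the combine function to apply_join (instead of sep), I get a list that needs to be converted to a DataFrame and then concatenated with the other DataFrame. The problem now is carrying out those last two steps. - cumin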

It works very well. - Alireza75
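
For the last two steps raised in the comments (turning the joined values into a DataFrame and attaching them to the remaining columns), one possible sketch follows. It is not the asker's final code: it assumes that `combine` is any callable taking one row, that `new_var` is a single column name, and it uses a shortened version of the sample data. Because `apply(..., axis=1)` returns a Series aligned to the original index, it can be assigned directly as a new column, so no explicit list-to-DataFrame conversion or `pd.concat` is needed in this variant.

```python
import pandas as pd

def str_join_elements(x, sep=""):
    # Join the string form of every element in x with sep
    return sep.join(str(xi) for xi in x)

def unite(df, cols, new_var, combine=str_join_elements):
    # Keep the columns that are not being combined, in their original order
    fixed_vars = [c for c in df.columns if c not in cols]
    tibble = df[fixed_vars].copy()
    # combine is called once per row and returns one string per row
    tibble[new_var] = df[cols].apply(combine, axis=1)
    return tibble

# Shortened sample data, assumed for illustration
tibble3 = pd.DataFrame({
    "country": ["Afghanistan", "Brazil"],
    "year": [1999, 1999],
    "cases": [745, 37737],
    "population": [19987071, 172006362],
})

table3_again = unite(tibble3, ["cases", "population"], "rate",
                     combine=lambda x: str_join_elements(x, "/"))
print(table3_again)
```

Since the separator is already baked into the `combine` callable by the caller, `unite` itself never needs a `sep` name, which sidesteps the NameError discussed above.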