Python pandas: how do I apply a function only if a column value is not NULL?

Question

49

I have a DataFrame (Python 2.7, pandas 0.15.0):

df=
       A    B               C
0    NaN   11             NaN
1    two  NaN  ['foo', 'bar']
2  three   33             NaN

I want to apply a simple function to the rows that do not contain NULL values in a specific column. My function is as simple as possible:

def my_func(row):
    print row

My apply code is the following:

df[['A','B']].apply(lambda x: my_func(x) if(pd.notnull(x[0])) else x, axis = 1)

It works perfectly. If I check column 'B' for NULL values instead, pd.notnull() works perfectly as well. But if I select column 'C', which contains list objects:

df[['A','C']].apply(lambda x: my_func(x) if(pd.notnull(x[1])) else x, axis = 1)

then I get the following error message: ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', u'occurred at index 1')

Does anybody know why pd.notnull() works only for integer and string columns but not for "list columns"?

And is there a nicer way to check for NULL values in column 'C' than this:

df[['A','C']].apply(lambda x: my_func(x) if(str(x[1]) != 'nan') else x, axis = 1)

Thank you!

- ragesz

7 Answers

21

My column contained lists and NaN, so the following worked for me.

df.C.map(lambda x: my_func(x) if type(x) == list else x)
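
A minimal runnable sketch of the same type-based check, assuming Python 3 and a current pandas. The sample Series and the body of my_func are placeholders invented for illustration, and isinstance is swapped in for type(x) == list so that list subclasses are accepted too:

```python
import numpy as np
import pandas as pd

def my_func(items):
    # Placeholder for the real per-cell work: join the list items.
    return '-'.join(items)

# Assumed sample data: a column holding lists and NaN.
s = pd.Series([np.nan, ['foo', 'bar'], np.nan], dtype=object)

# Only list cells are passed to my_func; everything else is returned unchanged.
result = s.map(lambda x: my_func(x) if isinstance(x, list) else x)
print(result.tolist())
```

This avoids pd.notnull entirely, so the element-wise check on list cells never comes into play.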

- coffman21

6

Another way is to just use row.notnull().all() (without numpy). Here is an example:

df.apply(lambda row: func1(row) if row.notnull().all() else func2(row), axis=1)

Here is a complete example on df:

>>> d = {'A': [None, 2, 3, 4], 'B': [11, None, 33, 4], 'C': [None, ['a','b'], None, 4]}
>>> df = pd.DataFrame(d)
>>> df
     A     B       C
0  NaN  11.0    None
1  2.0   NaN  [a, b]
2  3.0  33.0    None
3  4.0   4.0       4
>>> def func1(r):
...     return 'No'
...
>>> def func2(r):
...     return 'Yes'
...
>>> df.apply(lambda row: func1(row) if row.notnull().all() else func2(row), axis=1)
0    Yes
1    Yes
2    Yes
3     No

- Aziz Alto

3

Try...

df['a'] = df['a'].apply(lambda x: x.replace(',','\,') if x != None else x)

This example just adds an escape character to commas when the value is not None.

- Andrew Monger

1

This code will raise an error, because NaN != None. - Echo9k

1

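Add the following if condition at the top of your function so that it returns NaN as soon as the input is null.

import numpy as np
import pandas as pd

def function_name(value):
    # Return NaN early when the input is null (assumes a scalar input).
    if pd.isnull(value):
        return np.nan
    # ... rest of the function code ...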

- Rajashekar U

1

The following works for different data types.

df=

   col_1  col_2
0    1     NaN
1  three  seven
2   NaN    NaN
3  [4,5]    2

This can be done with the map function, for example to replace the non-null values in col_1:

def my_func(n):
    return 'func'

df.loc[df['col_1'].notnull(), 'col_1'] = df['col_1'].map(my_func)
df =

    col_1  col_2
0    func   NaN
1    func   seven
2    NaN    NaN
3    func   2
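
As a possible alternative, Series.map accepts na_action='ignore', which leaves missing values untouched rather than passing them to the function. The sketch below assumes a current pandas on Python 3 and rebuilds col_1 by hand as an object Series; the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

def my_func(n):
    return 'func'

# Assumed reconstruction of col_1 from the example above.
col = pd.Series([1, 'three', np.nan, [4, 5]], dtype=object)

# na_action='ignore' propagates NaN instead of calling my_func on it.
mapped = col.map(my_func, na_action='ignore')
print(mapped.tolist())
```

This does the masking and the mapping in one call, at the cost of returning a new Series instead of assigning in place.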

- asif

0

If you have a string and want to apply a function to it, as in this example: 'September 25, 2021'

df['Year'] = df['date_added'].apply(lambda x : re.split(' |,', x)[-1] if isinstance(x, str) else np.nan)
df['Month'] = df['date_added'].apply(lambda x : re.split(' |,', x)[0] if isinstance(x, str) else np.nan )

You can do it this way, using isinstance(x, str) to skip NaN or any other type. You can also use type() like this:

df['Year'] = df['date_added'].apply(lambda x : re.split(' |,', x)[-1] if type(x)==str else np.nan )
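
A self-contained sketch of the same idea with its imports, assuming Python 3 and date strings shaped like 'September 25, 2021' (the shape the split indices imply). The sample DataFrame and the split_part helper are hypothetical and not part of the original answer:

```python
import re

import numpy as np
import pandas as pd

# Assumed sample data: one date string and one missing value.
df = pd.DataFrame({'date_added': ['September 25, 2021', np.nan]})

def split_part(value, position):
    # Hypothetical helper: split on spaces or commas, skip non-string values.
    if isinstance(value, str):
        return re.split(' |,', value)[position]
    return np.nan

df['Year'] = df['date_added'].apply(lambda x: split_part(x, -1))
df['Month'] = df['date_added'].apply(lambda x: split_part(x, 0))
print(df)
```

Because NaN is a float, the isinstance check routes it to the np.nan branch and re.split is never called on it.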

- Abdelrahman Abozied

- Korem · Accepted Answer

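The problem is that pd.notnull(['foo', 'bar']) operates element-wise and returns array([ True, True], dtype=bool). Your if condition then tries to convert that array to a single boolean, and that is what raises the exception.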
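
The cause can be reproduced in isolation. This is a minimal sketch assuming Python 3 with current pandas and NumPy (the question itself used Python 2.7 and pandas 0.15.0); the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# A scalar input gives a single boolean.
scalar_result = pd.notnull(np.nan)

# A list input is checked element by element and gives a boolean array.
list_result = pd.notnull(['foo', 'bar'])

# Using a multi-element array as an `if` condition raises ValueError.
try:
    if list_result:
        pass
    raised = False
except ValueError:
    raised = True

print(scalar_result, list_result, raised)
```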

To fix it, you can simply wrap the notnull call in np.all:

df[['A','C']].apply(lambda x: my_func(x) if(np.all(pd.notnull(x[1]))) else x, axis = 1)

Now you can see that np.all(pd.notnull(['foo', 'bar'])) is indeed True.
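
A runnable sketch of this fix on a DataFrame shaped like the one in the question, assuming Python 3 and a current pandas. The is_present helper is hypothetical, and the check is applied to column 'C' directly through map rather than to whole rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [np.nan, 'two', 'three'],
    'B': [11, np.nan, 33],
    'C': [np.nan, ['foo', 'bar'], np.nan],
})

def is_present(value):
    # Hypothetical helper: collapse the element-wise result to one boolean.
    return bool(np.all(pd.notnull(value)))

# Boolean mask of the rows whose 'C' cell is not null.
mask = df['C'].map(is_present)
print(mask.tolist())
```

One design consequence to keep in mind: with np.all, a list that itself contains a NaN element is treated as null, while np.any would treat it as present.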