Comparing Pandas DataFrame columns to a threshold column using where()


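I need to set to null the values in several columns whose absolute value is less than the corresponding value in the threshold column.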

        import pandas as pd
        import numpy as np
        df=pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
          'key2': [2000, 2001, 2002, 2001, 2002], 
          'data1': np.random.randn(5),
          'data2': np.random.randn(5),
           'threshold': [0.5,0.4,0.6,0.1,0.2]}).set_index(['key1','key2'])

                   data1    data2   threshold
key1    key2            
Ohio    2000    0.201240    0.083833    0.5
        2001    -1.993489   -1.081208   0.4
        2002    0.759038    -1.688769   0.6
Nevada  2001    -0.543916   1.412679    0.1
        2002    -1.545781   0.181224    0.2

This gives me the error "cannot join with no level specified and no overlapping names": df.where(df.abs()>df['threshold'])

This works, but it obviously compares against a scalar: df.where(df.abs()>0.5)

                       data1           data2    threshold
        key1    key2            
        Ohio    2000    NaN              NaN    NaN
                2001    -1.993489   -1.081208   NaN
                2002    0.759038    -1.688769   NaN
      Nevada    2001    -0.543916   1.412679    NaN
                2002    -1.545781        NaN    NaN

By the way, the following does appear to give me an OK result, but I would still like to know how to do it with the where() method:

      df.apply(lambda x:x.where(x.abs()>x['threshold']),axis=1)
1 Answer


Here is a slightly different option using the DataFrame.gt (greater than) method.

df[df.abs().gt(df['threshold'], axis='rows')]
# Output might not look the same because of different random numbers;
# use np.random.seed() for reproducible random number generation
Out[13]: 
                data1     data2  threshold
key1   key2                               
Ohio   2000       NaN       NaN        NaN
       2001  1.954543  1.372174        NaN
       2002       NaN       NaN        NaN
Nevada 2001  0.275814  0.854617        NaN
       2002       NaN  0.204993        NaN
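
Since the question asks specifically about where(), the same row-wise comparison can be passed to it as the condition. The sketch below is one possible way to do that, not necessarily the only one. It assumes fixed values copied from the question's printed table in place of np.random.randn, and the names cols, mask and out are introduced here for illustration. The failing expression in the question, df.abs() > df['threshold'], appears to fail because pandas tries to align the Series index with the DataFrame's column labels; gt(..., axis=0) aligns on the row index instead. Restricting the comparison to the data columns also keeps the threshold column intact, since abs(threshold) > threshold is never true for non-negative thresholds.

```python
import pandas as pd

# Fixed values taken from the question's printed table (an assumption made
# here so the result is reproducible; the question used np.random.randn).
df = pd.DataFrame({
    'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'key2': [2000, 2001, 2002, 2001, 2002],
    'data1': [0.201240, -1.993489, 0.759038, -0.543916, -1.545781],
    'data2': [0.083833, -1.081208, -1.688769, 1.412679, 0.181224],
    'threshold': [0.5, 0.4, 0.6, 0.1, 0.2],
}).set_index(['key1', 'key2'])

# Columns to be compared against the threshold column.
cols = ['data1', 'data2']

# axis=0 aligns the threshold Series with the row index, so each row is
# compared against its own threshold value.
mask = df[cols].abs().gt(df['threshold'], axis=0)

# where() keeps values where the mask is True and puts NaN elsewhere.
out = df.copy()
out[cols] = df[cols].where(mask)
print(out)
```

Calling df.where(df.abs().gt(df['threshold'], axis=0)) directly on the whole frame also runs, but it blanks the threshold column as well, which matches the output shown in the answer above.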
