Filtering outliers in a Pandas DataFrame using a rolling median

11

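I am trying to filter some outliers out of a scatter plot of GPS elevation offsets against date.

I am trying to use df.rolling to compute the median and standard deviation for each window, and then remove any point that is more than 3 standard deviations away.

However, I can't figure out a way to loop through the column and compare each value with the rolling median.

Here is the code I have so far: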

import pandas as pd
import numpy as np

def median_filter(df, window):
    cnt = 0
    median = df['b'].rolling(window).median()
    std = df['b'].rolling(window).std()
    for row in df.b:
        # compare each value to its median (this is the step I am stuck on)
        pass

df = pd.DataFrame(np.random.randint(0,100,size=(100,2)), columns = ['a', 'b'])

median_filter(df, 10)

How can I loop through each point, compare it, and then remove it?

3 Answers

19

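Just filter the DataFrame:

window = 10  # window size used in the question

df['median'] = df['b'].rolling(window).median()
df['std'] = df['b'].rolling(window).std()

# keep rows within 3 rolling standard deviations of the rolling median;
# the first window-1 rows have NaN statistics, so they are dropped as well
df = df[(df.b <= df['median']+3*df['std']) & (df.b >= df['median']-3*df['std'])]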
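
For reference, the same idea can be wrapped in a small function. This is only a sketch under a few assumptions: the name rolling_outlier_filter, its parameters, and the synthetic test signal are hypothetical and not part of the answer, and min_periods=1 is my own choice so that the leading rows are kept instead of being dropped for having NaN statistics.

```python
import numpy as np
import pandas as pd


def rolling_outlier_filter(df, column="b", window=10, num_std=3.0):
    """Keep rows within num_std rolling standard deviations of the rolling median.

    Hypothetical helper for illustration; not part of pandas.
    """
    # min_periods=1 (an assumption) lets partial windows at the start produce statistics
    rolling = df[column].rolling(window, min_periods=1)
    median = rolling.median()
    std = rolling.std()  # still NaN where the window holds a single observation

    deviation = (df[column] - median).abs()
    # Rows whose std is undefined are kept rather than silently dropped
    keep = std.isna() | (deviation <= num_std * std)
    return df[keep]


# Synthetic example: a smooth signal with one injected spike
df = pd.DataFrame({"a": np.arange(100), "b": np.sin(np.arange(100) / 5.0)})
df.loc[50, "b"] = 50.0

filtered = rolling_outlier_filter(df, column="b", window=20, num_std=3.0)
print(len(df), len(filtered))
```

Note that the spike itself inflates the standard deviation of every window that contains it, so with short windows a 3-sigma threshold may fail to flag even a large outlier. A wider window or a smaller threshold may be needed on real data.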

0

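There may be a more elegant way to do this. This approach is something of a hack: it relies on manually mapping the index of the original DataFrame onto each rolling window. I chose a window size of 6. The first six rows belong to the first window; the seventh row is the second window, and so on.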

n = 100
df = pd.DataFrame(np.random.randint(0,n,size=(n,2)), columns = ['a','b'])

## set window size
window=6
std = 1  # I set it at just 1; with real data and larger windows, can be larger

## create df with rolling stats, upper and lower bounds
bounds = pd.DataFrame({'median':df['b'].rolling(window).median(),
'std':df['b'].rolling(window).std()})

bounds['upper']=bounds['median']+bounds['std']*std
bounds['lower']=bounds['median']-bounds['std']*std

## here, we set an identifier for each window which maps to the original df
## the first six rows are the first window; then each additional row is a new window
bounds['window_id']=np.append(np.zeros(window),np.arange(1,n-window+1))

## then we can assign the original 'b' value back to the bounds df
bounds['b']=df['b']

## and finally, keep only rows where b falls within the desired bounds
bounds.loc[bounds.eval("lower<b<upper")]

0

Here is my take on creating a median filter:

def median_filter(num_std=3):
    def _median_filter(x):
        _median = np.median(x)
        _std = np.std(x)
        s = x[-1]
        return s if s >= _median - num_std * _std and s <= _median + num_std * _std else np.nan
    return _median_filter

df.y.rolling(window).apply(median_filter(num_std=3), raw=True)
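
A self-contained way to try this out might look like the sketch below. The factory name make_median_filter, the ddof parameter, and the synthetic series are assumptions added for illustration. One detail worth knowing: np.std defaults to ddof=0 (population standard deviation), while pandas' rolling().std() uses ddof=1, so the sketch passes ddof explicitly to make the choice visible.

```python
import numpy as np
import pandas as pd


def make_median_filter(num_std=3.0, ddof=1):
    """Return a function for Rolling.apply(raw=True); hypothetical helper name."""
    def _median_filter(values):
        # values is a NumPy array holding one window; its last element is the current point
        median = np.median(values)
        std = np.std(values, ddof=ddof)
        last = values[-1]
        # Keep the current point if it lies within the band, otherwise mark it as NaN
        return last if abs(last - median) <= num_std * std else np.nan
    return _median_filter


# Synthetic example: a smooth signal with one injected spike
y = pd.Series(np.sin(np.arange(100) / 5.0))
y.iloc[50] = 50.0

window = 20
filtered = y.rolling(window).apply(make_median_filter(num_std=3.0), raw=True)

# Outliers are NaN; so are the first window-1 points, which never had a full window
cleaned = filtered.dropna()
print(filtered.iloc[48:53])
```

Because the result is a Series with NaN in place of rejected points, it can be assigned back as a new column or passed through dropna(), depending on whether the gaps should stay visible.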
