Python Pandas: Get rolling values of one DataFrame using the rolling index of another DataFrame

Question

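I have two DataFrames: one has multi-level columns, and the other has only single-level columns (namely the first level of the first DataFrame; in other words, the second DataFrame is computed by combining columns of the first one).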

The two DataFrames look like this:

First DataFrame, df1: (link: df1)
Second DataFrame, df2: (link: df2)

The relationship between df1 and df2 is:

df2 = df1.groupby(axis=1, level='sector').mean()
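As an aside, recent pandas releases (2.1 and later) deprecate the axis=1 argument of groupby. A minimal sketch of the same sector average without it, assuming made-up toy data (the sectors foo/bar and stocks A/B are hypothetical), transposes df1, groups by the sector level, and transposes back:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two sectors ('foo', 'bar'), two stocks each
columns = pd.MultiIndex.from_product([['foo', 'bar'], ['A', 'B']],
                                     names=['sector', 'stock'])
df1 = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4),
                   columns=columns,
                   index=pd.date_range('2016-02-01', periods=3, freq='D'))

# Transpose so the (sector, stock) pairs become the row index, average the
# rows that share a sector, then transpose back: one column per sector.
df2 = df1.T.groupby(level='sector').mean().T
print(df2)
```

Note that groupby sorts the group keys, so the sector columns come out in alphabetical order (bar, foo).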

Then I get the position of df1's rolling max as follows:

result1=pd.rolling_apply(df1,window=5,func=lambda x: pd.Series(x).idxmax(),min_periods=4)

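Let me explain result1 a little. For example, within the five days (the window length) from 2016/2/23 to 2016/2/29, the highest price of stock sh600870 occurred on 2016/2/24, which is at position 1 within that five-day window. Therefore, in result1, the value for stock sh600870 on 2016/2/29 is 1.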
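To make the meaning of result1 concrete, here is a small sketch with the newer .rolling API on a single made-up price series (the numbers are hypothetical, not the asker's data). With raw=True each window reaches np.argmax as a NumPy array, so the value is the position of the maximum inside the window, just like the 1 described above:

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for a single stock over six days
prices = pd.Series([1.0, 5.0, 3.0, 2.0, 4.0, 6.0],
                   index=pd.date_range('2016-02-01', periods=6, freq='D'))

# raw=True passes each window to np.argmax as a NumPy array, so the result is
# the position of the maximum *inside the window* (0 = oldest day in the window).
# min_periods=4 lets the 4-day window on the fourth row be evaluated as well.
rel = prices.rolling(window=5, min_periods=4).apply(np.argmax, raw=True)
print(rel)
```

The first three rows are NaN because fewer than four observations are available. On the last row the window covers days 2 to 6 (values 5, 3, 2, 4, 6), and its maximum is the fifth entry, so the value is 4.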

Now I want to use the positions in result1 to get the sector price for each stock.

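Taking the same stock as an example: sh600870 belongs to the sector "家用电器视听器材白色家电" (household appliances: audio-visual equipment and white goods). So on 2016/2/29 I want to get that sector's price on 2016/2/24, which is 8.770.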

How can I do this?

- April

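Welcome to SO. It would help if you included the DataFrames in the question as text (you can edit it). Please see this link for useful information on how to ask pandas questions: https://dev59.com/O2Ij5IYBdhLWcg3wk182 - IanS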

1 Answer

- unutbu · Accepted Answer

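The index returned by idxmax (or np.argmax) is relative to the rolling window. To make the index relative to df1, add the index of the rolling window's left edge: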

index = pd.rolling_apply(df1, window=5, min_periods=4, func=np.argmax)
shift = pd.rolling_min(np.arange(len(df1)), window=5, min_periods=4)
index = index.add(shift, axis=0)
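Here is the same idea on a single hypothetical price series, written with the newer .rolling API: the rolling minimum of 0, 1, 2, ... is the row number of each window's left edge, and adding it turns the window-relative position into a position in the whole Series. The closed form max(0, i - window + 1) is included only as a cross-check and is not part of the answer itself.

```python
import numpy as np
import pandas as pd

window, min_periods = 5, 4
prices = pd.Series([1.0, 5.0, 3.0, 2.0, 4.0, 6.0])  # hypothetical prices

# Position of the max relative to each window (0 = window's left edge)
rel = prices.rolling(window, min_periods=min_periods).apply(np.argmax, raw=True)

# Left edge of each window = smallest row number it contains
shift = (pd.Series(np.arange(len(prices)))
         .rolling(window, min_periods=min_periods).min())

# Absolute row number of the max within the whole Series
absolute = rel + shift
print(absolute.tolist())  # [nan, nan, nan, 1.0, 1.0, 5.0]

# Cross-check: the left edge equals max(0, i - window + 1)
closed_form = np.maximum(0, np.arange(len(prices)) - window + 1)
print(closed_form.tolist())
```

On the last row the window-relative position is 4 and the window starts at row 1, so the maximum (6.0) sits at absolute row 5.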

Once you have ordinal indices relative to df1, you can use them with .iloc to index into df1 or df2.

For example,

import numpy as np
import pandas as pd
np.random.seed(2016)
N = 15
columns = pd.MultiIndex.from_product([['foo','bar'], ['A','B']])
columns.names = ['sector', 'stock']
dates = pd.date_range('2016-02-01', periods=N, freq='D')
df1 = pd.DataFrame(np.random.randint(10, size=(N, 4)), columns=columns, index=dates)
df2 = df1.groupby(axis=1, level='sector').mean()

window_size, min_periods = 5, 4
index = pd.rolling_apply(df1, window=window_size, min_periods=min_periods, func=np.argmax)
shift = pd.rolling_min(np.arange(len(df1)), window=window_size, min_periods=min_periods)
# Alternatively, you could use
# shift = np.pad(np.arange(len(df1)-window_size+1), (window_size-1, 0), mode='constant')
# but this is harder to read/understand, and therefore it may be more prone to bugs.
index = index.add(shift, axis=0)

result = pd.DataFrame(index=df1.index, columns=df1.columns)
for col in index:
    sector, stock = col
    mask = pd.notnull(index[col])
    idx = index.loc[mask, col].astype(int)
    result.loc[mask, col] = df2[sector].iloc[idx].values

print(result)

yields

sector      foo       bar     
stock         A    B    A    B
2016-02-01  NaN  NaN  NaN  NaN
2016-02-02  NaN  NaN  NaN  NaN
2016-02-03  NaN  NaN  NaN  NaN
2016-02-04  5.5    5    5  7.5
2016-02-05  5.5    5    5  8.5
2016-02-06  5.5  6.5    5  8.5
2016-02-07  5.5  6.5    5  8.5
2016-02-08  6.5  6.5    5  8.5
2016-02-09  6.5  6.5  6.5  8.5
2016-02-10  6.5  6.5  6.5    6
2016-02-11    6  6.5  4.5    6
2016-02-12    6  6.5  4.5    4
2016-02-13    2  6.5  4.5    5
2016-02-14    4  6.5  4.5    5
2016-02-15    4  6.5    4  3.5

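In pandas 0.18, the module-level functions such as rolling_apply were deprecated (and later removed) in favor of a rolling method that DataFrame and Series now both have, so you now need to use: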

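# raw=True (pandas 0.23+) passes each window to np.argmax as a NumPy array, so it returns a position, not a label
index = df1.rolling(window=window_size, min_periods=min_periods).apply(np.argmax, raw=True)
shift = (pd.Series(np.arange(len(df1)))
         .rolling(window=window_size, min_periods=min_periods).min())
index = index.add(shift.values, axis=0)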
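With the newer API, the per-column loop from the example can also be replaced by a single vectorized gather. This is a sketch under several assumptions that go beyond the answer: pandas 2.x, the same toy df1 as the example, df2 built with a transpose instead of groupby(axis=1), the window's left edge computed in closed form rather than with rolling().min(), and np.take_along_axis (NumPy 1.15+) doing the lookup. Cells whose window has too few rows stay NaN.

```python
import numpy as np
import pandas as pd

np.random.seed(2016)
N = 15
window_size, min_periods = 5, 4
columns = pd.MultiIndex.from_product([['foo', 'bar'], ['A', 'B']],
                                     names=['sector', 'stock'])
dates = pd.date_range('2016-02-01', periods=N, freq='D')
df1 = pd.DataFrame(np.random.randint(10, size=(N, 4)), columns=columns, index=dates)
df2 = df1.T.groupby(level='sector').mean().T  # sector averages, one column per sector

# Absolute row position of each rolling max: window-relative argmax + left edge
rel = df1.rolling(window_size, min_periods=min_periods).apply(np.argmax, raw=True)
shift = np.maximum(0, np.arange(N) - window_size + 1)
pos = rel.add(shift, axis=0)

# One sector-price column per stock, in the same order as df1's columns
sector_prices = df2[df1.columns.get_level_values('sector')].to_numpy()

# Gather sector_prices[pos[r, c], c] for every cell; NaN positions are
# temporarily replaced by 0 and masked back to NaN afterwards.
valid = pos.notna().to_numpy()
rows = np.where(valid, pos.to_numpy(), 0).astype(int)
picked = np.take_along_axis(sector_prices, rows, axis=0)
result = pd.DataFrame(np.where(valid, picked, np.nan),
                      index=df1.index, columns=df1.columns)
print(result)
```

The loop in the answer is easier to follow; the vectorized form mainly pays off when there are many stock columns.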