Pandas - Resampling on non-datetime values

Question


4

I have a dataframe that looks like this:

n    Date        Area    Rank

12  2007-03-02  Other   4.276250
24  2007-03-02  Other   4.512632
3   2007-03-02  Other   3.513571
36  2007-03-02  Other   4.514000
48  2007-03-02  Other   4.55000

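I want to resample over the range of n, filling in the missing values of n, and then interpolate the Rank field at those new values. If n were a datetime or similar object, I could resample directly. But what should I do when it is a float or an integer?

The output should look something like the following (the Rank numbers are only illustrative):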

n    Date        Area    Rank

3   2007-03-02  Other   3.513571
4   2007-03-02  Other   3.513675
5   2007-03-02  Other   3.524819
6   2007-03-02  Other   3.613427
7   2007-03-02  Other   3.685635
....
....

- Solaxun

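Do you mean you want to interpolate Rank over integer-valued intervals? That is, for n=3, Rank is Rank[n=12] plus 1/12 of the value between n=12 and n=24? - andrew_reece

Yes - sorry for the lack of clarity - it's late and I'm nearly exhausted :) I know what I want to do and the logic is simple; I'm just having trouble translating that logic into pandas / upsampling. - Solaxun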

2

I think I figured it out... use reindex over the range I want, then interpolate the NaNs. - Solaxun

Does the date change? - Andy Hayden

1 Answer

Content provided by Stack Overflow.

- andrew_reece · Accepted Answer

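# Reindex n over every integer from its minimum to its maximum, then fill the gaps.
# The "+ 1" matters: range() excludes its stop value, so without it the last row (n = 48) is dropped.
df = (df.set_index('n')
        .reindex(range(df.n.min(), df.n.max() + 1))
        .interpolate()
        .reset_index())
# Date and Area are not numeric, so forward-fill them instead of interpolating.
df[['Date','Area']] = df[['Date','Area']].ffill()

Output: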

     n        Date   Area      Rank
0    3  2007-03-02  Other  3.513571
1    4  2007-03-02  Other  3.598313
2    5  2007-03-02  Other  3.683055
3    6  2007-03-02  Other  3.767797
4    7  2007-03-02  Other  3.852539
5    8  2007-03-02  Other  3.937282
6    9  2007-03-02  Other  4.022024
7   10  2007-03-02  Other  4.106766
8   11  2007-03-02  Other  4.191508
9   12  2007-03-02  Other  4.276250
10  13  2007-03-02  Other  4.295948
11  14  2007-03-02  Other  4.315647
                                ...
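
As a cross-check on the numbers above, the same piecewise-linear interpolation can be sketched with `np.interp`, which interpolates directly on the numeric values of n. This is only an illustrative alternative, not the method used in the answer; the names `known`, `new_n`, `new_rank`, and `result` are made up for the example, and only the n and Rank columns from the question are rebuilt.

```python
import numpy as np
import pandas as pd

# Sample data from the question (n is left unsorted, as posted).
df = pd.DataFrame({
    "n": [12, 24, 3, 36, 48],
    "Rank": [4.276250, 4.512632, 3.513571, 4.514000, 4.55000],
})

# np.interp expects the known x-values in increasing order, so sort by n first.
known = df.sort_values("n")

# Every integer from the smallest to the largest n, inclusive.
new_n = np.arange(known["n"].min(), known["n"].max() + 1)

# Piecewise-linear interpolation of Rank at each new n.
new_rank = np.interp(new_n, known["n"].to_numpy(), known["Rank"].to_numpy())

result = pd.DataFrame({"n": new_n, "Rank": new_rank})
print(result.head())
```

Between n=3 and n=12 each step adds (4.276250 - 3.513571) / 9 to Rank, which is where the 3.598313 at n=4 comes from.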

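There may be a way to apply a different fill method depending on column type, so that you don't need a separate ffill() for the non-float columns. I tried using apply(), but couldn't get it to work.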
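
One possible way to choose the fill method by column type, offered as a sketch rather than a tested recipe for every pandas version, is to split the columns with `select_dtypes`: interpolate the numeric ones and forward-fill the rest. The variable names (`full_range`, `out`, `numeric_cols`, `other_cols`) are made up for the example. Restricting `interpolate()` to numeric columns also avoids passing object-dtype columns to it, which newer pandas releases may warn about or reject.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "n": [12, 24, 3, 36, 48],
    "Date": ["2007-03-02"] * 5,
    "Area": ["Other"] * 5,
    "Rank": [4.276250, 4.512632, 3.513571, 4.514000, 4.55000],
})

# Every integer n from min to max, inclusive (range() excludes its stop value).
full_range = range(df["n"].min(), df["n"].max() + 1)
out = df.set_index("n").reindex(full_range)

# Split the columns by dtype: numeric columns get interpolated, the rest forward-filled.
numeric_cols = out.select_dtypes(include="number").columns
other_cols = out.columns.difference(numeric_cols)

out[numeric_cols] = out[numeric_cols].interpolate()
out[other_cols] = out[other_cols].ffill()

# Make sure the index is named "n" before turning it back into a column.
out = out.rename_axis("n").reset_index()
print(out.head())
```

Forward-filling works here because the first row of the new range (n=3) exists in the original data; if the range started before the first known n, those leading rows would stay NaN.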