Expanding panel data in Python

Question

python pandas stata

4

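I am trying to expand the following data. I am a Stata user, and my problem can be solved with the Stata command "fillin". I am now trying to rewrite that command in Python and cannot find anything that works.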

For example, I want to convert this data frame (my real data frame is larger; the example only illustrates what I want to do):

into this:

Thanks in advance, and sorry for my English.

- Lucas Dresl

2 Answers

1

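Create a new MultiIndex from the data frame that covers every (id, year) pair, then reindex.

import numpy as np
import pandas as pd

# Full year range, repeated once per id; each id repeated once per year.
n_years = df.year.max() - df.year.min() + 1
years = np.tile(np.arange(df.year.min(), df.year.max() + 1), df.id.nunique())
ids = np.repeat(df.id.unique(), n_years)
arrays = [ids.tolist(), years.tolist()]
new_idx = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['id', 'year'])

df = df.set_index(['id', 'year'])

# (id, year) pairs absent from the original data come back as NaN rows.
df.reindex(new_idx).reset_index()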

    id  year    X       Y
0   1   2008    10.0    20.0
1   1   2009    NaN     NaN
2   1   2010    15.0    25.0
3   1   2011    NaN     NaN
4   1   2012    NaN     NaN
5   2   2008    NaN     NaN
6   2   2009    NaN     NaN
7   2   2010    NaN     NaN
8   2   2011    2.0     4.0
9   2   2012    3.0     6.0
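
A variation on the above: the hand-built `np.tile`/`np.repeat` index can probably be replaced by `pd.MultiIndex.from_product`, which does not need the number of ids spelled out. This is only a minimal sketch, assuming the four-row example frame used in the other answer; the names `full_idx` and `filled` are mine, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example frame (same values as in the other answer).
df = pd.DataFrame(
    [[1, 2008, 10, 20], [1, 2010, 15, 25], [2, 2011, 2, 4], [2, 2012, 3, 6]],
    columns=["id", "year", "X", "Y"],
)

# Cross every observed id with every year between the observed min and max.
full_idx = pd.MultiIndex.from_product(
    [df["id"].unique(), np.arange(df["year"].min(), df["year"].max() + 1)],
    names=["id", "year"],
)

# Reindexing inserts NaN rows for the (id, year) pairs that were missing.
filled = df.set_index(["id", "year"]).reindex(full_idx).reset_index()
print(filled)
```

This assumes each (id, year) pair occurs at most once in the input, since `reindex` does not accept a duplicated index.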

- Vaishali

- Reese · Accepted Answer

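This can be done with .loc[] on a list of all (id, year) combinations. Newer pandas releases no longer return NaN rows when .loc is given labels that are missing from the index (this was deprecated in 0.21 and now typically raises a KeyError), so df.reindex(idx) is the safer call there.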

from itertools import product
import pandas as pd

df = pd.DataFrame([[1,2008,10,20],[1,2010,15,25],[2,2011,2,4],[2,2012,3,6]],columns=['id','year','X','Y'])
df = df.set_index(['id','year'])

# All combinations of index
#idx = list(product(df.index.levels[0], df.index.levels[1]))
idx = list(product(range(1,3), range(2008,2013)))

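# df.loc[idx] only fills missing labels with NaN on old pandas versions;
# reindex inserts NaN rows for the missing (id, year) pairs instead.
df.reindex(idx)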
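
Since the question asks for Stata's `fillin` specifically, the same idea can be wrapped in a small helper. This is a rough sketch rather than a faithful port: the function name `fillin` and its `keys` parameter are hypothetical, and it assumes the key combinations in the input are unique. Like Stata's command, it adds a `_fillin` column that is 1 for the rows it created.

```python
import pandas as pd


def fillin(df, keys):
    """Rough pandas analogue of Stata's `fillin` (hypothetical helper)."""
    indexed = df.set_index(keys)
    # Cross the values of each key that actually occur in the data.
    full_idx = pd.MultiIndex.from_product(
        [sorted(df[k].unique()) for k in keys], names=keys
    )
    out = indexed.reindex(full_idx)
    # Flag rows that were not in the original data, like Stata's _fillin.
    out["_fillin"] = (~out.index.isin(indexed.index)).astype(int)
    return out.reset_index()


df = pd.DataFrame(
    [[1, 2008, 10, 20], [1, 2010, 15, 25], [2, 2011, 2, 4], [2, 2012, 3, 6]],
    columns=["id", "year", "X", "Y"],
)
result = fillin(df, ["id", "year"])
print(result)
```

Note that this version crosses only the values that occur in the data, so no rows are added for 2009 here, because no id has an observation in that year. If every year in the range is needed, build the year level from a full range instead.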