Pandas: drop rows with incomplete time series from panel data

Question

Pandas: drop rows with incomplete time series from panel data

3

I have a DataFrame like the following:

    Country1  Country2    Year
1    Italy       Greece   2000
2    Italy       Greece   2001  
3    Italy       Greece   2002
4    Germany      Italy   2000
5    Germany      Italy   2002
6    Mexico      Canada   2000
7    Mexico      Canada   2001
8    Mexico      Canada   2002
9    US          France   2000
10   US          France   2001
11   Greece      Italy    2000
12   Greece      Italy    2001

I want to keep only the rows of country pairs that have observations for the whole time series (2000-2002). The final result should be:

    Country1  Country2    Year
1    Italy       Greece   2000
2    Italy       Greece   2001  
3    Italy       Greece   2002
4    Mexico      Canada   2000
5    Mexico      Canada   2001
6    Mexico      Canada   2002
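
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame. The variable names `df` and `counts` are my own choice, and the per-pair count at the end is only there to show which pairs are incomplete.

```python
import pandas as pd

# Sample panel from the question: one row per (Country1, Country2, Year).
df = pd.DataFrame(
    {
        "Country1": ["Italy", "Italy", "Italy", "Germany", "Germany",
                     "Mexico", "Mexico", "Mexico", "US", "US",
                     "Greece", "Greece"],
        "Country2": ["Greece", "Greece", "Greece", "Italy", "Italy",
                     "Canada", "Canada", "Canada", "France", "France",
                     "Italy", "Italy"],
        "Year": [2000, 2001, 2002, 2000, 2002,
                 2000, 2001, 2002, 2000, 2001,
                 2000, 2001],
    },
    index=range(1, 13),  # match the 1-based row labels shown above
)

# Number of distinct years observed for each country pair.
# Only pairs with 3 distinct years cover 2000-2002 completely.
counts = df.groupby(["Country1", "Country2"])["Year"].nunique()
print(counts)
```

Only Italy/Greece and Mexico/Canada reach three distinct years, which matches the desired output.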

- user14237226

2 Answers

1

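One approach is to pivot the data, drop the rows that contain nulls, and reshape back; this only works if every combination of Country* and Year is unique (which it is in the sample data):

# Assumes pandas is imported as pd and df is the frame from the question.
(df.assign(dummy = 1)            # placeholder value column for the pivot
   # pivot needs keyword arguments and a list for a multi-column index
   .pivot(index=['Country1', 'Country2'], columns='Year')
   .dropna()                     # drop country pairs that miss any year
   .stack()                      # move Year from the columns back into the index
   .drop(columns='dummy')
   .reset_index()
)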
 
  Country1 Country2  Year
0    Italy   Greece  2000
1    Italy   Greece  2001
2    Italy   Greece  2002
3   Mexico   Canada  2000
4   Mexico   Canada  2001
5   Mexico   Canada  2002
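
The uniqueness requirement can be checked before pivoting. The following is a small sketch on a made-up frame (not the question's data) in which one year is repeated for the same pair; `keys`, `has_duplicates` and `deduped` are illustrative names. Whether dropping the repeats is appropriate depends on your data, so treat that step as an assumption.

```python
import pandas as pd

# Hypothetical data: 2001 appears twice for the same country pair.
df = pd.DataFrame(
    {
        "Country1": ["Italy", "Italy", "Italy", "Italy"],
        "Country2": ["Greece", "Greece", "Greece", "Greece"],
        "Year": [2000, 2001, 2001, 2002],
    }
)

keys = ["Country1", "Country2", "Year"]

# True when at least one (Country1, Country2, Year) combination repeats.
# In that case pivot cannot reshape the frame and raises a ValueError.
has_duplicates = df.duplicated(subset=keys).any()

# Keeping the first row of each combination restores uniqueness.
deduped = df.drop_duplicates(subset=keys)
```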

- sammywemmy

- jezrael · Accepted Answer

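One idea is to reshape the data with crosstab, test which rows contain only non-zero counts with DataFrame.ne and DataFrame.all, convert the filtered index to a DataFrame with MultiIndex.to_frame, and finally select the matching rows with DataFrame.merge: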

df1 = pd.crosstab([df['Country1'], df['Country2']], df['Year'])

df = df.merge(df1.index[df1.ne(0).all(axis=1)].to_frame(index=False))
print (df)
  Country1 Country2  Year
0    Italy   Greece  2000
1    Italy   Greece  2001
2    Italy   Greece  2002
3   Mexico   Canada  2000
4   Mexico   Canada  2001
5   Mexico   Canada  2002
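
To make the one-liner easier to follow, here is the same idea split into named steps. This is a sketch on a reduced version of the sample (three pairs instead of five), and the names `counts`, `complete`, `keep` and `result` are mine.

```python
import pandas as pd

# Reduced sample: only Italy/Greece covers all three years.
df = pd.DataFrame(
    {
        "Country1": ["Italy", "Italy", "Italy", "Germany", "Germany", "US", "US"],
        "Country2": ["Greece", "Greece", "Greece", "Italy", "Italy", "France", "France"],
        "Year": [2000, 2001, 2002, 2000, 2002, 2000, 2001],
    }
)

# Step 1: one row per pair, one column per year, cells hold observation counts.
counts = pd.crosstab([df["Country1"], df["Country2"]], df["Year"])

# Step 2: a pair is complete when no year has a zero count.
complete = counts.ne(0).all(axis=1)

# Step 3: turn the surviving index entries into a two-column frame.
keep = counts.index[complete].to_frame(index=False)

# Step 4: inner merge on the shared columns (Country1, Country2).
result = df.merge(keep)
print(result)
```

Note that merge returns a fresh 0-based index, which is why the output above starts at 0 rather than keeping the original row labels.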

If you need to test for a specific range, compare sets inside GroupBy.transform:

r = set(range(2000, 2003))

df = df[df.groupby(['Country1', 'Country2'])['Year'].transform(lambda x: set(x) == r)]
print (df)
  Country1 Country2  Year
1    Italy   Greece  2000
2    Italy   Greece  2001
3    Italy   Greece  2002
6   Mexico   Canada  2000
7   Mexico   Canada  2001
8   Mexico   Canada  2002
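
If the range should not be hard-coded, one possible variant (not part of the answer above) is to treat "every year that appears anywhere in the data" as the full series and compare distinct-year counts. That definition is an assumption; it fails if no pair at all observes some year that you still expect. The sketch uses a reduced sample and my own names `n_years`, `per_pair` and `complete`.

```python
import pandas as pd

# Reduced sample: only Italy/Greece covers all three years.
df = pd.DataFrame(
    {
        "Country1": ["Italy", "Italy", "Italy", "Germany", "Germany", "US", "US"],
        "Country2": ["Greece", "Greece", "Greece", "Italy", "Italy", "France", "France"],
        "Year": [2000, 2001, 2002, 2000, 2002, 2000, 2001],
    }
)

# Assumption: the complete series is the set of years present in the data.
n_years = df["Year"].nunique()

# Distinct years per pair, broadcast back to every row of that pair.
per_pair = df.groupby(["Country1", "Country2"])["Year"].transform("nunique")

# Keep rows whose pair covers every year; original row labels are preserved.
complete = df[per_pair == n_years]
print(complete)
```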