Pandas Series - Recording Numerical Changes

Question


I have a panel DataFrame with many observations of individuals' location data over 10 years. It looks like this:

     personid  location_1991  location_1992  location_1993  location_1994
0    111       1              1              2              2
1    233       3              3              4              999
2    332       1              3              3              3
3    454       2              2              2              2
4    567       2              1              1              1

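I want to track each person's transitions by creating one variable per type of transition. That is, I would like a column for each location type that flags whether the person ever moved into that location type. Ideally, it would look like this: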

     personid  transition_to_1  transition_to_2  transition_to_3  transition_to_4
0    111       0                1                0                0
1    233       0                0                0                1
2    332       0                0                1                0
3    454       0                0                0                0
4    567       1                0                0                0

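So far, I have tried iterating over each row and then looping through each element to check whether it is the same as the previous one. This seems time-intensive. Is there a better way to track changes in values across each row of a DataFrame?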

- svenkatesh

1 Answer

- FooBar · Accepted Answer

I first stacked the columns and then pivoted the result.

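import pandas as pd

# Read the sample table from the clipboard
df = pd.read_clipboard()
# Stack the year columns into a single long 'location' column
df2 = pd.DataFrame(df.set_index('personid').stack(), columns=['location'])
# The first reset restores personid and the year label as columns;
# the second adds a running row number in a column named 'index'
df2.reset_index(inplace=True)
df2.reset_index(inplace=True)
df3 = df2.pivot(index='index', columns='location', values='personid')
df3 = df3.fillna(0)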

So far, it looks like this:

location  1    2    3    4    999
index                            
0         111    0    0    0    0
1         111    0    0    0    0
2           0  111    0    0    0
3           0  111    0    0    0
4           0    0  233    0    0
5           0    0  233    0    0
6           0    0    0  233    0
7           0    0    0    0  233
8         332    0    0    0    0
9           0    0  332    0    0
10          0    0  332    0    0
11          0    0  332    0    0
12          0  454    0    0    0
13          0  454    0    0    0
14          0  454    0    0    0
15          0  454    0    0    0
16          0  567    0    0    0
17        567    0    0    0    0
18        567    0    0    0    0
19        567    0    0    0    0

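# Each row holds the personid in exactly one column, so the row-wise max recovers it
df3['personid'] = df3.max(axis=1).astype(int)
df3 = df3.set_index('personid', drop=True)
# Turn the remaining personid values into 0/1 indicators
df3[df3 > 0] = 1

After that, the result looks like this:

location  1    2    3    4    999
personid                         
111         1    0    0    0    0
111         1    0    0    0    0
111         0    1    0    0    0
111         0    1    0    0    0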
233         0    0    1    0    0
233         0    0    1    0    0
233         0    0    0    1    0
233         0    0    0    0    1
332         1    0    0    0    0
332         0    0    1    0    0
332         0    0    1    0    0
332         0    0    1    0    0
454         0    1    0    0    0
454         0    1    0    0    0
454         0    1    0    0    0
454         0    1    0    0    0
567         0    1    0    0    0
567         1    0    0    0    0
567         1    0    0    0    0
567         1    0    0    0    0
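
The table above has one 0/1 row per person and year, not yet the `transition_to_*` columns the question asks for. As a minimal sketch of one way to get there directly from the wide frame (this is an alternative, not the method used in this answer), each year can be compared with the previous year via `shift(axis=1)`. The sample frame is rebuilt inline from the question's table, and treating 999 as a code to ignore is an assumption based on the desired output, which has no `transition_to_999` column.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "personid": [111, 233, 332, 454, 567],
    "location_1991": [1, 3, 1, 2, 2],
    "location_1992": [1, 3, 3, 2, 1],
    "location_1993": [2, 4, 3, 2, 1],
    "location_1994": [2, 999, 3, 2, 1],
})

locs = df.set_index("personid")

# Previous year's location for every cell (the first year has no predecessor)
prev = locs.shift(axis=1)

# True where the location differs from the previous year's location
changed = locs.ne(prev) & prev.notna()

# Keep only the destination of each move; cells without a move become NaN
destinations = locs.where(changed)

# One 0/1 column per location code of interest.
# Assumption: 999 is skipped because the desired output has no column for it.
codes = [1, 2, 3, 4]
transitions = pd.DataFrame({
    f"transition_to_{k}": destinations.eq(k).any(axis=1).astype(int)
    for k in codes
})

print(transitions.reset_index())
```

Because every step works on whole columns at once, no explicit loop over rows or cells is needed. If a person moves into the same location type more than once, the indicator is still 1; counting the moves would need `sum(axis=1)` in place of `any(axis=1)`.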