Compare each row with the next row in pandas using a for loop, and get a column's value if they differ

I have this pandas DataFrame:

         full path                               name      time 
0    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:20
1    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:25
2    C:\Users\User\Desktop\Test1\Test2\1.txt    1.txt      10:30
3    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:40
4    C:\Users\User\Desktop\Test1\2.txt          2.txt      10:50
5    C:\Users\User\Desktop\Test1\Test2\1.txt    2.txt      10:60

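I want to compare consecutive rows that have the same name, and whenever the path changes, get the time of the move and the folder the file was moved to. For example, comparing the first row with the second, neither "name" nor "full path" has changed, so nothing should be recorded. Comparing the second row with the third, the name is the same but the path has changed, so I need the third row's time and folder, i.e. "10:30" and the folder "Test2", placed in a new column.
Expected output: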
         full path                               name      time    time_when_path_changed
0    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:20
1    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:25
2    C:\Users\User\Desktop\Test1\Test2\1.txt    1.txt      10:30       10:30 - Test2
3    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:40       10:40 - Test1
4    C:\Users\User\Desktop\Test1\2.txt          2.txt      10:50
5    C:\Users\User\Desktop\Test1\Test2\1.txt    2.txt      10:60       10:60  - Test2
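
For reference, the row-by-row comparison described above could be written as a plain loop along the following lines. This is only a minimal sketch: it assumes the rows are already in chronological order, it rebuilds the sample frame by hand, and it uses the standard-library `ntpath` module (my choice, not something from the question) to pull the parent folder out of a Windows-style path.

```python
import ntpath

import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame(
    {
        "full path": [
            r"C:\Users\User\Desktop\Test1\1.txt",
            r"C:\Users\User\Desktop\Test1\1.txt",
            r"C:\Users\User\Desktop\Test1\Test2\1.txt",
            r"C:\Users\User\Desktop\Test1\1.txt",
            r"C:\Users\User\Desktop\Test1\2.txt",
            r"C:\Users\User\Desktop\Test1\Test2\1.txt",
        ],
        "name": ["1.txt", "1.txt", "1.txt", "1.txt", "2.txt", "2.txt"],
        "time": ["10:20", "10:25", "10:30", "10:40", "10:50", "10:60"],
    }
)

changes = [""]  # the first row has no predecessor to compare with
for i in range(1, len(df)):
    prev, cur = df.iloc[i - 1], df.iloc[i]
    if cur["name"] == prev["name"] and cur["full path"] != prev["full path"]:
        # ntpath understands backslash separators on any operating system.
        folder = ntpath.basename(ntpath.dirname(cur["full path"]))
        changes.append(f'{cur["time"]} - {folder}')
    else:
        changes.append("")

df["time_when_path_changed"] = changes
```

A loop like this is easy to follow but slow on large frames, which is why a vectorised approach is usually preferred.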

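Edit:

Yes, @erfan, it solves the problem I described perfectly, but it doesn't work when the names appear in the order shown in the DataFrame below. I have also changed the expected output. Do you have a solution for this as well?

Thanks in advance.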

         full path                               name      time 
0    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:20
1    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:25
2    C:\Users\User\Desktop\Test1\2.txt          2.txt      10:50
2    C:\Users\User\Desktop\Test1\Test2\1.txt    1.txt      10:30
3    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:40
5    C:\Users\User\Desktop\Test1\Test2\2.txt    2.txt      10:60

Expected output:

         full path                               name      time    moved to "Test2"   moved to "Test1"
0    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:20
1    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:25
2    C:\Users\User\Desktop\Test1\2.txt          2.txt      10:50
3    C:\Users\User\Desktop\Test1\Test2\1.txt    1.txt      10:30       10:30
5    C:\Users\User\Desktop\Test1\1.txt          1.txt      10:40                            10:40
5    C:\Users\User\Desktop\Test1\Test2\2.txt    2.txt      10:60       10:60

1 Answer

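We can use the following logic:
  1. The full path is not equal to the previous row's full path
  2. The name is the same as in the previous row (same group)
  3. If both 1 and 2 are true, take the time + the deepest folder in the path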
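import numpy as np

# m1: the full path differs from the previous row (row 0 is compared with itself)
m1 = df["full path"].ne(df["full path"].shift(1, fill_value=df["full path"].iloc[0]))
# m2: the name matches the previous row, i.e. it is the same file
m2 = df["name"].eq(df["name"].shift(fill_value=df["name"].iloc[0]))

# parent folder = second-to-last path component (n must be a keyword in pandas >= 2.0)
folder = df["full path"].str.rsplit("\\", n=2).str[-2]

df["time_when_path_changed"] = np.where(m1 & m2, df["time"] + " - " + folder, "")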

                                 full path   name   time  \
0        C:\Users\User\Desktop\Test1\1.txt  1.txt  10:20   
1        C:\Users\User\Desktop\Test1\1.txt  1.txt  10:25   
2  C:\Users\User\Desktop\Test1\Test2\1.txt  1.txt  10:30   
3        C:\Users\User\Desktop\Test1\1.txt  1.txt  10:40   
4        C:\Users\User\Desktop\Test1\2.txt  2.txt  10:50   
5  C:\Users\User\Desktop\Test1\Test2\1.txt  2.txt  10:60   

  time_when_path_changed  
0                         
1                         
2          10:30 - Test2  
3          10:40 - Test1  
4                         
5          10:60 - Test2  
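
The approach above compares each row only with the row directly before it, so it misses moves when rows for the same file are not adjacent, as in the edited question. One possible adaptation, sketched here and not part of the original answer, is to look up the previous row of the same `name` with `groupby(...).shift()`. It assumes that row order reflects the order of events for each file. The `moved to "<folder>"` column names are copied from the edited expected output; the rest is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the edited question (rows of the same file are not adjacent).
df = pd.DataFrame(
    {
        "full path": [
            r"C:\Users\User\Desktop\Test1\1.txt",
            r"C:\Users\User\Desktop\Test1\1.txt",
            r"C:\Users\User\Desktop\Test1\2.txt",
            r"C:\Users\User\Desktop\Test1\Test2\1.txt",
            r"C:\Users\User\Desktop\Test1\1.txt",
            r"C:\Users\User\Desktop\Test1\Test2\2.txt",
        ],
        "name": ["1.txt", "1.txt", "2.txt", "1.txt", "1.txt", "2.txt"],
        "time": ["10:20", "10:25", "10:50", "10:30", "10:40", "10:60"],
    }
)

# Previous path of the *same file*, wherever that earlier row sits in the frame.
prev_path = df.groupby("name")["full path"].shift()

# A move happened when an earlier row exists and its path differs from this one.
moved = prev_path.notna() & df["full path"].ne(prev_path)

# Parent folder is the second-to-last path component.
folder = df["full path"].str.rsplit("\\", n=2).str[-2]

# Single-column form, as in the answer above.
df["time_when_path_changed"] = np.where(moved, df["time"] + " - " + folder, "")

# Wide form: one column per destination folder, as in the edited expected output.
for dest in folder[moved].unique():
    df[f'moved to "{dest}"'] = np.where(moved & folder.eq(dest), df["time"], "")
```

If the rows are not guaranteed to be in chronological order, sorting by `time` before the `groupby` step would be needed first; that is left out here because the sample times are plain strings.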

Hi @Erfan, I have edited my question, could you please take a look? - user14073111
