Converting a three-column data frame to a from/to network form in R

5

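I hope everyone is doing well. I have a problem to solve: how to build a network of editors within a single Wikipedia article (when one article ends, the network for the next article starts). These networks will be used to find the degree centrality of users and articles. The data frame looks like this: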

faID           uName                     time_Stamp
 1               Qaless                 2003-09-06T20:27:00Z
 1               Austin                 2003-10-31T06:07:03Z
 1               SimonP                 2004-02-10T19:15:56Z
 1               SimonP                 2004-02-10T19:23:44Z
 1             Moncrief                2004-02-10T19:28:09Z
 1             Moncrief                2004-02-10T19:28:48Z
 1                  Rbs                2004-02-10T20:21:35Z
 1            Camembert                2004-02-10T20:27:34Z
 2             Moncrief                2004-02-10T20:29:33Z
 2                  Rbs                2004-02-10T20:39:33Z
 2              Jason M                2004-05-18T23:54:15Z
 2             Rickyrab                2004-05-28T05:35:32Z
 2             Rickyrab                2004-05-28T05:37:10Z
 2              Postdlf                2004-06-08T03:26:25Z
 2              Modster               2004-08-10T17:22:37Z
 3            PhilHibbs               2004-08-23T14:09:54Z
 3             Sfoskett               2004-09-10T18:22:15Z
 3               Dalton               2004-09-12T17:34:13Z
 3               Dalton               2004-09-12T17:38:35Z
 3      Ta bu shi da yu               2004-09-17T07:24:10Z

I would like to get a network data frame that looks like this:
faID  to          from              time_Stamp
 1    Qaless      Qaless            2003-09-06T20:27:00Z
 1    Qaless      Austin            2003-10-31T06:07:03Z
 1    Austin      SimonP            2004-02-10T19:15:56Z
 1    SimonP      SimonP            2004-02-10T19:23:44Z
 1    SimonP      Moncrief          2004-02-10T19:28:09Z
 1    Moncrief    Moncrief          2004-02-10T19:28:48Z
 1    Moncrief    Rbs               2004-02-10T20:21:35Z
 1    Rbs         Camembert         2004-02-10T20:27:34Z
 2    Moncrief    Moncrief          2004-02-10T20:29:33Z
 2    Moncrief    Rbs               2004-02-10T20:39:33Z
 2    Rbs         Jason M           2004-05-18T23:54:15Z
 2    Jason M     Rickyrab          2004-05-28T05:35:32Z
 2    Rickyrab    Rickyrab          2004-05-28T05:37:10Z
 2    Rickyrab    Postdlf           2004-06-08T03:26:25Z
 2    Postdlf     Modster           2004-08-10T17:22:37Z
 3    PhilHibbs   PhilHibbs         2004-08-23T14:09:54Z
 3    PhilHibbs   Sfoskett          2004-09-10T18:22:15Z
 3    Sfoskett    Dalton            2004-09-12T17:34:13Z
 3    Dalton      Dalton            2004-09-12T17:38:35Z
 3    Dalton      Ta bu shi da yu   2004-09-17T07:24:10Z

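The general idea is as follows: "to" is the person being edited (the user who made the previous edit of the same article), and "from" is the person doing the editing (the user on the current row). The first edit of each article points to itself. Please let me know if more information is needed to solve the problem.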
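To make the rule above concrete, here is a minimal base R sketch on a few rows copied from the sample data. The object name `edits` and the helper `previous_editor` are illustrative assumptions, not something from the question.

```r
# A few rows copied from the sample data above (articles 1 and 2)
edits <- data.frame(
  faID  = c(1, 1, 1, 2, 2),
  uName = c("Qaless", "Austin", "SimonP", "Moncrief", "Rbs"),
  time_Stamp = c("2003-09-06T20:27:00Z", "2003-10-31T06:07:03Z",
                 "2004-02-10T19:15:56Z", "2004-02-10T20:29:33Z",
                 "2004-02-10T20:39:33Z"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: move every editor down by one position and repeat
# the first editor, so the first edit of an article points to itself
previous_editor <- function(x) c(x[1L], head(x, -1L))

# "from" is the current editor; "to" is the previous editor of the same article
edits$from <- edits$uName
per_article <- lapply(split(edits$uName, edits$faID), previous_editor)
edits$to <- unsplit(per_article, edits$faID)

edits[, c("faID", "to", "from", "time_Stamp")]
```

The sketch assumes the rows are already in chronological order within each article, as they are in the sample data.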

2 Answers

6

Here is a possible solution using the latest version of data.table:

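library(data.table) # v 1.9.6+
# Within each article (faID), `to` is the previous editor;
# the first row of an article is filled with its own editor
setDT(df)[, to := shift(uName, fill = uName[1L]), by = faID]
# The editor of the current row becomes `from`
setnames(df, "uName", "from")
df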
#     faID            from           time_Stamp        to
#  1:    1          Qaless 2003-09-06T20:27:00Z    Qaless
#  2:    1          Austin 2003-10-31T06:07:03Z    Qaless
#  3:    1          SimonP 2004-02-10T19:15:56Z    Austin
#  4:    1          SimonP 2004-02-10T19:23:44Z    SimonP
#  5:    1        Moncrief 2004-02-10T19:28:09Z    SimonP
#  6:    1        Moncrief 2004-02-10T19:28:48Z  Moncrief
#  7:    1             Rbs 2004-02-10T20:21:35Z  Moncrief
#  8:    1       Camembert 2004-02-10T20:27:34Z       Rbs
#  9:    2        Moncrief 2004-02-10T20:29:33Z  Moncrief
# 10:    2             Rbs 2004-02-10T20:39:33Z  Moncrief
# 11:    2         Jason M 2004-05-18T23:54:15Z       Rbs
# 12:    2        Rickyrab 2004-05-28T05:35:32Z   Jason M
# 13:    2        Rickyrab 2004-05-28T05:37:10Z  Rickyrab
# 14:    2         Postdlf 2004-06-08T03:26:25Z  Rickyrab
# 15:    2         Modster 2004-08-10T17:22:37Z   Postdlf
# 16:    3       PhilHibbs 2004-08-23T14:09:54Z PhilHibbs
# 17:    3        Sfoskett 2004-09-10T18:22:15Z PhilHibbs
# 18:    3          Dalton 2004-09-12T17:34:13Z  Sfoskett
# 19:    3          Dalton 2004-09-12T17:38:35Z    Dalton
# 20:    3 Ta bu shi da yu 2004-09-17T07:24:10Z    Dalton

It gives me the following error: could not find function "shift" in `[.data.table`(setDT(na2), , `:=`(to, shift(uName, fill = uName[1L]))). - Naveed Khan Wazir
Yes, I mentioned that you need the devel version. Did you first run library(devtools) ; install_github("Rdatatable/data.table", build_vignettes = FALSE) as I showed? - David Arenburg
Yes, I installed library(devtools), but the error is still the same. - Naveed Khan Wazir
But after installing the devtools library, did you run the command library(devtools) ; install_github("Rdatatable/data.table", build_vignettes = FALSE)? Also, you need to run library(data.table) again after that. - David Arenburg
1
So it didn't work. Try closing all R sessions and opening just one. Then run library(devtools) ; install_github("Rdatatable/data.table", build_vignettes = FALSE) and see whether you still get any errors. - David Arenburg
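The "could not find function shift" error discussed in these comments may mean that the data.table version loaded in the session is older than the one the answer requires. A minimal sketch for checking this, assuming the 1.9.6 threshold stated in the answer's code comment; the variable names `installed` and `has_shift` are illustrative.

```r
# Check whether data.table is installed and whether its version is at
# least 1.9.6, the version the answer's code comment asks for
if (requireNamespace("data.table", quietly = TRUE)) {
  installed <- utils::packageVersion("data.table")
  has_shift <- installed >= "1.9.6"
  message("data.table ", as.character(installed),
          if (has_shift) " should provide shift()" else " is older than 1.9.6")
} else {
  has_shift <- FALSE
  message("data.table is not installed")
}
```

If the version is new enough but the error persists, restarting R and calling library(data.table) again, as suggested in the comments, ensures the newly installed version is the one actually loaded.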

3
If df is your original data.frame, you could do the following:
transform(df, 
             from = uName, 
             to = ave(as.character(uName), faID, FUN = function(x) c(x[1L], head(x,-1L))), 
             uName = NULL
           )

#    faID           time_Stamp            from        to
# 1     1 2003-09-06T20:27:00Z          Qaless    Qaless
# 2     1 2003-10-31T06:07:03Z          Austin    Qaless
# 3     1 2004-02-10T19:15:56Z          SimonP    Austin
# 4     1 2004-02-10T19:23:44Z          SimonP    SimonP
# 5     1 2004-02-10T19:28:09Z        Moncrief    SimonP
# 6     1 2004-02-10T19:28:48Z        Moncrief  Moncrief
# 7     1 2004-02-10T20:21:35Z             Rbs  Moncrief
# 8     1 2004-02-10T20:27:34Z       Camembert       Rbs
# 9     2 2004-02-10T20:29:33Z        Moncrief  Moncrief
# 10    2 2004-02-10T20:39:33Z             Rbs  Moncrief
# 11    2 2004-05-18T23:54:15Z         Jason M       Rbs
# 12    2 2004-05-28T05:35:32Z        Rickyrab   Jason M
# 13    2 2004-05-28T05:37:10Z        Rickyrab  Rickyrab
# 14    2 2004-06-08T03:26:25Z         Postdlf  Rickyrab
# 15    2 2004-08-10T17:22:37Z         Modster   Postdlf
# 16    3 2004-08-23T14:09:54Z       PhilHibbs PhilHibbs
# 17    3 2004-09-10T18:22:15Z        Sfoskett PhilHibbs
# 18    3 2004-09-12T17:34:13Z          Dalton  Sfoskett
# 19    3 2004-09-12T17:38:35Z          Dalton    Dalton
# 20    3 2004-09-17T07:24:10Z Ta bu shi da yu    Dalton
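Since the question's stated goal is degree centrality, here is a rough base R sketch of how degree counts could be read off such a from/to table. It rests on several assumptions that are not in the original thread: self-loops (the first edit of an article, or consecutive edits by the same user) are dropped, every remaining row counts as one directed edge, and the names `edges`, `users`, `out_degree`, `in_degree` and `total_degree` are illustrative. A graph package may define or normalise centrality differently.

```r
# A few from/to rows taken from the output above (article 1)
edges <- data.frame(
  from = c("Qaless", "Austin", "SimonP", "SimonP", "Moncrief"),
  to   = c("Qaless", "Qaless", "Austin", "SimonP", "SimonP"),
  stringsAsFactors = FALSE
)

# Assumption: a user editing their own previous revision is not a network tie
edges <- edges[edges$from != edges$to, ]

# Use one shared set of users so both count tables line up by name
users <- sort(unique(c(edges$from, edges$to)))
out_degree <- table(factor(edges$from, levels = users))  # edits made on another user's revision
in_degree  <- table(factor(edges$to,   levels = users))  # times a user's revision was edited

# Unnormalised degree: incoming plus outgoing edges per user
total_degree <- out_degree + in_degree
total_degree
```

To get one network per article, as the question describes, the same counting could be applied separately to the rows of each faID.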
