Time-varying networks in R

4

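I have data on every interaction that could have happened, and that did happen, at a university club's weekly social hour.

Here is a sample of my data: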

structure(list(from = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor"), to = structure(c(2L, 3L, 2L, 3L, 
2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A", 
"B", "C"), class = "factor"), timestalked = c(0L, 1L, 0L, 4L, 
1L, 2L, 0L, 1L, 0L, 2L, 1L, 0L, 1L, 2L, 1L, 0L, 0L, 0L), week = structure(c(1L, 
1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 
2L), .Label = c("1/1/2010", "1/15/2010", "1/8/2010"), class = "factor")), .Names = c("from", 
"to", "timestalked", "week"), class = "data.frame", row.names = c(NA, 
-18L))

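I am trying to calculate network statistics such as centrality for each person (A, B, C) for each individual week, for the last two weeks, and for the year to date. The only way I have gotten this to work is by manually splitting the file into the time units I need, but there has to be an easier way.
When timestalked is 0, this should be treated as no edge.
The output would produce a .csv like the following: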
actor  cent_week1 cent_week2 cent_week3 cent_last2weeks cent_yeartodate
 A       
 B
 C 

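cent_week1 is the centrality for 1/1/2010; cent_last2weeks considers only 1/8/2010 and 1/15/2010; cent_yeartodate considers all of the data at once. This will be applied to a much larger data set with millions of observations.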


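Is this what you want: get a graph for each time period, b = by(d, d$week, FUN=graph_from_data_frame), and then run the function over them, sapply(b, function(x) eigen_centrality(x, weights = E(x)$timestalked)$vector) (not sure if this is sensible). - user20650
@user20650 That seems to do what I want, boiled down to one data set that looks like the output I laid out in the question. If you can turn your comment into an answer, that would suffice. The ability to plot the graphs by week and cumulatively would also be helpful. - CJ12
Could you let us know whether @user20650's answer is satisfactory? - nghauran
@CJ12, could you clarify what exactly week1, week2, week3, last2weeks and yeartodate mean? week1 == 1/1/2010, week2 == 1/8/2010, last2weeks == ?... - nghauran
@user20650 Sorry, I don't understand your comment. Feel free to post an answer that produces the output above. - CJ12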
4 Answers

1

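I can't comment, so I'm writing an "answer". If you want to perform some mathematical operation on timestalked and get the values by from (I didn't find any variable called actor in your example), here is a data.table approach: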

dat <- as.data.table(dat) # or add 'data.table' to the class parameter
dat$week <- as.Date(dat$week, format = "%m/%d/%Y")
dat[, .(cent = mean(timestalked)), by = list(from, weeknum = week(week))]

This produces the following output:

dat[, .(cent = mean(timestalked)), by = list(from, weeknum = week(week))]

   from weeknum cent
1:    A       1  0.5
2:    A       2  2.0
3:    A       3  1.5
4:    B       1  0.5
5:    B       2  1.0
6:    B       3  0.5
7:    C       1  1.5
8:    C       2  0.5
9:    C       3  0.0

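Assign this to new_dat. You can then subset by week simply with new_dat[weeknum %in% 2:3], or any other variation, or sum over the whole year. You can also sort/order the result as needed.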
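A minimal sketch of that subsetting step, assuming a compact reconstruction of the question's data (the names last2 and ytd are illustrative, not from the original answer). Note that the value here is the mean of timestalked per sender, not a graph centrality:

```r
library(data.table)

# Compact reconstruction of the question's data (sender, count, week only)
dat <- data.table(
  from = rep(c("A", "B", "C"), each = 6),
  timestalked = c(0, 1, 0, 4, 1, 2, 0, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 0),
  week = as.Date(rep(rep(c("2010-01-01", "2010-01-08", "2010-01-15"), each = 2), 3))
)

# Mean of timestalked per sender and week number (data.table::week)
new_dat <- dat[, .(cent = mean(timestalked)), by = .(from, weeknum = week(week))]

# Last two weeks only, summed per sender
last2 <- new_dat[weeknum %in% 2:3, .(cent_last2weeks = sum(cent)), by = from]

# Whole year, summed per sender
ytd <- new_dat[, .(cent_yeartodate = sum(cent)), by = from]
```

Whether sum or mean is the right aggregate across weeks depends on what the statistic is meant to capture; the sketch only shows the mechanics.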

Hope this helps!


1
How about:
library(dplyr)
centralities <- tmp       %>% 
  group_by(week)          %>% 
  filter(timestalked > 0) %>% 
  do(
    week_graph=igraph::graph_from_edgelist(as.matrix(cbind(.$from, .$to)))
  )                       %>% 
  do(
    ecs = igraph::eigen_centrality(.$week_graph)$vector
  )                       %>% 
  summarise(ecs_A = ecs[[1]], ecs_B = ecs[[2]], ecs_C = ecs[[3]])

If you have a lot of actors, you can use summarise_all. Putting the result into long format is left as an exercise.

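When I load the data set from the question and use your code, I get the following error: Error in eval(lhs, parent, parent) : object 'tmp' not found - CJ12
Obviously, you need to load the structure from the question into the object tmp. - user3603486
Obviously, I expect a complete answer using the data provided. If you can create the output as outlined in the question, I will gladly accept it. - CJ12
Haha... this isn't about jumping through hoops. You either answer the question as stated or you don't. Attempting the latter only wastes your own time. - CJ12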

1
You can do this by setting up the windows in a separate table and then running a by-group operation over each window:
Data preparation:
# Load Data
DT <- structure(list(from = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor"), to = structure(c(2L, 3L, 2L, 3L, 
2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A", 
"B", "C"), class = "factor"), timestalked = c(0L, 1L, 0L, 4L, 
1L, 2L, 0L, 1L, 0L, 2L, 1L, 0L, 1L, 2L, 1L, 0L, 0L, 0L), week = structure(c(1L, 
1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 
2L), .Label = c("1/1/2010", "1/15/2010", "1/8/2010"), class = "factor")), .Names = c("from", 
"to", "timestalked", "week"), class = "data.frame", row.names = c(NA, 
-18L))

# Code
library(igraph)
library(data.table)

setDT(DT)

# setup events
DT <- DT[timestalked > 0]
DT[, week := as.Date(week, format = "%m/%d/%Y")]

# setup windows, edit as needed
date_ranges <- data.table(label = c("cent_week_1","cent_week_2","cent_last2weeks","cent_yeartodate"),
                          week_from = as.Date(c("2010-01-01","2010-01-08","2010-01-08","2010-01-01")),
                          week_to = as.Date(c("2010-01-01","2010-01-08","2010-01-15","2010-01-15"))
)

# find all events within windows
DT[, JA := 1]
date_ranges[, JA := 1]
graph_base <- merge(DT, date_ranges, by = "JA", allow.cartesian = TRUE)[week >= week_from & week <= week_to]

Now the by-group code. The second line is a bit messy; ideas on how to avoid the repeated call are welcome.

graph_base <- graph_base[, .(graphs = list(graph_from_data_frame(.SD))), by = label, .SDcols = c("from", "to", "timestalked")] # create graphs
graph_base <- graph_base[, .(vertex = names(eigen_centrality(graphs[[1]])$vector), ec = eigen_centrality(graphs[[1]])$vector), by = label] # calculate centrality
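One possible way to avoid the repeated call is to wrap j in braces and store the centrality vector once per group. This is only a sketch: the edges table and the names graphs_dt and cent are hypothetical stand-ins for the objects built above.

```r
library(igraph)
library(data.table)

# Hypothetical edge list with two window labels (illustrative values only)
edges <- data.table(
  label = c("w1", "w1", "w2", "w2", "w2"),
  from  = c("A", "B", "A", "B", "C"),
  to    = c("B", "C", "C", "A", "A")
)

# One graph per label, stored in a list column (same idea as above)
graphs_dt <- edges[, .(graphs = list(graph_from_data_frame(.SD))),
                   by = label, .SDcols = c("from", "to")]

# Compute the centrality once per group and reuse the result
cent <- graphs_dt[, {
  ec <- eigen_centrality(graphs[[1]])$vector
  list(vertex = names(ec), ec = ec)
}, by = label]
```

The braces let j hold intermediate assignments; only the final list becomes the output columns.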

dcast for the final formatting:

dcast(graph_base, vertex ~ label, value.var = "ec")
   vertex cent_last2weeks cent_week_1 cent_week_2 cent_yeartodate
1:      A       1.0000000   0.7071068   0.8944272       0.9397362
2:      B       0.7052723   0.7071068   0.4472136       0.7134685
3:      C       0.9008487   1.0000000   1.0000000       1.0000000

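This is great. (1) I have been thinking that the best output columns would be vertex, date, cent_this_week, cent_last_two_weeks, cent_yeartodate, which would make the code more portable; if you know a way to reshape this output into that form, I would appreciate it. (2) Is it possible to write the dcast output to a .csv file in the wd? I have been trying for an hour with little progress. Thanks. - CJ12
Also, the real data set has thousands of dates, so hand-coding needs to be avoided. - CJ12
@CJ12 (1) I'm not sure how to include a date in the output, because the columns are defined as specific date ranges. What would the date represent in that case? (2) Just use write.csv(); it works on the dcast result. (3) You can generate this table programmatically from the data; what have you tried? - Chris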
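As a rough illustration of point (3), the windows table could be derived from the observed dates instead of being typed by hand. This is a sketch, not part of the original answer: the weeks vector stands in for sort(unique(DT$week)), and it assumes all dates fall in the same year so that "year to date" starts at the earliest date.

```r
library(data.table)

# Stand-in for sort(unique(DT$week)); hypothetical weekly dates
weeks <- sort(as.Date(c("2010-01-01", "2010-01-08", "2010-01-15")))
n <- length(weeks)

# One single-week window per observed date
single <- data.table(label = paste0("cent_week_", seq_len(n)),
                     week_from = weeks,
                     week_to = weeks)

# Trailing two-week and year-to-date windows, both ending at the latest date
extra <- data.table(label = c("cent_last2weeks", "cent_yeartodate"),
                    week_from = c(weeks[n - 1], weeks[1]),
                    week_to = c(weeks[n], weeks[n]))

date_ranges <- rbind(single, extra)

# For point (2), the dcast result could then be written out, e.g.
# write.csv(result, "centrality.csv", row.names = FALSE)  # `result` is a placeholder name
```

With thousands of dates, a rolling window per date would need one extra row per date rather than a single trailing window; the same table-building pattern applies.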

0

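This analysis follows the generic split-apply-combine approach: the data is split by week, graph functions are applied, and the results are combined. There are several tools for this, but the following uses base R and data.table.

Base R

First set the date class for your data, so that the term "last two weeks" has a well-defined meaning.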

library(igraph)

# Set date class and order
d$week <- as.Date(d$week, format="%m/%d/%Y")
d <- d[order(d$week), ]
d <- d[d$timestalked > 0, ] # remove zero-count edges // don't need to do this if using weights

Then split and apply the graph functions

# split data and form graph for each week
g1 <- lapply(split(seq(nrow(d)), d$week), function(i) 
                                                  graph_from_data_frame(d[i,]))
# you can then run graph functions to extract specific measures
(grps <- sapply(g1, function(x) eigen_centrality(x,
                                            weights = E(x)$timestalked)$vector))

#   2010-01-01 2010-01-08 2010-01-15
# A  0.5547002  0.9284767  1.0000000
# B  0.8320503  0.3713907  0.7071068
# C  1.0000000  1.0000000  0.7071068

# Aside: If you only have one function to run on the graphs, 
# you could do this in one step
# 
# sapply(split(seq(nrow(d)), d$week), function(i) {
#             x = graph_from_data_frame(d[i,])
#             eigen_centrality(x, weights = E(x)$timestalked)$vector
#           })

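Next, you need to combine this with the analysis on all of the data. As you only need to build another two graphs, this is not the time-consuming part.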

fun1 <- function(i, name) {
            x = graph_from_data_frame(i)
            d = data.frame(eigen_centrality(x, weights = E(x)$timestalked)$vector)
            setNames(d, name)
    }


a = fun1(d, "alldata")
lt = fun1(d[d$week %in% tail(unique(d$week), 2), ], "lasttwo")

# Combine: could use `cbind` in this example, but perhaps `merge` is 
# safer if there are different levels between dates
data.frame(grps, lt, a) # or
Reduce(merge, lapply(list(grps, a, lt), function(x) data.frame(x, nms = row.names(x))))

#   nms X2010.01.01 X2010.01.08 X2010.01.15  alldata lasttwo
# 1   A   0.5547002   0.9284767   1.0000000 0.909899     1.0
# 2   B   0.8320503   0.3713907   0.7071068 0.607475     0.5
# 3   C   1.0000000   1.0000000   0.7071068 1.000000     1.0

data.table

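The time-consuming step will likely be explicitly splitting the data and applying the function over it. data.table should offer some benefit here, especially when the data becomes large and/or there are more groups.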

# function to apply to graph
fun <- function(d) {
  x = graph_from_data_frame(d)
  e = eigen_centrality(x, weights = E(x)$timestalked)$vector
  list(e, names(e))
}

library(data.table)
dcast(
  setDT(d)[, fun(.SD), by=week], # apply function - returns data in  long format
  V2 ~ week, value.var = "V1")   # convert to wide format

#    V2 2010-01-01 2010-01-08 2010-01-15
# 1:  A  0.5547002  0.9284767  1.0000000
# 2:  B  0.8320503  0.3713907  0.7071068
# 3:  C  1.0000000  1.0000000  0.7071068

Then just run the function over the full data / the last two weeks as before.
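A minimal sketch of that last step in data.table, assuming the same fun as above and a compact reconstruction of the question's data (the names all_data, last_two and combined are illustrative):

```r
library(igraph)
library(data.table)

# Compact reconstruction of the question's data
d <- data.table(
  from = rep(c("A", "B", "C"), each = 6),
  to = c(rep(c("B", "C"), 3), rep(c("A", "C"), 3), rep(c("A", "B"), 3)),
  timestalked = c(0, 1, 0, 4, 1, 2, 0, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 0),
  week = as.Date(rep(rep(c("2010-01-01", "2010-01-08", "2010-01-15"), each = 2), 3))
)
d <- d[timestalked > 0]  # timestalked == 0 means no edge

# Same function as above: weighted eigenvector centrality plus vertex names
fun <- function(d) {
  x <- graph_from_data_frame(d)
  e <- eigen_centrality(x, weights = E(x)$timestalked)$vector
  list(e, names(e))
}

# Whole data set, then only the two most recent weeks
all_data <- d[, fun(.SD)]
last_weeks <- tail(sort(unique(d$week)), 2)
last_two <- d[week %in% last_weeks, fun(.SD)]

# Name the columns and join on vertex
setnames(all_data, c("alldata", "vertex"))
setnames(last_two, c("lasttwo", "vertex"))
combined <- merge(all_data, last_two, by = "vertex")
```

Merging on vertex rather than binding columns keeps the result aligned even if some actor is missing from one of the windows.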

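The results differ between the answers because this answer passes timestalked through the weights argument when calculating the centrality, whereas the others do not use weights.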
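To illustrate the difference, here is a small sketch using only the non-zero edges from the week of 1/8/2010. As far as I understand the igraph documentation, eigen_centrality only picks up weights automatically from an edge attribute named weight, so an attribute called timestalked is ignored unless it is passed explicitly:

```r
library(igraph)

# Non-zero edges for the week of 1/8/2010, taken from the question's data
edges <- data.frame(from = c("A", "B", "C"),
                    to = c("C", "C", "A"),
                    timestalked = c(4, 2, 1))
g <- graph_from_data_frame(edges)

# No edge attribute named 'weight', so every edge counts equally
unweighted <- eigen_centrality(g)$vector

# Edge counts passed explicitly as weights
weighted <- eigen_centrality(g, weights = E(g)$timestalked)$vector

round(rbind(unweighted, weighted), 7)
```

The unweighted row should correspond to the cent_week_2 column of the windowed answer, and the weighted row to the 2010-01-08 column above.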


d=structure(list(from = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor"), to = structure(c(2L, 3L, 2L, 3L, 
2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A", 
"B", "C"), class = "factor"), timestalked = c(0L, 1L, 0L, 4L, 
1L, 2L, 0L, 1L, 0L, 2L, 1L, 0L, 1L, 2L, 1L, 0L, 0L, 0L), week = structure(c(1L, 
1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 
2L), .Label = c("1/1/2010", "1/15/2010", "1/8/2010"), class = "factor")), .Names = c("from", 
"to", "timestalked", "week"), class = "data.frame", row.names = c(NA, 
-18L))
