Creating an igraph graph from GTFS data with R


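My goal is to convert GTFS stop and trip information into a graph in which the vertices are stops (from GTFS stops.txt) and the edges are trips (from GTFS stop_times.txt). The first few steps are obvious: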

library(igraph)

# Read in the GTFS files
stops <- read.csv("stops.txt")
stop_times <- read.csv("stop_times.txt")

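My first instinct was simply to use igraph's graph_from_data_frame function, but there is a serious drawback: the stop_times data frame is not actually structured in the required form. Its layout is as follows:

> head(stop_times)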
  trip_id stop_id arrival_time departure_time stop_sequence shape_dist_traveled
1 A895151  F04272     06:20:00       06:20:00            10                   0
2 A895151  F04184     06:22:00       06:22:00            20                 648
3 A895151  F04319     06:24:00       06:24:00            30                1224
4 A895151  F04369     06:27:00       06:27:00            40                2779
5 A895151  008264     06:31:00       06:31:00            50                5620
6 A895151  F01520     06:33:00       06:33:00            60                6691

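That is, each row holds one stop ID together with the arrival and departure times at that stop, whereas I want each row to contain a from-stop ID, a to-stop ID, a start time and an end time (so each row is really not a "stop" but a "transition" between consecutive stops, derived from the stops). This transformation seems challenging to me, though: I would have to iterate over the rows of stop_times and decide whether consecutive rows belong to the same trip_id; if they do, compute the start/end data, and if not, insert NULL or find some other way to separate the trips... I find this quite confusing.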
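To make the target shape concrete, here is a minimal base-R sketch (an illustration, not part of the original question) that builds the from/to rows for the six stop_times rows printed above. The helper name `next_in_trip` is hypothetical. Instead of looping, it sorts by trip and sequence, takes each column's value from the next row, and blanks it out wherever the next row belongs to a different trip.

```r
# Toy copy of the six stop_times rows printed above (a single trip)
stop_times <- data.frame(
  trip_id        = rep("A895151", 6),
  stop_id        = c("F04272", "F04184", "F04319", "F04369", "008264", "F01520"),
  arrival_time   = c("06:20:00", "06:22:00", "06:24:00", "06:27:00", "06:31:00", "06:33:00"),
  departure_time = c("06:20:00", "06:22:00", "06:24:00", "06:27:00", "06:31:00", "06:33:00"),
  stop_sequence  = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# Hypothetical helper: value of x in the next row, or NA when the next row
# starts a different trip (or when there is no next row at all)
next_in_trip <- function(x, trip) {
  nxt <- c(x[-1], NA)
  same_trip <- c(trip[-1] == trip[-length(trip)], FALSE)
  nxt[!same_trip] <- NA
  nxt
}

# Rows must be ordered by trip, then by stop_sequence within each trip
st <- stop_times[order(stop_times$trip_id, stop_times$stop_sequence), ]

transfers <- data.frame(
  trip_id    = st$trip_id,
  from_stop  = st$stop_id,
  to_stop    = next_in_trip(st$stop_id, st$trip_id),
  start_time = st$departure_time,
  end_time   = next_in_trip(st$arrival_time, st$trip_id),
  stringsAsFactors = FALSE
)

# The last stop of each trip has no outgoing transition, so drop those rows
transfers <- transfers[!is.na(transfers$to_stop), ]
print(transfers)
```

The same "compare with the next row" idea works for any number of trips, as long as the rows are sorted first.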

Is there an elegant way to combine these two data frames into the desired graph?

1 Answer

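The "from" and "to" columns can be generated by "shifting up" the values from the following row. The stop information can simply be joined on.
Let me explain with an example, using library(data.table):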
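library(data.table)

## here I'm using Melbourne's GTFS ("http://transitfeeds.com/p/ptv/497/latest/download")

#dt_stop_times <- lst[[6]]$stop_times
#dt_stops <- lst[[7]]$stops

#setDT(dt_stop_times)
#setDT(dt_stops)

## or read the unzipped files directly (fread() returns data.tables, so no setDT() needed);
## reading stop_id as character keeps leading zeros and the join key types identical
dt_stop_times <- fread("stop_times.txt", colClasses = list(character = "stop_id"))
dt_stops <- fread("stops.txt", colClasses = list(character = "stop_id"))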


## join on whatever stop information you want
dt_stop_times <- dt_stop_times[ dt_stops, on = c("stop_id"), nomatch = 0]

## set the order of stops for each group (in this case, each group is a trip_id)
setorder(dt_stop_times, trip_id, stop_sequence)

## create a new column by shifting the stop_id of the following row up 
dt_stop_times[, stop_id_to := shift(stop_id, type = "lead"), by = .(trip_id)]

## you will have NAs at this point because the last stop doesn't go anywhere.

## you can do the same operation on multiple columns at the same time
dt_stop_times[, `:=`(stop_id_to = shift(stop_id, type = "lead"), 
                     arrival_time_stop_to = shift(arrival_time, type = "lead"),
                     departure_time_stop_to = shift(departure_time, type = "lead")),
              by = .(trip_id)]

## now you have your 'from' and 'to' columns from which you can make your igraph

## here's a subset of the result
dt_stop_times[, .(trip_id, stop_id, stop_name_from = stop_name, arrival_time, stop_id_to, arrival_time_stop_to)]

#                           trip_id stop_id                                                  stop_name_from arrival_time stop_id_to
# 1:          1.T0.3-86-A-mjp-1.7.R    4174                                    71-RMIT/Plenty Rd (Bundoora)     25:42:00       4485
# 2:          1.T0.3-86-A-mjp-1.7.R    4485                            70-Janefield Dr/Plenty Rd (Bundoora)     25:43:00       4486
# 3:          1.T0.3-86-A-mjp-1.7.R    4486                              69-Taunton Dr/Plenty Rd (Bundoora)     25:44:00       4487
# 4:          1.T0.3-86-A-mjp-1.7.R    4487                           68-Greenhills Rd/Plenty Rd (Bundoora)     25:45:00       4488
# 5:          1.T0.3-86-A-mjp-1.7.R    4488                      67-Bundoora Square SC/Plenty Rd (Bundoora)     25:46:00       4489
# ---                                                                                                                         
# 9415793: 9999.UQ.3-19-E-mjp-1.1.H   17871           7-Queen Victoria Market/Elizabeth St (Melbourne City)     23:25:00      17873
# 9415794: 9999.UQ.3-19-E-mjp-1.1.H   17873       5-Melbourne Central Station/Elizabeth St (Melbourne City)     23:27:00      17875
# 9415795: 9999.UQ.3-19-E-mjp-1.1.H   17875              3-Bourke Street Mall/Elizabeth St (Melbourne City)     23:30:00      17876
# 9415796: 9999.UQ.3-19-E-mjp-1.1.H   17876                      2-Collins St/Elizabeth St (Melbourne City)     23:31:00      17877
# 9415797: 9999.UQ.3-19-E-mjp-1.1.H   17877 1-Flinders Street Railway Station/Elizabeth St (Melbourne City)     23:32:00         NA
#          arrival_time_stop_to
# 1:                   25:43:00
# 2:                   25:44:00
# 3:                   25:45:00
# 4:                   25:46:00
# 5:                   25:47:00
# ---                     
# 9415793:             23:27:00
# 9415794:             23:30:00
# 9415795:             23:31:00
# 9415796:             23:32:00
# 9415797:                   NA
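The printed times show a GTFS detail worth handling explicitly: times are counted from the start of the service day and may exceed 24:00:00 (e.g. `25:42:00` above), so they should not be parsed as clock times. A minimal base-R sketch, assuming the times are stored as "HH:MM:SS" character strings; the helper `gtfs_time_to_sec` is hypothetical. It converts them to seconds so that an edge's travel time is simply the difference between the two columns.

```r
# Hypothetical helper: "HH:MM:SS" (hours may be >= 24 in GTFS) -> seconds.
# Returns NA for missing or malformed values (e.g. the NA of a trip's last stop).
gtfs_time_to_sec <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 3 || anyNA(p)) return(NA_real_)
    sum(as.numeric(p) * c(3600, 60, 1))
  }, numeric(1))
}

# Two edges taken from the output above: a normal one and a trip's last stop
from <- c("25:42:00", "23:32:00")
to   <- c("25:43:00", NA)

travel_sec <- gtfs_time_to_sec(to) - gtfs_time_to_sec(from)
print(travel_sec)
```

With the data.table from the answer, the same helper could be applied to `arrival_time` and `arrival_time_stop_to` to store a travel time on each edge.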

Now, to build the graph with graph_from_data_frame() from igraph, just do the following:
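library(igraph)

# get a data frame of nodes
nodes <- dt_stops[, .(stop_id, stop_lon, stop_lat)]

# links between stops; drop each trip's last stop (stop_id_to is NA),
# otherwise graph_from_data_frame() fails because NA is not a listed vertex
links <- dt_stop_times[!is.na(stop_id_to), .(stop_id, stop_id_to, trip_id)]

# create graph
g <- graph_from_data_frame(links, directed = TRUE, vertices = nodes)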
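An end-to-end toy run of the steps above, on invented stops and trips (requires the data.table and igraph packages). It shows that the grouped `shift()` stops at each trip boundary, and it filters out the resulting `NA` rows before calling `graph_from_data_frame()`, which would otherwise reject an edge endpoint that is not among the `vertices`.

```r
library(data.table)
library(igraph)

# Invented example data in GTFS column layout
dt_stops <- data.table(
  stop_id  = c("A", "B", "C"),
  stop_lon = c(144.96, 144.97, 144.98),
  stop_lat = c(-37.81, -37.82, -37.83)
)
dt_stop_times <- data.table(
  trip_id       = c("T1", "T1", "T1", "T2", "T2"),
  stop_id       = c("A", "B", "C", "C", "B"),
  stop_sequence = c(1L, 2L, 3L, 1L, 2L)
)

# Order stops within each trip, then take the next stop of the same trip
setorder(dt_stop_times, trip_id, stop_sequence)
dt_stop_times[, stop_id_to := shift(stop_id, type = "lead"), by = .(trip_id)]

nodes <- dt_stops[, .(stop_id, stop_lon, stop_lat)]
# Each trip's last stop has stop_id_to = NA and produces no edge
links <- dt_stop_times[!is.na(stop_id_to), .(stop_id, stop_id_to, trip_id)]

g <- graph_from_data_frame(links, directed = TRUE, vertices = nodes)
print(g)
```

The result has one edge per consecutive stop pair per trip (A to B and B to C for T1, C to B for T2), with `trip_id` kept as an edge attribute.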

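Note that a single GTFS.zip file may contain several transport modes (train, bus, tram, etc.), and some pairs of stops are far more strongly connected than others because of differences in service frequency. It is not obvious how these two issues should be taken into account when building a graph from a GTFS.zip. A possible approach is to weight each edge by its frequency, and to build a multilayer network with one interdependent layer per transport mode, where some stops are shared across layers.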
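One way to act on the frequency-weighting idea, as a sketch under assumptions (invented data; requires data.table and igraph): collapse the per-trip edges into one edge per ordered stop pair, with `weight` set to the number of trips serving it. Building mode layers would additionally need `route_type` from routes.txt, joined through trips.txt, which is not shown here.

```r
library(data.table)
library(igraph)

# Invented per-trip edges, shaped like `links` in the answer
links <- data.table(
  stop_id    = c("A", "A", "A", "B", "B"),
  stop_id_to = c("B", "B", "B", "C", "A"),
  trip_id    = c("T1", "T2", "T3", "T1", "T4")
)

# One edge per ordered stop pair; weight = number of trips using it
weighted <- links[, .(weight = .N), by = .(stop_id, stop_id_to)]

g <- graph_from_data_frame(weighted, directed = TRUE)
print(E(g)$weight)
```

Be aware that igraph path functions such as `distances()` use an edge attribute named `weight` as a cost by default, so a trip count (where higher means better connected) would need to be inverted, or replaced by a headway or travel time, before being used for routing.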

Content provided by Stack Overflow.