使用st_distance计算两组点之间的所有距离。

Question

使用st_distance计算两组点之间的所有距离。

5

我在R中有两组点存储为“sf”对象。点对象x包含204,467个点，而点对象y包含5,297个点。

理论上，我想计算从x中的所有点到y中的所有点的距离。我知道这将创建一个庞大的矩阵，但是使用“sf”包中的“st_distance(x, y, by_element=FALSE)”函数，在我的i7台式机上大约需要40分钟就可以完成。

我想要做的是计算从x中的所有点到y中的所有点的距离，然后将其转换为一个包含相应x和y点对的所有变量的“data.frame”。这是因为我希望能够使用“dplyr”进行灵活的聚合操作，例如，我想找出距离x在10、50、100公里范围内的y点的数量，并且满足“x$year < y$year”的条件。

我成功地创建了距离矩阵，其中包含大约1,083,061,699个单元格。我知道这种方法非常低效，但它提供了灵活的聚合方式。欢迎提出其他建议。

以下是创建两个sf点对象并测量它们之间距离的代码。接下来，我想将其转换为一个包含x和y所有变量的数据框，但这就是我无法继续的地方。

如果我的建议的工作流程行不通，有人能提供一个替代解决方案来测量预定义半径内所有点的距离，并创建一个包含x和y所有变量的结果数据框吗？

# Create two sf point objects 
set.seed(123)
library(sf)


pts1 <- st_as_sf(x = data.frame(id=seq(1,204467,1),
                                year=sample(seq(from = 1990, to = 2018, by = 1), size = 204467, replace = TRUE),
                                xcoord=sample(seq(from = -180, to = 180, by = 1), size = 204467, replace = TRUE),
                                ycoord=sample(seq(from = -90, to = 90, by = 1), size = 204467, replace = TRUE)),
                 coords=c("xcoord","ycoord"),crs=4326)

pts2 <- st_as_sf(x = data.frame(id=seq(1,5297,1),
                                year=sample(seq(from = 1990, to = 2018, by = 1), size = 5297, replace = TRUE),
                                xcoord=sample(seq(from = -180, to = 180, by = 1), size = 5297, replace = TRUE),
                                ycoord=sample(seq(from = -90, to = 90, by = 1), size = 5297, replace = TRUE)),
                 coords=c("xcoord","ycoord"),crs=4326)

distmat <- st_distance(pts1,pts2,by_element = FALSE)

- spesseh

1个回答

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- dww · Accepted Answer

我认为你可以采用不同的方法来处理。一旦你有了distmat矩阵，就可以在不需要data.frame的情况下进行所描述的计算。你可以使用标准子集获取满足指定条件的点。

例如，要查找pts1 $ year大于pts2 $ year的点的组合，我们可以执行以下操作：

subset_points = outer(pts1$year, pts2$year, `>`)

然后，要找出这些之间距离超过100公里的数量，我们可以执行以下操作：

library(units)
sum(distmat[subset_points] > (100 * as_units('km', 1)))

关于内存使用的说明

无论你使用sf还是data.frame对象，都有可能在每个矩阵或数据表列中使用1e9浮点数时开始触及RAM限制。此时，您可以考虑将距离矩阵转换为raster格式。然后，raster可以存储在磁盘而不是内存中，并且您可以利用raster包中的内存安全函数来完成处理。

如何使用raster从磁盘上执行操作并节省RAM

我们可以像下面这样使用内存安全的raster操作来处理非常大的矩阵：

library(raster)

# convert our matrices to rasters, so we can work on them from disk
r = raster(matrix(as.numeric(distmat), length(pts1$id), length(pts2$id)))
s = raster(subset_points)
remove('distmat', 'subset_points')

# now create a raster equal to r, but with zeroes in the cells we wish to exclude from calculation
rs = overlay(r,s,fun=function(x,y){x*y}, filename='out1.tif')     

# find which cells have value greater than x (1e6 in the example)
Big_cells = reclassify(rs, matrix(c(-Inf, 1e6, 0, 1e6, Inf, 1), ncol=3, byrow=TRUE), 'out.tiff', overwrite=T)

# and finally count the cells
N = cellStats(Big_cells, sum)