Calculate the distance between two points in two datasets (nearest neighbor).

Question


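I want to calculate the distance between points in two different datasets. I don't want the distances between all pairs of points, only the distance from each point to its nearest point in dataset B.

For example: Dataset A - persons: http://pastebin.com/HbaeqACi Dataset B - water features: http://pastebin.com/UdDvNtHs Dataset C - cities: http://pastebin.com/nATnkMRk So I want to calculate the distance from each person to the nearest water feature. I have already tried the rgeos package; I ran into some projection errors at first, but resolved them. However, this approach calculates the distances between all points, and I am only interested in the distance to the nearest water feature.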

# load csv files
persons = read.csv("persons.csv", header = TRUE)
water = read.csv("water.csv", header = TRUE)
# convert data frames to SpatialPointsDataFrame and assign a projection
library(sp)
library(rgeos)
coordinates(persons) <- c("POINT_X", "POINT_Y")
proj4string(persons) <- CRS("+proj=utm +datum=WGS84")
coordinates(water) <- c("POINT_X", "POINT_Y")
proj4string(water) <- CRS("+proj=utm +datum=WGS84")

# use the rgeos package to calculate the distance
distance <- gDistance(persons, water, byid=TRUE)
# works, but calculates a huge number of distances

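Is there a parameter I have missed? Or do I need to use another package or function? I also looked at spatstat, which can calculate the distance to the nearest neighbor, but not between two different datasets: http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/spatstat/html/nndist.html

Edit:
The complete R script, including plotting the datasets: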

library(RgoogleMaps)
library(ggplot2)
library(ggmap)
library(sp)
library(fossil)
library(plyr)  # provides ldply(), used below

#load data
persons = read.csv("person.csv", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("water.csv", header =TRUE, stringsAsFactors=FALSE)
city = read.csv("city.csv", header =TRUE)

# plot data
persons_ggplot2 <- persons
city_ggplot2 <- city
water_ggplot2 <- water
gc <- geocode('new york, usa')
center <- as.numeric(gc)  
G <- ggmap(get_googlemap(center = center, color = 'bw', scale = 1, zoom = 11, maptype = "terrain", frame=T), extent="device")
G1 <- G + geom_point(aes(x=POINT_X, y=POINT_Y ),data=city, shape = 22, color="black", fill = "yellow", size = 4) + geom_point(aes(x=POINT_X, y=POINT_Y ),data=persons, shape = 8, color="red", size=2.5) + geom_point(aes(x=POINT_X, y=POINT_Y ),data=water_ggplot2, color="blue", size=1)
plot(G1)

#### calculate distance
# Generate unique coordinates dataframe
UniqueCoordinates <- data.frame(unique(persons[,4:5]))
UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)

# Generate a function that looks for the closest water feature for each id's coordinates
NearestW <- function(id){
  tmp <- UniqueCoordinates[UniqueCoordinates$Id==id, 1:2]
  WaterFeatures <- rbind(tmp,water[,2:3])
  # distances from the person (row 1) to every water feature
  tmp1 <- earth.dist(WaterFeatures, dist=TRUE)[1:(nrow(WaterFeatures)-1)]
  # index of the nearest water feature, then look up the first column of its row
  tmp1 <- which.min(tmp1)
  tmp1 <- water[tmp1,1]
  tmp1 <- data.frame(tmp1, WaterFeature=tmp)
  return(tmp1)
}

# apply to each id and then merge
CoordinatesWaterFeature <- ldply(UniqueCoordinates$Id, NearestW)
persons <- merge(persons, CoordinatesWaterFeature, by.x=c(4,5), by.y=c(2,3))


- schlomm

2 Answers


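I may be a bit late, but you can use spatstat to calculate distances between two different datasets. The function is nncross. Its arguments are two objects of class ppp, which you can create with the as.ppp() function.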

- Mario Becerra

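It may be too late for the original reason I asked, but it will certainly help in the future! Thanks! :) - schlomm

However, if you use spatstat, you must first project the data to a planar coordinate system. It does not automatically recognize longitude/latitude data. That said, the C code behind nncross is very efficient, so you may see a significant speed-up on large datasets. - Ege Rubak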
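
The nncross approach described above could look roughly like the following sketch. It assumes the spatstat.geom package (where current spatstat releases keep ppp, owin and nncross) and uses made-up planar coordinates in metres in place of the real, projected person and water data; the names persons_xy, water_xy and win are hypothetical.

```r
library(spatstat.geom)

# Made-up planar coordinates (e.g. metres after projecting lon/lat data)
persons_xy <- data.frame(x = c(0, 10, 5), y = c(0, 0, 5))
water_xy   <- data.frame(x = c(1, 9, 5),  y = c(0, 1, 9))

# Both point patterns need an observation window; here one box covering all points
win <- owin(xrange = c(-1, 11), yrange = c(-1, 10))
persons_ppp <- ppp(persons_xy$x, persons_xy$y, window = win)
water_ppp   <- ppp(water_xy$x, water_xy$y, window = win)

# For each person: distance to, and index of, the nearest water point
nn <- nncross(persons_ppp, water_ppp)
nn$dist   # nearest-neighbour distances, in the units of the coordinates
nn$which  # row index into water_xy of the nearest water point
```

The result has one row per point of the first pattern, so it can be bound column-wise onto the persons data frame.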


- eclark · Accepted Answer

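How about writing a function that finds the nearest water feature for each person?

#requires function earth.dist from "fossil" package
require(fossil)
#requires function ldply from "plyr" package
require(plyr)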

#load data
persons = read.csv("person.csv", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("water.csv", header =TRUE, stringsAsFactors=FALSE)

#Generate unique coordinates dataframe
UniqueCoordinates <- data.frame(unique(persons[,4:5]))
UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)


#Generate a function that looks for the closest waterfeature for each id coordinates
NearestW <- function(id){
   tmp <- UniqueCoordinates[UniqueCoordinates$Id==id, 1:2]
   WaterFeatures <- rbind(tmp,water[,2:3])
   tmp1 <- earth.dist(WaterFeatures, dist=TRUE)[1:(nrow(WaterFeatures)-1)]
   tmp1 <- min(tmp1)
   tmp1 <- data.frame(tmp1, WaterFeature=tmp)
   return(tmp1)
 }

#apply to each id and then merge
CoordinatesWaterFeature <- ldply(UniqueCoordinates$Id, NearestW)
persons <- merge(persons, CoordinatesWaterFeature, by.x=c(4,5), by.y=c(2,3))

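Note: I added the stringsAsFactors argument to the original read.csv calls, which makes the final merge easier.

Note: the column tmp1 holds the distance to the nearest water feature in kilometers (the unit returned by earth.dist).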
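
As a point of comparison, here is a minimal base-R sketch of the same idea that returns both the index of the nearest water feature and the distance to it, without the fossil or plyr dependencies. It swaps in a hand-written haversine formula for earth.dist, so the numbers may differ slightly; the 6371 km radius, the function names and the demo coordinates are all assumptions made for illustration.

```r
# Great-circle (haversine) distance in kilometres.
# radius_km is an assumed mean Earth radius, not a value taken from fossil.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# For each person, find the closest water point and the distance to it.
nearest_water <- function(persons_lon, persons_lat, water_lon, water_lat) {
  res <- lapply(seq_along(persons_lon), function(i) {
    d <- haversine_km(persons_lon[i], persons_lat[i], water_lon, water_lat)
    j <- which.min(d)
    data.frame(person = i, nearest = j, dist_km = d[j])
  })
  do.call(rbind, res)
}

# Made-up lon/lat coordinates standing in for the person and water files
persons_demo <- data.frame(lon = c(-74.00, -73.90), lat = c(40.70, 40.80))
water_demo   <- data.frame(lon = c(-74.01, -73.95, -73.89),
                           lat = c(40.71, 40.75, 40.80))

result <- nearest_water(persons_demo$lon, persons_demo$lat,
                        water_demo$lon, water_demo$lat)
print(result)
```

This still compares every person with every water point, one person at a time, so it only avoids storing the full distance matrix; for large datasets a spatial index or the nncross route is likely to be faster.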