R kknn package and weighted k-nearest-neighbor calculations


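I am trying to manually calculate the distance and weight values returned by the R kknn package. When the data are not scaled, I can correctly calculate the Euclidean distances and the inverse weights, as follows:

Euclidean distance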

sqrt((6-8)^2 + (4-5)^2) = 2.236068

sqrt((6-3)^2 + (4-7)^2) = 4.242641

sqrt((6-7)^2 + (4-3)^2) = 1.414214
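As a cross-check on the hand calculation, base R's dist() returns the same three Euclidean distances in a single call. This is a sketch; the object names are illustrative, and the holdout is stacked in the first row so that row 1 of the distance matrix holds its distances to the three training rows:

```r
# Predictors from the question (the class column is not used for distances)
training <- data.frame(height = c(8, 3, 7), weight = c(5, 7, 3))
holdout  <- data.frame(height = 6, weight = 4)

# Stack the holdout on top, compute all pairwise Euclidean distances,
# then keep row 1 without its zero distance to itself
all_points <- rbind(holdout, training)
d <- as.matrix(dist(all_points, method = "euclidean"))[1, -1]
print(round(unname(d), 6))   # 2.236068 4.242641 1.414214
```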

Inverse weights

1 / (2.236068 / 4.242641) = 1.897367

1 / (1.414214 / 4.242641) = 3.000000

I don't understand how the rectangular weights are calculated, because I get the following:

1/2 * 1 = 0.50

1/2 * 1 = 0.50

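but the kknn package gives 1 and 1.

Finally, when the data are scaled, I cannot reproduce the distances and weights at all. Any help would be greatly appreciated, as I am trying to understand how the kknn package works.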

library(kknn)

training <- data.frame(class = c(1, 0, 1), height = c(8, 3, 7), weight = c(5, 7, 3))

training

holdouts <- data.frame(class = 1, height = 6, weight = 4)

holdouts

rectangular_no_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "rectangular", k = 2, scale = FALSE)

rectangular_no_scale[["D"]]

rectangular_no_scale[["W"]]

inversion_no_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "inv", k = 2, scale = FALSE)

inversion_no_scale[["D"]]

inversion_no_scale[["W"]]

rectangular_with_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "rectangular", k = 2, scale = TRUE)

rectangular_with_scale[["D"]]

rectangular_with_scale[["W"]]

inversion_with_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "inv", k = 2, scale = TRUE)

inversion_with_scale[["D"]]

inversion_with_scale[["W"]]
Answer


Reading the kknn source code (type kknn at the console and press Return) helps to understand the calculation:

library(kknn)

training <- data.frame(class = c(1, 0, 1), height = c(8, 3, 7), weight = c(5, 7, 3))

training
#>   class height weight
#> 1     1      8      5
#> 2     0      3      7
#> 3     1      7      3

holdouts <- data.frame(class = 1, height = 6, weight = 4)

holdouts
#>   class height weight
#> 1     1      6      4

# Euclidean distance
d <- sqrt((training$height-holdouts$height)^2 +(training$weight-holdouts$weight)^2)
d <- d[order(d)]
d
#> [1] 1.414214 2.236068 4.242641

rectangular_no_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "rectangular", k = 2, scale = FALSE)

rectangular_no_scale[["D"]]
#> [1] 1.414214 2.236068
d[1:2]
#> [1] 1.414214 2.236068

rectangular_no_scale[["W"]]
#>      [,1] [,2]
#> [1,]    1    1
# 
# source code:
# if (kernel == "rectangular") 
#   W <- matrix(1, nrow = p, ncol = k)
# This is why you get 1, 1: all k neighbors get the same weight, and W is not normalized to sum to 1
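The 1/2, 1/2 weights from the question and kknn's 1, 1 lead to the same prediction, because a weighted average does not change when every weight is multiplied by the same constant. A small check, assuming the weights are combined as a plain weighted mean; the neighbor responses below are hypothetical, chosen so the average is not trivially 1:

```r
y_nn   <- c(1, 0)               # hypothetical responses of the two nearest neighbors
w_kknn <- c(1, 1)               # rectangular weights as reported by kknn
w_norm <- w_kknn / sum(w_kknn)  # normalized weights from the question: 0.5, 0.5

# weighted.mean() divides by sum(w), so the scale of the weights cancels out
p_kknn <- weighted.mean(y_nn, w_kknn)
p_norm <- weighted.mean(y_nn, w_norm)
print(c(p_kknn, p_norm))        # 0.5 0.5
```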

inversion_no_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "inv", k = 2, scale = FALSE)

inversion_no_scale[["D"]]
#> [1] 1.414214 2.236068
d[1:2]
#> [1] 1.414214 2.236068

inversion_no_scale[["W"]]
#>      [,1]     [,2]
#> [1,]    3 1.897367
#
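# Source code:
# W <- D/maxdist
# if (kernel == "inv") 
#   W <- 1/W
# maxdist is the distance to the (k+1)-th nearest neighbor; with k = 2 and
# only three training rows, that is simply max(d):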
max(d)/d[1:2]
#> [1] 3.000000 1.897367
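The same calculation can be wrapped in a small helper. This is a hypothetical, simplified function (not part of kknn), and it assumes that maxdist in the source excerpt above is the distance to the (k+1)-th nearest neighbor. With k = 2 and three training rows that divisor is max(d); with k = 1 it would be the second-smallest distance instead:

```r
# Hypothetical helper (not part of kknn): "inv" kernel weights for one test point.
# Assumes maxdist = distance to the (k + 1)-th nearest neighbor.
inv_weights <- function(d, k) {
  d <- sort(d)
  stopifnot(length(d) > k)        # a (k + 1)-th neighbor must exist
  maxdist <- max(d[k + 1], 1e-6)  # avoid dividing by zero
  1 / (d[1:k] / maxdist)          # same as maxdist / d[1:k]
}

d_raw <- sqrt(c(5, 18, 2))        # unscaled distances to training rows 1, 2, 3
print(inv_weights(d_raw, k = 2))  # 3.000000 1.897367
print(inv_weights(d_raw, k = 1))  # 1.581139 (divisor is sqrt(5), not max(d))
```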

rectangular_with_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "rectangular", k = 2, scale = TRUE)

height_sd <- sd(training$height)   # same as sqrt(var(training$height))
weight_sd <- sd(training$weight)
training_scaled <- training
training_scaled$height <- training$height / height_sd
training_scaled$weight <- training$weight / weight_sd
holdouts_scaled <- holdouts
holdouts_scaled$height <- holdouts$height / height_sd
holdouts_scaled$weight <- holdouts$weight / weight_sd

rectangular_with_scale[["D"]]
#> [1] 0.6267832 0.9063270
d_scaled <- sqrt((training_scaled$height-holdouts_scaled$height)^2 +(training_scaled$weight-holdouts_scaled$weight)^2)
d_scaled <- d_scaled[order(d_scaled)]   # sort the scaled distances, not d
d_scaled
#> [1] 0.6267832 0.9063270 1.8803495

rectangular_with_scale[["W"]]
#>      [,1] [,2]
#> [1,]    1    1
# Same as before: 1, 1


inversion_with_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "inv", k = 2, scale = TRUE)

inversion_with_scale[["D"]]
#> [1] 0.6267832 0.9063270
d_scaled[1:2]
#> [1] 0.6267832 0.9063270

inversion_with_scale[["W"]]
#>      [,1]     [,2]
#> [1,]    3 2.074692
max(d_scaled)/d_scaled[1:2]
#> [1] 3.000000 2.074692

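In summary: the rectangular kernel gives the same weight to each of the k nearest neighbors, so no normalization is needed and the weights are simply set to 1.
Scaling just divides each column (of both the training and the test data) by the standard deviation of that column in the training data; the distances and weights are then computed exactly as before.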
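The per-column scaling in the answer's code can be written more compactly with sweep(), which divides each column of a matrix by a vector of values. A sketch, using the training SDs for the holdout as the answer's code does. Note that scale(X, center = FALSE) on its own divides by the root mean square rather than the SD, so pass the SDs explicitly if you use scale():

```r
training <- data.frame(height = c(8, 3, 7), weight = c(5, 7, 3))
holdout  <- data.frame(height = 6, weight = 4)

X  <- as.matrix(training)
x0 <- as.matrix(holdout)
col_sd <- apply(X, 2, sd)              # training-set SD of each column

# Divide every column by its training SD (the holdout uses the same SDs)
X_s  <- sweep(X, 2, col_sd, "/")
x0_s <- sweep(x0, 2, col_sd, "/")

# Euclidean distance from the scaled holdout to each scaled training row
d_scaled <- sqrt(colSums((t(X_s) - as.vector(x0_s))^2))
print(sort(d_scaled))   # 0.6267832 0.9063270 1.8803495

# Equivalent scaling with scale(), passing the SDs explicitly
stopifnot(isTRUE(all.equal(X_s, scale(X, center = FALSE, scale = col_sd),
                           check.attributes = FALSE)))
```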

Content provided by Stack Overflow.