Difference in linear regression code

Question


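I'm teaching myself R from An Introduction to Statistical Learning: with Applications in R. I'm sure these two code blocks should produce the same mean, but I'm getting very different results. Can anyone help me figure out why the results don't match? It looks like the first code block is the one that's wrong. The code uses the Auto dataset. My predictions differ from the book's, even though both models are trained on the same indices.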

First code block (my code)

set.seed(1)
train_index = sample (392, 196)
Auto$index = c(1:nrow(Auto))
train_df = Auto[train_index,]
test_df = anti_join(Auto, train_df, by="index")
attach(train_df)
lm.fit = lm(mpg ~ horsepower)
predictions = predict(lm.fit, horsepower = test_df$horsepower)

mean((test_df$mpg - predictions)^2)

Second code block (code from the book, An Introduction to Statistical Learning: with Applications in R)

set.seed(1)
train = sample (392, 196)
lm.fit = lm(mpg ~ horsepower , data = Auto , subset = train)
attach(Auto)

mean (( mpg - predict(lm.fit , Auto))[-train ]^2)

- Ba Lalo

1 Answer


- zephryl · Accepted Answer

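In your code, the test data isn't specified correctly in predict(). predict() takes a data frame containing the predictor variables through its newdata argument; you passed horsepower = test_df$horsepower instead, which is simply absorbed by ... and has no effect. With newdata missing, predict() returns the fitted values for the training rows rather than predictions for the test rows.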

If you pass the whole test_df data frame as newdata, you get the same result as the book.

library(ISLR)
library(dplyr)
set.seed(1)

# OP’s code with change to predict()
train_index = sample(392, 196)
Auto$index = c(1:nrow(Auto))
train_df = Auto[train_index,]
test_df = anti_join(Auto, train_df, by="index")
attach(train_df)
lm.fit = lm(mpg ~ horsepower)
predictions = predict(lm.fit, newdata = test_df)
mean((test_df$mpg - predictions)^2)
# 23.26601

# ISLR code
set.seed (1)
train = sample (392 , 196)
lm.fit = lm(mpg ~ horsepower , data = Auto , subset = train)
attach(Auto)
mean (( mpg - predict(lm.fit , Auto))[-train ]^2)
# 23.26601
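
To see the effect in isolation, here is a minimal sketch that reproduces the misnamed-argument behavior. It uses the built-in mtcars data instead of Auto (an assumption made only so the snippet runs without extra packages), and the names train_rows, fit, wrong and right are illustrative. The split is half and half, mirroring the 196/196 split above, which is likely why the original code ran without a length warning: the fitted values and the test responses happen to have the same length.

```r
# Sketch on built-in mtcars (assumption: stands in for ISLR's Auto data).
set.seed(1)
train_rows <- sample(nrow(mtcars), 16)   # half of the 32 rows
train_df <- mtcars[train_rows, ]
test_df  <- mtcars[-train_rows, ]

fit <- lm(mpg ~ hp, data = train_df)

# Misnamed argument: `hp = ...` falls into `...`, so newdata is missing
# and predict() returns the fitted values for the training rows.
wrong <- predict(fit, hp = test_df$hp)
print(isTRUE(all.equal(unname(wrong), unname(fitted(fit)))))

# Correct call: pass a data frame with the predictor column as newdata.
right <- predict(fit, newdata = test_df)

# Only the second value is a test-set mean squared error.
print(mean((test_df$mpg - wrong)^2))
print(mean((test_df$mpg - right)^2))
```

The first mean compares test responses against training fitted values, so it is not a meaningful error estimate even though it computes without complaint.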