Looping linear regression output into a data frame in R

8
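I have the dataset below and want to fit a linear regression for each country and Area, then add the predicted values to the dataset. The final data frame should have three added columns (shown as screenshots in the original post). I have already done this for one country and one Area, but now I want to do it for every country and Area and put the predicted value, upper bound, and lower bound back into the dataset with cbind: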
 data <- data.frame(country = c("US","US","US","US","US","US","US","US","US","US","UK","UK","UK","UK","UK"),
            Area = c("G","G","G","G","G","I","I","I","I","I","A","A","A","A","A"),
            week = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),amount = c(12,23,34,32,12,12,34,45,65,45,45,34,23,43,43))

data_1 <- data[(data$country=="US" & data$Area=="G"),]
model <- lm(amount ~ week, data = data_1)
pre <- predict(model,newdata = data_1,interval = "prediction",level = 0.95)
pre

How can I loop over the other country and Area combinations?
4 Answers

6
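We can also use the augment function from the broom package to get the information you want:
library(dplyr)   # %>%, group_by, mutate, select
library(tidyr)   # nest, unnest
library(purrr)   # map
library(broom)   # augment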

data %>%
  group_by(country, Area) %>%
  nest() %>%
  mutate(models = map(data, ~ lm(amount ~ week, data = .)), 
         aug = map(models, ~ augment(.x, interval = "prediction"))) %>%
  unnest(aug) %>%
  select(country, Area, amount, week, .fitted, .lower, .upper)

# A tibble: 15 x 7
# Groups:   country, Area [3]
   country Area  amount  week .fitted  .lower .upper
   <chr>   <chr>  <dbl> <dbl>   <dbl>   <dbl>  <dbl>
 1 US      G         12     1    20.8 -27.7     69.3
 2 US      G         23     2    21.7 -22.0     65.4
 3 US      G         34     3    22.6 -19.4     64.6
 4 US      G         32     4    23.5 -20.2     67.2
 5 US      G         12     5    24.4 -24.1     72.9
 6 US      I         12     1    20.8 -33.9     75.5
 7 US      I         34     2    30.5 -18.8     79.8
 8 US      I         45     3    40.2  -7.17    87.6
 9 US      I         65     4    49.9   0.595   99.2
10 US      I         45     5    59.6   4.90   114. 
11 UK      A         45     1    36.6  -6.05    79.2
12 UK      A         34     2    37.1  -1.34    75.5
13 UK      A         23     3    37.6   0.667   74.5
14 UK      A         43     4    38.1  -0.341   76.5
15 UK      A         43     5    38.6  -4.05    81.2

6

...and a base R solution:

data <- data.frame(country = c("US","US","US","US","US","US","US","US","US","US","UK","UK","UK","UK","UK"),
                   Area = c("G","G","G","G","G","I","I","I","I","I","A","A","A","A","A"),
                   week = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),amount = c(12,23,34,32,12,12,34,45,65,45,45,34,23,43,43))

splitVar <- paste0(data$country,"-",data$Area)
dfList <- split(data,splitVar)
result <- do.call(rbind,lapply(dfList,function(x){
     model <- lm(amount ~ week, data = x)
     cbind(x,predict(model,newdata = x,interval = "prediction",level = 0.95))
}))
result

...and the result:

        country Area week amount  fit         lwr       upr
UK-A.11      UK    A    1     45 36.6  -6.0463638  79.24636
UK-A.12      UK    A    2     34 37.1  -1.3409128  75.54091
UK-A.13      UK    A    3     23 37.6   0.6671656  74.53283
UK-A.14      UK    A    4     43 38.1  -0.3409128  76.54091
UK-A.15      UK    A    5     43 38.6  -4.0463638  81.24636
US-G.1       US    G    1     12 20.8 -27.6791493  69.27915
US-G.2       US    G    2     23 21.7 -21.9985147  65.39851
US-G.3       US    G    3     34 22.6 -19.3841749  64.58417
US-G.4       US    G    4     32 23.5 -20.1985147  67.19851
US-G.5       US    G    5     12 24.4 -24.0791493  72.87915
US-I.6       US    I    1     12 20.8 -33.8985900  75.49859
US-I.7       US    I    2     34 30.5 -18.8046427  79.80464
US-I.8       US    I    3     45 40.2  -7.1703685  87.57037
US-I.9       US    I    4     65 49.9   0.5953573  99.20464
US-I.10      US    I    5     45 59.6   4.9014100 114.29859
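
Since the question asks for a loop, the same idea can also be written as an explicit for loop. This is only a minimal sketch under the assumption that the data frame from the question is used unchanged; the names combos, pieces, and result_loop are illustrative and do not appear in the original answers.

```r
data <- data.frame(
  country = c("US","US","US","US","US","US","US","US","US","US","UK","UK","UK","UK","UK"),
  Area    = c("G","G","G","G","G","I","I","I","I","I","A","A","A","A","A"),
  week    = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
  amount  = c(12,23,34,32,12,12,34,45,65,45,45,34,23,43,43)
)

# One row per distinct (country, Area) pair, in order of first appearance.
combos <- unique(data[, c("country", "Area")])

# Pre-allocate a list that will hold one augmented data frame per pair.
pieces <- vector("list", nrow(combos))

for (i in seq_len(nrow(combos))) {
  rows  <- data$country == combos$country[i] & data$Area == combos$Area[i]
  sub   <- data[rows, ]
  model <- lm(amount ~ week, data = sub)
  # predict() returns a matrix with columns fit, lwr, upr.
  pieces[[i]] <- cbind(sub, predict(model, newdata = sub,
                                    interval = "prediction", level = 0.95))
}

# Stack the per-group results back into a single data frame.
result_loop <- do.call(rbind, pieces)
```

Collecting the pieces in a list and binding them once at the end avoids growing a data frame inside the loop.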

1
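Great! I really didn't expect that creating splitVar and passing it as f to split would split on two variables! A very intuitive idea. - Anoushiravan R
1
@AnoushiravanR - One thing I like about R is that there are many different ways to solve a problem. I upvoted your answer and Ronak's, but thought it would be interesting to post a base R solution, since one is possible. - Len Greski
1
Thank you very much for the upvote. Base R is often faster and therefore preferable to other solutions. I need to work on relying less on other packages. As you rightly said, R gives a lot of freedom in many areas. By the way, I was happy to first meet you on the Coursera forums (if I remember correctly), and it is a great pleasure to have such a rich, brief exchange with you here as well. - Anoushiravan R
1
@AnoushiravanR - You're welcome. In Getting and Cleaning Data, Jeff Leek talks about why tidy data is useful, but he doesn't discuss when to "untidy" data to solve a problem. Knowing when to break the rules (and by how much) corresponds to the "Ri" stage of Shu Ha Ri, the stages of martial arts mastery. This question is a good example where breaking the "tidy data" directive lets one solve a problem creatively with base R. Also, thank you for your investment in the Stack Overflow community. - Len Greski
That is a very interesting idea. I think reaching that level takes years of experience, and I hope to get there one day. The pleasure is mine. I started learning R when I began contributing on Stack Overflow 3 months ago, and I have to admit it is a great platform for learning: it makes your knowledge stick and helps other people get what they need. I have found my true passion in data analysis and R programming and hope to work in this field one day. Thank you very much for your encouraging comments, they mean a lot to me. - Anoushiravan R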

5

Here is a tidyverse approach that works for every combination of country and Area.

library(tidyverse)

data %>%
  group_by(country, Area) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(amount ~ week, data = .x)), 
         result = map2(model, data, ~data.frame(predict(.x, newdata = .y,
                       interval = "prediction",level = 0.95)))) %>%
  ungroup %>%
  select(-model) %>%
  unnest(c(data, result)) 

#  country Area   week amount   fit     lwr   upr
#   <chr>   <chr> <dbl>  <dbl> <dbl>   <dbl> <dbl>
# 1 US      G         1     12  20.8 -27.7    69.3
# 2 US      G         2     23  21.7 -22.0    65.4
# 3 US      G         3     34  22.6 -19.4    64.6
# 4 US      G         4     32  23.5 -20.2    67.2
# 5 US      G         5     12  24.4 -24.1    72.9
# 6 US      I         1     12  20.8 -33.9    75.5
# 7 US      I         2     34  30.5 -18.8    79.8
# 8 US      I         3     45  40.2  -7.17   87.6
# 9 US      I         4     65  49.9   0.595  99.2
#10 US      I         5     45  59.6   4.90  114. 
#11 UK      A         1     45  36.6  -6.05   79.2
#12 UK      A         2     34  37.1  -1.34   75.5
#13 UK      A         3     23  37.6   0.667  74.5
#14 UK      A         4     43  38.1  -0.341  76.5
#15 UK      A         5     43  38.6  -4.05   81.2
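
A shorter variant of the same per-group fit can be sketched with dplyr's group_modify, which applies a function to each group and row-binds the returned data frames. This assumes a dplyr version that provides group_modify (0.8.1 or later); the name result_gm is illustrative only.

```r
library(dplyr)

data <- data.frame(
  country = c("US","US","US","US","US","US","US","US","US","US","UK","UK","UK","UK","UK"),
  Area    = c("G","G","G","G","G","I","I","I","I","I","A","A","A","A","A"),
  week    = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
  amount  = c(12,23,34,32,12,12,34,45,65,45,45,34,23,43,43)
)

result_gm <- data %>%
  group_by(country, Area) %>%
  # .x is the subset of rows for one group, without the grouping columns.
  group_modify(~ cbind(.x, predict(lm(amount ~ week, data = .x),
                                   newdata = .x,
                                   interval = "prediction", level = 0.95))) %>%
  ungroup()
```

Note that group_modify returns the groups sorted by the grouping keys, so the row order can differ from the input.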

1
One more:

library(tidyverse)

data %>% 
  mutate(CountryArea=paste0(country,Area) %>% factor %>% fct_inorder) %>% 
  split(.$CountryArea) %>% 
  map(~lm(amount~week, data=.)) %>% 
  map(predict, interval = "prediction",level = 0.95) %>% 
  reduce(rbind) %>% 
  cbind(data, .)

   country Area week amount  fit         lwr       upr
1       US    G    1     12 20.8 -27.6791493  69.27915
2       US    G    2     23 21.7 -21.9985147  65.39851
3       US    G    3     34 22.6 -19.3841749  64.58417
4       US    G    4     32 23.5 -20.1985147  67.19851
5       US    G    5     12 24.4 -24.0791493  72.87915
6       US    I    1     12 20.8 -33.8985900  75.49859
7       US    I    2     34 30.5 -18.8046427  79.80464
8       US    I    3     45 40.2  -7.1703685  87.57037
9       US    I    4     65 49.9   0.5953573  99.20464
10      US    I    5     45 59.6   4.9014100 114.29859
11      UK    A    1     45 36.6  -6.0463638  79.24636
12      UK    A    2     34 37.1  -1.3409128  75.54091
13      UK    A    3     23 37.6   0.6671656  74.53283
14      UK    A    4     43 38.1  -0.3409128  76.54091
15      UK    A    5     43 38.6  -4.0463638  81.24636
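
The final cbind(data, .) in the snippet above lines up correctly only when the rows of each group are contiguous and appear in the same order as the split pieces, which is the case for this dataset. For data in arbitrary row order, one possible base R sketch uses unsplit to put each group's predictions back in the original row positions. The shuffled copy and the names groups, fitted_pieces, and result_unsplit are illustrative assumptions, not part of the original answer.

```r
data <- data.frame(
  country = c("US","US","US","US","US","US","US","US","US","US","UK","UK","UK","UK","UK"),
  Area    = c("G","G","G","G","G","I","I","I","I","I","A","A","A","A","A"),
  week    = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
  amount  = c(12,23,34,32,12,12,34,45,65,45,45,34,23,43,43)
)

# Shuffle the rows so that groups are no longer contiguous.
set.seed(1)
shuffled <- data[sample(nrow(data)), ]
rownames(shuffled) <- NULL

# A single grouping factor built from both variables; drop unused combinations.
groups <- interaction(shuffled$country, shuffled$Area, drop = TRUE)

fitted_pieces <- lapply(split(shuffled, groups), function(x) {
  model <- lm(amount ~ week, data = x)
  cbind(x, predict(model, newdata = x, interval = "prediction", level = 0.95))
})

# unsplit() reverses split(): every row returns to its original position.
result_unsplit <- unsplit(fitted_pieces, groups)
```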
