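I am trying to generate values for 7 or more variables across millions of observations, and implementing this with a for loop takes a very long time. Below is an example of what I am trying to do. It runs quickly here because there are only a couple of thousand observations: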
# Load the tidyverse (dplyr, purrr and tibble are used below)
library(tidyverse)
set.seed(50)
# tibble() replaces the deprecated data_frame()
df <- tibble(SlNo = 1:2000,
             Scenario = rep(c(1, 2, 3, 4), 500),
             A = round(rnorm(2000, 11, 6)),
             B = round(rnorm(2000, 15, 4))) %>%
  arrange(Scenario)
# Split the data frame by Scenario and prepend one extra row (Scenario = 0) to each group
df <- df %>%
  split(f = .$Scenario) %>%
  map_dfr(~ bind_rows(tibble(Scenario = 0), .x))
# The newly added rows (Scenario == 0) get fixed starting values: C = 4, E = 6
df <- df %>%
  mutate(C = if_else(Scenario != 0, 0, 4),
         E = if_else(Scenario != 0, 0, 6))
# Each remaining row depends on the previous row's C and E
for (i in 2:nrow(df)) {
  df$C[i] <- if_else(df$Scenario[i] != 0,
                     (1 - 0.5) * df$C[i - 1] + 3 + 2 + df$B[i] + df$E[i - 1],
                     df$C[i])
  df$E[i] <- if_else(df$Scenario[i] != 0, df$C[i] + df$B[i] - 50, df$E[i])
}
df
# A tibble: 2,004 x 6
   Scenario  SlNo     A     B     C     E
      <dbl> <int> <dbl> <dbl> <dbl> <dbl>
 1        0    NA    NA    NA   4     6
 2        1     1    14    19  32     1
 3        1     5     1    13  35    -2
 4        1     9    17    20  40.5  10.5
 5        1    13     8     7  42.8  -0.25
 6        1    17    10    16  42.1   8.12
 7        1    21     9    12  46.2   8.19
 8        1    25    14    18  54.3  22.3
 9        1    29    14    15  69.4  34.4
10        1    33     4    17  91.1  58.1
# ... with 1,994 more rows
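I would like to get the same kind of result quickly when working with much larger data frames. Any help is greatly appreciated. Thanks in advance!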
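Part of the cost of the loop above probably comes from assigning into tibble columns one element at a time (`df$C[i] <- ...`) and from calling `if_else` on single values. A minimal sketch, assuming the same recurrence and the same starting values (4 and 6) as in the example, is to run the loop over plain vectors and write the results back once. The helper name `fill_c_e` is hypothetical, and the speedup on real data would need to be measured.

```r
# Hypothetical helper: same recurrence as the loop above, but on plain vectors.
# Rows with scenario == 0 are the inserted rows and keep C = 4, E = 6.
fill_c_e <- function(scenario, b) {
  n <- length(scenario)
  c_out <- ifelse(scenario != 0, 0, 4)
  e_out <- ifelse(scenario != 0, 0, 6)
  for (i in seq_len(n)[-1]) {          # rows 2..n; empty when n == 1
    if (scenario[i] != 0) {
      c_out[i] <- (1 - 0.5) * c_out[i - 1] + 3 + 2 + b[i] + e_out[i - 1]
      e_out[i] <- c_out[i] + b[i] - 50
    }
  }
  list(C = c_out, E = e_out)
}

# Small check against the first five rows of the output shown above
res <- fill_c_e(scenario = c(0, 1, 1, 1, 1), b = c(NA, 19, 13, 20, 7))
res$C  # 4, 32, 35, 40.5, 42.75
res$E  # 6, 1, -2, 10.5, -0.25
```

With the data frame from the example, this could be applied as `res <- fill_c_e(df$Scenario, df$B)` followed by `df$C <- res$C` and `df$E <- res$E`.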
[...] data.table and replace the for loop with something faster? - NelsonGon
[...] something with cumsum and lag to solve this. - Jon Spring
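One way to read the lag idea from the comments: on ordinary rows E[i-1] = C[i-1] + B[i-1] - 50, so E can be substituted out of the update for C. That gives C[i] = 1.5 * C[i-1] + B[i] + B[i-1] - 45, while the first row after an inserted row (where C = 4 and E = 6) gives C[i] = 0.5 * 4 + 11 + B[i]. E then follows in one vectorised step. The base R sketch below is an illustration under those assumptions, not a tested solution for the full data: `fill_c_e_lag` is a hypothetical name, it assumes the data start with an inserted (Scenario = 0) row, and `Reduce` still iterates row by row, so any speedup should be measured rather than assumed.

```r
# Hypothetical helper: eliminate E, accumulate C, then compute E in one step.
fill_c_e_lag <- function(scenario, b) {
  stopifnot(scenario[1] == 0)            # assumption: first row is an inserted row
  n          <- length(b)
  reset      <- scenario == 0
  prev_reset <- c(TRUE, reset[-n])       # was the previous row an inserted row?
  b_lag      <- c(NA, b[-n])             # base-R stand-in for lag(b)
  # Per-row multiplier and constant so that C[i] = m[i] * C[i-1] + k[i]
  m <- ifelse(prev_reset, 0.5, 1.5)
  k <- ifelse(prev_reset, 11 + b, b + b_lag - 45)
  step <- function(prev, i) if (reset[i]) 4 else m[i] * prev + k[i]
  c_out <- Reduce(step, seq_len(n)[-1], init = 4, accumulate = TRUE)
  e_out <- ifelse(reset, 6, c_out + b - 50)
  list(C = c_out, E = e_out)
}

# Two short blocks; the first matches rows 1-5 of the output shown above
res_lag <- fill_c_e_lag(scenario = c(0, 1, 1, 1, 1, 0, 2, 2),
                        b        = c(NA, 19, 13, 20, 7, NA, 10, 12))
res_lag$C  # 4, 32, 35, 40.5, 42.75, 4, 23, 11.5
res_lag$E  # 6, 1, -2, 10.5, -0.25, 6, -17, -26.5
```

Reducing the state to a single variable does not remove the sequential dependence between rows, but it leaves only one value to carry forward per row.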