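I am fitting some GAMs with mgcv, and in some cases a high k value is needed to capture complex behavior. However, I have noticed that for some datasets, when k is high, fitting the GAM takes a very long time, and in those cases it also seems to fail to converge. I need to fit a large number of datasets and cannot wait three days whenever I hit one that may take a very long time and ultimately fail!

R.utils::withTimeout() seemed like a promising tool for making sure I can move on if I hit one of these GAM time traps, but it behaves inconsistently for me. Below is the output of running the same script three times in a row. Note that in the first of the three runs, the timeout apparently never fired. I vaguely understand that there are some circumstances in which withTimeout is expected to fail... I would like to know how to force mgcv::gam to stop, possibly with a different tool (I have seen references to the package processx).

This looks like a lot, but it is really just the same thing repeated; session info is included at the bottom. As the code below shows, the key point is that withTimeout apparently did nothing the first time (the call took 30 seconds) and then kicked in on each later run (8 seconds seems close enough to 6 seconds). Thanks!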
> tictoc::tic()
> set.seed(2) ## simulate some data...
> dat <- mgcv::gamSim(1
+ , n = 5000
+ , dist = "normal"
+ , scale = 2)
Gu & Wahba 4 term additive model
> b <- mgcv::gam(y ~ s(x0, k = myk) + s(x1, k = myk) + s(x2, k = myk) + s(x3, k = myk)
+ , data = dat)
> summary(b)
Family: gaussian
Link function: identity
Formula:
y ~ s(x0, k = myk) + s(x1, k = myk) + s(x2, k = myk) + s(x3,
k = myk)
Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.84735    0.02869   273.5   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
         edf Ref.df       F p-value
s(x0)  5.057  6.321  87.162  <2e-16 ***
s(x1)  3.739  4.671 813.470  <2e-16 ***
s(x2) 20.716 25.865 354.052  <2e-16 ***
s(x3)  3.298  4.121   1.496   0.197
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.728 Deviance explained = 73%
GCV = 4.1439 Scale est. = 4.1158 n = 5000
> tictoc::toc()
30.418 sec elapsed
> # just did this script twice; this part takes me 30 seconds
>
> tmax <- 6
>
> tictoc::tic()
> gf <- tryCatch({
+ res <- R.utils::withTimeout({
+ set.seed(2) ## simulate some data...
+ dat <- mgcv::gamSim(1
+ , n = 5000
+ , dist = "normal"
+ , scale = 2)
+ b <- mgcv::gam(y ~ s(x0, k = myk) + s(x1, k = myk) + s(x2, k = myk) + s(x3, k = myk)
+ , data = dat)
+ summary(b)
+ }, timeout = tmax)
+ }, TimeoutException = function(ex) {
+ message(paste("Timeout before gam fit complete (should take", tmax, "seconds)"))
+
+ })
Gu & Wahba 4 term additive model
> tictoc::toc()
30.332 sec elapsed
> myk <- 120
>
> tictoc::tic()
> set.seed(2) ## simulate some data...
> dat <- mgcv::gamSim(1
+ , n = 5000
+ , dist = "normal"
+ , scale = 2)
Gu & Wahba 4 term additive model
> b <- mgcv::gam(y ~ s(x0, k = myk) + s(x1, k = myk) + s(x2, k = myk) + s(x3, k = myk)
+ , data = dat)
> summary(b)
Family: gaussian
Link function: identity
Formula:
y ~ s(x0, k = myk) + s(x1, k = myk) + s(x2, k = myk) + s(x3,
k = myk)
Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.84735    0.02869   273.5   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
         edf Ref.df       F p-value
s(x0)  5.057  6.321  87.162  <2e-16 ***
s(x1)  3.739  4.671 813.470  <2e-16 ***
s(x2) 20.716 25.865 354.052  <2e-16 ***
s(x3)  3.298  4.121   1.496   0.197
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.728 Deviance explained = 73%
GCV = 4.1439 Scale est. = 4.1158 n = 5000
> tictoc::toc()
30.415 sec elapsed
> # just did this script twice; this part takes me 30 seconds
>
> # I expect the next part to take 6 seconds based on tmax.
> # depending on n and k, it either does, or it can take WAYYY longer. How can I
> # guarantee 6 seconds?
> tmax <- 6
>
> tictoc::tic()
> gf <- tryCatch({
+ res <- R.utils::withTimeout({
+ set.seed(2) ## simulate some data...
+ dat <- mgcv::gamSim(1
+ , n = 5000
+ , dist = "normal"
+ , scale = 2)
+ b <- mgcv::gam(y ~ s(x0, k = myk) + s(x1, k = myk) + s(x2, k = myk) + s(x3, k = myk)
+ , data = dat)
+ summary(b)
+ }, timeout = tmax)
+ }, TimeoutException = function(ex) {
+ message(paste("Timeout before gam fit complete (should take", tmax, "seconds)"))
+
+ })
Gu & Wahba 4 term additive model
Timeout before gam fit complete (should take 6 seconds)
> tictoc::toc()
8.112 sec elapsed
> # Test timeout with mgcv::gam
> myk <- 120
>
> tictoc::tic()
> set.seed(2) ## simulate some data...
> dat <- mgcv::gamSim(1
+ , n = 5000
+ , dist = "normal"
+ , scale = 2)
Gu & Wahba 4 term additive model
> b <- mgcv::gam(y ~ s(x0, k = myk) + s(x1, k = myk) + s(x2, k = myk) + s(x3, k = myk)
+ , data = dat)
> summary(b)
Family: gaussian
Link function: identity
Formula:
y ~ s(x0, k = myk) + s(x1, k = myk) + s(x2, k = myk) + s(x3,
k = myk)
Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.84735    0.02869   273.5   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
         edf Ref.df       F p-value
s(x0)  5.057  6.321  87.162  <2e-16 ***
s(x1)  3.739  4.671 813.470  <2e-16 ***
s(x2) 20.716 25.865 354.052  <2e-16 ***
s(x3)  3.298  4.121   1.496   0.197
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.728 Deviance explained = 73%
GCV = 4.1439 Scale est. = 4.1158 n = 5000
> tictoc::toc()
30.318 sec elapsed
> # just did this script twice; this part takes me 30 seconds
>
> # I expect the next part to take 6 seconds based on tmax.
> # depending on n and k, it either does, or it can take WAYYY longer. How can I
> # guarantee 6 seconds?
> tmax <- 6
>
> tictoc::tic()
> gf <- tryCatch({
+ res <- R.utils::withTimeout({
+ set.seed(2) ## simulate some data...
+ dat <- mgcv::gamSim(1
+ , n = 5000
+ , dist = "normal"
+ , scale = 2)
+ b <- mgcv::gam(y ~ s(x0, k = myk) + s(x1, k = myk) + s(x2, k = myk) + s(x3, k = myk)
+ , data = dat)
+ summary(b)
+ }, timeout = tmax)
+ }, TimeoutException = function(ex) {
+ message(paste("Timeout before gam fit complete (should take", tmax, "seconds)"))
+
+ })
Gu & Wahba 4 term additive model
Timeout before gam fit complete (should take 6 seconds)
> tictoc::toc()
8.04 sec elapsed
> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 compiler_4.2.3 pillar_1.8.1 R.methodsS3_1.8.2 R.utils_2.12.2 tools_4.2.3 mvnfast_0.2.8 digest_0.6.31 evaluate_0.20 lubridate_1.9.2
[11] lifecycle_1.0.3 tibble_3.2.1 nlme_3.1-162 gtable_0.3.2 lattice_0.20-45 timechange_0.2.0 mgcv_1.8-42 pkgconfig_2.0.3 rlang_1.1.0 Matrix_1.5-3
[21] cli_3.6.0 rstudioapi_0.14 patchwork_1.1.2.9000 yaml_2.3.7 parallel_4.2.3 xfun_0.37 fastmap_1.1.1 gratia_0.8.1 knitr_1.42 stringr_1.5.0
[31] furrr_0.3.1 dplyr_1.1.0 generics_0.1.3 vctrs_0.6.0 globals_0.16.2 tictoc_1.1 grid_4.2.3 tidyselect_1.2.0 glue_1.6.2 listenv_0.9.0
[41] R6_2.5.1 fansi_1.0.4 parallelly_1.34.0 rmarkdown_2.20 tidyr_1.3.0 purrr_1.0.1 ggplot2_3.4.1 magrittr_2.0.3 htmltools_0.5.4 scales_1.2.1
[51] codetools_0.2-19 splines_4.2.3 colorspace_2.1-0 future_1.32.0 utf8_1.2.3 stringi_1.7.12 munsell_0.5.0 R.oo_1.25.0
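
One possible way to get the hard stop asked about above is to run the fit in a separate R process and kill that process when the time limit expires. As far as I understand it, withTimeout() relies on R's own time-limit mechanism, which is only checked at points where R checks for user interrupts, so it may not fire while control sits inside long-running compiled code; that may be what happened in the first run. Killing a child process does not depend on those checks. The sketch below assumes the callr package (built on processx) is installed; it does not appear in the session info above, so that is an assumption. fit_with_timeout() and fit_gam() are hypothetical helper names made up for illustration.

```r
# Minimal sketch, assuming the callr package is installed (an assumption;
# it is not listed in the session info above).

# Run fit_fun(args) in a fresh R process and give up after `timeout` seconds.
# Returns the function's result, or NULL if the child timed out or failed.
fit_with_timeout <- function(fit_fun, args = list(), timeout = 6) {
  tryCatch(
    callr::r(fit_fun, args = args, timeout = timeout),
    error = function(e) {
      # callr kills the child process on timeout and signals an error;
      # any other failure inside the child also ends up here.
      message("Fit stopped or failed: ", conditionMessage(e))
      NULL
    }
  )
}

# The function runs in the child process, so it has to be self-contained:
# it attaches mgcv itself and receives everything it needs as arguments.
fit_gam <- function(myk, n = 5000) {
  library(mgcv)  # attach in the child so s() is found when the formula is parsed
  set.seed(2)    # simulate some data, as in the transcript above
  dat <- gamSim(1, n = n, dist = "normal", scale = 2)
  gam(y ~ s(x0, k = myk) + s(x1, k = myk) + s(x2, k = myk) + s(x3, k = myk),
      data = dat)
}

tmax <- 6
gf <- fit_with_timeout(fit_gam, args = list(myk = 120), timeout = tmax)
if (is.null(gf)) message("No fit within ", tmax, " seconds; moving on")
```

The cost of this design is the overhead of starting a new R process and serializing the fitted model back to the parent, which is probably negligible next to a fit that takes 30 seconds or more. When looping over many datasets, each dataset would be passed in through args rather than simulated inside fit_gam().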