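I'm running an R script that fetches information from the web. The problem is that even though I clean up the session with gc(), memory keeps growing until the session crashes.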
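The script below logs memory with memory.size(), which is Windows-only (and only a stub from R 4.2 on). A portable alternative is the matrix returned by gc(). The helper r_heap_mb() below is a hypothetical name and sums the "used (Mb)" column. This figure covers only R's own heap (Ncells and Vcells). Memory that a C library allocates directly is not included. If this number stays flat while the process keeps growing, the growth is probably happening outside R's heap.

```r
# Hypothetical helper: megabytes currently used by R's own heap, as reported
# by gc(). Column 2 of gc()'s matrix is the "(Mb)" figure for "used";
# the rows are Ncells and Vcells. Calling gc() also triggers a collection.
r_heap_mb <- function() {
  stats <- gc()
  sum(stats[, 2])
}

# Example: log the R-heap size around a block of work.
before <- r_heap_mb()
tmp <- replicate(100, paste(rep("x", 1000), collapse = ""))
rm(tmp)
after <- r_heap_mb()
cat(sprintf("R heap: %.1f MB before, %.1f MB after\n", before, after))
```

Comparing this figure with the process size shown by the operating system helps show whether the leak is in R objects or in native code.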
Here is the script:
library(XML)
library(RJDBC)
library(RCurl)

## log_path, data_file, getTime(), curlHandle and the xPath.* expressions
## are assumed to be globals defined elsewhere in the script (not shown).

procesarPublicaciones <- function(tabla) {
    log_file <<- file(log_path, open = "a")
    drv <<- JDBC("oracle.jdbc.OracleDriver", classPath = "C:/jdbc/jre6/ojdbc6.jar", " ")
    con <<- dbConnect(drv, "server_path", "user", "password")
    query <- paste("SELECT * FROM", tabla, sep = " ")
    bool <- tryCatch(
        {
            ## Get a list of URLs from a DB
            listUrl <- dbGetQuery(con, query)
            nrow(listUrl) != 0            # value of the block: TRUE if any rows came back
        },
        error = function(e) FALSE,
        finally = dbDisconnect(con)       # always disconnect, on success or on error
    )
    if (bool) {
        file.create(data_file)
        apply(listUrl, 1, procesarHtml)
    } else {
        cat("\n", getTime(), "\t[ERROR]\t\t", file = log_file)
    }
    cat("\n", getTime(), "\t[INFO]\t\t FINISH", file = log_file)
    close(log_file)
}

procesarHtml <- function(pUrl) {
    headerGatherer <- basicHeaderGatherer()
    html <- getURI(pUrl, headerfunction = headerGatherer$update, curl = curlHandle)
    heatherValue <- headerGatherer$value()
    if (heatherValue["status"] == "200") {
        doc <- htmlParse(html, asText = TRUE)   # html is page content, not a file name
        tryCatch(
            {
                ## Here I get all the info that I need from the web and write it to a file.
                ## This is a simplification.
                info1 <- xpathSApply(doc, xPath.info1, xmlValue)
                info2 <- xpathSApply(doc, xPath.info2, xmlValue)
                data <- data.frame(col1 = info1, col2 = info2)
                write.table(data, file = data_file, sep = ";",
                            row.names = FALSE, col.names = FALSE, append = TRUE)
            },
            error = function(e) {
                ## LOG ERROR
            }
        )
        rm(info1, info2, data, doc)
    } else {
        ## LOG INFO
    }
    rm(headerGatherer, html, heatherValue)
    cat("\n", getTime(), "\t[INFO]\t\t memory used: ", memory.size(), " MB", file = log_file)
    gc()
    cat("\n", getTime(), "\t[INFO]\t\t memory used after gc(): ", memory.size(), " MB", file = log_file)
}
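In R, a tryCatch() call returns the value of the last expression in its block. In a block that ends with a disconnect call, the row-count test is therefore evaluated and then discarded. The minimal base-R sketch below illustrates this. fetch_rows() and disconnect() are hypothetical stand-ins for dbGetQuery() and dbDisconnect(). The sketch makes the boolean the block's last expression and moves cleanup into finally, which runs whether or not an error occurs.

```r
# Hypothetical stand-ins for dbGetQuery() and dbDisconnect().
fetch_rows <- function(fail = FALSE) {
  if (fail) stop("query failed")
  data.frame(url = c("http://example.com/a", "http://example.com/b"))
}
disconnect_calls <- 0
disconnect <- function() {
  disconnect_calls <<- disconnect_calls + 1
  TRUE
}

# Buggy shape: the last expression is disconnect(), so its TRUE is returned
# even though the (simulated) result is empty.
buggy <- tryCatch({
  rows <- fetch_rows()[0, , drop = FALSE]   # simulate an empty result set
  nrow(rows) != 0                           # computed, then discarded
  disconnect()
}, error = function(e) FALSE)

# Fixed shape: the test is the block's value; cleanup runs in `finally`
# both on success and after the error handler.
has_rows <- function(fail = FALSE) {
  tryCatch({
    rows <- fetch_rows(fail)
    nrow(rows) != 0
  }, error = function(e) FALSE,
     finally = disconnect())
}

print(buggy)           # TRUE, although no rows were returned
print(has_rows())      # TRUE
print(has_rows(TRUE))  # FALSE, and disconnect() still ran
```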
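Even though I delete all the internal variables with rm() and call gc(), memory still keeps growing. It looks as if all the HTML I fetch from the web is being kept in memory. Here is my session info: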
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 3
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RCurl_1.95-4.6 bitops_1.0-6 RJDBC_0.2-5 rJava_0.9-6 DBI_0.3.1
[6] XML_3.98-1.1
loaded via a namespace (and not attached):
[1] tools_3.2.0
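The impression that the fetched HTML stays in memory is consistent with how the XML package parses. htmlParse() returns an internal libxml2 document, and libxml2 allocates its memory outside R's heap. The package provides free() to release such a document explicitly. The sketch below assumes the XML package is installed. It confines parsing to a single call, extracts plain character data with xpathSApply(), and frees the document on exit. extract_values() and the XPath expression are hypothetical.

```r
# Sketch assuming the XML package is installed.
# extract_values() is a hypothetical helper, not part of XML.
extract_values <- function(html, xpath) {
  doc <- XML::htmlParse(html, asText = TRUE)   # libxml2 document, outside R's heap
  on.exit(XML::free(doc), add = TRUE)          # release it even if the XPath call fails
  XML::xpathSApply(doc, xpath, XML::xmlValue)  # plain R character data, no node references
}

if (requireNamespace("XML", quietly = TRUE)) {
  page <- "<html><body><p class='info'>a</p><p class='info'>b</p></body></html>"
  print(extract_values(page, "//p[@class='info']"))
}
```

The function returns only character vectors, so no node objects outlive the call. Those node objects could otherwise keep the document referenced after it is freed.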
-------------------- EDIT, June 8, 2015 --------------------
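I'm still having this problem. I found the same issue in another post, where it was apparently solved. I added free() before rm(), but it didn't fix it: memory still keeps growing, only more slowly than before. Any other suggestions? - Santiago P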