Adding a new column to a MonetDBLite table


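I'm trying to use R + MonetDB as a big-data analytics stack, and I'm having trouble creating a new column and populating it with data from my analysis. Here is an example: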

library(MonetDBLite)
library(DBI)

data(mtcars)
db <- dbConnect(MonetDB.R::MonetDB(), embedded="./test.db")

# load mtcars into the database
dbWriteTable(conn=db, value = mtcars, name = "mtcars", overwrite=TRUE)

## Add a new column
dbSendQuery(db, "ALTER TABLE mtcars ADD v1 DOUBLE;")


## insert data into the new column
v1 <- mtcars["mpg"] * pi
dbSendQuery(db, "INSERT INTO mtcars (v1) VALUES (?)", bind.data=v1)

Error message:

Error in .local(conn, statement, ...) : 
  Unable to execute statement 'INSERT INTO mtcars (v1) VALUES ('c(65.9734457253857, 65.9734457253857, 71.6283125018473, 67.23008278...'.
Server says 'ERROR: 49.6371639267187, 61.8893752757189, 47.1238898038469, 67.2300827868216)' to type dbl failed.
' [#conversion of string 'c(65.9734457253857, 65.9734457253857, 71.6283125018473, 67.2300827868216, 58.7477826221291, 56.8628270299753, 44.924774946334, 76.6548607475909, 71.6283125018473, 60.318578948924, 55.9203492338983, 51.5221195188726, 54.3495529071034, 47.7522083345649, 32.6725635973338, 32.6725635973338, 46.18141200777, 101.787601976309, 95.5044166691297, 106.499990956694, 67.5442420521806, 48.6946861306418, 47.7522083345649, 41.7831822927443, 60.318578948924, 85.7654794430014, 81.6814089933346, 95.5044166691297, 
].
In addition: Warning message:
In if (is.na(value)) statement <- sub("?", "NULL", statement, fixed = TRUE) else if (valueClass %in%  :
  the condition has length > 1 and only the first element will be used

Judging from this error message, maybe bind.data doesn't work with MonetDBLite?
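The 'c(65.9734457253857, ...' literal in the error suggests the whole column was turned into one string before it was substituted for the `?`. Note that `v1 <- mtcars["mpg"] * pi` is a one-column data frame, not a numeric vector, and `as.character()` on a data frame deparses each column into a single string. A minimal base R sketch of that conversion, assuming the driver stringifies bound values in a similar way (its internals are not shown here):

```r
data(mtcars)

# mtcars["mpg"] is a one-column data frame, so v1 is a data frame as well
v1 <- mtcars["mpg"] * pi

# as.character() on a data frame (a list) deparses each column into ONE string
v1_as_text <- as.character(v1)

length(v1_as_text)        # 1: a single string, not 32 numbers
substr(v1_as_text, 1, 2)  # "c(": the deparsed vector, as seen in the error
```

Extracting the column with `v1$mpg` gives a plain numeric vector of length 32 instead. Even then, `INSERT` would append new rows rather than fill the existing ones, which is the point the comment and answers below make.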

Question:

How can I add a column to a MonetDBLite table and populate it with data from my R session?


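The simplest way to get the new column is to compute it in SQL, e.g. UPDATE mtcars SET v1 = mpg*3.14; or a similar statement. Also, INSERT doesn't modify the existing rows; it appends the values as new rows at the end of the table. - Hannes Mühleisen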
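To illustrate the difference this comment describes, here is a small sketch that uses RSQLite with an in-memory database as a stand-in for MonetDBLite (an assumption: the SQL involved is standard and should behave the same way in MonetDB). The INSERT grows the table by one row with NULLs in the other columns, while an UPDATE computed in SQL fills v1 for every existing row:

```r
library(DBI)
library(RSQLite)

data(mtcars)
db <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(db, "mtcars", mtcars, overwrite = TRUE)
dbExecute(db, "ALTER TABLE mtcars ADD v1 DOUBLE")

# INSERT appends a new row; the 32 existing rows are left untouched
dbExecute(db, "INSERT INTO mtcars (v1) VALUES (1.0)")
n_after_insert <- dbGetQuery(db, "SELECT COUNT(*) AS n FROM mtcars")$n
dbExecute(db, "DELETE FROM mtcars WHERE mpg IS NULL")  # remove the stray row again

# UPDATE without a WHERE clause fills v1 for every existing row, computed in SQL
dbExecute(db, "UPDATE mtcars SET v1 = mpg * 3.141592653589793")
res <- dbGetQuery(db, "SELECT mpg, v1 FROM mtcars")

dbDisconnect(db)
```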
2 Answers

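First, the INSERT command in your last statement is incorrect. You need to use an UPDATE statement.
That said, here is a solution that lets you populate your MonetDBLite table directly from R: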
library(MonetDBLite)
library(DBI)

data(mtcars)
db <- dbConnect(MonetDB(), embedded="./test.db")

# I added a rownbr to the dataset so it will be easier later
mtcars$rownbr <- 1:nrow(mtcars)

# load mtcars into the database
dbWriteTable(conn=db, value = mtcars, name = "mtcars", overwrite=TRUE)

## Add a new column
dbSendQuery(db, "ALTER TABLE mtcars ADD v1 DOUBLE;")


## insert data into the new column
v1 <- mtcars["mpg"] * pi

for (i in 1:nrow(mtcars)){
   # the spaces around WHERE matter; without them the value and the keyword run together
   myquery <- paste0("UPDATE mtcars SET v1 = ", v1$mpg[i], " WHERE rownbr = ", i, ";")
   dbSendQuery(db, myquery)
}
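Because sprintf() is vectorized over its arguments, the same statements can be built without an explicit loop. A minimal sketch with a hypothetical helper build_updates(); it also uses %.17g, since paste0() formats doubles with only 15 significant digits and can therefore drop precision:

```r
# Hypothetical helper: one UPDATE statement per row, built in a single vectorized call
build_updates <- function(values, ids) {
  # %.17g prints enough digits for a double to round-trip exactly
  sprintf("UPDATE mtcars SET v1 = %.17g WHERE rownbr = %d;", values, ids)
}

data(mtcars)
mtcars$rownbr <- seq_len(nrow(mtcars))
stmts <- build_updates(mtcars$mpg * pi, mtcars$rownbr)
length(stmts)  # 32
```

Building the strings is vectorized, but the database still receives one statement per row: each element of stmts would be sent with dbSendQuery(db, ...) just as in the loop.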

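Is it possible to add the whole column at once? Adding the values one by one must be slow. - F. Privé
Intuitively, I'd say it must be possible, though I didn't immediately figure out how. - Arno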
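One way to fill the whole column with a single statement is to upload the values computed in R into a staging table and let the database match them by key. A sketch using RSQLite as a stand-in; the staging table name v1_new is hypothetical, and whether MonetDBLite accepts the same correlated subquery in an UPDATE is an assumption to verify:

```r
library(DBI)
library(RSQLite)

data(mtcars)
mtcars$rownbr <- seq_len(nrow(mtcars))

db <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(db, "mtcars", mtcars, overwrite = TRUE)
dbExecute(db, "ALTER TABLE mtcars ADD v1 DOUBLE")

# Upload the values computed in R, keyed by rownbr (hypothetical staging table)
staging <- data.frame(rownbr = mtcars$rownbr, v1 = mtcars$mpg * pi)
dbWriteTable(db, "v1_new", staging, overwrite = TRUE)

# A single UPDATE fills every row from the staging table
dbExecute(db, "UPDATE mtcars
               SET v1 = (SELECT s.v1 FROM v1_new s WHERE s.rownbr = mtcars.rownbr)")
dbExecute(db, "DROP TABLE v1_new")

res <- dbGetQuery(db, "SELECT mpg, v1 FROM mtcars")
dbDisconnect(db)
```

If the whole table already lives in R, adding the column there and rewriting the table with dbWriteTable(..., overwrite = TRUE) is simpler still.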


There appears to be a problem with parameterized SQL queries via dbBind in MonetDBLite (see https://github.com/hannesmuehleisen/MonetDBLite-R/issues/16). The following code works with SQLite:

library(RSQLite)
data(mtcars)
db <- dbConnect(SQLite(), ":memory:")
# load mtcars into the database
dbWriteTable(conn=db, value = mtcars, name = "mtcars", overwrite = TRUE, row.names=TRUE)

## Add a new column
dbExecute(db, "ALTER TABLE mtcars ADD v1 DOUBLE;")  # dbExecute leaves no open result set

## do computation with R
mtcars$v1 <- mtcars$mpg * pi
mtcars$row_names <- rownames(mtcars)

update_query <- dbSendQuery(db, 'update mtcars set "v1"=$v1 WHERE row_names=$row_names')

dbBind(update_query, mtcars[, c("v1", "row_names")])  # send the updated data
dbClearResult(update_query)  # release the prepared statement
dbReadTable(db, "mtcars")
dbDisconnect(db)

However, with MonetDBLite the dbBind step produces an error (and a warning):
> dbBind(update_query, mtcars[, c("v1", "row_names")])  # send the updated data
Error in vapply(params, function(x) { : values must be length 1,
 but FUN(X[[1]]) result is length 32
In addition: Warning message:
In if (is.na(x)) "NULL" else if (is.numeric(x) || is.logical(x)) { :
  the condition has length > 1 and only the first element will be used

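One workaround I came up with is to use glue_data_sql from the glue package to assemble the queries "manually" (without explicitly looping over the rows):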
library(MonetDBLite)
library(DBI)

data(mtcars)
db <- dbConnect(MonetDB(), embedded="./test.db")
dbWriteTable(conn=db, value = mtcars, name = "mtcars", overwrite = TRUE, row.names=TRUE)

dbSendQuery(db, "ALTER TABLE mtcars ADD v1 DOUBLE;")

library(glue)
mtcars$row_names <- rownames(mtcars)
mtcars$v1 <- mtcars$mpg * pi
update_query <- glue_data_sql(mtcars, "update mtcars set v1 = {v1} where row_names = {row_names};", .con=db)
lapply(update_query, dbSendQuery, conn=db)

# verify
dbReadTable(db, "mtcars")
dbDisconnect(db)

Another solution is to perform the insert in a single query (closer to @Zelazny7's original attempt):

library(MonetDBLite)
library(DBI)

data(mtcars)
db <- dbConnect(MonetDB(), embedded="./test.db")

dbSendQuery(db, "CREATE TABLE mtcars (
                 row_names VARCHAR(32),
                 v1 DOUBLE);")

library(glue)
mtcars$row_names <- rownames(mtcars)
mtcars$v1 <- mtcars$mpg * pi
insert_values <- glue_data(mtcars, "('{row_names}', {v1})")
insert_values <- glue_collapse(insert_values, sep=", ", last="")
insert_query <- glue("INSERT INTO mtcars (row_names, v1) VALUES {insert_values}")
dbSendQuery(db, insert_query)

dbReadTable(db, "mtcars")
dbDisconnect(db)
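The same multi-row INSERT can also be assembled in base R without glue. A sketch with a hypothetical helper build_insert(); unlike the '{row_names}' template above, it doubles any single quote inside a name, which is the standard SQL escape:

```r
# Hypothetical helper: one INSERT statement with one VALUES tuple per data frame row
build_insert <- function(df, table = "mtcars") {
  # quote each name as a SQL string literal, escaping ' as ''
  names_sql <- paste0("'", gsub("'", "''", df$row_names, fixed = TRUE), "'")
  rows <- sprintf("(%s, %.17g)", names_sql, df$v1)
  sprintf("INSERT INTO %s (row_names, v1) VALUES %s", table, paste(rows, collapse = ", "))
}

data(mtcars)
df <- data.frame(row_names = rownames(mtcars), v1 = mtcars$mpg * pi)
insert_query <- build_insert(df)
```

The resulting single string would be sent with dbSendQuery(db, insert_query), as above.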

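Try DBI's sqlInterpolate. - Parfait
Nice use of glue. However, this is basically the same as @Arno's answer: you still need to loop over all the rows. - F. Privé
That depends on how you interpret "loop" in R. *apply calls are vectorized and are generally more efficient than loops in R. Even the dbSendQuery + dbBind approach doesn't do it in a single query but in several, which are executed only after dbBind has processed the parameterized update clause. - LmW.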
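Following the sqlInterpolate suggestion: it quotes one value per placeholder, so it has to be applied row by row. A sketch that uses DBI::ANSI() as a stand-in connection for quoting (with a live connection you would pass db instead, so the driver's own quoting rules apply); the placeholder names ?val and ?rn are arbitrary:

```r
library(DBI)

data(mtcars)
mtcars$row_names <- rownames(mtcars)
mtcars$v1 <- mtcars$mpg * pi

sql <- "UPDATE mtcars SET v1 = ?val WHERE row_names = ?rn"

# sqlInterpolate() expects length-1 values, so it is called once per row;
# strings are quoted as SQL literals, numbers are inserted unquoted
update_queries <- vapply(seq_len(nrow(mtcars)), function(i) {
  as.character(sqlInterpolate(ANSI(), sql, val = mtcars$v1[i], rn = mtcars$row_names[i]))
}, character(1))
```

Each element could then be sent with dbSendQuery(db, ...); like the glue version, this still issues one statement per row.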

Content provided by Stack Overflow.