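I am trying to work with large numbers, above 2^32. Although I am also using data.table and fread, I do not think the problem is related to them: I can switch the symptom on and off without changing data.table or the use of fread. The symptom is that I get a reported mean of 4.1e-302 when I expect positive exponents, 1e+3 to 1e+17.
The problem consistently appears when I use the bit64 package and its integer64 functions. Things work for me with "regular-sized data and R", but I am not expressing something correctly with this package. See my code and data below.
I am on a MacBook Pro, 16GB, i7 (updated).
I have restarted my R session and cleared the workspace, but the problem persists.
Kindly advise; I appreciate the help. I think it has to do with using the bit64 library.
Links I looked at include the bit64 documentation. There is a question with similar symptoms caused by an fread() memory leak, but I think I have ruled that out.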
Here is my input data
var1,var2,var3,var4,var5,var6,expected_row_mean,expected_row_stddev
1000 ,993 ,987 ,1005 ,986 ,1003 ,996 ,8
100000 ,101040 ,97901 ,100318 ,96914 ,97451 ,98937 ,1722
10000000 ,9972997 ,9602778 ,9160554 ,8843583 ,8688500 ,9378069 ,565637
1000000000 ,1013849241 ,973896894 ,990440721 ,1030267777 ,1032689982 ,1006857436 ,23096234
100000000000 ,103171209097 ,103660949260 ,102360301140 ,103662297222 ,106399064194 ,103208970152 ,2078732545
10000000000000 ,9557954451905 ,9241065464713 ,9357562691674 ,9376495364909 ,9014072235909 ,9424525034852 ,334034298683
1000000000000000 ,985333546044881 ,994067361457872 ,1034392968759970 ,1057553099903410 ,1018695335152490 ,1015007051886440 ,27363415718203
100000000000000000 ,98733768902499600 ,103316759127969000 ,108062824583319000 ,111332326225036000 ,108671041505404000 ,105019453390705000 ,5100048567944390
My code, which runs against this sample data
# file: problem_bit64.R
# OBJECTIVE: Using larger numbers, I want to calculate a row mean and row standard deviation
# ERROR: I don't know what I am doing wrong to get such errors, seems bit64 related
# PRIORITY: BLOCKED (do this in Python instead?)
# reported Sat 9/24/2016 by Greg
# sample data:
# each row is 100 times larger on average, for 8 rows, starting with 1,000
# for the vars within a row, there is 10% uniform random variation. B2 = ROUND(A2+A2*0.1*(RAND()-0.5),0)
# Install development version of data.table --> for fwrite()
install.packages("data.table", repos = "https://Rdatatable.github.io/data.table", type = "source")
require(data.table)
require(bit64)
.Machine$integer.max # 2147483647 Is this an issue ?
.Machine$double.xmax # 1.797693e+308 I assume not
# -------------------------------------------------------------------
# ---- read in and basic info that works
csv_in <- "problem_bit64.csv"
dt <- fread( csv_in )
dim(dt) # 8 8 (8 rows, 8 columns in the sample data)
lapply(dt, class) # "integer64" for all 8
names(dt) # "var1" "var2" "var3" "var4" "var5" "var6" "expected_row_mean" "expected_row_stddev"
dtin <- dt[, 1:6, with=FALSE] # just save the 6 input columns
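On the two .Machine checks above, here is a quick base-R sketch of the numeric limits that matter for this data. It uses only base R, and the variable names are illustrative rather than part of the original script.

```r
# R's native integer type is 32-bit, so anything above 2^31 - 1 needs another type.
int_max <- .Machine$integer.max
int_max == 2^31 - 1                      # TRUE

# Doubles carry a 53-bit significand, so whole numbers are exact only up to 2^53.
exact_limit <- 2^53                      # about 9.007e15
(exact_limit + 1) == exact_limit         # TRUE: 2^53 + 1 cannot be represented

# The largest sample values (about 1e17) are above that limit but far below
# .Machine$double.xmax, so a double stores them with a small rounding error.
largest_sample <- 111332326225036000
largest_sample > exact_limit             # TRUE
```

So the integer limit is why fread picks integer64 here, while the double range itself is not the constraint.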
...and now the problems start
# -------------------------------------------------------------------
# ---- CALCULATION PROBLEMS START HERE
# ---- for each row, I want to calculate the mean and standard deviation
a <- apply(dtin, 1, mean.integer64); a # get 8 values like 4.9e-321
b <- apply(dtin, 2, mean.integer64); b # get 6 values like 8.0e-308
# ---- try secondary variations that do not work
c <- apply(dtin, 1, mean); c # get 8 values like 4.9e-321
c <- apply(dtin, 1, mean.integer64); c # same result
c <- apply(dtin, 1, function(x) mean(x)); c # same
c <- apply(dtin, 1, function(x) sum(x)/length(x)); c # same results as mean(x)
##### I don't see any sd.integer64 # FEATURE REQUEST, Z-TRANSFORM IS COMMON
c <- apply(dtin, 1, function(x) sd(x)); c # unrealistic values - see expected
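One plausible reading of the tiny results, offered as an assumption rather than a confirmed diagnosis: bit64 stores integer64 values as 64-bit integers inside an ordinary double vector, and apply() first coerces its input to a matrix, which may drop the integer64 class so that the raw bits are read as doubles. The base-R sketch below reinterprets the 8 bytes of the integer 1000 as a double. The byte vector is built by hand for illustration.

```r
# Little-endian bytes of the 64-bit integer 1000 (0x03E8), built by hand.
bytes_1000 <- as.raw(c(0xE8, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00))

# Read the same 8 bytes as an IEEE 754 double instead of an integer.
as_double <- readBin(bytes_1000, what = "double", n = 1, size = 8, endian = "little")

# The result is a subnormal number: 1000 times the smallest positive double.
smallest_double <- 2^-1074
print(as_double)                          # on the order of 4.9e-321
print(as_double / smallest_double)        # 1000
```

That magnitude is in line with the "values like 4.9e-321" noted in the comments above, which is what a row mean near 996 would look like if its bits were read this way.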
Regular-sized R on regular data, still using fread() to read into a data.table() - WORKS
# -------------------------------------------------------------------
# ---- delete big numbers, and try regular stuff - WHICH WORKS
dtin2 <- dtin[ 1:3, ] # just up to about 10 million (SAME DATA, SAME FREAD, SAME DATA.TABLE)
dtin2[ , var1 := as.integer(var1) ] # I know there are fancier ways to do this
dtin2[ , var2 := as.integer(var2) ] # but I want things to work before getting fancy.
dtin2[ , var3 := as.integer(var3) ]
dtin2[ , var4 := as.integer(var4) ]
dtin2[ , var5 := as.integer(var5) ]
dtin2[ , var6 := as.integer(var6) ]
lapply( dtin2, class ) # validation
c <- apply(dtin2, 1, mean); c # get 3 row values AS EXPECTED (matching expected columns)
c <- apply(dtin2, 1, function(x) mean(x)); c # CORRECT
c <- apply(dtin2, 1, function(x) sum(x)/length(x)); c # same results as mean(x)
c <- apply(dtin2, 1, sd); c # get 3 row values AS EXPECTED (matching expected columns)
c <- apply(dtin2, 1, function(x) sd(x)); c # CORRECT
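Have you tried Brobdingnag? Such packages may not work well with data.table, but you are not really using data.table's special features either. You could even call fread with data.table = FALSE to get a data frame. - dracodoc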
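Following the data-frame idea in the comment above, here is a minimal base-R sketch that keeps everything as doubles and avoids integer64 entirely. It assumes the small rounding error of doubles above 2^53 is acceptable for a mean and standard deviation. The inline csv_text (rows 1 and 5 of the sample, with the padding spaces removed) and the variable names are illustrative. With data.table, passing integer64 = "double" to fread should give the same column type.

```r
# Two rows of the sample data, embedded as text so the sketch is self-contained.
csv_text <- "var1,var2,var3,var4,var5,var6
1000,993,987,1005,986,1003
100000000000,103171209097,103660949260,102360301140,103662297222,106399064194"

# read.csv falls back to double for values that do not fit a 32-bit integer.
df <- read.csv(text = csv_text, strip.white = TRUE)
df[] <- lapply(df, as.numeric)           # make every column double explicitly

# Row-wise statistics on plain doubles.
row_mean <- rowMeans(df)
row_sd   <- apply(df, 1, sd)

print(round(row_mean))                   # compare with expected_row_mean
print(round(row_sd))                     # compare with expected_row_stddev
```

Converting with as.integer(), as in the working variant above, only covers values up to .Machine$integer.max, whereas as.numeric() covers the full range of the sample at the cost of exactness beyond 2^53.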