I feel a bit silly because I can't see what the problem is...
According to the NEWS file, fread has recognized ISO 8601 timestamps such as 2020-07-24T10:11:12.134Z since version 1.13.0. In practice, though, the time of day is dropped:
fread(text=c("now","2020-07-24T10:11:12.134Z"), colClasses="POSIXct", sep=",")
# now
# <POSc>
# 1: 2020-07-24
However, if I change the T to a space, the correct timestamp is returned:
fread(text=c("now","2020-07-24 10:11:12.134Z"), colClasses="POSIXct", sep=",")
# now
# <POSc>
# 1: 2020-07-24 10:11:12
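The T-to-space substitution above can also be applied to the raw text before it is parsed. The following is a minimal base R sketch; normalize_iso_separator is a hypothetical helper name (not part of data.table), and the pattern assumes every timestamp starts with a yyyy-mm-ddThh: prefix.

```r
# Hypothetical helper (not part of data.table): replace the ISO 8601 "T"
# separator with a space, but only when it sits between a date and a time.
normalize_iso_separator <- function(x) {
  sub("^([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:)", "\\1 \\2", x)
}

stamps <- c("2020-07-24T10:11:12.134Z", "Text without a timestamp")
normalized <- normalize_iso_separator(stamps)
print(normalized)
```

The normalized strings could then be handed to fread through text=, at the cost of an extra pass over the data, so this is probably only attractive when the input is not too large.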
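The problem persists if I use tz = "" or tz = "UTC". (Unsurprisingly, if I omit colClasses=, it does not even attempt the conversion.) What am I doing wrong that prevents the internal, faster POSIXct converter from working? I know how to do the conversion after reading if necessary, but post-processing with as.POSIXct is very slow for large files. (Windows-11, R-4.1.2, data.table-1.14.2)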
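For reference, the post-read fallback mentioned above might look like the sketch below. It assumes every value is a UTC timestamp with a literal T separator and a trailing Z; parse_iso8601_utc is a hypothetical helper name, and the data.table part only runs when the package is installed.

```r
# Hypothetical helper: convert ISO 8601 UTC strings ("...T...Z") to POSIXct
# after reading, using an explicit format instead of format guessing.
parse_iso8601_utc <- function(x) {
  # %OS accepts fractional seconds on input; the literal "T" and "Z" must match.
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
}

stamps <- c("2020-07-24T10:11:12.134Z")
parsed <- parse_iso8601_utc(stamps)
print(format(parsed, "%Y-%m-%d %H:%M:%OS3"))

# Optional: the same helper applied to a column that fread read as character.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(text = c("now", stamps), colClasses = "character", sep = ",")
  data.table::set(dt, j = "now", value = parse_iso8601_utc(dt[["now"]]))
  print(dt)
}
```

Passing an explicit format avoids the format guessing that as.POSIXct otherwise performs, which tends to help on long vectors, but it is still a separate pass after the read rather than the internal converter asked about here.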
In case it is of interest, verbose = TRUE does not seem to offer much insight:
fread(text=c("now","2020-07-24T10:11:12.134Z"), colClasses="POSIXct", sep=",", verbose=TRUE)
# OpenMP version (_OPENMP) 201511
# omp_get_num_procs() 16
# R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
# R_DATATABLE_NUM_THREADS unset
# R_DATATABLE_THROTTLE unset (default 1024)
# omp_get_thread_limit() 2147483647
# omp_get_max_threads() 16
# OMP_THREAD_LIMIT unset
# OMP_NUM_THREADS unset
# RestoreAfterFork true
# data.table is using 8 threads with throttle==1024. See ?setDTthreads.
# Input contains no \n. Taking this to be a filename to open
# [01] Check arguments
# Using 8 threads (omp_get_max_threads()=16, nth=8)
# NAstrings = [<<NA>>]
# None of the NAstrings look like numbers.
# show progress = 1
# 0/1 column will be read as integer
# [02] Opening the file
# Opening file C:\Users\r2\AppData\Local\Temp\Rtmpao7n9S\file49384a01388a
# File opened, size = 31 bytes.
# Memory mapped ok
# [03] Detect and skip BOM
# [04] Arrange mmap to be \0 terminated
# \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
# [05] Skipping initial rows if needed
# Positioned on line 1 starting: <<now>>
# [06] Detect separator, quoting rule, and ncolumns
# Using supplied sep ','
# No sep and quote rule found a block of 2x2 or greater. Single column input.
# Detected 1 columns on line 1. This line is either column names or first data row. Line starts as: <<now>>
# Quote rule picked = 0
# fill=false and the most number of columns found is 1
# [07] Detect column types, good nrow estimate and whether first row is column names
# Number of sampling jump points = 1 because (29 bytes from row 1 to eof) / (2 * 29 jump0size) == 0
# Type codes (jump 000) : C Quote rule 0
# 'header' determined to be true because all columns are type string and a better guess is not possible
# All rows were sampled since file is small so we know nrow=1 exactly
# [08] Assign column names
# [09] Apply user overrides on column types
# After 0 type and 0 drop user overrides : C
# [10] Allocate memory for the datatable
# Allocating 1 column slots (1 - 0 dropped) with 1 rows
# [11] Read the data
# jumps=[0..1), chunk_size=1048576, total_size=24
# Read 1 rows x 1 columns from 31 bytes file in 00:00.000 wall clock time
# [12] Finalizing the datatable
# Type counts:
# 1 : string 'C'
# =============================
# 0.000s ( 0%) Memory map 0.000GB file
# 0.000s ( 0%) sep='' ncol=1 and header detection
# 0.000s ( 0%) Column type detection using 1 sample rows
# 0.000s ( 0%) Allocation of 1 rows x 1 cols (0.000GB) of which 1 (100%) rows used
# 0.000s ( 0%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 1 rows) using 1 threads
# + 0.000s ( 0%) Parse to row-major thread buffers (grown 0 times)
# + 0.000s ( 0%) Transpose
# + 0.000s ( 0%) Waiting
# 0.000s ( 0%) Rereading 0 columns due to out-of-sample type exceptions
# 0.000s Total
# now
# <POSc>
# 1: 2020-07-24
fread(text=c("now","2020-07-24 10:11:12.134Z"), colClasses="POSIXct", sep=",", verbose=TRUE)
# OpenMP version (_OPENMP) 201511
# omp_get_num_procs() 16
# R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
# R_DATATABLE_NUM_THREADS unset
# R_DATATABLE_THROTTLE unset (default 1024)
# omp_get_thread_limit() 2147483647
# omp_get_max_threads() 16
# OMP_THREAD_LIMIT unset
# OMP_NUM_THREADS unset
# RestoreAfterFork true
# data.table is using 8 threads with throttle==1024. See ?setDTthreads.
# Input contains no \n. Taking this to be a filename to open
# [01] Check arguments
# Using 8 threads (omp_get_max_threads()=16, nth=8)
# NAstrings = [<<NA>>]
# None of the NAstrings look like numbers.
# show progress = 1
# 0/1 column will be read as integer
# [02] Opening the file
# Opening file C:\Users\r2\AppData\Local\Temp\Rtmpao7n9S\file493817cf4117
# File opened, size = 31 bytes.
# Memory mapped ok
# [03] Detect and skip BOM
# [04] Arrange mmap to be \0 terminated
# \n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
# [05] Skipping initial rows if needed
# Positioned on line 1 starting: <<now>>
# [06] Detect separator, quoting rule, and ncolumns
# Using supplied sep ','
# No sep and quote rule found a block of 2x2 or greater. Single column input.
# Detected 1 columns on line 1. This line is either column names or first data row. Line starts as: <<now>>
# Quote rule picked = 0
# fill=false and the most number of columns found is 1
# [07] Detect column types, good nrow estimate and whether first row is column names
# Number of sampling jump points = 1 because (29 bytes from row 1 to eof) / (2 * 29 jump0size) == 0
# Type codes (jump 000) : C Quote rule 0
# 'header' determined to be true because all columns are type string and a better guess is not possible
# All rows were sampled since file is small so we know nrow=1 exactly
# [08] Assign column names
# [09] Apply user overrides on column types
# After 0 type and 0 drop user overrides : C
# [10] Allocate memory for the datatable
# Allocating 1 column slots (1 - 0 dropped) with 1 rows
# [11] Read the data
# jumps=[0..1), chunk_size=1048576, total_size=24
# Read 1 rows x 1 columns from 31 bytes file in 00:00.000 wall clock time
# [12] Finalizing the datatable
# Type counts:
# 1 : string 'C'
# =============================
# 0.000s ( 0%) Memory map 0.000GB file
# 0.000s ( 0%) sep='' ncol=1 and header detection
# 0.000s ( 0%) Column type detection using 1 sample rows
# 0.000s ( 0%) Allocation of 1 rows x 1 cols (0.000GB) of which 1 (100%) rows used
# 0.000s ( 0%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 1 rows) using 1 threads
# + 0.000s ( 0%) Parse to row-major thread buffers (grown 0 times)
# + 0.000s ( 0%) Transpose
# + 0.000s ( 0%) Waiting
# 0.000s ( 0%) Rereading 0 columns due to out-of-sample type exceptions
# 0.000s Total
# now
# <POSc>
# 1: 2020-07-24 10:11:12.134
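Both logs report the column as type string ('C') and show "After 0 type and 0 drop user overrides", which suggests the POSIXct conversion happens after parsing rather than inside the reader. Assuming (the logs do not confirm this) that the conversion is delegated to base R's as.POSIXct, the sketch below compares how as.POSIXct treats the two spellings when no format is given. The result may differ between R versions, so no expected output is shown.

```r
# Compare base R's format guessing for the "T" and space-separated spellings.
# Assumption: this mirrors what happens to a string column after reading.
with_t     <- as.POSIXct("2020-07-24T10:11:12.134Z", tz = "UTC")
with_space <- as.POSIXct("2020-07-24 10:11:12.134Z", tz = "UTC")

# Show the time of day each call retained; output depends on the R version.
print(format(with_t, "%H:%M:%S"))
print(format(with_space, "%H:%M:%S"))
print(R.version.string)
```

If the first value prints as midnight while the second keeps its time of day, the truncation would be reproducible without fread at all.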
This behavior does not change when using text= instead of file=.
library(data.table)
fread(text=c("now","2020-07-24T10:11:12.134Z"), colClasses="POSIXct", sep=",")
#> now
#> 1: 2020-07-24 10:11:12
#>
#> data.table * 1.14.2 2021-09-27 [1] CRAN (R 4.1.2)
Edit: sorry for the poor formatting. Edit 2: it works whether or not the colClasses argument is specified. - Daniel Molitor
... only works once the col_classes argument is removed (i.e. fread(text=c("now","2020-07-24T10:11:12.134Z"), sep=",")). - langtang
... a Docker instance of rocker/shiny-verse:4.1.2, and I cannot reproduce the problem... - r2evans