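I have a 5GB .dat file (> 10 million lines). Each line has one of the following formats:
aaaa bb cccc0123 xxx kkkkkkkkkkkkkk
or
aaaaabbbcccc01234xxxkkkkkkkkkkkkkk
Because readLines performs poorly on large files, I chose to use fread() instead, but I got an error: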
library("data.table")
x <- fread("test.DAT")
Error in fread("test.DAT") :
Expecting 5 cols, but line 5 contains text after processing all cols. It is very likely that this is due to one or more fields having embedded sep=' ' and/or (unescaped) '\n' characters within unbalanced unescaped quotes. fread cannot handle such ambiguous cases and those lines may not have been read in as expected. Please read the section on quotes in ?fread.
In addition: Warning message:
In fread("test.DAT") :
Unable to find 5 lines with expected number of columns (+ middle)
How can I use fread() like readLines(), without automatic column detection? Or is there another way to solve this problem?
What about sep="\n"? - Cath
fread(paste(f, collapse = "\n")). Otherwise, I would just read from the file directly with fread. - Rich Scriven
fread(paste(f, collapse = "\n")) also takes a long time to run. - Lazarus Thurston
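For reference, the sep="\n" suggestion from the comments might look roughly like the sketch below. It assumes a reasonably recent data.table in which sep = "\n" makes fread return each line as a single character column. The temporary file and the variable names are made up for illustration, and quote = "", strip.white = FALSE and colClasses = "character" are extra arguments added here so that the lines are kept as close to the raw text as possible. They are not part of the original suggestion.

```r
library(data.table)

# Hypothetical stand-in for the 5GB file: two lines in the formats from the question.
lines <- c("aaaa bb cccc0123 xxx kkkkkkkkkkkkkk",
           "aaaaabbbcccc01234xxxkkkkkkkkkkkkkk")
path <- tempfile(fileext = ".DAT")
writeLines(lines, path)

# sep = "\n" tells fread not to look for a column separator, so each line
# comes back as one value in a single column (named V1 by default).
x <- fread(path,
           sep = "\n",
           header = FALSE,            # do not treat the first line as column names
           colClasses = "character",  # skip type guessing, keep everything as text
           quote = "",                # treat quote characters as ordinary text
           strip.white = FALSE)       # keep leading/trailing spaces as they are

raw_lines <- x[[1]]  # plain character vector, similar to what readLines() returns
```

With the real file, path would simply be "test.DAT". Whether this is fast enough for 5GB depends on the machine and the data.table version, so it is worth timing on a smaller sample of the file first.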