Problem importing a dataset: Error in `scan(...)`: line 1 did not have 145 elements

Question

I'm trying to import my dataset into R using the read.table() function:

Dataset.df <- read.table("C:\\dataset.txt", header=TRUE)

but I get the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
   line 1 did not have 145 elements

What does this mean, and how do I fix it?

- REnthusiast

12 Answers

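I ran into this error and, when I looked at my dataset, found no missing data. Instead, some entries contained the special character "#", which made the import fail. Once I removed the "#" from the offending cells, the data imported successfully.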

- Greg Kennedy

You can also use read.table(..., comment.char = "") to turn off the interpretation of comments in the file. - Rich Scriven
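A minimal sketch of the "#" problem and the comment.char fix, using a made-up three-column file written to a temporary path (the column names and values are invented for illustration):

```r
# Hypothetical whitespace-separated file; the "code" value on data line 1 contains "#".
f <- tempfile(fileext = ".txt")
writeLines(c("id code score", "1 A#1 10", "2 B2 20"), f)

# With the default comment.char = "#", everything after "#" on data line 1 is dropped,
# so that line only has 2 fields and scan() stops with "did not have 3 elements".
res_default <- tryCatch(read.table(f, header = TRUE), error = function(e) e)
print(conditionMessage(res_default))

# Turning comment handling off keeps "A#1" as an ordinary value.
df <- read.table(f, header = TRUE, comment.char = "")
print(df)
```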

Apostrophes (') can also cause this problem. Setting quote = "\"" fixes it. - Stuart
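A small reproduction, assuming a made-up whitespace-separated file with a name that contains an apostrophe. count.fields() accepts the same quote argument, so it can confirm the field counts before reading:

```r
# Hypothetical file; "O'Brien" contains an apostrophe.
f <- tempfile(fileext = ".txt")
writeLines(c("name score", "O'Brien 5", "Smith 7"), f)

# Only treat double quotes as quote characters; the apostrophe is now ordinary text.
print(count.fields(f, quote = "\""))  # 2 2 2
df <- read.table(f, header = TRUE, quote = "\"")
print(df)
```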

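I ran into this problem when importing some of the Add Health data files into R (see: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/21600?archive=ICPSR&q=21600). For example, the following command, which reads the DS12 data file in tab-delimited .tsv format, produces this error: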

ds12 <- read.table("21600-0012-Data.tsv", sep="\t", comment.char="", 
quote = "\"", header=TRUE)

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, 
na.strings,  : line 2390 did not have 1851 elements

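It looks like some of the files have formatting problems that cause R to reject them. Part of the problem seems to be that double quotes were sometimes used instead of apostrophes, which leaves an unbalanced number of double quotes on a line.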

After some experimenting, I found three possible solutions:

1. Open the file in a text editor and search/replace every instance of the double-quote character " with nothing. In other words, delete all double quotes. For this tab-delimited data, this only meant that some verbatim excerpts of subjects' comments were no longer in quotes, which was a non-issue for my data analysis.
2. With data stored on ICPSR (see link above) or other archives, another solution is to download the data in a different format. A good option in this case is to download the Stata version of DS12 and open it with the read.dta() function from the foreign package:
```
library(foreign)
ds12 <- read.dta("21600-0012-Data.dta")
```
3. A related solution/hack is to open the .tsv file in Excel and re-save it as a tab-separated text file. This seems to clean up whatever formatting issue makes R unhappy.

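None of these is an ideal solution, since none of them actually fixes reading the original .tsv file in R, but data wrangling often requires juggling several programs and formats.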
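The first workaround can also be done inside R instead of in a text editor. A minimal sketch, assuming the file fits in memory; the tab-separated contents below are invented, with one field containing a stray double quote:

```r
# Hypothetical tab-separated file; the comment on row 1 has an unmatched double quote.
f <- tempfile(fileext = ".tsv")
writeLines(c("id\tcomment\tscore",
             "1\tsays \"maybe\t3",
             "2\tfine\t4"), f)

# Delete every double quote (the text-editor fix), then parse the cleaned lines.
clean <- gsub("\"", "", readLines(f), fixed = TRUE)
ds <- read.table(text = clean, sep = "\t", header = TRUE,
                 comment.char = "", quote = "")
print(ds)
```

Alternatively, quote = "" on its own disables quote handling entirely: the quote characters stay in the data but no longer act as delimiters.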

- Omar Wasow

If you are on Linux and the data file came from a Windows system, the cause may be the ^M (carriage return) character. Find it and delete it, and you're done!
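A minimal sketch of stripping the ^M characters (carriage returns, byte 0x0D) from within R instead of a text editor; the file below is a made-up example written with Windows-style CRLF line endings:

```r
# Hypothetical file with CRLF ("\r\n") line endings.
f <- tempfile(fileext = ".txt")
writeBin(charToRaw("a b\r\n1 2\r\n3 4\r\n"), f)

# Read the raw bytes, drop every carriage return (0x0D, shown as ^M), write a clean copy.
bytes <- readBin(f, what = "raw", n = file.size(f))
cleaned <- tempfile(fileext = ".txt")
writeBin(bytes[bytes != as.raw(0x0d)], cleaned)

df <- read.table(cleaned, header = TRUE)
print(df)
```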

- user1526449

For those who can't find a solution and know that their data has no missing elements:

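I had this problem when I saved files as .csv with Excel 2013 and then tried to load them into R with read.table(). The workaround I found was to paste the data directly from Excel into a .txt document and then open it with: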

read.table(file.choose(), sep="\t")

I hope this helps.

- user2859829

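One of my variables was categorical, and one of its options was a multi-word string ("no event"). When I used read.table, it assumed the space after the first word marked the end of the data point, and the second word was pushed into the next variable. I used sep="\t" to solve this. I'm using RStudio on Mac OS X.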
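A small reproduction, assuming a made-up tab-separated file whose outcome column contains the two-word value "no event". count.fields() shows how the default whitespace separator splits that value:

```r
# Hypothetical tab-separated file with a value containing a space.
f <- tempfile(fileext = ".txt")
writeLines(c("id\toutcome", "1\tno event", "2\tevent"), f)

# Default sep = "" splits on any whitespace, so "no event" counts as two fields.
print(count.fields(f))              # 2 3 2
print(count.fields(f, sep = "\t"))  # 2 2 2

# Splitting on tabs only keeps the value intact.
df <- read.table(f, header = TRUE, sep = "\t")
print(df$outcome)
```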

My previous workaround was to convert the .txt files to .csv in Excel and then open them with read.csv.

- Jose Champsaur

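Downvoted. The file extension is irrelevant to R. As long as the file is plain ASCII text, you can use read.csv() whether it is named something.txt, something.xls or something.csv. Also note that read.csv() is essentially shorthand for read.table(sep=',', header=TRUE). - Ricardo Magalhães Cruz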
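One caveat worth showing: the two calls are close but not identical. read.csv() also sets fill = TRUE, quote = "\"" and comment.char = "", and fill in particular matters for this error. A minimal sketch with a made-up CSV that has one short row:

```r
# Hypothetical CSV; data line 1 is missing its third value.
f <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2", "3,4,5"), f)

# read.table() with sep = "," keeps fill = FALSE, so the short row is an error.
res <- tryCatch(read.table(f, sep = ",", header = TRUE), error = function(e) e)
print(conditionMessage(res))

# read.csv() defaults to fill = TRUE, so the short row is padded with NA instead.
df <- read.csv(f)
print(df)
```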

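For this error, the hash symbol # can be the culprit. Removing the # from the start of your column names may solve the problem. By default, read.table() treats # as the start of a comment, so a column name beginning with # causes the rest of that line to be ignored.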
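A minimal sketch, assuming a made-up file whose first column name starts with "#". Setting comment.char = "" keeps the header line; note that read.table() then passes the names through make.names(), so check.names = FALSE is needed to keep "#id" literally:

```r
# Hypothetical file whose header starts with "#".
f <- tempfile(fileext = ".txt")
writeLines(c("#id value", "1 10", "2 20"), f)

# Disable comment handling so the header line is read as data.
df <- read.table(f, header = TRUE, comment.char = "")
print(names(df))      # "X.id" "value"  (make.names() rewrites "#id")

# Keep the original column name as-is.
df_raw <- read.table(f, header = TRUE, comment.char = "", check.names = FALSE)
print(names(df_raw))  # "#id" "value"
```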

- SGV

I got this error when, following a tutorial, I used row.names="id" with a column named "id".

- QED

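In addition to all the advice above, you can also check all of your data.

If any values have spaces between words, you need to replace them with "_".

Anyway, that's how I solved my own problem.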

- Jack.yu

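This simple approach solved my problem: copy the contents of the dataset, open a blank Excel sheet, choose "Paste Special" -> "Values", and save. Then import the new file.

(I had tried all the existing solutions, but none of them worked for me. My old dataset didn't seem to have any missing values, spaces, special characters, or embedded formulas.)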

- Lux

- A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

The error is fairly self-explanatory. The first line of your data file (or the second line, if you used header = TRUE) seems to be missing data.

Here's a small example:

## Create a small dataset to play with
cat("V1 V2\nFirst 1 2\nSecond 2\nThird 3 8\n", file="test.txt")

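R automatically detects that it should expect row names plus two columns (3 elements), but it doesn't find 3 elements on line 2, so it throws an error: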

read.table("test.txt", header = TRUE)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
#   line 2 did not have 3 elements

Look at the data file to see whether there is a problem:

cat(readLines("test.txt"), sep = "\n")
# V1 V2
# First 1 2
# Second 2
# Third 3 8
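Rather than eyeballing the file, count.fields() reports how many fields each line has, which pinpoints the short line. A minimal sketch recreating the same example in a temporary file instead of test.txt (skip = 1 leaves out the header):

```r
f <- tempfile(fileext = ".txt")
cat("V1 V2\nFirst 1 2\nSecond 2\nThird 3 8\n", file = f)

# Number of fields on each data line.
n <- count.fields(f, skip = 1)
print(n)                   # 3 2 3
print(which(n != max(n)))  # 2: the "Second" row, matching "line 2" in the error
```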

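This may need to be fixed by hand, or we can assume that the first value on the "Second" line belongs in the first column and that the other values should be NA. If that's the case, fill = TRUE is enough to solve your problem.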

read.table("test.txt", header = TRUE, fill = TRUE)
#        V1 V2
# First   1  2
# Second  2 NA
# Third   3  8

R is also smart enough to work out how many elements it needs even when row names are missing:

cat("V1 V2\n1\n2 5\n3 8\n", file="test2.txt")
cat(readLines("test2.txt"), sep = "\n")
# V1 V2
# 1
# 2 5
# 3 8
read.table("test2.txt", header = TRUE)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
#   line 1 did not have 2 elements
read.table("test2.txt", header = TRUE, fill = TRUE)
#   V1 V2
# 1  1 NA
# 2  2  5
# 3  3  8