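I'm cleaning data stored in a tibble, and I keep converting empty-string observations to NA, only to have them seem to disappear when I call summary(df) to check my work. It looks like, when using tibble(), NA is reported only for non-character columns. Why is that? Is it intentional, and if so, why?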
Minimal example:
tdf <- tibble::tibble(a = c("apple", "pear", NA),
                      b = 1:3, c = factor(letters[1:3]))
# We see that the NA in the 'chr' column is not displayed
summary(tdf)
#> a b c
#> Length:3 Min. :1.0 a:1
#> Class :character 1st Qu.:1.5 b:1
#> Mode :character Median :2.0 c:1
#> Mean :2.0
#> 3rd Qu.:2.5
#> Max. :3.0
# But NA in other column types will be
tdf[3, 2:3] <- NA
summary(tdf)
#> a b c
#> Length:3 Min. :1.00 a :1
#> Class :character 1st Qu.:1.25 b :1
#> Mode :character Median :1.50 c :0
#> Mean :1.50 NA's:1
#> 3rd Qu.:1.75
#> Max. :2.00
#> NA's :1
# This behavior is not the same with data.frame
ddf <- data.frame(a = c("apple", "pear", NA),
                  b = 1:3, c = factor(letters[1:3]))
summary(ddf)
#> a b c
#> apple:1 Min. :1.0 a:1
#> pear :1 1st Qu.:1.5 b:1
#> NA's :1 Median :2.0 c:1
#> Mean :2.0
#> 3rd Qu.:2.5
#> Max. :3.0
ddf[3, 2:3] <- NA
summary(ddf)
#> a b c
#> apple:1 Min. :1.00 a :1
#> pear :1 1st Qu.:1.25 b :1
#> NA's :1 Median :1.50 c :0
#> Mean :1.50 NA's:1
#> 3rd Qu.:1.75
#> Max. :2.00
#> NA's :1
Created on 2018-03-01 by the reprex package (v0.2.0).
tdf %>% group_by(a) %>% tally will give you the count of NA. - loki
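For anyone who wants the NA counts without dplyr, here is a minimal base R sketch. The data frame df is hypothetical and only mirrors the shape of the example in the question; stringsAsFactors = FALSE is set explicitly so that column a stays a character column, as it does in a tibble. The same calls should work on a tibble as well, since a tibble is also a data.frame.

```r
# Hypothetical data mirroring the question: a character column with one NA.
# stringsAsFactors = FALSE keeps 'a' as character (as in a tibble).
df <- data.frame(a = c("apple", "pear", NA),
                 b = c(1L, 2L, NA),
                 stringsAsFactors = FALSE)

# Count missing values per column, regardless of column type.
na_counts <- colSums(is.na(df))
print(na_counts)

# Per-value counts for one column, with NA kept as its own category
# (roughly what the group_by/tally suggestion reports).
a_counts <- table(df$a, useNA = "ifany")
print(a_counts)
```

colSums(is.na(df)) gives one number per column, so it is a quick check after converting empty strings to NA, while table(..., useNA = "ifany") shows the NA alongside the other values of a single column.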