Computing group medians from a data frame with dplyr

3
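Computing medians seems a bit awkward in R (there is no data frame method). What is the least typing needed to get group medians from a data frame using dplyr?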
my_data <- structure(list(group = c("Group 1", "Group 1", "Group 1", "Group 1", 
"Group 1", "Group 1", "Group 1", "Group 1", "Group 1", "Group 1", 
"Group 1", "Group 1", "Group 1", "Group 1", "Group 1", "Group 2", 
"Group 2", "Group 2", "Group 2", "Group 2", "Group 2", "Group 2", 
"Group 2", "Group 2", "Group 2", "Group 2", "Group 2", "Group 2", 
"Group 2", "Group 2"), value = c("5", "3", "6", "8", "10", "13", 
"1", "4", "18", "4", "7", "9", "14", "15", "17", "7", "3", "9", 
"10", "33", "15", "18", "6", "20", "30", NA, NA, NA, NA, NA)), .Names = c("group", 
"value"), class = c("tbl_df", "data.frame"), row.names = c(NA, 
-30L))

library(dplyr)  

# groups 1 & 2
my_data_groups_1_and_2 <- my_data[my_data$group %in% c("Group 1", "Group 2"), ]

# compute medians per group
medians <- my_data_groups_1_and_2 %>%
  group_by(group) %>%
  summarize(the_medians = median(value, na.rm = TRUE)) 

This gives:

Error in summarise_impl(.data, dots) : 
  STRING_ELT() can only be applied to a 'character vector', not a 'double'
In addition: Warning message:
In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
  argument is not numeric or logical: returning NA

What is the lowest-effort fix here?

1
Maybe I'm missing something here, but isn't this because is.character(my_data_groups_1_and_2$value) is TRUE? Adding a mutate that converts value to double lets the median be computed. - Matt Upson
1 Answer

4

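As ivyleavedtoadflax commented, the error comes from passing median an argument that is neither numeric nor logical: your value column is of type character (you can tell the values are not numeric because the numbers are in quotes). Here are two simple fixes: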

my_data %>% 
  filter(group %in% c("Group 1", "Group 2")) %>%
  group_by(group) %>%
  summarize(the_medians = median(as.numeric(value), na.rm = TRUE)) 

or
my_data %>% 
  filter(group %in% c("Group 1", "Group 2")) %>%
  mutate(value = as.numeric(value))  %>%
  group_by(group) %>%
  summarize(the_medians = median(value, na.rm = TRUE)) 

To inspect the structure of your data, including the type of each column, you can conveniently use:
str(my_data)
#Classes ‘tbl_df’ and 'data.frame': 30 obs. of  2 variables:
# $ group: chr  "Group 1" "Group 1" "Group 1" "Group 1" ...
# $ value: chr  "5" "3" "6" "8" ...
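
To see the type issue in isolation, here is a minimal base R sketch. The vector x is a made-up stand-in for the value column (it is not taken from the question's data); it only illustrates that quoted numbers are strings until they are converted.

```r
# Hypothetical toy vector standing in for the `value` column
x <- c("5", "3", "6", NA)

is.character(x)              # TRUE: quoted numbers are stored as strings

x_num <- as.numeric(x)       # "5" becomes 5; NA stays NA
is.numeric(x_num)            # TRUE

median(x_num, na.rm = TRUE)  # median of 3, 5, 6 is 5
```

Note that as.numeric() turns any string that cannot be parsed as a number into NA and emits a coercion warning, so it is worth checking for unexpected NAs after converting.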

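Thanks, that's perfect, and much simpler than I expected. I completely overlooked the part of the error message saying the numbers were character type. - Ben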
