Removing whiskers and outliers in R plotly

3

I have continuous data that I'd like to plot with R plotly as a box plot or violin plot, but without the outliers and whiskers:

set.seed(1)
df <- data.frame(group=c(rep("g1",500),rep("g2",700),rep("g3",600)),
                 value=c(c(rep(0,490),runif(10,10,15)),abs(rnorm(700,1,10)),c(rep(0,590),runif(10,10,15))),
                 stringsAsFactors = F)
df$group <- factor(df$group, levels = c("g1","g2","g3"))

I know how to remove the outliers in plotly:

plotly::plot_ly(x = df$group, y =df$value, type = 'box', color = df$group, boxpoints = F, showlegend = F)


But I am still left with the whiskers.

I tried to achieve this with ggplot2 (also capping the y-axis so that it does not extend beyond the 75th percentile):

library(ggplot2)
gp <- ggplot(df, aes(group, value, color = group, fill = group)) + geom_boxplot(outlier.shape = NA, coef = 0) +
  scale_y_continuous(limits = c(0, ceiling(max(dplyr::summarise(dplyr::group_by(df, group), tile = quantile(value, probs = 0.75))$tile)))) +
  theme_minimal() + theme(legend.position = "none",axis.title = element_blank())

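For reference, the y-axis cap in the ggplot2 call above can also be computed in base R alone. This is only a minimal sketch, assuming the same `df` as in the question; the names `q3_by_group` and `y_upper` are made up for illustration.

```r
# Rebuild the example data from the question so the block is self-contained.
set.seed(1)
df <- data.frame(
  group = c(rep("g1", 500), rep("g2", 700), rep("g3", 600)),
  value = c(c(rep(0, 490), runif(10, 10, 15)),
            abs(rnorm(700, 1, 10)),
            c(rep(0, 590), runif(10, 10, 15))),
  stringsAsFactors = FALSE
)
df$group <- factor(df$group, levels = c("g1", "g2", "g3"))

# Per-group 75th percentile (quantile() defaults to type 7).
q3_by_group <- tapply(df$value, df$group, quantile, probs = 0.75, names = FALSE)

# Upper y-axis limit: the largest third quartile, rounded up.
y_upper <- ceiling(max(q3_by_group))
```

One thing to keep in mind: limits set through `scale_y_continuous()` drop out-of-range values before the box statistics are computed, whereas `coord_cartesian(ylim = c(0, y_upper))` only zooms the view.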

But converting it to a plotly object does not preserve that formatting:

plotly::ggplotly(gp)


Any ideas?


1 Answer

3
Here is a workaround.
First, I modified your plot slightly.
library(plotly)
library(dplyr)  # group_by() and summarise() are used further down

# box without outliers
p <- plot_ly(df, x = ~group, y = ~value, type = 'box', 
        color = ~group, boxpoints = FALSE, showlegend = FALSE,
        whiskerwidth = 0, line = list(width = 0)) # no whisker, max or min line

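Then I added the medians back to the plot. That requires computing the medians, matching the colors, and building a list of shapes for Plotly.
As for the colors, it's a bit odd: the first three default colors are used, but in the order g3, g2, g1...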
# the medians
res = df %>% group_by(group) %>% 
  summarise(med = median(value))

# default color list: https://community.plotly.com/t/plotly-colours-list/11730/2
col = rev(c('#1f77b4', '#ff7f0e', '#2ca02c')) # the plot is colored 3, 2, 1

# discrete x-axis; domain default [0, 1]
# default box margin = .08, three groups, each get 1/3 of space
details <- function(col){ # properties every median segment needs
  list(type = 'line',
       line = list(color = col, width = 4),
       xref = "paper", yref = "y")
}

# horizontal segments/ median
segs = lapply(1:nrow(res),
              function(k){
                x1 <- k/3 - .08  # if the domain is [0, 1]
                x0 <- (k - 1)/3 + .08
                y0 <- y1 <- res[k, ]$med
                line = list("x0" = x0, "x1" = x1,
                            "y0" = y0, "y1" = y1)
                deets = details(col[k])
                c(deets, line)
              })

Finally, I added them back onto the plot.
p %>% layout(shapes = segs)

I made the lines very wide, but you get the idea.

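One way to avoid depending on the order in which the default colors get assigned might be to pass an explicit named palette through the `colors` argument of `plot_ly()` and look the segment colors up by group name. The sketch below is untested against the plots shown here; `pal`, `color_for` and `toy` are names made up for illustration, and the hex codes are simply the three listed above.

```r
# Assumption: a named palette keyed by the factor levels of `group`.
# The hex codes are the three default colors listed in the answer.
pal <- c(g1 = "#1f77b4", g2 = "#ff7f0e", g3 = "#2ca02c")

# Hypothetical helper: return the palette color for each group label.
color_for <- function(groups, palette = pal) {
  unname(palette[as.character(groups)])
}

# Toy data standing in for `df` (not the question's data).
toy <- data.frame(
  group = factor(rep(c("g1", "g2", "g3"), each = 4),
                 levels = c("g1", "g2", "g3")),
  value = c(0, 0, 1, 2, 1, 3, 5, 8, 0, 1, 1, 4)
)

# Only build the plot when plotly is installed.
if (requireNamespace("plotly", quietly = TRUE)) {
  p_toy <- plotly::plot_ly(toy, x = ~group, y = ~value, type = "box",
                           color = ~group, colors = pal,
                           boxpoints = FALSE, showlegend = FALSE)
}
```

With a palette like this, `details(col[k])` in the segment code could become `details(color_for(res$group[k]))`, so the line color follows the group label rather than its position.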

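If you'd like the IQR outline back, you can do that as well. I used functions here too. I assume the data you provided isn't your actual data, so the functions should come in handy.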
# include IQR outline
res2 = df %>% group_by(group) %>% 
  summarise(q1 = setNames(quantile(value, type = 7, 1/4), NULL),
            q3 = setNames(quantile(value, type = 7, 3/4), NULL),
            med = median(value))

# IQR segments 
rects = lapply(1:nrow(res2),         # if the domain is [0, 1]
               function(k){
                 x1 <- k/3 - .08
                 x0 <- (k - 1)/3 + .08
                 y0 <- res2[k, ]$q1
                 y1 <- res2[k, ]$q3
                 line = list(color = col[k], width = 4)
                 rect = list("x0" = x0, "x1" = x1,
                             "y0" = y0, "y1" = y1,
                             type = "rect", xref = "paper",
                             yref = "y", "line" = line)
                 rect
               })

rects = append(segs, rects)
p %>% layout(shapes = rects)

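The `x0`/`x1` arithmetic appears twice above and is hard-coded for three groups. As a rough sketch, it could be pulled into a helper that works for any number of groups. This assumes, as the comments above do, that the x-axis domain is [0, 1], that the groups split it evenly, and that the box margin is .08. `box_x_range` and `make_shape` are hypothetical names, not part of plotly.

```r
# Hypothetical helper: paper-coordinate x-range of the k-th of n boxes,
# assuming the x-axis domain is [0, 1] and is split evenly between groups.
box_x_range <- function(k, n, margin = 0.08) {
  stopifnot(n >= 1, k >= 1, k <= n, margin >= 0, margin < 1 / (2 * n))
  c(x0 = (k - 1) / n + margin, x1 = k / n - margin)
}

# Hypothetical helper: build one Plotly shape (a plain list) for box k.
make_shape <- function(k, n, y0, y1, color,
                       type = c("line", "rect"), width = 4) {
  type <- match.arg(type)
  xr <- box_x_range(k, n)
  list(type = type, xref = "paper", yref = "y",
       x0 = xr[["x0"]], x1 = xr[["x1"]], y0 = y0, y1 = y1,
       line = list(color = color, width = width))
}

# Example: a median line and an IQR rectangle for the 2nd of 3 groups
# (the y values here are made up).
seg  <- make_shape(2, 3, y0 = 5, y1 = 5, color = "#ff7f0e", type = "line")
rect <- make_shape(2, 3, y0 = 2, y1 = 9, color = "#ff7f0e", type = "rect")
```

Lists built this way can be combined and passed to `layout(shapes = ...)` in the same way as `segs` and `rects` above.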

