How to plot logit and probit curves in ggplot2

Question

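This is almost certainly a newbie question.

For the data set below, I have been trying to plot the logit and probit curves with ggplot2, without success.

The code I have naively been using is: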

    library(ggplot2)
    TD<-mydata$TD
    Temp<-mydata$Temp
    g<-    qplot(Temp,TD)+geom_point()+stat_smooth(method="glm",family="binomial",formula=y~x,col="red")
    g1<-g+labs(x="Temperature",y="Thermal Distress")
    g1
    g2<-g1+stat_smooth(method="glm",family="binomial",link="probit",formula=y~x,add=T)
    g2

Could you tell me how to improve my code so that both curves are drawn on the same plot?

Thank you.

- JohnK

2 Answers

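The two fits you draw with stat_smooth overlap. That is why you think you cannot see both curves on the same plot. Running the code below makes this clear, because the second line is drawn in blue.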

    library(ggplot2)
    TD<-mydata$TD
    Temp<-mydata$Temp
    g <- qplot(Temp,TD)+geom_point()+stat_smooth(method="glm",family="binomial",formula=y~x,col="red")
    g1<-g+labs(x="Temperature",y="Thermal Distress")
    g1
    g2<-g1+stat_smooth(method="glm",family="binomial",link="probit",formula=y~x,add=T,col='blue')
    g2

If you use a different family in the second stat_smooth, for example a Poisson glm:

    library(ggplot2)
    TD<-mydata$TD
    Temp<-mydata$Temp
    g <- qplot(Temp,TD)+geom_point()+stat_smooth(method="glm",family="binomial",formula=y~x,col="red")
    g1<-g+labs(x="Temperature",y="Thermal Distress")
    g1
    g2<-g1+stat_smooth(method="glm",family="poisson",link="log",formula=y~x,add=T,col='blue')
    g2

Then you can see that two lines are indeed drawn.
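The claim that the logit and probit curves nearly coincide can also be checked numerically, without any plotting. The following is only a sketch: it copies the data frame from the accepted answer on this page, and the names `fit_logit`, `fit_probit`, `grid` and `max_gap` are made up for illustration.

```r
# Data frame copied from the accepted answer on this page
mydata <- data.frame(
  Temp = c(66, 72, 70, 75, 75, 70, 73, 78, 70, 76, 69, 70,
           67, 81, 58, 68, 57, 53, 76, 67, 63, 67, 79),
  TD   = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
           0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0)
)

# Fit the same model once per link function
fit_logit  <- glm(TD ~ Temp, data = mydata, family = binomial(link = "logit"))
fit_probit <- glm(TD ~ Temp, data = mydata, family = binomial(link = "probit"))

# Evaluate both fitted curves on a fine temperature grid
grid <- data.frame(Temp = seq(53, 81, by = 0.5))
p_logit  <- predict(fit_logit,  newdata = grid, type = "response")
p_probit <- predict(fit_probit, newdata = grid, type = "response")

# Largest vertical gap between the two curves, on the probability scale
max_gap <- max(abs(p_logit - p_probit))
print(max_gap)
```

A small `max_gap` means the two curves are hard to tell apart at the scale of the plot, even though they are not identical.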

- LyzandeR

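Stylistically, I would prefer ggplot(mydata,aes(Temp,TD))+geom_point()+ ..., and for clarity you could add fill='red' and fill='blue' to the respective layers to color them... PS: comparing a logit-binomial with a log-Poisson doesn't make much sense... I think what you really wanted was link="logit" rather than link="log"... - Ben Bolker

@BenBolker Thanks, Ben. My point was to show that his code works and that the two lines he drew lie on top of each other. The simplest way to do that was to change the second glm model to something different so that the difference is obvious. I did not intend to compare the two models in any way, nor to compare a logit-binomial with a log-Poisson. And yes, stylistically there are 10000 ways to make my plot better, but I just wanted to make my point quickly. Thanks. - LyzandeR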
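The style suggested in the comment above can be sketched roughly as follows. This assumes a reasonably recent ggplot2 (2.0.0 or later, as far as I know), where model arguments such as the family are passed to stat_smooth through `method.args` rather than as `family=` directly. The data frame is copied from the accepted answer on this page, and the colour, fill and alpha values are just illustrative choices.

```r
library(ggplot2)

# Data frame copied from the accepted answer on this page
mydata <- data.frame(
  Temp = c(66, 72, 70, 75, 75, 70, 73, 78, 70, 76, 69, 70,
           67, 81, 58, 68, 57, 53, 76, 67, 63, 67, 79),
  TD   = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
           0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0)
)

p <- ggplot(mydata, aes(Temp, TD)) +
  geom_point() +
  # Logit fit in red; the family (and its link) goes through method.args
  stat_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = binomial(link = "logit")),
              colour = "red", fill = "red", alpha = 0.2) +
  # Probit fit in blue, drawn on the same axes
  stat_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = binomial(link = "probit")),
              colour = "blue", fill = "blue", alpha = 0.2) +
  labs(x = "Temperature", y = "Thermal Distress")
p
```

Because the two fitted curves are close, the distinct fill colours and the low alpha help show that both layers are really there.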

- Andrew · Accepted Answer

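Another approach is to generate your own predicted values and plot them with ggplot. This gives you more control over the final plot than relying on stat_smooth for the calculations, and it is especially useful if you have multiple covariates and need to hold some of them constant at their means or modes when plotting.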

    library(ggplot2)

    # Generate data
    mydata <- data.frame(Ft = c(1, 6, 11, 16, 21, 2, 7, 12, 17, 22, 3, 8,
                                13, 18, 23, 4, 9, 14, 19, 5, 10, 15, 20),
                         Temp = c(66, 72, 70, 75, 75, 70, 73, 78, 70, 76, 69, 70,
                                  67, 81, 58, 68, 57, 53, 76, 67, 63, 67, 79),
                         TD = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
                                0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0))

    # Run logistic regression model
    model <- glm(TD ~ Temp, data=mydata, family=binomial(link="logit"))

    # Create a temporary data frame of hypothetical values
    temp.data <- data.frame(Temp = seq(53, 81, 0.5))

    # Predict the fitted values (on the link scale) given the model and
    # hypothetical data; se.fit=TRUE also returns the standard errors
    predicted.data <- as.data.frame(predict(model, newdata = temp.data,
                                            type="link", se.fit=TRUE))

    # Combine the hypothetical data and predicted values
    new.data <- cbind(temp.data, predicted.data)

    # Calculate 95% confidence intervals on the link scale, then back-transform
    std <- qnorm(0.95 / 2 + 0.5)
    new.data$ymin <- model$family$linkinv(new.data$fit - std * new.data$se.fit)
    new.data$ymax <- model$family$linkinv(new.data$fit + std * new.data$se.fit)
    new.data$fit <- model$family$linkinv(new.data$fit)  # Rescale to 0-1

    # Plot everything
    p <- ggplot(mydata, aes(x=Temp, y=TD))
    p + geom_point() +
      geom_ribbon(data=new.data, aes(y=fit, ymin=ymin, ymax=ymax), alpha=0.5) +
      geom_line(data=new.data, aes(y=fit)) +
      labs(x="Temperature", y="Thermal Distress")

[Image: Better single line]
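The same do-it-yourself approach can be applied to the original question, i.e. drawing the logit and probit fits on one plot. The following is only a sketch under stated assumptions: the helper `curve_for_link()` and the data frame `curves` are hypothetical names introduced here, and the data is the same data frame as in this answer.

```r
library(ggplot2)

mydata <- data.frame(
  Temp = c(66, 72, 70, 75, 75, 70, 73, 78, 70, 76, 69, 70,
           67, 81, 58, 68, 57, 53, 76, 67, 63, 67, 79),
  TD   = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
           0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0)
)

# Hypothetical helper: fitted curve and 95% interval for one link function
curve_for_link <- function(link) {
  model <- glm(TD ~ Temp, data = mydata, family = binomial(link = link))
  grid  <- data.frame(Temp = seq(53, 81, by = 0.5))
  pred  <- predict(model, newdata = grid, type = "link", se.fit = TRUE)
  z     <- qnorm(0.975)            # two-sided 95% normal quantile
  inv   <- model$family$linkinv    # maps the link scale back to probabilities
  data.frame(Temp = grid$Temp,
             link = link,
             fit  = inv(pred$fit),
             ymin = inv(pred$fit - z * pred$se.fit),
             ymax = inv(pred$fit + z * pred$se.fit))
}

# Stack the two curves so the link can be mapped to colour and fill
curves <- rbind(curve_for_link("logit"), curve_for_link("probit"))

p <- ggplot() +
  geom_point(data = mydata, aes(x = Temp, y = TD)) +
  geom_ribbon(data = curves,
              aes(x = Temp, ymin = ymin, ymax = ymax, fill = link),
              alpha = 0.3) +
  geom_line(data = curves, aes(x = Temp, y = fit, colour = link)) +
  labs(x = "Temperature", y = "Thermal Distress")
p
```

Computing the interval on the link scale and back-transforming it, as this answer does, keeps both ribbons inside the 0-1 range.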

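Bonus, just for fun: if you use your own prediction function, you can go crazy with covariates, for example showing how the model fits at different levels of Ft: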

    # Alternative, if you want to go crazy
    # Run logistic regression model with two covariates
    model <- glm(TD ~ Temp + Ft, data=mydata, family=binomial(link="logit"))

    # Create a temporary data frame of hypothetical values:
    # the same Temp grid (57 values) repeated for Ft = 3 and Ft = 18
    temp.data <- data.frame(Temp = rep(seq(53, 81, 0.5), 2),
                            Ft = c(rep(3, 57), rep(18, 57)))

    # Predict the fitted values (on the link scale) given the model and
    # hypothetical data; se.fit=TRUE also returns the standard errors
    predicted.data <- as.data.frame(predict(model, newdata = temp.data,
                                            type="link", se.fit=TRUE))

    # Combine the hypothetical data and predicted values
    new.data <- cbind(temp.data, predicted.data)

    # Calculate 95% confidence intervals on the link scale, then back-transform
    std <- qnorm(0.95 / 2 + 0.5)
    new.data$ymin <- model$family$linkinv(new.data$fit - std * new.data$se.fit)
    new.data$ymax <- model$family$linkinv(new.data$fit + std * new.data$se.fit)
    new.data$fit <- model$family$linkinv(new.data$fit)  # Rescale to 0-1

    # Plot everything, with one ribbon and one line per level of Ft
    p <- ggplot(mydata, aes(x=Temp, y=TD))
    p + geom_point() +
      geom_ribbon(data=new.data, aes(y=fit, ymin=ymin, ymax=ymax,
                                     fill=as.factor(Ft)), alpha=0.5) +
      geom_line(data=new.data, aes(y=fit, colour=as.factor(Ft))) +
      labs(x="Temperature", y="Thermal Distress")

[Image: Better multiple lines]
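As a side note, the grid of hypothetical values can also be built with `expand.grid()`, which avoids hard-coding the 57 repetitions, and the same pattern can hold a covariate at its mean as mentioned at the top of this answer. This is a minimal sketch. `grid` and `grid_mean` are names made up here, and the two Ft levels are the ones used in the code above.

```r
mydata <- data.frame(Ft = c(1, 6, 11, 16, 21, 2, 7, 12, 17, 22, 3, 8,
                            13, 18, 23, 4, 9, 14, 19, 5, 10, 15, 20),
                     Temp = c(66, 72, 70, 75, 75, 70, 73, 78, 70, 76, 69, 70,
                              67, 81, 58, 68, 57, 53, 76, 67, 63, 67, 79),
                     TD = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
                            0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0))

model <- glm(TD ~ Temp + Ft, data = mydata, family = binomial(link = "logit"))

# Every combination of the Temp grid and the two Ft levels;
# Temp varies fastest, so the rows come out in the same order as with rep()
grid <- expand.grid(Temp = seq(53, 81, by = 0.5), Ft = c(3, 18))

# Alternative grid: hold Ft constant at its sample mean
grid_mean <- data.frame(Temp = seq(53, 81, by = 0.5), Ft = mean(mydata$Ft))

# Predicted probabilities for both grids
grid$fit      <- predict(model, newdata = grid,      type = "response")
grid_mean$fit <- predict(model, newdata = grid_mean, type = "response")
head(grid)
```

Either grid can then be passed to geom_line and geom_ribbon in the same way as `new.data` above.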