Confidence interval for the difference between two proportions in Python

Question

python statistics confidence-interval ab-testing

7

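For example, in an A/B test the A population could have 1000 data points, of which 100 are successes, while B could have 2000 data points and 220 successes. This gives A a success proportion of 0.1 and B 0.11, a difference of 0.01. How can I calculate the confidence interval around this difference in Python?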

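Statsmodels can do this for one sample, but there seems to be no package that handles the difference between two samples, which is what an A/B test needs. (http://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportion_confint.html)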

- Johnny V

Have a look at: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.ttest_ind.html or https://dev59.com/q3RB5IYBdhLWcg3wAjJH ...? - Dadep

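This is an unusual paradigm. Usually, when two populations are compared, the hypothesis is that their success probabilities are equal. On that basis, the confidence interval would be calculated around p=0. That may be why you haven't received any answers here. - Bill Bell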

3 Answers

4

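The sample sizes do not need to be equal. The confidence interval for the difference of two proportions is

$$(p_1 - p_2) \pm z_{\alpha/2} \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

where p1 and p2 are the proportions observed in their respective samples of sizes n1 and n2, and z_{\alpha/2} is the standard normal critical value for the chosen confidence level (about 1.96 for 95%).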
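
As an illustration, here is a minimal sketch of the unpooled normal-approximation (Wald) interval for p1 - p2, that is, the difference plus or minus a normal critical value times sqrt(p1(1-p1)/n1 + p2(1-p2)/n2). It uses only the standard library; the function name `wald_diff_confint` and its parameters are made up for this example and are not from the answer.

```python
from math import sqrt
from statistics import NormalDist


def wald_diff_confint(x1, n1, x2, n2, confidence=0.95):
    """Unpooled normal-approximation (Wald) interval for p1 - p2.

    x1, x2 are success counts; n1, n2 are sample sizes (hypothetical names).
    """
    p1, p2 = x1 / n1, x2 / n2
    # standard error of the difference between two independent proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff, (diff - z * se, diff + z * se)


# numbers from the question: A = 100/1000, B = 220/2000
diff, (low, high) = wald_diff_confint(100, 1000, 220, 2000)
print(diff, low, high)
```

With the question's numbers this gives a difference of about -0.01 and an interval of roughly (-0.033, 0.013), so the interval includes zero.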

See this white paper for more.

- Igor Urisman

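I don't think this answers the question. As the author of that paper says, "Note that we make no claims about the size of the difference between p2 and p1 in the whole population - only that it exists." The question is about constructing a confidence interval around the value 0.11, not zero. - Bill Bell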

I agree. If that is the question, then what is the null hypothesis? - Igor Urisman

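I don't think the (ordinary) theory applies. (Not that anyone observes such niceties.) You can't take the samples, calculate the difference between the sample proportions, and then pretend that you were testing whether the difference is 0.11 before you took the samples. That's not fair. - Bill Bell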

Would bootstrapping be able to solve this problem in some way? - Jonny Brooks
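
For readers curious what a bootstrap interval would look like here, the following is a rough sketch of a percentile bootstrap for the difference p_B - p_A, assuming NumPy is available. The function name `bootstrap_diff_confint`, the number of resamples, and the seed are all arbitrary choices for illustration. It shows the mechanics only and does not settle the inferential point raised in the comments above.

```python
import numpy as np


def bootstrap_diff_confint(success_a, size_a, success_b, size_b,
                           confidence=0.95, n_boot=10_000, seed=0):
    """Percentile bootstrap interval for p_B - p_A (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Resampling n binary outcomes with replacement is equivalent to drawing
    # a Binomial(n, observed proportion) success count for each resample.
    boot_a = rng.binomial(size_a, success_a / size_a, size=n_boot) / size_a
    boot_b = rng.binomial(size_b, success_b / size_b, size=n_boot) / size_b
    diffs = boot_b - boot_a
    # take the matching lower and upper percentiles of the resampled differences
    tail = (1 - confidence) / 2 * 100
    low, high = np.percentile(diffs, [tail, 100 - tail])
    return low, high


# numbers from the question: A = 100/1000, B = 220/2000
low, high = bootstrap_diff_confint(100, 1000, 220, 2000)
print(low, high)
```

For samples of this size the result should land close to the normal-approximation interval for the same data.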

2

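The statsmodels package now has the confint_proportions_2indep function, which returns a confidence interval for comparing two proportions. You can find the details in the documentation: https://www.statsmodels.org/stable/generated/statsmodels.stats.proportion.confint_proportions_2indep.html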
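
A short usage sketch with the question's numbers, assuming a statsmodels release recent enough to include this function. Group B is passed first so that the interval is for p_B - p_A, and `method="wald"` is chosen here only to make the result comparable with the plain normal-approximation formula; the documentation lists other methods that may behave better for small samples.

```python
from statsmodels.stats.proportion import confint_proportions_2indep

# counts and sample sizes from the question; B first, so diff = p_B - p_A
low, upp = confint_proportions_2indep(
    220, 2000,       # successes and observations in B
    100, 1000,       # successes and observations in A
    method="wald",   # unpooled normal approximation
    compare="diff",  # interval for the difference of proportions
    alpha=0.05,      # 95% confidence level
)
print(low, upp)
```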

- Nazly Sabbour

- Johnny V · Accepted Answer

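I couldn't find a function for this in Statsmodels. However, this website goes over the maths for generating the confidence interval and provides the function below: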

import numpy as np
from scipy import stats


def two_proprotions_confint(success_a, size_a, success_b, size_b, significance = 0.05):
    """
    A/B test for two proportions;
    given the number of successes and the trial size of groups A and B,
    compute the confidence interval for the difference of their proportions;
    resulting confidence interval matches R's prop.test function

    Parameters
    ----------
    success_a, success_b : int
        Number of successes in each group

    size_a, size_b : int
        Size, or number of observations in each group

    significance : float, default 0.05
        Often denoted as alpha. Governs the chance of a false positive.
        A significance level of 0.05 means that there is a 5% chance of
        a false positive. In other words, our confidence level is
        1 - 0.05 = 0.95

    Returns
    -------
    prop_diff : float
        Difference between the two proportions (B minus A)

    confint : 1d ndarray
        Confidence interval of the two proportion test
    """
    prop_a = success_a / size_a
    prop_b = success_b / size_b
    var = prop_a * (1 - prop_a) / size_a + prop_b * (1 - prop_b) / size_b
    se = np.sqrt(var)

    # z critical value
    confidence = 1 - significance
    z = stats.norm(loc = 0, scale = 1).ppf(confidence + significance / 2)

    # standard formula for the confidence interval
    # point-estimate +- z * standard-error
    prop_diff = prop_b - prop_a
    confint = prop_diff + np.array([-1, 1]) * z * se
    return prop_diff, confint