Python function to get the t-statistic

Question

python python-2.7 statistics confidence-interval

37

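I am looking for a Python function (or, if there is none, I will write my own) that returns the t-statistic for use in a confidence interval calculation. I have found tables that give the value for various probabilities and degrees of freedom, like this one, but I would like to be able to calculate it for any given probability. For anyone not familiar with this: the degrees of freedom are the number of data points in the sample (n) minus 1, and the numbers in the column headings are probabilities (p). For example, a two-tailed significance level of 0.05 is used when looking up the t-score for 95% confidence that, if the test were repeated n times, the result would fall within the mean +/- the confidence interval. I have looked into the various functions in scipy.stats, but none that I can see accepts the simple inputs described above. Excel has a simple implementation of this: to get the t-score for a sample of 1000 where I need to be 95% confident, I would use =TINV(0.05,999) and get a score of about 1.96.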

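Here is the code I have used to implement confidence intervals so far. As you can see, my current way of getting the t-score is very crude: it only allows a few values for perc_conf and warns that it is not accurate for samples < 1000: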

# -*- coding: utf-8 -*-
from __future__ import division
import math

def mean(lst):
    # μ = 1/N Σ(xi)
    return sum(lst) / float(len(lst))

def variance(lst):
    """
    Uses standard variance formula (sum of each (data point - mean) squared)
    all divided by number of data points
    """
    # σ² = 1/N Σ((xi-μ)²)
    mu = mean(lst)
    return 1.0/len(lst) * sum([(i-mu)**2 for i in lst])

def conf_int(lst, perc_conf=95):
    """
    Confidence interval - given a list of values compute the square root of
    the variance of the list (v) divided by the number of entries (n)
    multiplied by a constant factor of (c). This means that I can
    be confident of a result +/- this amount from the mean.
    The constant factor can be looked up from a table, for 95% confidence
    on a reasonable size sample (>=500) 1.96 is used.
    """
    if perc_conf == 95:
        c = 1.96
    elif perc_conf == 90:
        c = 1.64
    elif perc_conf == 99:
        c = 2.58
    else:
        c = 1.96
        print('Only 90, 95 or 99 % are allowed for, using default 95%')
    n, v = len(lst), variance(lst)
    if n < 1000:
        print('WARNING: constant factor may not be accurate for n < ~1000')
    return math.sqrt(v/n) * c

Here is an example call for the above code:

# Example: 1000 coin tosses on a fair coin. What is the range that I can be 95%
#          confident the result will fall within.

# list of 1000 perfectly distributed...
perc_conf_req = 95
n, p = 1000, 0.5 # sample_size, probability of heads for each coin
l = [0 for i in range(int(n*(1-p)))] + [1 for j in range(int(n*p))]
exp_heads = mean(l) * len(l)
c_int = conf_int(l, perc_conf_req)

print('I can be '+str(perc_conf_req)+'% confident that the result of '+str(n)+
      ' coin flips will be within +/- '+str(round(c_int*100,2))+'% of '+
      str(int(exp_heads)))
x = round(n*c_int,0)
print('i.e. between '+str(int(exp_heads-x))+' and '+str(int(exp_heads+x))+
      ' heads (assuming a probability of '+str(p)+' for each flip).')

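The output for this is:

I can be 95% confident that the result of 1000 coin flips will be within +/- 3.1% of 500
i.e. between 469 and 531 heads (assuming a probability of 0.5 for each flip).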

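I have also looked into calculating the t-distribution over a range and then returning the t-score whose probability is closest to the one required, but I had issues implementing the formula. Let me know if this is relevant and you want to see the code, but I have assumed not, as there is probably an easier way.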

- ChrisProsser

4 Answers

3

Try the following code:

from scipy import stats
# Student's t, n=22, two-tailed
# stats.t.ppf(1-0.025, df)
# df = n-1 = 22-1 = 21
print(stats.t.ppf(1-0.025, 21))

- javac

2

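scipy.stats.t has another method, isf, that directly returns the quantile corresponding to the upper-tail probability alpha. It is an implementation of the inverse survival function and returns the same value as t.ppf(1-alpha, dof).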

from scipy import stats
alpha, dof = 0.05, 999

stats.t.isf(alpha, dof) 
# 1.6463803454275356

For a two-tailed test, halve alpha:

stats.t.isf(alpha/2, dof)
# 1.962341461133449
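
The equivalence between isf and ppf described above can be checked directly. This is only a small sketch using the same alpha and dof as in this answer; the variable names are illustrative, and the comparison uses a tolerance because the two methods may differ in the last floating-point digits.

```python
from scipy import stats

alpha, dof = 0.05, 999

# Upper-tail quantile computed two ways: inverse survival function vs. inverse CDF
upper_isf = stats.t.isf(alpha, dof)
upper_ppf = stats.t.ppf(1 - alpha, dof)

# Two-tailed critical value: split alpha across both tails
two_tail_isf = stats.t.isf(alpha / 2, dof)
two_tail_ppf = stats.t.ppf(1 - alpha / 2, dof)

# Compare with a tolerance rather than ==, since the last digits may differ
print(abs(upper_isf - upper_ppf) < 1e-9)
print(abs(two_tail_isf - two_tail_ppf) < 1e-9)
```

Which of the two to use is mostly a matter of taste: isf takes the tail probability as written, while ppf takes the cumulative probability.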

- cottontail

0

You can try this code:

# for small samples (<50) we use t-statistics
# n = 9, degree of freedom = 9-1 = 8
# for 99% confidence interval, alpha = 1% = 0.01 and alpha/2 = 0.005
from scipy import stats

ci = 99
n = 9
t = stats.t.ppf(1 - (100 - ci) / 200.0, n - 1)  # 99% CI, t8,0.005 (float division, so Python 2 does not truncate to 0)
print(t)  # approximately 3.355
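
To illustrate why the t-statistic matters mainly for small samples, the following sketch compares the two-tailed 95% critical value of Student's t with the normal value (about 1.96, the constant hard-coded in the question) for a few sample sizes. The sample sizes and variable names are arbitrary choices for illustration.

```python
from scipy import stats

confidence = 0.95
alpha = 1 - confidence

# Two-tailed critical value of the standard normal distribution (about 1.96)
z = stats.norm.ppf(1 - alpha / 2)

sample_sizes = (5, 10, 30, 100, 1000)
# Two-tailed t critical value for each sample size, with df = n - 1
t_values = [stats.t.ppf(1 - alpha / 2, n - 1) for n in sample_sizes]

for n, t_crit in zip(sample_sizes, t_values):
    # The gap between t and z shrinks as n grows
    print(n, round(t_crit, 3), round(t_crit - z, 3))
```

The t value stays above the normal value for every finite n and approaches it as n grows, which is why a fixed 1.96 is only a reasonable shortcut for large samples.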

- user8864088

- henderso · Accepted Answer

Have you tried scipy?

You will need to install the scipy library. More about installing it here: http://www.scipy.org/install.html

Once it is installed, you can replicate the Excel functionality like so:

from scipy import stats
# Student's t, df=999, p<0.05, two-tailed
# equivalent to Excel TINV(0.05,999)
print(stats.t.ppf(1-0.025, 999))

# Student's t, df=999, p<0.05, single tail
# equivalent to Excel TINV(2*0.05,999)
print(stats.t.ppf(1-0.05, 999))

You can also read about installing the library here: How to install Scipy for Python?
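
As a rough sketch of how the ppf call above could replace the hard-coded constants in the question's conf_int: the helper name tinv is hypothetical (chosen only to mirror the Excel function), and the variance is computed as in the question (dividing by n), although a textbook t interval would normally divide by n - 1.

```python
import math
from scipy import stats


def tinv(p, df):
    """Two-tailed inverse of Student's t, intended to mirror Excel's TINV(p, df).

    Hypothetical helper, not part of scipy.
    """
    return stats.t.ppf(1 - p / 2.0, df)


def conf_int(lst, perc_conf=95):
    """Half-width of the confidence interval around the mean of lst."""
    n = len(lst)
    mu = sum(lst) / float(n)
    # Population variance (divide by n), kept as in the question's code
    v = sum((x - mu) ** 2 for x in lst) / float(n)
    # Critical value from the t-distribution instead of a lookup of 1.64/1.96/2.58
    c = tinv(1 - perc_conf / 100.0, n - 1)
    return math.sqrt(v / n) * c


# The coin-toss example from the question: 500 tails (0) and 500 heads (1)
tosses = [0] * 500 + [1] * 500
half_width = conf_int(tosses, 95)
print(round(half_width * 100, 2))
```

With this change any confidence level and any sample size can be passed in, so the warning for n < 1000 in the original function is no longer needed.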