Python runtime error when running a t-test


I am calling a Python function from my C++ code to compute a t-test. The function call is as follows:

#include <iostream>
#include "Python.h"
// NPY_NO_DEPRECATED_API only takes effect if it is defined before the NumPy header is included
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#include "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h"

int main(int argc, char** argv)
{
    Py_Initialize();

    PyRun_SimpleString("import sys");
    PyRun_SimpleString("sys.path.append(\"PATH_TO_MOD\")");
    PyObject *pName = PyString_FromString("tmpPyth");
    PyObject *pModule = PyImport_Import(pName);


    double arr[] ={9.74219, 10.2226, 8.7469, 8.69791, 9.96442, 9.96472, 9.37913, 9.75004};
    double arr1[] ={9.74219, 10.2226, 8.7469, 8.69791, 9.96442, 9.96472, 9.37913, 9.75004};

    PyObject *lst = PyList_New(8);
    PyObject *lst1 = PyList_New(8);
    if (!lst || !lst1)   // main() returns int, so report failure with a nonzero exit code
        return 1;
    for (int i = 0; i < 8; i++) {
        PyObject *num = PyFloat_FromDouble(arr[i]);
        PyObject *num1 = PyFloat_FromDouble(arr1[i]);
        PyList_SET_ITEM(lst, i, num);
        PyList_SET_ITEM(lst1, i, num1);
    }

    PyObject *pArgs = PyTuple_New(2);
    PyTuple_SetItem(pArgs, 0, lst);
    PyTuple_SetItem(pArgs, 1, lst1);

    if (pModule != NULL) {
        PyObject *pFunc = PyObject_GetAttrString(pModule, "blah");

        if(pFunc != NULL){
            PyObject_CallObject(pFunc, pArgs);
        }
    }
    else
        std::cout << "Module path provided may be wrong. Module not found.\n\n";
    return 0;
}

My Python module is defined as follows:

import numpy
import scipy
import matplotlib

from scipy import stats
def blah(baseline, follow_up):
    paired_sample  = stats.ttest_rel(baseline , follow_up )
    print "The t-statistic is %.3f and the p-value is %.3f." % paired_sample

Now when I try to run it, I get the following runtime warning:
/usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/stats.py:3458: RuntimeWarning: invalid value encountered in divide
  t = np.divide(dm, denom)

However, if I define the lists explicitly in Python and run the t-test function, it works fine. That version of the function is defined as follows:

import numpy
import scipy
import matplotlib

from scipy import stats

def blah():
    baseline = [9.74219, 10.2226, 8.7469, 8.69791, 9.96442, 9.96472, 9.37913, 9.75004]
    follow_up = [9.94227, 9.46763, 8.53081, 9.43679, 9.97695, 10.4285, 10.159, 8.86134]
    paired_sample = stats.ttest_rel(baseline, follow_up)
    print "The t-statistic is %.3f and the p-value is %.3f." % paired_sample

I assume I made some mistake when defining the lists passed to the Python script, but I can't figure out what it is. Any help would be appreciated.


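From your description I don't see an obvious problem, so it looks correct. Just out of curiosity: when you call blah() and the error occurs inside scipy.stats, what values are passed to blah()? Print their representations with print repr(baseline) and print repr(follow_up). That said, why are you still using Python 2? NumPy, at least, also works with current versions of Python. - Ulrich Eckhardt
Hmm, maybe a silly question, but have you considered using the existing t-tests in scipy instead of reimplementing them? - mtzl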
1 Answer


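In your C++ code, arr1 is identical to arr, so every paired difference is zero. ttest_rel divides the mean of the differences by its standard error, and both are zero here, so it computes 0/0, which NumPy turns into nan together with the "invalid value encountered in divide" warning. In your pure-Python version, baseline and follow_up differ, so you don't get the warning.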
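To see exactly where the 0/0 comes from, here is a minimal pure-Python sketch of the paired t statistic. It follows the formula scipy.stats.ttest_rel uses, t = mean(d) / sqrt(var(d, ddof=1) / n) with d = a - b, but the helper name paired_t is hypothetical and it skips the p-value. Plain Python would raise ZeroDivisionError on a zero denominator, so the sketch returns what NumPy's divide would produce instead: nan for 0/0, or +/-inf for a nonzero numerator.

```python
import math


def paired_t(a, b):
    """Paired t statistic using the same formula as scipy.stats.ttest_rel.

    Hypothetical helper for illustration only; it does not compute the p-value.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(a)
    d = [x - y for x, y in zip(a, b)]            # paired differences
    dm = sum(d) / n                               # mean difference
    v = sum((x - dm) ** 2 for x in d) / (n - 1)   # sample variance (ddof=1)
    denom = math.sqrt(v / n)                      # standard error of the mean difference
    if denom == 0.0:
        # Mimic NumPy: 0/0 -> nan ("invalid value"), c/0 -> +/-inf ("divide by zero")
        return float("nan") if dm == 0.0 else math.copysign(float("inf"), dm)
    return dm / denom


arr = [9.74219, 10.2226, 8.7469, 8.69791, 9.96442, 9.96472, 9.37913, 9.75004]
arr1 = list(arr)  # identical values, as in the question's C++ code
follow_up = [9.94227, 9.46763, 8.53081, 9.43679, 9.97695, 10.4285, 10.159, 8.86134]

print(paired_t(arr, arr1))                  # nan: every difference is 0, so t = 0/0
print(round(paired_t(arr, follow_up), 3))   # -0.187
```

The second line agrees with the t-statistic that the real scipy.stats.ttest_rel reports for these two lists.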

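For larger arrays you don't want to go through Python lists at all; instead, pass a pointer to the array data directly to Python. I've modified your code above to do this (arr1 now holds the follow-up values):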

-- pyfromc.cc --
#include <iostream>
#include <Python.h>

int main(int argc, char** argv)
{
    Py_Initialize();

    PyRun_SimpleString("import sys; sys.path.append('.')");
    // PyRun_SimpleString("print '\\n'.join(sys.path)");
    PyObject *pName = PyString_FromString("ttest");
    PyObject *pModule = PyImport_Import(pName);

    double arr[] ={9.74219, 10.2226, 8.7469, 8.69791, 9.96442, 9.96472, 9.37913, 9.75004};
    double arr1[] ={9.94227,9.46763,8.53081,9.43679,9.97695,10.4285,10.159,8.86134};

    PyObject *pArgs = PyTuple_New(3);
    PyTuple_SetItem(pArgs, 0, PyLong_FromLong(8));
    PyTuple_SetItem(pArgs, 1, PyLong_FromVoidPtr(arr));
    PyTuple_SetItem(pArgs, 2, PyLong_FromVoidPtr(arr1));

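    if (pModule != NULL) {
        PyObject *pFunc = PyObject_GetAttrString(pModule, "blahptr");

        if (pFunc != NULL) {
            PyObject *pResult = PyObject_CallObject(pFunc, pArgs);
            if (pResult == NULL)
                PyErr_Print();   // print the Python traceback instead of failing silently
            Py_XDECREF(pResult);
        }
    }
    else {
        PyErr_Print();
        std::cout << "Module path provided may be wrong. Module not found.\n\n";
    }
    Py_Finalize();
    return 0;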
}

On the Python side:
-- ttest.py --
from ctypes import POINTER, c_double, cast
c_double_p = POINTER(c_double)

import numpy as np
from scipy import stats

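def blahptr(n, baseline_ptr, follow_up_ptr):
    # The *_ptr arguments are raw addresses (Python ints from PyLong_FromVoidPtr);
    # cast them to double* and wrap n elements as NumPy arrays without copying.
    baseline = np.ctypeslib.as_array(cast(baseline_ptr, c_double_p), shape=(n,))
    follow_up = np.ctypeslib.as_array(cast(follow_up_ptr, c_double_p), shape=(n,))
    return blah(baseline, follow_up)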

def blah(baseline, follow_up):
    paired_sample  = stats.ttest_rel(baseline , follow_up )
    print "The t-statistic is %.3f and the p-value is %.3f." % paired_sample
    return paired_sample
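The pointer round trip in blahptr can be checked without C++ or NumPy. The stdlib-only sketch below rests on stated assumptions: a ctypes array stands in for the C++ stack array, ctypes.addressof() stands in for PyLong_FromVoidPtr (both yield the buffer's address as a Python int), and view_doubles is a hypothetical helper name. It shows that the Python side reads the original memory rather than a copy, which also means the C++ arrays must outlive every use of the pointer; in the answer that holds because arr and arr1 live until main returns.

```python
from ctypes import POINTER, addressof, c_double, cast

c_double_p = POINTER(c_double)


def view_doubles(address, n):
    """Read n doubles starting at a raw integer address (hypothetical helper).

    Mirrors the cast(...) step in blahptr. Slicing a ctypes pointer copies the
    current values into a list; np.ctypeslib.as_array would give an ndarray view.
    """
    ptr = cast(address, c_double_p)   # integer address -> double*
    return ptr[:n]


# Stand-in for a C++ array; addressof() plays the role of PyLong_FromVoidPtr(arr)
buf = (c_double * 3)(1.5, 2.5, 3.5)
addr = addressof(buf)

print(view_doubles(addr, 3))   # [1.5, 2.5, 3.5]
buf[0] = 42.0                  # mutate the underlying memory...
print(view_doubles(addr, 3))   # [42.0, 2.5, 3.5]: the pointer sees the change
```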

To compile and run the code on Mac OS X, I used the following:

$ PYENV=/path/to/python/env
$ c++ pyfromc.cc -I$PYENV/include/python2.7 -L$PYENV/lib -lpython2.7    
$ PYTHONHOME=$PYENV DYLD_LIBRARY_PATH=$PYENV/lib ./a.out 
The t-statistic is -0.187 and the p-value is 0.857.

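Setting the environment variables on the same line as the executable makes bash set them only for the duration of that command, not for any subsequent commands.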
