Why does OLS raise LinAlgError: SVD did not converge?

6

I have an array:

Num Col2 Col3  Col4  
1   6     1     1   
2   60    0     2   
3   60    0     1   
4   6     0     1   
5   60    1     1   

The code is:

y = df.loc[:,'Col3']  # response
X = df.loc[:,['Col2','Col4']]  # predictor
X = sm.add_constant(X) #add constant
est = sm.OLS(y, X) #build regression
est = est.fit() #full model

When execution reaches .fit(), the following error is raised:

Traceback (most recent call last):
File "D:\Users\Anna\workspace\mob1\mobols.py", line 36, in <module>
est = est.fit() #full model
File "C:\Python27\lib\site-packages\statsmodels\regression\linear_model.py", line 174, in fit
self.pinv_wexog, singular_values = pinv_extended(self.wexog)
File "C:\Python27\lib\site-packages\statsmodels\tools\tools.py", line 392, in pinv_extended
u, s, vt = np.linalg.svd(X, 0)
File "C:\Python27\lib\site-packages\numpy\linalg\linalg.py", line 1327, in svd
u, s, vt = gufunc(a, signature=signature, extobj=extobj)
File "C:\Python27\lib\site-packages\numpy\linalg\linalg.py", line 99, in _raise_linalgerror_svd_nonconvergence
raise LinAlgError("SVD did not converge")
numpy.linalg.linalg.LinAlgError: SVD did not converge

What is the problem, and how can I fix it?

Thanks


6
Most likely there are missing values in the data. You can pass missing='drop' when creating the model, e.g. sm.OLS(y, X, missing='drop'). Another possible cause is a dtype mismatch; try X.astype(float) - Josef
1 Answer

2

It looks like you are using pandas and statsmodels. I ran your snippet and did not get the LinAlgError("SVD did not converge") exception. Here is what I ran:

import numpy as np
import pandas
import statsmodels.api as sm
d = {'col2': [6, 60, 60, 6, 60], 'col3': [1, 0, 0, 0, 1], 'col4': [1, 2, 1, 1, 1]}
df = pandas.DataFrame(data=d, index=np.arange(1, 6))
print(df)

Output:

   col2  col3  col4
1     6     1     1
2    60     0     2
3    60     0     1
4     6     0     1
5    60     1     1

y = df.loc[:, 'col3']
X = df.loc[:, ['col2', 'col4']]
X = sm.add_constant(X)
est = sm.OLS(y, X)
est = est.fit()
print(est.summary())

This prints:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   col3   R-squared:                       0.167
Model:                            OLS   Adj. R-squared:                 -0.667
Method:                 Least Squares   F-statistic:                    0.2000
Date:                Sat, 28 Mar 2015   Prob (F-statistic):              0.833
Time:                        16:43:02   Log-Likelihood:                -3.0711
No. Observations:                   5   AIC:                             12.14
Df Residuals:                       2   BIC:                             10.97
Df Model:                           2                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          1.0000      1.003      0.997      0.424        -3.316     5.316
col2       -8.674e-18      0.013  -6.62e-16      1.000        -0.056     0.056
col4          -0.5000      0.866     -0.577      0.622        -4.226     3.226
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.500
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.638
Skew:                          -0.000   Prob(JB):                        0.727
Kurtosis:                       1.250   Cond. No.                         187.
==============================================================================

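So the code works on the data as posted, and the snippet itself is not the problem. Is it possible that you are fitting a different matrix from the one shown, and that this is what triggers the error?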
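One way to check whether the matrix actually passed to OLS differs from the posted table is to inspect it column by column before fitting. A minimal sketch follows; describe_design_problems is a hypothetical helper written for this illustration, not part of pandas or statsmodels, and the sample frame with its inf and "n/a" entries is made up.

```python
import numpy as np
import pandas as pd


def describe_design_problems(X):
    """Hypothetical helper: per-column dtype plus counts of NaN and inf entries."""
    report = {}
    for name in X.columns:
        # Entries that cannot be parsed as numbers are coerced to NaN,
        # so n_nan counts both missing and non-numeric values.
        col = pd.to_numeric(X[name], errors="coerce")
        values = col.to_numpy(dtype=float)
        report[name] = {
            "dtype": str(X[name].dtype),
            "n_nan": int(np.isnan(values).sum()),
            "n_inf": int(np.isinf(values).sum()),
        }
    return report


# Made-up predictors: one infinite value and one non-numeric entry
X_check = pd.DataFrame(
    {
        "Col2": [6, 60, np.inf, 6, 60],
        "Col4": [1, "n/a", 1, 1, 1],
    }
)
report = describe_design_problems(X_check)
for column, info in report.items():
    print(column, info)
```

Any column with a non-zero count, or with a non-numeric dtype, is worth cleaning before calling .fit().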

