ValueError: illegal value in 4-th argument of internal None when running sklearn LinearRegression().fit()

Question

8

For some reason, I can no longer get this code to run correctly:

import numpy as np
from sklearn.linear_model import LinearRegression

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)

Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "C:\Python37\lib\site-packages\sklearn\linear_model\_base.py", line 547, in fit
    linalg.lstsq(X, y)
  File "C:\Python37\lib\site-packages\scipy\linalg\basic.py", line 1224, in lstsq
    % (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None

I'm not sure why this error occurs in such a simple example. These are my current versions:

scipy.__version__
'1.5.0'
sklearn.__version__
'0.23.1'

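I am running this on 64-bit Windows 10 Enterprise with Python 3.7.3. I tried uninstalling and reinstalling scipy and scikit-learn, trying an earlier version of scipy, and uninstalling and reinstalling Python, but none of these fixed the problem.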

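Update: This seems to be related to matplotlib. I was previously running this example in PyCharm, but I am now running it directly from PowerShell. Run outside PyCharm, this code raises no error: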

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)

However, if I plot the data in between, the error occurs:

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Plot data
plt.scatter(x, y)
plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red')

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)

 ** On entry to DLASCLS parameter number  4 had an illegal value
Traceback (most recent call last):
  File ".\run.py", line 18, in <module>
    lm.fit(x.reshape(-1, 1), y)
  File "C:\Python37\lib\site-packages\sklearn\linear_model\_base.py", line 547, in fit
    linalg.lstsq(X, y)
  File "C:\Python37\lib\site-packages\scipy\linalg\basic.py", line 1224, in lstsq
    % (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None

But if I comment out the line plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red'), the program runs fine.

- evan.tuck

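dlascls comes from LAPACK. You can find many requests for help that show the same message about parameter 4. Maybe a newer version or a different implementation would help? - BlackBear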

1

Let me suggest Miniconda again :) - BlackBear

I set up a fresh Python environment with Miniconda and installed LAPACK via conda install -c conda-forge lapack, but I still get the same error. - evan.tuck

Yes, that's what I tried, but the problem persists. - evan.tuck

1

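Sorry, I had previously installed those packages with pip from the Anaconda command prompt. I have now uninstalled them and reinstalled them with conda, and everything seems to work! - evan.tuck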

10 Answers

2

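This seems to happen only when a figure is drawn with matplotlib; otherwise you can run the fitting algorithm as many times as you like. Oddly, though, the error disappears if you change the data type from float64 to float32 (Grzesik's answer). It feels like a bug to me: why would changing the data type affect the interaction between matplotlib and the LAPACK function used by sklearn? This is more of a question than an answer, but finding such an unexpected interaction between these functions and data types is a little scary.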

import numpy as np
import sklearn
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt


def main(print_matplotlib=False,dtype=np.float64):
    x = np.linspace(-3,3,100).astype(dtype)
    print(x.dtype)
    y = 2*np.random.rand(x.shape[0])*x + np.random.rand(x.shape[0])
    x = x.reshape((-1,1))

    reg=LinearRegression().fit(x,y)
    print(reg.intercept_,reg.coef_)
    
    yh = reg.predict(x)
    
    if print_matplotlib:
        plt.scatter(x,y)
        plt.plot(x,yh)
        plt.show()

Without plotting

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = False, dtype=np.float64)
    np.random.seed(64)
    main(print_matplotlib = False, dtype=np.float64)  
    pass

float64
0.5957165420019624 [0.91960601]
float64
0.5957165420019624 [0.91960601]

Plotting with dtype=np.float64

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float64)
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float64)
    pass

float64
0.5957165420019624 [0.91960601]

Figure 1

float64
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-52593a548324> in <module>
      3     main(print_matplotlib = True)
      4     np.random.seed(64)
----> 5     main(print_matplotlib = True)
      6 
      7     pass

<ipython-input-1-11139051f2d3> in main(print_matplotlib, dtype)
     11     x = x.reshape((-1,1))
     12 
---> 13     reg=LinearRegression().fit(x,y)
     14     print(reg.intercept_,reg.coef_)
     15 

~\Anaconda3\lib\site-packages\sklearn\linear_model\_base.py in fit(self, X, y, sample_weight)
    545         else:
    546             self.coef_, self._residues, self.rank_, self.singular_ = \
--> 547                 linalg.lstsq(X, y)
    548             self.coef_ = self.coef_.T
    549 

~\AppData\Roaming\Python\Python37\site-packages\scipy\linalg\basic.py in lstsq(a, b, cond, overwrite_a, overwrite_b, check_finite, lapack_driver)
   1249         if info < 0:
   1250             raise ValueError('illegal value in %d-th argument of internal %s'
-> 1251                              % (-info, lapack_driver))
   1252         resids = np.asarray([], dtype=x.dtype)
   1253         if m > n:

ValueError: illegal value in 4-th argument of internal None

Plotting with dtype=np.float32

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float32)
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float32)
    pass

Output 2

- Alberto GR

2

As of numpy 1.19.1 and sklearn v0.23.2, I found that polyfit(deg=1) and LinearRegression().fit() raised unexpected errors for no good reason. No, the data did not contain any NaN or Inf values. In the end, I used scipy.stats.linregress().

from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(x.astype(np.float32), y.astype(np.float32))
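
For reference, here is a self-contained sketch of this workaround on synthetic data like that in the question. The seed, sample size, and variable names are illustrative assumptions, not taken from the answer. Whether it avoids the error on an affected machine is the answer's claim and has not been verified here.

```python
import numpy as np
from scipy import stats

# Synthetic linear data with noise, mirroring the question (seed is an assumption)
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 1000)
y = 2.0 * x + 3.0 + rng.normal(0, 10, len(x))

# Same call as in the answer, including the float32 cast
result = stats.linregress(x.astype(np.float32), y.astype(np.float32))
slope, intercept = result.slope, result.intercept
print(slope, intercept)
```

The estimates should land close to the true slope 2 and intercept 3, with the intercept noticeably noisier than the slope.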

- Tae-Sung Shin

1

First check for NaN and inf values. You can also try normalize=True.

lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)  # X, y: your training data

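But none of this worked for me, and my data contained no NaN or inf values. While experimenting, however, I found that running the same code a second time worked. So that is what I did.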

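try:
    lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)
except ValueError:
    # The first attempt fails intermittently; an identical second attempt succeeds.
    lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)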

I don't know why this works, but it solved my problem. Simply trying the same code twice helped me.
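
The NaN/inf check suggested at the start of this answer could be sketched as follows. The helper name all_finite and the sample arrays are hypothetical, introduced only for illustration.

```python
import numpy as np


def all_finite(*arrays):
    """Return True if every array holds only finite values (no NaN, no +/-inf)."""
    return all(np.isfinite(np.asarray(a, dtype=float)).all() for a in arrays)


x = np.array([0.0, 1.0, 2.0])
y_good = np.array([3.0, 5.0, 7.0])
y_bad = np.array([3.0, np.nan, np.inf])

print(all_finite(x, y_good))  # True
print(all_finite(x, y_bad))   # False
```

If this check passes and the ValueError still appears, the data is probably not the cause, which matches what several answers here report.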

- Suraj Mahangade

0

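I ran into the same problem when running sklearn linear regression in a Jupyter notebook in VS Code on WSL2 (Python 3.8.8). Even for very simple examples (e.g. y = x), the regression produced random results and occasionally raised this ValueError.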

After many attempts, the fix was to upgrade scipy from 1.6.2 to 1.7.1. After the upgrade, the regression produces correct results. No more random errors!

- Steve Lihn

0

In scipy/linalg/basic.py, the lstsq function starts at line 1031. Its lapack_driver argument defaults to None. If the driver is None, it is set to 'gelsd' at line 1162. I think 'gelsd' is the problem. If you change this to driver = 'gelsy', the code works.
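
Rather than editing the installed scipy source, the driver can also be chosen per call, since lapack_driver is a documented argument of scipy.linalg.lstsq. The sketch below fits the question's synthetic data that way. Note the assumptions: it bypasses LinearRegression entirely (which calls lstsq without a driver argument, as the traceback shows), the intercept is handled by a manually added column of ones, and the seed and names are illustrative. Whether 'gelsy' avoids the error on an affected machine is this answer's claim and has not been verified here.

```python
import numpy as np
from scipy import linalg

# Synthetic linear data with noise, mirroring the question (seed is an assumption)
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 1000)
y = 2.0 * x + 3.0 + rng.normal(0, 10, len(x))

# Design matrix with a column of ones so the intercept is fitted as well
A = np.column_stack([x, np.ones_like(x)])

# Select the 'gelsy' LAPACK driver instead of the default 'gelsd'
coef, _residues, rank, _sv = linalg.lstsq(A, y, lapack_driver="gelsy")
slope, intercept = coef
print(slope, intercept, rank)
```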

- Swann Proust

0

Your code is missing plt.show(). Add it after this line:

plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red')
plt.show()

- Dammio

0

I suggest using the parameter normalize=True in your code to avoid this.

LinearRegression(fit_intercept=True,
                 normalize=True,
                 copy_X=True,
                 n_jobs=None)

This resolved the error for me.

- Ash Upadhyay

0

In your code, change this:

lm.fit(x.reshape(-1, 1), y)

to:

lm.fit(x.reshape(-1, 1).astype(np.float32), y)

- Grzesik

0

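In my case, some data points in my dataset had too many decimal places, which caused problems with my polynomial fit. My guess is that it was an overflow error, or an error caused by NaN values (although in my case I had no NaNs). Once I rounded the dataset array, I no longer got the error.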

You can try rounding all the data points in your dataset array:

data_array = np.round(data_array,4)

- ThomasAFink

- Lorem Ipsum · Accepted Answer

This appears to be caused by a bug in Windows (update 2004?).

The posted issue https://github.com/scipy/scipy/issues/12893
is a duplicate of https://github.com/scipy/scipy/issues/12747 and
is caused by https://github.com/numpy/numpy/issues/16744

It has to do with whether NumPy is able to interface with a particular Basic Linear Algebra Subprograms (BLAS) implementation.

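The most popular workarounds are to install NumPy with conda or to use a non-Windows system (e.g. a GNU/Linux OS). conda bundles the Intel Math Kernel Library (MKL), which does not have this problem. Non-Windows systems are not affected by the Windows bug. Microsoft is reportedly going to provide a patch around January 2021.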
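
To see which of these situations applies to a given machine, one can print the OS details and the BLAS/LAPACK build information that NumPy reports. This is only a diagnostic sketch: the helper name describe_environment is hypothetical, and the exact text printed by np.show_config() varies between NumPy versions and builds.

```python
import platform

import numpy as np


def describe_environment():
    """Collect the details relevant to this bug: OS, OS build, NumPy version."""
    return {
        "system": platform.system(),
        "os_version": platform.version(),
        "numpy": np.__version__,
    }


print(describe_environment())

# Prints the BLAS/LAPACK libraries this NumPy build was compiled against
np.show_config()
```

Per the answer above, a conda-installed NumPy would be expected to report MKL here.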

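If you are affected by this issue, as many others are, please remember that for NumPy, Python, and many other free software packages, the license explicitly states: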

The copyright holders and contributors provide this software "as is".

Please keep this in mind in any comments you make to the developers of these systems (i.e. be polite and respectful).