LinAlgError: SVD did not converge in Linear Least Squares when trying polyfit

Question


20

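If I try to run the script below, I get the error: LinAlgError: SVD did not converge in Linear Least Squares. I have used the exact same script on a similar dataset, and there it works. I have searched my dataset for values that Python might interpret as NaN, but I can't find any.

My dataset is quite large and impossible to check by hand (but I think it is fine). I also checked the lengths of stageheight_masked and discharge_masked, and they are the same. Does anyone know why my script raises this error, and what I can do about it?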

import numpy as np
import datetime
import matplotlib.dates
import matplotlib.pyplot as plt
from scipy import polyfit, polyval

kwargs = dict(delimiter = '\t',\
     skip_header = 0,\
     missing_values = 'NaN',\
     converters = {0:matplotlib.dates.strpdate2num('%d-%m-%Y %H:%M')},\
     dtype = float,\
     names = True,\
     )

rating_curve_Gillisstraat = np.genfromtxt('G:\Discharge_and_stageheight_Gillisstraat.txt',**kwargs)

discharge = rating_curve_Gillisstraat['discharge']   #change names of columns
stageheight = rating_curve_Gillisstraat['stage'] - 131.258

#mask NaN
discharge_masked = np.ma.masked_array(discharge,mask=np.isnan(discharge)).compressed()
stageheight_masked = np.ma.masked_array(stageheight,mask=np.isnan(discharge)).compressed()

#sort
sort_ind = np.argsort(stageheight_masked)
stageheight_masked = stageheight_masked[sort_ind]
discharge_masked = discharge_masked[sort_ind]

#regression
a1,b1,c1 = polyfit(stageheight_masked, discharge_masked, 2)
discharge_predicted = polyval([a1,b1,c1],stageheight_masked)

print 'regression coefficients'
print (a1,b1,c1)

#create upper and lower uncertainty
upper = discharge_predicted*1.15
lower = discharge_predicted*0.85

#create scatterplot

plt.scatter(stageheight,discharge,color='b',label='Rating curve')
plt.plot(stageheight_masked,discharge_predicted,'r-',label='regression line')
plt.plot(stageheight_masked,upper,'r--',label='15% error')
plt.plot(stageheight_masked,lower,'r--')
plt.axhline(y=1.6,xmin=0,xmax=1,color='black',label='measuring range')
plt.title('Rating curve Catsop')
plt.ylabel('discharge')
plt.ylim(0,2)
plt.xlabel('stageheight[m]')
plt.legend(loc='upper left', title='Legend')
plt.grid(True)
plt.show()

- Toine Kerckhoffs

1

I'm pretty sure polyfit doesn't support masked arrays, so it will treat NaNs like any other value. You should also check for infinite values (e.g. using np.isinf). - ali_m

1

Another possible cause is a "vertical line" in your data! - Yahya
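
A "vertical line" means many points sharing the same x value. A degree-d polynomial needs at least d + 1 distinct x values for a unique fit, so one way to test for this is to count distinct finite x values before fitting. The helper name `has_enough_distinct_x` and the sample arrays below are made up for illustration; whether a degenerate x column produces this exact LinAlgError or only a rank warning may depend on the NumPy build.

```python
import numpy as np

def has_enough_distinct_x(x, degree):
    """Return True if x holds more than `degree` distinct finite values."""
    x = np.asarray(x, dtype=float)
    finite_x = x[np.isfinite(x)]          # ignore NaN and +/-inf
    return np.unique(finite_x).size > degree

# Hypothetical examples: every point on one vertical line vs. spread-out points
vertical = np.array([2.0, 2.0, 2.0, 2.0])
spread = np.array([0.0, 1.0, 2.0, 3.0])

print(has_enough_distinct_x(vertical, 2))
print(has_enough_distinct_x(spread, 2))
```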

5 Answers

2

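As others have pointed out, the problem is most likely that some rows contain no numeric values for the algorithm to work with. Most regression routines have this problem.

That is the problem; the solution then depends on the data. Often you can replace NaN values with 0 using pandas .fillna(0). Sometimes you may need to interpolate the missing values, and pandas .interpolate() is probably the simplest way to do that. Or, when the data is not a time series, you can drop the rows that contain NaN values, for example with pandas .dropna(). And sometimes the problem is not NaNs but infs or something else, in which case there are other solutions: https://dev59.com/BWEh5IYBdhLWcg3w9nhj#55293137 Which approach to take depends on the data and on how you interpret it. Domain knowledge helps a lot with interpreting data well.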
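
The three pandas options mentioned above can be compared side by side on a small frame. This is only a sketch: the frame and its column names are invented, and the infs are converted to NaN first so that a single strategy covers both kinds of bad value.

```python
import numpy as np
import pandas as pd

# Made-up frame with one NaN and one inf; column names are illustrative only
df = pd.DataFrame({
    "stage": [0.0, 1.0, 2.0, 3.0, 4.0],
    "discharge": [1.0, np.nan, 5.0, np.inf, 17.0],
})

# Treat +/-inf as missing so the methods below handle them too
clean = df.replace([np.inf, -np.inf], np.nan)

filled = clean.fillna(0)            # replace every gap with 0
interpolated = clean.interpolate()  # linear interpolation between neighbours
dropped = clean.dropna()            # discard rows with any missing value

print(filled["discharge"].tolist())
print(interpolated["discharge"].tolist())
print(len(dropped), "rows left after dropna")
```

Filling with 0 keeps every row but can distort a fit badly when 0 is not a plausible measurement, which is one reason the choice depends on the data.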

- Robin

1

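As ski_squaw mentioned, this error is caused by NaNs most of the time, but in my case it appeared after a Windows update. I was using numpy version 1.16. Moving to numpy 1.19.3 solved the problem (run pip install numpy==1.19.3 --user in cmd).

This GitHub issue explains it in more detail: https://github.com/numpy/numpy/issues/16744

Numpy 1.19.3 does not work on Linux, and 1.19.4 does not work on Windows.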
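
To tell a data problem apart from an installation problem, one option is to run polyfit on tiny, known-good data. This is a rough sketch rather than an official diagnostic; the variable `install_ok` is made up for illustration.

```python
import numpy as np

print("NumPy version:", np.__version__)

# Exact line y = 2x + 1 with no NaN or inf, so a healthy install must fit it
x = np.arange(5, dtype=float)
y = 2.0 * x + 1.0

try:
    slope, intercept = np.polyfit(x, y, 1)
    install_ok = bool(np.allclose([slope, intercept], [2.0, 1.0]))
except np.linalg.LinAlgError:
    # Failing on clean data points at the installation, not the dataset
    install_ok = False

print("polyfit works on clean data:", install_ok)
```

If this fails on clean data, the NumPy build is the more likely culprit; if it passes, the dataset deserves a closer look.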

- Joris

0

I developed my code on Windows 8. Now that I'm on Windows 10, the problem appeared! Following @Joris's answer solved it.

pip install numpy==1.19.3

- Leonardo

2

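While this is a valid answer to the question, at least for your use case, it adds no new information beyond what is already in @Joris's answer. It's best not to post duplicate answers. - joanis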

0

Example of a fix:

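import numpy as np

def calculating_slope(x):
    # x is expected to be a pandas Series; drop +/-inf and NaN before fitting
    x = x.replace([np.inf, -np.inf], np.nan).dropna()
    if len(x) > 1:
        # slope of a degree-1 fit against the sample index
        slope = np.polyfit(range(len(x)), x, 1)[0]
    else:
        # too few points left to fit a line
        slope = 0
    return slope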

- Alexandr Kosolapov

Content provided by Stack Overflow.

- ski_squaw · Accepted Answer

I don't have your data file, but generally, when you get this error, there are NaN or infinite values in your data. Use pd.notnull or np.isfinite to find them.
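
A minimal sketch of that check with np.isfinite, assuming two equal-length float arrays. The helper `finite_pairs` and the sample values are made up for illustration and are not the asker's data.

```python
import numpy as np

def finite_pairs(x, y):
    """Return only the (x, y) pairs in which both values are finite."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.isfinite(x) & np.isfinite(y)  # False for NaN, +inf and -inf
    return x[keep], y[keep]

# Made-up sample data following y = x**2 + 1, with one NaN and one inf
stage = np.array([0.0, 1.0, 2.0, np.nan, 4.0, 5.0])
flow = np.array([1.0, 2.0, 5.0, 10.0, np.inf, 26.0])

print("non-finite stage values:", np.count_nonzero(~np.isfinite(stage)))
print("non-finite flow values:", np.count_nonzero(~np.isfinite(flow)))

stage_ok, flow_ok = finite_pairs(stage, flow)
coeffs = np.polyfit(stage_ok, flow_ok, 2)
print(coeffs)
```

Note that the script in the question builds both masks from np.isnan(discharge) alone, so a NaN or inf in the stage column would still reach polyfit. A combined mask like the one above covers both columns.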