Python sklearn multiple linear regression: display R-squared


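I computed a multiple linear regression equation and want to see the adjusted R-squared. I know the score function shows R-squared, but it is not the adjusted value.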

import pandas as pd  # import the pandas module
import numpy as np
df = pd.read_csv('/Users/jeangelj/Documents/training/linexdata.csv', sep=',')
df
       AverageNumberofTickets   NumberofEmployees   ValueofContract Industry
   0              1                    51                  25750    Retail
   1              9                    68                  25000    Services
   2             20                    67                  40000    Services
   3              1                   124                  35000    Retail
   4              8                   124                  25000    Manufacturing
   5             30                   134                  50000    Services
   6             20                   157                  48000    Retail
   7              8                   190                  32000    Retail
   8             20                   205                  70000    Retail
   9             50                   230                  75000    Manufacturing
  10             35                   265                  50000    Manufacturing
  11             65                   296                  75000    Services
  12             35                   336                  50000    Manufacturing
  13             60                   359                  75000    Manufacturing
  14             85                   403                  81000    Services
  15             40                   418                  60000    Retail
  16             75                   437                  53000    Services
  17             85                   451                  90000    Services
  18             65                   465                  70000    Retail
  19             95                   491                  100000   Services

from sklearn.linear_model import LinearRegression
model = LinearRegression()
X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets
model.fit(X, y)
model.score(X, y)
# 0.87764337132340009

I checked by hand: 0.87764 is the R-squared, and 0.863248 is the adjusted R-squared.
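As a quick check of that hand calculation, the standard adjusted R² formula with n = 20 rows and p = 2 predictors (the intercept is not counted in p) turns the reported R² into the reported adjusted value:

$$
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.877643)\cdot\frac{19}{17}
          \approx 1 - 0.136752
          \approx 0.863248
$$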

2 Answers


There are many different ways to compute R^2 and adjusted R^2. Here are a few of them (computed with the data you provided):

from sklearn.linear_model import LinearRegression
model = LinearRegression()
X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets
model.fit(X, y)

# For OLS with an intercept: SST = SSR + SSE (total = regression + residual sum of squares)

# compute with formulas from the theory
yhat = model.predict(X)
SS_Residual = sum((y - yhat) ** 2)       # SSE: residual sum of squares
SS_Total = sum((y - np.mean(y)) ** 2)    # SST: total sum of squares
r_squared = 1 - float(SS_Residual) / SS_Total
n, p = len(y), X.shape[1]                # observations, predictors (intercept excluded)
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(f"{r_squared:.12f} {adjusted_r_squared:.12f}")
# 0.877643371323 0.863248473832

# compute with sklearn linear_model; the documentation does not seem to offer an
# adjusted R^2 function, so apply the formula to the plain R^2 from model.score
r2 = model.score(X, y)
adjusted_r2 = 1 - (1 - r2) * (len(y) - 1) / (len(y) - X.shape[1] - 1)
print(f"{r2:.12f} {adjusted_r2:.12f}")
# 0.877643371323 0.863248473832
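For readers without scikit-learn at hand, the formula-based method can be reproduced with NumPy alone. This is a sketch, not part of the original answer: `r2_and_adjusted` is a hypothetical helper name, the intercept is added as a column of ones, the fit uses `numpy.linalg.lstsq` (ordinary least squares, the same model `LinearRegression` fits with its default `fit_intercept=True`), and the arrays are copied from the question's table.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Hypothetical helper: fit OLS with an intercept, return (R^2, adjusted R^2).

    X has shape (n, p) with p predictors; the intercept column is added here
    and is NOT counted in p.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # least-squares coefficients
    yhat = design @ beta
    ss_residual = np.sum((y - yhat) ** 2)              # SSE
    ss_total = np.sum((y - y.mean()) ** 2)             # SST
    r2 = 1 - ss_residual / ss_total
    adjusted = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adjusted

# Columns copied from the question's table (20 rows).
employees = np.array([51, 68, 67, 124, 124, 134, 157, 190, 205, 230,
                      265, 296, 336, 359, 403, 418, 437, 451, 465, 491], dtype=float)
contract = np.array([25750, 25000, 40000, 35000, 25000, 50000, 48000, 32000, 70000, 75000,
                     50000, 75000, 50000, 75000, 81000, 60000, 53000, 90000, 70000, 100000],
                    dtype=float)
tickets = np.array([1, 9, 20, 1, 8, 30, 20, 8, 20, 50,
                    35, 65, 35, 60, 85, 40, 75, 85, 65, 95], dtype=float)

X = np.column_stack([employees, contract])
r2, adj = r2_and_adjusted(X, tickets)
print(f"{r2:.6f} {adj:.6f}")  # prints 0.877643 0.863248
```

Because p counts only the two predictors and not the intercept column, the denominator is n - p - 1 = 17, which is what `X.shape[1]` supplies in the answer's formula.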

Another way:
# compute with statsmodels, adding the intercept manually
import statsmodels.api as sm
X1 = sm.add_constant(X)
result = sm.OLS(y, X1).fit()
# print(dir(result))
print(f"{result.rsquared:.12f} {result.rsquared_adj:.12f}")
# 0.877643371323 0.863248473832

Another way:

# compute with statsmodels, another way, using a formula (intercept included automatically)
import statsmodels.formula.api as smf
result = smf.ols(formula="AverageNumberofTickets ~ NumberofEmployees + ValueofContract", data=df).fit()
# print(result.summary())
print(f"{result.rsquared:.12f} {result.rsquared_adj:.12f}")
# 0.877643371323 0.863248473832

You could use model.coef_ in the formula instead of X.shape[1]. That makes it easier to understand. - Manuel G
Thank you very much! - mohammed_ayaz
@ManuelG Not correct, even if you mean len(model.coef_) (which I assume you do); that would also include the constant term of the LR, which should not be the case. - desertnaut
You can also do this: from sklearn.metrics import explained_variance_score, r2_score, where the r^2 score is explained_variance_score and the adjusted r^2 score is r2_score. - Mohith7548
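Per the scikit-learn documentation, `r2_score` returns the ordinary (unadjusted) coefficient of determination, and `explained_variance_score` returns 1 - Var(y - y_pred) / Var(y). Neither applies the (n - 1)/(n - p - 1) correction, so neither is adjusted R². The NumPy sketch below implements those two documented definitions with hypothetical helper names (`r2`, `explained_variance` and `adjusted_r2` are illustrative, not sklearn functions) on synthetic data. It shows that for an in-sample OLS fit with an intercept the first two coincide, because the residuals then average to zero, while adjusted R² is smaller.

```python
import numpy as np

def r2(y, yhat):
    """Plain R^2 = 1 - SSE/SST (the quantity r2_score documents)."""
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def explained_variance(y, yhat):
    """1 - Var(residuals)/Var(y) (the explained_variance_score definition)."""
    return 1 - np.var(y - yhat) / np.var(y)

def adjusted_r2(y, yhat, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    n = len(y)
    return 1 - (1 - r2(y, yhat)) * (n - 1) / (n - p - 1)

# Synthetic data (assumption for illustration): 30 rows, 2 predictors, Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 3.0 + X @ np.array([1.5, -2.0]) + rng.normal(scale=1.0, size=30)

# In-sample OLS fit with an intercept column, so the residuals sum to ~0.
design = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
yhat = design @ beta

print(r2(y, yhat), explained_variance(y, yhat), adjusted_r2(y, yhat, p=2))
```

On held-out data, or for a model without an intercept, the residual mean is generally not zero and the two unadjusted scores can differ.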

regressor = LinearRegression(fit_intercept=False)
regressor.fit(x_train, y_train)
print(f'r_sqr value: {regressor.score(x_train, y_train)}')

Consider adding more details or an explanation to your answer. - Roy
I'm not sure how this answer helps. - Hardik Gupta
