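I have a script that uses a random forest and linear regression to predict values for a second dataset. The script more or less works, but the linear regression's accuracy is 18%, which is too poor.
So I tried a random forest instead, but I don't know how to compute that model's accuracy.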
# Only the imports the script actually uses
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
data = pd.read_csv('EncuestaVieja.csv')
X = data[['Edad','Sexo','v1','v2','v3']]
y = data['Alumna']
dataP = pd.read_csv('EncuestaVieja_test.csv')
X_p = dataP[['Edad','Sexo','v1','v2','v3']]
y_p = dataP['Alumna']
dataT = pd.read_csv('EncuestaVieja_test_2.csv')
X_t = dataT[['Edad','Sexo','v1','v2','v3']]
y_t = dataT['Alumna']
# Fit both models on the training survey
regr = LinearRegression()
regr.fit(X, y)
lr = RandomForestRegressor(n_estimators=50)  # despite the name, lr is the random forest
lr.fit(X, y)
X_test = pd.read_csv('EncuestaNueva.csv')[['Edad','Sexo','v1','v2','v3']]
predictions = regr.predict(X_test)
predictions2 = lr.predict(X_test)
# Note: this prints the raw predictions, not a score
print('RandomForest Accuracy: ')
print(predictions2)
print('')
# A second linear model, fitted on one test file and scored on the other;
# score() on a regressor returns R^2, not a classification accuracy
regressor = LinearRegression()
regressor.fit(X_p, y_p)
accuracy = regressor.score(X_t, y_t)
print('Linear Regression Accuracy: ', accuracy*100, '%')
print(predictions)
Output:
RandomForest Accuracy:
[ 1.64 2.54 2.6 2.38 1.64 1.32 1.68 2.56 3. 2.28 2.38 2.68
2.9 2.5 2.78 1.96 1.56 2.6 2.12 2.76 2.74 1.66 1.68 2.12
2.3 2.36 2.28 2.28 2.82 1.7 1.86 2.36 1.24]
Linear Regression Accuracy: 18.1336149086 %
[ 1.2681851 1.02802219 3.13377072 2.96885127 2.30808853 1.98814349
2.39233726 2.8638321 1.86640316 2.63073399 2.21166731 2.25201016
1.95065189 2.65360517 3.08855254 1.01229733 2.18225606 2.41802534
2.43539473 2.50227407 1.71105799 1.88238089 2.12152321 3.33525397
2.72820453 2.43241713 2.88757874 2.6242382 2.63087916 1.98379487
2.25430306 1.96810279 0.8554685 ]
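For reference, the random forest can be scored the same way the script scores the linear model: `score` on a scikit-learn regressor returns R^2 for whatever data is passed in. The following is only a minimal sketch. It assumes synthetic data from `make_regression` in place of the survey CSV files, and the names `forest`, `X_held` and `y_held` are made up for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for the survey data (5 numeric features)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_held, y_train, y_held = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# score() returns R^2 on the held-out rows, exactly as it does for LinearRegression
r2 = forest.score(X_held, y_held)

# Mean absolute error is in the same units as the target
held_pred = forest.predict(X_held)
mae = mean_absolute_error(y_held, held_pred)

print('RandomForest R^2:', r2)
print('RandomForest MAE:', mae)
```

With the real data, the held-out rows would presumably be one of the test CSV files rather than a random split.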
R^2 is not accuracy. - modesitt
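To illustrate the comment above, here is a small sketch of how R^2 is defined, using made-up numbers. The helper `r_squared` is hypothetical and written only for this example; it should agree with scikit-learn's `r2_score`. Unlike a percentage of correct answers, R^2 is 0 for a model that always predicts the mean and can go below 0 for a worse one.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 2.0]     # made-up targets, mean is 2.0
mean_pred = [2.0, 2.0, 2.0, 2.0]  # always predicts the mean
bad_pred = [3.0, 2.0, 1.0, 2.0]   # worse than predicting the mean

print(r_squared(y_true, y_true))     # 1.0
print(r_squared(y_true, mean_pred))  # 0.0
print(r_squared(y_true, bad_pred))   # -3.0
```

Read this way, a score of about 0.18 means the model explains roughly 18% of the variance in the target, not that 18% of the predictions are correct.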