This is my first time using sklearn and pandas, so apologies if this is a basic question. Here is my code:
import pandas as pd
from sklearn.linear_model import LogisticRegression
X = df[predictors]  # df and predictors are defined earlier
y = df['Plc']
# Chronological 70/30 split by row position (no shuffling)
split = int(len(df) * 0.7)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
result = model.score(X_test, y_test)  # mean accuracy on the test set
print("Accuracy: %.3f%%" % (result * 100.0))
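Now I want to put the predicted values back into the original df so I can see the difference between the actual df['Plc'] column and the predictions for y_test. I tried the following, but it feels like it may not be the best approach, and the index numbers don't line up the way I expected:
y_pred = pd.DataFrame()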
y_pred['preds'] = model.predict(X_test)
y_test = pd.DataFrame(y_test)
y_test['index1'] = y_test.index
y_test = y_test.reset_index()
y_test = pd.concat([y_test,y_pred],axis=1)
y_test.set_index('index1')
df = df.reset_index()
df_out = pd.merge(df,y_test,how = 'inner',left_index = True, right_index = True)
Any other suggestions? Thanks!
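One possible index-aligned approach, sketched under some assumptions: `model.predict` returns a plain NumPy array with no index, so wrapping it in a `pd.Series` that reuses `X_test.index` lets pandas align the predictions with the original rows by label rather than by position. This assumes `df` has a unique index. The synthetic data, the feature columns `f1`/`f2`, and the names `preds` and `df_out` are hypothetical stand-ins for the real `df` and `predictors`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data; the real df and predictors come from the question.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"f1": rng.normal(size=100), "f2": rng.normal(size=100)},
    index=range(1000, 1100),  # deliberately not 0-based, to show label alignment
)
df["Plc"] = (df["f1"] + df["f2"] > 0).astype(int)
predictors = ["f1", "f2"]

X = df[predictors]
y = df["Plc"]
split = int(len(df) * 0.7)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Attach X_test's index to the raw prediction array so pandas aligns by label.
preds = pd.Series(model.predict(X_test), index=X_test.index, name="preds")

# Option 1: add a column to df; training rows get NaN because they have no prediction.
df["preds"] = preds

# Option 2: a test-only frame with actual vs. predicted side by side.
df_out = pd.DataFrame({"Plc": y_test, "preds": preds})
df_out["correct"] = df_out["Plc"] == df_out["preds"]
print(df_out.head())
```

Because every step keeps the original index, no `reset_index`, helper column, or merge is needed. An equivalent one-liner for option 1 is `df.loc[X_test.index, "preds"] = model.predict(X_test)`, which also leaves the training rows as NaN.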