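I am using the iris dataset from sklearn. I split the data, fit a Perceptron, and record the scores in a dictionary that maps the sample size used to fit the model (the key) to the corresponding scores (a tuple of training and test scores).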
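This produces 3 dictionaries, because I run the loop 3 times. How can I find the average scores across the 3 iterations? I tried storing the dictionaries in a list and averaging them, but that did not work.
For example, if the dictionaries are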
{21: (0.85, 0.82), 52: (0.80, 0.62), 73: (0.82, 0.45), 94: (0.81, 0.78)}
{21: (0.95, 0.91), 52: (0.80, 0.89), 73: (0.84, 0.87), 94: (0.79, 0.41)}
{21: (0.809, 0.83), 52: (0.841, 0.77), 73: (0.84, 0.44), 94: (0.79, 0.33)}
the output should be
{21:(0.869,0.853),52.....}
where, for key 21, the first element of the value is (0.85+0.95+0.809)/3 and the second element is (0.82+0.91+0.83)/3.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

# Two separate lists; score_list = shape_list = [] would bind both names to one list
score_list = []
shape_list = []
iris = load_iris()
props = [0.2, 0.5, 0.7, 0.9]
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']], columns=iris['feature_names'] + ['target'])
# Target as a 1-D Series, features as a DataFrame
y = df['target']
X = df.drop(columns='target')
# number of trials
for trial in range(3):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, train_size=0.7)
    results = {}
    # separate loop variable so the trial counter is not overwritten
    for prop in props:
        size = int(prop * len(X_train))
        ix = np.random.choice(X_train.index, size=size, replace=False)
        sampleX = X_train.loc[ix]
        sampleY = y_train.loc[ix]
        # apply model
        modelP = Perceptron(tol=1e-3)
        modelP.fit(sampleX, sampleY)
        train_score = modelP.score(sampleX, sampleY)
        test_score = modelP.score(X_test, y_test)
        # store in dictionary
        results[size] = (train_score, test_score)
    # keep this trial's dictionary so the trials can be compared afterwards
    score_list.append(results)
    print(results)
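Also, if anyone knows statistics, is there a way to find the standard error across the trials and print the average standard error for each sample size (dictionary key)?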
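One possible way to approach both questions, as a minimal sketch and not a definitive answer: collect the per-trial dictionaries in a list, stack the tuples for each key into a NumPy array, and reduce along the trial axis. The names `trials` and `summarize` are hypothetical, and the sketch assumes that every dictionary has the same keys and that "standard error" means the standard error of the mean (sample standard deviation with `ddof=1` divided by the square root of the number of trials), which needs at least two trials.

```python
import numpy as np

# The three per-trial dictionaries from the example:
# sample size -> (train score, test score)
trials = [
    {21: (0.85, 0.82), 52: (0.80, 0.62), 73: (0.82, 0.45), 94: (0.81, 0.78)},
    {21: (0.95, 0.91), 52: (0.80, 0.89), 73: (0.84, 0.87), 94: (0.79, 0.41)},
    {21: (0.809, 0.83), 52: (0.841, 0.77), 73: (0.84, 0.44), 94: (0.79, 0.33)},
]


def summarize(trials):
    """Return (means, sems): per-key mean and standard error over the trials.

    Hypothetical helper; assumes all dictionaries share the same keys.
    """
    means, sems = {}, {}
    for size in trials[0]:
        # Rows are trials, columns are (train score, test score).
        scores = np.array([t[size] for t in trials], dtype=float)
        n = scores.shape[0]
        means[size] = tuple(float(v) for v in scores.mean(axis=0))
        # Assumed definition: sample std (ddof=1) / sqrt(number of trials).
        sem = scores.std(axis=0, ddof=1) / np.sqrt(n)
        sems[size] = tuple(float(v) for v in sem)
    return means, sems


means, sems = summarize(trials)
for size in means:
    print(size, "mean:", means[size], "standard error:", sems[size])
```

For key 21 the means come out at roughly (0.870, 0.853), matching the hand calculation in the question. With only 3 trials the standard errors are rough estimates, so more trials would be needed for them to be informative.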
df[['TrS{}'.format(c), 'TeS{}'.format(c)]] = pd.DataFrame(df[c].tolist(), index= df.index), but you should consider using newer tools. - Trenton McKinney