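For the past few weeks I have been working on a multi-target regression problem using the scikit-learn package. My machine learning problem has 3 input features and two output variables to predict. Some ML models in sklearn support multi-target regression natively. For models that do not, sklearn's multioutput regressor can be used as a wrapper: the multioutput class fits one regressor per target.
- Does the multioutput regressor class, or the algorithms that natively support multi-target regression, take the underlying relationship of the input variables into account?
- Instead of a multi-target regression algorithm, should I use a neural network?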
1) I'll split your first question into two parts.
The first part is answered in the documentation you linked and in this user guide topic, which states explicitly:
As MultiOutputRegressor fits one regressor per target it can not take advantage of correlations between targets.
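To make the quoted statement concrete, here is a minimal sketch on synthetic data. The data, the choice of GradientBoostingRegressor as the wrapped estimator, and the variable names are all assumptions for illustration; they are not from the question. It checks that the wrapper holds one fitted estimator per target, and that the first of them predicts the same values as a model trained on the first target alone.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in for the problem in the question: 3 features, 2 targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y1 = X[:, 0] + 0.5 * X[:, 1]
y2 = y1 + 0.1 * rng.normal(size=200)   # second target correlated with the first
Y = np.column_stack([y1, y2])          # shape (200, 2)

# GradientBoostingRegressor only accepts a 1-D target, so it needs the wrapper.
wrapper = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
wrapper.fit(X, Y)

n_fitted = len(wrapper.estimators_)    # one fitted estimator per target column
pred_shape = wrapper.predict(X[:5]).shape

# A model trained on the first target alone, with the same settings.
solo = GradientBoostingRegressor(random_state=0).fit(X, Y[:, 0])
same_as_solo = np.allclose(
    wrapper.estimators_[0].predict(X), solo.predict(X)
)
print(n_fitted, pred_shape, same_as_solo)
```

If `same_as_solo` comes out true, the second target played no part in fitting the first model, which is what "can not take advantage of correlations between targets" means in practice.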
The second part asks about other algorithms that support this. For that, look at the "inherently multiclass" section of the user guide. Inherently multiclass estimators do not rely on a One-vs-Rest (OvR) or One-vs-One (OvO) strategy to handle multiple classes. OvR and OvO fit several models to cover the classes and so may not use the relationship between targets, whereas an inherently multiclass estimator handles the multiclass setting within a single model. The user guide lists the following:
sklearn.naive_bayes.BernoulliNB
sklearn.tree.DecisionTreeClassifier
sklearn.tree.ExtraTreeClassifier
sklearn.ensemble.ExtraTreesClassifier
sklearn.naive_bayes.GaussianNB
sklearn.neighbors.KNeighborsClassifier
sklearn.semi_supervised.LabelPropagation
sklearn.semi_supervised.LabelSpreading
sklearn.discriminant_analysis.LinearDiscriminantAnalysis
sklearn.svm.LinearSVC (setting multi_class="crammer_singer")
sklearn.linear_model.LogisticRegression (setting multi_class="multinomial")
...
...
...
Try replacing the 'Classifier' at the end with 'Regressor' (where a regressor counterpart exists) and look at the documentation of the fit() method there. For example, take DecisionTreeRegressor.fit():
y : array-like, shape = [n_samples] or [n_samples, n_outputs]
The target values (real numbers).
Use dtype=np.float64 and order='C' for maximum efficiency.
You can see that it accepts a 2-D array for the targets (y), so it may be able to use the correlation and underlying relationship between targets.
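As a rough sketch of what that looks like in practice, the snippet below fits a single DecisionTreeRegressor on a 2-D target array. The synthetic data and the max_depth value are assumptions for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: 3 features, 2 targets that both depend on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 2]])  # shape (200, 2)

tree = DecisionTreeRegressor(max_depth=4, random_state=0)
tree.fit(X, Y)                        # the 2-D y is passed directly, no wrapper

n_outputs = tree.n_outputs_           # number of targets seen during fit
pred_shape = tree.predict(X[:5]).shape
# Every node stores one value per output, so both targets share the same splits.
values_per_node = tree.tree_.value.shape[1]
print(n_outputs, pred_shape, values_per_node)
```

Because one tree structure serves both targets, each split is chosen with all outputs in view, unlike the one-model-per-target wrapper.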
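As for your second question, whether to use a neural network depends on personal preference, the type of problem, the amount and kind of data you have, and the number of training iterations you are willing to run. You could try several algorithms and choose the one that gives the best results for your data and problem.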
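One possible way to run such a comparison is cross-validation over a few candidates, sketched below. The synthetic data, the two candidate models, and their hyperparameters are assumptions chosen for illustration; MLPRegressor is sklearn's built-in neural network regressor and, like the decision tree, accepts a 2-D y.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: 3 features, 2 targets (replace with your own X and Y).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
Y = np.column_stack([np.sin(X[:, 0]) + X[:, 1], X[:, 0] * X[:, 2]])

candidates = {
    "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    # Scale inputs before the network; a small hidden layer keeps this quick.
    "mlp": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

# With no scoring argument, cross_val_score uses each estimator's score(),
# which is R^2 for these regressors.
scores = {
    name: cross_val_score(model, X, Y, cv=3).mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

Which candidate wins here says nothing about your data; the point is only the comparison loop.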