Following the explanation given by a helpful answerer, it is widely accepted that one-hot encoding and dummy encoding are not the same thing. Given that, does `pandas.get_dummies` perform one-hot encoding when used with its default arguments (i.e. `drop_first=False`)?
If so, does it make sense to remove the intercept from a logistic regression model? Here is an example:
# I assume I have already my dataset in a DataFrame X and the true labels in y
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# One-hot encode the categorical columns (default drop_first=False keeps every level)
X = pd.get_dummies(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.80)
# fit_intercept=False removes the intercept term from the model
clf = LogisticRegression(fit_intercept=False)
clf.fit(X_train, y_train)
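To make the distinction in the question concrete, here is a minimal sketch (using a hypothetical toy column `color` that is not part of the question's dataset) showing what `get_dummies` produces with and without `drop_first`:

```python
import pandas as pd

# Hypothetical toy column with three categories, just for illustration
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Default drop_first=False: one indicator column per level -> "one-hot" style,
# every row has exactly one 1 across the indicator columns
onehot = pd.get_dummies(df)
print(list(onehot.columns))  # ['color_blue', 'color_green', 'color_red']

# drop_first=True: k-1 columns, the dropped level acts as the baseline ->
# the classic "dummy" coding used with an intercept
dummy = pd.get_dummies(df, drop_first=True)
print(list(dummy.columns))  # ['color_green', 'color_red']
```

With the full set of k indicator columns, the columns sum to 1 in every row, which is what makes them collinear with an intercept column and motivates the question about `fit_intercept=False`.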