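If you do not specify random_state when you run the code, you get a different (random) split every time. If you pass a fixed random_state value, the split is always the same. This is commonly used to make experiments reproducible.
For example: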
from sklearn.model_selection import train_test_split

X = [[1,5],[2,6],[3,2],[4,7], [5,5], [6,2], [7,1],[8,6]]
y = [1,2,3,4,5,6,7,8]
# No random_state: the split can differ from run to run.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# Fixed random_state: the split is identical on every run.
X_train_rs, X_test_rs, y_train_rs, y_test_rs = train_test_split(X, y, test_size=0.33, random_state=324)
print("WITH RANDOM STATE: ")
# Arguments are passed in the same order as the labels in the format string.
print("X_train: {}\ny_train: {}\nX_test: {}\ny_test: {}".format(X_train_rs, y_train_rs, X_test_rs, y_test_rs))
print("WITHOUT RANDOM STATE: ")
print("X_train: {}\ny_train: {}\nX_test: {}\ny_test: {}".format(X_train, y_train, X_test, y_test))
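If you run this code several times, you will see that the split without a random state changes from run to run, while the split with random_state=324 stays the same.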
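Instead of comparing printed output by eye, the same behaviour can be checked in code. The following is a minimal sketch; the helper name split is made up for illustration and is not part of scikit-learn.

```python
from sklearn.model_selection import train_test_split

X = [[1, 5], [2, 6], [3, 2], [4, 7], [5, 5], [6, 2], [7, 1], [8, 6]]
y = [1, 2, 3, 4, 5, 6, 7, 8]


def split(seed=None):
    """Hypothetical helper: split X and y with the given random_state."""
    return train_test_split(X, y, test_size=0.33, random_state=seed)


# Two calls with the same integer seed return identical splits.
same_seed_matches = split(324) == split(324)
print("same seed, same split:", same_seed_matches)

# Unseeded calls draw from NumPy's global random state,
# so two of them will usually (not always) differ.
print("unseeded splits equal:", split() == split())
```

Because the inputs are plain Python lists, the returned pieces are lists too, so they can be compared with == directly.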
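As explained in the sklearn documentation, random_state can be an integer if you want to specify the random number generator seed (the most common case), or directly an instance of the RandomState class. Note that with a fixed integer seed you get the same result on every run.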
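To illustrate the two accepted forms, here is a small sketch that passes the same seed once as an integer and once as a numpy.random.RandomState instance. The variable names are arbitrary, and the data is the toy example from above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = [[1, 5], [2, 6], [3, 2], [4, 7], [5, 5], [6, 2], [7, 1], [8, 6]]
y = [1, 2, 3, 4, 5, 6, 7, 8]

# Integer seed: a new generator is seeded with 324 on this call.
_, X_test_int, _, y_test_int = train_test_split(
    X, y, test_size=0.33, random_state=324
)

# RandomState instance: the split draws from this generator object.
rng = np.random.RandomState(324)
_, X_test_rng, _, y_test_rng = train_test_split(
    X, y, test_size=0.33, random_state=rng
)

# A freshly created instance with the same seed gives the same split.
int_matches_instance = X_test_int == X_test_rng
print("int seed matches fresh RandomState:", int_matches_instance)

# Reusing the same instance continues its random stream,
# so a second call is not guaranteed to repeat the first split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.33, random_state=rng)
print("second call with same instance equal:", X_test_again == X_test_rng)
```

An integer is the simpler choice when each call should be reproducible on its own; a shared RandomState instance is useful when several calls should draw from a single seeded stream.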