I would like to partition panel data while preserving its panel structure:
library(caret)
library(mlbench)
# example panel data where id is the person's identifier across years
data <- read.table("http://people.stern.nyu.edu/wgreene/Econometrics/healthcare.csv",
                   header = TRUE, sep = ",", na.strings = "NA", dec = ".", strip.white = TRUE)
# here, for instance, the dependent variable is WORKING
inTrain <- createDataPartition(y = data$WORKING, p = .75, list = FALSE)
# subset into training
training <- data[ inTrain,]
# subset into testing
testing <- data[-inTrain,]
# Here we see some intersections of identifiers
str(training$id[10:20])
str(testing$id)
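When partitioning or sampling the data, I want to avoid the same person (id) being split across the two data sets. Is there a way to randomly sample/partition the data so that persons, rather than individual observations, are assigned to the partitions?
I tried sampling: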
mysample <- data[sample(unique(data$id), 1000,replace=FALSE),]
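However, this destroys the panel structure of the data: the sampled ids are used as row indices, so it returns a single row per sampled number instead of every observation for the sampled persons.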
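To make the goal concrete, here is a minimal sketch of what an id-level split could look like in base R. It draws from the unique ids and then selects rows by membership with `%in%` instead of by index. The toy data frame `panel` and the names `trainIDs` and `inTrainRows` are hypothetical, made up for illustration; only the `id` and `WORKING` column names come from the data above.

```r
# Toy panel (hypothetical): 6 persons, 3 yearly observations each
set.seed(42)
panel <- data.frame(
  id      = rep(1:6, each = 3),
  year    = rep(2001:2003, times = 6),
  WORKING = rbinom(18, 1, 0.5)
)

# Sample persons, not rows: draw 75% of the unique ids for training
ids      <- unique(panel$id)
trainIDs <- sample(ids, size = floor(0.75 * length(ids)))

# Select rows by id membership so all observations of a person stay together
inTrainRows <- panel$id %in% trainIDs
training <- panel[inTrainRows, ]
testing  <- panel[!inTrainRows, ]

# Check: no id should appear in both sets
intersect(training$id, testing$id)
```

Unlike `createDataPartition`, this sketch does not balance the distribution of `WORKING` between the two sets; it only guarantees that each person ends up in exactly one of them.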
inTrain rather than inTrainID. - Googme