I'm new to R and am working on a project for my own purposes. I have this data (a reproducible dput is at the end of the question):
     X            datetime  user  state
1    1 2016-02-19 19:13:26 User1 joined
2    2 2016-02-19 19:21:18 User2 joined
3    3 2016-02-19 19:21:33 User1 joined
4    4 2016-02-19 19:35:38 User1 joined
5    5 2016-02-19 19:44:15 User1 joined
6    6 2016-02-19 19:48:55 User1 joined
7    7 2016-02-19 19:52:40 User1 joined
8    8 2016-02-19 19:53:15 User3 joined
9    9 2016-02-19 20:02:34 User3 joined
10  10 2016-02-19 20:13:48 User3 joined
19 637 2016-02-19 19:13:32 User1   left
20 638 2016-02-19 19:25:26 User1   left
21 639 2016-02-19 19:30:30 User2   left
22 640 2016-02-19 19:42:16 User1   left
23 641 2016-02-19 19:47:59 User1   left
24 642 2016-02-19 19:51:06 User1   left
25 643 2016-02-19 20:02:26 User3   left
I'd like it to look like this:
   user              joined                left
1 User1 2016-02-19 19:13:26 2016-02-19 19:13:32
2 User2 2016-02-19 19:21:18 2016-02-19 19:30:30
3 User3 2016-02-19 19:53:15 2016-02-19 20:02:26
4 User1 2016-02-19 19:21:33 2016-02-19 19:25:26
.
.
.
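I've been looking at tidyr, since some reshaping is obviously involved, but I can't work out what needs to be done. Is this even possible (without loops or a lot of procedural code)? The problem I can't see how to get around is that there is no way to know which particular "left" record should be tied to which particular "joined" record. The examples I can find all involve static months or dates over which the other values are gathered. I should add that not every record is guaranteed to have a "left" value (the user may still be "joined").
Here is a sample of the data: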
samp <- data.frame(
  X = c(
    1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
    15L, 16L, 17L, 18L, 637L, 638L, 639L, 640L, 641L, 642L, 643L,
    644L, 645L, 646L, 647L, 648L, 649L, 650L, 651L
  ),
  datetime = factor(c(
    "2016-02-19 19:13:26", "2016-02-19 19:21:18", "2016-02-19 19:21:33",
    "2016-02-19 19:35:38", "2016-02-19 19:44:15", "2016-02-19 19:48:55",
    "2016-02-19 19:52:40", "2016-02-19 19:53:15", "2016-02-19 20:02:34",
    "2016-02-19 20:13:48", "2016-02-19 20:49:31", "2016-02-19 20:59:58",
    "2016-02-19 21:06:20", "2016-02-19 21:11:15", "2016-02-19 21:11:22",
    "2016-02-19 22:05:18", "2016-02-19 22:05:47", "2016-02-19 22:30:30",
    "2016-02-19 19:13:32", "2016-02-19 19:25:26", "2016-02-19 19:30:30",
    "2016-02-19 19:42:16", "2016-02-19 19:47:59", "2016-02-19 19:51:06",
    "2016-02-19 20:02:26", "2016-02-19 20:13:38", "2016-02-19 20:42:27",
    "2016-02-19 20:48:22", "2016-02-19 21:10:43", "2016-02-19 21:11:13",
    "2016-02-19 21:17:33", "2016-02-19 22:02:45", "2016-02-19 22:05:37"
  )),
  user = factor(rep(
    c(
      "User1", "User2", "User1", "User3", "User4", "User1", "User4", "User3",
      "User1", "User2", "User1", "User3", "User1", "User4", "User1", "User4"
    ),
    c(
      1L, 1L, 5L, 4L, 1L, 2L, 3L, 1L, 2L, 1L, 3L, 3L, 1L, 1L, 2L,
      2L
    )
  )),
  state = factor(rep(c("joined", "left"), c(18L, 15L)))
)
ts<-spread(test, state, datetime) gets a lot of the work of preparing the dataset done. - Tim Coker
reshape(samp, drop = 'X', dir = 'wide', idvar = 'user', timevar = 'state', v.names = 'datetime') - rawr
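For what it's worth, the pairing problem described in the question (which "left" belongs to which "joined") can be sidestepped under one assumption that the question itself does not confirm: that a user's k-th "joined" event, in time order, corresponds to that same user's k-th "left" event. The sketch below is base R only and is not taken from either comment; the `events` data frame is a hypothetical nine-row subset of the sample data, and the `visit` counter and `sessions` result are names made up for illustration. If a "left" record can be missing in the middle of a user's history, this counter-based pairing would misalign, and a time-based match would be needed instead.

```r
# Hypothetical subset of the question's data, small enough to check by eye.
events <- data.frame(
  datetime = c(
    "2016-02-19 19:13:26", "2016-02-19 19:21:18", "2016-02-19 19:21:33",
    "2016-02-19 19:53:15", "2016-02-19 20:02:34", "2016-02-19 19:13:32",
    "2016-02-19 19:25:26", "2016-02-19 19:30:30", "2016-02-19 20:02:26"
  ),
  user = c(
    "User1", "User2", "User1", "User3", "User3",
    "User1", "User1", "User2", "User3"
  ),
  state = c(rep("joined", 5), rep("left", 4)),
  stringsAsFactors = FALSE
)

# Parse the timestamps so that sorting is chronological (UTC is an assumption).
events$datetime <- as.POSIXct(events$datetime, tz = "UTC")
events <- events[order(events$user, events$datetime), ]

# Assumption: number each user's joins 1, 2, 3, ... and, separately,
# each user's leaves 1, 2, 3, ...; equal numbers are treated as one visit.
events$visit <- ave(seq_along(events$user), events$user, events$state,
                    FUN = seq_along)

# Split into the two event types and rename the timestamp column.
joined <- events[events$state == "joined", c("user", "visit", "datetime")]
left <- events[events$state == "left", c("user", "visit", "datetime")]
names(joined)[3] <- "joined"
names(left)[3] <- "left"

# all.x = TRUE keeps joins that have no matching leave; their "left" is NA.
sessions <- merge(joined, left, by = c("user", "visit"), all.x = TRUE)
sessions <- sessions[order(sessions$joined), c("user", "joined", "left")]
rownames(sessions) <- NULL

print(sessions)
```

On this subset the result has one row per "joined" event, and the second User3 visit has NA in the left column because no later "left" record exists for it. The same `visit` counter could presumably serve as the extra key column that lets a spread-style reshape line the rows up, though that variant is not tested here.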