I have a data frame that I want to group by Name and Acquired, keeping only the groups that contain Position 10.
Here is a sample of my data frame.
Name Acquired Position Salary
1 Adam Dunn* Amateur Draft 7 250000
2 Adam Dunn* Amateur Draft 7 400000
3 Adam Dunn* Amateur Draft 7 445000
4 Adam Dunn* Amateur Draft 7 4600000
5 Adam Dunn* Amateur Draft 7 7500000
6 Adam Dunn* Amateur Draft 7 10500000
7 Adam Dunn* Amateur Draft 7 13000000
8 Adam Dunn* Free Agency 3 8000000
9 Adam Dunn* Free Agency 3 12000000
10 Adam Dunn* Free Agency 10 12000000
11 Adam Dunn* Free Agency 10 14000000
12 Adam Dunn* Free Agency 10 15000000
13 Adam Dunn* Free Agency 10 15000000
14 Adam LaRoche* Amateur Draft 3 300000
15 Adam LaRoche* Amateur Draft 3 300000
16 Adam LaRoche* Amateur Draft 3 337500
17 Adam LaRoche* Amateur Draft 3 337500
18 Adam LaRoche* Amateur Draft 3 420000
19 Adam LaRoche* Amateur Draft 3 420000
20 Adam LaRoche* Traded 3 3200000
21 Adam LaRoche* Traded 3 5000000
22 Adam LaRoche* Traded 3 7050000
23 Adam LaRoche* Free Agency 3 4500000
24 Adam LaRoche* Free Agency 3 7000000
25 Adam LaRoche* Free Agency 3 8000000
26 Adam LaRoche* Free Agency 3 10000000
27 Adam LaRoche* Free Agency 3 12000000
28 Adam LaRoche* Free Agency 10 12000000
29 Adam Lind* Amateur Draft 10 411800
30 Adam Lind* Amateur Draft 10 550000
31 Adam Lind* Amateur Draft 3 5150000
32 Adam Lind* Amateur Draft 3 5000000
33 Adam Lind* Amateur Draft 3 5000000
34 Adam Lind* Amateur Draft 3 7000000
35 Adam Lind* Traded 3 7500000
36 Adrian Gonzalez* Traded 10 316000
37 Adrian Gonzalez* Traded 3 327500
38 Adrian Gonzalez* Traded 3 500000
39 Adrian Gonzalez* Traded 3 875000
40 Adrian Gonzalez* Traded 3 3125000
41 Adrian Gonzalez* Traded 3 4875000
42 Adrian Gonzalez* Traded 3 6300000
43 Adrian Gonzalez* Traded 3 21000000
44 Adrian Gonzalez* Traded 3 21000000
45 Adrian Gonzalez* Traded 3 21000000
46 Adrian Gonzalez* Traded 3 21857000
47 Alan Bannister Traded 10 350000
48 Albert Belle Amateur Draft 9 68000
49 Albert Belle Amateur Draft 10 117000
50 Albert Belle Amateur Draft 7 130000
51 Albert Belle Amateur Draft 10 175000
52 Albert Belle Amateur Draft 7 1675000
53 Albert Belle Amateur Draft 7 2775000
54 Albert Belle Amateur Draft 7 4500000
55 Albert Belle Amateur Draft 7 5700000
56 Albert Belle Free Agency 7 10000000
57 Albert Belle Free Agency 7 10000000
58 Albert Pujols Amateur Draft 5 200000
59 Albert Pujols Amateur Draft 7 600000
60 Albert Pujols Amateur Draft 7 900000
61 Albert Pujols Amateur Draft 3 7000000
62 Albert Pujols Amateur Draft 3 11000000
63 Albert Pujols Amateur Draft 3 14000000
64 Albert Pujols Amateur Draft 3 12937813
65 Albert Pujols Amateur Draft 3 13870949
66 Albert Pujols Amateur Draft 3 14427326
67 Albert Pujols Amateur Draft 3 14595953
68 Albert Pujols Amateur Draft 3 14508395
69 Albert Pujols Free Agency 3 12000000
70 Albert Pujols Free Agency 10 16000000
71 Albert Pujols Free Agency 3 23000000
72 Albert Pujols Free Agency 3 24000000
73 Alex Rodriguez Amateur Draft 6 442333
74 Alex Rodriguez Amateur Draft 6 442333
75 Alex Rodriguez Amateur Draft 6 442334
76 Alex Rodriguez Amateur Draft 6 1062500
77 Alex Rodriguez Amateur Draft 6 2162500
78 Alex Rodriguez Amateur Draft 6 3112500
79 Alex Rodriguez Amateur Draft 6 4362500
80 Alex Rodriguez Free Agency 6 22000000
81 Alex Rodriguez Free Agency 6 22000000
82 Alex Rodriguez Free Agency 6 22000000
83 Alex Rodriguez Traded 5 22000000
84 Alex Rodriguez Traded 5 26000000
85 Alex Rodriguez Traded 5 21680727
86 Alex Rodriguez Free Agency 5 22708525
87 Alex Rodriguez Free Agency 5 28000000
88 Alex Rodriguez Free Agency 5 33000000
89 Alex Rodriguez Free Agency 5 33000000
90 Alex Rodriguez Free Agency 5 32000000
91 Alex Rodriguez Free Agency 5 29000000
92 Alex Rodriguez Free Agency 5 28000000
93 Alex Rodriguez Traded 10 22000000
94 Alexi Amarista* Amateur Free Agent 10 481000
95 Alexi Amarista* Traded 8 497400
96 Alexi Amarista* Traded 6 511100
97 Alexi Amarista* Traded 6 1150000
98 Allen Craig Amateur Draft 9 400000
99 Allen Craig Amateur Draft 7 414000
100 Allen Craig Amateur Draft 3 495000
The output should look something like this:
Name Acquired Position Salary
8 Adam Dunn* Free Agency 3 8000000
9 Adam Dunn* Free Agency 3 12000000
10 Adam Dunn* Free Agency 10 12000000
11 Adam Dunn* Free Agency 10 14000000
12 Adam Dunn* Free Agency 10 15000000
13 Adam Dunn* Free Agency 10 15000000
23 Adam LaRoche* Free Agency 3 4500000
24 Adam LaRoche* Free Agency 3 7000000
25 Adam LaRoche* Free Agency 3 8000000
26 Adam LaRoche* Free Agency 3 10000000
27 Adam LaRoche* Free Agency 3 12000000
28 Adam LaRoche* Free Agency 10 12000000
29 Adam Lind* Amateur Draft 10 411800
30 Adam Lind* Amateur Draft 10 550000
31 Adam Lind* Amateur Draft 3 5150000
32 Adam Lind* Amateur Draft 3 5000000
33 Adam Lind* Amateur Draft 3 5000000
34 Adam Lind* Amateur Draft 3 7000000
35 Adam Lind* Traded 3 7500000
36 Adrian Gonzalez* Traded 10 316000
37 Adrian Gonzalez* Traded 3 327500
38 Adrian Gonzalez* Traded 3 500000
39 Adrian Gonzalez* Traded 3 875000
40 Adrian Gonzalez* Traded 3 3125000
41 Adrian Gonzalez* Traded 3 4875000
42 Adrian Gonzalez* Traded 3 6300000
43 Adrian Gonzalez* Traded 3 21000000
44 Adrian Gonzalez* Traded 3 21000000
45 Adrian Gonzalez* Traded 3 21000000
46 Adrian Gonzalez* Traded 3 21857000
47 Alan Bannister Traded 10 350000
48 Albert Belle Amateur Draft 9 68000
49 Albert Belle Amateur Draft 10 117000
50 Albert Belle Amateur Draft 7 130000
51 Albert Belle Amateur Draft 10 175000
52 Albert Belle Amateur Draft 7 1675000
53 Albert Belle Amateur Draft 7 2775000
54 Albert Belle Amateur Draft 7 4500000
55 Albert Belle Amateur Draft 7 5700000
69 Albert Pujols Free Agency 3 12000000
70 Albert Pujols Free Agency 10 16000000
71 Albert Pujols Free Agency 3 23000000
72 Albert Pujols Free Agency 3 24000000
93 Alex Rodriguez Traded 10 22000000
94 Alexi Amarista* Amateur Free Agent 10 481000
Basically, if I only wanted the rows where Position is 10 within each group, I could just subset, or write the following code:
library(dplyr)
df_1 <- df %>% group_by(Name, Acquired) %>% filter(Position == 10) %>% as.data.frame()
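Instead, I want to keep every group that contains a 10 and filter out the groups that do not. So Adam Dunn acquired via Amateur Draft would not qualify, but Adam Dunn via Free Agency would. I assume this needs some kind of conditional filter, but I'm not sure exactly what.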
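For illustration, here is a minimal dplyr sketch of the kind of group-level condition described above. It assumes the column names from the sample (Name, Acquired, Position, Salary); the small df built here is a hand-picked subset of the sample rows, not the full data, and wrapping the condition in any() is one possible approach rather than the only one.

```r
library(dplyr)

# Hand-picked subset of the sample rows (assumption: same column names as above)
df <- data.frame(
  Name     = c("Adam Dunn*", "Adam Dunn*", "Adam Dunn*", "Adam Dunn*", "Adam Lind*"),
  Acquired = c("Amateur Draft", "Free Agency", "Free Agency", "Free Agency", "Traded"),
  Position = c(7, 3, 10, 10, 3),
  Salary   = c(13000000, 12000000, 12000000, 14000000, 7500000),
  stringsAsFactors = FALSE
)

# any() is evaluated once per group and returns a single TRUE/FALSE,
# so each Name/Acquired group is kept or dropped as a whole.
df_1 <- df %>%
  group_by(Name, Acquired) %>%
  filter(any(Position == 10)) %>%
  ungroup() %>%
  as.data.frame()

print(df_1)
```

With filter(Position == 10) the test is applied row by row, so the Position 3 rows in a qualifying group are lost; any(Position == 10) turns it into a test on the group, so those rows are retained.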
subset(DF, ave(Position, Name, Acquired, FUN = function(x) 10 %in% x) == 1) - G. Grothendieck
With data.table, i.e. library(data.table); setDT(df)[, if (any(Position == 10)) .SD, .(Name, Acquired)]. - akrun
Besides data.table, how about: library(sqldf); sqldf("select t2.* from (select distinct Name,Acquired from dat where Position = 10) t1 left join dat t2 on t1.Name = t2.Name and t1.Acquired = t2.Acquired")? :-) (There may be better ways... but I can't think of any categories other than base, dplyr, data.table, and some form of SQL.) - r2evans