I have the following data frame in R:
ID = c(rep(1,5),rep(2,3),rep(3,2),rep(4,6));ID
VAR = c("A","A","A","A","B","C","C","D",
"E","E","F","A","B","F","C","F");VAR
CATEGORY = c("ANE","ANE","ANA","ANB","ANE","BOO","BOA","BOO",
"CAT","CAT","DOG","ANE","ANE","DOG","FUT","DOG");CATEGORY
DATA = data.frame(ID,VAR,CATEGORY);DATA
It looks like the following table:
ID | VAR | CATEGORY |
---|---|---|
1 | A | ANE |
1 | A | ANE |
1 | A | ANA |
1 | A | ANB |
1 | B | ANE |
2 | C | BOO |
2 | C | BOA |
2 | D | BOO |
3 | E | CAT |
3 | E | CAT |
4 | F | DOG |
4 | A | ANE |
4 | B | ANE |
4 | F | DOG |
4 | C | FUT |
4 | F | DOG |

I want to reduce it to one row per ID that keeps the most frequent VAR and CATEGORY for that ID, like this:

ID | TEXTS | category |
---|---|---|
1 | A | ANE |
2 | C | BOO |
3 | E | CAT |
4 | F | DOG |
How can I do this in R? Note that this is only a small sample. My actual data frame has 850,000 rows and 14,000 unique IDs.
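One possible approach, sketched here in base R, is to compute the most frequent value of each column within each ID. The helper name `first_mode` is hypothetical and not part of the question. It is assumed that VAR and CATEGORY may be summarised independently of each other, and that a tie goes to the value that appears first.

```r
# Sample data from the question
ID <- c(rep(1, 5), rep(2, 3), rep(3, 2), rep(4, 6))
VAR <- c("A", "A", "A", "A", "B", "C", "C", "D",
         "E", "E", "F", "A", "B", "F", "C", "F")
CATEGORY <- c("ANE", "ANE", "ANA", "ANB", "ANE", "BOO", "BOA", "BOO",
              "CAT", "CAT", "DOG", "ANE", "ANE", "DOG", "FUT", "DOG")
DATA <- data.frame(ID, VAR, CATEGORY, stringsAsFactors = FALSE)

# Hypothetical helper: most frequent value of x.
# Ties are resolved in favour of the value that occurs first in x.
first_mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of appearance
  ux[which.max(tabulate(match(x, ux)))]  # which.max returns the first maximum
}

# One row per ID, with the mode of each column computed separately
RESULT <- aggregate(DATA[c("VAR", "CATEGORY")], by = DATA["ID"], FUN = first_mode)
RESULT
```

On the sample data this gives A/ANE, C/BOO, E/CAT and F/DOG for IDs 1 to 4. With 850,000 rows, a grouped operation in a package such as data.table or dplyr may be faster, though that has not been measured here.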
In the Modeind function, if there is a tie, it only selects the value that appears first. I don't know whether they need sampling in that case. - akrun
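The comment distinguishes between taking the first tied value and sampling among the tied values. If sampling were wanted, it could look roughly like the sketch below. The helper name `random_mode` is hypothetical, and the function assumes a character input, since it returns the value as a character string.

```r
# Hypothetical helper: most frequent value of x, with ties broken at random.
random_mode <- function(x) {
  counts <- table(x)                            # frequency of each distinct value
  tied <- names(counts)[counts == max(counts)]  # all values sharing the top count
  tied[sample.int(length(tied), 1)]             # pick one of them uniformly
}

set.seed(1)                        # only to make the random choice repeatable
x <- c("B", "A", "A", "B", "C")    # "A" and "B" are tied with two occurrences each
random_mode(x)                     # returns either "A" or "B"
```

When there is no tie the function always returns the single most frequent value, so it only differs from a first-occurrence rule on groups that actually contain ties.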