Some other, messier options:
library(tidyr)
library(sqldf)
newdf <- gather(df, year, code, -id)
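# SQLite fills bare columns (here year) from the row holding min(rowid), i.e. the earliest gathered column;
# [[3]] extracts firstyear as a plain vector ([3] would store a one-column data frame).
# Assigning by position assumes df is sorted by id and every id has at least one 1.
df$firstyear <- sqldf('SELECT min(rowid) rowid, id, year AS firstyear
FROM newdf
WHERE code = 1
GROUP BY id')[[3]]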
library(tidyr)
df2 <- gather(df, year, code, -id)
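# gather() stacks in05, in06, ... one column after another, so after keeping
# only the 1s, the first row for each id holds its earliest year
df2 <- df2[df2$code == 1, 1:2]
df2 <- df2[!duplicated(df2$id), ]
# merge() joins on the shared "id" column and appends "year"
merge(df, df2)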
library(tidyr)
library(dplyr)
newdf <- gather(df, year, code, -id)
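# group_by() + summarise() return one row per id, sorted by id; [[2]] extracts the
# "first" column as a vector instead of a one-column tibble
df$firstyear <- (newdf %>%
  filter(code == 1) %>%
  select(id, year) %>%
  group_by(id) %>%
  summarise(first = first(year)))[[2]]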
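Since tidyr 1.0.0, gather() has been superseded by pivot_longer(). A minimal sketch of the dplyr option rewritten with pivot_longer() and a join by id (so the result no longer depends on df being sorted by id) might look like this; df is reconstructed from the printed output, and first_years is just an illustrative name.

```r
library(tidyr)
library(dplyr)

# Sample data reconstructed from the printed output in this thread
df <- data.frame(
  id   = c("a", "b", "c", "d"),
  in05 = c(1, 0, 0, 1),
  in06 = c(0, 0, 0, 1),
  in07 = c(1, 1, 0, 1),
  in08 = c(0, 1, 1, 1),
  in09 = c(0, 0, 0, 1)
)

first_years <- df %>%
  # One row per (id, year); for each id the year columns keep their original order
  pivot_longer(-id, names_to = "year", values_to = "code") %>%
  filter(code == 1) %>%
  group_by(id) %>%
  # The first remaining row per id is therefore the earliest year with a 1
  summarise(firstyear = first(year))

# Join back by id instead of relying on row positions
df <- left_join(df, first_years, by = "id")
```

An id with no 1 at all simply gets NA from the left join.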
Output:
  id in05 in06 in07 in08 in09 year
1  a    1    0    1    0    0 in05
2  b    0    0    1    1    0 in07
3  c    0    0    0    1    0 in08
4  d    1    1    1    1    1 in05
A cleaner solution, combining plafort's solution with alexis_laz's, is:
names(df) <- c("id", 2005, 2006, 2007, 2008, 2009)
# which.max() returns the position of the first maximum in each row, i.e. the first 1
df$firstyear <- names(df[-1])[apply(df[-1], 1, which.max)]
  id 2005 2006 2007 2008 2009 firstyear
1  a    1    0    1    0    0      2005
2  b    0    0    1    1    0      2007
3  c    0    0    0    1    0      2008
4  d    1    1    1    1    1      2005
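One caveat: which.max() also returns 1 for a row of all zeros, so an id that never has a 1 would silently be reported as 2005. A minimal sketch of a guard, assuming such rows should get NA; the extra id "e" is a hypothetical row added only to show the edge case:

```r
# Sample data from the thread plus a hypothetical all-zero id "e"
df <- data.frame(
  id     = c("a", "b", "c", "d", "e"),
  "2005" = c(1, 0, 0, 1, 0),
  "2006" = c(0, 0, 0, 1, 0),
  "2007" = c(1, 1, 0, 1, 0),
  "2008" = c(0, 1, 1, 1, 0),
  "2009" = c(0, 0, 0, 1, 0),
  check.names = FALSE  # keep "2005" etc. instead of "X2005"
)

m <- as.matrix(df[-1])
# Position of the first maximum in each row (the first 1 when the row has one)
idx <- apply(m, 1, which.max)
# An all-zero row also has its first maximum at position 1, so mask rows with no 1
df$firstyear <- ifelse(rowSums(m == 1) > 0, colnames(m)[idx], NA)
```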
If we'd like to keep the original column names, we can apply the renaming suggested by @David Arenburg inline:
# Assumes df still has its original in05 ... in09 names; gsub() only renames them for the lookup
df$firstYear <- gsub('in', '20', names(df[-1]))[apply(df[-1], 1, which.max)]
  id in05 in06 in07 in08 in09 firstYear
1  a    1    0    1    0    0      2005
2  b    0    0    1    1    0      2007
3  c    0    0    0    1    0      2008
4  d    1    1    1    1    1      2005
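max.col - always there to get us out of a jam. It is quite annoying, though, that by default it breaks ties at random, given that which.max/which.min and friends always return the first match they find. - thelatemail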
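Following up on that comment: max.col() accepts ties.method = "first", which makes it pick the first maximum just as which.max() does, and it works on the whole matrix at once instead of row by row via apply(). A minimal sketch, using df reconstructed from the printed output and the same in05 to 2005 renaming; like which.max(), it would still return column 1 for an all-zero row.

```r
# Sample data reconstructed from the printed output in this thread
df <- data.frame(
  id   = c("a", "b", "c", "d"),
  in05 = c(1, 0, 0, 1),
  in06 = c(0, 0, 0, 1),
  in07 = c(1, 1, 0, 1),
  in08 = c(0, 1, 1, 1),
  in09 = c(0, 0, 0, 1)
)

m <- as.matrix(df[-1])
# ties.method = "first": among columns sharing the row maximum, take the leftmost,
# which for 0/1 rows is the first 1 (the default "random" would pick any tied column)
first_col <- max.col(m, ties.method = "first")
df$firstyear <- sub("^in", "20", colnames(m))[first_col]
```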