How to convert monadic data into dyadic data in R (country-year to pair-year)?

Question

My data is organized by country-year, with an ID for each dyadic relationship. I would like to organize it by dyad-year instead.

This is how my data is organized:

     dyadic_id country_codes year
  1          1           200 1990
  2          1            20 1990
  3          1           200 1991
  4          1            20 1991
  5          2           300 1990
  6          2            10 1990
  7          3           100 1990
  8          3            10 1990
  9          4           500 1991
  10         4           200 1991

This is how I would like the data to be organized:

  dyadic_id_want country_codes_1 country_codes_2 year_want
1              1             200              20      1990
2              1             200              20      1991
3              2             300              10      1990
4              3             100              10      1990
5              4             500             200      1991

Here is reproducible code:

dyadic_id<-c(1,1,1,1,2,2,3,3,4,4)
country_codes<-c(200,20,200,20,300,10,100,10,500,200)
year<-c(1990,1990,1991,1991,1990,1990,1990,1990,1991,1991)
mydf<-as.data.frame(cbind(dyadic_id,country_codes,year))

I would like to make mydf look like my_df_i_want:

dyadic_id_want<-c(1,1,2,3,4)
country_codes_1<-c(200,200,300,100,500)
country_codes_2<-c(20,20,10,10,200)
year_want<-c(1990,1991,1990,1990,1991)
my_df_i_want<-as.data.frame(cbind(dyadic_id_want,country_codes_1,country_codes_2,year_want))

- user46257

1 Answer

- akrun · Accepted Answer

We can use different approaches to convert the data from 'long' to 'wide' format. Two of them are described below.

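Using 'data.table', we convert the 'data.frame' to a 'data.table' (setDT(mydf)) and create a sequence column ('ind') that numbers the rows within each 'dyadic_id' and 'year' group. Then we use dcast to convert the dataset from 'long' to 'wide' format, with 'ind' supplying the suffix of the new column names.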

library(data.table)
setDT(mydf)[, ind:= 1:.N, by = .(dyadic_id, year)]
dcast(mydf, dyadic_id+year~ paste('country_codes', ind, sep='_'), value.var='country_codes')
#   dyadic_id year country_codes_1 country_codes_2
#1:         1 1990             200              20
#2:         1 1991             200              20
#3:         2 1990             300              10
#4:         3 1990             100              10
#5:         4 1991             500             200
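
As a variation on the above, the within-group counter can be generated inside the dcast formula with data.table's rowid() helper, which avoids adding an 'ind' column to mydf. This is only a sketch: the names dt and wide are made up for illustration, and the data is rebuilt here so the snippet runs on its own.

```r
library(data.table)

# Rebuild the example data from the question (dt is an illustrative name)
dt <- data.table(
  dyadic_id     = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4),
  country_codes = c(200, 20, 200, 20, 300, 10, 100, 10, 500, 200),
  year          = c(1990, 1990, 1991, 1991, 1990, 1990, 1990, 1990, 1991, 1991)
)

# rowid() numbers the rows within each dyadic_id/year group;
# prefix turns those numbers into the new column names
wide <- dcast(
  dt,
  dyadic_id + year ~ rowid(dyadic_id, year, prefix = "country_codes_"),
  value.var = "country_codes"
)
```

Which country ends up in country_codes_1 depends only on row order within each group, so sort the data first if a particular ordering matters.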

Or, using dplyr/tidyr, we can do the same thing: group by 'dyadic_id' and 'year', create an 'ind' column (mutate(...)), and use spread from tidyr to convert to 'wide' format.

library(dplyr)
library(tidyr)
mydf %>% 
    group_by(dyadic_id, year) %>%
    mutate(ind= paste0('country_codes', row_number())) %>% 
    spread(ind, country_codes)
#    dyadic_id  year country_codes1 country_codes2
#       (dbl) (dbl)          (dbl)          (dbl)
#1         1  1990            200             20
#2         1  1991            200             20
#3         2  1990            300             10
#4         3  1990            100             10
#5         4  1991            500            200
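
In current tidyr releases spread is superseded by pivot_wider, so the same reshaping might be written as follows. This is a minimal sketch rather than part of the original answer: the name wide is made up for illustration, and the data is rebuilt here so the snippet is self-contained.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
mydf <- data.frame(
  dyadic_id     = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4),
  country_codes = c(200, 20, 200, 20, 300, 10, 100, 10, 500, 200),
  year          = c(1990, 1990, 1991, 1991, 1990, 1990, 1990, 1990, 1991, 1991)
)

wide <- mydf %>%
  group_by(dyadic_id, year) %>%
  mutate(ind = row_number()) %>%   # 1, 2 within each dyad-year
  ungroup() %>%
  pivot_wider(
    names_from   = ind,
    values_from  = country_codes,
    names_prefix = "country_codes_"  # gives country_codes_1, country_codes_2
  )
```

Using names_prefix keeps the underscore in the column names, matching the layout requested in the question.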