How to convert a data frame into a single row (R)

3

I have a data frame that looks like this:

   days target probability
1    75   0.80   0.9060341
2   100   0.90   0.75

df <- structure(list(days = c(75, 100, 120, 150, 200, 300, 75, 100, 
120, 150, 200, 300, 75, 100, 120, 150, 200, 300, 75, 100, 120, 
150, 200, 300, 75, 100, 120, 150, 200, 300, 75, 100, 120, 150, 
200, 300), target = c(0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.9, 0.9, 
0.9, 0.9, 0.9, 0.9, 1, 1, 1, 1, 1, 1, 1.05, 1.05, 1.05, 1.05, 
1.05, 1.05, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.2, 1.2, 1.2, 1.2, 
1.2, 1.2), probability = c(0.90603410539241, 0.90603410539241, 
0.90603410539241, 0.90603410539241, 0.90603410539241, 0.904213051602258, 
0.733995206180212, 0.733995206180212, 0.733995206180212, 0.733995206180212, 
0.733995206180212, 0.731795453278156, 0.512082243536284, 0.512082243536284, 
0.512082243536284, 0.512082243536284, 0.512082243536284, 0.511492313399902, 
0.390943562448882, 0.390943562448882, 0.390943562448882, 0.390943562448882, 
0.390943562448882, 0.391451116324459, 0.282452594645645, 0.282452594645645, 
0.282452594645645, 0.282452594645645, 0.282452594645645, 0.283766337160544, 
0.106271449405461, 0.106271449405461, 0.106271449405461, 0.106271449405461, 
0.106271449405461, 0.107778317673786)), .Names = c("days", "target", 
"probability"), class = "data.frame", row.names = c(1L, 2L, 3L, 
4L, 5L, 7L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L, 20L, 
21L, 23L, 25L, 26L, 27L, 28L, 29L, 31L, 33L, 34L, 35L, 36L, 37L, 
40L, 43L, 44L, 45L, 46L, 47L, 49L))

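I want to write a single row to a CSV file, with headers like the following:

day75_target0.80, day100_target0.9, and so on; the value under each header should simply be the corresponding probability.

Any ideas?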


4
Please remove the + signs from the dput output. - David Arenburg
2 Answers

1

Consider a base R approach: concatenate the two fields into one header string, then transpose the data frame:

# CONCATENATE DAYS AND TARGET FIELDS INTO ONE HEADER STRING
newdf <- data.frame(daystarget = paste0("day", df$days, "_target", df$target),
                    probability = df$probability, stringsAsFactors = FALSE)
# ROUND PROBABILITY TO ONE DIGIT
newdf$probability <- round(newdf$probability, 1)

# TRANSPOSE DATA FRAME (t() RETURNS A CHARACTER MATRIX HERE)
finaldf <- data.frame(t(newdf), stringsAsFactors = FALSE)
# RENAME COLUMNS TO FIRST ROW
names(finaldf) <- unlist(finaldf[1, ])
# REMOVE PREVIOUS FIRST ROW
finaldf <- finaldf[2, ]
# RESET ROW NAMES
row.names(finaldf) <- 1:nrow(finaldf)

write.csv(finaldf, "FinalDF.csv", row.names = FALSE)

#  day75_target0.8   day100_target0.8   day120_target0.8  day150_target0.8  ... 
#1             0.9                0.9                0.9               0.9  ...         
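
A shorter variant of the same idea, as a minimal sketch: because the result is a single row, the probabilities can be transposed directly and the headers assigned afterwards, which keeps the values numeric and unrounded. The small df below is only a stand-in for the question's data (two days for two targets), and out_file is a hypothetical path.

```r
# Small stand-in for the question's df (assumption: same column names)
df <- data.frame(
  days        = c(75, 100, 75, 100),
  target      = c(0.8, 0.8, 0.9, 0.9),
  probability = c(0.90603410539241, 0.90603410539241,
                  0.733995206180212, 0.733995206180212)
)

# Build one header per row, e.g. "day75_target0.8"
headers <- paste0("day", df$days, "_target", df$target)

# t() of a numeric vector is a 1 x n matrix, so this is a one-row data frame
onerow <- as.data.frame(t(df$probability))
names(onerow) <- headers

# out_file is a hypothetical output path
out_file <- file.path(tempdir(), "one_row.csv")
write.csv(onerow, out_file, row.names = FALSE)
```

Apply round() to df$probability first if the rounded values are wanted in the file.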

0

This isn't the most attractive thing to do to your poor data, but taking the request at face value, it is easy to do with the tidyverse.

library(tidyverse)
#first create the columns:
> df %>% unite(daytarg, days, target, sep = "_target") %>% head
        daytarg probability
1  75_target0.8   0.9060341
2 100_target0.8   0.9060341
3 120_target0.8   0.9060341
4 150_target0.8   0.9060341
5 200_target0.8   0.9060341
7 300_target0.8   0.9042131

It seems sensible to check that the resulting column names are unique.

> df %>% unite(daytarg, days, target, sep = "_target") %>% count(daytarg) %>% filter(n > 1)
# A tibble: 0 x 2
# ... with 2 variables: daytarg <chr>, n <int>
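
For reference, the same uniqueness check can be sketched in base R with anyDuplicated(), which returns 0 when no value repeats. The df here is a small stand-in for the question's data, and keys, dup_index and counts are names chosen only for illustration.

```r
# Small stand-in for the question's df (assumption: same column names)
df <- data.frame(days   = c(75, 100, 75, 100),
                 target = c(0.8, 0.8, 0.9, 0.9))

# Same combined key as the unite() call
keys <- paste0(df$days, "_target", df$target)

# 0 if every key is unique, otherwise the index of the first duplicate
dup_index <- anyDuplicated(keys)

# Keys that occur more than once, if any
counts <- table(keys)
counts[counts > 1]
```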

OK, good. Now we can add a spread:

> df %>% 
    unite(daytarg, days, target, sep = "_target") %>% 
    spread(daytarg, probability) %>% 
    write_csv("output.csv")

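So all this does is build the desired name from the required columns, then turn each name into a column with the probability as its value. Note, however, that for anything like this the combinations must be unique.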
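
Two details may be worth adjusting. The unite() call produces names such as 75_target0.8, without the day prefix the question asks for, and spread() has since been superseded in tidyr by pivot_wider(). A minimal sketch of that variant follows, assuming dplyr, tidyr (1.0.0 or later) and readr are installed; the df is a small stand-in for the question's data and the output path is hypothetical.

```r
library(dplyr)
library(tidyr)
library(readr)

# Small stand-in for the question's df (assumption: same column names)
df <- data.frame(
  days        = c(75, 100, 75, 100),
  target      = c(0.8, 0.8, 0.9, 0.9),
  probability = c(0.90603410539241, 0.90603410539241,
                  0.733995206180212, 0.733995206180212)
)

wide <- df %>%
  # Build the full header, including the "day" prefix
  mutate(daytarg = paste0("day", days, "_target", target)) %>%
  select(daytarg, probability) %>%
  # One column per header, with the probability as its value
  pivot_wider(names_from = daytarg, values_from = probability)

# Hypothetical output path
write_csv(wide, file.path(tempdir(), "output_wide.csv"))
```

pivot_wider() keeps the columns in order of first appearance, whereas spread() sorts them by name.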
