Row operations in data.table

Question


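I am trying to compute simple row-wise sums and means with data.table, but I am getting unexpected results. I followed section 2 of the data.table FAQ. I did find an approach that works, but I am not sure why the approach based on FAQ section 2 does not. It gives the wrong result: genesum simply repeats the value of the first numeric column.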

dt[, genesum:=lapply(.SD,sum), by=gene]
head(dt)

      gene      TCGA_04_1348      TCGA_04_1362   genesum  
  1:    A1BG          0.94565          0.70585  0.94565   
  2: A1BG-AS          0.97610          1.15850  0.97610   
  3:    A1CF          0.00000          0.02105  0.00000   
  4:   A2BP1          0.00300          0.04150  0.00300   
  5:   A2LD1          4.57975          5.02820  4.57975  
  6:     A2M         60.37320         36.09715 60.37320

This gives the result I expect:

dt[, genesum:=apply(dt[,-1, with=FALSE],1, sum)]
head(dt)

       gene     TCGA_04_1348       TCGA_04_1362 genesum
  1:    A1BG          0.94565          0.70585  1.65150
  2: A1BG-AS          0.97610          1.15850  2.13460
  3:    A1CF          0.00000          0.02105  0.02105
  4:   A2BP1          0.00300          0.04150  0.04450
  5:   A2LD1          4.57975          5.02820  9.60795
  6:     A2M         60.37320         36.09715 96.47035

I have many more columns and rows; this is only a subset. Does this have something to do with how I set the key?

tables()
 NAME        NROW    MB COLS                                               KEY                                             
 [1,] dt     20,785  2  gene,TCGA_04_1348_01A,TCGA_04_1362_01A,genesum    gene

- sahir

2 Answers

Here is an alternative (based on this Stack Overflow question):

dt[ ,  genesum := sum(.SD[, -1, with=FALSE]), by = 1:NROW(dt) ]

Another option:

# OR... you can create a column with row positions and apply your function by row
dt[, rowpos := .I]
dt[ ,  genesum := sum(.SD[, -1, with=FALSE]), by = rowpos]
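
A minimal sketch of the same per-row grouping idea on a toy table may make it easier to check. The gene names, the sample columns `s1` and `s2`, and the `sample_cols` vector are made up for illustration. Listing the numeric columns in `.SDcols` is an assumption on my part, chosen so that helper or result columns added later are not summed by accident.

```r
library(data.table)

# Toy data with hypothetical values: one row per gene
dt <- data.table(
  gene = c("g1", "g2", "g3"),
  s1   = c(1, 2, 3),
  s2   = c(10, 20, 30)
)

# Name the numeric columns explicitly (illustrative helper variable)
sample_cols <- c("s1", "s2")

# Group by row index: every group is a single row, and .SD holds only
# the columns listed in .SDcols
dt[, genesum  := sum(unlist(.SD)),  by = seq_len(nrow(dt)), .SDcols = sample_cols]
dt[, genemean := mean(unlist(.SD)), by = seq_len(nrow(dt)), .SDcols = sample_cols]

print(dt)
```

Creating one group per row can be slow on a table with tens of thousands of rows, so a vectorised function such as `rowSums` is usually the better choice when it applies.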

- rafa.pereira

- Steve Lianoglou · Accepted Answer

A few things:

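`dt[, genesum:=lapply(.SD,sum), by=gene]` and `dt[, genesum:=apply(dt[, -1, with=FALSE], 1, sum)]` are quite different:
- `dt[, genesum:=lapply(.SD,sum), by=gene]` loops over the columns of the `.SD` data.table and sums each column within each gene group. Because every gene is a single row here, each of those sums is just the original value, and judging by your output only the first one ends up in genesum.
- `dt[, genesum:=apply(dt[, -1, with=FALSE], 1, sum)]` loops over the rows (i.e. `apply(x, 1, fun)` applies `fun` to every row of `x`).

I think you can get what you want by calling `rowSums`, like so: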
```
dt[, genesum := rowSums(dt[, -1, with=FALSE])]
```

Is that what you were after?
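
To see the column-wise versus row-wise difference side by side, here is a small sketch on made-up data. The column names `s1` and `s2` and the `sample_cols` vector are hypothetical, and using `.SDcols` to pick the numeric columns is my own variation on the `rowSums` call above, not something taken from the question.

```r
library(data.table)

# Toy data with hypothetical values: one row per gene
dt <- data.table(
  gene = c("g1", "g2"),
  s1   = c(1, 2),
  s2   = c(10, 20)
)
sample_cols <- c("s1", "s2")

# lapply(.SD, sum) produces one sum per column (down the rows)
col_sums <- dt[, lapply(.SD, sum), .SDcols = sample_cols]

# Grouped by gene with one row per gene, each per-column "sum" is just
# the original value, so nothing is added across columns
per_gene <- dt[, lapply(.SD, sum), by = gene, .SDcols = sample_cols]

# rowSums and rowMeans work across the columns within each row
dt[, genesum  := rowSums(.SD),  .SDcols = sample_cols]
dt[, genemean := rowMeans(.SD), .SDcols = sample_cols]

print(col_sums)
print(per_gene)
print(dt)
```

Restricting `.SD` with `.SDcols` also keeps the second call from folding the freshly added genesum column into the mean.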