Assigning an Rcpp object into an Rcpp List produces duplicates of the last element

Question



I want to turn each row of an Rcpp::CharacterMatrix into an element of an Rcpp::List.

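However, the function I wrote behaves strangely: every entry of the list corresponds to the last row of the matrix. Why is that? Is it related to pointers? Please explain.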

Function:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List char_expand_list(CharacterMatrix A) {
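  CharacterVector B(A.ncol());  // allocated once, outside the loop

  List output;

  for(int i=0;i<A.nrow();i++) {
    for(int j=0;j<A.ncol();j++) {
      B[j] = A(i,j);              // overwrites the same object on every row
    }

    output.push_back(B);          // stores the same object again, not a copy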
  }

  return output;
}

Test matrix:

This is the matrix A passed to the function above.

mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
mat
#     [,1] [,2] [,3]
# [1,] "a"  "a"  "a" 
# [2,] "b"  "b"  "b" 
# [3,] "c"  "c"  "c"

Output:

Given this matrix as input, the function should return a list whose elements are the rows of the matrix:

char_expand_list(mat)
# [[1]]
# [1] "a" "a" "a"
#
# [[2]]
# [1] "b" "b" "b"
#
# [[3]]
# [1] "c" "c" "c"

But I get something different:

char_expand_list(mat)
# [[1]]
# [1] "c" "c" "c"
#
# [[2]]
# [1] "c" "c" "c"
#
# [[3]]
# [1] "c" "c" "c"

As you can see, the last row of the matrix (the "c" row) also appears as the first and second list elements. Why does this happen?

- cryptomanic

1 Answer

Content provided by Stack Overflow.

- coatless · Accepted Answer

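What is happening here is mainly due to how Rcpp objects work. In particular, a CharacterVector behaves like a pointer to a memory location, because it is a thin wrapper around the underlying R object (SEXP). Defining B outside the for loop gives you a single "global" object. Each .push_back(B) stores a reference to that same object rather than a copy of its current values. So when B is updated inside the loop, every entry already stored in the Rcpp::List sees the update. Hence the repeated "c" row in the list.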
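The same aliasing can be reproduced outside R. The sketch below is only an analogy. It assumes std::shared_ptr as a stand-in for the SEXP handle that an Rcpp vector wraps, and the names Handle, Rows, rows_aliased, and rows_fresh are hypothetical, not part of Rcpp.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Analogy only: std::shared_ptr stands in for the SEXP handle an Rcpp vector
// wraps. Copying a Handle copies the pointer, not the strings it points to.
using Handle = std::shared_ptr<std::vector<std::string>>;
using Rows = std::vector<std::vector<std::string>>;

// Mirrors the question: one buffer created outside the loop, and the same
// handle stored on every iteration.
std::vector<Handle> rows_aliased(const Rows& rows) {
    Handle b = std::make_shared<std::vector<std::string>>();
    std::vector<Handle> out;
    for (const auto& row : rows) {
        *b = row;          // overwrite the shared buffer in place
        out.push_back(b);  // every element points at the same buffer
    }
    return out;
}

// Mirrors Option 1 below: a fresh buffer per iteration, so each stored
// handle owns its own data.
std::vector<Handle> rows_fresh(const Rows& rows) {
    std::vector<Handle> out;
    out.reserve(rows.size());
    for (const auto& row : rows) {
        out.push_back(std::make_shared<std::vector<std::string>>(row));
    }
    return out;
}
```

With the rows {"a","a","a"}, {"b","b","b"}, {"c","c","c"}, rows_aliased returns three handles to one buffer holding the last row. This is the "c" "c" "c" output from the question. rows_fresh returns the three distinct rows.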

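Using .push_back() on any Rcpp data type is a very, very bad idea, because you end up repeatedly copying an ever-growing object. Rcpp data types hide the SEXP that controls the underlying R object. An R vector cannot grow in place, so each push_back has to create a new SEXP and copy the existing contents into it. Instead, try one of the following approaches: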

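1. Move the creation of the Rcpp::CharacterVector inside the first for loop, and preallocate the Rcpp::List.
2. Switch to C++ standard library objects only, and convert to the appropriate type at the end:
   - use a std::list of std::vector<T>, with T being e.g. std::string;
   - return the correct object with Rcpp::wrap(x), or change the function's return type from Rcpp::List to std::list<std::vector<T>>.
3. Preallocate the Rcpp::List and store std::vector<T> elements (e.g. T = std::string).
4. Preallocate the Rcpp::List and clone() the Rcpp object before storing it in the list.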
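To make the cost of repeated .push_back() concrete, here is a small model in plain C++. It assumes a container that cannot grow in place, so every append allocates a new container of size n + 1 and copies the n existing elements. It is not Rcpp's actual source, only the cost pattern described above, and append_by_copy and copies_for_n_appends are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Model of appending to a fixed-size container (like an R vector):
// allocate n + 1 slots, copy the n old elements, then add the new one.
// `copies` accumulates the number of element copies performed.
std::vector<std::string> append_by_copy(const std::vector<std::string>& old,
                                        const std::string& value,
                                        std::size_t& copies) {
    std::vector<std::string> grown(old.size() + 1);
    for (std::size_t k = 0; k < old.size(); ++k) {
        grown[k] = old[k];
        ++copies;
    }
    grown[old.size()] = value;
    return grown;
}

// Total element copies when building an n-element container one append at a time.
std::size_t copies_for_n_appends(std::size_t n) {
    std::vector<std::string> v;
    std::size_t copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v = append_by_copy(v, "x", copies);
    }
    return copies;
}
```

Building n elements this way costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 element copies, so 1000 appends copy 499,500 elements. A preallocated container, as in Options 1, 3, and 4, writes each element once.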

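Option 1

In Option 1, we rearrange the function: the declaration of B moves inside the first loop, the list space is preallocated, and the output list is filled by index as usual.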

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_rearrange(Rcpp::CharacterMatrix A) {
  Rcpp::List output(A.nrow());

  for(int i = 0; i < A.nrow(); i++) {
    Rcpp::CharacterVector B(A.ncol());  // a new R vector on every iteration

    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    output[i] = B;
  }

  return output;
}

Option 2

Here we replace Rcpp::CharacterVector with std::vector<std::string>, and Rcpp::List with std::list<std::vector<std::string>>. At the end, Rcpp::wrap() converts the standard library object into an Rcpp::List.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_std_to_list(Rcpp::CharacterMatrix A) {
  std::vector<std::string> B(A.ncol());

  std::list<std::vector<std::string> > o;

  for(int i = 0 ;i < A.nrow(); i++) {
    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    o.push_back(B);  // std::list stores a copy of B, so reusing B is safe
  }

  return Rcpp::wrap(o);
}

This gives:

mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
char_expand_std_to_list(mat)
# [[1]]
# [1] "a" "a" "a"
#
# [[2]]
# [1] "b" "b" "b"
#
# [[3]]
# [1] "c" "c" "c"
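Because Option 2 does all of its work in standard library types, its core loop can be exercised without R. The sketch below assumes the matrix arrives as its column-major data vector, the layout R uses. In structure(c("a", "b", "c", ...), .Dim = c(3L, 3L)) the first three values fill column 1, which is why each row reads "a" "a" "a" and so on. Element (i, j) therefore sits at index i + j * nrow. The helper name rows_from_column_major is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: split a column-major nrow x ncol matrix (R's storage
// order) into a std::list of rows, following the loop structure of Option 2.
std::list<std::vector<std::string>> rows_from_column_major(
        const std::vector<std::string>& data, std::size_t nrow, std::size_t ncol) {
    std::vector<std::string> b(ncol);        // reusable row buffer
    std::list<std::vector<std::string>> out;
    for (std::size_t i = 0; i < nrow; ++i) {
        for (std::size_t j = 0; j < ncol; ++j) {
            b[j] = data[i + j * nrow];       // element (i, j) in column-major order
        }
        out.push_back(b);                    // stores a copy of b, not a reference
    }
    return out;
}
```

Reusing b across iterations is safe here, unlike in the question, because std::list::push_back stores a copy of the vector rather than a handle to it.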

Option 3

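Alternatively, you can keep the Rcpp::List but declare its size up front, while still using std::vector<T> elements. Assigning a std::vector to a list element converts it into a new R vector, so each element gets its own copy.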

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_vec(Rcpp::CharacterMatrix A) {
  std::vector<std::string> B(A.ncol());

  Rcpp::List o(A.nrow());

  for(int i = 0; i < A.nrow(); i++) {
    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    o[i] = B;  // converted into a new R character vector on assignment
  }

  return o;
}

Option 4

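Finally, with the list space preallocated, explicitly clone the data on every iteration so that each list element gets its own copy.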

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_clone(Rcpp::CharacterMatrix A) {
  Rcpp::CharacterVector B(A.ncol());
  Rcpp::List output(A.nrow());

  for(int i = 0; i < A.nrow(); i++) {

    for(int j = 0; j < A.ncol(); j++) {
      B[j] = A(i, j);
    }

    output[i] = clone(B);  // deep copy: this element no longer shares memory with B
  }

  return output;
}
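clone() works because it allocates a new R object and copies the current values into it. In a shared_ptr analogy (an assumption for illustration, not how Rcpp is implemented), that corresponds to building a new buffer from the working buffer's contents before storing it. The names Handle and rows_cloned are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Analogy only: std::shared_ptr stands in for the SEXP handle an Rcpp vector wraps.
using Handle = std::shared_ptr<std::vector<std::string>>;

// Mirrors Option 4: one working buffer reused across iterations, but a deep
// copy of its current contents is stored each time (the role clone() plays).
std::vector<Handle> rows_cloned(const std::vector<std::vector<std::string>>& rows) {
    Handle b = std::make_shared<std::vector<std::string>>();
    std::vector<Handle> out(rows.size());  // preallocated, like Rcpp::List output(A.nrow())
    for (std::size_t i = 0; i < rows.size(); ++i) {
        *b = rows[i];                                             // overwrite the working buffer
        out[i] = std::make_shared<std::vector<std::string>>(*b);  // deep copy of its contents
    }
    return out;
}
```

Compared with allocating a fresh buffer per iteration, this keeps one working buffer but pays for an extra copy per row.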

Benchmark

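The benchmark results show that Option 1 (rearranging the declaration and preallocating space) performs best. It is followed closely by Option 4, which clones each vector before storing it in the Rcpp::List. With a 3×3 input, however, the median times of all four options differ by less than 2 microseconds.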

library("microbenchmark")
library("ggplot2")

mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))

micro_mat_to_list = 
  microbenchmark(char_expand_list_rearrange(mat),
                 char_expand_std_to_list(mat),
                 char_expand_list_vec(mat),
                 char_expand_list_clone(mat))
micro_mat_to_list
# Unit: microseconds
#                             expr   min     lq    mean median     uq    max neval
#  char_expand_list_rearrange(mat) 1.501 1.9255 3.22054 2.1965 4.8445  6.797   100
#     char_expand_std_to_list(mat) 2.869 3.2035 4.90108 3.7740 6.4415 27.627   100
#        char_expand_list_vec(mat) 1.948 2.2335 3.83939 2.7130 5.2585 24.814   100
#      char_expand_list_clone(mat) 1.562 1.9225 3.60184 2.2370 4.8435 33.965   100