Looking for a faster way to handle cell and vector operations

Question

I have a cell array in which each element holds several indices used to access a vector. For example,

C ={ [1 2 3] , [4 5],  [6], [1 8 9 12 20]}

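This is just an example. In the real case, C has 10^4 to 10^6 elements, and each element is a vector of 1 to 1000 entries. I need to use each element as a set of indices into the corresponding vector. I am currently using a loop to compute the mean of the vector elements specified by each cell.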

 for n=1:size(C,1)
   x = mean(X(C{n}));
   % store x somewhere
 end

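Here X is a large vector with 10000 elements. The loop works, but I wonder whether the same task can be done without one. I ask because the code above has to run many times, and it is slow with the loop.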

- user1285419

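By coordinates, do you mean indices? Also, for each cell of C, the result should be x(n) rather than x(i,j), right? And would the loop be for n=1:size(C,2)? n=1:numel(C) might be more appropriate there. - Divakar

One more question: what is the largest size a cell in C can have? For example, in the sample given here the largest cell has 5 elements, and it is the last cell. - Divakar

I strongly recommend profiling your code to check where the actual bottleneck is. - bdecaf

Glad to get the accepted tick, but I would really like to hear what kind of speedup you got on your end with the approaches proposed in the solution, along with any feedback or comments. Also, could you clarify the x(n) versus x question? - Divakar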

1 Answer

- Divakar · Answer 1

Approach 1

C_num = double(char(C{:})); %// 2D numeric array from C; rows for shorter cells are
             %// padded with 32, the ASCII code of the space character
             %// NOTE: a genuine index of 32 in C would be mistaken for padding

mask = C_num==32; %// get mask for the padded positions
C_num(mask)=1; %// replace the padded entries with ones, so that we
                %// can index into X without throwing any out-of-range error

X_array = X(C_num); %// 2D array obtained after indexing into X with C_num
X_array(mask) = nan; %// set the entries at the padded positions to NaN
x = nanmean(X_array,2); %// final output: mean of each row, ignoring the NaNs
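
To make the padding step in Approach 1 concrete, here is a small sketch on the sample C from the question. The variable has32 and the guard around it are additions for illustration, not part of the answer; they flag the case where a real index equals 32 and would therefore be confused with the padding value.

```matlab
% Toy data for illustration only (the sample C from the question).
C = {[1 2 3], [4 5], [6], [1 8 9 12 20]};

% char() turns each numeric vector into one row of a character matrix and
% pads the shorter rows with spaces; double() recovers the numeric codes.
C_num = double(char(C{:}));
% C_num is
%    1   2   3  32  32
%    4   5  32  32  32
%    6  32  32  32  32
%    1   8   9  12  20

% Guard (an addition, not part of the answer): a genuine index of 32 would
% be indistinguishable from padding, so check for it up front.
has32 = any(cellfun(@(v) any(v == 32), C));
```

If has32 comes out true for your data, Approach 2 or 3 is the safer choice, since those build the mask from the cell lengths instead of from a sentinel value.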

Approach 2

lens = cellfun('length',C); %// Lengths of each cell in C
maxlens = max(lens); %// max of those lengths

%// Create a mask array with no. of rows as maxlens and columns as no. of cells. 
%// In each column, we would put numbers from each cell starting from top until
%// the number of elements in that cell. The ones(true) in this mask would be the 
%// ones where those numbers are to be put and zeros(false) otherwise.
mask = bsxfun(@le,[1:maxlens]',lens) ; %//'

C_num = ones(maxlens,numel(lens)); %// An array where the numbers from C are to be put

C_num(mask) = [C{:}]; %// Put those numbers from C in C_num.
  %// NOTE: For performance you can also try out: double(sprintf('%s',C{:}))
X_array = X(C_num); %// Get the corresponding X elements
X_array(mask==0) = nan; %// Set the invalid locations to be NaNs
x = nanmean(X_array,1); %// Get the desired output of mean values for each cell (column-wise)
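
As a rough illustration of what the mask and the padded index array in Approach 2 look like, the following sketch runs the first few steps on the sample C from the question. It assumes C is a row cell array of row vectors, as in that sample.

```matlab
% Toy data for illustration only (the sample C from the question).
C = {[1 2 3], [4 5], [6], [1 8 9 12 20]};

lens    = cellfun('length', C);            % [3 2 1 5]
maxlens = max(lens);                       % 5

% Column j of the mask is true in its first lens(j) rows.
mask = bsxfun(@le, (1:maxlens)', lens);

% Fill the true positions column by column with the indices from C;
% the remaining positions keep the harmless dummy index 1.
C_num = ones(maxlens, numel(lens));
C_num(mask) = [C{:}];
% C_num is
%    1   4   6   1
%    2   5   1   8
%    3   1   1   9
%    1   1   1  12
%    1   1   1  20
```

Each column now corresponds to one cell of C, which is why the final mean is taken down the columns.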

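Approach 3

This approach is almost the same as Approach 2, with a few changes at the end to avoid nanmean.

So, replace the last two lines of Approach 2 with the following:

X_array(mask==0) = 0;
x = sum(X_array,1)./lens;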
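
Putting Approach 3 together end to end, a minimal sketch on toy data might look like the following. The vector X here is made up for illustration (X(k) = 10*k, so the means are easy to check by hand), and the comparison against the plain loop is an added sanity check, not part of the answer.

```matlab
% Toy data for illustration only.
C = {[1 2 3], [4 5], [6], [1 8 9 12 20]};
X = (1:20) * 10;                           % hypothetical data vector

% Approach 3: padded index matrix, zero out the padding, divide by lengths.
lens    = cellfun('length', C);
maxlens = max(lens);
mask    = bsxfun(@le, (1:maxlens)', lens);
C_num   = ones(maxlens, numel(lens));
C_num(mask) = [C{:}];
X_array = X(C_num);                        % same shape as C_num
X_array(~mask) = 0;                        % padded entries contribute nothing
x = sum(X_array, 1) ./ lens;               % [20 45 60 100]

% Reference result from the straightforward loop.
x_loop = zeros(1, numel(C));
for n = 1:numel(C)
    x_loop(n) = mean(X(C{n}));
end
```

Because the padded entries are zeroed and the sum is divided by the true cell lengths, no NaN handling (and no Statistics Toolbox function) is needed.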