Counting consecutive numbers in matrix columns

Question

5

I have a matrix of 1s and -1s with randomly interspersed 0s:

%// create matrix of 1s and -1s

hypwayt = randn(10,5);
hypwayt(hypwayt > 0) =  1;
hypwayt(hypwayt < 0) = -1;

%// create numz random indices at which to insert 0s (pairs of indices may  
%// repeat, so final number of inserted zeros may be < numz)

numz = 15;
a = 1;
b = 10;
r = round((b-a).*rand(numz,1) + a);
s = round((5-1).*rand(numz,1) + a);

for nx = 1:numz
    hypwayt(r(nx),s(nx)) = 0;
end

Input:

hypwayt =

-1     1     1     1     1
 1    -1     1     1     1
 1    -1     1     0     0
-1     1     0    -1     1
 1    -1     0     0     0
-1     1    -1    -1    -1
 1     1     0     1    -1
 0     1    -1     1    -1
-1     0     1     1     0
 1    -1     0    -1    -1

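I want to count how many times nonzero elements repeat consecutively within a column, producing the expected output shown below.

The basic idea (courtesy of @rayryeng) is: for each column independently, every time you encounter a unique number you start a cumulative running counter, and it is incremented every time you encounter the same number as the previous one. As soon as you encounter a new number, the counter is reset to 1, except when you encounter a 0, in which case the counter is 0.

Expected output: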

hypwayt_runs =

 1     1     1     1     1
 1     1     2     2     2
 2     2     3     0     0
 1     1     0     1     1
 1     1     0     0     0
 1     1     1     1     1
 1     2     0     1     2
 0     3     1     2     3
 1     0     1     3     0
 1     1     0     1     1
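
As an illustration of the rule described above, here is a minimal sketch that applies it to a single column (column 3 of the example input). The variable names `col` and `runs` are hypothetical and not part of the original question; this only traces the rule and is not meant as the cleanest solution.

```matlab
% Sketch of the counting rule on one column (column 3 of the example input).
col  = [1; 1; 1; 0; 0; -1; 0; -1; 1; 0];
runs = zeros(size(col));          % zeros stay zero by default

for k = 1:numel(col)
    if col(k) == 0
        continue                  % a zero always maps to 0
    end
    if k > 1 && col(k) == col(k-1)
        runs(k) = runs(k-1) + 1;  % same nonzero value as the row above: extend the run
    else
        runs(k) = 1;              % first row or a new value: restart at 1
    end
end

disp(runs.')                      % compare with column 3 of hypwayt_runs
```

The resulting vector matches the third column of hypwayt_runs above: 1 2 3 0 0 1 0 1 1 0.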

What is the cleanest way to accomplish this?

- siegel

3

Could you explain a bit more how hypwayt_runs is computed? I don't see the pattern (unfortunately). - rayryeng

5

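After staring at it for 30 minutes, I get it now. For each column independently, every time you hit a unique number, you start incrementing a cumulative running counter, and it increments every time you hit the same number as the previous one. As soon as you hit a new number, it gets reset to 1... except when you hit a 0, in which case it is 0. - rayryeng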

4 Answers

2

I guess there should be a better way, but this should work.

Using cumsum, diff, accumarray and bsxfun:

%// A is the input matrix (hypwayt in the question)
%// doing the 'diff' along default dim to get the adjacent equality
out = [ones(1,size(A,2));diff(A)];

%// Putting all other elements other than zero to 1 
out(find(out)) = 1;

%// getting all the indexes of 0 elements
ind = find(out == 0);

%// doing 'diff' on indices to find adjacent indices
out1 = [0;diff(ind)];

%// Putting all those elements which are 1 to zero and rest to 1
out1 = 0.*(out1 == 1) + out1 ~= 1;

%// counting each unique group's number of elements
out1 = accumarray(cumsum(out1),1);

%// Creating a mask for next operation
mask = bsxfun(@le, (1:max(out1)).',out1.');

%// Doing colon operation from 2 to maxsize
out1 = bsxfun(@times,mask,(2:size(mask,1)+1).');    %'

%// Assign the values from the out1 to corresponding indices of out
out(ind) = out1(mask);

%// finally replace all elements of A which were zero to zero
out(A==0) = 0

Results:

Input:

>> A

A =

-1     1     1     1     1
 1    -1     1     1     1
 1    -1     1     0     0
-1     1     0    -1     1
 1    -1     0     0     0
-1     1    -1    -1    -1
 1     1     0     1    -1
 0     1    -1     1    -1
-1     0     1     1     0
 1    -1     0    -1    -1

Output:

>> out

out =

 1     1     1     1     1
 1     1     2     2     2
 2     2     3     0     0
 1     1     0     1     1
 1     1     0     0     0
 1     1     1     1     1
 1     2     0     1     2
 0     3     1     2     3
 1     0     1     3     0
 1     1     0     1     1
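
To make the mask step in the code above easier to follow, here is a small sketch that isolates it. The vector `counts` is a hypothetical stand-in for the result of the accumarray step (how many "same as previous" elements each run contains); the rest mirrors the two bsxfun calls from the answer.

```matlab
% Hypothetical run sizes: three runs with 2, 1 and 3 repeated elements.
counts = [2; 1; 3];

% Column j of the mask has counts(j) leading true entries.
mask = bsxfun(@le, (1:max(counts)).', counts.');

% Row i of the mask is scaled by i+1, so each run counts 2, 3, 4, ...
vals = bsxfun(@times, mask, (2:size(mask,1)+1).');

% Reading the masked values column by column yields the counters in order.
seq = vals(mask).';
disp(seq)
```

Each run restarts at 2 because the first element of a run (the one that differs from its predecessor) already holds the value 1 in out.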

- Santhan Salai

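I couldn't solve this one elegantly. +1 - rayryeng

@rayryeng Thanks. I feel my approach is verbose, and there should be a better way to do this elegantly with less code :) - Santhan Salai

@rayryeng - Since when is "elegant" synonymous with "not using loops"? It seems this problem could be solved easily with a nested loop and a few if statements (increment, reset, or zero the counter). Since the OP didn't mention performance being an issue, I'd argue a loop-based solution would probably be more readable... it basically amounts to putting your second comment on the question into code. To be fair, I haven't tried solving it, and it may be harder than I describe... - Dev-iL

@Dev-iL - I wrote an answer using loops, but I didn't find it elegant. If you want to solve this with loops, feel free to try, but when I attempted it I didn't like the way I wrote it. Good luck! By the way, I can post what I did if you want, but it may be some of the ugliest code I've ever written. Haha - rayryeng

@Dev-iL - I posted an answer using loops. Readable as it is, I don't like the solution. You have to iterate over every element one at a time. - rayryeng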

1

Building on rayryeng's answer, here is my take on a loop-based solution.

Input:

hypwayt = [
    -1     1     1     1     1
     1    -1     1     1     1
     1    -1     1     0     0
    -1     1     0    -1     1
     1    -1     0     0     0
    -1     1    -1    -1    -1
     1     1     0     1    -1
     0     1    -1     1    -1
    -1     0     1     1     0
     1    -1     0    -1    -1 ];

expected_out = [
     1     1     1     1     1
     1     1     2     2     2
     2     2     3     0     0
     1     1     0     1     1
     1     1     0     0     0
     1     1     1     1     1
     1     2     0     1     2
     0     3     1     2     3
     1     0     1     3     0
     1     1     0     1     1 ];

The actual code:

CNT_INIT = 2;             %// a constant representing an initialized counter
out = hypwayt;            %// "preallocation"
out(2:end,:) = diff(out); %// ...we'll deal with the top row later
hyp_nnz = hypwayt~=0;     %// nonzero mask for later brevity
cnt = CNT_INIT;           %// first initialization of the counter

for ind1 = 2:numel(out)
    switch abs(out(ind1))
        case 2 %// switch from -1 to 1 and vice versa:
            out(ind1) = 1;
            cnt = CNT_INIT;
        case 0 %// means we have the same number again:
            out(ind1) = cnt*hyp_nnz(ind1); %//put cnt unless we're zero
            cnt = cnt+1;
        case 1 %// means we transitioned to/from zero:
            out(ind1) = hyp_nnz(ind1); %// was it a nonzero element?
            cnt = CNT_INIT;            
    end
end

%// Finally, take care of the top row:
out(1,:) = hyp_nnz(1,:);

Correctness test:

assert(isequal(out,expected_out))

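I guess it could be simplified further using some "complicated" MATLAB functions, but in my opinion it is already elegant enough :)

Note: the top row of out is computed twice (once in the loop and once at the end), so there is a tiny inefficiency from computing those values twice. However, this allows the whole logic to live in a single loop operating on numel(), which in my opinion justifies this small bit of extra computation.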
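
The switch statement above relies on the fact that, for entries restricted to -1, 0 and 1, the magnitude of the difference between neighbouring rows identifies the kind of transition. A minimal sketch on column 3 of the input illustrates this; the mask names `same`, `flipped` and `zeroCross` are hypothetical and used only for this illustration.

```matlab
% Column 3 of the example input; entries are assumed to be -1, 0 or 1.
col = [1; 1; 1; 0; 0; -1; 0; -1; 1; 0];
d   = abs(diff(col));            % magnitude of the change from the row above

same      = [false; d == 0];     % same value as the row above
flipped   = [false; d == 2];     % sign flip between -1 and 1
zeroCross = [false; d == 1];     % moved to or away from zero

disp([col, same, flipped, zeroCross])
```

Every row after the first falls into exactly one of the three cases, which is why three case branches are enough inside the loop.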

- Dev-iL

1

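This is a nice problem, and since @rayryeng hasn't proposed a vectorized solution, here is mine. OK, that's not quite fair: it took me half a day to come up with it. The basic idea is to use cumsum as the final function.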

p = size(hypwayt,2);  % keep nb of columns in mind
% H1 is the mask of consecutive identical values, but kept as an array of double (it will be incremented later)
H1 = [zeros(1,p);diff(hypwayt)==0];

% H2 is the mask of elements where a consecutive sequence of identical values ends. Note the first line of trues.
H2 = [true(1,p);diff(~H1)>0];

% 1st trick: compute the vectorized cumsum of H1
H3 = cumsum(H1(:));

% 2nd trick: take the diff of H3(H2).
% it results in a vector of the lengths of consecutive sequences of identical values, interleaved with some zeros.
% subtract it from H1 at the same locations
H1(H2) = H1(H2)-[0;diff(H3(H2))];

% H1 is ready to be cumsummed! Add one to the array, all lengths are decreased by one.
Output = cumsum(H1)+1;

% last force input zeros to be zero
Output(hypwayt==0) = 0;

And the expected output:

Output =

      1     1     1     1     1
      1     1     2     2     2
      2     2     3     0     0
      1     1     0     1     1
      1     1     0     0     0
      1     1     1     1     1
      1     2     0     1     2
      0     3     1     2     3
      1     0     1     3     0
      1     1     0     1     1

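Let me add some explanation. The big trick is of course the second one; it took me a while to figure out how to quickly compute the lengths of consecutive sequences of identical values. The first is only a small trick that lets the whole thing be computed without any for loop. If you apply cumsum to H1 directly, you get the result with some offsets. These offsets are removed in a cumsum-compatible way: take the local differences of some key values and subtract them right after the end of each sequence. There are many of these special values, and I also include the first row (the first row of H2): each first element of a column is treated as different from the last element of the previous column.

I hope it is clearer now (and that there are no flaws in special cases...).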
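
To see the offsets mentioned above, here is a small sketch that runs the same steps on a single column (column 3 of the input). The names `naive` and `fixed` are hypothetical; the remaining variables follow the answer's code, reduced to one column.

```matlab
col = [1; 1; 1; 0; 0; -1; 0; -1; 1; 0];

% H1 marks rows equal to the row above (kept as double).
H1 = [0; diff(col) == 0];

% Applying cumsum directly never resets, so later runs carry an offset.
naive = cumsum(H1) + 1;

% H2 marks the first row and the row right after each run of repeats.
H2 = [true; diff(~H1) > 0];
H3 = cumsum(H1);

% Subtract each finished run's length right after the run ends.
H1(H2) = H1(H2) - [0; diff(H3(H2))];

fixed = cumsum(H1) + 1;
fixed(col == 0) = 0;             % input zeros are forced to zero

disp([naive, fixed])
```

The first column of the display keeps growing after each run, while the second resets as required and matches column 3 of the expected output.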

- Bentoy13

- rayryeng · Accepted Answer

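Motivated by Dev-iL's comment, here is a solution that uses loops. Although the code is readable, I would argue that it is slow because you have to iterate over every element individually.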

hypwayt = [-1     1     1     1     1;
 1    -1     1     1     1;
 1    -1     1     0     0;
-1     1     0    -1     1;
 1    -1     0     0     0;
-1     1    -1    -1    -1;
 1     1     0     1    -1;
 0     1    -1     1    -1;
-1     0     1     1     0;
 1    -1     0    -1    -1];

%// Initialize output array: 1 for nonzero entries, 0 for zeros,
%// so that a zero in the first row is also reported as 0
out = double(hypwayt ~= 0);

%// For each column
for idx = 1 : size(hypwayt, 2)
    %// Previous value initialized as the first row
    prev = hypwayt(1,idx);
    %// For each row after this point...
    for idx2 = 2 : size(hypwayt,1)        
        %// If the current value isn't equal to the previous value...
        if hypwayt(idx2,idx) ~= prev
            %// Set the new previous value
            prev = hypwayt(idx2,idx);
            %// Case for 0
            if hypwayt(idx2,idx) == 0
                out(idx2,idx) = 0;            
            end
            %// Else, reset the value to 1
            %// Already done by initialization

        %// If equal, increment
        %// Must also check for 0
        else
            if hypwayt(idx2,idx) ~= 0
               out(idx2,idx) = out(idx2-1,idx) + 1;
            else
               out(idx2,idx) = 0;
            end
        end
    end
end

Output:

>> out

out =

     1     1     1     1     1
     1     1     2     2     2
     2     2     3     0     0
     1     1     0     1     1
     1     1     0     0     0
     1     1     1     1     1
     1     2     0     1     2
     0     3     1     2     3
     1     0     1     3     0
     1     1     0     1     1