Remove consecutive duplicates from a Ruby array

Question

14

Say I have the following array and want to remove consecutive duplicates:

arr = [1,1,1,4,4,4,3,3,3,3,5,5,5,1,1,1]

I would like to get the following:

=> [1,4,3,5,1]

It would be great if there were something simpler and more efficient than my solution (or variants of it):

(arr + [nil]).each_cons(2).collect { |i| i[0] != i[1] ? i[0] : nil }.compact

or

(arr + [nil]).each_cons(2).each_with_object([]) { |i, memo|
  memo << i[0] unless i[0] == i[1]
}
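
To see why the `nil` sentinel is appended in the two snippets above, it can help to look at the pairs that `each_cons(2)` produces. A minimal sketch follows; the short array and the variable names `pairs` and `kept` are illustrative only and not part of the original post.

```ruby
arr = [1, 1, 4, 4, 3]

# Appending nil gives the last element a partner to be compared with.
pairs = (arr + [nil]).each_cons(2).to_a
p pairs
# => [[1, 1], [1, 4], [4, 4], [4, 3], [3, nil]]

# Keep the left element of every pair whose two elements differ,
# i.e. the last element of each run.
kept = pairs.reject { |left, right| left == right }.map(&:first)
p kept
# => [1, 4, 3]
```

Note that the sentinel assumes the array itself does not end in `nil`: a trailing `nil` element would compare equal to the sentinel and be dropped.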

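Edit:

It looks like @ArupRakshit's solution below is very simple. I am still looking for something more efficient than my solution.

Edit:

I will benchmark the answers as they come in: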

require 'fruity'
arr = 10000.times.collect { [rand(5)] * (rand(4) + 2) }.flatten

compare do
  abdo {
    (arr + [nil]).each_cons(2).collect { |i| i[0] != i[1] ? i[0] : nil }.compact
  }
  abdo2 {
    (arr + [nil]).each_cons(2).each_with_object([]) { |i, memo|
      memo << i[0] unless i[0] == i[1]
    }
  }
  arup { arr.chunk(&:to_i).map(&:first) }
  arupv2 { arr.join.squeeze.chars.map(&:to_i) }
  agis {
    i = 1
    a = [arr.first]

    while i < arr.size
      a << arr[i] if arr[i] != arr[i-1]
      i += 1
    end
    a
  }
  arupv3 { arr.each_with_object([]) { |el, a| a << el if a.last != el } }
end

Benchmark results:

agis is faster than arupv3 by 39.99999999999999% ± 10.0%
arupv3 is faster than abdo2 by 1.9x ± 0.1
abdo2 is faster than abdo by 10.000000000000009% ± 10.0%
abdo is faster than arup by 30.000000000000004% ± 10.0%
arup is faster than arupv2 by 30.000000000000004% ± 10.0%

If we use:

arr = 10000.times.collect { rand(4) + 1 } # less likelihood of repetition

we get:

agis is faster than arupv3 by 19.999999999999996% ± 10.0%
arupv3 is faster than abdo2 by 1.9x ± 0.1
abdo2 is similar to abdo
abdo is faster than arupv2 by 2.1x ± 0.1
arupv2 is similar to arup
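
Timings are only comparable if every variant returns the same result, so a quick agreement check before benchmarking can be useful. The sketch below is one way to do that on the sample array from the question; the labels match the benchmark above, while the names `variants`, `results` and `all_agree` are illustrative.

```ruby
arr = [1, 1, 1, 4, 4, 4, 3, 3, 3, 3, 5, 5, 5, 1, 1, 1]

# Each benchmarked variant wrapped in a lambda taking the array.
variants = {
  abdo:   ->(a) { (a + [nil]).each_cons(2).collect { |i| i[0] != i[1] ? i[0] : nil }.compact },
  abdo2:  ->(a) { (a + [nil]).each_cons(2).each_with_object([]) { |i, memo| memo << i[0] unless i[0] == i[1] } },
  arup:   ->(a) { a.chunk(&:to_i).map(&:first) },
  arupv2: ->(a) { a.join.squeeze.chars.map(&:to_i) },
  agis:   ->(a) {
    i = 1
    out = [a.first]
    while i < a.size
      out << a[i] if a[i] != a[i - 1]
      i += 1
    end
    out
  },
  arupv3: ->(a) { a.each_with_object([]) { |el, acc| acc << el if acc.last != el } }
}

# Run every variant on the same input and compare the outputs.
results = variants.map { |name, f| [name, f.call(arr)] }.to_h
results.each { |name, out| puts "#{name}: #{out.inspect}" }

all_agree = results.values.uniq.size == 1
puts "all variants agree: #{all_agree}"
```

This only confirms agreement on one input of small non-negative integers; it says nothing about other element types or edge cases such as empty arrays.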

- Abdo

1

"I will benchmark the answers as they come in"... then proceeds to accept the first answer, ensuring no other answers will come... - Mark Thomas

1

I believe people (especially Ruby programmers) don't stop posting after an answer has been accepted. - Abdo

1

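@MarkThomas I completely understand your point, but at the same time, over the past few days I have seen a bunch of people (e.g. aruprakshit, careswoveland, steenslag, matt) answer older questions and improve on them simply because they enjoy doing so! - Abdo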

1

Why would you prefer a slightly faster solution over a clearer one? Has this code become a bottleneck in your application? - Wayne Conrad

2

@WayneConrad I have accepted the clearer solution below, as you can see =) Please check my comment on Agis's answer :-) - Abdo

2 Answers

4

Not very elegant, but the most efficient solution:

require 'benchmark'

arr = [1,1,1,4,4,4,3,3,3,3,5,5,5,1,1,1]

GC.disable
Benchmark.bm do |x|
  x.report do
    1_000_000.times do
      i = 1
      a = [arr.first]

      while i < arr.size
        a << arr[i] if arr[i] != arr[i-1]
        i += 1
      end
    end
  end
end
#      user     system      total        real
# 1.890000   0.010000   1.900000 (  1.901702)

GC.enable; GC.start; GC.disable

Benchmark.bm do |x|
  x.report do
    1_000_000.times do
      (arr + [nil]).each_cons(2).collect { |i| i[0] != i[1] ? i[0] : nil }.compact
    end
  end
end
#      user     system      total        real
# 6.050000   0.680000   6.730000 (  6.738690)
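
For use outside a benchmark, the loop can be wrapped in a method. The sketch below is one way to do that. The method name `squeeze_runs` is hypothetical, and the early return for an empty array is an addition that is not in the answer (without it, `[arr.first]` would yield `[nil]` for an empty input).

```ruby
# Hypothetical helper: returns a new array with consecutive duplicates collapsed.
def squeeze_runs(arr)
  return [] if arr.empty? # guard added for this sketch

  result = [arr.first]
  i = 1
  while i < arr.size
    # Keep an element only when it differs from its predecessor.
    result << arr[i] if arr[i] != arr[i - 1]
    i += 1
  end
  result
end

p squeeze_runs([1, 1, 1, 4, 4, 4, 3, 3, 3, 3, 5, 5, 5, 1, 1, 1])
# => [1, 4, 3, 5, 1]
p squeeze_runs([0, 1, 4, 4, 4, 3, 3, 3, 3, 5, 5, 5, 1])
# => [0, 1, 4, 3, 5, 1]
p squeeze_runs([])
# => []
```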

- Agis

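Yes, this is at least 2x faster than my fastest solution. (My benchmark differs because I used a larger array.) However, your solution has a bug. Can you try agis([0,1,4,4,4,3,3,3,3,5,5,5,1])? It returns [0, 0, 1, 4, 3, 5, 1]; the expected result is [0, 1, 4, 3, 5, 1]. Let me know once it is fixed so I can post it along with my benchmark =) - Abdo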

I doubt many Ruby programmers would use a loop like this to solve this problem.. :) - Arup Rakshit

2

@Agis, +1 from me. ArupRakshit's last solution comes very close to yours in efficiency, but it is visually much simpler. My question asks for both simplicity and efficiency. - Abdo

- Arup Rakshit · Accepted Answer

Use Enumerable#chunk as follows:

arr = [1,1,1,4,4,4,3,3,3,3,5,5,5,1,1,1]
arr.chunk { |e| e }.map(&:first)
# => [1, 4, 3, 5, 1]
# if the array holds only Integer (Fixnum) elements, &:to_i acts as an identity block:
arr.chunk(&:to_i).map(&:first)
# => [1, 4, 3, 5, 1]
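
To make the `chunk` step less magical, the sketch below prints the intermediate pairs it yields. It also shows two alternative spellings, `chunk(&:itself)` and `chunk_while`. These rely on `Object#itself` (Ruby 2.2+) and `Enumerable#chunk_while` (Ruby 2.3+) and are not part of the original answer; the variable names are illustrative.

```ruby
arr = [1, 1, 1, 4, 4, 3]

# chunk groups consecutive elements whose block value is equal and
# yields [block_value, elements] pairs.
groups = arr.chunk { |e| e }.to_a
p groups
# => [[1, [1, 1, 1]], [4, [4, 4]], [3, [3]]]

# Taking the first item of each pair keeps one value per run.
via_itself = arr.chunk(&:itself).map(&:first)
p via_itself
# => [1, 4, 3]

# chunk_while yields plain arrays of consecutive equal elements.
via_chunk_while = arr.chunk_while { |a, b| a == b }.map(&:first)
p via_chunk_while
# => [1, 4, 3]
```

One caveat: `chunk` gives special meaning to block values of `nil` and to symbols beginning with an underscore, so the identity-block form is safest on arrays that contain neither. `chunk_while` compares elements directly and has no such special cases.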

Update

Following @abdo's comment, here is another option:

arr.join.squeeze.chars.map(&:to_i)
# => [1, 4, 3, 5, 1]
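
The string round trip only works when every element is a single-digit, non-negative integer, because `join` discards the boundaries between elements. A small illustration with made-up inputs (the variable names are illustrative):

```ruby
single_digits = [1, 1, 4, 4, 3]
ok = single_digits.join.squeeze.chars.map(&:to_i)
p ok
# => [1, 4, 3]

multi_digit = [1, 11, 2]
# join gives "1112", and squeeze collapses that to "12",
# so the element 11 is lost.
broken = multi_digit.join.squeeze.chars.map(&:to_i)
p broken
# => [1, 2]
```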

Another option:

arr.each_with_object([]) { |el, a| a << el if a.last != el }
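
One edge case worth knowing about, shown as a sketch: `a.last` returns `nil` for an empty array, so the one-liner skips a leading `nil` element. Checking `a.empty?` first avoids that. The `guarded` variant and the variable names below are additions for illustration, not part of the original answer.

```ruby
with_nil = [nil, nil, 1, 1, 2]

# Original one-liner: the leading nil compares equal to [].last (nil).
original = with_nil.each_with_object([]) { |el, a| a << el if a.last != el }
p original
# => [1, 2]

# Guarded variant: always accept the first element.
guarded = with_nil.each_with_object([]) { |el, a| a << el if a.empty? || a.last != el }
p guarded
# => [nil, 1, 2]
```

For arrays that never contain `nil`, such as the integer arrays in the question, both forms return the same result.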