两维数组的交集

Question

两维数组的交集

3

有没有一种简单的方法可以找到二维数组的交集？例如：

arr1 = [1,2,3,4,5]
arr2 = [5,6,7,8]
arr3 = [5]
bigarr = [arr1,arr1,arr3]

我知道可以实现以下功能：

intersection = arr1 & arr2 & arr3 # => 5
intersection = big_arr[0] & big_arr[1] & big_arr[2] # => 5

但是big_arr中的元素数量将会变化。我想知道是否有一种简单的方法可以交叉所有big_arr中的元素，而不管元素数量如何。

- Cosi

交集不是 5，而是 [5]。 - sawa

感谢澄清 ;) - Cosi

2个回答

0

你想要什么：一个漂亮的方法还是一个先到终点线的方法？我的朋友@Arup 提供了一个；我提供另一个。代码

def heavy_lifter(a)
  wee_one = a.min_by(&:size)
  return [] if wee_one.empty?
  wee_loc = a.index(wee_one)
  counts = wee_one.each_with_object({}) { |e,h| h.update(e=>1) }
  nbr_reqd = 1
  a.each_with_index do |b,i|
    next if i == wee_loc
    b.each do |e|
      cnt = counts[e]
      case
      when cnt.nil?
        next
      when cnt == nbr_reqd
        counts[e] = cnt + 1
      when cnt < nbr_reqd
        counts.delete(e)
        return [] if counts.empty?
      end
    end
    nbr_reqd += 1
  end
  counts.keys.each { |k| counts.delete(k) if counts[k] < nbr_reqd }
  counts.keys
end

例子

a1 = [1,2,3,4,5]
a2 = [5,6,7,8]
a3 = [5]
a = [a1,a2,a3]

heavy_lifter(a)
  #=> [5]

解释

这是方法的工作原理：

选择最小的数组（wee_one）。为了简化说明，假设它是 a 的第一个元素。
将 wee_one 转换为计数哈希表 counts，其中对于 wee_one 的每个元素，counts[e] = 1。
遍历其余的数组。
在处理数组时，将从 counts 中删除键。
在所有计算完成后，counts.keys 等于所有数组的交集。
在处理了 nbr_reqd 个数组（包括 wee_one）之后，counts[k] 等于包含 k 的数组的数量。显然，如果 counts[k] < nbr_reqd，则可以从 counts 中删除键 k（但我们不会在注意到它们或在结束时删除这样的键）。
假设现在要处理偏移量为 nbr_reqd 的数组 b，这意味着已经处理了 nbr_reqd 个数组（包括偏移量为零的 wee_one）。对于 b 的每个元素 e，我们获得 cnt = counts[e]。有四种可能性：
- cnt == nil，此时无需执行任何操作；
- cnt < nbr_reqd，此时从 counts 中删除键 e；
- cnt == nbr_reqd，表示 e 在之前处理的所有数组中都存在，此时执行 counts[k] = cnt + 1；以及
- cnt == nbr_read+1，表示 e 在之前处理的所有数组中都存在，并且是 b 中已经被处理过的另一个 e 的副本，此时无需执行任何操作。
nbr_reqd 加一，然后重复下一个数组的处理过程。
在处理完所有数组后，只需删除 counts 中满足 counts[k] < nbr_reqd 的每个键 k。

可爱方法

def cutie(a)
  a.reduce(:&)
end

测试数据

def test(mx, *sizes)
  sizes.map { |sz| Array.new(sz) { rand(mx) } }
end

例如：

test(10,5,6,7)
  #=> [[9, 1, 5, 1, 1], [0, 8, 7, 8, 5, 0], [5, 1, 7, 6, 7, 9, 5]]

基准测试代码

require 'benchmark'

def bench(tst)
  Benchmark.bm(12) do |bm|
    bm.report 'cutie' do
      cutie(tst)
    end
    bm.report 'heavy_lifter' do
      heavy_lifter(tst)
    end
  end
end

基准测试结果

tst = test(1_000_000, 400_000, 600_000, 800_000) 

cutie(tst).size
  #=> 81929
cutie(tst).sort == heavy_lifter(tst).size
  #=> true

bench(tst)
                   user     system      total        real
cutie          1.610000   0.030000   1.640000 (  1.639736)
heavy_lifter   1.800000   0.020000   1.820000 (  1.824281)

sizes = (700_000..890_000).step(10_000).to_a
  #=> [700000, 710000, 720000, 730000, 740000,
  #    750000, 760000, 770000, 780000, 790000,
  #    800000, 810000, 820000, 830000, 840000,
  #    850000, 860000, 870000, 880000, 890000]

tst = test(1_000_000, *sizes) 

bench(tst)
                   user     system      total        real
cutie         14.090000   0.440000  14.530000 ( 14.679101)
heavy_lifter   5.830000   0.030000   5.860000 (  5.935438)

- Cary Swoveland

说实话，我在寻找cutie，因为我甚至无法想象如何创建heavy_lifter。现在我可能不得不花费整个一天的时间来尝试弄清楚heavy_lifter在做什么。 - Cosi

我也会选择 cutie，但我一直对漂亮的外表情有独钟。我添加了一些解释性注释。 - Cary Swoveland

你应该向Matz展示你的算法！...他可能会考虑更改#reduce的实现.. :) - Arup Rakshit

@Arup，在进行交集之前必须有相当数量的大数组才有意义，而 & 只适用于两个数组。哦，等等，你说的是 reduce 而不是 &。我得考虑一下。 - Cary Swoveland

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Arup Rakshit · Accepted Answer

使用#reduce方法。

arr1 = [1,2,3,4,5]
arr2 = [5,6,7,8]
arr3 = [5]
bigarr = [arr1,arr2,arr3]
bigarr.reduce(:&) # => [5]