How do I delete some elements from a hash of arrays?

Question

3

I have a hash:

{"path_1" => [1,2,3], "path_2" => [1,4,5], "path_3" => [1,2,3,4]}

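I want to delete every "partial path" from the hash. So path_1 has to go, because it is part of path_3: [1,2,3] is an incomplete version of [1,2,3,4]. All such "partial paths" need to be removed from the hash.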

Here is my current code. It works, but it is slow on large hashes:

# hash sorted by length of value
hash_array = {"path_1" => [1,2,3], "path_2" => [1,4,5], "path_3" => [1,2,3,4]}
# make a separate copy of the hash
cloned_hash_array = hash_array.clone

hash_array.each {|path_index, path|
  # delete this path from the cloned hash so it doesn't match itself
  cloned_hash_array.delete(path_index)

  cloned_hash_array.each{|cloned_path_index, cloned_path|
    if cloned_path[0,path.length] == path.clone
      hash_array.delete(path_index)
    end
  }
}

- Artem Kalinchuk

1

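It's only a convention, but multi-line blocks usually use do ... end rather than { ... } as you have in your each calls. - Andrew Marshall

Agreed, multi-line {...} blocks look weird :-) - Sergio Tulentsev

Does the order within these arrays matter, or would they be better treated as sets? If they are sets, you could use Set#proper_subset? for this. - DGM

@AndrewMarshall, I don't want to start an argument, but... while I agree it's more common, I don't think it's always beneficial. I think a lot depends on previous language experience. Also, syntax highlighting in some editors handles {...} better than do ... end. - SimonMayer

@SimonMayer There's no right or wrong way; I'm only describing the generally accepted convention. As for syntax highlighting... use a better editor ;) (one that highlights end differently depending on whether it pairs with def/class or with do). - Andrew Marshall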
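
To make the difference between the Set#proper_subset? suggestion and the prefix test in the question concrete, here is a small sketch. Both helper names are made up for illustration, and it assumes Ruby's standard set library is available.

```ruby
require 'set'

# Hypothetical helper: treats both paths as unordered sets.
def proper_subset_path?(short, long)
  Set.new(short).proper_subset?(Set.new(long))
end

# Hypothetical helper: the order-sensitive prefix test used in the question.
def prefix_path?(short, long)
  long[0, short.length] == short
end

puts proper_subset_path?([1, 2, 3], [1, 2, 3, 4])  # true
puts proper_subset_path?([3, 2, 1], [1, 2, 3, 4])  # true, ordering is ignored
puts prefix_path?([3, 2, 1], [1, 2, 3, 4])         # false, ordering matters
```

If the order of elements in a path is significant, the set version would discard paths that the prefix version keeps.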

3 Answers

1

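It depends on how fast you need it to be and how many elements you have. You could try something like this (it looks crazy, but it really is very fast):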

scatter = 
  lambda { |tree, path, name|
    if path.empty?
      tree[:tag] = name
      tree[:path] = {} unless tree.has_key?(:path)
    else
      head, *tail = path
      unless tree[:path].has_key?(head)
        tree[:path][head] = {:path => {}}
      end
      scatter[tree[:path][head], tail, name]
    end
  }

gather = 
  lambda { |tree|
    if tree[:path].empty?
      [[tree[:tag], []]]
    else
      tree[:path].map { |k, v|
        gather[v].map do |tag, path|
          [tag, [k] + path]
        end
      }.flatten(1)
    end
  }

scatter_gather =
  lambda { |paths|
    tree = {:path => {}}
    paths.each do |tag, path|
      scatter[tree, path, tag]
    end
    Hash[gather[tree]]
  }

scatter_gather["path_1" => [1,2,3], "path_2" => [1,4,5], "path_3" => [1,2,3,4]]
#=> {"path_3"=>[1, 2, 3, 4], "path_2"=>[1, 4, 5]}  (key order as produced on Ruby 1.9+)
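
The lambdas above build a prefix tree from the paths and then read back only the leaves, so any path that ends at an inner node is dropped. The same idea can be sketched with plain methods. The method names and the :children/:tag node layout below are assumptions made for illustration, not part of the code above.

```ruby
# Hypothetical names; each trie node is a plain Hash with :children and :tag.
def build_trie(paths)
  root = { children: {}, tag: nil }
  paths.each do |name, path|
    node = root
    # Walk down the tree, creating missing nodes along the way.
    path.each do |step|
      node = (node[:children][step] ||= { children: {}, tag: nil })
    end
    node[:tag] = name
  end
  root
end

# Collect only the leaves: a node with children is a prefix of a longer path.
def collect_leaves(node, prefix = [], out = {})
  if node[:children].empty?
    out[node[:tag]] = prefix unless node[:tag].nil?
  else
    node[:children].each do |step, child|
      collect_leaves(child, prefix + [step], out)
    end
  end
  out
end

def remove_partial_paths_trie(paths)
  collect_leaves(build_trie(paths))
end

result = remove_partial_paths_trie({ "path_1" => [1, 2, 3], "path_2" => [1, 4, 5], "path_3" => [1, 2, 3, 4] })
puts result.keys.sort.inspect  # ["path_2", "path_3"]
```

Each path is visited once on the way in and once on the way out, so the work grows with the total number of path elements rather than with the square of the number of paths. As in the lambda version, two identical paths share one node and only the last name is kept.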

- Victor Moroz

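Victor, you're a genius. The other methods work too, but this one is the fastest and returns the correct result. It ran through my hash in 109 seconds. A+ - Artem Kalinchuk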

0

How about this?

hash_array = {"path_1" => [1,2,3], "path_2" => [1,4,5], "path_3" => [1,2,3,4]}
cloned_hash_array = hash_array.clone

cloned_hash_array.each do |key, value|

  hash_array.each do |key2, value2|
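    if key != key2 and value.length <= value2.length
      # Note: this checks that every element of value occurs somewhere in value2
      # (a subset test), which is looser than the prefix test in the question.
      if value.all? { |i| value2.include?(i) }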
        cloned_hash_array.delete(key)
        break
      end
    end
  end
end

puts cloned_hash_array.inspect

- Andreas Helgegren

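Is this any faster? It also has two loops, so I'm not sure how fast it would be. - Artem Kalinchuk

You use path.clone in your if statement, which may be slowing the program down. Try my answer and see whether it's faster =) - Andreas Helgegren

.clone doesn't slow it down; it's mainly the loops that are slow. I've benchmarked it ;) - Artem Kalinchuk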
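
One way to check a claim like this is to time the same loop with and without the clone on identical input. The sketch below is a simplified stand-in for the question's double loop, not the asker's actual benchmark: the method names, the timer, and the synthetic random paths are all assumptions.

```ruby
# Hypothetical method: a double loop over the hash, with the clone made optional.
def prune(paths, clone_each_time)
  paths.reject do |name, path|
    paths.any? do |other_name, other|
      needle = clone_each_time ? path.clone : path
      other_name != name && other[0, path.length] == needle
    end
  end
end

# Hypothetical timer built on the monotonic clock; returns elapsed seconds.
def seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Synthetic input (an assumption, not the asker's data): unique random paths.
srand(42)
arrays = Array.new(600) { Array.new(rand(1..6)) { rand(1..4) } }.uniq
SAMPLE_PATHS = Hash[arrays.each_with_index.map { |path, i| ["path_#{i}", path] }]

with_clone = seconds { prune(SAMPLE_PATHS, true) }
without_clone = seconds { prune(SAMPLE_PATHS, false) }
puts "with clone:    #{with_clone}"
puts "without clone: #{without_clone}"
```

Both variants still compare every path against every other path, so whatever the clone costs is added on top of the quadratic loop rather than changing how it scales.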

- Sergio Tulentsev · Accepted Answer

You can try this; it should be a bit faster (no double loop).

h = {"path_1" => [1,2,3], "path_2" => [1,4,5], "path_3" => [1,2,3,4]}

h2 = {}

a = h.sort{|l, r| r[1] <=> l[1]}
puts a.inspect
# => [["path_2", [1, 4, 5]], ["path_3", [1, 2, 3, 4]], ["path_1", [1, 2, 3]]]

last_path = nil
a.each do |key, path|
  # now all paths are sorted in descending order. 
  # if a current path is a prefix for last_path, then discard it.
  # otherwise, write it to a result and start comparing next ones against it.
  if !last_path || last_path[0, path.length] != path
    h2[key] = path
    last_path = path
  end
end

puts h2.inspect
# => {"path_2"=>[1, 4, 5], "path_3"=>[1, 2, 3, 4]}
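
Comparing each path only against the last kept one works because of how Array#<=> orders arrays: elements are compared left to right, and when one array is a prefix of the other, the longer one counts as greater. After a descending sort, a prefix therefore sits directly below the paths that extend it. A small sketch of that ordering, with a hypothetical helper name:

```ruby
# Hypothetical helper: sorts arrays in descending Array#<=> order.
def descending(paths)
  paths.sort { |l, r| r <=> l }
end

puts([1, 2, 3, 4] <=> [1, 2, 3])  # 1, the longer array is greater than its prefix
puts([1, 4, 5] <=> [1, 2, 3, 4])  # 1, decided at the second element (4 > 2)

puts descending([[1, 2, 3], [1, 4, 5], [1, 2, 3, 4], [1, 2]]).inspect
# [[1, 4, 5], [1, 2, 3, 4], [1, 2, 3], [1, 2]]
```

Here [1, 2, 3] and [1, 2] both follow [1, 2, 3, 4], so a single pass that remembers the last kept path is enough to discard them.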