Return duplicates in a sequence

Question

12

The best I could come up with is:

(defn dups [seq]
  (map (fn [[id freq]] id) 
       (filter (fn [[id freq]] (> freq 1))
               (frequencies seq))))

Is there a more concise way?

- Joe Snikeris

I like your solution, but I would simply replace (fn [[id freq]] id) with the key function. - Julien Chastang

5 Answers

16

(map key (remove (comp #{1} val) 
                 (frequencies seq)))

- amalloy

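Can you explain what (comp #{1} val) does? Thanks. - Julien Chastang

(comp #{1} val) basically means (fn [x] (#{1} (val x))) - it tests whether the val of its argument is 1, that is, whether that value is contained in the set holding the number 1. Here val is the count in each frequency pair. - Joost Diepenmaat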
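
To see the set-as-predicate idiom from the comment above in isolation, here is a minimal sketch. The sample vector and the names seen-once? and dups are assumptions made for illustration, not part of the original answer.

```clojure
;; A set is a function of its elements: it returns the element
;; when present and nil otherwise.
(#{1} 1)   ;; => 1   (truthy)
(#{1} 3)   ;; => nil (falsey)

;; Illustrative input (an assumption for this sketch).
(def sample [:a :b :a :c :c :c])

;; (comp #{1} val) takes a map entry, extracts its count with val,
;; then looks that count up in the set #{1}.
(def seen-once? (comp #{1} val))

(defn dups
  "Returns the items that occur more than once in coll."
  [coll]
  (map key (remove seen-once? (frequencies coll))))

(set (dups sample)) ;; => #{:a :c}
```

The result is wrapped in set here only because the order of entries coming out of a map should not be relied on.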

5

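If you want to find duplicates based on some property of the items in the list (for example, when it is a list of maps or of records/Java objects):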

(defn dups-with-function
  [seq f]
  (->> seq
       (group-by f)
       ;; drop map entries whose value holds only one item
       (remove #(= 1 (count (val %))))))

(let [seq [{:attribute    :one
            :other-things :bob}
           {:attribute    :one
            :other-things :smith}
           {:attribute    :two
            :other-things :blah}]]
  (dups-with-function seq :attribute))

Output:

 ([:one
   [{:attribute :one, :other-things :bob}
    {:attribute :one, :other-things :smith}]])

If you have a list of Java objects and want to find all the ones with duplicate first names:

(dups-with-function my-list #(.getFirstName %))

- Joe Pinsonault
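
The function above returns [key group] pairs. If only the duplicated keys or only the duplicated items are wanted, the result can be unpacked as in the sketch below. The helpers dup-keys and dup-items and the people data are hypothetical; dups-with-function is restated so the example is self-contained.

```clojure
;; Restated from the answer above (parameter renamed to coll).
(defn dups-with-function
  [coll f]
  (->> coll
       (group-by f)
       (remove #(= 1 (count (val %))))))

;; Hypothetical helper: only the property values that occur more than once.
(defn dup-keys [coll f]
  (map key (dups-with-function coll f)))

;; Hypothetical helper: every item whose property value is duplicated.
(defn dup-items [coll f]
  (mapcat val (dups-with-function coll f)))

;; Illustrative data (an assumption for this sketch).
(def people [{:first-name "Ann" :last-name "Lee"}
             {:first-name "Bob" :last-name "Ray"}
             {:first-name "Ann" :last-name "Kim"}])

(dup-keys people :first-name)
;; => ("Ann")

(dup-items people :first-name)
;; => ({:first-name "Ann", :last-name "Lee"} {:first-name "Ann", :last-name "Kim"})
```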

1

Thanks for adding an answer to my question "Return duplicates of keyword x in a sequence of maps" ;-) - leontalbot

2

Here is a one-liner that does the job with minimal use of filter and frequencies:

(filter #(< 1 ((frequencies col) %)) col)

However, it performs poorly on large data. You need to help the compiler by writing:

(let [ freqs (frequencies col) ]
  (filter #(< 1 (freqs %)) col))

- Ulrik Algulin

Could you explain in more detail how/why this helps the compiler? - Jack Westmore

1

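By putting the frequencies calculation in a let binding, in the lexical context, you force it to be evaluated once up front instead of once for every item in the collection (something you might hope the compiler would detect and avoid). - Ulrik Algulin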
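
The difference can be observed by counting calls. The sketch below wraps frequencies in a counting function; counted-frequencies, the calls atom, and the sample col are instrumentation assumed for this illustration only.

```clojure
;; Counts how many times the frequency map is computed.
(def calls (atom 0))

(defn counted-frequencies [coll]
  (swap! calls inc)
  (frequencies coll))

;; Illustrative input (an assumption for this sketch).
(def col [1 2 2 3 3 3])

;; Version 1: the frequency map is rebuilt inside the predicate,
;; so it is computed once per element of col.
(reset! calls 0)
(def result-inline
  (doall (filter #(< 1 ((counted-frequencies col) %)) col)))
(def calls-inline @calls)    ;; 6

;; Version 2: the frequency map is built once and closed over.
(reset! calls 0)
(def result-hoisted
  (let [freqs (counted-frequencies col)]
    (doall (filter #(< 1 (freqs %)) col))))
(def calls-hoisted @calls)   ;; 1
```

Both versions return (2 2 3 3 3), since every occurrence of a duplicated item passes the filter; wrapping the result in distinct yields (2 3). With n items, the first version builds the frequency map n times, so its cost grows roughly quadratically.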

1

some is the perfect function for this case.

(defn dups [coll]
  (some (fn [[k v]] (when (< 1 v) k))
    (frequencies coll)))

However, it basically does the same thing as the list comprehension.

- theloeschzwerg
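
One point worth checking at a REPL: some stops at the first truthy result, so the function above appears to return a single duplicated item (or nil) rather than a sequence of all of them. A minimal sketch of the two behaviours side by side; the names first-dup and all-dups are hypothetical.

```clojure
(defn first-dup
  "Returns one item that occurs more than once, or nil if there is none."
  [coll]
  (some (fn [[k v]] (when (< 1 v) k))
        (frequencies coll)))

(defn all-dups
  "Returns every item that occurs more than once."
  [coll]
  (for [[k v] (frequencies coll)
        :when (< 1 v)]
    k))

(first-dup [1 2 3])            ;; => nil
(first-dup [1 1 2 3])          ;; => 1
(set (all-dups [1 1 2 2 3]))   ;; => #{1 2}
```

The some version fits when the question is only whether any duplicate exists; the for version fits when all of them are needed.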

- Matt Fenwick · Accepted Answer

Using a list comprehension:

(defn dups [seq]
  (for [[id freq] (frequencies seq)  ;; get the frequencies, destructure
        :when (> freq 1)]            ;; this is the filter condition
   id))                              ;; just need the id, not the frequency