Percentage overlap of two lists

Question

18

This is more of a math problem than anything else. Suppose I have two lists of different sizes in Python.

listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]

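I want to find what percentage overlap these two lists have. Order within the lists is not important. Finding the overlap is easy, and I have seen other posts on how to do that, but I can't quite extend it to finding the percentage overlap. If I compared the lists in a different order, would the result come out differently? What is the best way of doing this?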

- OneManRiot

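Order doesn't really matter here, but you first need to define the formula for the percentage. It could be something like 2 * matches / (len(listA) + len(listB)), or something else. - ZdaR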

1

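If the lists are [1,1,1] and [1], is the overlap 100% or 33%? - tobias_k

What is the expected output for these two lists? - Ofiris

Do you care about the percentage of distinct elements in the two lists (in which case set() is very useful), or about the percentage of all elements, duplicates included? - pcurry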

4 Answers

9

>>> len(set(listA)&set(listB)) / float(len(set(listA) | set(listB))) * 100
75.0

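I would compute the common items out of the total number of distinct items. len(set(listA) & set(listB)) returns the number of common items (3 in your example). len(set(listA) | set(listB)) returns the total number of distinct items (4).

Multiply by 100 and you get the percentage.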

- Ofiris

Note that this answer and @JuniorCompressor's answer give different results. Both are correct; which one to use depends on your exact requirements. - Ofiris
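The set-based expression in this answer can be wrapped in a small function for reuse. This is only a sketch: the name `jaccard_percent` and the choice to return 0.0 for two empty lists are my assumptions, not part of the answer. The ratio of intersection size to union size is commonly known as the Jaccard index.

```python
def jaccard_percent(list_a, list_b):
    """Percentage of distinct items shared by both lists (hypothetical helper)."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:  # assumption: two empty lists count as 0% overlap
        return 0.0
    return 100.0 * len(set_a & set_b) / len(union)

listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]
print(jaccard_percent(listA, listB))  # 75.0
print(jaccard_percent(listB, listA))  # 75.0
```

Because set intersection and union are both symmetric, swapping the two lists does not change the result, which addresses the question about comparison order for this particular measure.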

8

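The maximum difference occurs when the two lists have completely different elements. In that case there are at most n + m distinct elements, where n is the size of the first list and m is the size of the second. One possible measure is: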

2 * c / (n + m)

where c is the number of common elements. As a percentage, it can be computed like this:

200.0 * len(set(listA) & set(listB)) / (len(listA) + len(listB))

- JuniorCompressor

2

This fails for the following example: listA = ["Alice", "Alice"] listB = ["Alice", "Alice"] - Shubham Saini
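The case raised in the comment above can be checked directly. The sketch below compares the formula as written (list lengths in the denominator) with a variant that uses the sizes of the sets instead. The variant and both function names are my own assumptions, not something the answer proposes. With set sizes in the denominator, the measure is the Sørensen-Dice coefficient.

```python
def overlap_list_lengths(list_a, list_b):
    """2 * c / (n + m) as a percentage, with n and m taken from the raw lists."""
    total = len(list_a) + len(list_b)
    if total == 0:  # assumption: two empty lists count as 0% overlap
        return 0.0
    return 200.0 * len(set(list_a) & set(list_b)) / total

def overlap_set_sizes(list_a, list_b):
    """Same formula, but n and m count distinct items only (hypothetical variant)."""
    set_a, set_b = set(list_a), set(list_b)
    total = len(set_a) + len(set_b)
    if total == 0:
        return 0.0
    return 200.0 * len(set_a & set_b) / total

dupes = ["Alice", "Alice"]
print(overlap_list_lengths(dupes, dupes))  # 50.0, although the lists are identical
print(overlap_set_sizes(dupes, dupes))     # 100.0
```

Which denominator is appropriate depends on whether duplicates should count, which is the same question the comments under the original post raise.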

1

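def computeOverlap(L1, L2):
    # Count how many times each element occurs in each list.
    d1, d2 = {}, {}
    for e in L1:
        if e not in d1:
            d1[e] = 0
        d1[e] += 1

    for e in L2:
        if e not in d2:
            d2[e] = 0
        d2[e] += 1

    # An element occurring a times in L1 and b times in L2 is shared min(a, b) times.
    o1, o2 = 0, 0
    for k in d1:
        o1 += min(d1[k], d2.get(k, 0))
    for k in d2:
        o2 += min(d1.get(k, 0), d2[k])

    # Divide by each list's length to turn the shared counts into percentages.
    print(100.0 * o1 / len(L1) if L1 else 0, "% of the first list overlaps with the second list")
    print(100.0 * o2 / len(L2) if L2 else 0, "% of the second list overlaps with the first list")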

Of course, you could use defaultdict and Counter to make things a little easier:

from collections import defaultdict, Counter

def computeOverlap(L1, L2):
    # Missing keys default to a count of 0.
    d1 = defaultdict(int, Counter(L1))
    d2 = defaultdict(int, Counter(L2))

    o1, o2 = 0, 0
    for k in d1:
        o1 += min(d1[k], d2[k])
    for k in d2:
        o2 += min(d1[k], d2[k])

    # Divide by each list's length to turn the shared counts into percentages.
    print(100.0 * o1 / len(L1) if L1 else 0, "% of the first list overlaps with the second list")
    print(100.0 * o2 / len(L2) if L2 else 0, "% of the second list overlaps with the first list")

- inspectorG4dget
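As a more compact alternative, `collections.Counter` supports the `&` operator, which keeps the minimum count of each key, so the multiset overlap can be computed without explicit loops. This is a sketch under my own assumptions: the function name `overlap_percentages` is hypothetical, it returns the two values instead of printing them, and an empty list is treated as 0%.

```python
from collections import Counter

def overlap_percentages(list_a, list_b):
    """Return (percent of list_a shared with list_b, percent of list_b shared with list_a)."""
    # Counter & Counter keeps min(count_a, count_b) for every common element.
    shared = sum((Counter(list_a) & Counter(list_b)).values())
    pct_a = 100.0 * shared / len(list_a) if list_a else 0.0
    pct_b = 100.0 * shared / len(list_b) if list_b else 0.0
    return pct_a, pct_b

# The [1, 1, 1] versus [1] case from the comments: about a third of the
# first list is covered, while the second list is covered completely.
print(overlap_percentages([1, 1, 1], [1]))
```

Counting duplicates this way makes the result depend on which list is used as the reference, so the two percentages generally differ.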


- geckon · Accepted Answer

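In principle, I'd say there are three sensible questions you might be asking:

What percentage is the overlap compared to the first list? That is, how big is the common part relative to the first list?
The same thing for the second list.
What percentage is the overlap compared to the "universe" (i.e. the union of both lists)?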

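Other meanings can certainly be found as well, and there are probably many of them. All in all, you should know what problem you are trying to solve.

From the programming point of view, the solution is easy: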

listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]

setA = set(listA)
setB = set(listB)

overlap = setA & setB
universe = setA | setB

result1 = float(len(overlap)) / len(setA) * 100
result2 = float(len(overlap)) / len(setB) * 100
result3 = float(len(overlap)) / len(universe) * 100
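To see the three values for the example lists, the snippet above can be wrapped in a function and printed. The function name `overlap_report` and the dictionary keys are assumptions made for illustration, and the guards against empty inputs are my addition.

```python
def overlap_report(list_a, list_b):
    """Set-based overlap relative to each list and to their union, in percent."""
    set_a, set_b = set(list_a), set(list_b)
    overlap = set_a & set_b
    universe = set_a | set_b

    def pct(part, whole):
        # assumption: an empty reference set yields 0% rather than an error
        return 100.0 * len(part) / len(whole) if whole else 0.0

    return {
        "vs_first": pct(overlap, set_a),
        "vs_second": pct(overlap, set_b),
        "vs_universe": pct(overlap, universe),
    }

print(overlap_report(["Alice", "Bob", "Joe"], ["Joe", "Bob", "Alice", "Ken"]))
# {'vs_first': 100.0, 'vs_second': 75.0, 'vs_universe': 75.0}
```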