两个列表中的第一个共同元素

Question

两个列表中的第一个共同元素

15

x = [8,2,3,4,5]
y = [6,3,7,2,1]

如何以简洁优雅的方式找到两个列表中的第一个共同元素（在此例中为“2”）？任何一个列表都可以为空，或者没有共同的元素-这种情况下使用None即可。

我需要向一个刚接触Python的人展示这个，所以越简单越好。

更新：对于我的目的来说，顺序并不重要，但假设我正在寻找x中也出现在y中的第一个元素。

- georg

2

“2”不是第一个共同点，“1”才是。 - volcano

3

现在，“教学”也算一个标签了吗？ - Martijn Pieters

1

@volcano 为什么不呢？x，y是完全有效的数学表达式，对于迭代器和类似迭代器的变量，i，j，k也是一样。此外，在处理几何图形时，a，b，c，d是参数方程中属性的常见名称。 - Vyktor

3

@Vyktor 别忘了在迭代时提到 k,v，并且在 x,y 中使用 z。此外，n 通常代表任何自然数。 - user2032433

4

我不确定这个问题是否有很清晰的定义。"第一个共同元素"是否总是由在"x"列表中的位置来确定？答案可能是3吗，因为它在"y"列表中出现得更早（或者因为它在两个列表中的索引之和最小）？ - Blckknght

显示剩余8条评论

10个回答

10

这应该很简单，~~并且几乎是最有效的解决方案之一~~（如需更有效的解决方案，请查看Ashwini Chaudharys answer，对于最有效的解决方案，请查看jamylaks answer和评论）：

result = None
# Go trough one array
for i in x:

    # The element repeats in the other list...
    if i in y:

        # Store the result and break the loop
        result = i
        break

更加优雅的做法是将相同的功能封装到函数中_{使用类似PEP 8的编码风格规范}：

def get_first_common_element(x,y):
    ''' Fetches first element from x that is common for both lists
        or return None if no such an element is found.
    '''
    for i in x:
        if i in y:
            return i

    # In case no common element found, you could trigger Exception
    # Or if no common element is _valid_ and common state of your application
    # you could simply return None and test return value
    # raise Exception('No common element found')
    return None

如果您想获取所有常见元素，可以简单地按照以下方式进行：

>>> [i for i in x if i in y]
[1, 2, 3]

- Vyktor

任何列表都可以为空，或者没有共同元素 - 在这种情况下，使用None是可以的。我认为你不应该引发异常，只需让函数返回None即可。此外，您最后三行的缩进有误。 - user2032433

@MarkusMeskanen 我列举了多种方法来解决这个问题...我更喜欢引发异常而不是返回 None（当你编写每天需要处理成千上万个文件的大型脚本时，很容易忘记检查某些错误状态... 对我来说 最好让脚本崩溃，这样你就会得到完整的通知并处理它）...当然，这不是终极规则。如果他想让用户输入相关列表，那么当没有公共元素被找到时，实际上是异常。如果它是应用程序的常见且有效状态，则返回 None。 - Vyktor

1

@jamylak 谢谢，你已经说服了我，我修改了答案。但是我不相信检查成员存在性是 O(1)（除非你使用了某种索引），我认为它只可能在 O(log(N)) 的时间内完成。 - Vyktor

2

@Vyktor 你能否请阅读一下哈希表是什么，并停止争论一个完全错误的观点... - jamylak

1

@Vyktor 只需要在网上搜索，维基百科或其他地方，你不必成为专家，只需理解概念即可。顺便说一句，我在我的答案中也发布了时间，供其他不相信我的人参考！ - jamylak

显示剩余11条评论

6

使用next从生成器中获取第一个项的一行代码:

x = [8,2,3,4,5]
y = [6,3,7,2,1]

first = next((a for a in x if a in y), None)

或者更高效地使用set.__contains__，因为它比list.__contains__更快：

set_y = set(y)
first = next((a for a in x if a in set_y), None)

更高效但仍在一行中（不要这样做）：

first = next((lambda set_y: a for a in x if a in set_y)(set(y)), None)

- Eric

1

时间复杂度可以通过仅使用“集合(set)”来大大提高，与解释“next”和生成器相比，这并不复杂。 - Thijs van Dien

1

是的，这就是在Python中的做法（假设你知道它 :))。 - georg

+1 但你应该将 set_y = set(y)，然后它就完美了。 - jamylak

@jamylak：next((a for a in x if a in set(y)), None) 这段代码会导致集合每次都被构建吗？ - Eric

1

你的第二个版本：next(ifilter(set(y).__contains__, x), None)。（不要在家尝试，因为你不应该使用双下划线方法） - jamylak

显示剩余2条评论

3

使用带有in的for循环将导致O(N^2)的复杂度，但您可以在这里对y进行排序并使用二分查找来改善时间复杂度为O(NlogN)。

def binary_search(lis,num):
    low=0
    high=len(lis)-1
    ret=-1  #return -1 if item is not found
    while low<=high:
        mid=(low+high)//2
        if num<lis[mid]:
            high=mid-1
        elif num>lis[mid]:
            low=mid+1
        else:
            ret=mid
            break

    return ret

x = [8,2,3,4,5]
y = [6,3,7,2,1]
y.sort()

for z in x:
    ind=binary_search(y,z)
    if ind!=-1
        print z
        break

输出：2

使用bisect模块执行与上述相同的操作：

import bisect

x = [8,2,3,4,5]
y = [6,3,7,2,1]
y.sort()

for z in x:
    ind=bisect.bisect(y,z)-1  #or use `ind=min(bisect.bisect_left(y, z), len(y) - 1)`
    if ind!=-1 and y[ind] ==z:
        print z      #prints 2
        break

- Ashwini Chaudhary

然而，对于新手来说并不是很友好 ;) - tyteen4a03

哈哈，+1，喜欢你的回答...（并从我的回答中删除了关于有效性的注释）...但我担心它不太适合新手...但仍然绝对值得提及，供学术目的使用。;) - Vyktor

1

@GP89 二分查找返回索引，而 for z in x: if z in y 实际上等同于两个 for 循环。 - Ashwini Chaudhary

1

@Vyktor BS非常有名，你可以在上面找到大量的教程。 Youtube - 什么是二分查找, 维基百科 - 二分查找算法 - Ashwini Chaudhary

1

@AshwiniChaudhary 谢谢:D 顺便说一下，你知道有一个bisect模块，它非常被忽视，但它可以在一行中完成你想要的操作。 - jamylak

显示剩余15条评论

3

我假设您想教这个人Python，而不仅仅是编程。因此，我毫不犹豫地使用zip代替丑陋的循环变量；它是Python中非常有用的一部分，而且不难解释。

def first_common(x, y):
    common = set(x) & set(y)
    for current_x, current_y in zip(x, y):
        if current_x in common:
            return current_x
        elif current_y in common:
            return current_y

print first_common([8,2,3,4,5], [6,3,7,2,1])

如果你真的不想使用zip，那么无需使用它的方法如下：

def first_common2(x, y):
    common = set(x) & set(y)
    for i in xrange(min(len(x), len(y))):
        if x[i] in common:
            return x[i]
        elif y[i] in common:
            return y[i]

对于那些感兴趣的人，以下是如何将其扩展到任意数量的序列：

def first_common3(*seqs):
    common = set.intersection(*[set(seq) for seq in seqs])
    for current_elements in zip(*seqs):
        for element in current_elements:
            if element in common:
                return element

最后请注意，与其他一些解决方案不同的是，如果第一个公共元素首先出现在第二个列表中，这也同样适用。

我刚刚注意到你的更新，这使得解决方案变得更加简单：

def first_common4(x, y):
    ys = set(y) # We don't want this to be recreated for each element in x
    for element in x:
        if element in ys:
            return element

上述内容可以说比生成器表达式更易读。

可惜Python中没有内置的有序集合。这使得解决方案不够优雅。

- Thijs van Dien

2

izip_longest() 可以被更常见的 zip() 替代：当最短序列用尽时，没有必要继续在最长序列中搜索，因为任何共同元素都已经在最短列表中找到。 - Eric O. Lebigot

这可能仍然是最好的答案。谢谢！ - blakev

1

使用for循环似乎最容易向新手解释。

for number1 in x:
    for number2 in y:
        if number1 == number2:
            print number1, number2
            print x.index(number1), y.index(number2)
            exit(0)
print "No common numbers found."

NB 没有测试，只是我脑海中的想法。

- Pythonidae

1

实际上，第二个循环是不必要的 - 请参见顶部答案。 - georg

对于功能来说，这并不是必需的，我认为它可以在教学的同时帮助展示程序的工作原理。但是，如果用于高效的程序中，我同意这会太长了。 - Pythonidae

1

这个使用集合。它返回第一个共同元素，如果没有共同元素则返回None。

def findcommon(x,y):
    common = None
    for i in range(0,max(len(x),len(y))):
        common = set(x[0:i]).intersection(set(y[0:i]))
        if common: break
    return list(common)[0] if common else None

- jujaro

1

def first_common_element(x,y):
    common = set(x).intersection(set(y))
    if common:
        return x[min([x.index(i)for i in common])]

- JJJW

1

仅供娱乐（可能不太高效），另一种使用itertools的版本：

from itertools import dropwhile, product
from operator import __ne__

def accept_pair(f):
    "Make a version of f that takes a pair instead of 2 arguments."
    def accepting_pair(pair):
        return f(*pair)
    return accepting_pair

def get_first_common(x, y):
    try:
        # I think this *_ unpacking syntax works only in Python 3
        ((first_common, _), *_) = dropwhile(
            accept_pair(__ne__),
            product(x, y))
    except ValueError:
        return None
    return first_common

x = [8, 2, 3, 4, 5]
y = [6, 3, 7, 2, 1]
print(get_first_common(x, y))  # 2
y = [6, 7, 1]
print(get_first_common(x, y))  # None

使用lambda pair: pair[0] != pair[1]比accept_pair(__ne__)更简单，但不够有趣。

- bli

0

使用 set - 这是任意数量列表的通用解决方案：

def first_common(*lsts):
    common = reduce(lambda c, l: c & set(l), lsts[1:], set(lsts[0]))
    if not common:
        return None
    firsts = [min(lst.index(el) for el in common) for lst in lsts]
    index_in_list = min(firsts)
    trgt_lst_index = firsts.index(index_in_list)
    return lsts[trgt_lst_index][index_in_list]

一个事后的想法 - 不是一种有效的解决方案，但可以减少冗余开销。

def first_common(*lsts):
    common = reduce(lambda c, l: c & set(l), lsts[1:], set(lsts[0]))
    if not common:
        return None
    for lsts_slice in itertools.izip_longest(*lsts):
        slice_intersection = common.intersection(lsts_slice)
        if slice_intersection:
            return slice_intersection.pop()

- volcano

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- jamylak · Accepted Answer

使用排序并不是完成此操作的最快方法，通过使用一个集合（哈希映射），可以在O(N)时间内完成。

>>> x = [8,2,3,4,5]
>>> y = [6,3,7,2,1]
>>> set_y = set(y)
>>> next((a for a in x if a in set_y), None)
2

或者:

next(ifilter(set(y).__contains__, x), None)

这是它的功能：

>>> def foo(x, y):
        seen = set(y)
        for item in x:
            if item in seen:
                return item
        else:
            return None


>>> foo(x, y)
2

为了展示不同方法之间的时间差异（朴素方法，二分查找和集合），以下是一些计时结果。我必须这样做来证明二分查找比许多人认为的更快...

from itertools import ifilter
from bisect import bisect_left

a = [1, 2, 3, 9, 1, 1] * 100000
b = [44, 11, 23, 9, 10, 99] * 10000

c = [1, 7, 2, 4, 1, 9, 9, 2] * 1000000 # repeats early
d = [7, 6, 11, 13, 19, 10, 19] * 1000000

e = range(50000) 
f = range(40000, 90000) # repeats in the middle

g = [1] * 10000000 # no repeats at all
h = [2] * 10000000

from random import randrange
i = [randrange(10000000) for _ in xrange(5000000)] # some randoms
j = [randrange(10000000) for _ in xrange(5000000)]

def common_set(x, y, ifilter=ifilter, set=set, next=next):
    return next(ifilter(set(y).__contains__, x), None)
    pass

def common_b_sort(x, y, bisect=bisect_left, sorted=sorted, min=min, len=len):
    sorted_y = sorted(y)
    for a in x:
        if a == sorted_y[min(bisect_left(sorted_y, a),len(sorted_y)-1)]:
            return a
    else:
        return None

def common_naive(x, y):
    for a in x:
        for b in y:
            if a == b: return a
    else:
        return None

from timeit import timeit
from itertools import repeat
import threading, thread

print 'running tests - time limit of 20 seconds'

for x, y in [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h'), ('i', 'j')]:
    for func in ('common_set', 'common_b_sort', 'common_naive'):        
        try:
            timer = threading.Timer(20, thread.interrupt_main)   # 20 second time limit
            timer.start()
            res = timeit(stmt="print '[', {0}({1}, {2}), ".format(func, x, y),
                         setup='from __main__ import common_set, common_b_sort, common_naive, {0}, {1}'.format(x, y),
                         number=1)
        except:
            res = "Too long!!"
        finally:
            print '] Function: {0}, {1}, {2}. Time: {3}'.format(func, x, y, res)
            timer.cancel()

测试数据为：

a = [1, 2, 3, 9, 1, 1] * 100000
b = [44, 11, 23, 9, 10, 99] * 10000

c = [1, 7, 2, 4, 1, 9, 9, 2] * 1000000 # repeats early
d = [7, 6, 11, 13, 19, 10, 19] * 1000000

e = range(50000) 
f = range(40000, 90000) # repeats in the middle

g = [1] * 10000000 # no repeats at all
h = [2] * 10000000

from random import randrange
i = [randrange(10000000) for _ in xrange(5000000)] # some randoms
j = [randrange(10000000) for _ in xrange(5000000)]

结果：

running tests - time limit of 20 seconds
[ 9 ] Function: common_set, a, b. Time: 0.00569520707241
[ 9 ] Function: common_b_sort, a, b. Time: 0.0182240340602
[ 9 ] Function: common_naive, a, b. Time: 0.00978832505249
[ 7 ] Function: common_set, c, d. Time: 0.249175872911
[ 7 ] Function: common_b_sort, c, d. Time: 1.86735751332
[ 7 ] Function: common_naive, c, d. Time: 0.264309220865
[ 40000 ] Function: common_set, e, f. Time: 0.00966861710078
[ 40000 ] Function: common_b_sort, e, f. Time: 0.0505980508696
[ ] Function: common_naive, e, f. Time: Too long!!
[ None ] Function: common_set, g, h. Time: 1.11300018578
[ None ] Function: common_b_sort, g, h. Time: 14.9472068377
[ ] Function: common_naive, g, h. Time: Too long!!
[ 5411743 ] Function: common_set, i, j. Time: 1.88894859542
[ 5411743 ] Function: common_b_sort, i, j. Time: 6.28617268396
[ 5411743 ] Function: common_naive, i, j. Time: 1.11231867458

这能让你了解它在处理更大的输入时会如何缩放，O(N)比O(N log N)和O(N^2)更快。