我有一组字符串列表和一个字符串列表,例如:
L1=[["cat","dog","apple"],["orange","green","red"]]
L2=["cat","red"]
如果 L1[i] 包含 L2 中的任何一个项目,我需要将这些项目配对(用于在图中创建边),例如,在我的示例中,我需要以下配对:("cat","dog")、("cat,apple")、("red,orange")、("red","green")。
为了使效率最高,我应该使用什么方法?(我的列表 L1 很大)
我有一组字符串列表和一个字符串列表,例如:
L1=[["cat","dog","apple"],["orange","green","red"]]
L2=["cat","red"]
set()
}}和{{link2:itertools.product()
}}来完成这个任务:from itertools import product
def generate_edges(iterable, control):
edges = []
control_set = set(control)
for e in iterable:
e_set = set(e)
common = e_set & control_set
to_pair = e_set - common
edges.extend(product(to_pair, common))
return edges
例子:
>>> L1 = [["cat","dog","apple"],
... ["orange","green","red"],
... ["hand","cat","red"]]
>>> L2 = ["cat","red"]
>>> generate_edges(L1, L2)
[('apple', 'cat'),
('dog', 'cat'),
('orange', 'red'),
('green', 'red'),
('hand', 'red'),
('hand', 'cat')]
set
,并使用集合操作(交集)来确定L2中的哪些术语在每个L1项中。然后,您可以使用集合减法来获取需要配对的项目列表。edges = []
L2set = set(L2)
for L1item in L1:
L1set = set(L1item)
items_in_L1item = L1set & L2set
for item in items_in_L1item:
items_to_pair = L1set - set([item])
edges.extend((item, i) for i in items_to_pair)
L1
和 L2
很大的情况下也能够实现最优化,可以使用 izip
生成器,而非创建一个包含大量元组的列表。如果你正在使用 Python3,则直接使用 zip
即可。from itertools import izip
pairs = []
for my_list, elem in izip(L1, L2):
if elem in my_list:
pairs += [(elem, e) for e in my_list if e!=elem]
print pairs
这段代码非常易懂,几乎是纯英语!首先,您正在循环遍历每个列表及其对应的元素,然后询问该元素是否在列表中,如果是,则打印除了(x, x)这一对之外的所有对。
输出:
[('cat', 'dog'), ('cat', 'apple'), ('red', 'orange'), ('red', 'green')]
如果L1非常大,您可能希望考虑使用bisect。它要求您先将L1压平并排序。您可以执行以下操作:
from bisect import bisect_left, bisect_right
from itertools import chain
L1=[["cat","dog","apple"],["orange","green","red","apple"]]
L2=["apple", "cat","red"]
M1 = [[i]*len(j) for i, j in enumerate(L1)]
M1 = list(chain(*M1))
L1flat = list(chain(*L1))
I = sorted(range(len(L1flat)), key=L1flat.__getitem__)
L1flat = [L1flat[i] for i in I]
M1 = [M1[i] for i in I]
for item in L2:
s = bisect_left(L1flat, item)
e = bisect_right(L1flat, item)
print item, M1[s:e]
#apple [0, 1]
#cat [0]
#red [1]