快速确定列表中所有非支配项的方法（更快）

Question

快速确定列表中所有非支配项的方法（更快）

3

我有一个包含n个长度为4的数组列表，例如(n=2):

l = [[1, 2, 3, 4], [5, 6, 7, 8]]

我正在尝试查找列表中的所有“非支配”元素-也就是它们不被列表中的任何其他元素所支配。如果一个数组中的每个项都小于或等于另一个数组中对应的项，那么该数组就支配另一个数组。

dominates([1, 2, 3, 4], [5, 6, 7, 8]) == True

作为 1 <= 5 and 2 <= 6 and 3 <= 7 and 4 <= 8 ，但

dominates([1, 2, 3, 9], [5, 6, 7, 8]) == False

作为 9 > 8。这个函数相对容易编写，例如：

def dominates(a, b):
    return all(i <= j for i, j in zip(a, b))

更简洁地说，给定 l = [a1, a2, a3, .., an]，其中 a 是长度为4的数组，我想找到所有没有被其他 a in l 支配的 a。

我的解决方案如下：

def get_non_dominated(l):
    to_remove = set()
    for ind, item_1 in enumerate(l):
        if item_2 in to_remove:
            continue
        for item_2 in l[ind + 1:]:
            if dominates(item_2, item_1):
                to_remove.add(item_1)
                break
            elif dominates(item_1, item_2):
                to_remove.add(item_2)
    return [i for i in l if i not in to_remove]

所以get_non_dominated([[1, 2, 3, 4], [5, 6, 7, 8]])应该返回[[1, 2, 3, 4]]。同样地，get_non_dominated([[1, 2, 3, 9], [5, 6, 7, 8]])应该按照上述逻辑不变地返回该列表（没有任何元素占优）。

但这个检查会发生很多次，而且l可能非常大。我在想是否有办法加速这个过程？我的第一个想法是尝试使用numpy对这段代码进行向量化，但我对此的经验相对较少，有些困难。您可以假设l包含所有唯一的数组。非常感谢您提供任何想法！

- user8415803

我不清楚你想要实现什么。请更新你的问题，包括带有示例输入和输出的工作代码。 - MB-F

我添加了一些额外的细节，希望能够更清楚地解释。 - user8415803

已更新答案以反映您想要的过滤（恰好是我展示的示例）。 - Imanol Luengo

2个回答

1

怎么样：

import numpy as np
np.all((np.asarry(l[1])-np.asarry(l[0]))>=0)

如果您能够直接将列表创建为numpy数组，即type(l) == np.ndarray，那么可以采用类似的方法。然后语法将是：

np.all(p[1])-p[0])>=0)

- Nyps

np.all 应该更快。 - Imanol Luengo

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Imanol Luengo · Accepted Answer

@Nyps 回答的另一个版本：

def dominates(a, b):
    return (np.asarray(a) <= b).all()

使用numpy对代码进行向量化处理。

如果您需要遍历所有行，则此方法可能仍然较慢。如果您有一个包含所有行的列表，并且想要逐对比较它们，可以使用scipy创建一个N x N数组（其中N是行数）。

import numpy as np
a = np.random.randint(0, 10, size=(1000, 10))

a 在这里是一个 1000 x 10 的数组，模拟了 1000 行、每行 10 个元素的情况。

from scipy.spatial.distance import cdist
X = cdist(a, a, metric=dominates).astype(np.bool)

X 是一个 1000 x 1000 的矩阵，包含所有条目之间的成对比较。也就是说，X[i, j] 包含了样本 i 是否优于样本 j 的信息，如果是则为 True，否则为 False。

现在您可以从 X 中提取出一些高级结果，例如支配所有样本的样本：

>>> a[50] = 0 # set a row to all 0s to fake a dominant row
>>> X = cdist(a, a, metric=dominates).astype(np.bool)
>>> non_dominated = np.where(X.all(axis=1))[0]
>>> non_dominated
array([50])

如果你的人口样本中位置50是统治者，那么你应该密切关注它。

现在，如果你只想保留主导地位，可以这样做：

if non_dominated.size > 0:
    return [a[i] for i in non_dominated]
else: # no one dominates every other
    return a

作为简要回顾：

import numpy as np
from scipy.spatial.distance import cdist

def get_ruler(a):
    X = cdist(a, a, metric=dominates).astype(np.bool)
    rulers = np.where(X.all(axis=1))[0]
    if rulers.size > 0:
        return [a[i] for i in rulers]
    else: # no one dominates every other
        return a