寻找与另一点距离在一定范围内的所有点的算法

Question

寻找与另一点距离在一定范围内的所有点的算法

algorithmvoronoi

4

我曾经在一次工作面试中遇到了这个问题，但我没有通过这个测试。出于对公司的尊重，我不会透露具体问题。

假设你在一个 A x B 的公园里有 N 个人。如果一个人周围50英尺内没有其他人，他就可以享受自己的私人空间。否则，他的私人空间就会被侵犯。给定一组 (x, y) 坐标，有多少人的空间会被侵犯？

例如，在 Python 中给出以下列表：

people = [(0,0), (1,1), (1000, 1000)]

我们将找到2个人的空间被侵犯：1, 2。

我们不需要找到所有人的组合，只需要找到唯一的人数。

不能使用暴力方法解决该问题。换句话说，不能使用简单的数组嵌套。

我已经花费几周时间一直在解决这个问题，虽然我已经得到了比n^2更快的解决方案，但还没有想出一个可扩展的解决方案。

我认为解决这个问题的唯一正确方式是使用 Fortune 算法？

以下是我的 Python 代码（未使用 Fortune 算法）：

import math
import random
random.seed(1)  # Setting random number generator seed for repeatability
TEST = True

NUM_PEOPLE = 10000
PARK_SIZE = 128000  # Meters.
CONFLICT_RADIUS = 500  # Meters.

def _get_distance(x1, y1, x2, y2):
    """
    require: x1, y1, x2, y2: all integers
    return: a distance as a float
    """
    distance = math.sqrt(math.pow((x1 - x2), 2) + math.pow((y1 - y2),2))
    return distance

def check_real_distance(people1, people2, conflict_radius):
    """
    determine if two people are too close

    """
    if people2[1] - people1[1] > conflict_radius:
        return False
    d = _get_distance(people1[0], people1[1], people2[0], people2[1])
    if d >= conflict_radius:
        return False
    return True

def check_for_conflicts(peoples, conflict_radius):
    # sort  people
    def sort_func1(the_tuple):
        return the_tuple[0]
    _peoples = []
    index = 0
    for people in peoples:
        _peoples.append((people[0], people[1], index))
        index += 1
    peoples = _peoples
    peoples = sorted(peoples, key = sort_func1)
    conflicts_dict = {}
    i = 0
    # use a type of sweep strategy
    while i < len(peoples) - 1:
        x_len = peoples[i + 1][0] - peoples[i][0]
        conflict = False
        conflicts_list =[peoples[i]]
        j = i + 1
        while x_len <= conflict_radius and j < len(peoples):
            x_len = peoples[j][0] - peoples[i][0]
            conflict = check_real_distance(peoples[i], peoples[j], conflict_radius)
            if conflict:
                people1 = peoples[i][2]
                people2 = peoples[j][2]
                conflicts_dict[people1] = True
                conflicts_dict[people2] = True
            j += 1
        i += 1
    return len(conflicts_dict.keys())

def gen_coord():
    return int(random.random() * PARK_SIZE)

if __name__ == '__main__':
    people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
    conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
    print("people in conflict: {}".format(conflicts))

- user3520634

这个问题不能通过三角不等式简化吗？如果|ab| + |bc| < 50ft，则|ac| < 50ft，同样地：如果|ac| - |bc| > 50ft，则|ab| > 50ft。 - m69 ''snarky and unwelcoming''

1

这里有一些阅读材料：空间划分，最近邻搜索，舍入和哈希。 - user3386109

1

此外，在检查距离时，最好不要进行平方根运算。计算平方根是耗时的，并且容易出现舍入误差。因此，与其检查 sqrt(dx*dx + dy*dy) >= 50，不如检查 dx*dx + dy*dy >= 2500。后者可以使用整数运算来完成，并给出精确结果（没有舍入误差的可能性）。 - user3386109

像Fortune算法这样的线性扫描方法是一个好主意。你不喜欢它的哪些方面？ - Matt Timmermans

1

我相信答案涉及到kd树和范围搜索。 - aichao

显示剩余3条评论

5个回答

0

您可以查看 this topcoder 链接和“Closest pair”部分。您可以修改最近点对算法，使距离h始终为50。

所以，您基本上要做的是：

按X坐标对人进行排序
从左到右扫描。
保持平衡的二叉树，并将所有在50半径内的点放入二叉树中。二叉树的键将是点的Y坐标
选择Y-50和Y+50的点，这可以在lg(n)时间内使用二叉树完成。
因此，总体复杂度变为nlg(n)
一定要标记您找到的点，以便将来跳过这些点。

您可以在C++中使用set作为二叉树。但我找不到Python set是否支持范围查询或upper_bound和lower_bound。如果有人知道，请在评论中指出。

- Tamim Addari

0

这是我对这个有趣问题的解决方案：

from math import sqrt
import math
import random


class Person():

    def __init__(self, x, y, conflict_radius=500):
        self.position = [x, y]
        self.valid = True
        self.radius = conflict_radius**2

    def validate_people(self, people):
        P0 = self.position

        for p in reversed(people):
            P1 = p.position
            dx = P1[0] - P0[0]
            dy = P1[1] - P0[1]
            dx2 = (dx * dx)

            if dx2 > self.radius:
                break

            dy2 = (dy * dy)
            d = dx2 + dy2

            if d <= self.radius:
                self.valid = False
                p.valid = False

    def __str__(self):
        p = self.position
        return "{0}:{1} - {2}".format(p[0], p[1], self.valid)


class Park():

    def __init__(self, num_people=10000, park_size=128000):
        random.seed(1)
        self.num_people = num_people
        self.park_size = park_size

    def gen_coord(self):
        return int(random.random() * self.park_size)

    def generate(self):
        return [[self.gen_coord(), self.gen_coord()] for i in range(self.num_people)]


def naive_solution(data):
    sorted_data = sorted(data, key=lambda x: x[0])
    len_sorted_data = len(sorted_data)
    result = []

    for index, pos in enumerate(sorted_data):
        print "{0}/{1} - {2}".format(index, len_sorted_data, len(result))
        p = Person(pos[0], pos[1])
        p.validate_people(result)
        result.append(p)

    return result

if __name__ == '__main__':
    people_positions = Park().generate()

    with_conflicts = len(filter(lambda x: x.valid, naive_solution(people_positions)))
    without_conflicts = len(filter(lambda x: not x.valid, naive_solution(people_positions)))
    print("people with conflicts: {}".format(with_conflicts))
    print("people without conflicts: {}".format(without_conflicts))

我相信这段代码仍然可以进一步优化。

- BPL

是的，我确定它可以——这是一个经典的 n^2 解决方案。 - Mark Ransom

你确定这段代码是正确的吗？使用简单的暴力方法（n^2），我得到了不同的答案。 - user3520634

@user3520634 嗯，我昨晚只用了10分钟编写了这个东西，所以我不能确定你的答案是哪些参数和解决方案。今天我会重新审查它。 - BPL

@MarkRansom 欢迎提出建议，我会在今天进行审核。 - BPL

@user3520634 我已经编辑了我的代码，展示了有冲突的人和没有冲突的人，我在代码中没有发现任何错误，请告诉我你的解决方案是什么。 - BPL

0

我找到了一个相对简单的解决方案。将坐标列表按X值排序，然后逐个查看每个X值。向右扫描，检查下一个位置的位置，直到扫描区域的末尾（500米）或发现冲突为止。

如果没有发现冲突，则以相同方式向左扫描。这种方法避免了不必要的检查。例如，如果公园里有100万人，则所有人都会发生冲突。算法只会检查每个人一次：一旦发现冲突，搜索就停止了。

我的时间似乎是O（N）。

以下是代码：

import math
import random
random.seed(1)  # Setting random number generator seed for repeatability

NUM_PEOPLE = 10000
PARK_SIZE = 128000  # Meters.
CONFLICT_RADIUS = 500  # Meters.

check_real_distance = lambda conflict_radius, people1, people2: people2[1] - people1[1] <= conflict_radius \
        and math.pow(people1[0] - people2[0], 2) + math.pow(people1[1] - people2[1], 2) <= math.pow(conflict_radius, 2)


def check_for_conflicts(peoples, conflict_radius):
    peoples.sort(key = lambda x: x[0])
    conflicts_dict = {}
    i = 0
    num_checks = 0
    # use a type of sweep strategy
    while i < len(peoples)  :
        conflict = False
        j = i + 1
        #sweep right
        while j < len(peoples) and peoples[j][0] - peoples[i][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[i], peoples[j])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j += 1
        j = i - 1
        #sweep left
        while j >= 0 and peoples[i][0] - peoples[j][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[j], peoples[i])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j -= 1
        i += 1
    print("num checks is {0}".format(num_checks))
    print("num checks per size is is {0}".format(num_checks/ NUM_PEOPLE))
    return len(conflicts_dict.keys())

def gen_coord():
    return int(random.random() * PARK_SIZE)

if __name__ == '__main__':
    people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
    conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
    print("people in conflict: {}".format(conflicts))

- user3520634

0

我想出了一个看起来需要O(N)时间的答案。策略是按X值对数组进行排序。对于每个X值，向左扫描直到发现冲突或距离超过冲突距离（500米）。如果没有发现冲突，则以相同方式向左扫描。使用这种技术，可以限制搜索量。

以下是代码：

import math
import random
random.seed(1)  # Setting random number generator seed for repeatability

NUM_PEOPLE = 10000
PARK_SIZE = 128000  # Meters.
CONFLICT_RADIUS = 500  # Meters.

check_real_distance = lambda conflict_radius, people1, people2: people2[1] - people1[1] <= conflict_radius \
        and math.pow(people1[0] - people2[0], 2) + math.pow(people1[1] - people2[1], 2) <= math.pow(conflict_radius, 2)


def check_for_conflicts(peoples, conflict_radius):
    peoples.sort(key = lambda x: x[0])
    conflicts_dict = {}
    i = 0
    num_checks = 0
    # use a type of sweep strategy
    while i < len(peoples)  :
        conflict = False
        j = i + 1
        #sweep right
        while j < len(peoples) and peoples[j][0] - peoples[i][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[i], peoples[j])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j += 1
        j = i - 1
        #sweep left
        while j >= 0 and peoples[i][0] - peoples[j][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[j], peoples[i])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j -= 1
        i += 1
    print("num checks is {0}".format(num_checks))
    print("num checks per size is is {0}".format(num_checks/ NUM_PEOPLE))
    return len(conflicts_dict.keys())

def gen_coord():
    return int(random.random() * PARK_SIZE)

if __name__ == '__main__':
    people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
    conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
    print("people in conflict: {}".format(conflicts))

- user3520634

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- samgak · Accepted Answer

从评论中可以看出，有很多解决这个问题的方法。在面试中，你可能想列举尽可能多的方法，并说出每种方法的优缺点。

对于所述问题，当半径固定时，最简单的方法可能是四舍五入和哈希。k-d树等强大的数据结构，但它们也相当复杂，如果您不需要重复查询它们或添加和删除对象，则可能会过度使用它们。哈希可以实现线性时间，而空间树则为n log n，尽管这可能取决于点的分布。

要理解哈希和四舍五入，只需将其视为将空间划分为一个正方形网格，每个正方形的边长都等于您要检查的半径。每个正方形都被赋予自己的“邮政编码”，您可以使用它作为散列键来存储该正方形中的值。您可以通过将x和y坐标除以半径并向下舍入来计算点的邮政编码，如下所示：

def get_zip_code(x, y, radius):
    return str(int(math.floor(x/radius))) + "_" + str(int(math.floor(y/radius)))

我使用字符串是因为它很简单，但只要您为每个正方形生成唯一的邮政编码，您可以使用任何内容。

创建一个字典，其中键是邮政编码，值是该邮政编码中所有人的列表。为了检查冲突，请逐个添加人员，并在添加每个人之前，测试与同一邮政编码中的所有人以及邮政编码的8个邻居的冲突。我重用了您用于跟踪冲突的方法：

def check_for_conflicts(peoples, conflict_radius):

    index = 0
    d = {}
    conflicts_dict = {}
    for person in peoples:  

        # check for conflicts with people in this person's zip code
        # and neighbouring zip codes:
        for offset_x in range(-1, 2):
            for offset_y in range(-1, 2):
                 offset_zip_code = get_zip_code(person[0] + (offset_x * conflict_radius), person[1] + (offset_y * conflict_radius), conflict_radius)

                 if offset_zip_code in d:
                     # get a list of people in this zip:
                     other_people = d[offset_zip_code]
                     # check for conflicts with each of them:
                     for other_person in other_people:
                         conflict = check_real_distance(person, other_person, conflict_radius)
                         if conflict:
                             people1 = index
                             people2 = other_person[2]
                             conflicts_dict[people1] = True
                             conflicts_dict[people2] = True

        # add the new person to their zip code
        zip_code = get_zip_code(person[0], person[1], conflict_radius)
        if not zip_code in d:
            d[zip_code] = []
        d[zip_code].append([person[0], person[1], index])
        index += 1

    return len(conflicts_dict.keys())

这取决于一些东西的时间复杂度。如果你增加了人数，但没有增加分配空间的大小，则它将是O(N²)，因为冲突数量将呈二次增长，而且您必须全部计算。但是，如果您随着人数的增加增加了空间大小，使得密度相同，则它会更接近O(N)。

如果你只是统计独立的人数，你可以在每个邮政编码中保持至少有一个冲突的人数计数。如果等于邮政编码中的所有人数，则可以在检查给定邮政编码的冲突时，通过与新人的第一个冲突后就退出循环，因为不会再找到任何唯一的人。您还可以循环两次，在第一次循环中添加所有人员，并在第二次测试，对于每个人找到第一个冲突时退出循环。