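Given N points in a 2D plane, each with an x and a y coordinate, find a point P (among the N given points) such that the sum of the distances from the other (N-1) points to P is minimal.
This point is often called the geometric median (strictly speaking, the geometric median need not be one of the given points). Is there an efficient algorithm for this, other than the naive O(N^2) approach?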
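For reference, here is a minimal sketch of the naive O(N^2) approach the question mentions. It assumes the points are stored in a `std::vector`; `Point` and `best_point_naive` are illustrative names. It is also handy as an oracle for testing faster methods on small inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point { double x, y; };

// Naive O(N^2) baseline: for each input point, add up its Euclidean distances
// to all points (the distance to itself is 0) and keep the smallest total.
std::size_t best_point_naive(const std::vector<Point>& pts) {
    assert(!pts.empty());
    std::size_t best = 0;
    double bestSum = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < pts.size(); ++i) {
        double s = 0;
        for (const Point& q : pts) s += std::hypot(pts[i].x - q.x, pts[i].y - q.y);
        if (s < bestSum) { bestSum = s; best = i; }
    }
    return best;
}
```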
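I once solved a similar problem for a local online judge using simulated annealing. That was also the official solution, and the program got AC. (Strictly speaking, the code below is a deterministic step-halving local search rather than randomized simulated annealing.)
The only difference was that the point I had to find did not have to be one of the N given points.
Here is my C++ code. N could be as large as 50000, and the program runs in 0.1 s on a 2 GHz Pentium 4.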
// header files for IO functions and math
#include <cstdio>
#include <cmath>
// the maximum value n can take (points are stored 1-based, hence the +1)
const int maxn = 50001;
// given a point (x, y) on a grid, we can find its left/right/up/down neighbors
// by using these constants: (x + dx[0], y + dy[0]) = upper neighbor etc.
const int dx[] = {-1, 0, 1, 0};
const int dy[] = {0, 1, 0, -1};
// controls the precision - this should give you an answer accurate to 3 decimals
const double eps = 0.001;
// input and output files
FILE *in = fopen("adapost2.in","r"), *out = fopen("adapost2.out","w");
// stores a point in 2d space
struct punct
{
double x, y;
};
// how many points are in the input file
int n;
// stores the points in the input file
punct a[maxn];
// stores the answer to the question
double x, y;
// finds the sum of (euclidean) distances from each input point to (x, y)
double dist(double x, double y)
{
double ret = 0;
for ( int i = 1; i <= n; ++i )
{
double dx = a[i].x - x;
double dy = a[i].y - y;
ret += sqrt(dx*dx + dy*dy); // classical distance formula
}
return ret;
}
// reads the input
void read()
{
fscanf(in, "%d", &n); // read n from the first line
// read n points next, one on each line
for ( int i = 1; i <= n; ++i )
{
fscanf(in, "%lf %lf", &a[i].x, &a[i].y); // reads a point
// add up x and y, because we start by approximating the answer as the center of gravity
x += a[i].x;
y += a[i].y;
}
// divide by the number of points (n) to get the center of gravity
x /= n;
y /= n;
}
// implements the solving algorithm
void go()
{
// start by finding the sum of distances to the center of gravity
double d = dist(x, y);
// our step value, chosen by experimentation
double step = 100.0;
// done is used to keep track of updates: if none of the neighbors of the current
// point that are *step* away improve the solution, then *step* is too big
// and we need to look closer to the current point, so we must halve *step*.
int done = 0;
// while we still need a more precise answer
while ( step > eps )
{
done = 0;
for ( int i = 0; i < 4; ++i )
{
// check the neighbors in all 4 directions.
double nx = (double)x + step*dx[i];
double ny = (double)y + step*dy[i];
// find the sum of distances to each neighbor
double t = dist(nx, ny);
// if a neighbor offers a better sum of distances
if ( t < d )
{
// update the current minimum
d = t;
x = nx;
y = ny;
// an improvement has been made, so
// don't halve step in the next iteration, because we might need
// to jump the same amount again
done = 1;
break;
}
}
// no neighbor improved the sum, so we jumped too far: halve the step size
// and look closer to the current point.
if ( !done )
step /= 2;
}
}
int main()
{
read();
go();
// print the answer with 4 decimal places
fprintf(out, "%.4lf %.4lf\n", x, y);
return 0;
}
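The program above only prints a continuous approximation (x, y), which does not have to be one of the input points. A minimal sketch of mapping it back, assuming the points are held in a `std::vector` (`sum_dist`, `best_near` and the parameter `k` are hypothetical names): rank the input points by their distance to (x, y), then return the one with the smallest exact distance sum among the `k` closest. With `k = 1` this is the "pick the closest input point" rule, which is a heuristic; a larger `k` costs O(kN) extra time but checks more candidates.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct Point { double x, y; };

// Sum of Euclidean distances from (x, y) to every input point.
double sum_dist(const std::vector<Point>& p, double x, double y) {
    double s = 0;
    for (const Point& q : p) s += std::hypot(q.x - x, q.y - y);
    return s;
}

// Among the k input points closest to the approximation (cx, cy),
// return the index of the one with the smallest exact distance sum.
int best_near(const std::vector<Point>& p, double cx, double cy, int k) {
    assert(!p.empty() && k >= 1);
    std::vector<int> idx(p.size());
    std::iota(idx.begin(), idx.end(), 0);
    k = std::min<int>(k, static_cast<int>(idx.size()));
    // move the k nearest candidates to the front, nearest first
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(), [&](int a, int b) {
        return std::hypot(p[a].x - cx, p[a].y - cy) < std::hypot(p[b].x - cx, p[b].y - cy);
    });
    int best = idx[0];
    double bestSum = sum_dist(p, p[best].x, p[best].y);
    for (int j = 1; j < k; ++j) {
        double s = sum_dist(p, p[idx[j]].x, p[idx[j]].y);
        if (s < bestSum) { bestSum = s; best = idx[j]; }
    }
    return best;
}
```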
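Among the given points, the one closest to the (x, y) returned by this algorithm should then be the right choice.

The search starts from the center of gravity. A point is considered adjacent if it lies at distance step from the current point in one of the four axis directions. If an adjacent point gives a smaller sum, the current point can be discarded: the sum of Euclidean distances is a convex function, so it has no local minimum other than the global one, and moving to a better point cannot trap you in one. When no neighbor is better, step is halved, and the search stops once step is no longer larger than eps (so the precision is controlled by the eps constant).

With the points sorted by x-coordinate, $x_1 \le x_2 \le \dots \le x_n$, let
$$l_i = (x_i - x_1) + (x_i - x_2) + \dots + (x_i - x_{i-1})$$
be the sum of distances from $x_i$ to all elements on its left, and
$$sl_i = (x_i - x_1)^2 + (x_i - x_2)^2 + \dots + (x_i - x_{i-1})^2$$
the sum of squared distances to those elements.
Note that given $l_i$ and $sl_i$, we can compute $l_{i+1}$ and $sl_{i+1}$ in O(1) time. Let $d = x_{i+1} - x_i$. Since $x_{i+1} - x_j = (x_i - x_j) + d$ for every $j \le i$ (the term for $j = i$ being $0 + d$),
$$l_{i+1} = l_i + i\,d, \qquad sl_{i+1} = sl_i + i\,d^2 + 2\,d\,l_i.$$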
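Therefore, all $l_i$ and $sl_i$ can be computed in linear time by scanning from left to right. Similarly, for each element we can compute in linear time $r_i$, the sum of distances to all elements on its right, and $sr_i$, the sum of squared distances to them. Adding $sl_i$ and $sr_i$ for each $i$ gives the sum of squared horizontal distances for every element in linear time. Compute the sum of squared vertical distances for every element in the same way. Then scan the original point array and find the point that minimizes the sum of its squared horizontal and vertical distances.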
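A minimal sketch of the two passes for a single coordinate, written 0-based (so the multiplier at index i is i, the number of elements to its left). The function name is illustrative, and the input is assumed to be sorted already.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For xs sorted in non-decreasing order, returns for each i the sum of
// (xs[i] - xs[j])^2 over all j, using the l/sl and r/sr recurrences.
std::vector<double> squared_dist_sums(const std::vector<double>& xs) {
    const std::size_t n = xs.size();
    std::vector<double> total(n, 0.0);
    // Left-to-right: l = sum of distances to the left, sl = sum of their squares.
    double l = 0, sl = 0;
    for (std::size_t i = 1; i < n; ++i) {
        assert(xs[i] >= xs[i - 1]);
        double d = xs[i] - xs[i - 1];
        sl += i * d * d + 2 * d * l;  // uses the previous l, so update sl first
        l += i * d;                   // i elements lie to the left of index i
        total[i] += sl;
    }
    // Right-to-left: r and sr are the mirror images for elements on the right.
    double r = 0, sr = 0;
    for (std::size_t k = n == 0 ? 0 : n - 1; k-- > 0;) {  // k = n-2 down to 0
        double d = xs[k + 1] - xs[k];
        double cnt = static_cast<double>(n - 1 - k);  // elements right of k
        sr += cnt * d * d + 2 * d * r;
        r += cnt * d;
        total[k] += sr;
    }
    return total;
}
```

For two dimensions, the answer applies this to the x-coordinates and to the y-coordinates (each sorted, keeping track of which point every value belongs to) and adds the two results per point. On the points 0, 0, 1, 2, 6 (a = 1, b = 1, c = 4 in the notation of krjampani's comment), the sums are 41, 41, 28, 25, 113: the squared-distance minimizer is 2 = a + b, even though the point minimizing plain distances is 1 = a.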
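Consider the points 0, 0, a, a+b, a+b+c on a line (with a, b, c > 0). The point minimizing the sum of distances to the other points is a. However, if 2c > b + 4a, the point minimizing the sum of squared distances is a+b. - krjampani

As mentioned before, the type of algorithm to use depends on how you measure distance. Since your question does not specify this, here are C implementations for Manhattan distance and for squared Euclidean distance. For 2D points, use dim = 2. The complexity is O(n log n).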
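Manhattan distance
#include <stdio.h>
#include <stdlib.h>

// index of the coordinate that compare() sorts by; set by the loop below
int d = 0;

// qsort comparator: orders point pointers by coordinate d.
// (Returning the difference as an int would truncate, e.g. 0.5 - 0.2 -> 0.)
int compare(const void *a, const void *b) {
    double va = (*(double *const *)a)[d];
    double vb = (*(double *const *)b)[d];
    return (va > vb) - (va < vb);
}

// returns the point with the smallest accumulated score, stored at index dim
double * min(double **points, int N, int dim) {
    double *best = points[0];
    for (int i = 1; i < N; i++) {
        if (best[dim] > points[i][dim]) {
            best = points[i];
        }
    }
    return best;
}

// each point needs dim + 1 slots; points[i][dim] must be 0 on entry and receives
// the Manhattan distance sum, up to a constant that is the same for all points
double * geometric_median_with_manhattan(double **points, int N, int dim) {
    for (d = 0; d < dim; d++) {
        qsort(points, N, sizeof(double *), compare);
        double S = 0; // sum of coordinate d over points[0..i-1]
        for (int i = 0; i < N; i++) {
            double v = points[i][d];
            points[i][dim] += (2 * i - N) * v - 2 * S;
            S += v;
        }
    }
    return min(points, N, dim);
}

int main(int argc, const char * argv[])
{
    // example 2D coordinates with an additional 0 value
    double a[][3] = {{1.0, 1.0, 0.0}, {3.0, 1.0, 0.0}, {3.0, 2.0, 0.0}, {0.0, 5.0, 0.0}};
    double *b[] = {a[0], a[1], a[2], a[3]};
    double *median = geometric_median_with_manhattan(b, 4, 2);
    printf("geometric median at {%.1f, %.1f}\n", median[0], median[1]);
    return 0;
}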
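Why the update line in the loop is correct, derived from the code itself. Write the values of coordinate d, after sorting, as $v_0 \le \dots \le v_{N-1}$, with $S_i = v_0 + \dots + v_{i-1}$ (the variable S) and $T = S_N$:

$$\sum_{j=0}^{N-1} |v_i - v_j| = \sum_{j<i} (v_i - v_j) + \sum_{j>i} (v_j - v_i) = \big(i\,v_i - S_i\big) + \big(T - S_i - v_i - (N-1-i)\,v_i\big) = (2i - N)\,v_i - 2S_i + T.$$

$T$ is the same for every point, so the code adds only $(2i - N)\,v_i - 2S_i$ to points[i][dim]; dropping a constant shared by all points does not change which point has the smallest score.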
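Squared Euclidean distance
// uses min() and the extra points[i][dim] slot (0 on entry) from the code above;
// no sorting is needed here, because each score only depends on the total T
double * geometric_median_with_square(double **points, int N, int dim) {
    for (int k = 0; k < dim; k++) {
        double T = 0; // sum of coordinate k over all points
        for (int i = 0; i < N; i++) {
            T += points[i][k];
        }
        for (int i = 0; i < N; i++) {
            double v = points[i][k];
            points[i][dim] += v * (N * v - 2 * T);
        }
    }
    return min(points, N, dim);
}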
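Short explanation: basically the same as before, but the derivation is slightly more involved. With $T = v_0 + \dots + v_{N-1}$ and $TT = v_0^2 + \dots + v_{N-1}^2$, the sum of squared distances from $v_i$ to all values is $\sum_j (v_i - v_j)^2 = TT + N v_i^2 - 2 v_i T$. Again, TT is added to every point's score, so it can be omitted. Ask if you need more explanation.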
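// p sorted by x; xLDist[i] = sum of (p[i].x - p[j].x) over j < i (i points lie to the left)
xLDist[0] := 0
for i := 1 to n - 1
    xLDist[i] := xLDist[i-1] + ( ( p[i].x - p[i-1].x ) * i )
// xRDist[i] = sum of (p[j].x - p[i].x) over j > i (n - 1 - i points lie to the right)
xRDist[n - 1] := 0
for i := n - 2 downto 0
    xRDist[i] := xRDist[i+1] + ( ( p[i+1].x - p[i].x ) * ( n - 1 - i ) )
for i := 0 to n - 1
    p[i].xDist := xLDist[i] + xRDist[i]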
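A runnable C++ sketch of the two passes above, using n - 1 - i as the multiplier in the right-hand pass. It sorts an index array instead of the points themselves, so each per-axis sum can be added back to the right point, and it runs the same passes on y to get the full Manhattan sum. `add_axis_sums` and `manhattan_best` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Adds sum_j |v[i] - v[j]| to dist[i] for every i, in O(n log n).
void add_axis_sums(const std::vector<double>& v, std::vector<double>& dist) {
    const std::size_t n = v.size();
    assert(dist.size() == n);
    if (n == 0) return;
    // order[k] = index of the k-th smallest value (the pseudocode's sorted p)
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    std::vector<double> left(n, 0.0), right(n, 0.0);
    for (std::size_t k = 1; k < n; ++k)  // k values lie to the left of rank k
        left[k] = left[k - 1] + (v[order[k]] - v[order[k - 1]]) * k;
    for (std::size_t k = n - 1; k-- > 0;)  // n - 1 - k values lie to the right of rank k
        right[k] = right[k + 1] + (v[order[k + 1]] - v[order[k]]) * (n - 1 - k);
    for (std::size_t k = 0; k < n; ++k) dist[order[k]] += left[k] + right[k];
}

// Index of the point (xs[i], ys[i]) with the smallest total Manhattan distance.
std::size_t manhattan_best(const std::vector<double>& xs, const std::vector<double>& ys) {
    assert(xs.size() == ys.size() && !xs.empty());
    std::vector<double> dist(xs.size(), 0.0);
    add_axis_sums(xs, dist);  // horizontal part
    add_axis_sums(ys, dist);  // vertical part
    return static_cast<std::size_t>(std::min_element(dist.begin(), dist.end()) - dist.begin());
}
```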
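You can solve the problem as a convex program (the objective function is convex, though not always strictly convex). The convex program can be solved with an iterative method such as L-BFGS. Each iteration costs O(N), and usually not many iterations are needed. An important way to reduce the number of iterations is that we know the optimal answer is one of the input points, so the optimization can be stopped once its result gets close to one of them.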
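L-BFGS needs a numerical library, so as a self-contained stand-in here is a sketch using Weiszfeld's algorithm instead: a different iterative method for the same convex objective, not L-BFGS. It applies the early stop described above. `weiszfeld_snap`, `max_iter` and `snap` are illustrative names and defaults. If an iterate comes within `snap` of an input point, that index is returned, since a continuous minimizer located at an input point is also the best input point. A return value of -1 means no input point was reached, and the caller still has to choose among the input points. The snap test also avoids Weiszfeld's division by zero at input points, but an iterate passing close to a non-optimal input point could trigger it early, so treat it as a heuristic.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Weiszfeld iteration for the continuous geometric median; each step is O(N).
// Returns i if an iterate comes within `snap` of input point i, otherwise -1
// (then (cx, cy) holds the last iterate).
long weiszfeld_snap(const std::vector<Point>& p, double& cx, double& cy,
                    int max_iter = 1000, double snap = 1e-7) {
    assert(!p.empty());
    cx = cy = 0;  // start from the centroid
    for (const Point& q : p) { cx += q.x; cy += q.y; }
    cx /= p.size();
    cy /= p.size();
    for (int it = 0; it < max_iter; ++it) {
        double nx = 0, ny = 0, wsum = 0;
        for (std::size_t i = 0; i < p.size(); ++i) {
            double d = std::hypot(p[i].x - cx, p[i].y - cy);
            if (d < snap) return static_cast<long>(i);  // early stop at input point i
            double w = 1.0 / d;  // weight = inverse distance to the current iterate
            nx += w * p[i].x;
            ny += w * p[i].y;
            wsum += w;
        }
        cx = nx / wsum;  // next iterate: weighted average of the input points
        cy = ny / wsum;
    }
    return -1;
}
```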
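#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
    int n;
    cin >> n;
    // Manhattan distance splits into |dx| + |dy|, so x and y are handled separately
    vector<long long> a(n), b(n);
    for (int i = 0; i < n; i++)
        cin >> a[i] >> b[i];
    // the median of each coordinate minimizes the sum of absolute deviations on that axis
    sort(a.begin(), a.end());
    sort(b.begin(), b.end());
    long long m1 = a[n / 2];
    long long m2 = b[n / 2];
    // total Manhattan distance from (m1, m2); note (m1, m2) need not be an input point
    long long res = 0;
    for (int i = 0; i < n; i++)
        res += llabs(m1 - a[i]) + llabs(m2 - b[i]);
    cout << res << '\n';
}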
“… [(-L,0), (L,0)]*25 + [(0,1), (0,2), (0,3)], where L is large, you would choose (0,1) instead of (0,2).” - Nabb
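A quick numerical check of the quoted input, with L = 1000 chosen arbitrarily (`nabb_input` and `best_by` are illustrative names). Minimizing the sum of squared distances, as the squared-Euclidean approaches above do, is the same as picking the input point nearest the centroid c, because $\sum_j |p_j - q|^2 = \sum_j |p_j - c|^2 + N|q - c|^2$. The sketch compares that choice with the point that has the smallest sum of plain Euclidean distances, both found by brute force.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// 25 copies each of (-L, 0) and (L, 0), followed by (0, 1), (0, 2), (0, 3).
std::vector<Point> nabb_input(double L) {
    std::vector<Point> p;
    for (int k = 0; k < 25; ++k) { p.push_back({-L, 0}); p.push_back({L, 0}); }
    p.push_back({0, 1}); p.push_back({0, 2}); p.push_back({0, 3});
    return p;
}

// Brute-force index of the input point minimizing the sum of distances
// (squared == false) or the sum of squared distances (squared == true).
std::size_t best_by(const std::vector<Point>& p, bool squared) {
    assert(!p.empty());
    std::size_t best = 0;
    double bestSum = 0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        double s = 0;
        for (const Point& q : p) {
            double dx = p[i].x - q.x, dy = p[i].y - q.y;
            s += squared ? dx * dx + dy * dy : std::hypot(dx, dy);
        }
        if (i == 0 || s < bestSum) { bestSum = s; best = i; }
    }
    return best;
}
```

With L = 1000, the squared-distance choice is index 50, the point (0,1), while the smallest sum of plain distances belongs to index 51, the point (0,2) (about 50002.1 versus 50003.0).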