将正交网格拟合到带有噪声的坐标上

Question

将正交网格拟合到带有噪声的坐标上

pythonnumpyimage-processingscipylinear-algebra

3

问题

表格可以被旋转（更新）。

Grid Problem

应用

目标

使用纯列表的 Python 代码

# Noisy [x, y] coordinates (origin is upper-left corner)
pts = [[103,101],
       [198,103],
       [300, 99],
       [ 97,205],
       [304,202],
       [102,295],
       [200,303],
       [104,405],
       [205,394],
       [298,401]]

def row_col_avgs(num_list, ratio):
    # Finds the average of each row and column. Coordinates are
    # assigned to a row and column by specifying an error ratio.
    last_num, sum_nums, count_nums, avgs = 0, 0, 0, []
    num_list.sort()
    for num in num_list:
        # Calculate average for last row or column and begin new row or column
        if num > (1+ratio)*last_num and count_nums != 0:
            avgs.append(int(round(sum_nums/count_nums,0)))
            sum_nums = num
            count_nums = 1
        # Or continue with current row or column
        else:
            sum_nums += num
            count_nums += 1
        last_num = num
    avgs.append(int(round(sum_nums/count_nums,0)))
    return avgs

# Split coordinates into two lists of x's and y's
xs, ys = map(list, zip(*pts))

# Find averages of each row and column of the grid
x_avgs = row_col_avgs(xs, 0.1)
y_avgs = row_col_avgs(ys, 0.1)

# Return vertices of completed averaged grid
avg_grid = []
for y_avg in y_avgs:
    avg_row = []
    for x_avg in x_avgs:
        avg_row.append([int(x_avg), int(y_avg)])
    avg_grid.append(avg_row)

print(avg_grid)

输出

[[[102, 101], [201, 101], [301, 101]], 
 [[102, 204], [201, 204], [301, 204]], 
 [[102, 299], [201, 299], [301, 299]], 
 [[102, 400], [201, 400], [301, 400]]]

- Jakub

你能详细解释一下你的聚类平均算法是如何工作的吗？此外，通过使用numpy，这段代码肯定可以更加高效。例如，你现在所做的一切都是使用列表，而你可以很容易地使用numpy。例如，xs, ys = map(list, zip(*xys)) 在numpy中可以很容易地写成 xs, ys = xys[:,0], xys[:, 1]。也许你可以自己试试看？ - undefined

PyWalker，我的cluster_avgs函数循环遍历一个排序好的数字列表。当一个数字与它前面的数字的比值大于1.1时，这就是当前聚类的结束，并且计算该聚类的平均值。我会在使用Numpy时采纳你的建议。谢谢。 - undefined

3个回答

1

如果你在垂直或水平轴上投影所有点，则问题变成了等间距聚类问题。

要执行这些聚类，您可以考虑连续（排序）点之间的距离。它们将形成两个簇：短距离对应于噪声，较长距离为网格大小。您可以使用Otsu方法解决双向聚类问题。

- user1196549

我不确定如何实现双向聚类来找到网格。我需要更多信息。听起来比根据距离分割点更昂贵。我正在使用SciPy cKDTree来查找行和列之间的距离。我正在使用平行斜率OLS模型处理噪声。我已经将我使用的内容作为对我的原始帖子的答案发布了。我在OpenCV中使用了Otsu的阈值处理，但并非直接用于数据聚类。我需要进一步研究一下。谢谢。此外，行和列不一定是等间距的，如果这有关系的话。 - undefined

我忘了添加一点，就是在我的应用中，网格点可以相对于坐标轴进行旋转，这对于投射点来说可能会成为一个问题。 - undefined

@Jakub: 嗷嗷…… - user1196549

@Ives Daoust：grrrr没有给我太多信息。我冒犯到你了吗？实际上，我花了很多时间研究大津法，因为我尊重你的建议。 - undefined

我其实浪费了时间描述这个方法，因为它并不适用于旋转的情况。 - user1196549

显示剩余2条评论

0

下面是您代码的numpy实现。由于已知AvgGrid的大小，我预先分配所需的内存（而不是追加）。这应该具有速度优势，特别是如果输出顶点的数量很大。

import numpy as np

# Input of [x, y] coordinates of a sparse grid with errors
xys = np.array([[103,101],
       [198,103],
       [300, 99],
       [ 97,205],
       [304,202],
       [102,295],
       [200,303],
       [104,405],
       [205,394],
       [298,401]])

# Function to average
def ColAvgs(CoordinateList, CutoffRatio = 1.1):

    # Length of CoordinateList
    L = len(CoordinateList)

    # Sort input
    SortedList = np.sort(CoordinateList)

    # Determine indices to average
    RelativeIncrease = SortedList[-(L-1):]/SortedList[:(L-1)]
    CriticalIndices = np.flatnonzero(RelativeIncrease > CutoffRatio) + 1
    Indices = np.hstack((0,CriticalIndices))
    if (Indices[-1] != L):
        Indices = np.hstack((Indices,L))
    #print(Indices)     # Uncomment to show index construction

    # Compute averages
    Avgs = np.empty((len(Indices)-1)); Avgs[:] = np.NaN
    for iter in range(len(Avgs)):
        Avgs[iter] = int( round(np.mean(SortedList[Indices[iter]:Indices[(iter+1)]]) ) )

    # Return output
    return Avgs

# Compute x- and y-coordinates of vertices
AvgsXcoord = ColAvgs(xys[:,0])
AvgsYcoord = ColAvgs(xys[:,1])

# Return all vertices
AvgGrid = np.empty((len(AvgsXcoord)*len(AvgsYcoord),2)); AvgGrid[:] = np.NaN
iter = 0
for y in AvgsYcoord:
    for x in AvgsXcoord:
        AvgGrid[iter, :] = np.hstack((x,y))
        iter = iter+1
print(AvgGrid)

- Hanno Reuvers

感谢您的numpy实现。我编写了新的代码，以并行斜率OLS模型替代使用行和列平均值。它可以处理旋转的网格。它同时对水平和垂直网格线进行单一最小二乘法计算。我已将其发布为答案。欢迎提出任何加速改进的建议。 - undefined

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Jakub · Accepted Answer

平行斜率普通最小二乘（OLS）模型：
y = mx + grp + b，其中m=斜率，b=y截距，grp=分类变量。

这是一种可以处理旋转网格的替代算法。

OLS模型包括原始方向上的数据点和同一数据点的90°旋转。这是必要的，以使所有网格线都平行且具有相同的斜率。

算法：

选择第一行或最后一行中斜率最接近零的两个相邻点作为参考网格线，用于与其余点进行比较。
计算此参考线与其余点之间的距离。
根据计算出的距离将点分成组（每个网格线一个组）。
针对90度旋转的网格重复步骤1至3，并合并结果。
创建平行斜率OLS模型，以确定网格线的线性方程。
将旋转后的网格线旋转回其原始方向。
计算交点。

注意：如果噪声、角度和/或缺失数据过多，则会失败。

示例：

Python代码创建示例

def create_random_example():
    # Requires import of numpy and random packages
    # Creates grid with random noise and missing points
    # Example will fail if std_dev, rotation, pct_removed too large
    
    # Parameters
    first_row, last_row = 100, 900
    first_col, last_col = 100, 600
    num_rows = 6
    num_cols = 4
    rotation = 3 # degrees that grid is rotated
    sd = 3 # percent std dev of avg x and avg y coordinates
    pct_remove = 30 # percent of points to randomly remove from data
    
    # Create grid
    x = np.linspace(first_col, last_col, num_cols)
    y = np.linspace(first_row, last_row, num_rows)
    xx, yy = np.meshgrid(x, y)
    
    # Add noise
    x = xx.flatten() + sd * np.mean(xx) * np.random.randn(xx.size) / 100
    y = yy.flatten() + sd * np.mean(yy) * np.random.randn(yy.size) / 100
    
    # Randomly remove points
    random_list = random.sample(range(0, num_cols*num_rows), 
                          int(pct_remove*num_cols*num_rows/100))
    x, y = np.delete(x, random_list), np.delete(y, random_list)
    
    pts = np.column_stack((x, y))
    
    # Rotate points
    radians = np.radians(rotation)
    rot_mat = np.array([[np.cos(radians),-np.sin(radians)],
                        [np.sin(radians), np.cos(radians)]])
    einsum = np.einsum('ji, mni -> jmn', rot_mat, [pts])
    pts = np.squeeze(einsum).T
    
    return np.rint(pts)

Python代码用于配合网格线

import numpy as np
import pandas as pd
import itertools
import math
import random
from statsmodels.formula.api import ols
from scipy.spatial import KDTree
import matplotlib.pyplot as plt

def pt_line_dist(pt, ref_line):
    pt1, pt2 = [ref_line[:2], ref_line[2:]]
    # Distance from point to line defined by two other points
    return np.linalg.norm(np.cross(pt1 - pt2, [pt[0],pt[1]])) \
         / np.linalg.norm(pt1 - pt2)

def segment_pts(amts, grp_var, grp_label):
    # Segment on amounts (distances here) in last column of array
    # Note: need to label groups with string for OLS model
    amts = amts[amts[:, -1].argsort()]
    first_amt_in_grp = amts[0][-1]
    group, groups, grp = [], [], 0
    for amt in amts:
        if amt[-1] - first_amt_in_grp > grp_var:
            groups.append(group)
            first_amt_in_grp = amt[-1]
            group = []; grp += 1
        group.append(np.append(amt[:-1],[[grp_label + str(grp)]]))
    groups.append(group)
    return groups

def find_reference_line(pts):
    # Find point with minimum absolute slope relative both min y and max y
    y = np.hsplit(pts, 2)[1] # y column of array
    m = []
    for i, y_pt in enumerate([ pts[np.argmin(y)], pts[np.argmax(y)] ]):
        m.append(np.zeros((pts.shape[0]-1, 5))) # dtype default is float64
        m[i][:,2:4] = np.delete(pts, np.where((pts==y_pt).all(axis=1))[0], axis=0)
        m[i][:,4] = abs( (m[i][:,3]-y_pt[1]) / (m[i][:,2]-y_pt[0]) )
        m[i][:,:2] = y_pt
    m = np.vstack((m[0], m[1]))
    return m[np.argmin(m[:,4]), :4]

# Ignore division by zero (slopes of vertical lines)
np.seterr(divide='ignore')

# Create dataset and plot
pts = create_random_example()
plt.scatter(pts[:,0], pts[:,1], c='r') # plot now because pts array changes

# Average distance to the nearest neighbor of each point
tree = KDTree(pts)
nn_avg_dist = np.mean(tree.query(pts, 2)[0][:, 1])

# Find groups of points representing each gridline
groups = []
for orientation in ['o', 'r']: #  original and rotated orientations
    
    # Rotate points 90 degrees (note: this moves pts to 2nd quadrant)
    if orientation == 'r':
        pts[:,1] = -1 * pts[:,1]
        pts[:, [1, 0]] = pts[:, [0, 1]]
    
    # Find reference line to compare remaining points for grouping
    ref_line = find_reference_line(pts) # line is defined by two points
    
    # Distances between points and reference line
    pt_dists = np.zeros((pts.shape[0], 3))
    pt_dists[:,:2] = pts
    pt_dists[:,2] = np.apply_along_axis(pt_line_dist, 1, pts, ref_line).T
    
    # Segment pts into groups w.r.t. distances (one group per gridline)
    # Groups have range less than nn_avg_dist.
    groups += segment_pts(pt_dists, 0.7*nn_avg_dist, orientation)

# Create dataframe of groups (OLS model requires a dataframe)
df = pd.DataFrame(np.row_stack(groups), columns=['x', 'y', 'grp'])
df['x'] = pd.to_numeric(df['x'])
df['y'] = pd.to_numeric(df['y'])

# Parallel slopes OLS model
ols_model = ols("y ~ x + grp + 0", data=df).fit()

# OLS parameters
grid_lines = ols_model.params[:-1].to_frame() # panda series to dataframe
grid_lines = grid_lines.rename(columns = {0:'b'})
grid_lines['grp'] = grid_lines.index.str[4:6]
grid_lines['m'] = ols_model.params[-1] # slope

# Rotate the rotated lines back to their original orientation
grid_lines.loc[grid_lines['grp'].str[0] == 'r', 'b'] = grid_lines['b'] / grid_lines['m']
grid_lines.loc[grid_lines['grp'].str[0] == 'r', 'm'] = -1 / grid_lines['m']

# Find grid intersection points by combinations of gridlines
comb = list(itertools.combinations(grid_lines['grp'], 2))
comb = [i for i in comb if i[0][0] != 'r']
comb = [i for i in comb if i[1][0] != 'o']
df_comb = pd.DataFrame(comb, columns=['grp', 'r_grp'])

# Merge gridline parameters with grid points
grid_pts = df_comb.merge(grid_lines.drop_duplicates('grp'),how='left',on='grp')
grid_lines.rename(columns={'grp': 'r_grp'}, inplace=True)
grid_pts.rename(columns={'b':'o_b', 'm': 'o_m', 'grp':'o_grp'}, inplace=True)
grid_pts = grid_pts.merge(grid_lines.drop_duplicates('r_grp'),how='left',on='r_grp')
grid_pts.rename(columns={'b':'r_b', 'm': 'r_m'}, inplace=True)

# Calculate x, y coordinates of gridline interception points
grid_pts['x'] = (grid_pts['r_b']-grid_pts['o_b']) \
              / (grid_pts['o_m']-grid_pts['r_m'])
grid_pts['y'] = grid_pts['o_m'] * grid_pts['x'] + grid_pts['o_b']

# Results output
print(grid_lines)
print(grid_pts)

plt.scatter(grid_pts['x'], grid_pts['y'], s=8, c='b') # for setting axes

axes = plt.gca()
axes.invert_yaxis()
axes.xaxis.tick_top()
axes.set_aspect('equal')
axes.set_xlim(axes.get_xlim())
axes.set_ylim(axes.get_ylim())

x_vals = np.array(axes.get_xlim())
for idx in grid_lines.index:
    y_vals = grid_lines['b'][idx] + grid_lines['m'][idx] * x_vals
    plt.plot(x_vals, y_vals, c='gray')

plt.show()