What is the best way to match up list entries after rounding in Mathematica?

Question

5

I have two lists in Mathematica:

list1 = {{a1, b1, c1}, ... , {an, bn, cn}}

and

list2 = {{d1, e1, f1}, ... , {dn, en, fn}}

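These lists contain numerical results, roughly 50,000 triplets each. Each triplet represents two coordinates and the numerical value of some property at those coordinates. The lists have different lengths, and their coordinate ranges are not quite the same. I want to correlate the values of the third property between the two lists, so I need to scan the lists and identify the properties whose coordinates match. My output would look something like: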

list3 = {{ci, fj}, ... , {cl, fm}}

where

{ai, bi}, ..., {al, bl}

will be (roughly) equal, respectively, to

{dj, ej}, ..., {dm, em}

By "roughly" I mean that the coordinates will match once they are rounded to some desired accuracy:

list1(2) = Round[{#[[1]], #[[2]], #[[3]]}, {1000, 500, 0.1}] & /@ list1(2)

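After this step I will have two lists that contain some matching coordinates. My question is: what is the best way to identify the matches and pick out the pairs of properties?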

An example of a 6-element list:

list1 = {{-1.16371*10^6, 548315., 14903.}, {-1.16371*10^6, 548322., 14903.9}, 
   {-1.16371*10^6, 548330., 14904.2}, {-1.16371*10^6, 548337., 14904.8}, 
   {-1.16371*10^6, 548345., 14905.5}, {-1.16371*10^6, 548352., 14911.5}}
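
As a quick check of the rounding step on the six sample triples above, here is a minimal sketch. The names sample, roundTriple, and rounded are introduced only for illustration and are not part of the original post.

```mathematica
(* the six sample triples from the question *)
sample = {{-1.16371*10^6, 548315., 14903.}, {-1.16371*10^6, 548322., 14903.9},
   {-1.16371*10^6, 548330., 14904.2}, {-1.16371*10^6, 548337., 14904.8},
   {-1.16371*10^6, 548345., 14905.5}, {-1.16371*10^6, 548352., 14911.5}};

(* Round is Listable, so the three increments pair up with the three entries *)
roundTriple[t_] := Round[t, {1000, 500, 0.1}]

rounded = roundTriple /@ sample;

(* distinct rounded coordinate pairs *)
Union[rounded[[All, {1, 2}]]]
```

With these increments all six sample points land on the single rounded coordinate pair {-1164000, 548500}, so increments this coarse would also merge neighbouring points within one list.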

- gpap

2

Can you provide a small numerical example of list1 and list2, together with the expected output list3? Also, please define "roughly". - user616736

2 Answers

4

Here is my approach, which uses Nearest to match up the points.

Assume that list1 has at least as many elements as list2. (Otherwise you can swap them using {list1, list2} = {list2, list1}.)

(* extract points *)

points1=list1[[All,{1,2}]];
points2=list2[[All,{1,2}]];

(* build a "nearest-function" for matching them *)

nf=Nearest[points1]

(* two points match only if they're closer than threshold *)
threshold=100;

(* This function will find the match of a point from points2 in points1.  
   If there's no match, the point is discarded using Sequence[]. *)
match[point_]:= 
   With[{m=First@nf[point]}, 
       If[Norm[m-point]<threshold, {m,point}, Unevaluated@Sequence[]]
   ]

(* find matching point-pairs: look up each point of points2 in points1 *)
matches=match/@points2;

(* build hash tables to retrieve the properties associated with points quickly *)
Clear[values1,values2]
Set[values1[{#1,#2}],#3]&@@@list1;
Set[values2[{#1,#2}],#3]&@@@list2;

(* get the property-pairs *)
{values1[#1],values2[#2]}&@@@matches

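An alternative is to use a custom DistanceFunction in Nearest, which avoids the need for values1 and values2 and shortens the program. This may be faster or slower; I have not tested it thoroughly on large data.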
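
A minimal sketch of that DistanceFunction alternative, not benchmarked on large data. The toy lists and the names coordDist, nfTriples, and matchTriple are hypothetical, chosen only for illustration. Nearest is built on the full triples, while the distance looks at the two coordinates only, so the property values travel along with the matched points.

```mathematica
(* made-up toy data, for illustration only *)
list1 = {{0., 0., 1.}, {1000., 0., 2.}, {2000., 0., 3.}};
list2 = {{10., 5., 10.}, {1990., -5., 30.}, {5000., 5000., 99.}};
threshold = 100;

(* distance between two triples, measured on the coordinates only *)
coordDist[p_, q_] := Norm[Most[p] - Most[q]]

(* nearest-function over whole triples of list1 *)
nfTriples = Nearest[list1, DistanceFunction -> coordDist];

(* return {value1, value2} if the closest triple is within threshold, else drop it *)
matchTriple[t_] :=
  With[{m = First@nfTriples[t]},
    If[coordDist[m, t] < threshold, {Last[m], Last[t]}, Unevaluated@Sequence[]]
  ]

pairs = matchTriple /@ list2
```

On this toy data the first two triples of list2 each have a neighbour in list1 about 11 units away, so pairs comes out as {{1., 10.}, {3., 30.}}; the third triple has no neighbour within the threshold and is dropped.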

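Note: how complicated the implementation needs to be really depends on your particular dataset. Does every point in the first set have a match in the second set? Are there any duplicates? How close can points within the same dataset be? And so on. I tried to provide something that can be tweaked to be relatively robust, at the cost of longer code.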

- Szabolcs

- Mr.Wizard · Accepted Answer

You might want to use something like this:

{Round[{#, #2}], #3} & @@@ Join[list1, list2];

% ~GatherBy~ First ~Select~ (Length@# > 1 &)

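This groups all data points whose coordinates match after rounding. You can use the second argument of Round to specify the multiple to round to (for example, Round[x, 0.1] rounds to the nearest tenth).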
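
To make the shape of the result concrete, here is a small sketch on made-up data. The toy lists and the names tagged, groups, and pairs are assumptions introduced for illustration. The final Part step, which strips the coordinates to leave only the property pairs, is an addition that is not in the code above.

```mathematica
(* made-up toy data: only the first entries of each list round to the same point *)
list1 = {{1.2, 3.1, 10.}, {5.4, 7.7, 20.}};
list2 = {{0.9, 2.8, 100.}, {9.6, 9.6, 300.}};

(* tag each value with its rounded coordinate pair *)
tagged = {Round[{#1, #2}], #3} & @@@ Join[list1, list2];

(* keep only coordinate pairs that occur more than once *)
groups = Select[GatherBy[tagged, First], Length[#] > 1 &];

(* drop the coordinates, keeping the property values of each group *)
pairs = groups[[All, All, 2]]
```

Here only {1.2, 3.1} and {0.9, 2.8} round to the same point {1, 3}, so pairs is {{10., 100.}}.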

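This assumes that there are no duplicate points within a single list. If there are, they need to be removed to get useful pairs. Let me know if that is the case and I will update my answer.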

Here is another method, using Sow and Reap. The same caveats apply. Both of these examples are only guides for how you might implement this.

Reap[
  Sow[#3, {Round[{#, #2}]}] & @@@ Join[list1, list2],
  _,
  List
][[2]] ~Cases~ {_, {_, __}}
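
For illustration, the same Sow and Reap pattern on made-up data. The toy lists and the names reaped and valuePairs are hypothetical. Each result entry has the form {roundedCoordinates, {values}}, and Cases keeps only the entries with at least two values.

```mathematica
(* made-up toy data: only the first entries of each list round to the same point *)
list1 = {{1.2, 3.1, 10.}, {5.4, 7.7, 20.}};
list2 = {{0.9, 2.8, 100.}, {9.6, 9.6, 300.}};

(* sow each value under its rounded coordinate pair as the tag, then collect per tag *)
reaped = Reap[
    Sow[#3, {Round[{#1, #2}]}] & @@@ Join[list1, list2],
    _,
    List
   ][[2]] ~Cases~ {_, {_, __}};

(* keep only the collected values *)
valuePairs = reaped[[All, 2]]
```

On this data reaped is {{{1, 3}, {10., 100.}}} and valuePairs is {{10., 100.}}.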

To handle duplicate elements within each list, you can use Round and GatherBy on each list separately, like this:

newList1 = GatherBy[{Round[{#, #2}], #3} & @@@ list1, First][[All, 1]];

newList2 = GatherBy[{Round[{#, #2}], #3} & @@@ list2, First][[All, 1]];

Then proceed with:

newList1 ~Join~ newList2 ~GatherBy~ First ~Select~ (Length@# > 1 &)
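
A small sketch of the whole duplicate-handling pipeline on made-up data. The toy lists and the helper name dedupe are assumptions for illustration. Note that [[All, 1]] keeps only the first point of each rounded group and discards the rest.

```mathematica
(* made-up toy data: the first two entries of list1 both round to {1, 3} *)
list1 = {{1.2, 3.1, 10.}, {1.4, 2.9, 11.}, {5.4, 7.7, 20.}};
list2 = {{0.9, 2.8, 100.}};

(* round the coordinates and keep one representative per rounded point *)
dedupe[list_] := GatherBy[{Round[{#1, #2}], #3} & @@@ list, First][[All, 1]]

newList1 = dedupe[list1];
newList2 = dedupe[list2];

(* group across both lists and extract the property pairs *)
pairs = Select[GatherBy[Join[newList1, newList2], First], Length[#] > 1 &][[All, All, 2]]
```

Here newList1 is {{{1, 3}, 10.}, {{5, 8}, 20.}}, the duplicate with value 11. is dropped, and pairs is {{10., 100.}}.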