Delphi中N个数组的交集

3
为了找出N个数组的交集,我有以下实现方式,但效率极低。我知道一定有算法能够加速这个过程。
注意:myarray是包含所有需要找到交集的其他数组的数组。
var
i, j, k: integer;
myarray: Array of Array of integer;
intersection: array of integer;

for I := 0 to length(myarray)-1 do
  begin
    for J := 0 to length(myarray)-1 do
    begin
      if i = j then
        continue;
      for k := 0 to length(myarray[i])-1 do
      begin
        if myarray[i][j] = myarray[j][k] then
        begin
          setLength(intersection, length(intersection)+1);
          intersection[length(intersection)-1] := myarray[j][k];
        end;
      end;
    end;
  end;

我该如何优化以提高速度?有更快的方法吗?
编辑:数组中的数据未排序。

1
我不明白这段代码一开始就是有效的。你在同一个表达式中使用 j 来索引外部数组和内部数组。只有当所有整数数组的长度都保证与你拥有的整数数组数量相同时,才是有效的。此外,这些数组的内容是否按可预测和一致的顺序排列(排序)? - Rob Kennedy
许多优化取决于数组数据的性质。例如,如果它们已排序,则可以用二分查找替换线性查找。 - Kenneth Cochran
我刚刚查看了代码,意识到我的实现方式行不通。事实上,我还没有编译它。 - Daisetsu
3
大拙,你犯了一个大错!如果你没有掌握正确性,那么即使全世界最快速度也毫无意义。 - Rob Kennedy
3个回答

10
有一种更快的方法:列表比较算法。它允许你在线性时间而不是平方时间内比较两个列表。基本思路如下:
  1. 按相同标准对两个列表进行排序(如果需要保留原始顺序,请先复制列表)。
  2. 从两个列表的顶部开始。选择每个列表中的第一个项目并进行比较。
  3. 如果它们匹配,处理该情况并将两个列表的索引都向前移动。
  4. 如果它们不匹配,循环遍历,每次将“较小”为值的列表的索引向前移动,直到找到匹配项。
  5. 当你到达其中一个列表的末尾时,你就完成了。(除非你想处理另一个列表中的任何剩余部分。)
这可以通过一些努力扩展以处理多于2个列表。

5
很遗憾,您还没有更新您的问题,所以我们仍然不清楚您的问题是什么。例如,您谈到了一个交集(应该搜索存在于每个数组中的值),但从(不起作用的)代码中看来,您只是在任何一个数组中搜索重复项。
尽管Mason's answer指向这些算法的明显通用解决方案,但我认为对于这样一个多维数组来说有些不同。我制定了两个程序来确定(1)交集和(2)重复项。两者都假定数组中的内容无序且长度不等。
首先,我决定引入一些新类型:
type
  PChain = ^TChain;
  TChain = array of Integer;
  TChains = array of TChain;

其次,这两个例程都需要一些排序机制。一个非常快速但不太规范的方法是使用/滥用 TList
function CompareInteger(Item1, Item2: Pointer): Integer;
begin
  Result := Integer(Item1) - Integer(Item2);
end;

procedure SortChain(var Chain: TChain);
var
  List: TList;
begin
  List := TList.Create;
  try
    List.Count := Length(Chain);
    Move(Chain[0], List.List[0], List.Count * SizeOf(Integer));
    List.Sort(CompareInteger);
    Move(List.List[0], Chain[0], List.Count * SizeOf(Integer));
  finally
    List.Free;
  end;
end;

但更好的实现是通过调整来自Classes.QuickSort的RTL代码来完成的,它与上面的代码完全相同,而不需要复制数组(两次):

procedure SortChain(Chain: PChain; L, R: Integer);
var
  I: Integer;
  J: Integer;
  Value: Integer;
  Temp: Integer;
begin
  repeat
    I := L;
    J := R;
    Value := Chain^[(L + R) shr 1];
    repeat
      while Chain^[I] < Value do
        Inc(I);
      while Chain^[J] > Value do
        Dec(J);
      if I <= J then
      begin
        Temp := Chain^[I];
        Chain^[I] := Chain^[J];
        Chain^[J] := Temp;
        Inc(I);
        Dec(J);
      end;
    until I > J;
    if L < J then
      SortChain(Chain, L, J);
    L := I;
  until I >= R;
end;

交集:

要获取所有数组的交集,只需将最短数组中的所有值与其他所有数组中的值进行比较即可。因为最短数组可能包含重复值,所以需要对该小数组进行排序以便忽略重复项。从那时起,只需找到(或者说没有找到)其他数组中的一个相同值即可。不需要对所有其他数组进行排序,因为在排序数组之前找到值的机会是50%。

function GetChainsIntersection(const Chains: TChains): TChain;
var
  IShortest: Integer;
  I: Integer;
  J: Integer;
  K: Integer;
  Value: Integer;
  Found: Boolean;
  FindCount: Integer;
begin
  // Determine which of the chains is the shortest
  IShortest := 0;
  for I := 1 to Length(Chains) - 1 do
    if Length(Chains[I]) < Length(Chains[IShortest]) then
      IShortest := I;
  // The length of result will at maximum be the length of the shortest chain
  SetLength(Result, Length(Chains[IShortest]));
  Value := 0;
  FindCount := 0;
  // Find for every value in the shortest chain...
  SortChain(@Chains[IShortest], 0, Length(Chains[IShortest]) - 1);
  for K := 0 to Length(Chains[IShortest]) - 1 do
  begin
    if (K > 0) and (Chains[IShortest, K] = Value) then
      Continue;
    Value := Chains[IShortest, K];
    Found := False;
    for I := 0 to Length(Chains) - 1 do
      if I <> IShortest then
      begin
        Found := False;
        for J := 0 to Length(Chains[I]) - 1 do
          // ... the same value in other chains
          if Chains[I, J] = Value then
          begin
            Found := True;
            Break;
          end;
        if not Found then
          Break;
      end;
    // Add a found value to the result
    if Found then
    begin
      Result[FindCount] := Value;
      Inc(FindCount);
    end;
  end;
  // Truncate the length of result to the actual number of found values
  SetLength(Result, FindCount);
end;

重复项:

这也不需要单独对所有数组进行排序。 所有值都被复制到一个一维临时数组中。 排序该数组后,很容易找到重复项。

function GetDuplicateShackles(const Chains: TChains): TChain;
var
  Count: Integer;
  I: Integer;
  Temp: TChain;
  PrevValue: Integer;
begin
  // Foresee no result
  SetLength(Result, 0);
  // Count the total number of values
  Count := 0;
  for I := 0 to Length(Chains) - 1 do
    Inc(Count, Length(Chains[I]));
  if Count > 0 then
  begin
    // Copy all values to a temporary chain...
    SetLength(Temp, Count);
    Count := 0;
    for I := 0 to Length(Chains) - 1 do
    begin
      Move(Chains[I][0], Temp[Count], Length(Chains[I]) * SizeOf(Integer));
      Inc(Count, Length(Chains[I]));
    end;
    // Sort the temporary chain
    SortChain(@Temp, 0, Count - 1);
    // Find all duplicate values in the temporary chain
    SetLength(Result, Count);
    Count := 0;
    PrevValue := Temp[0];
    for I := 1 to Length(Temp) - 1 do
    begin
      if (Temp[I] = PrevValue) and
        ((Count = 0) or (Temp[I] <> Result[Count - 1])) then
      begin
        Result[Count] := PrevValue;
        Inc(Count);
      end;
      PrevValue := Temp[I];
    end;
    SetLength(Result, Count);
  end;
end;

示例应用程序:

因为我喜欢测试所有的代码,所以只需要很少的工作就可以使其有些代表性。

unit Unit1;

interface

uses
  SysUtils, Classes, Controls, Forms, StdCtrls, Grids;

type
  PChain = ^TChain;
  TChain = array of Integer;
  TChains = array of TChain;

  TForm1 = class(TForm)
    Grid: TStringGrid;
    IntersectionFullButton: TButton;
    IntersectionPartialButton: TButton;
    DuplicatesFullButton: TButton;
    DuplicatesPartialButton: TButton;
    Memo: TMemo;
    procedure FormCreate(Sender: TObject);
    procedure IntersectionButtonClick(Sender: TObject);
    procedure DuplicatesButtonClick(Sender: TObject);
  private
    procedure ClearGrid;
    procedure ShowChains(const Chains: TChains);
    procedure ShowChain(const Chain: TChain; const Title: String);
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

const
  MaxDepth = 20;

procedure FillChains(var Chains: TChains; FillUp: Boolean; MaxValue: Integer);
var
  X: Integer;
  Y: Integer;
  Depth: Integer;
begin
  SetLength(Chains, MaxDepth);
  for X := 0 to MaxDepth - 1 do
  begin
    if FillUp then
      Depth := MaxDepth
    else
      Depth := Random(MaxDepth - 2) + 3; // Minimum depth = 3
    SetLength(Chains[X], Depth);
    for Y := 0 to Depth - 1 do
      Chains[X, Y] := Random(MaxValue);
  end;
end;

procedure SortChain(Chain: PChain; L, R: Integer);
var
  I: Integer;
  J: Integer;
  Value: Integer;
  Temp: Integer;
begin
  repeat
    I := L;
    J := R;
    Value := Chain^[(L + R) shr 1];
    repeat
      while Chain^[I] < Value do
        Inc(I);
      while Chain^[J] > Value do
        Dec(J);
      if I <= J then
      begin
        Temp := Chain^[I];
        Chain^[I] := Chain^[J];
        Chain^[J] := Temp;
        Inc(I);
        Dec(J);
      end;
    until I > J;
    if L < J then
      SortChain(Chain, L, J);
    L := I;
  until I >= R;
end;

function GetChainsIntersection(const Chains: TChains): TChain;
var
  IShortest: Integer;
  I: Integer;
  J: Integer;
  K: Integer;
  Value: Integer;
  Found: Boolean;
  FindCount: Integer;
begin
  IShortest := 0;
  for I := 1 to Length(Chains) - 1 do
    if Length(Chains[I]) < Length(Chains[IShortest]) then
      IShortest := I;
  SetLength(Result, Length(Chains[IShortest]));
  Value := 0;
  FindCount := 0;
  SortChain(@Chains[IShortest], 0, Length(Chains[IShortest]) - 1);
  for K := 0 to Length(Chains[IShortest]) - 1 do
  begin
    if (K > 0) and (Chains[IShortest, K] = Value) then
      Continue;
    Value := Chains[IShortest, K];
    Found := False;
    for I := 0 to Length(Chains) - 1 do
      if I <> IShortest then
      begin
        Found := False;
        for J := 0 to Length(Chains[I]) - 1 do
          if Chains[I, J] = Value then
          begin
            Found := True;
            Break;
          end;
        if not Found then
          Break;
      end;
    if Found then
    begin
      Result[FindCount] := Value;
      Inc(FindCount);
    end;
  end;
  SetLength(Result, FindCount);
end;

function GetDuplicateShackles(const Chains: TChains): TChain;
var
  Count: Integer;
  I: Integer;
  Temp: TChain;
  PrevValue: Integer;
begin
  SetLength(Result, 0);
  Count := 0;
  for I := 0 to Length(Chains) - 1 do
    Inc(Count, Length(Chains[I]));
  if Count > 0 then
  begin
    SetLength(Temp, Count);
    Count := 0;
    for I := 0 to Length(Chains) - 1 do
    begin
      Move(Chains[I][0], Temp[Count], Length(Chains[I]) * SizeOf(Integer));
      Inc(Count, Length(Chains[I]));
    end;
    SortChain(@Temp, 0, Count - 1);
    SetLength(Result, Count);
    Count := 0;
    PrevValue := Temp[0];
    for I := 1 to Length(Temp) - 1 do
    begin
      if (Temp[I] = PrevValue) and
        ((Count = 0) or (Temp[I] <> Result[Count - 1])) then
      begin
        Result[Count] := PrevValue;
        Inc(Count);
      end;
      PrevValue := Temp[I];
    end;
    SetLength(Result, Count);
  end;
end;

{ TForm1 }

procedure TForm1.FormCreate(Sender: TObject);
begin
  Grid.ColCount := MaxDepth;
  Grid.RowCount := MaxDepth;
end;

procedure TForm1.ClearGrid;
var
  I: Integer;
begin
  for I := 0 to Grid.ColCount - 1 do
    Grid.Cols[I].Text := '';
end;

procedure TForm1.ShowChains(const Chains: TChains);
var
  I: Integer;
  J: Integer;
begin
  for I := 0 to Length(Chains) - 1 do
    for J := 0 to Length(Chains[I]) - 1 do
      Grid.Cells[I, J] := IntToStr(Chains[I, J]);
end;

procedure TForm1.ShowChain(const Chain: TChain; const Title: String);
var
  I: Integer;
begin
  if Length(Chain) = 0 then
    Memo.Lines.Add('No ' + Title)
  else
  begin
    Memo.Lines.Add(Title + ':');
    for I := 0 to Length(Chain) - 1 do
      Memo.Lines.Add(IntToStr(Chain[I]));
  end;
end;

procedure TForm1.IntersectionButtonClick(Sender: TObject);
var
  FillUp: Boolean;
  Chains: TChains;
  Chain: TChain;
begin
  ClearGrid;
  Memo.Clear;
  FillUp := Sender = IntersectionFullButton;
  if FillUp then
    FillChains(Chains, True, 8)
  else
    FillChains(Chains, False, 4);
  ShowChains(Chains);
  Chain := GetChainsIntersection(Chains);
  ShowChain(Chain, 'Intersection');
end;

procedure TForm1.DuplicatesButtonClick(Sender: TObject);
var
  Chains: TChains;
  Chain: TChain;
begin
  ClearGrid;
  Memo.Clear;
  FillChains(Chains, Sender = DuplicatesFullButton, 900);
  ShowChains(Chains);
  Chain := GetDuplicateShackles(Chains);
  ShowChain(Chain, 'Duplicates');
end;

initialization
  Randomize;

end.

Unit1.DFM:

object Form1: TForm1
  Left = 343
  Top = 429
  Width = 822
  Height = 459
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  OnCreate = FormCreate
  DesignSize = (
    806
    423)
  PixelsPerInch = 96
  TextHeight = 13
  object Memo: TMemo
    Left = 511
    Top = 63
    Width = 295
    Height = 360
    Anchors = [akLeft, akTop, akRight, akBottom]
    ScrollBars = ssVertical
    TabOrder = 5
  end
  object IntersectionFullButton: TButton
    Left = 511
    Top = 7
    Width = 141
    Height = 25
    Caption = 'Intersection (full chains)'
    TabOrder = 1
    OnClick = IntersectionButtonClick
  end
  object Grid: TStringGrid
    Left = 0
    Top = 0
    Width = 503
    Height = 423
    Align = alLeft
    ColCount = 20
    DefaultColWidth = 24
    DefaultRowHeight = 20
    FixedCols = 0
    RowCount = 20
    FixedRows = 0
    TabOrder = 0
  end
  object DuplicatesFullButton: TButton
    Left = 658
    Top = 7
    Width = 141
    Height = 25
    Caption = 'Duplicates (full chains)'
    TabOrder = 3
    OnClick = DuplicatesButtonClick
  end
  object IntersectionPartialButton: TButton
    Left = 511
    Top = 35
    Width = 141
    Height = 25
    Caption = 'Intersection (partial chains)'
    TabOrder = 2
    OnClick = IntersectionButtonClick
  end
  object DuplicatesPartialButton: TButton
    Left = 658
    Top = 35
    Width = 141
    Height = 25
    Caption = 'Duplicates (partial chains)'
    TabOrder = 4
    OnClick = DuplicatesButtonClick
  end
end

1
if myarray[i][j] = myarray[j][k] then

这不应该是

if myarray[i][k] = myarray[j][k] then

?

无论如何,您可以对此代码进行的最明显、最简单的优化是更改此处

for I := 0 to length(myarray)-1 do
  begin
    for J := 0 to length(myarray)-1 do
    begin
      if i = j then
        continue;

变成这个

for I := 0 to length(myarray)-1 do
  begin
    for J := I+1 to length(myarray)-1 do
    begin

我的下一步将是在内部循环中消除外部索引表达式:

if myarray[i][j] = myarray[j][k] then

在I和J循环中,创建指向两个整数数组的指针,然后执行。
for I := 0 to length(myarray)-1 do
  begin
    pia := @myarray[i];
    for J := I+1 to length(myarray)-1 do
    begin
      pja := @myarray[j];

然后在内部循环中可以这样做

if pia^[j] = pja^[k] then

你可能是想说 if pia^[k] = pja^[k] then。不过,为什么要去掉外部索引的引用呢? - Andriy M
是的,我按照原帖的方式进行。外部引用在每次迭代中不会改变。 - 500 - Internal Server Error

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接