Levenshtein距离:从矩阵中推断编辑操作

8

我用C++编写了Levenshtein算法。

如果我输入:
字符串s: democrat
字符串t: republican

我得到填充好的矩阵D,操作次数(Levenshtein距离)可以在D[10][8] = 8中读取。
除了已填充的矩阵,我想构建最优解。这个最优解应该是什么样的?我不知道。
请告诉我这个例子下最优解应该是什么样子。


2
维基百科页面上有一个例子(http://en.wikipedia.org/wiki/Levenshtein_distance)。这个能帮到你吗? - Oliver Charlesworth
1
到处都是填充的矩阵。但我需要从这个矩阵中构建解决方案。但是解决方案会是什么样子呢? - borebardha
@borebardha:你读了这篇文章吗?还是伪代码?它说答案是矩阵的右下角元素。 - Oliver Charlesworth
@borebardha:你说的“要打印什么”是什么意思?Levenshtein距离由矩阵的右下角元素给出。如果@Matthieu是正确的,那么答案是8。 - Oliver Charlesworth
2
Oli,8是插入、删除、替换的平均操作次数。这8个操作是什么?如何输出? - borebardha
8个回答

41

问题是:
给定Levenshtein算法生成的矩阵,如何找到"最优解"?
也就是说,我们如何找到将's string'转换为't string'所需的插入、删除和替换(单个字母)操作的精确序列?

首先,应该注意到在许多情况下存在多个最优解
虽然Levenshtein算法提供了最少的操作数(民主党/共和党示例中为8),但有许多序列(8个操作)可以产生此转换。

通过“解码”Levenshtein矩阵,可以枚举所有这样的最优序列。
一般思路是,所有最优解都遵循一条“路径”,从左上角到右下角(或反向),在此路径上的矩阵单元格值要么保持不变,要么增加一(或在相反方向上减少一),从0开始,以字符串的最佳操作数结束(在民主党/共和党案例中为0到8)。当需要进行操作时数字会增加,当对应位置的字符相同时数字保持不变。

可以轻松地生成一个算法来产生这样的路径(稍微复杂一些的是生成所有可能的路径),并从这样的路径推断出操作序列。

此路径查找算法应从右下角开始向后工作。采用此方法的原因是我们确信要成为最优解,它必须以此角落结束,并且要以此角落结束,它必须来自其左侧、上方或对角线上的3个单元格之一。通过选择满足我们的“相同值或递减1”要求的其中一个单元格,我们有效地选择了一个最佳路径上的单元格。通过重复此操作,直到达到左上角(或者直到到达值为0的单元格),我们可以在最佳路径上回溯。

民主党-共和党示例说明

还应注意,可以使用'democrat'水平或垂直地构建矩阵。这不会改变Levenshtein距离的计算,也不会改变所需的操作列表;它只改变了我们解释矩阵的方式,例如在“路径”上水平移动意味着插入字符[来自t字符串]或删除字符[来自s字符串],具体取决于在矩阵中'string s'是“水平”还是“垂直”的。

我将使用以下矩阵。因此,约定如下(仅在从左到右和/或从上到下的方向上)

  • 水平移动是从“t字符串”中插入字母
  • 垂直移动是从“s字符串”中删除字母
  • 对角线移动可以是:
    • 不操作(各自位置上的字母相同); 数字不变
    • 替换(各自位置上的字母不同); 数字增加1。

s =“democrat”,t =“republican”的Levenshtein矩阵

      r  e  p  u  b  l  i  c  a  n
   0  1  2  3  4  5  6  7  8  9  10
d  1  1  2  3  4  5  6  7  8  9  10
e  2  2  1  2  3  4  5  6  7  8  9
m  3  3  2  2  3  4  5  6  7  8  9
o  4  4  3  3  3  4  5  6  7  8  9
c  5  5  4  4  4  4  5  6  6  7  8
r  6  5  5  5  5  5  5  6  7  7  8
a  7  6  6  6  6  6  6  6  7  7  8
t  8  7  7  7  7  7  7  7  7  8  8

我使用的任意方法来从多个可能的最优路径中选择一条路径,大致描述如下:

Starting at the bottom-rightmost cell, and working our way backward toward 
the top left.

For each "backward" step, consider the 3 cells directly adjacent to the current
cell   (in the left, top or left+top directions)

   if the value in the diagonal cell (going up+left) is smaller or equal to the
      values found in the other two cells
   AND 
      if this is same or 1 minus the value of the current cell 
   then  "take the diagonal cell"
         if the value of the diagonal cell is one less than the current cell:
            Add a SUBSTITUTION operation (from the letters corresponding to
            the _current_ cell)
         otherwise: do not add an operation this was a no-operation.

   elseif the value in the cell to the left is smaller of equal to the value of
       the of the cell above current cell
   AND 
       if this value is same or 1 minus the value of the current cell 
   then "take the cell to left", and
        add an INSERTION of the letter corresponding to the cell
   else
       take the cell above, add
       Add a DELETION operation of the letter in 's string'

按照这个非正式的伪代码,我们得到以下结果:

从右下角的“n”、“t”单元格开始。
选择[对角线]“a”、“a”单元格作为下一个目标,因为它小于另外两个单元格(并满足相同或-1条件)。
注意新单元格比当前单元格少1
因此步骤8用“n”替换“t”:democraN

继续使用“a”、“a”单元格,
选择[对角线]“c”、“r”单元格作为下一个目标...
注意新单元格与当前单元格具有相同的值==> 无需操作

继续使用“c”、“r”单元格,
选择[对角线]“i”、“c”单元格作为下一个目标...
注意新单元格比当前单元格少1
因此步骤7用“c”替换“r”:democCan

继续使用“i”、“c”单元格,
选择[对角线]“l”、“o”单元格作为下一个目标...
注意新单元格比当前单元格少1
因此步骤6用“I”替换“c”:demoIcan

继续使用“l”、“o”单元格,
选择[对角线]“b”、“m”单元格作为下一个目标...
注意新单元格比当前单元格少1
因此步骤5用“L”替换“o”:demLican

继续使用“b”、“m”单元格,
选择[对角线]“u”、“e”单元格作为下一个目标...
注意新单元格比当前单元格少1
因此步骤4用“B”替换“m”:deBlican

继续使用“u”、“e”单元格,
注意到“对角线”单元格不符合条件,因为“左”单元格小于它。 选择[左侧]“p”、“e”单元格作为下一个目标...
因此步骤3在“e”后插入“u”:deUblican

继续使用“p”、“e”单元格,
同样,“对角线”单元格不符合条件 选择[左侧]“e”、“e”单元格作为下一个目标...
因此步骤2在“e”后插入“p”:dePublican

继续使用 "e","e" 单元格,
选择 [对角线] "r","d" 单元格作为下一个目标...
注意新单元格的值与当前单元格相同 ==> 无需操作

继续使用 "r","d" 单元格,
选择 [对角线] "start" 单元格作为下一个目标...
注意新单元格的值比当前单元格少 1
因此步骤 1 是用 "d" 替换 "r":     R epublican

您已到达值为 0 的单元格: 您的工作完成了


这个视频对我很有帮助。https://www.youtube.com/watch?v=We3YDTzNXEk - MyounghoonKim
如何实现所有路径? - Matías Guzmán Naranjo

3

用 Python 实现矩阵推理的回溯算法:

    def _backtrack_string(matrix, output_word):
    '''
    Iteratively backtrack DP matrix to get optimal set of moves

    Inputs: DP matrix (list:list:int),
            Input word (str),
            Output word (str),
            Start x position in DP matrix (int),
            Start y position in DP matrix (int)
    Output: Optimal path (list)
    '''

    i = len(matrix) - 1
    j = len(matrix[0]) - 1
    optimal_path = []
    while i > 0 and j > 0:
        diagonal = matrix[i-1][j-1]
        vertical = matrix[i-1][j]
        horizontal = matrix[i][j-1]
        current = matrix[i][j]
        if diagonal <= vertical and diagonal <= horizontal and (diagonal <= current):
            i = i - 1
            j = j - 1
            if diagonal == current - 1:
                optimal_path.append("Replace " + str(j) + ", " + str(output_word[j]) )
            elif horizontal <= vertical and horizontal <= current:
                j = j - 1
                optimal_path.append("Insert " + str(j) + ", " + str(output_word[j]))
            elif vertical <= horizontal and vertical <= current:
                i = i - 1
                optimal_path.append("Delete " + str(i))
        elif horizontal <= vertical and horizontal <= current:
            j = j - 1
            optimal_path.append("Insert " + str(j) + ", " + str(output_word[j]))
        else:
            i = i - 1
            optimal_path.append("Delete " + str(i))

    return reversed(optimal_path)

当我使用原始单词“OPERATING”和期望单词“CONSTANTINE”运行算法时,得到的输出如下所示。
    Insert 0, C
    Replace 2, N
    Replace 3, S
    Replace 4, T
    Insert 6, N
    Replace 10, E

       ""  C  O  N  S  T  A  N  T  I  N   E

    "" [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 
              <--                              Insert 0, C
    O  [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9,  10]
                \                              Replace 2, N
    P  [2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9,  10]
                   \                           Replace 3, S
    E  [3, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9,  9]
                      \                        Replace 4, T
    R  [4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9,  10]  No move
                         \ <--                 Insert 6, N
    A  [5, 5, 5, 5, 5, 5, 4, 5, 6, 7, 8,  9]
                               \               No move
    T  [6, 6, 6, 6, 6, 5, 5, 5, 5, 6, 7,  8]
                                  \            No move
    I  [7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 6,  7]
                                     \         No move
    N  [8, 8, 8, 7, 8, 7, 7, 6, 7, 6, 5,  6]
                                        \      Replace 10, E
    G  [9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 6,  6]

请注意,如果对角线上的元素与当前元素相同,则需要添加额外条件。这取决于垂直(向上)和水平(向左)位置的值,可能会发生删除或插入。只有在出现以下情况时,我们才会得到“无操作”或“替换”操作。
# assume bottom right of a 2x2 matrix is the reference position 
# and has value v
# the following is the situation where we get a replace operation
    [v + 1 , v<]
    [  v<  , v]
# the following is the situation where we get a "no operation"
    [v , v<]
    [v<, v ] 

我认为第一个答案中描述的算法可能会出现问题。当没有正确操作时,2x2矩阵中可能有其他排列方式。以输入“OPERATING”和输出“CONSTANTINE”为例,除非考虑到这一点,否则该算法将失效。


1

我已经有一段时间没有用它了,但在我看来,矩阵应该长这样:

. . r e p u b l i c a n
. 0 1 2 3 4 5 6 7 8 9 10
d 1 1 2 3 4 5 6 7 8 9 10
e 2 2 1 2 3 4 5 6 7 8 9
m 3 3 2 2 3 4 5 6 7 8 9
o 4 4 3 3 3 4 5 6 7 8 9
c 5 5 4 4 4 4 5 6 7 8 9
r 6 5 5 5 5 5 5 6 7 8 9
a 7 6 6 6 6 6 6 6 7 7 8
t 8 7 7 7 7 7 7 7 7 7 8

但不要把它视为理所当然。


2
谢谢。但是,除了这个矩阵之外,我需要构建解决方案。解决方案应该是什么样子? - borebardha
1
这与寻找结果操作无关。 - jhpratt

1
这里是基于mjv答案的VBA算法。(非常好地解释了,但有些情况缺失。)
    Sub TU_Levenshtein()

        Call Levenshtein("democrat", "republican")

        Call Levenshtein("ooo", "u") 

        Call Levenshtein("ceci est un test", "ceci n'est pas un test")

    End Sub

    Sub Levenshtein(ByVal string1 As String, ByVal string2 As String)

        ' Fill Matrix Levenshtein (-> array 'Distance')

        Dim i As Long, j As Long
        Dim string1_length As Long
        Dim string2_length As Long
        Dim distance() As Long

        string1_length = Len(string1)
        string2_length = Len(string2)
        ReDim distance(string1_length, string2_length)

        For i = 0 To string1_length
            distance(i, 0) = i
        Next

        For j = 0 To string2_length
            distance(0, j) = j
        Next

        For i = 1 To string1_length
            For j = 1 To string2_length
                If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
                    distance(i, j) = distance(i - 1, j - 1)
                Else
                    distance(i, j) = Application.WorksheetFunction.min _
                    (distance(i - 1, j) + 1, _
                     distance(i, j - 1) + 1, _
                     distance(i - 1, j - 1) + 1)
                End If
            Next
        Next

        LevenshteinDistance = distance(string1_length, string2_length) ' for information only

        ' Write Matrix on VBA sheets (only for visuation, not used in calculus)

        Cells.Clear

        For i = 1 To UBound(distance, 1)
            Cells(i + 2, 1).Value = Mid(string1, i, 1)
        Next i

        For i = 1 To UBound(distance, 2)
            Cells(1, i + 2).Value = Mid(string2, i, 1)
        Next i

        For i = 0 To UBound(distance, 1)

            For j = 0 To UBound(distance, 2)

                Cells(i + 2, j + 2) = distance(i, j)

            Next j

        Next i

        ' One solution

        current_posx = UBound(distance, 1)
        current_posy = UBound(distance, 2)

        Do

            cc = distance(current_posx, current_posy)

            Cells(current_posx + 1, current_posy + 1).Interior.Color = vbYellow ' visualisation again

            ' Manage border case

            If current_posy - 1 < 0 Then

                MsgBox ("deletion. " & Mid(string1, current_posx, 1))

                current_posx = current_posx - 1
                current_posy = current_posy

                GoTo suivant

            End If

            If current_posx - 1 < 0 Then

                MsgBox ("insertion. " & Mid(string2, current_posy, 1))

                current_posx = current_posx
                current_posy = current_posy - 1

                GoTo suivant

            End If

            ' Middle cases

            cc_L = distance(current_posx, current_posy - 1)
            cc_U = distance(current_posx - 1, current_posy)
            cc_D = distance(current_posx - 1, current_posy - 1)

            If (cc_D <= cc_L And cc_D <= cc_U) And (cc_D = cc - 1 Or cc_D = cc) Then

                If (cc_D = cc - 1) Then

                    MsgBox "substitution. " & Mid(string1, current_posx, 1) & " by " & Mid(string2, current_posy, 1)

                    current_posx = current_posx - 1
                    current_posy = current_posy - 1

                    GoTo suivant

                Else

                    MsgBox "no operation"

                    current_posx = current_posx - 1
                    current_posy = current_posy - 1

                    GoTo suivant

                End If

            ElseIf cc_L <= cc_D And cc_L = cc - 1 Then

                MsgBox ("insertion. " & Mid(string2, current_posy, 1))

                current_posx = current_posx
                current_posy = current_posy - 1

                GoTo suivant

            Else

                MsgBox ("deletion." & Mid(string1, current_posy, 1))

                current_posx = current_posx
                current_posy = current_posy - 1

                GoTo suivant

            End If

    suivant:

        Loop While Not (current_posx = 0 And current_posy = 0)

    End Sub

0

最近我在使用Levenshtein距离算法的矩阵方面做了一些工作。我需要生成将一个列表转换为另一个列表所需的操作。(这对字符串也适用。)

以下(vows)测试是否展示了您正在寻找的功能?

  , "lev - complex 2"
  : { topic
    : lev.diff([13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11], [9, 13, 6, 5, 1, 8, 2, 15, 12, 11])
    , "check actions"
    : function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 9, val: 7 },
                                                 { op: 'delete', pos: 5, val: 9 },
                                                 { op: 'insert', pos: 0, val: 9 },
                                                ]); }
    }
  , "lev - complex 3"
  : { topic
    : lev.diff([9, 13, 6, 5, 1, 8, 2, 15, 12, 11], [13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11])
    , "check actions"
    : function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 0, val: 9 },
                                                 { op: 'insert', pos: 5, val: 9 },
                                                 { op: 'insert', pos: 9, val: 7 }
                                                ]); }
    }
  , "lev - complex 4"
  : { topic
    : lev.diff([9, 13, 6, 5, 1, 8, 2, 15, 12, 11, 16], [13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11, 17])
    , "check actions"
    : function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 0, val: 9 },
                                                 { op: 'insert', pos: 5, val: 9 },
                                                 { op: 'insert', pos: 9, val: 7 },
                                                 { op: 'replace', pos: 11, val: 17 }
                                                ]); }
    }

0
C#实现了JackIsJack的回答,并进行了一些修改:
- 操作以“正向”顺序输出(JackIsJack的输出是反向的); - 原始回答中的最后一个“else”子句工作不正确(看起来像是复制粘贴错误)。
控制台应用程序代码:
class Program
{
    static void Main(string[] args)
    {
        Levenshtein("1", "1234567890");
        Levenshtein( "1234567890", "1");

        Levenshtein("kitten", "mittens");
        Levenshtein("mittens", "kitten");
        Levenshtein("kitten", "sitting");
        Levenshtein("sitting", "kitten");
        Levenshtein("1234567890", "12356790");
        Levenshtein("12356790", "1234567890");
        Levenshtein("ceci est un test", "ceci n'est pas un test");
        Levenshtein("ceci n'est pas un test", "ceci est un test");
    }

    static void Levenshtein(string string1, string string2)
    {
        Console.WriteLine("Levenstein '" + string1 + "' => '" + string2 + "'");

        var string1_length = string1.Length;
        var string2_length = string2.Length;

        int[,] distance = new int[string1_length + 1, string2_length + 1];

        for (int i = 0; i <= string1_length; i++)
        {
            distance[i, 0] = i;
        }


        for (int j = 0; j <= string2_length; j++)
        {
            distance[0, j] = j;
        }


        for (int i = 1; i <= string1_length; i++)
        {
            for (int j = 1; j <= string2_length; j++)
            {
                if (string1[i - 1] == string2[j - 1])
                {
                    distance[i, j] = distance[i - 1, j - 1];
                }
                else
                {
                    distance[i, j] = Math.Min(distance[i - 1, j] + 1, Math.Min(
                       distance[i, j - 1] + 1,
                       distance[i - 1, j - 1] + 1));
                }

            }
        }


        var LevenshteinDistance = distance[string1_length, string2_length];// for information only
        Console.WriteLine($"Levernstein distance: {LevenshteinDistance}");

        // List of operations
        var current_posx = string1_length;
        var current_posy = string2_length;

        var stack = new Stack<string>(); // for outputting messages in forward direction

        while (current_posx != 0 || current_posy != 0)
        {
            var cc = distance[current_posx, current_posy];
            // edge cases
            if (current_posy - 1 < 0)
            {
                stack.Push("Delete '" + string1[current_posx - 1] + "'");
                current_posx--;
                continue;
            }

            if (current_posx - 1 < 0)
            {
                stack.Push("Insert '" + string2[current_posy - 1] + "'");
                current_posy--;
                continue;
            }

            // Middle cases
            var cc_L = distance[current_posx, current_posy - 1];
            var cc_U = distance[current_posx - 1, current_posy];
            var cc_D = distance[current_posx - 1, current_posy - 1];

            if ((cc_D <= cc_L && cc_D <= cc_U) && (cc_D == cc - 1 || cc_D == cc))
            {
                if (cc_D == cc - 1)
                {
                    stack.Push("Substitute '" + string1[current_posx - 1] + "' by '" + string2[current_posy - 1] + "'");
                    current_posx--;
                    current_posy--;
                }
                else
                {
                    stack.Push("Keep '" + string1[current_posx - 1] + "'");
                    current_posx--;
                    current_posy--;
                }
            }
            else if (cc_L <= cc_D && cc_L == cc - 1)
            {
                stack.Push("Insert '" + string2[current_posy - 1] + "'");
                current_posy--;                   
            }
            else
            {
                stack.Push("Delete '" + string1[current_posx - 1]+"'");
                current_posx--;                   
            }
        }

        while(stack.Count > 0)
        {
            Console.WriteLine(stack.Pop());
        }
    }
}

0

这里有一些Matlab代码,你认为它正确吗?看起来给出了正确的结果 :)

clear all

s = char('democrat');
t = char('republican');

% Edit Matrix
m=length(s);
n=length(t);
mat=zeros(m+1,n+1);
for i=1:1:m
    mat(i+1,1)=i;
end
for j=1:1:n
    mat(1,j+1)=j;
end
for i=1:m
    for j=1:n
        if (s(i) == t(j))
            mat(i+1,j+1)=mat(i,j);
        else
            mat(i+1,j+1)=1+min(min(mat(i+1,j),mat(i,j+1)),mat(i,j));
        end
    end
end

% Edit Sequence
s = char('democrat');
t = char('republican');
i = m+1;
j = n+1;
display([s ' --> ' t])
while(i ~= 1 && j ~= 1)
    temp = min(min(mat(i-1,j-1), mat(i,j-1)), mat(i-1,j));
    if(mat(i-1,j) == temp)
        i = i - 1;
        t = [t(1:j-1) s(i) t(j:end)];
        disp(strcat(['iinsertion: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
    elseif(mat(i-1,j-1) == temp)
        if(mat(i-1,j-1) == mat(i,j))
            i = i - 1;
            j = j - 1;
            disp(strcat(['uunchanged: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
        else
            i = i - 1;
            j = j - 1;
            t(j) = s(i);
            disp(strcat(['substition: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
        end
    elseif(mat(i,j-1) == temp)
        j = j - 1;
        t(j) = [];
        disp(strcat(['dddeletion: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
    end
end

0
获取所有的编辑路径代码与编辑矩阵、源和目标相对应。如果有任何错误,请留下评论。非常感谢!
import copy
from typing import List, Union


def edit_distance(source: Union[List[str], str],
                  target: Union[List[str], str],
                  return_distance: bool = False):
    """get the edit matrix
    """
    edit_matrix = [[i + j for j in range(len(target) + 1)] for i in range(len(source) + 1)]
    
    for i in range(1, len(source) + 1):
        for j in range(1, len(target) + 1):
            if source[i - 1] == target[j - 1]:
                d = 0
            else:
                d = 1
            edit_matrix[i][j] = min(edit_matrix[i - 1][j] + 1,
                                    edit_matrix[i][j - 1] + 1,
                                    edit_matrix[i - 1][j - 1] + d)
    if return_distance:
        return edit_matrix[len(source)][len(target)]
    return edit_matrix


def get_edit_paths(matrix: List[List[int]],
                   source: Union[List[str], str],
                   target: Union[List[str], str]):
    """get all the valid edit paths
    """
    all_paths = []
    
    def _edit_path(i, j, optimal_path):
        if i > 0 and j > 0:
            diagonal = matrix[i - 1][j - 1]  # the diagonal value
            vertical = matrix[i - 1][j]      # the above value
            horizontal = matrix[i][j - 1]    # the left value
            current = matrix[i][j]           # current value
            
            # whether the source and target token are the same
            flag = False
            # compute the minimal value of the diagonal, vertical and horizontal
            minimal = min(diagonal, min(vertical, horizontal))
            
            # if the diagonal is the minimal
            if diagonal == minimal:
                new_i = i - 1
                new_j = j - 1
                path_ = copy.deepcopy(optimal_path)
                # if the diagnoal value equals to current - 1
                # it means `replace`` operation
                if diagonal == current - 1:
                    path_.append(f"Replace | {new_j} | {target[new_j]}")
                    _edit_path(new_i, new_j, path_)
                # if the diagonal value equals to current value
                # and corresponding positional value of source and target equal
                # it means this is current best path
                elif source[new_i] == target[new_j]:
                    flag = True
                    # path_.append(f"Keep | {new_i}")
                    _edit_path(new_i, new_j, path_)
            # if the position doesn't have best path
            # we need to consider other situations
            if not flag:
                # if vertical value equals to minimal
                # it means delete source corresponding value
                if vertical == minimal:
                    new_i = i - 1
                    new_j = j
                    path_ = copy.deepcopy(optimal_path)
                    path_.append(f"Delete | {new_i}")
                    _edit_path(new_i, new_j, path_)
                
                # if horizontal value equals to minimal
                # if mean insert target corresponding value to source
                if horizontal == minimal:
                    new_i = i
                    new_j = j - 1
                    path_ = copy.deepcopy(optimal_path)
                    path_.append(f"Insert | {new_j} | {target[new_j]}")
                    _edit_path(new_i, new_j, path_)
        else:
            all_paths.append(list(reversed(optimal_path)))
    # get the rows and columns of the edit matrix
    row_len = len(matrix) - 1
    col_len = len(matrix[0]) - 1
    _edit_path(row_len, col_len, optimal_path=[])
    return all_paths


if __name__ == "__main__":
    source = "BBDEF"
    target = "ABCDF"
    matrix = edit_distance(source, target)
    print("print paths")
    paths = get_edit_paths(matrix, source=list(source), target=list(target))
    for path in paths:
        print(path)


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接