在O（NlogN）时间内找到最长公共子序列

Question

在O（NlogN）时间内找到最长公共子序列

algorithmdynamic-programminglcs

6

有没有一种方法可以在O（NlogN）时间内找到两个序列的最长公共子序列？

我在某个地方看到过使用二分搜索实现这一点的方法。

我知道需要O（N²）时间的dp方法。

- LTim

6个回答

4

动态规划方法适用于一般情况下的O(n²)。对于某些其他情况，存在更低复杂度的算法：

对于固定的字母表大小（不随n增长），有Four Russians方法，将时间降至O(n²/log n)（参见此处）。
请参见此处另一个进一步优化的情况。

- Ami Tavory

1

还有Meyers的非常实用的O(nd)方法，其中d是两个字符串之间的Levenshtein距离 - 如果存在有限数量的差异，则为O(n)。据我所知，这仍然是“diff”中使用的方法。 - j_random_hacker

2

假设指数时间假设（比P不等于NP更严格，但广泛认为是真实的），不可能在时间复杂度为O(N^{2-eps})的情况下解决问题，其中eps是任何正常数。请参阅Karl Bringmann和Marvin Kunnemann的"Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping"（可在arXiv上找到预印本）。

大致来说，这意味着这个问题的一般情况不能在O(N^2/log N)之外的时间内解决，因此，如果您想要更快的算法，必须考虑其他约束条件（字符串的某些属性）或寻找近似解决方案。

- Andrew Ryzhikov

1

两个序列之间的最长公共子序列基本上是n平方级别的。

Masek和Patterson（1980）使用所谓的“Four Russians”技术对n平方/ log n进行了轻微改进。

在大多数情况下，这种复杂的方法引入的额外复杂性并不能证明小幅收益。对于实际应用，您可以将n平方方法视为典型应用中合理的最优解。

- Tyler Durden

还有Meyers的非常实用的O(nd)方法，其中d是两个字符串之间的Levenshtein距离 - 如果存在有限数量的差异，则为O(n)。据我所知，这仍然是“diff”中使用的方法。 - j_random_hacker

0

vector <int> LIS;
int LongestIncreasingSubsequence(int n){
    if(!n) return 0;
    LIS.emplace_back(arr[0]);
    for(int i = 1; i < n; i++){
        if(arr[i] > LIS.back()) LIS.emplace_back(arr[i]);
        else *lower_bound(LIS.begin(), LIS.end(), arr[i]) = arr[i];
    }
    return LIS.size();
}

- user17656768

由于当前的回答表述不清，请[编辑]以添加更多细节，以帮助其他人理解它如何解决所问的问题。您可以在帮助中心找到有关撰写良好答案的更多信息。 - Community

0

以下是一种使用二分查找的O(NLOGM)方法来解决LCS问题：

//If the strings are not the same length, take the longest one
//Map each letter with a ArrayList representing the occurences on that letter (indexes)
//The ArrayLists are thus sorted. This will enable binary search later on
Map<Character, List<Integer>> map = new HashMap<>();
for (int i = 0; i < first.length(); i++) {
    char c = first.charAt(i);
    if (!map.containsKey(c)) {
        map.put(c, new ArrayList<>());
    }
    map.get(c).add(i);
}

//Loop on the second string and increment an index to check that any matching character is a valid match
//For sequence match we need to maintain the same order as in the initial string so we use an increasing index for that

int index = 0;
StringBuilder lcs = new StringBuilder();

for (int j = 0; j < second.length(); j++) {
    char c = second.charAt(j);
    if (map.containsKey(c)) {
        List<Integer> indexes = map.get(c);
        //The following method uses Binary Search to find the first bigger or equal index
        //This is possible since the indexes list is sorted by definition
        int firstBigger = firstBiggerOrEqualIndex(indexes, index);
        if (firstBigger != -1) {
            index = firstBigger+1;
            lcs.append(c);
        }
    }
}
return lcs.toString();

方法 firstBiggerOrEqualIndex 使用类似二分查找的方式：

private static int firstBiggerOrEqualIndex(List<Integer> indexes, int index) {

    int start = 0;
    int end = indexes.size()-1;
    int res = -1;
    while(start <= end) {
        int mid = (start+end)/2;
        int value = indexes.get(mid);
        if (value > index) {
            res = value;
            end = mid-1;
        }
        else if (value == index) {
            res = value;
            break;
        }
        else {
            start = mid + 1;
        }
    }
    return res;
}

- Rabih Sarieddine

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- sparik · Accepted Answer

对于一般情况，O(N^2)的动态规划算法是您能够做到的最好的。然而，在某些特殊情况下存在更好的算法。

1. 字母表大小受限

这是一种非常常见的情况。由来自某个字母表（例如英语）的字母组成的序列属于此类别。对于这种情况，可以使用四个俄罗斯人方法将O(N*M)算法优化为O(N^2/logN)。我不知道具体如何操作，您可以搜索一下。

2. 两个序列都由不同元素组成

一个例子问题是“给定两个从1到N的数字排列，找到它们的LCS”。这个问题可以在O(N*logN)中解决。让序列A和B。定义一个序列C。C[i]是B[i]在A中的索引。(A[C[i]] = B[i]) A和B的LCS是C的最长递增子序列。