寻找包含另一个字符串所有字符的最短子串的算法

6
我有两个字符串: string1 - hello how are you, String2 - olo (包括空格字符)
输出: lo ho ( hello how are you ) lo ho是唯一包含所有string2字符的子字符串。 请问是否可以提供一个好的算法来解决这个问题(我只能想到Brute Force算法-O(n^2))。
此外,输出应为最短长度的字符串(如果有多个选项)。

暴力破解不是O(n^2),而是O(n^3) - 检查每个子字符串本身就是O(n),而这些子字符串有O(n^2)个。除非你有其他想法? - amit
输出的结果应该包含空格还是像原问题中一样没有空格? - Aprillion
@deathApril 输出的子字符串包含所有字符,包括这个空格。它也包括 h - Niklas B.
1
我不明白你的问题... 两个字符串如何被“lo ho”验证,因为String1和输出都不包含“olo”。 - Naveed Butt
1
如果我们考虑单个字符,那么就会有多个输出... - Naveed Butt
1
@NaveedButt 字符串"lo ho"是字符串string1中最小的子字符串,其字符的多重集合是string2字符的多重集合的超集。从问题中的语句“lo ho是唯一包含string2所有字符的子字符串”必须是一个错误,否则对我也毫无意义。 - Niklas B.
4个回答

6
保持两个指针l和r,以及一个哈希表M=字符->计数,用于存储字符串2中不在s[l..r]中出现的字符。
最初将l设置为0,并设置r,使得string1[l..r]包含所有string2的字符(如果可能的话)。通过从M中删除字符直到为空来实现这一点。
然后,每次将r递增1,然后尽可能多地递增l,同时仍保持M为空。所有r-l+1(子串s[l..r]的长度)中的最小值即为解决方案。
Pythonish伪代码:
n = len(string1)
M = {}   # let's say M is empty if it contains no positive values
for c in string2:
    M[c]++
l = 0
r = -1
while r + 1 < n and M not empty:
    r++
    M[string1[r]]--
if M not empty: 
    return "no solution"
answer_l, answer_r = l, r
while True:
    while M[string1[l]] < 0:
        M[string1[l]]++
        l++
    if r - l + 1 < answer_r - anwer_l + 1:
        answer_l, answer_r = l, r
    r++
    if r == n:
        break
    M[string1[r]]--
return s[answer_l..answer_r]

如果在执行增量和减量操作时,保持正数条目的数量,则“为空”检查可以在O(1)中实现。
nstring1的长度,mstring2的长度。
请注意,lr仅进行增加,因此最多会进行O(n)次增加,因此在最后的外部循环中最多执行O(n)个指令。
如果将M实现为数组(假设字母表的大小是常量),则运行时为O(n + m),这是最佳的。如果字母表太大,则可以使用哈希表来获得期望的O(n + m)。
示例执行:
string1 = "abbabcdbcb"
string2 = "cbb"

# after first loop
M = { 'a': 0, 'b': 2, 'c': 1, 'd': 0 }

# after second loop
l = 0
r = 5
M = { 'a': -2, 'b': -1, 'c': 0, 'd': 0 }

# increment l as much as possible:
l = 2
r = 5
M = { 'a': -1, 'b': 0, 'c': 0, 'd': 0 }

# increment r by one and then l as much as possible
l = 2
r = 6
M = { 'a': -1, 'b': 0, 'c': 0, 'd': -1 }

# increment r by one and then l as much as possible
l = 4
r = 7
M = { 'a': 0, 'b': 0, 'c': 0, 'd': -1 }

# increment r by one and then l as much as possible
l = 4
r = 8
M = { 'a': 0, 'b': 0, 'c': -1, 'd': -1 }

# increment r by one and then l as much as possible
l = 7
r = 9
M = { 'a': 0, 'b': 0, 'c': 0, 'd': 0 }

最佳的解决方案是 s[7..9]。

1
等等..一开始我把伪代码丢掉了,因为它过于简化了,但现在我看不到任何逃脱O(n²)最坏情况的方法来使其返回正确的解决方案?例如对于OP的数据,你不能停在“llho wo”,所以“while M[string1[l]] < 0”是不够的,它需要继续到最后一个“l”。另一个例子是“s1 ='abacccccbc'”,“s2 ='bc'”不能停在“bac”,它需要一直走到“bc”,这将使算法成为二次方程(即使具有良好的常数因子,对于大多数数百万个字符的情况可能足够快)..或者我错过了什么? - Aprillion
1
@deathApril l和r只会增加,不会减少,因此循环迭代的总次数为O(n)。我没有说你应该在某个地方停止,只是尝试每个可能的“r”,所以在你的第二个例子中,你最终到达s[l..r] = bc。诀窍在于与增加“r”相关联的“l”是单调递增的。 - Niklas B.
我明白了,谢谢。我也花了一些时间来理解M的工作原理,你可以添加注释,如# M = {o:2, l:1}# M = {h:-2, e: -1, l: -1, o: 0}以进行说明。 - Aprillion
哦,如果字符串可以是Unicode,那么M最好用defaultdict而不是array。 - Aprillion
@deathApril 这要看情况。从技术上讲,字母表只需要是恒定大小的,这在Unicode中是成立的。但当然,如果你想节省空间,也可以使用哈希表。 - Niklas B.

0

有一个算法,可以在 O(N) 时间内完成。

思路:准备两个数组,即 isRequired[256] 和 isFound[256],分别用于记录字符串 S 中每个字符的频率以及在解析字符串 S 时已经发现的每个字符的频率。同时,还需要一个计数器来指示何时找到一个有效的窗口。一旦找到了一个有效的窗口,我们就可以移动窗口(向右移动),并保持问题所给出的不变式。

C++ 程序:

void findMinWindow(const char *text, const char *pattern, int &start, int &end){
        //Calcuate lengths of text and pattern
        int textLen = strlen(text);
        int patternLen = strlen(pattern);

        // Declare 2 arrays which keep tab of required & found frequency of each char in pattern
        int isRequired[256] ; //Assuming the character set is in ASCII
        int isFound[256];
        int count = 0; //For ascertaining whether a valid window is found

        // Keep a tab of minimum window 
        int minimumWindow = INT_MAX;

        //Prepare the isRequired[] array by parsing the pattern
        for(int i=0;i<patternLen;i++){
            isRequired[pattern[i]]++;
        }

        //Let's start parsing the text now
        // Have 2 pointers: i and j - both starting at 0
        int i=0;
        int j=0;
        //Keep moving j forward, keep i fixed till we get a valid window
        for(c=j;c<textLen;c++){
           //Check if the character read appears in pattern or not
           if(isRequired[text[c]] == 0){
              //This character does not appear in the pattern; skip this
              continue;
           }
           //We have this character in the pattern, lets increment isFound for this char
           isFound[text[c]]++;

           //increment the count if this character satisfies the invariant
           if(isFound[text[c]] <= isRequired[text[c]]){
              count++;
           }

           //Did we find a valid window yet?
           if(count == patternLen){
              //A valid window is found..lets see if we can do better from here on
              //better means: increasing i to reduce window length while maintaining invariant
              while(isRequired[s[i]] == 0 || isFound[s[i]] > isRequired[s[i]]){
                   //Either of the above 2 conditions means we should increment i; however we 
                   // must decrease isFound for this char as well.
                   //Hence do a check again
                   if(isFound[s[i]] > isRequired[s[i]]){
                      isFound[s[i]]--;
                   }
                   i++;
              }

               // Note that after the while loop, the invariant is still maintained
               // Lets check if we did better
               int winLength = j-i+1;
               if(winLength < minimumWindow){
                  //update the references we got
                  begin = i;
                  end = j;
                  //Update new minimum window lenght
                  minimumWindow = winLength;
               }
          } //End of if(count == patternLen)
     } //End of for loop 
}

0

这是一个使用JavaScript实现的示例。逻辑与@Aprillion上面写的类似。

演示:http://jsfiddle.net/ZB6vm/4/

var s1 = "hello how are you";
var s2 = "olo";
var left, right;
var min_distance;
var answer = "";

// make permutation recursively
function permutate(ar, arrs, k) {
    // check if the end of recursive call
    if (k == arrs.length) {
        var r = Math.max.apply(Math, ar);
        var l = Math.min.apply(Math, ar);
        var dist = r - l + 1;
        if (dist <= min_distance) {
            min_distance = dist;
            left = l;
            right = r;
        }
        return;
    }
    for (var i in arrs[k]) {
        var v = arrs[k][i];
        if ($.inArray(v, ar) < 0) {
            var ar2 = ar.slice();
            ar2.push(v);
             // recursive call
            permutate(ar2, arrs, k + 1);
        }
    }
}

function solve() {
    var ar = [];   // 1-demension array to store character position
    var arrs = []; // 2-demension array to store character position
    for (var i = 0; i < s2.length; i++) {
        arrs[i] = [];
        var c = s2.charAt(i);
        for (var k = 0; k < s1.length; k++) { // loop by s1
            if (s1.charAt(k) == c) {
                if ($.inArray(k, arrs[i]) < 0) {
                    arrs[i].push(k); // save position found
                }
            }
        }
    }
    // call permutate
    permutate(ar, arrs, 0);
    answer = s1.substring(left, right + 1);
    alert(answer);
}

solve();

希望这能有所帮助。

0
我会计算出在string1中的字符位置,然后选择最小距离的排列方式,使得最低和最高字符位置之间的距离最小。
#          positions are:
#          01234567890123456
string1 = 'hello how are you'
string2 = 'olo'

# get string1 positions for each character from set(string2)
positions = {'o': [4, 7, 15],
             'l': [2, 3]}

# get all permutations of positions (don't repeat the same element)
# then pick the permutation with minimum distance between min and max position
# (obviously, this part can be optimized, this is just an illustration)
permutations = positions['o'] * positions['l'] * positions['o']
permutations = [[4,2,7], [4,3,7], [4,2,15], ...]
the_permutation = [4,3,7]

# voilà
output = string1_without_spaces[3:7]

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接