找到包含一个字符串的最短可能子串

4

这是最近一次编程面试中提出的问题。

给定一个随机字符串S和另一个只包含唯一元素的字符串T,找到S中包含T所有元素的最小连续子串。例如,

S='adobecodebanc' 
T='abc' 
Answer='banc'

我想出了一个解决方案,

public static String completeSubstring(String T, String S){

        String minSub = T;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <T.length()-1; i++) {
            for (int j = i + 1; j <= T.length() ; j++) {
                String sub = T.substring(i,j);
                if(stringContains(sub, S)){
                    if(sub.length() < minSub.length()) minSub = sub;
                }
            }
        }
        return minSub;

    }
    private static boolean stringContains(String t, String s){
        //if(t.length() <= s.length()) return false;

        int[] arr = new int[256];

        for (int i = 0; i <t.length() ; i++) {
            char c = t.charAt(i);
            arr[c -'a'] = 1;
        }
        boolean found = true;
        for (int i = 0; i <s.length() ; i++) {
            char c = s.charAt(i);
            if(arr[c - 'a'] != 1){
                found = false;
                break;
            }else continue;
        }
        return found;
    }

这个算法的时间复杂度为O(n3),显然并不理想。有没有人能提供更好的算法呢?


示例中是否有错别字?“band”似乎根本不是S的连续子字符串... - yaccob
1
是的,你可以在O(N)时间内完成这个任务。不过这是一个很好的面试问题,我不想泄露答案。 - Matt Timmermans
@yaccob,我不明白你在说什么。你在哪里找到了 band - Zeus
@PaulBoddington 已添加 O(N) 解决方案。我想在面试前有人偶然读到它的可能性非常小。 - Matt Timmermans
@Zeus乐队在被纠正之前就已经存在了。与此同时,现在一切都好。 - yaccob
显示剩余3条评论
3个回答

7

以下是O(N)解决方案。

需要注意的复杂度问题在于,每个工作单位都涉及将startend加1,它们不会减少,并且算法在它们都到达末尾之前停止。

public static String findSubString(String s, String t)
{
    //algorithm moves a sliding "current substring" through s
    //in this map, we keep track of the number of occurrences of
    //each target character there are in the current substring

    Map<Character,int[]> counts = new HashMap<>();
    for (char c : t.toCharArray())
    {
        counts.put(c,new int[1]);
    }

    //how many target characters are missing from the current substring
    //current substring is initially empty, so all of them
    int missing = counts.size();

    //don't waste my time
    if (missing<1)
    {
        return "";
    }

    //best substring found
    int bestStart = -1, bestEnd = -1;

    //current substring
    int start=0, end=0;
    while (end<s.length())
    {
        //expand the current substring at the end
        int[] cnt = counts.get(s.charAt(end++));
        if (cnt!=null)
        {
            if (cnt[0]==0)
            {
                --missing;
            }
            cnt[0]+=1;
        }
        //while the current substring is valid, remove characters
        //at the start to see if a shorter substring that ends at the
        //same place is also valid 
        while(start<end && missing<=0)
        {
            //current substring is valid
            if (end-start < bestEnd-bestStart || bestEnd<0)
            {
                bestStart = start;
                bestEnd = end;
            }
            cnt = counts.get(s.charAt(start++));
            if (cnt != null)
            {
                cnt[0]-=1;
                if (cnt[0]==0)
                {
                    ++missing;
                }
            }
        }
        //current substring is no longer valid.  we'll add characters
        //at the end until we get another valid one
        //note that we don't need to add back any start character that
        //we just removed, since we already tried the shortest valid string
        //that starts at start-1

    }
    return(bestStart<=bestEnd ? s.substring(bestStart,bestEnd) : null);
}

1
听起来有些天真,您能否用几行逻辑注释一下您的代码,我有点迷失了。 - Zeus
@Zeus,当然。这些有帮助吗? - Matt Timmermans
1
很棒的答案(+1)。 - Paul Boddington

1

我知道已经有一个足够的O(N)复杂度的答案,但是我尝试自己解决这个问题而不查找,只是因为它是一个有趣的问题,并想分享一下。这是我想出来的O(N)解决方案:

public static String completeSubstring(String S, String T){
    int min = S.length()+1, index1 = -1, index2 = -1;
    ArrayList<ArrayList<Integer>> index = new ArrayList<ArrayList<Integer>>(); 
    HashSet<Character> targetChars = new HashSet<Character>();
    for(char c : T.toCharArray()) targetChars.add(c);

    //reduce initial sequence to only target chars and keep track of index
    //Note that the resultant string does not allow the same char to be consecutive

    StringBuilder filterS = new StringBuilder();
    for(int i = 0, s = 0 ; i < S.length() ; i++) {
        char c = S.charAt(i);
        if(targetChars.contains(c)) {
            if(s > 0 && filterS.charAt(s-1) == c) {
                index.get(s-1).add(i);
            } else {
                filterS.append(c);
                index.add(new ArrayList<Integer>());
                index.get(s).add(i);
                s++;
            }
        }
    }

    //Not necessary to use regex, loops are fine, but for readability sake
    String regex = "([abc])((?!\\1)[abc])((?!\\1)(?!\\2)[abc])";
    Matcher m = Pattern.compile(regex).matcher(filterS.toString());

    for(int i = 0, start = -1, p1, p2, tempMin, charSize = targetChars.size() ; m.find(i) ; i = start+1) {
        start = m.start();
        ArrayList<Integer> first = index.get(start);
        p1 = first.get(first.size()-1);
        p2 = index.get(start+charSize-1).get(0);
        tempMin = p2-p1;

        if(tempMin < min) {
            min = tempMin;
            index1 = p1;
            index2 = p2;
        }
    }

    return S.substring(index1, index2+1);   
}

我非常确定这个复杂度是O(N),如果我错了请纠正


1

这是@MattTimmermans提出的O(N)算法的另一种实现方式,它使用Map<Integer, Integer>来计数出现次数,并使用Set<Integer>来存储T中当前子字符串中存在的字符:

public static String completeSubstring(String s, String t) {
    Map<Integer, Integer> occ 
        = t.chars().boxed().collect(Collectors.toMap(c -> c, c -> 0));

    Set<Integer> found = new HashSet<>();      // characters from T found in current match
    int start = 0;                             // current match
    int bestStart = Integer.MIN_VALUE, bestEnd = -1;

    for (int i = 0; i < s.length(); i++) {
        int ci = s.charAt(i);                  // current char
        if (!occ.containsKey(ci))              // not from T
            continue;
        occ.put(ci, occ.get(ci) + 1);          // add occurrence
        found.add(ci);
        for (int j = start; j < i; j++) {      // try to reduce current match
            int cj = s.charAt(j);
            Integer c = occ.get(cj);
            if (c != null) { 
                if (c == 1) {                  // cannot reduce anymore
                    start = j;
                    break;
                } else 
                    occ.put(cj, c - 1);        // remove occurrence
            }
        }
        if (found.size() == occ.size()         // all chars found
            && (i - start < bestEnd - bestStart)) {
            bestStart = start;
            bestEnd = i;
        }
    }
    return bestStart < 0 ? null : s.substring(bestStart, bestEnd + 1); 
}

你能否简单解释一下你正在使用的Java8语法? - Zeus
这只是为了将T中的字符键设置为0值。 t.chars()返回一个int流(每个int表示T中的一个字符),.boxed()将其转换为Integer流,并且.toMap()中的两个参数都是函数,第一个将项映射到键(c -> c表示“返回项本身”),第二个将项映射到值(因此c -> 0表示“对于任何项返回0”)。 - Alex Salauyou

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接