How can I use binary search to find duplicates in a sorted array?

Question

3

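I want to extend a function that uses binary search to count how many times an integer occurs, by resetting the high variable, but it gets stuck in a loop. I suspect one workaround would be to duplicate the function to get the last index and derive the number of matches from that, but I don't think that is an elegant solution.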

Starting from this:

public static Matches findMatches(int[] values, int query) {
    int firstMatchIndex = -1;
    int lastMatchIndex = -1;
    int numberOfMatches = 0;

    int low = 0;
    int mid = 0;
    int high = values[values.length - 1];
    boolean searchFirst = false;

    while (low <= high){
        mid = (low + high)/2;

        if (values[mid] == query && firstMatchIndex == -1){
            firstMatchIndex = mid;

            if (searchFirst){
                high = mid - 1;
                searchFirst = false;
            } else { 
                low = mid + 1;
            }

        } else if (query < values[mid]){
            high = mid - 1;
        } else {
            low = mid + 1;
        }           
    }

    if (firstMatchIndex != -1) { // First match index is set
        return new Matches(firstMatchIndex, numberOfMatches);
    }
    else { // First match index is not set
        return new Matches(-1, 0); 
    }
}

To something like this:

public static Matches findMatches(int[] values, int query) {
    int firstMatchIndex = -1;
    int lastMatchIndex = -1;
    int numberOfMatches = 0;

    int low = 0;
    int mid = 0;
    int high = values[values.length - 1];
    boolean searchFirst = false;

    while (low <= high){
        mid = (low + high)/2;

        if (values[mid] == query && firstMatchIndex == -1){
            firstMatchIndex = mid;

            if (searchFirst){
                high = values[values.length - 1]; // This is stuck in a loop
                searchFirst = false;
            } 
        } else if (values[mid] == query && lastMatchIndex == -1){
            lastMatchIndex = mid;

            if (!searchFirst){
                high = mid - 1;
            } else { 
                low = mid + 1;
            }
        } else if (query < values[mid]){
            high = mid - 1;
        } else {
            low = mid + 1;
        }

    }

    if (firstMatchIndex != -1) { // First match index is set
        return new Matches(firstMatchIndex, numberOfMatches);
    }
    else { // First match index is not set
        return new Matches(-1, 0); 
    }
}

- imparante

1

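How about using binary search to find the index of the given number? Assuming it returns -1 when the value is not found, you can use that index to count the duplicates. For example, when searching for the number "9", the binary search returns index 5, so I would look to the left and right of index 5 for duplicates and stop when there are no more. The number of matches would be rightIndex - leftIndex + 1, since the values array is sorted. - almightyGOSU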

5 Answers

0

Can't you use something like a set to find the duplicates?

Like this:

package example;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class DuplicatesExample {

    public static void main(String[] args) {
        String[] strings = { "one", "two", "two", "three", "four", "five", "six", "six" };
        List<String> dups = getDups(strings);
        System.out.println("DUPLICATES:");
        for(String str : dups) {
            System.out.println("\t" + str);
        }
    }

    private static List<String> getDups(String[] strings) {
        ArrayList<String> rtn = new ArrayList<String>();
        HashSet<String> set = new HashSet<>();
        for (String str : strings) {
            boolean added = set.add(str);
            if (added == false ) {
                rtn.add(str);
            }
        }
        return rtn;
    }

}

Output:

DUPLICATES:
    two
    six

- John

(You could also return a Set from the getDups method to get the distinct duplicates.) - John
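A minimal sketch of that Set-returning variant, for illustration only. The class name DistinctDuplicatesSketch and the method name getDistinctDups are made up here, and a LinkedHashSet is assumed so that the result keeps the order in which repeats were first seen:

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class DistinctDuplicatesSketch {

    // Returns each duplicated string once, in the order its first repeat was seen.
    // (Hypothetical helper; not part of the answer's original code.)
    static Set<String> getDistinctDups(String[] strings) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String str : strings) {
            // Set.add returns false when the element was already present
            if (!seen.add(str)) {
                dups.add(str);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        String[] strings = { "one", "two", "two", "two", "six", "six" };
        System.out.println(getDistinctDups(strings)); // prints [two, six]
    }
}
```

With a List, "two" would be reported twice for this input; the Set collapses it to a single entry. Note that this approach is linear in the array length and does not take advantage of the array being sorted.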

0

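I have split your problem into two parts: using binary search to find the number, and counting the matches. The first part is handled by the search function, and the second by the findMatches function: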

public static Matches findMatches(int[] values, int query) {

    int leftIndex = -1;
    int rightIndex = -1;
    int high = values.length - 1;

    int matchedIndex = search(values, 0, high, query);

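    //if at least one match
    if (matchedIndex != -1) {

        //start both bounds at the match so a run on only one side is counted correctly
        leftIndex = matchedIndex;
        rightIndex = matchedIndex;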

        //decrement upper bound of left array
        int leftHigh = matchedIndex - 1;
        //increment lower bound of right array
        int rightLow = matchedIndex + 1;

        //loop until no more duplicates in left array
        while (true) {

            int leftMatchedIndex = search(values, 0, leftHigh, query);

            //if duplicate found
            if (leftMatchedIndex != -1) {
                leftIndex = leftMatchedIndex;
                //decrement upper bound of left array
                leftHigh = leftMatchedIndex - 1;
            } else {
                break;
            }
        }

        //loop until no more duplicates in right array
        while(true){
            int rightMatchedIndex = search(values, rightLow, high, query);

            //if duplicate found
            if(rightMatchedIndex != -1){
                rightIndex = rightMatchedIndex;
                //increment lower bound of right array
                rightLow = rightMatchedIndex + 1;
            } else{
                break;
            }

        }

        return new Matches(matchedIndex, rightIndex - leftIndex + 1);

    }

    return new Matches(-1, 0);

}

private static int search(int[] values, int low, int high, int query) {

    while (low <= high) {
        int mid = (low + high) / 2;

        if (values[mid] == query) {
            return mid;
        } else if (query < values[mid]) {
            high = mid - 1;
        } else {
            low = mid + 1;
        }
    }

    return -1;

}

- PythaLye

@Gosu, did I implement your algorithm correctly? It seems complicated, and there may be room for improvement. - PythaLye

0

After fixing the bug where resetting the high variable caused an infinite loop, I found a solution.

public static Matches findMatches(int[] values, int query) {
    int firstMatchIndex = -1;
    int lastMatchIndex = -1;
    int numberOfMatches = 0;

    int low = 0;
    int mid = 0;
    int high = values.length - 1;

    while (low <= high){
        mid = (low + high)/2;

        if (values[mid] == query && firstMatchIndex == -1){

            firstMatchIndex = mid;
            numberOfMatches++;
            high = values.length - 1;
            low = mid;

        } else if (values[mid] == query && (lastMatchIndex == -1 || lastMatchIndex != -1)){

            lastMatchIndex = mid;
            numberOfMatches++;

            if (query < values[mid]){
                high = mid - 1;
            } else { 
                low = mid + 1;
            }

        } else if (query < values[mid]){
            high = mid - 1;
        } else {
            low = mid + 1;
        }
    }

    if (firstMatchIndex != -1) { // First match index is set
        return new Matches(firstMatchIndex, numberOfMatches);
    }
    else { // First match index is not set
        return new Matches(-1, 0); 
    }
}

- imparante

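This program's output is incorrect. The condition "(lastMatchIndex == -1 || lastMatchIndex != -1)" always evaluates to true. @jruser2120512 - PythaLye

@jruser2120512, your code doesn't produce the correct output, my friend. - Sumeet

@Dante On that line, once the first match index (firstMatchIndex) has been found, lastMatchIndex is checked inside the loop. I pass in the following arguments: int[] values = {0, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 8, 9}; // pre-sorted array of integers. int query = 8; // the integer to search for. I expect a match at index 9 and 3 matches, and that is what my code returns. - imparante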

1

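@jruser2120512, try testing {1,2,3,3,3,4,5,5,5,5,6,7,7} with query 5; the result is still incorrect. - Sumeet

@Dante I see it now. Carrying on with my binary search approach doesn't work. Your solution with the linear search seems best! - imparante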

0

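This is hard to do without prior knowledge of the sorted data. See this: Binary search O(log n) algorithm to find duplicate in sequential list?

This finds the first index of a duplicated value k in a sorted array. Of course, it depends on knowing the duplicated value in advance, but it is very useful when that value is known.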

public static int searchFirstIndexOfK(int[] A, int k) {
    int left = 0, right = A.length - 1, result = -1;
    // [left : right] is the candidate set.
    while (left <= right) {
        int mid = left + ((right - left) >>> 1); // equivalent to (left + right) >>> 1
        if (A[mid] > k) {
            right = mid - 1;
        } else if (A[mid] == k) {
            result = mid;
            right = mid - 1; // Nothing to the right of mid can be the solution.
        } else { // A[mid] < k
            left = mid + 1;
        }
    }
    return result;
}
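As a hedged sketch of how the first-index search above could answer the original question without any linear scan: mirror it to find the last index, then subtract. The names FirstLastSketch, firstIndexOf, lastIndexOf and countOf are illustrative and do not come from the thread.

```java
class FirstLastSketch {

    // Index of the first occurrence of k in the sorted array a, or -1.
    static int firstIndexOf(int[] a, int k) {
        int left = 0, right = a.length - 1, result = -1;
        while (left <= right) {
            int mid = left + ((right - left) >>> 1);
            if (a[mid] > k) {
                right = mid - 1;
            } else if (a[mid] == k) {
                result = mid;
                right = mid - 1; // remember the hit, keep looking further left
            } else {
                left = mid + 1;
            }
        }
        return result;
    }

    // Mirror image: index of the last occurrence of k, or -1.
    static int lastIndexOf(int[] a, int k) {
        int left = 0, right = a.length - 1, result = -1;
        while (left <= right) {
            int mid = left + ((right - left) >>> 1);
            if (a[mid] < k) {
                left = mid + 1;
            } else if (a[mid] == k) {
                result = mid;
                left = mid + 1; // remember the hit, keep looking further right
            } else {
                right = mid - 1;
            }
        }
        return result;
    }

    // Number of occurrences of k, using two O(log n) searches.
    static int countOf(int[] a, int k) {
        int first = firstIndexOf(a, k);
        return first == -1 ? 0 : lastIndexOf(a, k) - first + 1;
    }

    public static void main(String[] args) {
        int[] values = {0, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 8, 9};
        System.out.println(firstIndexOf(values, 8)); // prints 9
        System.out.println(countOf(values, 8));      // prints 3
    }
}
```

This is essentially the "duplicate the function" idea the asker wanted to avoid, but it stays logarithmic even when the array consists almost entirely of the queried value.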

The following will find a duplicate in log(n) time, but it is fragile, because the data must be sorted, increase by 1, and lie in the range 1..n.

static int findeDupe(int[] array) {
    int low = 0;
    int high = array.length - 1;
    while (low <= high) {
        int mid = (low + high) >>> 1;
        if (array[mid] == mid) {
            low = mid + 1;
        } else {
            high = mid - 1;
        }
    }
    System.out.println("returning" + high);
    return high;
}
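One thing worth noting is that the comparison array[mid] == mid in the function above only holds for untouched elements when the values start at 0. The sketch below is a hypothetical variant, not the answerer's code: it compares against array[0] + mid instead, and assumes the array is sorted and that neighbouring values differ by 0 or 1. It also returns -1 when nothing is duplicated.

```java
class SequentialDupeSketch {

    // Returns the index of the first copy of the first duplicated value,
    // or -1 if every element is still "in place" (no duplicate).
    // Assumption: sorted input where neighbouring values differ by 0 or 1.
    static int findFirstDupeIndex(int[] array) {
        if (array.length == 0) {
            return -1;
        }
        int base = array[0];
        int low = 0;
        int high = array.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            if (array[mid] == base + mid) {
                low = mid + 1;  // still in place: any duplicate is to the right
            } else {
                high = mid - 1; // shifted: a duplicate occurred before mid
            }
        }
        // high is now the last in-place index; if it is the final index,
        // no element was shifted and therefore nothing is duplicated
        return high == array.length - 1 ? -1 : high;
    }

    public static void main(String[] args) {
        System.out.println(findFirstDupeIndex(new int[] {1, 2, 3, 3, 4, 5})); // prints 2
        System.out.println(findFirstDupeIndex(new int[] {1, 2, 3}));          // prints -1
    }
}
```

The binary search works because "element is in place" is true for a prefix of the array and false for the rest, which is exactly the kind of predicate a binary search can split on.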

- Droid Teahouse

- Sumeet · Accepted Answer

There is a problem in your code:

high = values[values.length - 1];

should be

high = values.length - 1;
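A small sketch of why the difference matters, using a made-up three-element array and a hypothetical helper named firstMidInBounds: the original line stores the last value of the array in high, whereas the binary search needs the last index.

```java
class HighBoundSketch {

    // True when the first midpoint computed from the range [0, high]
    // is a valid index into values. (Illustrative helper only.)
    static boolean firstMidInBounds(int[] values, int high) {
        int mid = (0 + high) / 2;
        return mid >= 0 && mid < values.length;
    }

    public static void main(String[] args) {
        int[] values = {10, 20, 30};

        int wrongHigh = values[values.length - 1]; // 30: the last VALUE
        int rightHigh = values.length - 1;         // 2: the last INDEX

        System.out.println(firstMidInBounds(values, wrongHigh)); // prints false (mid would be 15)
        System.out.println(firstMidInBounds(values, rightHigh)); // prints true (mid is 1)
    }
}
```

With the asker's sample array {0, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 8, 9}, the last value is 9 while the last index is 12, so the mistake does not throw; it just means indices 10 through 12 are never examined.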

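You don't need variables like numberOfMatches and searchFirst; a much simpler solution is possible.

Now, coming to the problem itself: I understand what you want, and I think binary search is a good fit for this kind of query.

The best way to do what you need is this: once you find a match, move forward and backward from that index until you hit a mismatch. That computes firstMatchIndex and numberOfMatches both elegantly and efficiently.

So your function should look like this: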

public static Matches findMatches(int[] values, int query) {
    int firstMatchIndex = -1, lastMatchIndex = -1;
    int low = 0, mid = 0, high = values.length - 1;
    while (low <= high) {
        mid = (low + high) / 2;

        if (values[mid] == query) {
            lastMatchIndex = mid;
            firstMatchIndex = mid;
            while (lastMatchIndex + 1 < values.length && values[lastMatchIndex + 1] == query)
                lastMatchIndex++;
            while (firstMatchIndex - 1 >= 0 && values[firstMatchIndex - 1] == query)
                firstMatchIndex--;
            return new Matches(firstMatchIndex, lastMatchIndex - firstMatchIndex + 1);
        } else if (values[mid] > query) {
            high = mid - 1;
        } else {
            low = mid + 1;
        }
    }
    return new Matches(-1, 0);
}
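The thread never shows the Matches class, so the following is only a runnable sketch under assumptions: Matches is taken to be a simple holder with two fields named after the constructor arguments used above, and the wrapper class MatchesDemo is invented for the demo. It runs the binary-search-then-scan approach on the two arrays quoted in the comments.

```java
class MatchesDemo {

    // Hypothetical holder; the real Matches class is not shown in the thread.
    static final class Matches {
        final int firstMatchIndex;
        final int numberOfMatches;

        Matches(int firstMatchIndex, int numberOfMatches) {
            this.firstMatchIndex = firstMatchIndex;
            this.numberOfMatches = numberOfMatches;
        }
    }

    // Binary search for any match, then scan outward to both ends of the run.
    static Matches findMatches(int[] values, int query) {
        int low = 0, high = values.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (values[mid] == query) {
                int first = mid, last = mid;
                while (last + 1 < values.length && values[last + 1] == query) last++;
                while (first - 1 >= 0 && values[first - 1] == query) first--;
                return new Matches(first, last - first + 1);
            } else if (values[mid] > query) {
                high = mid - 1;
            } else {
                low = mid + 1;
            }
        }
        return new Matches(-1, 0);
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 8, 9};
        Matches m = findMatches(a, 8);
        System.out.println(m.firstMatchIndex + " " + m.numberOfMatches); // prints 9 3

        int[] b = {1, 2, 3, 3, 3, 4, 5, 5, 5, 5, 6, 7, 7};
        m = findMatches(b, 5);
        System.out.println(m.firstMatchIndex + " " + m.numberOfMatches); // prints 6 4
    }
}
```

The cost is O(log n) for the search plus O(k) for the scan, where k is the number of matches, so it degrades toward linear time when most of the array equals the query.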