从众多数组中查找最小和最大元素

Question

从众多数组中查找最小和最大元素

18

我在亚马逊面试中收到了一个问题，希望能得到解决方案的帮助。

给定N个大小为K的数组，每个数组的K个元素都是排序过的，并且这N个数组中的每个N*K个元素都是唯一的。从这N个数组中选择一个元素子集，从所选的N个元素中选择单个元素。减去最小和最大元素。此差应该是最小可能的最小值。

示例：

N=3, K=3

N=1 : 6, 16, 67
N=2 : 11,17,68
N=3 : 10, 15, 100

如果选择16、17和15，那么最小差为17-15=2。

- Manjunath Manoharan

7

你实际上并没有提出问题......我很惊讶每个人都假定你需要大O符号表示法。 - Russell Asher

2

三年过去了，我仍然不明白这个问题。 - Matias Cicero

1

我投票关闭此问题，因为这里没有代码可见，也没有具体的问题。 - Kaz

7个回答

3

以下是解决此问题的两个步骤算法：

第一步是将所有数组合并成一个排序数组，它看起来像这样：

combined_val[] - 包含所有数字 combined_ind[] - 包含此数字最初所属的数组的索引

这一步可以很容易地在O(K*N*log(N))中完成，但我认为你也可以做得更好（也许不行，你可以查找合并排序的变体，因为它们执行类似的步骤）。

现在是第二步：

直接放置代码比解释更容易，以下是伪代码：


int count[N] = { 0 }
int head = 0;
int diffcnt = 0;
// mindiff is initialized to overall maximum value - overall minimum value
int mindiff = combined_val[N * K - 1] - combined_val[0];
for (int i = 0; i < N * K; i++) 
{
  count[combined_ind[i]]++;

  if (count[combined_ind[i]] == 1) {
    // diffcnt counts how many arrays have at least one element between
    // indexes of "head" and "i". Once diffcnt reaches N it will stay N and
    // not increase anymore
    diffcnt++;
  } else {
    while (count[combined_ind[head]] > 1) {
      // We try to move head index as forward as possible while keeping diffcnt constant.
      // i.e. if count[combined_ind[head]] is 1, then if we would move head forward
      // diffcnt would decrease, that is something we dont want to do.
      count[combined_ind[head]]--;
      head++;
    }
  }

  if (diffcnt == N) {
    // i.e. we got at least one element from all arrays
    if (combined_val[i] - combined_val[head] < mindiff) {
      mindiff = combined_val[i] - combined_val[head];
      // if you want to save actual numbers too, you can save this (i.e. i and head
      // and then extract data from that)
    }
  }
}

结果在mindiff中。

第二步的运行时间为O(N * K)。这是因为"head"索引最多只会移动N*K次。所以内部循环不会使它变成二次方程，仍然是线性的。

因此，总算法运行时间为O(N * K * log(N))，但这是因为合并步骤，如果您能想出更好的合并步骤，可能可以将其降至O(N * K)。

- zviadm

嘿..你忘记添加mindiff初始化的片段了..我也觉得我理解了，但不是很清楚。你能否在代码中加上一个小注释，特别是在head++处？ - Manjunath Manoharan

mindiff被初始化为可能的最大值，即整体最大数减去整体最小数。该算法的主要目的是找到最小值和最大值。索引“i”指向最大值，索引“head”指向最小值。每次我们增加“i”，我们尝试将索引“head”尽可能地移动，同时仍保留每个数组至少一个元素。 - zviadm

这是一个杰出的解决方案。你关于我的算法错误的评论是正确的。我认为你无法找到比O（NKlog（N））更好的解决方案，至少在最坏的情况下不行，在特殊情况下K = 1，否则你将能够以比N * Log（N）更快的时间排序数组。 - Uri London

谢谢@Uri。你的观察在合并步骤方面是有道理的。因此，在最坏情况下，合并数组确实需要O（N * K * log（N））的时间复杂度。 - zviadm

-1 @zviadm 如果您合并所有数组，则可能会比较来自同一数组的元素，这是错误的，需要从三个数组中选择元素才能得出最小差异。（除非我误解了您的解决方案。） - Mehdi Karamosly

3

这个问题是为了管理人员而设计的

您有3名开发人员(N1)，3名测试人员(N2)和3名数据库管理员(N3)。选择最少分散的团队，以确保项目成功。

int[n] result;// where result[i] keeps the element from bucket N_i

int[n] latest;//where latest[i] keeps the latest element visited from bucket N_i

Iterate elements in (N_1 + N_2 + N_3) in sorted order
{
    Keep track of latest element visited from each bucket N_i by updating 'latest' array;

    if boundary(latest) < boundary(result)
    {
       result = latest;
    }
}

int boundary(int[] array)
{
   return Max(array) - Min(array);
}

- random

1

我有一个O(K*N*log(K))的算法，通常执行时间要短得多。目前想不到更好的方法了。首先我会解释一下比较容易描述但执行时间稍长的方法：

对于第一个数组中的每个元素f（循环K个元素）
对于每个数组，从第二个数组开始（循环N-1个数组）
在数组上进行二分查找，并找到最接近f的元素。这就是你要找的元素（Log(K)）

如果为每个数组添加一个新的Floor Index，则可以优化此算法。在执行二分搜索时，在“Floor”到“K-1”之间搜索。初始Floor索引为0，对于第一个元素，您需要搜索整个数组。一旦找到最接近'f'的元素，请使用该元素的索引更新Floor索引。最坏情况相同（如果第一个数组的最大元素小于任何其他最小元素，则Floor可能不会更新），但平均情况会有所改善。

- Uri London

嘿，不错的解决方案。我正在考虑使用最小堆和最大堆来实现一个解决方案... 你有什么建议吗？ - Manjunath Manoharan

1

这个算法不完整也不正确。假设你从N=1中选择了10。（第一个数组） N=2: 1 15 N=3: 5 16 你会选择15和5，这样你的差值是10，但如果你选择15和16，那么你的差值将是6。 - zviadm

@zviadm 看了这个问题我也是在大约10秒内想到了同样的解决方案，但立即开始怀疑它是否正确...不过，我认为它朝着正确的方向发展。 - Lorenzo Dematté

解决方案是正确的。只需找到迄今为止找到的最小元素的上限和下限，并保留更接近最小值的那个。 1.从第一个数组中取第一个元素，并将其视为最小值。 2.在第二个数组中查找最小值的上限和下限，并选择更接近最小值的那个。 3.更新最小值 4.搜索后续行。 5.对row1的所有元素执行此操作。 - Shivendra

1

接受答案的正确性证明（Terminal的解决方案）

假设该算法找到了一个序列A=<A[1]，A[2]，...，A[N]>，它不是最优解（R）。

考虑在R中的索引j，使得项R[j]是算法检查并用其下一项替换的第一项。

让A'表示该阶段的候选解（替换之前）。由于R[j]=A'[j]是A'的最小值，它也是R的最小值。现在，考虑R的最大值R[m]。如果A'[m]<R[m]，则可以通过用A'[m]替换R[m]来改进R，这与R是最优解的事实相矛盾。因此，A'[m]=R[m]。换句话说，R和A'具有相同的最大值和最小值，因此它们是等价的。这完成了证明：如果R是最优解，则保证算法能够找到与R一样好的解。

- Eyal Schneider

0

以下是我解决这个问题的逻辑，需要记住我们需要从每个N个数组中选择一个元素（以计算最小值）

// if we take the above values as an example!
// then the idea would be to sort all three arrays while keeping another
// array to keep the reference to their sets (1 or 2 or 3, could be 
// extended to n sets)      
1   3   2   3   1   2   1   2   3    // this is the array that holds the set index
6   10  11  15  16  17  67  68  100  // this is the sorted combined array.
           |           |   
    5            2          33       // this is the computed least minimum,
                                     // the rule is to make sure the indexes of the values 
                                     // we are comparing are different (to make sure we are 
// comparing elements from different sets), then for example
// the first element of that example is index:1|value:6 we hold 
// that value 6 (that is the value we will be using to compute the least minimum, 
// then we go to the edge of the comparison which would be the second different index, 
// we skip index:3|value:10 (we remove it from the array) we compare index:2|value:11 
// to index:1|value:6 we obtain 5 which would go to a variable named leastMinimum = 5, 
// now we remove the indexes and values we already used,
// and redo the same steps.

步骤1：

1   3   2   3   1   2   1   2   3
6   10  11  15  16  17  67  68  100
           |   
5            
leastMinumum = 5

步骤2：

3   1   2   1   2   3
15  16  17  67  68  100
           |   
 2          
leastMinimum = min(2, leastMinumum) // which is equal 2

步骤三：

1   2   3
67  68  100

    33
leastMinimum = min(33, leastMinumum) // which is equal to old leastMinumum which is 2

现在我们假设我们有来自同一个数组非常接近的元素（这次k=2，这意味着我们只有3组包含两个值）。

// After sorting the n arrays we will have the below indexes array and values array
1   1   2   3   2   3
6   7   8   12  15  16
*       *   *

* we skip second index of 1|7 and we take the least minimum of 1|6 and 3|12 (index:2|value:8 will be removed as it is not at the edges, we pick the minimum and maximum of the unique index subset of n elements)
1   3         
6   12
 =6
* second step we remove the values we already used, so the array become like below:

1   2   3
7   15  16
*   *   * 
7 - 16
= 9

注意： 另一种消耗更多内存的方法是创建N个子数组，从中比较最大值和最小值

因此，从下面排序后的值数组及其对应的索引数组中，我们提取另外三个子数组：

1   3   2   3   1   2   1   2   3
6   10  11  15  16  17  67  68  100

第一个数组：

1   3   2 
6   10  11

11-6 = 5

第二个数组：

3   1   2
15  15  17

17-15 = 2

第三个数组：

1   2   3
67  68  100

100 - 67 = 33

- Mehdi Karamosly

0

对于第一个数组中的每个元素

    choose the element in 2nd array that is closest to the element in 1st array
    current_array = 2;
    do
    {
        choose the element in current_array+1 that is closest to the element in current_array
        current_array++;
    } while(current_array < n);

复杂度：O(k^2*n)

- user2120127

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Terminal · Accepted Answer

我可以想到一个O(N*K*N)（在zivo的正确指出后进行了编辑，现在不是一个好的解决方案:(）的解决方案。
1. 最初将N个指针分别指向每个数组的初始元素。

6, 16, 67
^ 
11,17,68
^
10, 15, 100
^

2. 找出当前指针 O(k) 中最高和最低的元素（6 和 11），并计算它们之间的差值（5）。
3. 将指向该数组中最低元素的指针加 1。

 6, 16, 67
    ^ 
 11,17,68
 ^
 10, 15, 100 (difference:5)
 ^

4. 不断重复步骤2和3，并存储最小差异。

 6, 16, 67
    ^ 
 11,17,68
 ^
 10,15,100 (difference:5)
    ^ 


 6, 16, 67
    ^ 
 11,17,68
    ^
 10,15,100 (difference:2)
    ^

上述将是所需的解决方案。

 6, 16, 67
    ^ 
 11,17,68
    ^
 10,15,100 (difference:84)
       ^ 

 6, 16, 67
        ^ 
 11,17,68
    ^
 10,15,100 (difference:83)
       ^

等等......

编辑：

可以通过使用堆（如Uri所建议的）来减少其复杂性。我想到了这个方法，但遇到了一个问题：每次从堆中提取一个元素时，都必须找出它在数组中的编号，以便增加该数组对应的指针。一种有效的查找数组编号的方法肯定可以将复杂度降至O(K*N log(K*N))。一种朴素的方法是使用这样的数据结构

Struct
{
    int element;
    int arraynumer;
}

并且重构类似的初始数据

 6|0,16|0,67|0

 11|1,17|1,68|1

 10|2,15|2,100|2

最初，将第一列的当前最大值保留，并将指向的元素插入堆中。现在每次提取一个元素时，可以找到它所在的数组编号，在该数组中，指针会增加，新指向的元素可以与当前最大值进行比较，并相应地调整最大指针。