Coding interview Essentials

44. Find the largest gap

Find the largest gap

Introduction

This chapter discusses another sorting problem. The statement is quite simple and the only input given is an unsorted array from which we are asked to calculate a value that would be simple to find if the input was sorted. Therefore, the real challenge of this problem is to find a solution that does not require explicit sorting.

Particular attention should be paid to the examples as well as to the problem statement because it is easy to misinterpret the real requirements of the function you are asked to write if you dive straight into coding. The problem asks you to return the largest distance between any element in the input array provided they appear one next to the other when \(I\) is sorted. You might misinterpret the problem by thinking that you need to return the largest distance between any two elements of the original input array but this is incorrect. This should be obvious if we consider that the we are only being asked to find the minimum and the maximum values of the input array. You can expect any coding interview question to be harder than that. An imaginative effort (or some pen and paper work) is therefore necessary to understand each of the examples provided.

Problem statement

[example:max_gap:exercice1] Write a function that given a unsorted array of non-negative integers \(I\) of length \(n\) returns the largest gap between two elements that would appear one next to the other if \(I\) was sorted. A gap between \(x\) and \(y\) is defined as the absolute value of the difference between \(x\) and \(y\): \(|x-y|\).

[example:max_gap:example1]
Given \(I = \{5,3,1,8,9,2,4\}\) the function returns \(3\). Sorting \(I\) changes it into: \(sort(I)= \{1,2,3,4,5,8,9\}\), and the largest gap between any two consecutive elements is \(3\). In this case between \(5\) and \(8\).

[example:max_gap:example2]
Given \(I = \{7, 1, 8, 9,15\}\) the function returns \(6\). \(sort(I)= \{1,7,8,9,15\}\), and the largest gap between any two of its consecutive elements is \(6\) e.g. between \(1\) and \(7\) or between \(15\) and \(9\).

Clarification Questions

Is the input \(I\) modifiable?

Yes you can modify \(I\).

Is there any guarantee or constraint on the value of each element of \(I\)?

You can assume each element of \(I\) fits in a 4 bytes unsigned integer.

Trivial Solution

As already discussed, this problem has an extremely simple solution when we can afford to get our hands on a sorted version of the input array. In this specific version of the problem \(I\) is not read-only and we are allowed to modify it,therefore, we can sort it directly. If that is not possible all you have to do is to create a copy of \(I\) and sort that instead.

Given a sorted collection, the largest gap between any two numbers can be found in linear time by just scanning each pair \(p=(I_k, I_{k+1)}\) of subsequent elements and for each of them calculating their distance \(d_k=I_{k-1}-I_k\). Among all calculated distances we simply return the largest. Listing [list:max_gap:bruteforce] shows an implementation of this idea. Note that:

  • we do not need to use the absolute value operation as we are operating on a sorted collection and therefore we are guaranteed that \(I_{k+1}\) is larger than or equal to \(I_k\).

  • the loop stops when \(i=|I|-1\) in order to avoid accessing an invalid element while executing . When \(i=|I|-1\) this would lead to accessing the element at index \(|I|\), which does not exist. In C++ this would cause undefined behaviour and the most likely outcome would be a segmentation fault error.

Listing 1: Trivial solution to the max gap problem using sorting and linear space.

int max_gap_bruteforce(const std::vector<int>& I)
{
  auto I_copy(I);
  std::ranges::sort(I_copy);

  int ans = std::numeric_limits<int>::min();
  for (int i = 0; i < std::ssize(I) - 1; i++)
  {
    ans = std::max(ans, I_copy[i + 1] - I_copy[i]);
  }
  return ans;
}

Radix Sort

The idea we developed in Section 50.3 can be improved if instead of using a normal comparison-based sorting algorithms we use radix-sort[@cit::wiki::radix_sort]. Radix sort will perform better than a standard \(O(nlogn)\) algorithm when there is an upper bound for the values of the input array. If we assume that such bound is the largest value a standard 4 bytes can hold then radix sort will have a complexity of \(O(n)\).

Radix sort works by sorting the input array \(d\) times, where \(d = \lfloor log_{10 \rfloor k+1}\) and \(k\) is the largest number in \(I\). \(d\) is just the number of digits of the largest number in the list. For a standard \(d=\lfloor log_{10 \rfloor(2147483647) + 1}=10\). The sorting is obtained by repeatedly sorting the input list from the least to the most significant digit where each of the intermediate sorting steps is performed using counting-sort. For instance given \(I = \{329,457,657,839,436,720,355\}\) the first pass of radix sort will sort \(I\) based on the value of the least significant digits. After this first pass we have \(I=\{720,355,436,457,657,329,839\}\). Note how the first digits are sorted. At this point the algorithm proceeds by sorting \(I\) further but this time according to their second digit. The resulting list becomes \(I=\{720,329,436,839,355,457,657\}\). Finally the third pass will sort all the elements according tp the the most significant digit resulting in a well sorted list: \(I=\{329,355,436,457,657,720,839\}\). This approach needs to be tweaked if you want to apply radix sort to negative numbers[^44].

Listing [list:max_gap:radixsort] shows an implementation of radix-sort and its application to this chapter’s problem.

Listing 2: Linear time solution to the max gap problem using radix-sort.

void count_sort(vector<int>& I, const unsigned base, const unsigned digit_idx)
{
  std::vector<std::vector<int>> counters(base);
  for (const auto& el : I)
  {
    // get the digit_idx th digit
    const auto digit_value = el / (std::pow(base, digit_idx)) % base;
    // add this number to the corrensponding bucket
    counters[digit_value].push_back(el);
  }

  int pos = 0;
  for (const auto& list : counters)
  {
    for (const auto& num : list)
    {
      I[pos++] = num;
    }
  }
}

void radix_sort(vector<int>& I, const unsigned base = 10)
{
  for (unsigned digit = 0; digit < base; digit++)
    count_sort(I, base, digit);
}

int max_gap_radix_sort(const std::vector<int>& I)
{
  auto I_copy(I);
  radix_sort(I_copy);

  int ans = std::numeric_limits<int>::min();
  for (int i = 0; i < std::ssize(I) - 1; i++)
  {
    ans = std::max(ans, I_copy[i + 1] - I_copy[i]);
  }
  return ans;
}

Notice how the main driver function is basically the same as from Listing [list:max_gap:bruteforce] except for the sorting procedure used.

Buckets and the pigeonhole principle

All the solutions we have presented so far rely on sorting. In this section we will discuss an approach that relies on the pigeonhole principle[^45] and bucketing. The general idea is that we could split the entire array into several buckets and then find the largest gap by only comparing one element of a given bucket to one element of the subsequent bucket.

You can think of \(I\) as an array of buckets each containing a single element. \(|I|\) is made of \(b\) buckets and the total amount of elements in \(I\) is \(n=b\). Imagine for a moment that you would reduce the number of buckets from \(b\) to some integer \(k < b\). For the pigeonhole principle then, one or more of the buckets in \(I\) has to contain strictly more than one element.

Let’s now focus for a moment on the gaps of an ideal collection where each of its elements has the same distance \(t\) from its successor in the list. If such a collection is composed of \(n\) elements, then there is a total of \(n-1\) gaps, each of size \(t\). \(t\) can be easily calculated if the maximum and minimum values are known - in fact \(t=\frac{max-min}{n-1}\). For instance for the collection of five elements \(\{4,8,12,16,20\}\) we have \(t=\frac{20-4}{4}\) and for the \(\{-2,5,12,19\}\) we have \(t=\frac{19- (-2)}{3} = 7\). If \(I\) was like this ideal collection then the problem would be easily solvable by using the formula above.

\(I\) is different to this ideal collection because its elements do not have uniform gaps between them. In this situation we can argue that the maximum gap between any pair of subsequent elements of \(I\) is always larger than \(t=\frac{max-min}{n-1}\). We can show this by taking an ideal collection \(C\) and trying to reduce the gap between any two subsequent elements \(C_{i}\) and \(C_{i+1}\). We do that by moving \(I_{i+1}\) closer to \(I_{i}\). When this happens the gap \((I_{i+1}-I_{i})\) becomes smaller than \(t\). So far it all looks promising but what happens to the gap between \(I_{i+1}\) and \(I_{i+2}\)? It actually becomes larger than \(t\). In our effort to make the largest gap among two subsequent elements smaller, we obtained the opposite result, we made it larger! Figure 50.1 shows an example of such a scenario where we have a collection of uniformly separated elements where \(t = 2\). When the third element is moved by one toward the second, you see that the gap between them is reduced from \(2\) to \(1\) but the gap between the third and the fourth element increases from \(2\) to \(3\), and now the largest gap between any two pairs of subsequent element is no more \(2\) but \(3\) which is larger than what we had to begin with. This shows that the maximum attainable gap can in a collection with uniform gaps can only increase.

Figure 1: Example of how trying to reduce the gap between pairs of subsequent element of a uniformly separated collections makes the largest gap larger

. [fig:max_gap:move_t]

We are going to apply the two ideas above to solve this problem in the following way. We will distribute all the elements of \(I\) into \(n-1\) buckets. The first bucket will contain all the elements of \(I\) in the following range: \([min, min + t)\). Similarly the second bucket will contain all the elements in the following range: \([min + t, min + 2t)\). In general the \(i^{th}\) bucket would contain all the elements of \(I\) in the following range: \([min + (i-1)t, min+it)\) (where \(1 \leq i\)). You can refer to Figure 50.2 for an example of how such a division into buckets would work for the input array in Example [example:max_gap:example2]. This allows us to skip comparing all the element within a bucket because we know for sure they will have a distance that is lower than or equal to \(t\) and we can therefore concentrate on comparing elements of subsequent buckets. In particular we should compare the maximum value of the \(i^{th}\) bucket with the minimum value of the \((i+1)^{th}\) bucket. This is because they would appear one next to the other if \(I\) was sorted. As the number of buckets is always lower or equal to the number of elements in the collection, this approach has a linear time complexity as it requires comparing the number of input elements in \(I\) twice at most. The space complexity is also linear as the number of buckets can be proportional to \(|I|\).

Figure 2: Division of the elements of the Example [example:max_gap:example2] into buckets.

. [fig:max_gap:bucketing_example2]

Listing [list:max_gap:buckets] shows an implementation of this idea where we use the to model a bucket for which we only need to store three pieces of information:

if the buckets contains at least one elements,

its minimum

and the maximum value

. If \(|I| < 2\) we can immediately return \(0\) as there are no possible pairs to calculate the gap for. Otherwise we proceed by calculating \(t\) and the number of buckets we need. The first loop takes care of filling each of the buckets an element belongs to. Note that we can calculate such an index for an element \(el\) by using the following expression: \(\frac{el- \min(I)}{t}\). Once all the buckets are initialized, we can proceed further by calculating the largest gap between them, ensuring we don’t consider empty buckets which are ignored during the second loop. We proceed by considering the max element of the first bucket with the minimum element of the next non-empty bucket \(j > 0\). Once the gap between them is calculated, we can move on to calculating the gap between the next pair of subsequent buckets which will be made of the bucket at index \(j\) and the first non-empty bucket having index larger than \(j\). This process is repeated until all pairs of buckets are processed.

Listing 3: Linear time solution to the max gap problem using bucketing.

struct Bucket
{
  bool used  = false;
  int minval = std::numeric_limits<int>::max();
  int maxval = std::numeric_limits<int>::min();
};

int max_gap_buckets(const std::vector<int>& I)
{
  if (I.size() < 2)
    return 0;

  const auto [minEl, maxEl] = [&I]() {
    const auto p = std::minmax_element(I.begin(), I.end());
    return std::make_tuple(*p.first, *p.second);
  }();

  const int t = std::max(
      1l, (maxEl - minEl) / (std::ssize(I) - 1));  // bucket size or capacity
  const size_t num_buckets = ((maxEl - minEl) / t) + 1;  // number of buckets
  std::vector<Bucket> buckets(num_buckets);

  for (const auto& el : I)
  {
    const size_t bucketIdx  = (el - minEl) / t;  // bucket idx for this element
    buckets[bucketIdx].used = true;
    buckets[bucketIdx].minval = std::min(el, buckets[bucketIdx].minval);
    buckets[bucketIdx].maxval = std::max(el, buckets[bucketIdx].maxval);
  }

  int prevBucketMax = minEl, ans = 0;
  for (auto&& bucket : buckets)
  {
    if (!bucket.used)  // skip empty buckets
      continue;

    ans           = std::max(ans, bucket.minval - prevBucketMax);
    prevBucketMax = bucket.maxval;
  }

  return ans;
}