Coding interview Essentials

31. Find the \(K\) closest elements

Find the \(K\) closest elements

Introduction

In this chapter we will discuss a problem that asks you to return a subset of a given input array. We will investigate two solutions: one that is based on sorting and the other on binary search with the latter being more efficient as we will make good use of the fact that the input is provided already sorted. By contrast, the solution based on sorting appears almost trivial to and we can derive it directly from the problem statement while the solution based on binary search requires slightly more insight and typing to get right. We will present two different implementations of the binary search solution:

the first based entirely the C++ STL,

and the other where we will code the binary search algorithm explicitly.

Problem statement

Write a function that takes as input

a sorted array \(I\) and two integers

\(k\) and

\(x\),

and returns an array, sorted in ascending order, containing the \(k\) elements that are closest to \(x\) in \(I\). Note that: given two elements \(y\), and \(z\), \(y\) is closer to \(x\) than \(z\) if: \[|x-y| < |x-z| \label{eq:kclosest_in_array:sort_criteria}\]


Given

\(I = \{1,2,3,4,5\}\),

\(k=4\) and

\(x=3\),

the function returns: \(\{2,3,4,5\}\)


Given

\(I = \{1,2,3,4,5\}\),

\(k=4\) and

\(x=-1\),

the function returns: \(\{1,2,3,4\}\)


Given

\(I = \{12,16,26,30,35,39,42,46,48,50,53,55,56\}\),

\(k=5\) and

\(x=36\),

the function returns: \(\{26,30,35,39,42\}\)

Clarification Questions

What should the function behavior be when resolving ties? What should it do when you have two elements that are at the same distance from \(x\)?

The function should always favor the smaller element in case of a tie.

Is I guaranteed to be sorted in ascending order?

Yes you can assume \(I\) is always sorted in ascending order.

Sorting

A solution that almost immediately follows from the problem statement is based on sorting the elements of \(I\) according to the criteria shown in Equation [eq:kclosest_in_array:sort_criteria]. The idea is that if \(I\) is sorted according to the absolute value of the difference between each number of \(I\) and \(x\) then the closest number to \(x\) will be located after the sorting at the front of \(I\). All that is necessary at that point is to copy the first \(K\) element of \(I\) into the return array. Listing [list:find_k_closest_in_array1] shows a possible implementation of this idea.

Listing 1: Solution to the problem of finding the $k$ closest element using sorting.

std::vector<int> kth_closest_in_array_sorting(vector<int>& I,
                                              const int k,
                                              const int x)
{
  assert(I.size() >= k);
  std::sort(begin(I), end(I), [x](const auto y, const auto z) {
    return std::abs(x - y) < std::abs(x - z);
  });

  std::vector<int> ans{begin(I), begin(I) + k};
  std::sort(begin(ans), end(ans));
  return ans;
}

Please note that - as in all cases where you actually do not need to have the whole array sorted - you can use partial sorting instead of fully-fledged sorting. In all cases where \(k\) is smaller that \(n\) the complexity is going to be slightly better as we will go from the \(O(nlog(n))\) of the normal sorting to \(O(nlog(k))\) of the partial sort. Fortunately, making this change in C++ is easily as it is only a matter of calling instead of as shown in Listing [list:find_k_closest_in_array2].

Listing 2: Solution to the problem of finding the $k$ closest element using sorting.

std::vector<int> kth_closest_in_array_partial_sorting(vector<int>& I,
                                                      const int k,
                                                      const int x)
{
  assert(I.size() >= k);
  std::partial_sort(
      begin(I), begin(I) + k, end(I), [x](const auto y, const auto z) {
        return std::abs(x - y) < std::abs(x - z);
      });

  std::vector<int> ans{begin(I), begin(I) + k};
  std::sort(begin(ans), end(ans));
  return ans;
}

The problem description clearly states that the input array is sorted but the solution we devised in Section 35.3 does not take advantage of that fact. All it does is invalidate the original ordering in order to enforce a different one. Every time the problem statement mentions that some input is sorted, we should think how to use that to devise a more efficient solution rather than questioning whether that information is useful or not. It is always wise to assume that there will be no useless information in the problem statement. When sorted input is involved, there are several algorithms that should immediately come to mind and, out of this set, binary search is probably first. All that is left is to ask ourselves how binary search could be applied to this problem?

But first, let’s take a step back and try to analyze the problem for a slightly different angle. Specifically, let’s discuss the case where \(x \in \: I\). In this case we know for sure that \(x\) is going to be part of the output vector. As the input is sorted we can use binary search to search for \(x\) in \(I\). Once we have identified the index \(j\) such that \(I_j = x\) we know that the closest element to \(x\) must either be at index \(j+1\) or \(j-1\) [^33]. Therefore once \(x\) has been identified we can select a range of \(k\) elements at \(j\). Said range can be found by using a two pointers technique. We start by initializing two pointers \(l = j\) and \(r = j\). Then until \(r-l+1 < k\) we do one of the following operations:

  • if \(l = 0\) then \(r = r+1\)

  • if \(r = |I|\) then \(l = l-1\)

  • if\(x-I_{l-1} > I_{r+1}-x\) then \(r = r+1\). The range is enlarged at its right end side.

  • symmetrically for the left side: if \(x-I_{l-1} > I_{r+1}-x\) then \(l + l+1\). The range is enlarged at its left end side.

In other words once \(x\) has been found, we incrementally include elements around it by always choosing between the closest numbers to \(x\) between the numbers pointed by the two pointers.

This approach can easily be extended to the case where \(x\) is not present in the input array as it also works when we try to build the range of elements to be returned around the closest element to \(x\) in the array. Turns out that binary search can be used to find such an element (we even have STL support for such an operation). In particular we can use it to identify the index of the first element that is larger or equal than \(x\): a value that is commonly known as lower bound. Armed with this information let’s have a look at Listing [list:find_k_closest_in_array3] showing the implementation of the idea above where we use the STL function to find the index \(o\) of the first element greater or equal than \(x\). We then compare such value with the value at index \(p = o-1\) (if it exists) and we promote \(o\) or \(p\) to be the index closest element to \(x\) in the array depending on their absolute difference to \(x\). The closest of the two to \(x\) is chosen to be the designated starting value for the algorithm described above.

Listing 3: Solution to the problem of finding the $k$ closest element using \inline{std::lower_bound}.

template <typename Iterator>
auto find_range(
    Iterator begin, Iterator end, Iterator closest, int k, const int x)
{
  assert(begin < end);
  assert(closest >= begin);
  assert(closest < end);

  auto l          = closest;
  auto r          = l + 1;
  auto difference = [](const auto x, const auto y) { return std::abs(x - y); };

  k--;  // closest is already included in the range
  while (k && l > begin && r < end)
  {
    if (difference(*(l - 1), x) <= difference(*r, x))
    {
      l--;
    }
    else
    {
      r++;
    }
    k--;
  }
  while (k && l > begin)
  {
    l--;
    k--;
  }
  while (k && r < end)
  {
    r++;
    k--;
  }
  assert(k == 0);

  return std::make_tuple(l, r);
}
std::vector<int> kth_closest_in_array_binary_search_lower_bound(
    const vector<int>& I, const int k, const int x)
{
  auto closest = std::lower_bound(begin(I), end(I), x);
  if (auto prec = std::prev(closest);
      closest != begin(I) && closest != end(I)
      && (std::abs(*closest - x) >= std::abs(*prec - x)))
  {
    closest = prec;
  }
  // if no element is larger than x
  // then the closest to it is the largest in I
  if (closest == end(I))
    closest = std::prev(end(I));

  auto [l, r] = find_range(begin(I), end(I), closest, k, x);
  return std::vector<int>(l, r);
}

For completeness, in Listing [list:find_k_closest_in_array:binary_lower_bound] we also show an implementation of version of . You might be asked to show you can code binary search.

Listing 4: Implementation of a function for the calculation of the \textit{lower_bound} that can be using in substitution of \inline{std::lower_bound} in Listing \ref{list:find_k_closest_in_array2}.

template <typename It, typename Target = typename It::value_type>
It my_lower_bound(const It& begin, const It& end, const Target& target)
{
  auto l = begin;
  auto r = end;
  while (l < r)
  {
    auto mid = l + std::distance(l, r) / 2;

    if (*mid < target)
    {
      l = mid + 1;
    }
    else
    {
      r = mid;
    }
  }
  return l;
}