Coding interview Essentials

42. Sort the chunks, sort the array.

Sort the chunks, sort the array.

Introduction

Sorting is a popular topic in computer science and programming interviews. Its usefulness is beyond dispute and there are countless research papers and algorithms devoted to the topic.

In this problem however, we are not going to devise a novel sorting algorithm. Instead we will investigate how we can sort an entire array by sorting a number of its sub-arrays. The idea is that we want to be able to split an array into pieces such that if each of the pieces is sorted individually then the final result is equivalent to having sorted the entire array.

It is necessary to uncover a key insights to solve this problem efficiently. As such, asking the right questions and looking at a certain number of good examples is fundamental. In the next section we will explore how these insights can be gained and then turned into efficient code.

Problem statement

[example:max_num_chunks_sorted:exercice1] Write a function that - given an array \(I\) of integers - returns the maximum number of sub-arrays (or chunks) of \(I\) such that , if each of the sub-array is sorted individually, then \(I\) as a whole is sorted.

[example:max_num_chunks_sorted:example1]
Given \(I=\{45,88,1,9,90\}\) then the function return \(1\).

[example:max_num_chunks_sorted:example2]
Given \(I=\{4,3,2,1,5,9,10\}\) then the function return \(4\). We can sort the following sub-arrays:

\([0,3]\)

\([4,4]\)

\([4,4]\)

\([4,4]\)

Clarification Questions

Can the chunks overlap?

No. If you choose to sort two sub-arrays of \(I\) \(s_1=[p,q], p\leq q\) and \(s_2=[x,y], x\leq y\) then either \(x > q\) or \(p>y\).

Discussion

Brute-force

Let’s start our discussion by considering a brute-force solution. One possible approach would be to try to divide the array into \(|I|\) non-empty and non-overlapping parts (only one way of performing such division), sort them individually and then check if \(I\) is sorted. If it is not, then we can try to divide \(I\) into \(|I|-1\) sub-arrays, and check whether by sorting the resulting individual pieces \(I\) turns out to be sorted. This line of reasoning can be generalized producing a general brute-force approach that works by progressively trying to split \(I\) into less and less numbers of sub-arrays \(k <|I|\). For each of the possible valid divisions of \(I\) into \(k\) sub-arrays, we can then check whether we can obtain a complete sorting of \(I\) by only sorting the individual \(k\) sub-arrays. Eventually when \(k=1\), \(I\) would be fully sorted as this is equivalent to sorting \(I\) entirely.

Clearly this algorithm is complete and correct as all possible valid partitions of \(I\) are checked. Its complexity is however exponential in time as, given a certain \(k\), there are \({n \choose k}\) possible ways we can divide \(I\) into \(k\) sub-arrays. Listing [list:max_num_chunks_sorted:bruteforce] shows a C++ implementation of such idea.

Listing 1: Bruteforce solution to the problem \textit{Sort the chunks, sort the array}.

void sort_subarrays(auto begin, auto end, std::vector<int>& subarrays)
{
  auto it = begin;
  for (const auto n : subarrays)
  {
    auto itn = begin + n + 1;
    std::sort(it, itn);
    it = itn;
  }
  std::sort(it, end);
}

bool check_all_k_combinations(const std::vector<int>& I,
                              const int offset,
                              const int k,
                              std::vector<int> combination)
{
  if (k == 0)
  {
    auto I_copy(I);
    sort_subarrays(std::begin(I_copy), std::end(I_copy), combination);
    return std::ranges::is_sorted(I_copy);
  }

  for (int i = offset; i < I.size() - k; ++i)
  {
    combination.push_back(i);
    if (check_all_k_combinations(I, i + 1, k - 1, combination))
      return true;
    combination.pop_back();
  }
  return false;
}

int max_chunks_to_sorted_bruteforce(const std::vector<int>& I)
{
  for (int k = std::ssize(I); k >= 2; k--)
  {
    std::vector<int> splitting_points{};
    if (check_all_k_combinations(I, 0, k - 1, splitting_points))
    {
      return k;
    }
  }
  return 1;
}

Linear time

This problem can be sorted in linear time if we consider that if we sort a sub-array of \(I\) containing elements from index \(s\) to \(r< |I|-1\) (there is at least an element after this sub-array) then \(I\) cannot be fully sorted if the maximum element among the elements \(I_s, I_{s+1},\ldots, I_r\) is smaller than any of the elements of \(I\) after \(r\) (or smaller than the smallest among those elements after \(r\)). For instance, imagine \(I\) is the input array of the example [example:max_num_chunks_sorted:example2]. If we choose \(s=0\) and \(r=2\) then \(I\) can never be properly sorted, no matter how we divide the elements after the one at index \(r=2\) into subarrays because the element at index \(3\) will always appear after the value \(4\) (see Figure 48.1).

Figure 1: Input array from the Example [example:max_num_chunks_sorted:example1]. If the sub-array identified by s and r is sorted in isolation, I as a whole cannot be sorted (no matter how elements from r to |I|-1 are sorted) because M will always appear before m, despite the fact M>m and it should appear after.

. [fig:max_num_chunks_sorted:example1]

This insight makes it possible to derive a greedy approach to this problem that is based on the idea that we are going to split \(I\) into as many pieces as possible, such that the largest element of a sub-array is smaller than all the subsequent elements.

Listing [list:max_num_chunks_sorted:linear] shows an implementation of this that works by keeping a sorted list \(N\) of all the elements not yet processed in \(I\). \(N\) initially contains all the values in \(I\). We start a new chunk at index \(0\) and we keep including elements into this chunk until the largest of its elements is still larger than the smallest element in \(N\). When an element is included in the chunk then it is removed from \(N\). If the largest element in the current chunk is indeed smaller than the smallest in \(N\), then this signals the fact we can sort this chunk independently without causing \(I\) as a whole not to be sorted (all the elements of this chunk appear before the rest of the elements when \(I\) is sorted). At this point, we can start a new chunk and repeat the process until we are left with no element to process (or equivalently \(N\) is empty). Listing [list:max_num_chunks_sorted:linear] has a complexity of \(O(|I|)\) for both space and time.

Listing 2: Linear time solution to the problem \textit{Sort the chunks, sort the array}.

int max_chunks_to_sorted_lineartime(std::vector<int>& arr)
{
  constexpr int INF = std::numeric_limits<int>::min();

  if (arr.size() <= 0)
    return 0;

  std::set<int> N(std::begin(arr), std::end(arr));
  int ans      = 0;
  int curr_max = INF;
  for (int i = 0; i < std::ssize(arr); i++)
  {
    N.erase(arr[i]);
    const int new_max               = std::max(curr_max, arr[i]);
    const auto& smallest_among_rest = *(N.begin());
    if (N.size() > 0 && new_max >= smallest_among_rest)
    {
      curr_max = new_max;
    }
    else
    {
      ans++;
      curr_max = INF;
    }
  }
  return ans;
}