Coding interview Essentials

43. Palindrome Partitioning

Palindrome Partitioning

Introduction

In this chapter we will investigate a problem involving strings. It features a short yet complex statement which requires some care to internalize and understand fully. This is another problem on palindromes where we are asked to calculate the cost of breaking down an input string into chunks such that each of the individual chunks is a palindrome.

Problem statement

Write a function that - given a string \(s\) - partitions it in such a way that every resulting substring is a palindrome. A partition for a string \(s\) is a collection of cut-points \(1 \leq c_0 < c_1 \ldots < c_k < |s|\) splitting the string \(s\) into \(k+1\) non empty substrings:

  • \(s(0 \ldots c_0)\)

  • \(s(c_0+1 \ldots c_1)\)

  • \(s(k-1 \ldots c_k)\)

The function should return the minimum number of cuts needed so that the resulting partition consists only of palindrome substrings.


Given s= the function returns \(1\). \(0\) cut-points are not enough as \(s\) itself is not a palindrome but with one cutpoint at index \(1\) we can obtain the following partitioning \(["aa","b"]\) where both \(aa\) and \(b\) are palindromes.


Given s= the function returns \(0\) because \(s\) is itself a palindrome.


Given s= the function returns \(3\). One possible partition that could be produced with \(3\) cuts is: \(["a","babbbab","b","ababa"]\).

Discussion

Brute-force

The obvious solution would be to try to all possible partitions of the input string, from the ones splitting it into \(1\) piece, then all the ones splitting it into \(2\) pieces, and so on in a similar fashion until we eventually find a partition that splits \(s\) into palindromes. Such a partition must exist as if we split \(s\) into \(|s|\) pieces, down to its individual characters, the resulting substrings of length one are all palindromes. This approach is basically the same adopted for the brute-force (see Section 42.3.1) solution of the problem discussed in Chapter 42 where the bulk of the complexity is in the generation of the partitions of incremental size. In order to do that, we could use the algorithm for the generation of all combinations of size \(k\) shown in Listing [list:min_difficulty_job_scheduler:combinations] to generate all possible cut-points and from there get the associated sub-strings. For each partition size \(l = 1,2,\ldots,|s|\) we can use Listing [list:min_difficulty_job_scheduler:combinations] to generate the combination of \(\{1,2,\ldots,|s|-1\}\) in groups of size \(l\) and for each of them evaluate whether the resulting substrings are all palindromes. We can return \(l\) as soon as we find a combination which does.

Listing [list:min_difficulty_job_scheduler:combinations] shows an implementation of this idea which has a time and space complexity of \(O(2^{|s|}\). The work done is the sum of all the work necessary to generate the combinations of sizes \(1,2,\ldots,|s-1|\) i.e. \(\sum_{k=1}^{|s|-1} {|s| \choose k} = 2^n\). The union of all combinations of size \(k=1,2,\ldots,|s|\) is equivalent to the power-set (see Section 1 at page ) which has size \(2^n\).

Listing 1: Exponential time solution to the palindrome partition problem using Listing \ref{list:min_difficulty_job_scheduler:combinations} at page as a sub-routine for the generation of the combinations of size $k$.

#include "generate_combinations.h"

/**
 * Returns
 *   true iff the substring of s starting at start and ending at end is
 * palindrome false otherwise.
 * */
bool is_palindrome(const std::string& s, const size_t start, const size_t end)
{
  assert(start <= end);
  assert(end < s.length());

  auto l = start, r = end;
  while (l < r)
    if (s[l++] != s[r--])
      return false;
  return true;
}

int palindrome_partitioning2_bruteforce(const std::string s)
{
  if (is_palindrome(s, 0, s.size() - 1))
    return 0;

  for (int k = 1; k <= std::ssize(s) - 1; k++)
  {
    // generate combinations in groups of k from [0...s.size()-2]
    const auto& cutpoints = all_combinations(k, s.size());
    // is there a partition of size k such that all the associated substrings in
    // s are palindrome?
    const auto found = std::any_of(
        std::begin(cutpoints), std::end(cutpoints), [&](const auto& combo) {
          auto substring_start = 0;
          for (const auto substring_end : combo)
          {
            if (!is_palindrome(s, substring_start, substring_end))
              return false;
            substring_start = substring_end + 1;
          }
          return is_palindrome(s, substring_start, s.size() - 1);
        });
    if (found)
      return k;
  }
  return s.size() - 1;
}

Dynamic Programming

This problem has however a solution that is much better than exponential time. As the title of this section suggests, we can use DP to effectively tackle it. The key insight that allows us to develop an effective DP solution is to think about how we can break down the problem into subproblems on a suffix of \(s\). What I mean is that we can think about a function \(P(s, i)\) which returns the answer to the problem for the substring of \(s\) from the character at index \(i\) to its end. Clearly \(P(s,0)\) is the answer to the general question. However given this formulation we can express the solution of a subproblem for a starting index \(i\) in terms of optimal solutions for the smaller sub-problems at indices \(j>i\). Equation [eq:palindrome_partitioning2:dpformula] captures this idea in a recurrence relation. It states that the answer to a subproblem for an empty substring of \(s\) is zero: no cuts are necessary as an empty string is already palindrome. If the whole string from index \(i\) to the end is a palindrome, then no cuts are necessary. For any other sub-problems \(P(s,i)\) what we can do is to make a cut at an index \(k\geq i\) provided that the substring of \(s\) from index \(i\) to \(k\) resulting from the cut is a palindrome and then add to it the answer to the subproblem \(P(s,k+1)\) which gives the optimal solution for the unprocessed part of \(s\): from index \(k+1\) to the end.

\[P(s, i) = \begin{cases} 0 \; \; \text{ if } i \geq |s| \\ 0 \; \; \text{ if } s[i\ldots |s|-1] \:\: \text{is palindrome} \\ \min_{k\geq |s|} \big( 1 + P(s,k+1) \big) \: \: \text{if} \: \: s[i\ldots k] \:\: \text{is palindrome} \end{cases} \label{eq:palindrome_partitioning2:dpformula}\]

Top-down

The solution outlined in Section 49.3 and formalized in Equation [eq:palindrome_partitioning2:dpformula] can be easily translated into a recursive solution as shown in Listing [list:palindrome_partitioning2:topdown]. The recursive function has an almost 1-to-1 mapping to the Equation [eq:palindrome_partitioning2:dpformula] except for the code responsible for the memoization optimization which allows the answer for a given subproblem that has been previously solved to be returned immediately. We have used this optimization already in other problems such as:

in Section 45 at page or

in Section 42 at page

.

Listing 2: Quadratic time dynamic programming top-down solution to the palindrome partition problem.

using Cache = std::unordered_map<int, int>;

int palindrome_partitioning2_DP_topdown_helper(const std::string s,
                                               const int start_idx,
                                               Cache& memoization_cache)
{
  if (start_idx >= std::ssize(s)
      || is_palindrome(s, start_idx, std::ssize(s) - 1))
    return 0;

  if (memoization_cache.contains(start_idx))
    return memoization_cache[start_idx];

  int ans = std::numeric_limits<int>::max();
  for (int i = start_idx; i < std::ssize(s); i++)
  {
    if (is_palindrome(s, start_idx, i))
      ans = std::min(ans,
                     1
                         + palindrome_partitioning2_DP_topdown_helper(
                             s, i + 1, memoization_cache));
  }

  assert(ans <= std::ssize(s) - start_idx);
  memoization_cache[start_idx] = ans;
  return ans;
}

size_t palindrome_partitioning2_DP_topdown(const std::string s)
{
  Cache memoization_cache;
  return palindrome_partitioning2_DP_topdown_helper(s, 0, memoization_cache);
}

The complexity of this solution is \(O(|s|^3)\) because each of the \(O(|s|)\) distinct (for which we might have to execute the whole function code) calls to and each performs \(O(|s|^2)\) work: the for loops runs \(O(|s|)\) times, and the function has a complexity of \(O(|s|)\)

Top-down improved

The complexity of code in Listing [list:palindrome_partitioning2:topdown] can be lowered to \(O(|s|^2)\) if we are able to answer the question about whether a given substring of \(s\) from index \(i\) to \(j>i\) is a palindrome or not. The current implementation blindly processes the whole substring to find that information. What we can do instead is to use DP again to build a table \(B\) where each of its elements \(B[i][j]\) contains the information about whether the substring of \(s\) from index \(i\) to \(j>i\) is a palindrome. The key idea here is that we can build such a table in \(O(|s|^2)\) time and store it using \(O(|s|^2)\) space.

A palindrome is a word having the same first and last character, for which the substring obtained by removing the first and the last character is itself a palindrome. We are going to use this property to build \(B\) which is reflected by the fact that an entry of \(B\), \(B[i][j]\) contains true if and only if \(s[i]=s[j]\) and \(B[i+1][j-1]\) is true. There are certain cells of \(B\) that we can fill immediately: for instance all the cells where \(i=j\) can be set to true as those map to sub-strings of \(s\) of length \(1\) which are palindromes by definition. We can fill the table by using a recursive function and memoization as shown in Listing [list:palindrome_partitioning2:recursivecache]. This shows an implementation of a class which will provide the same information in the table \(B\) but wrapped in a class with a simple API: a constructor taking a as input and a function for answering queries on sub-strings of \(s\). Note that the constructor will immediately call the function which will fill the table \(B\) fully before any call to . With minimal change we can also make the class , by removing the function completely via implementing in terms of . The file"hash_pair.h"[^43] contains some code whose only purpose is to allows us to use as keys in the . The same functionality can be also implemented iteratively by using a bottom-up DP strategy. Listing [list:palindrome_partitioning2:iterativecache] illustrates how this can be done.

Listing 3: Recursive implementation of a class which allows to answer queries about whether a given substring of a given string is palindrome or not in constant time.

#include "hash_pair.h"

class PalindromeSubstringCacheRecursive
{
  public:
    PalindromeSubstringCacheRecursive(const std::string& s) :mStr_size(s.size()) {
      buildMap(s);
    }

    [[nodiscard]] bool is_palindrome(const size_t start, const size_t end) const
    {
      const std::pair<size_t,size_t> p(start, end);
      return mB.contains(p) &&  mB.at(p);
    }

    [[nodiscard]] size_t size()const {
      return mStr_size;
    }

  private:
    bool is_palindrome_substring_helper(const string&s, const size_t start, const size_t end)
    {
        if(start > end || start >= s.size() || end < 0)
            return true;

        const std::pair<int,int> p(start, end);
        if(mB.contains(p))
            return mB.at(p);
        
        const bool ans = (start == end) || (
            (s[start]==s[end]) && 
                is_palindrome_substring_helper(s,start+1, end-1)
                );
        mB.insert({p, ans});
        return ans;
    }

    void buildMap(const std::string&s)
    {
        for(size_t i = 0 ; i < std::size(s) ; i++)
        {
            for (size_t j = i; j < std::size(s); j++)
            {
                mB[{i,j}] = is_palindrome_substring_helper(s,i,j);
            }
        }
    }

  std::unordered_map<std::pair<int,int>, bool, PairHasher> mB;
  const size_t mStr_size;

};

Listing 4: Iterative implementation of a class which allows to answer queries about whether a given substring of a given string is palindrome or not in constant time.

class PalindromeSubstringCacheIterative
{
  public:
    PalindromeSubstringCacheIterative(const std::string& s) :mStr_size(s.size()) {
      buildMap(s);
    }

    [[nodiscard]] bool is_palindrome(const size_t start, const size_t end) const
    {

      return start < mStr_size && end>=0 && 
          mB[start][end]!=-1 &&  mB[start][end];
    }

    [[nodiscard]] size_t size() const {
      return mStr_size;
    }

  private:

    void buildMap(const std::string&s)
    {
        mB.resize(mStr_size, std::vector<int>(mStr_size,-1));
        for(int i = mStr_size-1; i >=0 ; i--)
        {
            for(int j = i ; j < mStr_size ; j++ )
            {
                mB[i][j]=(s[i]==s[j]) && ((j-i<=2) || mB[i+1][j-1]);
                //mB[{i,j}]=s[i]==s[j] && ((j-i<=2) || mB[{i+1,j-1}]);
            }
        }
    }

  std::vector<vector<int>> mB;
  const size_t mStr_size;

};

Finally in Listing [list:palindrome_partitioning2;topdownquadratic] we can see how such a Substring Palindrome Cache can be used in order to implement a quadratic solution for the problem. Notice that the code for is almost identical to the one of in Listing [list:palindrome_partitioning2:topdown] with the difference being that the former takes \(B\), the substring palindrome cache, as an additional parameter and that the call to is substituted with a query into \(B\) which runs in constant time. The complexity of this solution is now \(O(|s|^2)\) which is a big improvement from \(O(|s|^3)\). The space complexity is also \(O(|s|^2)\), because of the space used by \(B\).

~~~{#lst:palindrome_partitioning2;topdownquadratic .cpp caption=“Quadratic time dynamic programming bottom-up solution to the palindrome partition problem.”} #include “PalindromeSubstringCacheRecursive.h”

size_t palindrome_partitioning2_DP_topdown_optimized_helper( const PalindromeSubstringCacheRecursive& B, const size_t start_idx, Cache& memoization_cache) { const auto size = B.size(); if (start_idx >= size || B.is_palindrome(start_idx, size - 1)) return 0;

if (memoization_cache.contains(start_idx)) return memoization_cache[start_idx];

// cout<<start_idx<<std::endl; size_t ans = std::numeric_limits::max(); for (size_t i = start_idx; i < size; i++) { if (B.is_palindrome(start_idx, i)) // O(1) ans = std::min(ans, 1 + palindrome_partitioning2_DP_topdown_optimized_helper( B, i + 1, memoization_cache)); }

assert(ans <= size - start_idx); memoization_cache[start_idx] = ans; return ans; }

size_t palindrome_partitioning2_DP_topdown_optimized(const std::string s) { PalindromeSubstringCacheRecursive B(s); Cache memoization_cache; return palindrome_partitioning2_DP_topdown_optimized_helper( B, 0u, memoization_cache); } ~~~

Bottom-up

In this section we will discuss how we can implement the DP approach shown in Section 49.3 in a bottom-up fashion.

The idea is that we can start processing progressively longer portions of \(s\) starting from the last character at index \(|s|-1\). Each of these portions, starting at index \(i\) to the end of \(s\), correspond to a sub-problem that can be uniquely identified by its starting index \(i\). For instance subproblem where \(i=3\) corresponds to a substring of \(s\) from index \(3\) to its end. When solving a given sub-problem \(i\), we will use the information about sub-problems related to smaller portions of \(s\) starting at higher indices \(j > i\) to determine the minimum number of cuts necessary to split the substring \(s[i \ldots |s|-1]\) into several palindromes.

The substring of \(s\) starting at index \(|s|-1\) has size \(1\) and therefore is already a palindrome and does not need any cuts. For any other substring starting at index \(i = |s|-1-k\) where \(k \geq 0\), we have two options:

  1. if \(s[i \ldots |s|-1]\) is already a palindrome, then we know this subproblem has a solution equal to \(0\). No cuts are necessary.

  2. otherwise, we can try to split \(s[i \ldots |s|-1]\) at index \(i+1 \leq j \leq |s|-1\). If \(s[i \ldots j]\) is a palindrome, then we know we can turn \(s[i \ldots |s|-1]\) into a partition of palindromes by using one cut (the one we just performed at index \(j\)) plus all the cuts necessary to turn the substring \(s[j \ldots |s|-1]\) into a partition of palindromes. The crucial point is that we have already solved the sub-problem \(j > i\) and therefore we can reuse its solution. The final answer for this sub-problem starting at index \(i\) is the smallest value we can obtain among all the cuts (at index \(j > i\)) we can make.

The sub-problem at index \(0\) contains the answer for the entire problem i.e. the smallest size of a palindrome partition of \(s\) starting at index \(0\). Listing [list:palindrome_partitioning2:bottomup] implements this idea where we use an array having size \(|s|+1\) to store the answers to all sub-problems.

Listing 5: Quadratic time dynamic programming bottom-up solution to the palindrome partition problem.


int palindrome_partitioning2_DP_bottomup(const std::string s)
{
  std::vector<int> DP(s.size() + 1, s.size() - 1);
  DP[s.size() - 1] = DP[s.size()] = 0;
  for (int i = std::ssize(s) - 2; i >= 0; i--)
  {
    if (is_palindrome(s, i, s.size() - 1))
    {
      DP[i] = 0;
      continue;
    }
    for (int j = i; j < std::ssize(s); j++)
    {
      if (is_palindrome(s, i, j))
      {
        const auto cost_partition_rest = j < std::ssize(s) - 1 ? DP[j + 1] : 0;
        DP[i] = std::min(DP[i], 1 + cost_partition_rest);
      }
    }
  }
  return DP[0];
}