Coding interview Essentials

25. Validate Parenthesized String

Validate Parenthesized String

Introduction

Analyzing strings is an important operation for computer languges and it lies at the heart of programming languages. For example a calculator would look at an input such as 33 * 4 + (125̂ - 22*sqrt(2)) and before proceeding in performing the calculation would check that the input string forms an allowable expression.

In the problem discussed in this chapter we will study how we can write an efficient parser for a simple grammar on an alphabet consisting only three charaters describing a particular kind of well parenthesized strings. What is cool about this probelem is that the techniques as well as the structure of the solutions presented here can be adapted and exploited for other string analysis problems.

Problem statement

Given a string s containing only three types of characters:

  1. (

  2. )

  3. *

write a function to check whether a string is valid. A string is valid if the following holds:

  • Any left parenthesis ( must have a corresponding right parenthesis ).

  • Any right parenthesis ) must have a corresponding left parenthesis ).

  • Left parenthesis ( must appear before the corresponding right parenthesis ).

  • The character * could be treated as a jolly, and can be modified into a single right parenthesis ) or a single left parenthesis ) or deleted.


Given the input string s= the function returns true because it is possible to obtain from s the string (()) by deleting the first * and by turning the second one into a left parenthesis ).


Given the input string s= the function returns false because no matter how the are arranged there is no way to obtain a well balanced string of parenthesis.

Clarification Questions

Is an empty string considered valid?

An empty string is also valid.

Discussion

This is an extremely interesting and challenging problem that can be solved in several ways. We start with the brute-force solution in Section 29.3.1 from which we can develop a more effective dynamic programming solution. Section 29.3.3 examines a different solution based on a greedy technique that can dramatically improve the time and space complexity compared to the previous approaches. Section [valid_parenthesis:sec:twostacks] presents a linear time and space clever solution based on stacks.

The solution shown in Section 29.3.3 is likely to most effective and therefore we advise to us this as as the reference point during an actual interview.

Brute-force

If the input string does not contains wild-cards, this problem is quite simple and is easily solvable using a stack. When wild-cards are present things are complicated because now for each of them there are three options. In the brute-force approach we will try all possible options for all wild-cards. The idea is that the input string \(s\) is traversed from left to right. As we traverse the string we will keep track of how many open and closed parenthesis we have encountered. We do this because if at any moment we find that the number of closed parenthesis is greater than the number of open ones, the string is invalid (it violates the constraint that any left parenthesis should appear before any right one). Depending on the character \(c\) we are processing:

  1. If \(c\)is a ( then we increase the number of open parenthesis found so far and we recursively check the rest of the string.

  2. Similarly, if \(c\) is a ) then we increase the number of closed parenthesis and proceed checking the rest of the string.

  3. If the current character is a * then we have the option to:

    • consider it as an open parenthesis

    • consider it as a closed parenthesis

    • ignore it

The recursion terminates when either:

  • the number of closed parenthesis is larger than the number of open ones

  • we have processed the whole string. In this case we return true only if the number of open parenthesis so far is equal to the closed ones (a necessary condition for a well balanced string).

Listing [list:valid_parenthesis:bruteforce] shows a possible recursive implementation of the idea above. The complexity of this approach is exponential in the number of *, i.e. \(O(3^{n})\), where \(n\) is the length of \(s\).

Listing 1: Brute-force, exponential time solution to the problem of validating a string of parenthesis with wild-cards.

bool validate_parenthesized_string_bruteforce_helper(std::string s,
                                                     const size_t pos,
                                                     const int open,
                                                     const int closed)
{
  if (pos == s.size())
    return open == closed;

  if (closed > open)
    return false;

  const char curr = s[pos];
  bool ans        = false;
  if (curr != '{')  // either } or *: add a right parenthesis
    ans = validate_parenthesized_string_bruteforce_helper(
        s, pos + 1, open, closed + 1);

  if (curr != '}' && !ans)  // either {} or *: add a left parenthesis
    ans = validate_parenthesized_string_bruteforce_helper(
        s, pos + 1, open + 1, closed);

  if (curr == '*' && !ans)  // if neither { nor } worked, then ignore this *
    ans = validate_parenthesized_string_bruteforce_helper(
        s, pos + 1, open, closed);

  return ans;
}

bool validate_parenthesized_string_bruteforce(std::string s)
{
  return validate_parenthesized_string_bruteforce_helper(s, 0, 0, 0);
}

Dynamic Programming

Another way of solving this problem is to still try all possibilities as in the brute-force solution, but to streamline that process by ensuring no work is done more than once. In a string of length \(n\) there are \(O(n^2)\) possible substrings. Given a substring starting at \(i\) and ending at \(j\), from now on identified by \(s(i,j)\) we can solve this problem by processing one character \(c\) at the time. A substring \(s(i,j)\) is valid when

  • if \(c\) is an * and \(s(i+1,j)\) is valid. We try to ignore the character \(c\).

  • if \(c\) is either * or (, then we search for a character \(k\) in \(s(i+1,k)\) s.t. it can be turned into a closing parenthesis. If \(k\) exists (in case multiple \(k\) exists, then we try all of them) then \(s(i,j)\) is valid if \(s(i+1,k-1)\) and \(s(k+1,j)\) are valid. What we are doing here is matching an open parenthesis with a closing one that appears further in the range. Remember that each open parenthesis must be paired up with a closing one.

  • if \(c\) ) we return false because in this case we have an unmatched closing parenthesis.

The result for a substring \(s(i,j)\) is saved in a map, and when the algorithm asks for the validation of the same substring again, the value returned in the map is returned instead of doing the computation again. This technique is called memoization, and allows us to avoid the repeated re-computation of the same subproblem. The approach described in this section can be implemented as shown in Listing [list:valid_parenthesis:dp] and it has a time complexity of \(O(n^3)\). There are \(O(n^2)\) possible substrings and for each of them a work proportional to \(n\) is performed. The space complexity is bound by the amount of substrings that we can potentially store in the Hash-set i.e. \(O(n^2)\).

Note that this solution is, at it’s core, just another way of solving this problem by brute-force. We just use Dynamic Programming techniques to speed it up by remembering the result of intermediate subproblems (in this case the substring of the input string \(s\)) thereby avoiding repetitive work. As such, although the DP solution is definitely better than simple brute-force, it is still far from optimal. In the next section we will investigate a much faster solution that has the additional benefit of being also much shorter in length and therefore less error prone than the ones presented so far.

Listing 2: Dynamic programming solution to the problem of validating a string of parenthesis with wild-cards.

// a pair of indices identifying a substring
using pii = std::pair<int, int>;

struct pair_hash
{
  template <class T1, class T2>
  std::size_t operator()(const std::pair<T1, T2>& pair) const
  {
    return std::hash<T1>()(pair.first) ^ std::hash<T2>()(pair.second);
  }
};

bool validate_parenthesized_string_DP_helper(
    const std::string& s,
    std::unordered_map<pii, bool, pair_hash>& DP,
    const pii& substr)
{
  const auto [i, j] = substr;
  if (i > j)
  {
    // empty string is valid
    return true;
  }

  if (DP.find(substr) != DP.end())
    return DP[substr];

  bool ans     = false;
  const char c = s[i];

  if (c == ')')
    ans = false;

  if (!ans && c == '*')  // try ignoring this character
    ans = validate_parenthesized_string_DP_helper(s, DP, {i + 1, j});

  if (!ans && c != ')')  // either * or open brackets. Try turning it into a (
  {
    // find a something that can be turned into a ) further ahead in the
    // string
    for (int k = i + 1; !ans && k <= j; k++)
    {
      if (s[k] == ')' || s[k] == '*')
      {
        // validate the two resulting substring from pairing char i and k
        ans = validate_parenthesized_string_DP_helper(s, DP, {i + 1, k - 1})
              && (validate_parenthesized_string_DP_helper(s, DP, {k + 1, j}));
      }
    }
  }

  DP[substr] = ans;
  return ans;
}

bool validate_parenthesized_string_DP(const std::string& s)
{
  std::unordered_map<pii, bool, pair_hash> DP;
  const int size = s.size() - 1;
  return validate_parenthesized_string_DP_helper(s, DP, {0, size});
}

Also note that the structure is necessary so that the knows how to correctly calculate a hash value for s.

Greedy - Linear time

The two previous approaches do much more than the problem asks for. They not only determine whether the input string is valid or not; they also calculate one. They solve the problem by manufacturing a valid string by trying out all possibilities until either one or none of them is good. Are more efficient approach would determine whether the input string can be turned into a good one without having to actually come up with a specific valid string obtainable from the input. This is the rationale behind the solution presented in this section.

The main idea is that for a valid string it has to be true that the balance between open \(o\) and closed \(c\) parenthesis is perfect i.e. \(o-c=0\) has to hold. \(o-c=0\) represents the number of open unmatched open parenthesis. Because the can be either a or a we cannot simply loop through the string and count the number of open and closed parenthesis. What we can do is to store all possible values for \(o-c\) parenthesis that can be obtained. As we will see, these values change in a fairly predictably way depending on what character we process. For instance given the input string when processing the \(i^{th}\) character, the number of \(c-o\) possible values \(P\) can be:

  1. \(P=\{1\}\)

  2. \(P=\{0,1,2\}\) because considering only the first two character of \(s\) we can obtain

    • by turning the into a .

    • by deleting the .

    • by turning the into a .

  3. \(P=\{1,2,3\}\) because considering only the first three character of \(s\) we can obtain

    • by turning the into a

    • by deleting the .

    • by turning the into a

  4. \(P=\{0,1,2\}\) because considering only the first four character of \(s\) we can obtain

    • by turning the into a

    • by deleting the

    • by turning the into a .

  5. \(P=\{-1,0,1,2,3\}\) because considering only the first four character of \(s\) we can obtain

    • by turning the first into a and deleting the second one.

    • by deleting both the

    • by turning both the into a .

    • finally we can obtain \(-1\) with this string by turning both the into a .

Note that the values in the list \(P\) can be obtained with different combinations of substitutions and that \(P\) is always made of a contiguous element. This last piece of information is important because it allows us to describe \(P\) by only using its maximum and minimum value. We are interested in seeing whether at the end of the process we can obtain a value of \(0\) meaning that the string is balanced. If the maximum value at any point goes under \(0\) it means that we reached a place where we have an excess of closed parenthesis that we cannot fix using all the asterisks we encountered so far. For instance consider the string . When processing the element number the max and min values for the difference between the open and closed parenthesis will be:

  1. \((1,1)\)

  2. \((0,2)\)

  3. \((-1,1)\)

  4. \((-2,0)\)

  5. \((-2,-1)\)

When we reach the character the maximum number of open parenthesis we can obtain in the best case is \(-1\), meaning that we are short by one for a balanced string. The string is, therefore, invalid because there is an excess of closed parenthesis. This is a violation of the rule stating that every closed parenthesis must have a proceeding open one.

We can use the idea described above to derive an algorithm that works as follows: Let , respectively be the smallest and largest possible number of open left brackets after processing the \(i^{th}\) character in the string \(s\).

If we encounter:

  • a left parenthesis , then we can increment both and .

  • similarly, a right parenthesis , then we can decrement both and .

  • an asterisk we can choose to either consider this as an open parenthesis (increasing the max number of obtainable open ones), delete it (leaving the balance unvaried) or convert it into a closed parenthesis (reducing )

If at any point the maximum number of open parenthesis falls under \(0\) then we are forced to consider the string invalid, for the reason we pointed out earlier (at least an unmatched closed parenthesis). Similarly we need to make sure that does not go below zero, because we do not need to consider strings which have this count of open parenthesis as they would invalid for the same reason as above.

When all the characters are processed, the only thing that is left to check is that \(0\) is contained in the range defined by and . If it is, then there is a way to turn the string into a valid one, otherwise it is impossible (imagine the case where meaning that no matter what we do, the minimum amount of open parenthesis we end up with is still more than one meaning that there is at least one unmatched open parenthesis).

Listing [list:valid_parenthesis:linear] shows a possible implementation of this idea. Note how every decrement of is guarded by a check, so to avoid that it goes below \(0\).

Listing 3: Linear time constant space solution to the problem of validating a string of parenthesis with wild-cards.

bool validate_parenthesized_string_linear(std::string s)
{
  int min, max;
  min = max = 0;
  for (const char c : s)
  {
    if (c == '(')
    {
      min++;
      max++;
    }
    if (c == ')')
    {
      if (min > 0)
        min--;
      max--;
    }
    if (c == '*')
    {
      if (min > 0)
        min--;
      max++;
    }
    if (max < 0)
      return false;
  }
  return min == 0;
}