Coding interview Essentials

48. Decode the message

Decode the message

Introduction

The problem in this chapter resembles the one for decoding a string encoded with the famous run-length encoding method(RLE). RLE is a simple form of data compression in which a stream of data is given (e.g. ) and the output is a sequence of counts of consecutive data values in a row. (e.g. ). It is a type of lossless encoding meaning that the encoding process does not lose any information in the original input and therefore the input data can be recovered fully and integrally decoded.

This chapter will deal with a similar algorithm where we will be asked to write a function capable of decoding a string encoded with a run-length-encoding-like algorithm. More than complicated insights, string manipulation skills and attention to details of the implementation and for corner cases are going to be crucial in order to solve this problem during an actual interview.

Problem statement

[example:decode_string:exercice1] Write a function that given an encoded string \(s\) decodes it. \(s\) is of the form: \({k_1[d_1]k_2[d_2[ \ldots]}\) where \(k\) is a positive integer and \(s\) is another encoded string. The decoded version of \(s\) is obtained by appending \(d_1\) \(k_1\) followed by repeating \(d_2\) \(k_2\) times.

[example:decode_string:example1]
Given the function returns .

[example:decode_string:example2]
Given the function returns .


Given the function returns .

[ex:decode_string:example3]

Clarification Questions

Is it guaranteed that \(s\) is always valid?

Yes, \(s\) contains only lower-case letters from the English alphabet, numbers and square brackets and it is a valid encoded string.

Is there an upper-bound on the size of the encoded-substrings (the \(k\)s in the problem statement)?

Not really, you are only guaranteed their value to fit into a built-in .

Discussion

Recursive solution

The first thing to note about this problem is that the encoded string has a recursive definition. Whenever we encounter a number \(k\) followed by the closed square bracket character we know that we have to decode whatever is inside the brackets and replicate it \(k\) times. We can use this fact to simply create a recursive algorithm which follows this definition. The real challenge of this problem actually lies in the implementation more than in the algorithm itself and in our opinion specifically in the correct parsing of the string.

The idea is to construct the final answer by looking at one character of \(s\) at a time. We can start from the char at index \(0\) and depending on what it is:

  1. append it to the answer (when \(s\) an alphabetic letter);

  2. parse it as a part of a number (when \(s\) is a digit);

  3. recursively decode the rest of the string (when \(s\) is an open square bracket ).

For instance, let’s assume we have to decode . We start by reading ’x’ and ’y’ which are letters and therefore are just appended to the final answer. We then see a digit which tells us that a number has started. We parse all of its digits into the integer \(242\) (which assumes we need to perform a conversion from string to int; see Chapter 7 at page where we delved into it for a refresher). The end of the number is signaled by the presence of the char which also signals that a new encoded substring is starting. So when we see an open square bracket character we recursively call the decode function so that it returns the expansion of whatever is within the brackets. When the recursive call ends (when we find a closed square bracket character ) we are left with the expanded string which we can then replicate \(242\) times and append to the final answer. When a recursive call ends the caller must continue processing the elements of \(s\) from the last unprocessed character. We keep track of the next element to be processed via a integer variable which is passed along to each of the recursive calls. This is necessary because after the recursive call returns we might need to continue processing more characters. Regarding the example above, when the recursive call associated with the substring returns we still have to process for the encoded substring substring .

Listing [list:decode_string:recursive] shows a possible implementation of this idea.

Listing 1: Recursive implementation of the algorithm described in Section \ref{decode_string:sec:recursive}

std::string decode_string_recursive_helper(const std::string& s, std::size_t& i)
{
  const auto size = s.size();
  std::string ans;
  int multiplier = 0;
  while (i < size)
  {
    const auto curr_char = s[i];
    if (std::isdigit(s[i]))
    {
      // parse the whole number
      while (i < size && std::isdigit(s[i]))
      {
        multiplier *= 10;
        multiplier += s[i] - '0';
        i++;
      }
    }
    else if (s[i] == '[')
    {
      const std::string nested = decode_string_recursive_helper(s, ++i);
      for (int k = 0; k < multiplier; k++)
        ans += nested;
      // no increment of i here.
    }
    else if (s[i] == ']')
    {
      i++;
      break;
    }
    else
    {
      ans += s[i];
      i++;
    }
  }
  return ans;
}

std::string decode_string_recursive(const std::string& s)
{
  std::size_t pos = 0;
  return decode_string_recursive_helper(s, pos);
}

Notice that the function takes the second parameter as a reference to an integer. As mentioned above already, we use a reference because we want this number to be updated by each of the recursive calls and this way we do not lose track of what portion of the input is still to be processed. The variable \(i\) to keep track of the current character we are examining in \(s\). Figure 54.1 graphically depicts and describes the execution of this algorithm on the input string .

The time and complexities of the code in Listing [list:decode_string:recursive] is \(O(K|s|)\) where \(k\) is the largest replication factor we can have.

Figure 1: Execution of the algorithm in Listing [list:decode_string:recursive] for the input string . The execution starts (Figure (a)) with a first call to with . The first two characters are letters and therefore they are appended to the instance of bounded to this recursive call (Figures (a) and (b)). Characters at indices 2 and 3 are numbers and they are parsed and saved into the integer (Figure (c)). The next character is going to be the open square bracket at index 4 and this will cause a new recursive call to happen with i=5 (Figure (d)). The process repeats now and we see at index 5 a letter that we append to the instance of bound to this call (Figures (e)). We then parse the number 12 at characters 6 and 7 (Figure (f)) and subsequently at index 8 we find an open square bracket that leads to a new recursive call, this time starting from index i=9 (Figure (g)). Index 9 holds a letter which is appended to the (empty) instance of bound to this call (Figure (h)). The next character is a closed square bracket which mean we can terminate the recursive call and return to the caller (which now holds ). We are not back (Figure (k)) to the second recursive call where, as you remember, m is twelve. Therefore we replicate 12 times the string we received from the recursive call number three and append the result to the current instance of . We also set m to zero. Because the next character is again a closed bracket we return the the caller and repeat the process (Figure (i)).m=23 and therefore we replicate from the second recursive call 23 times. The rest of the characters left are the ones at indices 11 and 12 which are simple letters and are just appended to .

Iterative solution

The same idea can of course be implemented iteratively. The trick is to simulate the call stack of the recursive approach in Section 54.2.1 by using an explicit stack. This stack will contains two pieces of information

the replication factor associated with an encoded substring (the value \(1\) is the default)

the decoded substring

.

Initially the stack contains only one entry: . As in the recursive approach, we process \(s\) one character at a time and all the operations are performed on the top of the stack unless we encounter either a:

  • which signals we need to add another element to the stack and start decoding a substring of \(s\).

  • which signals that we are done with decoding the substring and we can then replicate it as many time as is necessary, remove the entry from the top of the stack and append the replicated string to the string associated with the new top of the stack.

At the end of this process we are left with the fully decoded string at the top of the stack.

Listing [list:decode_string:iterative] implements this idea.

Listing 2: Iterative solution using a \inline{std::stack}.

std::string decode_string(const std::string& s)
{
  std::stack<std::pair<int, std::string>> stack;
  stack.push({1, std::string()});
  size_t i  = 0;
  int rep_f = 0;
  while (i < s.size())
  {
    if (std::isdigit(s[i]))
    {
      rep_f *= 10;
      rep_f += (s[i] - '0');
    }
    else if (s[i] >= 'a' && s[i] <= 'z')
    {
      stack.top().second.push_back(s[i]);
    }
    else if (s[i] == ']')
    {
      const auto [rep, str] = stack.top();
      stack.pop();
      for (int i = 0; i < rep; i++)
        stack.top().second += str;
    }
    else if (s[i] == '[')
    {
      stack.push({rep_f, std::string()});
      rep_f = 0;
    }
    i++;
  }
  assert(stack.size() == 1);
  return stack.top().second;
}

Notice that the stack has type thus reflecting the fact we need to keep two pieces of information for each of the encoded substrings. As for the recursive implementation, both time and space complexities are \(O(K|s|)\).