Counts the items in the containers

Introduction

Imagine you are the owner of a successful online store. You would like to be able to know the number of items you still have in the warehouse. The problem is that you cannot just walk into the warehouse and count the items as they are stored in closed containers. Thankfully, the warehouse is equipped with sensors and is able to produce a string representing the state of the warehouse and single containers. The problem described in this chapter investigates how we can write an algorithm that takes such a string (the state of all the containers in the warehouse) and be able to answer queries on the number of elements that are present in some portions of the warehouse itself.

Unsurprisingly, this problem has been reported as being asked during Amazon interviews and can be considered as of a medium difficulty. We will investigate two solutions:

• brute-force based on relatively straightforward logic (blindly count the items in the string) and easy to code (in Section 41.3.1),

• a more sophisticated solution with optimal asymptotic complexity where the input string is pre-processed so that queries can be answered faster.

Problem statement

You are given a string $$s$$ representing the current state of a warehouse. The string contains only two kinds of characters:

(ASCII 42)

: representing an item

(ASCII 124)

: representing the boundaries of a container.

A container is a closed space within the warehouse and it is represented in $$s$$ by a pair of . Items within a container $$c$$ are represented as appearing within the two defining $$c$$. You are also given an array of pairs $$Q = \{(s_0, e_0),(s_1, e_2),\ldots,(s_{n-1}, e_{e-1}) : 0 \leq s_i \leq e_i \leq |s|\}$$, where each pair in $$Q$$ identifies a substring in $$s$$. Each element of $$Q$$ is a query you must answer to.

Your task is to write a function that returns an array $$A$$ of length $$n$$, containing the answers to all of the queries in $$Q$$, where each element $$A_i$$ is the number of items contained in all the closed compartments between $$(s_i, e_i)$$.

Given s = and $$Q = \{(0,4),(0,5)\}$$ the function returns $$A=\{2,3\}$$. $$s$$ has a total of $$2$$ closed containers the first with $$2$$ and $$1$$ item inside respectively.

The first query asks you to find the number of elements in the substring $$s[0,4]=$$  where three items are represented but only two are within a closed container (the first two).

The second query refers to the substring $$s[0,5]=$$ . The items are the same as in the previous query but this time all of them are in closed containers.

Given s = and $$Q = \{(0,2),(1,3)\}$$ the function returns $$A=\{0,1\}$$. $$s$$ has a total of two items and only $$1$$ closed container containing only a single item.

The first query refers to the substring $$s[0,2]=$$ . No closed containers are represented in such a substring thus the answer in this case must be $0$. However, the second question refers to $s[1,3]=$ where we can see we have a valid container. We can therefore counts the elements in it.

Clarification Questions

Is it guaranteed for the input string $$s$$ to only contains valid characters?

Yes, you do not need to worry about the sanity of the input.

Discussion

Brute-force

This problem has a straightforward solution that essentially loops over all the elements specified in a query $$(s,e) \in Q$$ and counts all the elements inside the containers. Because the $$|s-e|$$ is $$O(|s|)$$ the complexity of this approach is $$O(|s|*|Q|)$$. Listing [list:items_in_containers_amazon_bruteforce] shows an implementation of this idea. Note that most of the code complexity of this solution is in the function that has to make sure only to count items that are within a closed container. It does so by first finding the first container wall appearing after the start of the query interval. We can safely skip all those items because they are not inside a container. Once we have found the beginning of the first container, we can proceed by counting the elements one container at a time.

Listing 1: Na\ive solution to the \textit{items in the container} problem.

using Query                   = std::pair<int, int>;
constexpr char kContDelimiter = '|';
constexpr char kItem          = '*';

int count_items_in_substring(const std::string& s, const Query& query)
{
const auto& [start, end] = query;
assert(start <= end);
assert(end <= std::ssize(s));

auto curr_char = start;
// find the first container
while (curr_char <= end && s[curr_char] != kContDelimiter)
curr_char++;
curr_char++;

int ans          = 0;
int cont_counter = 0;
while (curr_char <= end)
{
if (s[curr_char] == kItem)
{
cont_counter++;
}
else if (s[curr_char] == kContDelimiter)
{
ans += cont_counter;
cont_counter = 0;
}
curr_char++;
}
return ans;
}
std::vector<int> items_in_containers_naive(const std::string& curr_char,
const std::vector<Query>& Q)
{
std::vector<int> ans;
ans.reserve(Q.size());
for (const auto& q : Q)
ans.push_back(count_items_in_substring(curr_char, q));
return ans;
}

Linear time solution

There is, however, a much faster solution that can be easily implemented provided we have come pre-computed values. Performing some pre-computation in order to speed-up an algorithm is a common idea that is useful in solving many coding interview questions. In this particular problem, we are going to calculate two values for each character of the input string:

$$C_i$$, the closest delimiter to the right

: for each character $$s_i$$ of the input string $$s$$ we want to have information about the index of the closest container delimiter appearing after it. In other words we are looking for the index $$j > i$$ such that $$s[j]='|'$$. If such index does not exists then we assume it is the index of the last character of $$s$$, i.e. $$|s|-1$$.

$$P_i$$, the number of elements in all containers to the left

: this value should answer the question: given a character at index $$i$$ of $$s$$, how many items are placed into all the closed containers appearing to the left of $$i$$?

When this information is available for each and every position of $$s$$ then we can answer each query in constant time. Given a query $$(l,r)$$, we can calculate the answer to it by using the information about the closest container delimiter to the right of $$l$$, $$c\geq l$$, to find the beginning of the first container in the range $$(s,e)$$. All the elements between $$l$$ and $$c$$ can be ignored. So now that we have transformed our query from $$(l,r)$$ to $$(c,r)$$ we are ready to use the prefix sum of the number of elements in the containers.

We can calculate the answer to $$(c,r)$$ by simply returning $$P_r - P_l$$: the number of elements in the all container up to index $$r$$ minus the number of elements in all containers up to the index $$l$$.

Figure [fig:items_in_containers_amazon:example_prefix] shows the values of $$P$$ and $$C$$ for the input string is . Each value of the array $$P_i$$ contains the count of the items inside all the containers in the prefix of $$s$$ up to and including index $$i$$. For instance $$P_4=2$$ because the substring of $$s$$ between indices $$0$$ and $$4$$ only contains one container with two elements in it while $$P_5=3$$ because between indices $$0$$ and $$5$$ we have two containers with $$2$$ and $$1$$ items inside, respectively. The values in $$C_i$$ contains the indices of the first character in the suffix of $$s$$ from index $$i$$. For instance $$C_1=3$$ because the first after index $$1$$ appears at index $$3$$ in $$s$$ while $$C_3=3$$ because $$s[3]$$ contains itself. Note that the last element of $$s$$ always contains $$|s|-1$$ regardless of whether $$s_{|s|-1}$$ is or not. Listing [list:items_in_containers_amazon_lineartime] shows an implementation of the idea above. The main driver function is which first calls two other functions:

and,

that are responsible for the pre-computation of $$C$$ and $$P$$, respectively. Note that in Listing [list:items_in_containers_amazon_lineartime], for the sake of clarity, $$C$$ and $$P$$, are named and , respectively.

[fig:items_in_containers_amazon:example_prefix]

Listing 2: Linear time and linear space solution to the \textit{items in the container} problem.

std::vector<int> prefix_sum_containers_items(const std::string& s)
{
std::vector<int> cont_prefix_sum;
cont_prefix_sum.reserve(s.size());

auto it = std::begin(s);
while (it != std::end(s) && *it != '|')
{
cont_prefix_sum.push_back(0);
it = std::next(it);
}
cont_prefix_sum.push_back(0);
it = std::next(it);

int cont_curr_countainer = 0;
while (it != std::end(s))
{
const auto count_prev_containers =
(cont_prefix_sum.size() > 0 ? cont_prefix_sum.back() : 0);
if (*it == '|')
{
// sum of the previous and previous container items
cont_prefix_sum.push_back(count_prev_containers + cont_curr_countainer);
cont_curr_countainer = 0;
}
else
{
cont_prefix_sum.push_back(count_prev_containers);
cont_curr_countainer++;
}
it = std::next(it);
}
return cont_prefix_sum;
}

std::vector<int> find_closest_bars_right(const std::string& s)
{
std::vector<int> ans(s.size());
int idx_last_bar = std::ssize(s) - 1;
for (int i = std::ssize(s) - 1; i >= 0; i--)
{
if (s[i] == '|')
idx_last_bar = i;
ans[i] = idx_last_bar;
}
return ans;
}
std::vector<int> items_in_containers_lineartime(const std::string& s,
const std::vector<Query>& Q)
{
const std::vector<int>& prefix_sum_count_items =
prefix_sum_containers_items(s);
const std::vector<int>& closest_bars_right = find_closest_bars_right(s);

std::vector<int> ans;
ans.reserve(Q.size());
for (const auto& [start, end] : Q)
{
const auto& new_start = closest_bars_right[start];
if (new_start >= end)
{
ans.push_back(0);
}
else
{
const auto& count_before_start =
(new_start <= 0) ? 0 : prefix_sum_count_items[new_start];
ans.push_back(prefix_sum_count_items[end] - count_before_start);
}
}
return ans;
}