Coding interview Essentials

35. BST Lowest Common Ancestor

BST Lowest Common Ancestor

Introduction

The lowest common ancestor is an important concept in graph theory and is often a topic or a fundamental building block of coding interview questions. For example, LCA has important applications in graph theory in:

  • computation of minimum spanning tree,

  • finding a dominator tree, or

  • as a stepping stone for algorithm for network routing, or range searching.

Given a tree and two nodes \(p\) and \(q\), the lowest common ancestor of \(p\) and \(q\), (\(LCA(p,q)\)) is defined as the lowest or deepest node that has both \(p\) and \(q\) as descendants. In other words the LCA is the shared ancestor or \(p\) and \(q\) that is the farthest from the root of the tree.

There are several known algorithms for finding the LCA efficiently on a generic tree; one of the most fundamental being the one from Harel and Tarjan [@harel84; @harel80]. In this chapter, however, we will focus on the (simpler) problem of finding the LCA for trees of a particular kind: binary search trees. This constraint greatly simplifies the general problem of finding LCAs.

Problem statement

Write a function that takes a binary search tree \(T\), and two nodes \(p\) and \(q\) and returns their lowest common ancestor.


Given the tree in Figure 39.1 the lowest common ancestor for nodes:

  • \(p = 2\), \(q=12\) is node \(3\)

  • \(p = 12\), \(q=3\) is node \(12\)

  • \(p = 18\), \(q=1\) is the root node \(13\)

[ex:lower_common_ancestor:example1]

Figure 1: Binary Search tree of the Example [ex:lower_common_ancestor:example1]

. [fig:lowest_common_ancestor:example1]

Clarification Questions

What should we return in case the input tree is empty?

You can return an empty tree.

Should I check for the validity of the binary search property for \(T\)?

No, you can assume \(T\) to always be a valid binary search tree.

Can I assume \(p\) and \(q\) to always be always present in \(T\)?

Yes

Discussion

Based on the definition of LCA one of the simplest solution possible would be to compute the path from the root to \(p\) and \(q\) and store them as two separate lists of nodes: \(P_q\) and \(P_p\). We can then compare these lists knowing that they will match up until a certain point, say up to the \(k^{th}\) node of the path. After that the lists do not match anymore. Therefore the \(k^{th}\) element of the list is the LCA for \(p\) and \(q\). For instance in the tree in Figure 39.1 the paths from the root to the nodes \(5\) and \(2\) are the following:

&P_5 = {,4,12,10,5}
&P_2 = {,1,2}

As you can see they match up to node \(3\) which is indeed their LCA.

If we try the same approach for the nodes \(5\) and \(11\) their respective paths from the root are:

&P_5 = {,5}
&P_1 = {,11}

\(P_5\) and \(P_{11}\) match up to the penultimate node (\(10\)). Therefore, their LCA is node \(10\).

This approach is correct and is easily implementable. Its time and space complexity is \(O(k)\) where \(k\) is the height of \(T\) (which for unbalanced trees might be proportional to the number of nodes in \(T\)). Listing [list:lowest_common_ancestor_paths] show an implementation of the idea above.

Listing 1: LCA solution based on the difference of paths from the root.

template <typename T>
void find_path_helper(Node<T>* root,
                      const T& target,
                      std::vector<Node<T>*>& path)
{
  assert(root);  // because target in guaranteed to be in the tree.

  // visited a new node. remember it
  path.push_back(root);
  if (root->val == target)
  {
    // found the target element. we can stop as the path is complete
    return;
  }

  // classic BST search
  if (target <= root->val)
    find_path_helper(root->left, target, path);
  else
    find_path_helper(root->right, target, path);
}

template <typename T>
std::vector<Node<T>*> find_path(Node<T>* root, const T& node)
{
  std::vector<Node<T>*> path = {};
  find_path_helper(root, node, path);
  return path;
}

template <typename T>
Node<T>* findLeastCommonAncestor_paths(Node<T>* root, const T& p, const T& q)
{
  std::vector<Node<T>*> P_p = find_path(root, p);
  std::vector<Node<T>*> P_q = find_path(root, q);

  // find the point up to which P_q and P_q are the same
  auto itp     = begin(P_p);
  auto itq     = begin(P_q);
  Node<T>* ans = *itp;
  while ((itp != end(P_p) && itq != end(P_q)) && (*itp == *itq))
  {
    ans = *itp;
    itp++;
    itq++;
  }
  return ans;
}

This approach is, however, not optimal as we waste space by storing entire paths which is not necessary as we only really care about the last common node. One way to avoid memorizing the entire paths is to find the path for both \(p\) and \(q\) simultaneously and only remember the last node we visited. If at some point during this visit we find that the the next node to visit for \(p\) is different from the direction that the search for \(q\) requires, we can stop as this is the point where the paths for \(p\) and \(q\) diverge and return the last element that was common. Despite the fact that this approach does not improve the time complexity we have only a constant space usage which is a very good improvement compared to the linear space complexity for Listing [list:lowest_common_ancestor_paths]. This optimized version is shown in Listing [list:lowest_common_ancestor_paths_optimized].

Listing 2: Space optimized version of Listing \ref{list:lowest_common_ancestor_paths}

template <typename T>
void find_path_optimized_helper(Node<T>* root,
                                const T& p,
                                const T& q,
                                Node<T>*& last_common)
{
  assert(root);  // because target in guaranteed to be in the tree.

  // LCA is the current node. Either p is descendant of q or the other way
  // around
  if (root->val == p || root->val == q)
  {
    last_common = root;
    return;
  }
  last_common = root;

  // paths for p and q takes different direction from here
  if ((p <= root->val && q > root->val) || (p > root->val && q <= root->val))
    return;

  // they are both lower or equal than val or both higher
  if (p <= root->val)
  {
    find_path_optimized_helper(root->left, p, q, last_common);
  }
  else
    find_path_optimized_helper(root->right, p, q, last_common);
}

template <typename T>
Node<T>* findLeastCommonAncestor_paths_optimized(Node<T>* root,
                                                 const T& p,
                                                 const T& q)
{
  Node<T>* ans = root;
  find_path_optimized_helper(root, p, q, ans);
  return ans;
}

We can also simplify the implementation shown in Listing [list:lowest_common_ancestor_paths_optimized] further by rewriting it such that it runs iteratively rather than recursively. Listing [list:lowest_common_ancestor_iterative] shows this iterative version which starts at the root of \(T\) and keeps navigating the tree by moving left or right until the direction of the search for both \(p\) and \(q\) is the same. When the path diverges we can stop and return the current node which is the lowest shared node in the paths from the root to \(p\) and \(q\)

Listing 3: Iterative solution to the problem of finding the LCA in a binary search tree.

template <typename T>
Node<T>* findLeastCommonAncestor_reference(Node<T>* root,
                                           const T& p,
                                           const T& q)
{
  while (root)
  {
    const auto& payload = root->val;
    if (payload > p && payload > q)
    {
      root = root->left;
    }
    else if (payload < p && payload < q)
    {
      root = root->right;
    }
    else
    {
      return root;
    }
  }
  return root;
}