Coding interview Essentials

20. Find the cycle in a Linked list

Find the cycle in a Linked list

Introduction

The topic of this chapter is linked-lists i.e. linear collections of elements whose order, unlike an array, is not dictated by their ordering in memory. As they are one of the most simple and commonly data structures it is reasonable to assume that they will come up during interview and should, therefore, form part of your preparation.

The major benefit that lists offer over conventional arrays is that elements in the list can be efficiently (in constant time) removed and inserted without the need to reorganize and perform a complete restructuring of all the data. [^22]. As a result, linked lists are often used to implement more complex data structures where this insertion and deletion cost is crucial; for example, associative arrays. They do, however, also have quite a number of drawbacks. For instance:

memory consumption (as for each node of the list you also pay a price as has to remember the next and/or previous nodes).

they offer sequential access. Accessing a node costs linear time.

cache unfriendly

.

A linked list is, at a high level of abstraction, a collection of so-called nodes or elements each of which (except the last) contains a pointer to the next one. Each node also carries payload data which is the information you ultimately want to be stored. The standard linked list has two special nodes:

  • the head that is not pointed to by other elements and is the first of the elements.

  • the tail, which is a node that has no next element, and is, not surprisingly, the last of the elements.

In some particular cases during the manipulation of the list you might end up with a broken or corrupted list where the tail node no longer exists, meaning that each of the elements in the list is pointing to some other node. In this situation a loop forms and the list becomes what it known as a circular list. In this chapter we will investigate how we can find out whether:

a list is circular and if it is;

how to identify the first element of the loop.

.

Problem statement

Given a singly linked list (definition in Listing [list:delete_duplicates_list:linked_list] at page ) determine whether the list contains a loop.

  • If it does, return the the node where the loop starts

  • otherwise, return nullptr

For the rest of the chapter we will use an array of integers to represents the nodes of the list and a single integer to represent the node that the last element of the list connects to in order to create a cycle or \(-1\) if there is no cycle. For instance the array \(L=[1,2,3,4]\) and the integer \(2\) represent the list shown in Figure 24.1.


Given the List \(\{[1,2,3,4,5],2\}\), the function returns the address of the node \(2\). See Figure 24.1.


Given the List \(\{[1,2,3,4,5],-1\}\), the function returns . See Figure 24.2.

Figure 1: Example of linked list with a cycle.

[fig:cycle_in_list:example2] Figure 2: Example of linked list with no cycle.

Discussion

Considering this is a very well-known problem we will not spent time on the obvious brute-force solution. Instead we will concentrate first on an optimal in time solution with linear space, and then examine how to improve it by lowering the space complexity to constant.

All solution implementations in this chapter uses the Linked list definition shown in Listing [list:cycle_in_list:node_definition];

Listing 1: Singly linked-list node definition.

template <typename T>
struct Node
{
  T val;
  Node *next;
  Node() = default;
  Node(T x) : val(x), next(nullptr)
  {
  }
};

Linear time and space solution

This problem has many similarities with the problem of finding a duplicate in a collection and, as such, we can approach it in a similar way. The idea is to visit the list and store in a hash-set the address of the node already visited. While visiting a new node, we first check if that node was already visited, and if the answer is yes then we can stop as we have found the starting point of a loop. If we reach the end of the list and we were not be able to find a duplicate then there is no loop and we can return nullptr. A possible implementation of this idea is shown in Listing [list:cycle_in_list:linearspace].

Listing 2: Linear time and space solution to the problem of detecting a cycle in a linked list where an hashset is used to remember already visited nodes.

template <typename T>
Node<T>* detect_cycle_linear_time_space(Node<T>* head)
{
  using Node_ptr = Node<T>*;
  std::unordered_set<Node_ptr> visited;

  while (head)
  {
    // has the current node already been visited?
    if (visited.find(head) != visited.end())
      return head;
    // if not, then remember that we did now
    visited.insert(head);
    // advance one node in the list
    head = head->next;
  }
  return nullptr;
}

Slow and fast pointer solution - Floyd’s algorithm

This algorithm[@cit::wiki::floyd] uses the fact that, like clock hands, things iterating on a cycle at different speeds will eventually meet at some point in the future. Consider two runners \(R_1\) and \(R_2\) with velocities \(V_1\) and \(V_2=2V_1\) respectively (\(R_2\) goes twice as fast as \(R_1\)), starting their run from the same point in a circular stadium. They will meet again when the slower runner reaches the starting point for the second time. This occurs because, by the time the slower runner has completed a half lap of the track, the faster runner will have completed a full lap; and by the time the slower finishes a full lap, arriving at the starting point again, the faster will have completed a second full lap. We can use this fact to detect a cycle in a linked list even if for the cycle detection problem things might be a bit more complicated because the two runners might start going in a loop at the same time (the list potentially has a first part the is not part of the loop as can be seen in Figure 24.1).

The rest of this section will outline the technical specifics so, if you are pressed for time, it is possible to skip to the implementation shown in Listing [list:cycle_in_list:constantspace] which is quite self-explanatory given the underlying algorithm works fairly intuitively. If you are interested in the details of why it works, read along.

Consider two iterators p,q with velocities \(v_p=1\),\(v_q=2\) respectively. Suppose the cycle(not the entire list) has length \(n\). We can have two scenarios depending on the index \(A\) of the starting node of the cycle:

  1. the cycle starts at \(A < n\).

  2. or starts at \(A \geq n\).

For the case \((1)\) when the slower iterator reaches \(A\) the faster is at location \(2A\) (which might mean that the faster iterator looped around the cycle already). How many iterations \(k\) will it take before they meet and at which node will this meeting occur? The situation is described by the following congruences: \[\begin{aligned} A + kv_p &\equiv 2A + k2v_p \Mod{n} \\ 2A + k2v_p &\equiv A + kv_p \; \Mod{n} \\ A + k2v_p &\equiv kv_p \Mod{n} \\ A +kv_p &\equiv 0 \Mod{n} \\ A +k &\equiv 0 \Mod{n}\end{aligned}\] which has solution \(k = n-A\). This means that they will meet after \(k=n-A\) iterations of the slower iterator, i.e. at \(A\) nodes before the beginning of the cycle and we can use this fact to count \(A\) nodes from the beginning of the list in order to find the starting point of the cycle.

Once the iterators meet in the cycle we can move the fast iterator back to the beginning of the list and iterate forward one node per step with both iterators until they match again. When we move the fast iterator back at the head of the list, both iterators are \(A\) nodes away from the beginning of the cycle. Because of this, when we move both of them by one, they will eventually meet exactly at that node \(A\) i.e. the beginning of the cycle.

Let’s consider now the case (\(2\)) i.e. when \(A \geq n\). This means that by the time the slower iterator reaches the beginning of the cycle the faster one has completed more than one cycle. What will then be the starting point for the faster one? We argue that once \(p\) reaches \(A\), \(q\) is at node \(2A\) but since \(A > n\), this means that it will be at position \(A + (A \Mod{n})\). We can now use similar arguments to the previous example and write:

\[\begin{aligned} A + kv_p &\equiv A + (A \Mod{n}) + (k2v_p \Mod{n}) \\ A + (A \Mod{n}) + k2v_p &\equiv A + kv_p\;\Mod{n} \\ (A \Mod{n}) + kv_p \Mod{n} & \equiv 0\Mod{n} \\ (A \Mod{n}) + k \Mod{n} &\equiv 0 \Mod{n} \: \: \text{ : because } v_p=1 \\\end{aligned}\] which has solution \(k = n-(A \Mod{n})\). This means that the meeting point is \(A \Mod{n}\) nodes before the beginning of the cycle. If we do the same operations as previously (when \(A < n\)), we obtain the same result. Iterators will meet at the beginning of the cycle. This happens because advancing \(q\) makes \(p\) cycle possibly several times ( remember that \(A \geq n\)  ) and it will clearly stop at \(A+(n-A \Mod{n}) + A \Mod{n} = A +n \;(mod (n))= A\). In other words; the slower pointer is at first  at node number \(A+(n-A \Mod{n})\). We can write \(A = bn + r\) where \(r = A \;\Mod{n}\). After \(A\) advancing steps it will be at location  \(A+(n-A \;\Mod{n}) +bn +r (\Mod{n})\). Since \(bn \; \Mod{n}=0\) the result follows.

As an example, consider a list with a cycle of length \(n=4\) starting at node number \(10\). The first part of the algorithm tells us that the nodes will meet at node \(10 + 4 - 10 \: mod(4) = 12\). Moving the fast pointer back to the head of the list and iterating one node per time; both iterators will lead the slower pointer to node:

Figure [fig:cycle_in_list:floyd] depicts how the algorithm works on a list of \(8\) nodes with a cycle of length \(4\) starting at node number \(4\). After \(5\) steps the slow (\(p\)) and fast (\(q\)) iterators point to the same node i.e. node number \(6\). After a new phase starts, with the slow pointer being moved to the head of the list and continues with both iterators moving forward by \(1\) until they meet again. They will meet again at the beginning of the cycle.

An implementation of the Floyd’s algorithm is shown in Listing [list:cycle_in_list:constantspace].

Listing 3: Floyd's algorithm, linear time, constant space solution to the problem of detecting a cycle in a linked list.

template <typename T>
Node<T> *detect_cycle_constant_time(Node<T> *head)
{
  Node<T> *n1, *n2;
  n1 = n2 = head;

  while (n1 && n2)
  {
    n1 = n1->next;
    n2 = n2->next;
    if (n2)
      n2 = n2->next;
    else
      break;

    if (n1 == n2)
      break;
  }
  // second phase floys's algorithm
  if (n1 == n2)
  {
    n2 = head;
    while (n1 != n2)
    {
      n1 = n1->next;
      n2 = n2->next;
    }
    return n1;
  }
  return nullptr;
}
Figure 3: At the beginning p=q=1. The slow and fast forward: p=p+1, q=q+2.
Figure 4: p \neq q, thus: p=p+1, q=q+2
Figure 5: p \neq q, thus: p=p+1, q=q+2
Figure 6: p \neq q, thus: p=p+1, q=q+2
Figure 7: p \neq q, thus: p=p+1, q=q+2
Figure 8: p = q. The fast and slow movements stop.
Figure 9: p is reset to the beginning of the list. q is not moved. From now on p and q are moved one step at the time.
Figure 10: p \neq q, thus: p=p+1, q=q+1
Figure 11: p \neq q, thus: p=p+1, q=q+1
Figure 12: p = q. The algorithm stops, and both p and q point to the beginning of the cycle.