# 17. Delete duplicates from Linked List

# Delete duplicates from Linked List

## Introduction

This chapter deals with a fairly simple problem on linked lists that is popular during the preliminary interview stage. Given it’s ubiquity, it is important to have a good understanding of the solution so that you can implement it quickly and, more importantly, flawlessly during a real interview.

## Problem statement

Given a singly linked link \(L\) (see definition at Listing [list:delete_duplicates_list:linked_list]), return a linked list with no duplicates.

Listing 1: Singly Linked list definition

```
template<typename T>
struct Node {
;
T val*next;
Node (T x) : val(x), next(nullptr) {}
Node};
```

## Clarification Questions

Can the input list be modified?

*Yes.*

Is it guaranteed the input list to be a valid list?

*The input list is always a valid singly linked list.*

## Discussion

There are several approaches to solving this problem but we will focus on two, the second of which is a refinement of the first intended to allow you to present an elegant and efficient solution in a short time frame.

### Brute-force

The easiest solution possible is as follows:

Create a vector that contains a copy of the list;

Remove duplicates from the vector;

Create a brand new list with the content of the duplicate-free vector.

This solution is straightforward but not optimal as, while it is optimal in time we are not taking advantage of the fact that we can modify the list in place and thereby avoid creating and returning a brand new list.

A possible implementation is shown in Listing [list:delete_duplicates_list_brite_force1] where for the step \(2\) the remove-erase idiom[@cit::wiki::remove-erase] is used to remove duplicates from the vector(the erase part is actually not necessary in this case).

Listing 2: C++ solution $O(n)$ time and $O(n)$ space solution to the problem of removing duplicates from a Linked List using std::remove.

```
template <typename T>
std::vector<T> list_to_vector(Node<T> *head)
{
std::vector<T> ans;
for (; head; head = head->next)
.push_back(head->val);
ans
return ans;
}
template <typename T>
<T> *list_from_vector(const std::vector<T> &Vs)
Node{
<T> *tail = nullptr;
Node<T> *head = nullptr;
Nodefor (const auto v : Vs)
{
<T> *n = new Node<T>(v);
Nodeif (!tail)
= n;
head else
->next = n;
tail= n;
tail }
return head;
}
template <typename T>
<T> *remove_duplicates_from_linked_list_1(Node<T> *head)
Node{
auto vec_list = list_to_vector<int>(head);
// std::sort(std::begin(vec_list), std::end(vec_list)); //not necessary. List
// is sorted already
.erase(std::unique(std::begin(vec_list), std::end(vec_list)),
vec_liststd::end(vec_list));
return list_from_vector(vec_list);
}
```

### In-place \(O(1)\) space solution

In the solution describe in Section 21.3 used additional space to both remove the duplicates and also to avoid the pain of rearranging the input list by creating a brand new list containing no duplicates. It is, however, possible to write a in-place solution that uses no additional space.

The main idea is that since the list is **sorted**, duplicate elements will be one after the other. We can take advantage of this by simply ignoring pairs of consecutive nodes that have the same payload. Ignored nodes can therefore be deleted. The only complications is that we need to make sure to connect the first occurrence of every Node in the list with each other. It is f or this reason that we need to remember the first Node of a stride of the same value.

This idea is demonstrated in Listing [list:delete_duplicates_list_lineartime]. Note that:

The base case

`if(!head || !head->next) return head;`

is making sure that if we are examining the last element of the list (or an empty list) then there is no duplicate to ignore and thus we can return this element.otherwise, we are looking at a list with

**at least**two elements. These two elements can be potentially the start of a stride of equal elements that we want to ignore. We keep a pointer,`Node<T>* head;`

(which is never modified) to the first element of the stride and advance a second pointer,`Node<T>* head_n;`

, until we either reach the end of the list or we find an element that is different from the first one,`while ( head_n && head -> val == head_n -> val)`

. All the advanced elements are deleted. At the end of the loop the second pointer is pointing to either:An element different from the element pointed to by the first one. The stride of equal elements is processed and

`head->next`

can now point to the second pointer.`nullptr`

. We have reached the end of the input list. We are done.

The time and space complexity for this approach are \(O(n)\) and \(O(1)\), respectively. All the nodes of the list are visited at most once.

Listing 3: C++ solution $O(n)$ time and $O(1)$ space solution to the problem of removing duplicates from a Linked.

```
template <typename T>
<T> *remove_duplicates_from_linked_list_linear_space(Node<T> *head)
Node{
if (!head || !head->next)
return head;
<T> *head_n = head->next;
Node
while (head && head_n && head->val == head_n->val)
{
const auto head_n_n = head_n->next;
delete head_n;
= head_n_n;
head_n }
->next = remove_duplicates_from_linked_list_linear_space(head_n);
headreturn head;
}
```

## Common Variations and follow-up questions

One follow-up question that an interviewer may ask relates to the deletion of the duplicate nodes. In the Listing [list:delete_duplicates_list_lineartime] we see that the nodes are deleted using the `operator delete`

, but what if the list was not allocated using `operator new`

? The question is left for the reader.

How would the solution change if the nodes were allocated using a custom allocator? (spoiler, a custom deleter is also needed.)