Coding interview Essentials

13. Find the odd occurring element

Find the odd occurring element

Introduction

This chapter describes a problem on arrays and on the XOR (also known as disjunctive-or and usually identified by the symbol \(\oplus\))[^11] operation.

There exists a very simple and intuitive yet inefficient brute-force solution that, however, will not lead us a long way in an actual interview and that is conceptually quite different from other approaches that are far faster. For this reason, it can be challenging to use the brute-force solution as a starting point and to iteratively improve it so that we eventually reach optimal time and space complexity. It is way more effective to read the statement carefully and to try to get the right insight right from the beginning instead of getting carried away towards a dead-end by the brute-force approach.

Problem statement

Write a function that given an array \(A\) of positive integers where all elements except one appear an even number of times, returns the one and only one element appearing an odd numbers of times.

[ex:find_odd_occurring_element:example1]
Given the array \(A=\{4,3,6,2,4,2,3,4,3,3,6\}\) the function returns \(4\) because it appears \(3\) times while all the other elements appear an even number of times.

Clarification Questions

Is the input array always valid. Does it always contain only one element appearing an odd number of times?

Yes the input array can be assumed to be valid.

Is the range of the input integers known?[^12]

No it is not. The values of the elements of \(A\) is arbitrary.

Discussion

Brute-force

The brute-force solution to this problem is very intuitive: we have to count the occurrences of each of the elements of \(A\) until we find one appearing an odd number of times. Provided that a counting function (which counts the occurrences of a given element in an array) is available, it is only a matter of using that function for all the elements in the array, and return as soon as it returns an odd number.

Listing [list:find_odd_occurring_element_bruteforce_rawloop] shows a possible implementation in C++ which uses the function from the STL to count the number of occurrences of a given number in \(A\).

Listing 1: Brute force solution using a counting function.

inline constexpr bool is_odd(const int x)
{
  return x & 1; //only odd number have the leftmost bit set
}

int odd_appearing_element_bruteforce_rawloop(const std::vector<int>& A)
{
  for (const auto& x : A)
  {
    //count how many times x appears in A
    const size_t number_occurrences = std::count(begin(A), end(A), x);
    if (is_odd(number_occurrences))
      return x;
  }
  throw std::invalid_argument(
      "Invalid input array. No elements appear an odd number of times");
}

What the code above is really trying to do is find the element appearing an odd number of times. Instead of using a raw loop for doing so, the code can be made much more expressive (which is always appreciated and positively evaluated by the interviewer) by using the standard metafunction as shown in the Listing [list:find_odd_occurring_element_bruteforce_standard].

Listing 2: Brute force solution using standard libraries functions \inline{std::count} and \inline{std::find_if}.

int odd_appearing_element_bruteforce_standard(const std::vector<int> &A)
{
  return *std::find_if(begin(A), end(A), [&](const auto x) {
    return is_odd(std::count(begin(A), end(A), x));
  });
}

However, this is considered a poor solution: the time complexity is \(O(n^2)\) which is quite off from the optimum, while the space complexity is constant.

Notice that, in the first brute-force solution (Listing [list:find_odd_occurring_element_bruteforce_standard]), we dereference the iterator returned by directly without checking if it is valid or not. returns an iterator to the element satisfying the search criteria \(p\) (in the form of a lambda) only if such an element exists, and that otherwise it would return which is in equal to . Dereferencing would cause UB, but we are guaranteed this case to never happen as an odd occurring element is always present in \(A\)[^13].

In the second implementation (Listing [list:find_odd_occurring_element_bruteforce_rawloop]), we took a different approach concerning the handling of a bad input and we decided to explicitly throw an exception, in case all elements appear an even number of times or \(A\) is empty. Even if the interviewer does not ask for it, it is good to show that we thought about this case, and also that we can handle it without big penalties in expressiveness and performance: we can rest assured this certainly adds a bonus point to our final evaluation. Moreover, we can argue that a throw statement makes explicit and clear that the function is expecting certain characteristics from the input without incurring performance penalties: [^14] when the input is good (which is safe to assume would be the majority of the times the function gets invoked).

Linear time and space solution

In order to speed up the process of keeping count of how many times each element appear in the input array, we can adopt a map-like structure, where the keys are the numbers in \(A\) and the values are integers representing the number of times each element appears in the array. If a hash-based map is used to store this key-value information, then this effectively reduces the time complexity of the brute-force approach down to \(O(n)\) (on average) at the expense of space that increases to linear as well.

Keeping track of the actual number of times an element appears in \(A\) is unnecessary and overkill, mainly because all we need is the information about whether or not the number of times it appears is even or odd. We do not care about the actual number, and therefore a single bit is sufficient to store this information. The map structure would then associate integers to booleans, for a substantial saving in the space used. However big the reduction is, the space used remains linear. This idea is implemented in Listing [list:find_odd_occurring_element_bruteforce_linearspace].

Listing 3: Linear time and space solution using a map.

int odd_appearing_element_linear_space(const std::vector<int>& A)
{
  // true = even
  // false = odd
  std::unordered_map<int, bool> M;
  for (const auto& x : A)
    M[x] = !M[x];

  for (const auto& kv : M)
    if (kv.second)  // kv is a pair<key, value>
      return kv.first;

  throw std::invalid_argument(
      "Invalid input array. No elements appear an odd number of times");
}

The code works in two phases:

  1. the map is filled in such a way that for each key \(x\) the corresponding value is \(1\) if and only if \(x\) appears in \(A\) an odd number of times.

  2. the map is scanned to find the one element having a value of \(1\).

The time and space complexity are \(O(n)\).

Linear time and constant space solution

However, there is a way to solve this problem in constant space and linear time. This solution is based on the XOR operation which can be thought of as the equivalent of the sum for bits and has some interesting properties that are useful to construct a solution for this problem:

  1. it is a commutative, distributive and associative operation;

  2. its neutral element is the \(0\). What it means is that applying the XOR to a number \(x \neq 0\) and \(0\) always results in \(x\) i.e. \(x \oplus 0 = x\) and \(0 \oplus x = x\)

  3. xor-ing an element with itself always results in 0 i.e. \(x \oplus x = 0\).

The practical consequence of these facts is that when xor-ing an element \(x\) with itself an odd number of times, the result is \(x\) as \((x \oplus x) \oplus x = (0 \oplus x) = x\), but doing so an even number of times results in \(0\) because \((x \oplus x) \oplus (x \oplus x) = 0 \oplus 0 = 0\).

Why is this useful? It is very useful because we known that all input integers except one are occurring an even number of times! Therefore when all numbers are xor-ed together, all it is left at the end is the number appearing an odd number of times: every number except the answer will be xor-ed an even number of times with itself, resulting in \(0\).

For instance if we try to XOR all the elements of the example [ex:find_odd_occurring_element:example1] above where \(A=\{4,3,6,2,4,2,3,4,3,3,6\}\) we obtain: \(4 \oplus 3 \oplus 6 \oplus 2 \oplus 4 \oplus 2 \oplus 3 \oplus 4 \oplus 3 \oplus 3 \oplus 6 = 4\). At this point we can use commutativity, associativity and distributivity properties to rearrange it as follows (this would be equivalent to first sort \(A\) and then XOR all the elements): \[\underbrace{(2 \oplus 2)}_{0} \oplus \underbrace{(3 \oplus 3 \oplus 3 \oplus 3)}_{0} \oplus \underbrace{(4 \oplus 4 \oplus 4)}_{4} \oplus \underbrace{(6 \oplus 6)}_{0} = 4\] which clearly show the only value remaining is the one of the element appearing an odd number of times.

An implementation of the idea above is shown in Listings [list:find_odd_occurring_element_bruteforce_final1] where we explicitly loop over \(A\) and [list:find_odd_occurring_element_bruteforce_final2] where instead, we use to perform the array reduction[^15].

Listing 4: Linear time and constnat space using XOR and a raw loop.

int odd_appearing_element_final(const std::vector<int> &A)
{
  int ans = 0; //0 is the neutral element for XOR
  for (const int x : A)
    ans ^= x;
  return ans;
}

Listing 5: Linear time and constant space solution using XOR $\oplus$ and the \inline{std::accumulate} function from the STL.

int odd_appearing_element_final_std(const std::vector<int> &A)
{
  return std::accumulate(
      std::begin(A), std::end(A), //range
      0, //initial value
      [](const int acc, const int x) { return acc ^ x; } // binary reduction operation
    );
}

Both implementations [list:find_odd_occurring_element_bruteforce_final1] and [list:find_odd_occurring_element_bruteforce_final2] have very similar characteristics in terms of asymptotic performance, as they both use linear time and constant space.