Tuesday, December 01, 2009

Duplicate or missing numbers in an array

This is one of the most frequently asked questions in CS job written tests and interviews. Though other solutions like counting sort, linked-list-loop detection, sum etc. exist, they have their own complexities and limitations.

here is one O(n) approach I came up with (during a test), which actually seems to work.

Throughout, I assume that the numbers are between [1..n], all positive and integers are large enough to hold all of these (but not necessarily their sum, which can be an issue with the sum based approach), and XOR works with regular semantics.

given an array of length n which contains elements from [1..n] without repetitions, let x = XOR(all elements of the array)
Then

x = n if n = 0 mod 4
x = 1 if n = 1 mod 4
x = n+1 if n = 2 mod 4
x = 0 if n = 3 mod 4

We know n and also the expected value of x if the missing number was there. Just XOR the result of XORing all elements of the given array with expected x from given n, which gives the missing number. The same argument holds for one repeated number too.

Here is a sample C program which finds one repeated number.

int n = 100000;
int a[n + 1];
int rep = n - 1;
int i = 0;
int idx = 0;
for(i = 1; i <= n; i++){
  a[idx++] = i;
  if(i == rep)
    a[idx++] = i;
}

int res = 0;
for(i = 0; i < n + 1; i++){
  res ^= a[i];
}
switch(n % 4){
  case 0 : res ^= n;     break;
  case 1 : res ^= 1;     break;
  case 2 : res ^= n + 1; break;
  case 3 : res ^= 0;     break;
}
printf("%d\n", res);