Tuesday, December 01, 2009

Duplicate or missing numbers in an array

This is one of the most frequently asked questions in CS job written tests and interviews. Though other solutions like counting sort, linked-list-loop detection, sum etc. exist, they have their own complexities and limitations.

here is one O(n) approach I came up with (during a test), which actually seems to work.

Throughout, I assume that the numbers are between [1..n], all positive and integers are large enough to hold all of these (but not necessarily their sum, which can be an issue with the sum based approach), and XOR works with regular semantics.

given an array of length n which contains elements from [1..n] without repetitions, let x = XOR(all elements of the array)
Then

x = n if n = 0 mod 4
x = 1 if n = 1 mod 4
x = n+1 if n = 2 mod 4
x = 0 if n = 3 mod 4

We know n and also the expected value of x if the missing number was there. Just XOR the result of XORing all elements of the given array with expected x from given n, which gives the missing number. The same argument holds for one repeated number too.

Here is a sample C program which finds one repeated number.

int n = 100000;
int a[n + 1];
int rep = n - 1;
int i = 0;
int idx = 0;
for(i = 1; i <= n; i++){
  a[idx++] = i;
  if(i == rep)
    a[idx++] = i;
}

int res = 0;
for(i = 0; i < n + 1; i++){
  res ^= a[i];
}
switch(n % 4){
  case 0 : res ^= n;     break;
  case 1 : res ^= 1;     break;
  case 2 : res ^= n + 1; break;
  case 3 : res ^= 0;     break;
}
printf("%d\n", res);

1 comment:

  1. Arey Gaurav here (hope u remember me!) I have just started reading ur blog .. its insane and u r one crazy person! most of the stuff goes straight over my head! Any way talking abt this problem, I guess ur solution will work only if duplicate element causes nth element to be deleted ie something like {1,2,2,3,4} and n=5 is missing. But let us say that any other element could be missing because of the duplicates i.e something like {1,3,3,4,5} (2 is missing). I am not sure if the nature of the problem allows this, but lets say it does. do u think ur solution will work in that case? I think even the solution with sum will not work either.

    ReplyDelete