dc0d32: 2009

Tuesday, December 01, 2009

Duplicate or missing numbers in an array

This is one of the most frequently asked questions in CS job written tests and interviews. Though other solutions like counting sort, linked-list-loop detection, sum etc. exist, they have their own complexities and limitations.

here is one O(n) approach I came up with (during a test), which actually seems to work.

Throughout, I assume that the numbers are between [1..n], all positive and integers are large enough to hold all of these (but not necessarily their sum, which can be an issue with the sum based approach), and XOR works with regular semantics.

given an array of length n which contains elements from [1..n] without repetitions, let x = XOR(all elements of the array)
Then

x = n if n = 0 mod 4
x = 1 if n = 1 mod 4
x = n+1 if n = 2 mod 4
x = 0 if n = 3 mod 4

We know n and also the expected value of x if the missing number was there. Just XOR the result of XORing all elements of the given array with expected x from given n, which gives the missing number. The same argument holds for one repeated number too.

Here is a sample C program which finds one repeated number.

int n = 100000;

int a[n + 1];
int rep = n - 1;
int i = 0;
int idx = 0;
for(i = 1; i <= n; i++){
  a[idx++] = i;
  if(i == rep)
    a[idx++] = i;
}

int res = 0;
for(i = 0; i < n + 1; i++){
  res ^= a[i];
}
switch(n % 4){
  case 0 : res ^= n;     break;
  case 1 : res ^= 1;     break;
  case 2 : res ^= n + 1; break;
  case 3 : res ^= 0;     break;
}
printf("%d\n", res);

Sunday, November 08, 2009

Going lambda/functional with m4

m4 is the GNU version of traditional macro processor. Even though it is a single pass macro processor, unlike cpp, it can also process the substitutions of a macro, we can define a macro to define another macro, and use the newly created macro, and so on. Thus, we can, in some sense, use m4 as a functional programming language.

Recursive version of factorial program is straightforward in m4:

define(`rec_fact', `ifelse($1,0,1,`eval'($1 * `rec_fact'(`eval'($1 - 1))))')
rec_fact(15)

We can also create non-recursive version in m4. Since lambda terms are essentially anonymous functions, we write m4 macros to create and destroy new pseudo-anonymous macro definitions. Following macro defines the Y combinator equivalent in m4.

dnl Thanks to Toby White (http://uszla.me.uk/space/essays/m4Y)

define(define(`Y_COMB', `dnl
pushdef(`recur',dnl
`pushdef(`y_anon',dnl
`$1''changequote([,])(['changequote([,])`changequote`]`$[]1'(``$[]1'')dnl
['changequote([,])'changequote`])changequote`dnl
(changequote([,])`$[]1'changequote)dnl
`popdef(`y_anon')')'dnl
`y_anon')dnl
pushdef(`y_anon',`dnl
recur(`recur')'changequote([,])(`$[]1')changequote`dnl
popdef(`recur')`'popdef(`y_anon')')`'dnl
'`y_anon')`'dnl

This essentially is a version of fixed point combinator given by Haskell Curry and is defined as Y = \lambda f.(\lambda x.f (x x)) (\lambda x.f (x x)). This is similar to recur(recur). \lambda x.f (x x) is implemented as $1(`$1') in y_anon.

The non-recursive version of factorial can be written as :

define(`ARG1', `changequote([,])`$[]1'changequote')dnl
define(`F_fact',`dnl
pushdef(`anon',dnl
`ifelse'(ARG1,1,1, dnl
``eval''(ARG1 * ``$1''(`eval'(ARG1 - 1))))dnl
`popdef(`anon')')'dnl
`anon')dnl

And now all we need to do to call this non-recursive factorial functions is:

Y_COMB(`F_fact')(5)

We can also define a sugar function named fact that expands to Y_COMB(`F_fact').

Not just tail recursion but we can also run any kind of general recursive program in non-recursive fashion:

define(`F_fib',`dnl
pushdef(`anon',dnl
`ifelse'(ARG1,1,1, ARG1,2,1, dnl
``eval''(``$1''(`eval'(ARG1 - 2)) + ``$1''(`eval'(ARG1 - 1))))dnl
`popdef(`anon')')'dnl
`anon')dnl
Y_COMB(`F_fib')(11)

Run the above with m4 -dV to get sequence of reductions, it helps a lot with debugging.

Note that data constructors are also actual functions in m4, and there is no built-in support for say tuples or lists. It is more lambda-ish than say Haskell-ish. We have to build all such structures ourselves. The evaluation scheme is not full-lazy.

Saturday, November 07, 2009

Pidgin : Log in invisible for Google Talk (XMPP/Jabber)

I am a big fan of pidgin, primarily due to it's configurability and plugin support. One of major turn-off people report around me is that pidgin does not support logging in as invisible with gtalk account. Not that I am a big fan of invisibility, I built a plugin that lets you do just that.

Not supporting neat 'sign on as invisible' for gtalk is basically due to lack of time and willingness on Pidgin developers' part. Not that they can't, they just don't want to at the moment due to some extra stuff gtalk uses/needs.

Installation instructions:

It is written in Perl, which means you need perl 5+, pidgin-perl and libpurple-perl packages.

On apt based systems (like Ubuntu), do (as root)
apt-get install pidgin-perl

looks like Ubuntu 9.10 has this built in.

on fedora and alike

yum install pidgin-perl

That should install all required packages, with dependencies.

Download and extract the plugin perl script.
Save it to your ~/.purple/plugins/ directory. That is, /home/{username}/.purple/plugins/.
If you find that hard, there is an install script included.

Usage:
Go to Tools->Plugins. Find 'Google Talk Invisible Plugin' and enable it.
Now, when you set your status as "Extended Away", you become invisible. Simple!

This is the pre-pre-pre-...-alpha release, which means I did not go through the trouble of activating the INVISIBLE status for gtalk accounts, and hijacked xa instead. I hope it is not a big mistake.

Update : there is a serious bug in this plugin. After you become invisible, you can not get updates to your contact list, and hence if you start pidgin with the plugin enabled in extended away mode, you may not see any online friends at all. I am working on it.

Thursday, September 10, 2009

Basic Authenticating pseudo-proxy with bash/netcat

Yes, you can also do authentication with netcat.

#!/bin/bash
# @author : Prashant Borole (prashantb  :can be reached at:     cse iitb ac in)

# A simple proxy-proxy that authenticates on users' behalf.
# Known limitations :
#       Works for Basic proxy authentication at the moment (though it is not that hard to add digest support)
#       Can process single request at a time. Any parallel requests will not be satisfied. This may lead to referred elements not being loaded, eg. images. Download accelerators are rendered useless.
#       Due to the logging format, there can be storage issues when downloading huge files.
#       Significantly slower than squid or some custom asyncio implementation as each request spawns bunch of new processes
#       No cache. Each request is strictly sent to upstream proxy
#       no https or ftp support

# TODO put this in some standard location
CONFIG_FILE="$HOME/.proxy.conf"

export LISTEN_PORT=
export PROXY=
export PROXY_PORT=
export LOG_FILE=

init()
{
        exec   2>&- 2>&1
        exec   5>&1
        exec   >>"$LOG_FILE"
}

readConfig()
{
        # really cheap config parsing.
        # FIXME improve
        # echo "Loading Configuration from $CONFIG_FILE"
        export LISTEN_PORT=$(grep "LISTEN_PORT=" "$CONFIG_FILE" | cut -d '=' -f2)
        export PROXY=$(grep "PROXY=" "$CONFIG_FILE" | cut -d '=' -f2)
        export PROXY_PORT=$(grep "PROXY_PORT=" "$CONFIG_FILE" | cut -d '=' -f2)
        export LOG_FILE=$(grep "LOG_FILE=" "$CONFIG_FILE" | cut -d '=' -f2)
}

if [ "$1" != "--" ]; then
        # read the config initially
        readConfig

        # accept credentials
        echo -n "$PROXY:$PROXY_PORT    user name : "
        read USER_NAME
        echo -n "$USER_NAME@$PROXY:$PROXY_PORT Password : "
        read -s PASSWORD
        echo
        export B64CRED=`echo -ne "$USER_NAME:$PASSWORD" | base64`
        unset USER_NAME PASSWORD
        echo "Credentials accepted."

        # launch in background
        $0 -- &
        exit 0
else
        trap 'echo "cleaning up $TMP_DIR" ; rm -fr "$TMP_DIR"; killall -9  nc tee >/dev/null 2>&1' EXIT
        trap 'echo "Shutting down proxy server on port $LISTEN_PORT" ; exit 0' SIGUSR1
        trap 'echo "Reloading config..." ; readConfig' SIGUSR2
        trap 'echo "[UNCLEAN] Shutting down proxy server on port $LISTEN_PORT" ; exit 1' SIGINT SIGKILL SIGSEGV SIGPIPE SIGHUP SIGTERM SIGABRT

        readConfig
        init
        echo "`date`   Server $0[$$] started"
fi

TMP_DIR=`mktemp -d /tmp/proxy.XXXXXXXX`

BACK_FIFO="$TMP_DIR"/back_fifo
# create a pipe
mknod "$BACK_FIFO" p

echo "Accepting connections on port $LISTEN_PORT"
while true ; do
        nc -l "$LISTEN_PORT" 0<$BACK_FIFO | tee request | (while read line ; do if [ "$line" = "^M" ] ; then echo "Proxy-Authorization: Basic $B64CRED" ; echo ; else echo "$line" ; fi;  done) | nc -w 5 "$PROXY" "$PROXY_PORT" | tee response 1>$BACK_FIFO
        head -n+1 request
        head -n+1 response
done

echo "THIS SHOULD NEVER HAPPEN. SOMETHING IS WRONG !!!"
exit 1

Note that you can NOT use something like sed or awk, because they simply give up when there is no text to read. This breaks the pipe. A "while read" loop just cuts it :)

Tuesday, August 18, 2009

Java Unicode (source code)

We all know that Java is great with Unicode. I was surprised to know that the compiler itself has moved on. The Java source itself can be in Unicode now!!

Check out the sample Java source file

1 /**
2 * @author prashant
3 *
4 */
5 public class JavaUnicodeTest{
6     static class मज्जाआहे  {
7         public मज्जाआहे()     {
8             System.out.println("constructor of " + this.getClass().getCanonicalName());
9         }
10
11         int पाचद्या()       {
12             return 5;
13         }
14     }
15
16     public static void main(String[] args) {
17         मज्जाआहे o = new मज्जाआहे();
18         System.out.println(o.पाचद्या());
19     }
20 }

Using OpenJDK
java version "1.6.0_0"
OpenJDK Runtime Environment (IcedTea6 1.5) (fedora-27.b16.fc11-x86_64)
OpenJDK 64-Bit Server VM (build 14.0-b15, mixed mode)

It runs fine, giving

constructor of JavaUnicodeTest.मज्जाआहे
5

weird, eh?

Sunday, July 26, 2009

random status in Pidgin :- fortune

There is a nice utility called 'fortune' for linux. Basically a database of funny/witty quotes accumulated over decades which can be read either directly from flat files or using the 'fortune' front-end.

What can you use it for? Display random quotes in your mail signature (with due references/credits), on your home page or IM status :P

#!/bin/bash
while true :;
do
    MSG=`fortune -ae`
    zenity --question --text="$MSG" && (purple-remote "setstatus?message=$MSG" && sleep 1800);
done;

Make sure you have zenity and fortune packages installed.

Start pidgin and then this script. It will get a new random status every 1800 seconds and ask whether you want it as status, the process repeats immediately if you press cancel, and status is set otherwise.

'fortune' is used for name sake here. You can pull in random statuses from some web page (and probably scrape them), or some RSS feed if available, or from your own compiled database of statuses, reading one at random.

This is what you do when you are bored to core sitting in an office on Monday afternoon waiting for the benchmarks to complete.

Sunday, July 05, 2009

raided

Bought a new Seagate 500GB internal disc drive last week. One of the obvious things I'd try out was to RAID my old drive with this one. Though asymmetric in both sizes and performance, I wanted to give it a try. Without very high expectations of speedups, I went ahead.

RAID on windows turned out to be a disappointment. The so called Windows Dynamic Discs must span an entire physical disk, which means I could not install linux on any of these drives. Pfft; what a nice way to cut out competition! Moreover, from what I read, dynamic discs are not reliably readable in linux.
Someone has posted an article that tells how to dual boot RAID windows with linux installed. Though I have not tried it, it is worth trying out in a VM first.

RAID-ing on Linux was like a breeze. This time I decided to trust the anaconda installer GUI for partitioning and not doing it the usual command line way. With two raid partitions of same size on each drive, the setup created /dev/md0 comfortably for me. Then I created a PV out of raid and added it to LVM. Partitioned the lvm group in the usual way and my system was up flawlessly. Though not exaggeratingly massive, I can sense some speedup in my day to day activities. Not to mention, boot times are affected. Disc I/O bound programs are ones most boosted. One of my projects where data is read from a 15G file at random runs at 1.5x now :D.

Wednesday, February 11, 2009

Starting gdm or xdm and kde by default in FreeBSD 7.1

After installation, by default FreeBSD gives you a nice terminal based login. I am pretty much happy with that as I have a high-res VESA console (after recompiling the kernel with VESA console support and then setting respective mode using vidcontrol).

But if you are more of a clicky-clicky type of person, you probably want it to start off directly into some graphical greeter and login manager like gdm or xdm.

Starting gdm by default

add

gdm_enable="YES"

in /etc/rc.conf

and reboot. You should get gdm after boot.
One problem I could not come over with gdm is that I could not select or start KDE from it. It would always try to load gnome.
FailSafe Terminal and then running startkde works, but is again not for the clicky-clicky user.

Starting xdm+kde by default

This needs minor serious work.

First, edit /etc/ttys
under #Virtual terminals
change

ttyv8 "/usr/local/bin/xdm -nodaemon" xterm off secure

ttyv8 "/usr/local/bin/xdm -nodaemon" xterm on secure

This should start xdm with an xterm console log on virtual tty8 (the one you get when you hit ).

If you want to log in to kde after authenticating in xdm,

create/open ~/.xsession

and make it look like

#!/bin/sh
exec startkde

This will start off kde after authenticating in xdm.

USB mouse under X / kde / gnome in FreeBSD 7.1

During FreeBSD 7.1 installation, many users with USB mouse respond in negative when the setup asks if they have PS2 or serial mouse (as USB =/= serial). And then they end up with kde up and running and no mouse.

The quick way to fix that is to add

moused_enable="YES"
moused_type="auto"

to your /etc/rc.conf file and reboot.
Xorg server picks up the device /dev/sysmouse nicely.

Thursday, January 29, 2009

rsync vs scp vs ssh + tar

Many will agree that copying large directory trees with many files of varying sizes and compressibility from one linux box to another can be such a caterpillar; whereas surprisingly copying huge files is relatively faster. The primary reason being that as the number of files (actual files+directories) increases, the file system and kernel overheads add up to the network slowness.

Here are some interesting results from a small experiment I performed -

The same data was copied between same machine with similar network conditions and loads using different methods with and without compression. Both fedora installations are new and home partitions almost empty (I have no idea at the moment what would happen if they were somewhat filled). 2 dry runs were carried out before actual tests to give cache benefit to all methods.

Data being copied : 1.6GB worth matlab2008a(unix) installation - contains bunch of avi videos in megabytes, moderate number of jar files and many small .m files.

source machine:
Pentium D 820@2.8GHz, 2GB DDR2 dual channel@667, Intel 946 mobo, on board SATA controller; Samsung 160GB SATA hard disk drive@7200rpm and 8M cache, Linux f10 2.6.27.12-170.2.5.fc10.i686,
35GB ext3 source partition (fresh).

target machine:
Pentium core2 duo E4500@2.2GHz, 2GB DDR2 dual channel@667, Intel 965 mobo, on board SATA controller; Samsung 160GB SATA hard disk drive@7200rpm and 2M cache, Linux f10 2.6.27.12-170.2.5.fc10.i686,
100GB ext3 copy-to partition (fresh).

Results

Method command	Timing	avg. CPU util.
uncompressed recursive scp scp -rq matlab_install prashant@10.105.41.19:matlab_install_regular_scp_unc	real 9m54.554s user 0m23.204s sys 0m15.103s	20
compressed recursive scp scp -Crq matlab_install prashant@10.105.41.19:matlab_install_regular_scp	real 11m8.391s user 3m48.508s sys 0m25.200s	85
uncompressed recursive rsync rsync -a matlab_install prashant@10.105.41.19:matlab_install_regular_rsync_unc	real 3m3.604s user 0m26.709s sys 0m21.664s	40
compressed recursive rsync rsync -az matlab_install prashant@10.105.41.19:matlab_install_regular_rsync	real 4m11.651s user 3m11.847s sys 0m31.892s	90
uncompressed tar+ssh tar -cf- matlab_install \| ssh prashant@10.105.41.19 'tar -xf- -C ~/matlab_install_hack_unc'	real 2m59.706s user 0m21.428s sys 0m14.020s	20
compressed tar+ssh tar -cf- matlab_install \| gzip -f1 \| ssh prashant@10.105.41.19 'tar -xzf- -C ~/matlab_install_hack_compr'	real 2m44.349s user 2m7.709s sys 0m18.114s	60

Conclusion

as seen from the timings, rsync and tar+ssh perform close-up; though tar+ssh beats rsync here.
On the other hand though, when updating the huge tree; rsync wins hands down - insane speedups!
scp is not to be used with more than a hundred files. period.

Friday, January 16, 2009

UUID business

Use *nix? Use many distributions at a time? Use many distributions on many disks?

Managing distributions installed on multiple disks can be a real pain in the a**, say you have fedora10 installed on one SATA disk, and Debian on other disk. Now, can you confidently tell which is the system drive? It gets really messy.

Have a portable hard drive you want to be mounted at a fixed place when attached? If there is another USB drive already attached, there is no way you can tell whether the new one is sdc or sdd.

UUID comes to rescue. Though not supported nor understood by the kernel, many distributions provide tools in initrd which can work with UUIDs. Every partition has a Universally Unique IDentifier (UUID), which can be used as a globally unique name for a partition.

Finding UUID of a partition

This command needs 'udev' package. It comes pre-installed on many modern Linux distributions. If not, please install it following your distribution's guidelines.

/lib/udev/vol_id --uuid /dev/sdaXX

replacing XX by the partition number.

Using UUIDs

Painless booting
UUID comes in handy when you have a separate boot and / partitions. In this case, the kernel and initrd resides on the boot partition, while the root, is on / partition. Naturally, the kernel needs to be told which partition to use as root.

say the root partition is sda8, then
# /lib/udev/vol_id --uuid /dev/sda8
b036863a-2846-4f57-a6db-e7716f5d903c

or use

# udevadm info --query=all --name=/dev/sda8 | grep UUID | sed q | cut -d'=' -f2
b036863a-2846-4f57-a6db-e7716f5d903c

This UUID can be used in the kernel parameter. Open up menu.lst of grub (or lilo.conf if you still use LiLo), and make the kernel entry look something like

kernel vmlinuz-2.6.27.9-159.fc10.i686 ro root=UUID=b036863a-2846-4f57-a6db-e7716f5d903c

Now you can safely boot without worrying about which drive has the boot partition.
Pseudo-permanent mountpoint settings

Continuing from the USB disk example, say the partition on the USB drive has UUID 51e3a299-68f3-466f-86ac-428c60420621, an entry can be added into /etc/fstab which looks like

UUID=ba92ef0d-bb4e-4632-bf22-133d9d3fa1f4 / ext3 defaults 1 1
UUID=b036863a-2846-4f57-a6db-e7716f5d903c /home ext3 defaults 1 2
UUID=51e3a299-68f3-466f-86ac-428c60420621 /media/thumbdrive auto defaults 0 0

Now, the USB thumbdrive will always be mounted at /media/thumbdrive :)

Adios.
--
prashant

Wednesday, January 14, 2009

Directx 11 (10) benchmark - comparison with Windows server STD 2008's directx 10

I am trying out both the beta of Windows 7 (build 7000) and release of Windows server 2008 (standard edition). Since it is on the same machine, naturally benchmark comparison was possible and made sense.

Specs :
Windows 7 - Installed on a defragmented 40gig logical partition of 7200SATA disk, which still has other stuff and has 12 gigs free. Regular install, default drivers for sound (doesn't matter in this case).

Windows server 2008 - Installed on a defragmented 20gig primary partition of same disk, which has 5 gigs free. Regular install, Google's drivers for sound (again, doesn't matter in this case).

PC config :
Pentium D 820 (2.8GHz) (I know it is crappy... not a gamer's processor and all, but that's what I've got)
Intel 946GZIS mobo (PCIE 16x), 667MHz FSB.
2Gig (dual channel) RAM.
NVidia 8800GS, factory OC at 650MHz/mem 950MHz, a dx10 card.
Driver version 181.20 (used the same binary on both installations).
(again, that's a budget card; but it is sufficient for me)

3DMark Vantage settings - res 1280x1024, no AA
everything else Extreme, all tests were conducted.

Windows 7 (dx11)
GPU score = 4337.75
CPU score = 17495.47
3DMarks = 5342.16

Windows server 2008 std (dx10)
GPU score = 3975.24
CPU score = 17995.19
3DMarks = 0 (!!)

The demos visually ran faster on DX11 (though I do not have the FPS numbers at the moment).

The GPU scores for DX11 show 9.1% improvement over DX10. In all, windows 7 definitely is worth giving a try IMO (provided I can afford it OR my school has MSDNAA agreement for it). It has other cool 'features' too, of which I doubt the usefulness.

Please drop a comment if you have any suggestions; since as you can see, I am new to graphics benchmarks :)