Saturday, February 18, 2012

Writing kernel in Windows with Visual Studio C/C++

Most hobby osdev projects prefer the *nix + gcc combination these days, primarily because there are plenty of good tutorials and examples available for it. Considering the flexibility, I personally think it is a good choice too. But if you are a heretic by nature and are curious about how you can osdev with just the Express edition of Visual C++ and some other free software, without leaving Windows (or because you are forced to stay there), this post is for you.

In this post I will simply demonstrate the toolchain setup. The complete package can be downloaded from the 'Download' section further down. The actual 'kernel' written in this tutorial is the lame hello-world one. We will get it working with PXE boot in VirtualBox (replace with any emulator of your choice). The same image works without modifications with the default Windows Boot Manager (WBM; the GRUB equivalent, just lamer) and can easily be modified to boot from an MBR/VBR in stages. Let's face it - even osdev noobs these days are so over floppies.

Kernel loader (stub)
For a kernel loader stage, there are many ways of entering protected mode, loading the actual kernel image, relocating and passing control to the kernel. Here is one that I think is simple enough to fit in a blogspot post.

The image loaded by PXE/WBM consists of two parts - the stub and the kernel executable image. Both PXE and WBM (are supposed to) load our image at physical address 0x7c00, or 7c0h:0h in real mode. This means that the stub must be 16bit real mode code that is assembled to run at 7c00h. The kernel executable output by the Visual C++ toolchain (cl.exe and link.exe), on the other hand, is 32bit protected mode code in PE format. Its sections also need to be copied to the addresses they were linked for before it can run. Thus, the stub must set up pmode, relocate the kernel to the address where it is supposed to run and then pass control to it.
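
Setting up pmode mostly comes down to loading a GDT. The stub listed below uses two flat descriptors (base 0, 4 GiB limit), and their raw bytes look like magic numbers at first sight. Here is a rough host-side sketch of how those eight bytes are packed. The helper names are mine and are not part of the download; the byte layout is the standard x86 segment descriptor layout.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative helper (hypothetical name): packs one 8-byte segment
// descriptor in the order the CPU expects to find it in the GDT.
std::array<std::uint8_t, 8> make_descriptor(std::uint32_t base,
                                            std::uint32_t limit,  // 20 bits
                                            std::uint8_t access,  // P, DPL, S, type
                                            std::uint8_t flags)   // G, D/B, L, AVL (4 bits)
{
    assert(limit <= 0xFFFFFu && flags <= 0xFu);
    return {{
        static_cast<std::uint8_t>(limit & 0xFF),           // limit 7:0
        static_cast<std::uint8_t>((limit >> 8) & 0xFF),    // limit 15:8
        static_cast<std::uint8_t>(base & 0xFF),            // base 7:0
        static_cast<std::uint8_t>((base >> 8) & 0xFF),     // base 15:8
        static_cast<std::uint8_t>((base >> 16) & 0xFF),    // base 23:16
        access,                                            // access byte
        static_cast<std::uint8_t>((flags << 4) | ((limit >> 16) & 0x0F)),  // flags + limit 19:16
        static_cast<std::uint8_t>((base >> 24) & 0xFF),    // base 31:24
    }};
}

// The two entries used by the stub: base 0, limit 0xFFFFF counted in 4 KiB
// pages (G=1), 32-bit default operand size (D=1).
std::array<std::uint8_t, 8> flat_code() { return make_descriptor(0, 0xFFFFF, 0x9A, 0xC); }
std::array<std::uint8_t, 8> flat_data() { return make_descriptor(0, 0xFFFFF, 0x92, 0xC); }
```

The selectors 08h and 10h used by the stub are simply the byte offsets of the second and third entries in that table.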

The stub in this tutorial is written in assembler for simplicity. I have used yasm, primarily because it supports both AT&T and Intel syntax and it can output a flat binary.

stub.asm

; toyos boot stub

; irrespective of whether we are (chain)loaded from Windows bootloader or through PXE, we are loaded at 7c0:0
[org 7c00h]
; because i386 is old :P
[CPU P4]


NULL_SEG equ 00h    ; unused
CODE_SEG equ 08h    ; cs
DATA_SEG equ 10h    ; all other segments

section .text
[bits 16]
_entry:
 cli ; no interrupts
 xor ax, ax
 mov ds, ax
 
 lgdt [gdt_desc] ; load GDTR
 mov eax, cr0
 or al, 1
 mov cr0, eax    ; set pmode bit
 jmp CODE_SEG:clear_pipeline ; the far jump flushes the prefetch queue and truly enters pmode

[bits 32]
clear_pipeline:
 mov ax, DATA_SEG
 mov ds, ax
 mov es, ax
 mov ss, ax
 lea esp, [initial_stack_top]
 jmp chain

chain:
 mov byte [0b8000h], '1'     ; just some on-screen debugging
 mov byte [0b8001h], 01bh

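 EXELOADADDR    equ PAYLOAD_ADDRESS  ; the load address of payload (kernel) exe
    ; DOS header offsets (esi = start of the exe)
 sigMZ         equ esi
 PEheaderOffset   equ esi+60         ; e_lfanew
    ; PE header offsets (esi = address of the "PE\0\0" signature)
 sigPE         equ esi
 NumSections      equ esi+6          ; FileHeader.NumberOfSections
 ImageBase      equ esi+52           ; OptionalHeader.ImageBase (set by /BASE)
 EntryAddressOffset   equ esi+40     ; OptionalHeader.AddressOfEntryPoint (an RVA)
 SizeOfNT_HEADERS   equ 248          ; 4 + 20 + 224, valid for PE32 only

    ; section header offsets (esi = start of the section header)
 SectionSize      equ esi+8          ; VirtualSize
 SectionBase      equ esi+12         ; VirtualAddress (an RVA)
 SectionFileOffset   equ esi+20      ; PointerToRawData
 SizeOfSECTION_HEADER   equ 40

 mov esi, EXELOADADDR
 mov eax, [sigMZ]
 cmp ax, 0x5A4D  ; "MZ" signature check
 jnz badPE
 mov eax, [PEheaderOffset]

 add esi, eax    ; esi -> PE header
 mov eax, [sigPE]
 cmp eax, 0x00004550  ; "PE\0\0" signature check
 jnz badPE

 xor edx, edx
 mov dx, [NumSections]
 mov eax, [ImageBase]
 mov ebx, [EntryAddressOffset]

 add ebx, eax    ; entry point = image base + entry RVA
 push ebx

 add esi, SizeOfNT_HEADERS  ; esi -> first section header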

    ; load each section
 .loadloop:
  mov ecx, [SectionSize]
  mov edi, [SectionBase]
  add edi, eax
  mov ebx, [SectionFileOffset]
  add ebx, EXELOADADDR
   
  push esi
   
  mov esi, ebx
  rep movsb       ; copy each section to its respective load/run address
   
  pop esi
  add esi, SizeOfSECTION_HEADER
   
  dec edx
  or edx, edx
  jnz .loadloop

  pop ebx ; restore entry
  jmp ebx ; jump to entry

; PE image invalid
badPE:
 mov eax, 0b8000h
 mov byte [eax], '!'
 mov byte [eax + 1], 01bh

; spin away!    
infloop:
 hlt
 jmp infloop

; data section
section .data
initial_stack:
 times 128 dw 0  ; oughtta be enough
initial_stack_top:

; global descriptor table
gdt:
gdt_null:
    dd 0
    dd 0

gdt_code:
    dw 0FFFFh   ; RTFM
    dw 0
    db 0
    db 10011010b
    db 11001111b
    db 0

gdt_data:
    dw 0FFFFh
    dw 0
    db 0
    db 10010010b
    db 11001111b
    db 0
gdt_end:


gdt_desc:                       ; The GDT descriptor
    dw gdt_end - gdt - 1    ; Limit (size)
    dd gdt                  ; Address of the GDT

align 4
PAYLOAD_ADDRESS:    ; this is where the exe is loaded
stub.asm can be assembled as

yasm -Xvc -f bin --force-strict -o stub.o stub.asm
This should create a flat binary for the stub.
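
If the header walk in the stub is hard to follow in assembler, here is roughly the same logic as a host-side C++ model. It is only a sketch: the function name and the vector standing in for physical memory are my own inventions, and it adds bounds checks that the stub does not have. Like the stub, it assumes a PE32 image with a 224 byte optional header and copies VirtualSize bytes per section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Little-endian reads that do not depend on host alignment or byte order.
static std::uint32_t rd16(const std::uint8_t* p) { return p[0] | (p[1] << 8); }
static std::uint32_t rd32(const std::uint8_t* p) { return rd16(p) | (rd16(p + 2) << 16); }

// Hypothetical model of the stub's loader. `mem` stands in for physical
// memory starting at address `mem_base`. Returns the entry point address,
// or 0 if the image is rejected.
std::uint32_t load_pe32(const std::vector<std::uint8_t>& file,
                        std::vector<std::uint8_t>& mem,
                        std::uint32_t mem_base)
{
    assert(!mem.empty());
    if (file.size() < 64 || rd16(&file[0]) != 0x5A4D) return 0;   // "MZ"
    const std::uint32_t nt = rd32(&file[60]);                      // e_lfanew
    if (nt > file.size() || file.size() - nt < 248) return 0;
    if (rd32(&file[nt]) != 0x00004550) return 0;                   // "PE\0\0"

    const std::uint32_t num_sections = rd16(&file[nt + 6]);
    const std::uint32_t entry_rva    = rd32(&file[nt + 40]);
    const std::uint32_t image_base   = rd32(&file[nt + 52]);

    std::size_t sh = nt + 248;                   // first section header
    for (std::uint32_t i = 0; i < num_sections; ++i, sh += 40) {
        if (sh + 40 > file.size()) return 0;
        const std::uint32_t vsize = rd32(&file[sh + 8]);    // VirtualSize
        const std::uint32_t rva   = rd32(&file[sh + 12]);   // VirtualAddress
        const std::uint32_t raw   = rd32(&file[sh + 20]);   // PointerToRawData
        if (vsize == 0) continue;

        const std::uint64_t dst = std::uint64_t(image_base) + rva;
        if (dst < mem_base) return 0;
        const std::uint64_t off = dst - mem_base;
        if (off + vsize > mem.size()) return 0;              // would overrun memory
        if (std::uint64_t(raw) + vsize > file.size()) return 0;  // would overrun file
        std::memcpy(&mem[off], &file[raw], vsize);           // the stub's rep movsb
    }
    return image_base + entry_rva;
}
```

Feeding kernel.o into a function like this on the host is a cheap way to see where each section would land before trying it in a VM.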

Kernel
If you have reached this point, you probably know what you can/should and can/should not use in a kernel. I am assuming that you have some experience with writing such code in other toolchains.

The kernel for this tutorial is hilariously simple. It can be C or C++.

kernel.cc

// declare strlen so that cl can expand it as an intrinsic (/Oi); no CRT is linked
extern "C" size_t strlen(const char*);

int main()
{
    const char* msg = "yuffie> SO U HACKING ME THEN HUH\n"
                "yuffie> WElL I GOT NEWS FOR U MISTER I GOT MORE FIREWALL POWERS NOW SO IM SECURE AND IM USING WINDOWS 98 SO IM REALLY SECURE FROM HACKERS LIKE YOU SO YOU BETTA JUST GIVE UP CUZ U GOT NO HOPE MISTER.\n"
                "* YuFFie (~mirc@3B942731.dsl.stlsmo.swbell.net) Quit (Quit: Owned.)\n"
                "* YuFFie (~mirc@3B942731.dsl.stlsmo.swbell.net) has joined #\n"
                "yuffie> HELP MY MOUSE IS MOVING BY IT SELF";
    int msgLen = (int)strlen(msg);
    // volatile: these are stores to memory-mapped text mode video memory
    volatile char* vidmem = (volatile char*)0xB8000;
    int vidIdx = 0;
    for(int i = 0; i < msgLen; ++i)
    {
        if(msg[i] == '\n')
        {
            // move to the start of the next row, wrapping after row 25
            vidIdx = (((vidIdx / 80) + 1) * 80) % (80 * 25);
        }
        else
        {
            vidmem[2 * vidIdx] = msg[i];        // character
            vidmem[(2 * vidIdx) + 1] = 0x1b;    // attribute
            ++vidIdx;
        }
    }

    while(true);    // nothing to return to, so spin forever

    return 0;
}
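
The cursor arithmetic in that loop is easy to get wrong, and it can be checked on the host before booting anything. Below is a small model of it that writes into a plain buffer instead of 0xB8000. The function names are mine, and unlike the kernel loop it also wraps after ordinary characters so that the host buffer is never overrun.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kCols = 80;
constexpr int kRows = 25;

// Same newline rule as the kernel: jump to column 0 of the next row and
// wrap to the top of the screen after the last row.
int next_line(int idx) {
    return ((idx / kCols + 1) * kCols) % (kCols * kRows);
}

// Host-side stand-in for the print loop. Returns the text buffer as
// character/attribute byte pairs, like the memory at 0xB8000.
std::vector<std::uint8_t> render(const char* msg, std::uint8_t attr = 0x1b) {
    assert(msg != nullptr);
    std::vector<std::uint8_t> cells(2 * kCols * kRows, 0);
    int idx = 0;
    for (std::size_t i = 0; msg[i] != '\0'; ++i) {
        if (msg[i] == '\n') {
            idx = next_line(idx);
            continue;
        }
        cells[2 * idx] = static_cast<std::uint8_t>(msg[i]);  // character byte
        cells[2 * idx + 1] = attr;                           // attribute byte
        idx = (idx + 1) % (kCols * kRows);  // addition: wrap ordinary characters too
    }
    return cells;
}
```

For instance, rendering "ab\nc" should put 'a' and 'b' in the first two cells of row 0 and 'c' in the first cell of row 1.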

Here is the tricky part. We must compile kernel.cc in such a way that it can run standalone, without any ties to the Windows hosted environment. For Visual C++ 2010, the following switches have worked for me. MSDN explains each of them in detail, so I am going to save on typing.

cl /Zi /nologo /W4 /WX- /O2 /Oi /Oy- /GL /X /c /GF /Gm- /Zp4 /GS- /Gy- /fp:precise /fp:except- /Zc:wchar_t /Zc:forScope /GR- /openmp- /Gd /analyze- /Fd"kernel.pdb" /Fo"kernel.obj" kernel.cc
link /VERBOSE /VERSION:"0.0.0.1" /NOLOGO /NODEFAULTLIB /MANIFEST:NO /ALLOWISOLATION:NO /DEBUG /SUBSYSTEM:NATIVE /LARGEADDRESSAWARE /DRIVER /ENTRY:"main" /BASE:"0x00100000" /FIXED /MACHINE:X86 /ALIGN:4096 /SAFESEH:NO /OPT:REF /OPT:ICF /OUT:kernel.o kernel.obj
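
One thing to watch with these switches: the kernel above only declares strlen and relies on /Oi to expand it inline. With /NODEFAULTLIB there is no C runtime to fall back on, so as far as I can tell any call the compiler decides not to inline would show up as an unresolved external at link time. A hand-written replacement is tiny. The name kstrlen is my own, picked to stay clear of the intrinsic, and the includes are for building the sketch on the host; with /X inside the kernel you would have to supply your own size type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical freestanding replacement for strlen: counts the bytes
// before the terminating NUL without touching any library code.
std::size_t kstrlen(const char* s) {
    assert(s != nullptr);
    const char* p = s;
    while (*p != '\0') {
        ++p;
    }
    return static_cast<std::size_t>(p - s);
}
```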

Creating the boot image
This is the simplest part.

copy /B /Y stub.o+kernel.o kernel

Yay! Now you have a PXE/WBM bootable image!
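
The copy command simply appends kernel.o to stub.o, so the scheme only works if the kernel's MZ header lands exactly where the stub's PAYLOAD_ADDRESS label points, that is at 0x7c00 plus the size of stub.o. If the boot ends at the '!' marker, that is the first thing I would check. A quick host-side sanity check could look like the sketch below; the function names are made up for illustration, and it assumes yasm emitted no bytes after the PAYLOAD_ADDRESS label.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kLoadAddress = 0x7C00;  // where PXE/WBM put the image

// Physical address of the first payload byte once the image is loaded.
std::uint32_t payload_address(std::size_t stub_size) {
    // the 16-bit part of the stub addresses its data with 16-bit offsets
    assert(stub_size < 0x10000u - kLoadAddress);
    return kLoadAddress + static_cast<std::uint32_t>(stub_size);
}

// True if the bytes right after the stub start with the "MZ" signature
// that the stub checks for before loading the kernel.
bool payload_looks_like_exe(const std::vector<std::uint8_t>& image,
                            std::size_t stub_size) {
    return stub_size + 2 <= image.size() &&
           image[stub_size] == 'M' && image[stub_size + 1] == 'Z';
}
```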

PXE boot setup instructions (or lack thereof)
Unless you are using Windows Server as the host OS, chances are that you will have to work with a non-Microsoft solution for DHCP and TFTP servers with PXE boot support. I have used tftpd32 in the past as well as in this tutorial, and have had no complaints thus far. Setting up DHCP and TFTP servers in general is out of scope for this tutorial, but tftpd32 configuration is a walk in the park. Just bind the DHCP and TFTP servers to the local interface, set the TFTP root to the directory holding the boot image and set the boot file name to the image itself. Fire up any virtual machine of your choice that has a PXE boot ROM in its NIC. I have used VirtualBox. Start the VM and boot from LAN. You should see some activity in tftpd32, and moments later you should see the message.

WBM
It is fairly easy to get your kernel listed alongside the Windows entry in the Windows bootloader. There are tutorials [1, 2] on creating new BCD entries. Just add your kernel as a BOOTSECTOR application. EasyBCD is an option in case you are not comfortable with bcdedit.

Download
Using Visual C++ IDE
Download
Sure, these build rules can be set in VC++ projects. Enjoy one-button-hit builds with VC++'s Intellisense. Eventually when you get a debugger stub in your kernel, you could also get source level debugging working within WinDbg.
32bit versions of yasm and tftpd32 are included.

Using Makefiles
Download
I never felt comfortable when an IDE came between me and my custom build rules. If you too think that makefiles are the way to go, nmake works out of the box. I personally think CMake is a better choice if your project gets serious.
32bit versions of yasm and tftpd32 are included.

Please let me know if you decide to use Microsoft compilers for osdev.

Good luck!
