I’ve spent some time learning about operating systems and I believe my programming has benefitted from it. I think I have reached a point where I can provide some help to others who want to learn the basics of how their operating system works with their processor, so I’m planning to write a series about it. I’m new to this, so feel free to email me with corrections or suggestions.

In this post, I’ll write a short bootloader. My introduction to this topic came from Operating Systems: From 0 to 1 [1], which provides a sample bootloader that does not work (on my Ubuntu 20.04 Intel system, with a current version of NASM). I’ll do something similar to what is outlined in chapter 7.

The purpose of a bootloader is to load the operating system into memory and execute it. The bootloader is the first executable code that is loaded into RAM from the storage media that is available to your computer, such as a hard disk or floppy. This post will describe the floppy approach: this is simpler as we do not need to consider disk partitions, but has the drawback that less storage is available [6].

When an Intel processor starts, it is initially in 16-bit real mode, which emulates the operation of the 8086 architecture for legacy reasons [2]. This has a few important implications for a bootloader program:

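  • Registers default to their 16-bit versions: for example, instead of the 32-bit eax or 64-bit rax register you will typically use ax, which refers to the lowest 16 bits of eax or rax on 32-bit and 64-bit processors respectively. (On a 386 or later, 32-bit registers such as eax remain reachable from real mode through an operand-size prefix, but rax does not.)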
  • Memory addressing is frequently done by combining two registers: a segment and an offset. This is a quirk of the 8086. You may see notation such as 0xFFFF:0x0000 or es:bx. You will see this in documentation of BIOS interrupt service routines [3]. To obtain the actual address, the segment value is left shifted by four bits and added to the offset. For example, 0xFFFF:0x0000 becomes 0xFFFF0. For a more detailed discussion, you can refer to [4, p.43].
  • One of the responsibilities of the bootloader (or early kernel code) is to switch the processor from 16-bit real mode to protected mode [4]. I will not discuss this process here, but the implication is that in real mode all facilities of the processor are open to us: nothing stops a malicious bootloader program from using the OUT instruction to write undesirable values to the output ports that are connected to your peripheral devices. You can refer to the Intel manual for more information on protected registers such as the IDTR and GDTR and protected instructions like LIDT, LGDT, OUT, and IN [5].
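
The segment:offset arithmetic from the second point above can be checked with a few lines of Python. This is only a sketch: physical_address is a name made up for this example, and it ignores the address wrap-around behaviour of real hardware.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: the segment shifted left by 4 bits, plus the offset."""
    return (segment << 4) + offset


print(hex(physical_address(0xFFFF, 0x0000)))  # 0xffff0, the example above
print(hex(physical_address(0x0050, 0x0000)))  # 0x500, where the kernel will be loaded
```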

The bootloader program below will execute a BIOS interrupt service routine to load a placeholder kernel program from elsewhere on the floppy disk. It will use interrupt 0x13 with ah = 0x02, where ah is the upper 8 bits of the ax register. It’s also worth noting that hexadecimal numbers in documentation such as the Intel manual or the interrupt list are denoted either with a 0x prefix or an h suffix, so ah = 0x02 and ah = 02h refer to the same 8-bit binary number, 0000 0010.
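
To make the relationship between ax, ah and al concrete, here is a small Python sketch (split_ax is a hypothetical helper, not part of the project). The value 0x0201 is what ax holds just before the interrupt in the program below: ah = 0x02 selects the routine and al = 1 is the sector count.

```python
def split_ax(ax: int) -> tuple:
    """Return (ah, al): the upper and lower 8 bits of a 16-bit ax value."""
    ah = (ax >> 8) & 0xFF
    al = ax & 0xFF
    return ah, al


ah, al = split_ax(0x0201)
# The same value written with a 0x prefix, an h suffix, and in binary.
print(f"ah = 0x{ah:02x} = {ah:02X}h = {ah:08b}b")
```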

The documentation for the interrupt can be found here. Here are the relevant parts:

AH = 02h
AL = number of sectors to read (must be nonzero)
CH = low eight bits of cylinder number
CL = sector number 1-63 (bits 0-5)
     high two bits of cylinder (bits 6-7, hard disk only)
DH = head number
DL = drive number (bit 7 set for hard disk)
ES:BX -> data buffer

Return:
CF set on error
    if AH = 11h (corrected ECC error), AL = burst length
CF clear if successful
AH = status (see #00234)
AL = number of sectors transferred (only valid if CF set for some BIOSes)
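
The CH and CL fields above share the cylinder number between them. A minimal Python sketch of that packing follows; pack_cylinder_sector is a hypothetical name and the range checks are my own reading of the table. For our floppy the cylinder is 0, so cl is simply the sector number.

```python
def pack_cylinder_sector(cylinder: int, sector: int) -> tuple:
    """Return (ch, cl) laid out as the table above describes."""
    if not 0 <= cylinder <= 0x3FF:
        raise ValueError("cylinder must fit in 10 bits")
    if not 1 <= sector <= 63:
        raise ValueError("sector numbers run from 1 to 63")
    ch = cylinder & 0xFF                  # low eight bits of the cylinder
    cl = ((cylinder >> 8) << 6) | sector  # bits 6-7: high cylinder bits, bits 0-5: sector
    return ch, cl


print(pack_cylinder_sector(0, 2))  # (0, 2): the values the bootloader uses
```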

It’s worth looking at a diagram of a floppy disk to understand what the parameters mean (source):

[Figure: diagram of a floppy disk]

A 512-byte sector is the smallest quantity that can be read into memory at one time. A track is a ring of 18 sectors on the disk. Floppy disks have two sides: head 0 refers to the read/write head for the top side, while head 1 is for the bottom. After all sectors from track 0, head 0 have been read, the next sequence of data comes from track 0, head 1, on the bottom of the disk [1].
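
The reading order described above can be written as a formula that turns a (track, head, sector) triple into a position counted from the start of the disk. The Python below is a sketch under the geometry stated in this post (18 sectors per track, two heads); chs_to_index is a name I made up for illustration.

```python
SECTORS_PER_TRACK = 18
HEADS = 2


def chs_to_index(track: int, head: int, sector: int) -> int:
    """Zero-based position of a sector on the disk (sector numbers start at 1)."""
    return (track * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)


print(chs_to_index(0, 0, 1))  # 0: the very first sector on the disk
print(chs_to_index(0, 1, 1))  # 18: the first sector on the bottom of track 0
```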

The bootloader program can only fill a single 512-byte sector because the BIOS program, stored in ROM, reads it from track 0, head 0, sector 1 (sector numbers start at 1, as the interrupt documentation above shows). The program will load the “kernel” from sector 2 of track 0 and execute it:

start:
	cli
	mov ax, 0x50
	mov es, ax		; es:bx is the memory address at which the sector(s) will be loaded
	xor bx, bx		; es is left shifted by 4 bits and added to bx, so es:bx = 0x500
	mov cl, 2		; first sector to read (sectors are numbered from 1)
	mov al, 1		; number of sectors to read, starting from the sector number in cl
	mov ch, 0		; lower eight bits of the cylinder number
	mov dh, 0		; head number
	mov dl, 0		; drive number
	mov ah, 0x02		; used to indicate which routine we want
	int 0x13		; interrupt 0x13 with ah = 02h
	mov bx, 0x0500		; the memory address of the kernel loaded into memory
	jmp bx			; jump to the kernel code


; $ evaluates to the position of the current line and $$ to the beginning of
; the current section, so $-$$ is the number of bytes emitted so far. Padding
; with zeros up to byte 510 leaves room for the two-byte boot signature, which
; makes the binary 512 bytes in total.
times 510-($-$$) db 0
; This is the boot signature. It is a magic number required by the BIOS to be
; in the bootloader program. If we do not include it, we'll get a
; "not a bootable disk" error.
dw 0xAA55
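
As a rough illustration of what the last two lines of the bootloader do, the Python sketch below pads a block of code with zeros to 510 bytes and appends the signature. pad_boot_sector is a hypothetical helper, not something NASM provides. Note that dw 0xAA55 is stored little-endian, so the bytes on disk are 55 aa.

```python
BOOT_SIGNATURE = (0xAA55).to_bytes(2, "little")  # stored on disk as 55 aa


def pad_boot_sector(code: bytes) -> bytes:
    """Mimic `times 510-($-$$) db 0` followed by `dw 0xAA55`."""
    if len(code) > 510:
        raise ValueError("boot code must fit in 510 bytes")
    return code + bytes(510 - len(code)) + BOOT_SIGNATURE


sector = pad_boot_sector(b"\xfa")  # 0xfa is the cli opcode that starts our bootloader
print(len(sector), sector[-2:].hex(" "))
```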

There are a couple of things worth pointing out. First, xor bx, bx clears the bx register to zero, so that when es is left shifted by 4 bits and added to bx, the value is 0x500. Second, aside from the tricky 8086 memory addressing, the program is mostly straightforward, but it departs from chapter 7 of Operating Systems: From 0 to 1 in one place. The program from the book tries to perform a far jump with jmp 0x50:0x0, which induces a jump to 0x00 using the q35 chipset with QEMU 4.2.1 (Debian 1:4.2-3ubuntu6.23) and NASM 2.14.02. I’ll look into this further in another post by working backwards from the bytes generated by NASM to determine which instruction is being produced (GDB’s disassembly can be inconsistent, and was not clear here). In the meantime, placing the address in a register (bx) has solved the issue.

Notice that the value placed in al is the number of sectors to load - you can imagine how you might increase this to load a larger operating system, rather than just the second sector. Most modern operating systems will not fit on a floppy disk, but the procedure for hard disks is similar: a 512-byte sector is read from the desired partition [6, p. 116].
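
To get a feel for the value al would need for a larger kernel, here is a small Python sketch (sectors_needed is a hypothetical helper). Keep in mind that al is only 8 bits wide, so a single call can request at most 255 sectors, and a real BIOS may impose tighter limits.

```python
SECTOR_SIZE = 512


def sectors_needed(size_in_bytes: int) -> int:
    """Number of whole 512-byte sectors required to hold the given number of bytes."""
    return (size_in_bytes + SECTOR_SIZE - 1) // SECTOR_SIZE  # round up


print(sectors_needed(35))    # a tiny kernel fits in one sector
print(sectors_needed(4096))  # a 4 KiB kernel needs eight
```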

All of the code for this project is hosted here with a Makefile. I’ll go through the steps for compiling and debugging the program, so we can verify the code is working. It may be useful to have NASM >= 2.14 and QEMU >= 4.2 installed to match these steps exactly. I’ve included a sample kernel program with the following code:

mov ax, 0x0101
mov ax, 0x0333

marker:	 db "this is where the kernel ends"

This does not accomplish much, but the instructions are unique so we can step through in the debugger and determine if the kernel has been loaded into memory and set to execute.

To compile, we need to convert the assembly instructions into bytes that are interpretable by the processor:

nasm -f bin bootloader.asm -o bootloader
nasm -f bin kernel.asm -o kernel

We need to construct a floppy disk image to provide to QEMU:

dd if=/dev/zero of=disk.img bs=512 count=2880

Lastly, we need to write the bootloader to the first sector (seek=0) and the kernel to the second (seek=1). The conv=notrunc option ensures that the floppy disk image remains 1.44 MB in size. This is the amount of storage available on a standard floppy (two heads * 80 tracks per head * 18 sectors per track * 512 bytes per sector = 1,474,560 bytes, usually quoted as 1.44 MB). See this for additional explanation.

dd conv=notrunc if=bootloader of=disk.img bs=512 count=1 seek=0
dd conv=notrunc if=kernel of=disk.img bs=512 count=1 seek=1
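
For readers who want to see the layout these dd steps produce, the following Python sketch builds the same image in memory. build_image and the constant names are made up for this example. It assumes the bootloader is exactly one sector and the kernel fits in one sector, matching count=1 above.

```python
SECTOR_SIZE = 512
HEADS, TRACKS_PER_HEAD, SECTORS_PER_TRACK = 2, 80, 18
TOTAL_SECTORS = HEADS * TRACKS_PER_HEAD * SECTORS_PER_TRACK  # 2880, the dd count


def build_image(bootloader: bytes, kernel: bytes) -> bytes:
    """Place the bootloader in the first sector and the kernel in the second."""
    if len(bootloader) != SECTOR_SIZE:
        raise ValueError("the bootloader must be exactly one sector")
    if len(kernel) > SECTOR_SIZE:
        raise ValueError("this sketch only handles a one-sector kernel")
    image = bytearray(TOTAL_SECTORS * SECTOR_SIZE)  # all zeros, like /dev/zero
    image[0:SECTOR_SIZE] = bootloader               # seek=0
    start = 1 * SECTOR_SIZE                         # seek=1
    image[start:start + len(kernel)] = kernel
    return bytes(image)                             # size never changes, like conv=notrunc


image = build_image(bytes(510) + b"\x55\xaa", b"\xb8\x01\x01")
print(len(image), hex(len(image)))
```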

We can verify that all of our code up to the end of the kernel program was written to the image using hd, which displays the contents of the binary and decodes any ASCII:

ooc@thinkpad:~/writeyourownos$ hd disk.img 
00000000  fa b8 50 00 8e c0 31 db  b1 02 b0 01 b5 00 b6 00  |..P...1.........|
00000010  b2 00 b4 02 cd 13 bb 00  05 ea 00 00 50 00 00 00  |............P...|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
00000200  b8 01 01 b8 33 03 74 68  69 73 20 69 73 20 77 68  |....3.this is wh|
00000210  65 72 65 20 74 68 65 20  6b 65 72 6e 65 6c 20 65  |ere the kernel e|
00000220  6e 64 73 00 00 00 00 00  00 00 00 00 00 00 00 00  |nds.............|
00000230  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00168000

We can see that the marker message from kernel.asm was included. We inserted the marker using db, which is an assembler directive rather than an instruction that your processor understands: it means “define byte” (similarly, we have dw and dd for defining words and double words respectively). Conveniently, you do not need to type db before every character in a string: each character is converted to its ASCII representation by the assembler.
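
The bytes at offset 0x200 in the dump can be reproduced by hand, which is a useful sanity check. The Python sketch below rebuilds them. mov_ax is a hypothetical helper, and the 0xb8 opcode byte is simply read off the hexdump above.

```python
MOV_AX_OPCODE = 0xB8  # the byte that precedes each 16-bit value in the dump


def mov_ax(value: int) -> bytes:
    """Bytes for `mov ax, value`: the opcode, then the value in little-endian order."""
    return bytes([MOV_AX_OPCODE]) + value.to_bytes(2, "little")


marker = "this is where the kernel ends".encode("ascii")  # what db does with a string
kernel = mov_ax(0x0101) + mov_ax(0x0333) + marker

print(kernel[:16].hex(" "))  # compare with the line at 00000200 above
```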

We can execute our code on QEMU with the following:

qemu-system-i386 -machine q35 -fda disk.img -gdb tcp::26000 -S

The -S option ensures that the machine will not start until we have entered GDB and entered continue. QEMU will greet you with a black screen until you connect with GDB using target remote localhost:26000.

(base) ooc@thinkpad:~$ gdb
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
(gdb) target remote localhost:26000
target remote localhost:26000
Remote debugging using localhost:26000
0x0000fff0 in ?? ()
(gdb) b *0x7c00
Breakpoint 1 at 0x7c00
(gdb) c
Continuing.

Breakpoint 1, 0x00007c00 in ?? ()

The bootloader will be loaded at 0x7c00 in memory by the BIOS. We set a breakpoint there and type c to continue to where the bootloader has been loaded. To get more insight into the current state of the registers and what instructions are being executed, you can type layout reg at the prompt:

(gdb) layout reg
┌──Register group: general─────────────────────────────────────────────────────┐
│eax            0xaa55              43605                                      │
│ecx            0x0                 0                                          │
│edx            0x0                 0                                          │
│ebx            0x0                 0                                          │
│esp            0x6f00              0x6f00                                     │
│ebp            0x0                 0x0                                        │
└──────────────────────────────────────────────────────────────────────────────┘
│B+>0x7c00      cli                                                            │
│   0x7c01      mov    $0xc08e0050,%eax                                        │
│   0x7c06      xor    %ebx,%ebx                                               │
│   0x7c08      mov    $0x2,%cl                                                │
│   0x7c0a      mov    $0x1,%al                                                │
│   0x7c0c      mov    $0x0,%ch                                                │
└──────────────────────────────────────────────────────────────────────────────┘
remote Thread 1.1 In:                                          L??   PC: 0x7c00 
(gdb) 

You can see the first instruction of our bootloader, where we clear the interrupt flag, is at the current breakpoint. Step through all of the interrupt parameters using the ni command until you reach the int instruction. You can type set disassembly-flavor intel to get the familiar assembly format from the .asm files we compiled using NASM.

┌──Register group: general─────────────────────────────────────────────────────┐
│eax            0x201               513                                        │
│ecx            0x2                 2                                          │
│edx            0x0                 0                                          │
│ebx            0x0                 0                                          │
│esp            0x6f00              0x6f00                                     │
│ebp            0x0                 0x0                                        │
│esi            0x0                 0                                          │
│edi            0x0                 0                                          │
│eip            0x7c14              0x7c14                                     │
│eflags         0x46                [ IOPL=0 ZF PF ]                           │
┌──────────────────────────────────────────────────────────────────────────────┐
│   0x7c04      mov    es,eax                                                  │
│   0x7c06      xor    ebx,ebx                                                 │
│   0x7c08      mov    cl,0x2                                                  │
│   0x7c0a      mov    al,0x1                                                  │
│   0x7c0c      mov    ch,0x0                                                  │
│   0x7c0e      mov    dh,0x0                                                  │
│   0x7c10      mov    dl,0x0                                                  │
│   0x7c12      mov    ah,0x2                                                  │
│  >0x7c14      int    0x13                                                    │
│   0x7c16      mov    ebx,0xea0500                                            │
│   0x7c1b      add    BYTE PTR [eax+0x0],dl                                   │
└──────────────────────────────────────────────────────────────────────────────┘
remote Thread 1.1 In:                                          L??   PC: 0x7c14 
(gdb) ni
0x00007c12 in ?? ()
(gdb) set disassembly-flavor intel
(gdb) ni
0x00007c14 in ?? ()
(gdb) 

You can use ni to step over function calls, but this does not work for interrupt service routines. To step over without getting stuck in a long routine, set a breakpoint at the following instruction and continue with c.

┌──Register group: general─────────────────────────────────────────────────────┐
│eax            0x1                 1                                          │
│ecx            0x2                 2                                          │
│edx            0x0                 0                                          │
│ebx            0x0                 0                                          │
│esp            0x6f00              0x6f00                                     │
│ebp            0x0                 0x0                                        │
│esi            0x0                 0                                          │
│edi            0x0                 0                                          │
│eip            0x7c16              0x7c16                                     │
│eflags         0x46                [ IOPL=0 ZF PF ]                           │
┌──────────────────────────────────────────────────────────────────────────────┐
│B+>0x7c16      mov    ebx,0xea0500                                            │
│   0x7c1b      add    BYTE PTR [eax+0x0],dl                                   │
│   0x7c1e      add    BYTE PTR [eax],al                                       │
│   0x7c20      add    BYTE PTR [eax],al                                       │
│   0x7c22      add    BYTE PTR [eax],al                                       │
│   0x7c24      add    BYTE PTR [eax],al                                       │
│   0x7c26      add    BYTE PTR [eax],al                                       │
│   0x7c28      add    BYTE PTR [eax],al                                       │
│   0x7c2a      add    BYTE PTR [eax],al                                       │
│   0x7c2c      add    BYTE PTR [eax],al                                       │
│   0x7c2e      add    BYTE PTR [eax],al                                       │
└──────────────────────────────────────────────────────────────────────────────┘
remote Thread 1.1 In:                                          L??   PC: 0x7c16 
(gdb) ni
0x00007c12 in ?? ()
(gdb) ni
0x00007c14 in ?? ()
(gdb) set disassembly-flavor intel
(gdb) break *0x7c16
Breakpoint 2 at 0x7c16
(gdb) c
Continuing.

Breakpoint 2, 0x00007c16 in ?? ()
(gdb) 

You may notice when stepping through the code in GDB that mov bx, 0x500 becomes mov ebx,0xea0500. I think this is because GDB does not know we are in a 16-bit processor mode: it decodes the instruction with a 32-bit operand and so absorbs the bytes that follow it (including the ea visible in the hexdump earlier) into the value. It might be helped by compiling our assembly in ELF format and supplying GDB with some debugging information [7]. Regardless, the instruction as executed sets bx (the lower 16 bits of ebx) to 0x0500 as desired. Type ni once more to hit the jmp bx instruction. You can see that the kernel has been loaded into memory at 0x500:

┌──Register group: general─────────────────────────────────────────────────────┐
│eax            0x1                 1                                          │
│ecx            0x2                 2                                          │
│edx            0x0                 0                                          │
│ebx            0x500               1280                                       │
│esp            0x6f00              0x6f00                                     │
│ebp            0x0                 0x0                                        │
│esi            0x0                 0                                          │
│edi            0x0                 0                                          │
│eip            0x500               0x500                                      │
│eflags         0x46                [ IOPL=0 ZF PF ]                           │
┌──────────────────────────────────────────────────────────────────────────────┐
│  >0x500       mov    eax,0x33b80101                                          |
│   0x503       mov    eax,0x68740333                                          │
│                                                                              |
│   0x505       add    esi,DWORD PTR [eax+ebp*2+0x69]                          │
│   0x509       jae    0x52b                                                   │
│   0x50b       imul   esi,DWORD PTR [ebx+0x20],0x72656877                     │
│   0x512       and    BYTE PTR gs:[eax+ebp*2+0x65],dh                         │
│   0x517       and    BYTE PTR [ebx+0x65],ch                                  │
│   0x51a       jb     0x58a                                                   │
│   0x51c       gs ins BYTE PTR es:[edi],dx                                    │
│   0x51e       and    BYTE PTR [ebp+0x6e],ah                                  │
│   0x521       fs jae 0x524                                                   │
│   0x524       add    BYTE PTR [eax],al                                       │
└──────────────────────────────────────────────────────────────────────────────┘
remote Thread 1.1 In:                                           L??   PC: 0x500 
(gdb) ni
0x00007c19 in ?? ()
(gdb) ni
0x00000500 in ?? ()
(gdb) 

These are the two placeholder mov instructions from our “kernel” (mov ax, 0x0101 and mov ax, 0x0333), again displayed by GDB as if they took 32-bit operands, so we have succeeded in loading it from the floppy!
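
The odd-looking operands in the GDB panes can be reproduced from the hexdump by reading the same bytes as a 16-bit and then as a 32-bit little-endian value. The Python below is only a sketch of that reading, not a disassembler, and immediate is a name I made up.

```python
def immediate(code: bytes, width: int) -> int:
    """Little-endian value of `width` bytes following a one-byte opcode."""
    return int.from_bytes(code[1:1 + width], "little")


bootloader_bytes = bytes.fromhex("bb 00 05 ea 00")  # offset 0x16 in the hexdump
kernel_bytes = bytes.fromhex("b8 01 01 b8 33")      # offset 0x200 in the hexdump

print(hex(immediate(bootloader_bytes, 2)))  # 0x500, what the processor uses in real mode
print(hex(immediate(bootloader_bytes, 4)))  # 0xea0500, what GDB displays
print(hex(immediate(kernel_bytes, 2)))      # 0x101
print(hex(immediate(kernel_bytes, 4)))      # 0x33b80101
```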

If any of the general purpose assembly instructions were confusing, you can refer to [7] or [8]. The former has useful discussion of memory addressing and processor modes; the latter covers more ground and has some discussion of floating point coprocessor instructions. The Intel instruction reference [5] may also be useful for information about rarer instructions like cli.

Several tutorials exist for writing your own operating system, but most cover few topics. Since writing an OS is a big project and my goal is to understand the fundamentals, future posts will describe the steps that were most difficult for me. I do not plan to cover the bulk of the operating system code in my posts, so I’ll refer to more detailed resources with the specifics of what I learned from them or what code I used. Linus took a similar approach when writing Linux: many remnants of MINIX are still visible in the source code (compare MINIX ioctls and Linux ioctls, for example).

Hopefully this helped to give you a better understanding of one of the ways that your operating system interfaces with hardware, via the BIOS. Here are a few additional teaching operating systems which might be useful resources:

You are welcome to email me with corrections or suggestions, as I’m still new to operating systems development. Citations: