January 23, 2022

Bootloader basics

I spent a few days playing around with bootloaders for the first time. This post builds up to a text editor with a few keyboard shortcuts. I'll be giving a virtual talk based on this work at Hacker Nights on Jan 27.

There are a definitely bugs. But it's hard to find intermediate resources for bootloader programming so maybe parts of this will be useful.

If you already know the basics and the intermediates and just want a fantastic intermediate+ tutorial, maybe try this. It is very good.

The code on this post is available on Github, but it's more of a mess than my usual project.

Motivation: Snake

You remember snake bootloader in a tweet from a few years ago?

Install qemu (on macOS or Linux), nasm, and copy the snake.asm source code to disk from that blog post.

$ cat snake.asm
          [bits 16]                    ; Pragma, tells the assembler that we
                                       ; are in 16 bit mode (which is the state
                                       ; of x86 when booting from a floppy).
          [org 0x7C00]                 ; Pragma, tell the assembler where the
                                       ; code will be loaded.

          mov bl, 1                    ; Starting direction for the worm.
          push 0xa000                  ; Load address of VRAM into es.
          pop es

restart_game:
          mov       si, 320*100+160    ; worm's starting position, center of
                                       ; screen

          ; Set video mode. Mode 13h is VGA (1 byte per pixel with the actual
          ; color stored in a palette), 320x200 total size. When restarting,
          ; this also clears the screen.
          mov       ax, 0x0013
          int       0x10

          ; Draw borders. We assume the default palette will work for us.
          ; We also assume that starting at the bottom and drawing 2176 pixels
          ; wraps around and ends up drawing the top + bottom borders.
          mov       di, 320*199
          mov       cx, 2176
          rep
draw_loop:
          stosb                        ; draw right border
          stosb                        ; draw left border
          add       di, 318
          jnc       draw_loop          ; notice the jump in the middle of the
                                       ; rep stosb instruction.

game_loop:
          ; We read the keyboard input from port 0x60. This also reads bytes from
          ; the mouse, so we need to only handle [up (0x48), left (0x4b),
          ; right (0x4d), down (0x50)]
          in        al, 0x60
          cmp       al, 0x48
          jb        kb_handle_end
          cmp       al, 0x50
          ja        kb_handle_end

          ; At the end bx contains offset displacement (+1, -1, +320, -320)
          ; based on pressed/released keypad key. I bet there are a few bytes
          ; to shave around here given the bounds check above.
          aaa
          cbw
          dec       ax
          dec       ax
          jc        kb_handle
          sub       al, 2
          imul      ax, ax, byte -0x50
kb_handle:
          mov       bx, ax

kb_handle_end:
          add       si, bx

          ; The original code used set pallete command (10h/0bh) to wait for
          ; the vertical retrace. Today's computers are however too fast, so
          ; we use int 15h 86h instead. This also shaves a few bytes.

          ; Note: you'll have to tweak cx+dx if you are running this on a virtual
          ; machine vs real hardware. Casual testing seems to show that virtual machines
          ; wait ~3-4x longer than physical hardware.
          mov       ah, 0x86
          mov       dh, 0xef
          int       0x15

          ; Draw worm and check for collision with parity
          ; (even parity=collision).
          mov ah, 0x45
          xor [es:si], ah

          ; Go back to the main game loop.
          jpo       game_loop

          ; We hit a wall or the worm. Restart the game.
          jmp       restart_game

TIMES 510 - ($ - $$) db 0              ; Fill the rest of sector with 0
dw 0xaa55                              ; Boot signature at the end of bootloader

Now run:

$ nasm -f bin snake.asm -o snake.bin
$ qemu-system-x86_64 -fda snake.bin

Recording of snake bootloader

What a phenomenal hack.

I'm not going to get anywhere near that level of sophistication in this post but I think it's great motivation.

Hello world

Bootloaders are a mix of assembly programming and BIOS APIs for I/O. Since you're thinking about bootloaders you already know assembly basics. Now all you have to do is learn the APIs.

The hello world bootloader has been explained in detail (see here, here, and here) so I won't go into too much line-by-line depth.

In fact, let's just pull the code from the latter blog post.

$ cat hello.asm
bits 16 ; tell NASM this is 16 bit code
org 0x7c00 ; tell NASM to start outputting stuff at offset 0x7c00
boot:
    mov si,hello ; point si register to hello label memory location
    mov ah,0x0e ; 0x0e means 'Write Character in TTY mode'
.loop:
    lodsb
    or al,al ; is al == 0 ?
    jz halt  ; if (al == 0) jump to halt label
    int 0x10 ; runs BIOS interrupt 0x10 - Video Services
    jmp .loop
halt:
    cli ; clear interrupt flag
    hlt ; halt execution
hello: db "Hello world!",0

times 510 - ($-$$) db 0 ; pad remaining 510 bytes with zeroes
dw 0xaa55 ; magic bootloader magic - marks this 512 byte sector bootable!

The computer boots, prints "Hello world!" and hangs.

But aside from clerical settings (16-bit assembly, where the program exists in memory, padding to 512 bytes) the only real bootloader-y magic in there is int 0x10, a BIOS interrupt.

BIOS interrupts = API calls for I/O

BIOS interrupts are API calls. Just like syscalls in userland programs they have a specific register convention and number to call for the family of APIs.

When you write bootloader programs you'll spend most of your time at first trying to understand the behavior of the various BIOS APIs.

The two families we'll deal with in this post are the keyboard family (documentation here) and the display family (documentation here).

Run hello world

Anyway, back to the hello world. Assemble it with nasm and run it with qemu.

$ nasm -f bin hello.asm -o hello.bin
$ qemu-system-x86_64 -fda hello.bin

Printing hello world

Getting the hang of it?

IO Loop

The specific function we called above to write a character to the display is INT 10,E. The 0x10 is the argument that you call the int keyword with (e.g. int 0x10). And the E is the specific function within the 0x10 family. The E is written into the AH register before calling int. The ASCII code to be written is placed in the AL register.

Now that output makes some sense, let's do input. In the keyboard services documentation you may notice that INT 16,0 provides a way to block for user input. According to that page the ASCII character will be in AL when the interrupt returns.

Clearing the screen

You may have noticed some text gets displayed before our program runs. We can use INT 0x10,0 to clear the screen.

        ;; Clear screen
        mov ah, 0x00
        mov al, 0x03
        int 0x10

All together

Since the display function reads from the same register the input function outputs to, we can just call both interrupts after each other. Wrap this in a loop and we have the world's worst editor.

$ cat ioloop.asm
bits 16
org 0x7c00

main:
        ;; Clear screen
        mov ah, 0x00
        mov al, 0x03
        int 0x10

.loop:
        ;; Read character
        mov ah, 0
        int 0x16

        ;; Print character
        mov ah, 0x0e
        int 0x10

        jmp .loop

times 510 - ($-$$) db 0 ; pad remaining 510 bytes with zeroes
dw 0xaa55 ; magic bootloader magic - marks this 512 byte sector bootable!

By the way, the main label here (like the boot label above in hello.asm) is only to help the reader. It is not something the BIOS uses.

Now that we've got the code, let's run it!

$ nasm -f bin ioloop.asm -o ioloop.bin
$ qemu-system-x86_64 -fda ioloop.bin

Recording of ioloop bootloader

Digression on abstraction

There are two ways to build abstractions: assembly functions and nasm macros.

We could build a clear screen function like this:

clear_screen:
        ;; Clear screen
        mov ah, 0x00
        mov al, 0x03
        int 0x10
        ret

And then we can call this in the ioloop program like so:

bits 16
org 0x7c00

jmp main

clear_screen:
        ;; Clear screen
        mov ah, 0x00
        mov al, 0x03
        int 0x10
        ret

main:
        call clear_screen

.loop:
        ;; Read character
        mov ah, 0
        int 0x16

        ;; Print character
        mov ah, 0x0e
        int 0x10

        jmp .loop

times 510 - ($-$$) db 0 ; pad remaining 510 bytes with zeroes
dw 0xaa55 ; magic bootloader magic - marks this 512 byte sector bootable!

On the other hand if you do it in a macro:

bits 16
org 0x7c00

jmp main

%macro cls 0 ; Zero is the number of arguments
        mov ah, 0x00
        mov al, 0x03
        int 0x10
%endmacro

main:
        cls

.loop:
        ;; Read character
        mov ah, 0
        int 0x16

        ;; Print character
        mov ah, 0x0e
        int 0x10

        jmp .loop

times 510 - ($-$$) db 0 ; pad remaining 510 bytes with zeroes
dw 0xaa55 ; magic bootloader magic - marks this 512 byte sector bootable!

And nasm macros even have a way to write macro-safe labels by prefixing them with %% which is useful if you have conditions or loops within a macro.

The benefit of a macro I guess is that you're not using up the stack. The benefit of a function call is that you're not duplicating code every place you use a macro. The amount of generated code eventually becomes important in bootloaders because the code must fit into 512 bytes.

I lean more toward using macros in this code.

Complex input

Reading ASCII characters is not complicated as we saw above. But what if we want to build Readline style shortcuts like ctrl-a for jumping to the start of the line?

Using INT 16,0 as we do above is fine. But rather than solely reading from the result of that function call, there is a section of memory that contains both the character pressed and control characters pressed.

Based on documentation for this memory area (found here or here), we can build a macro for reading the pressed character:

%macro mov_read_character_into 1
  mov eax, [0x041a]
  add eax, 0x03fe   ; Offset from 0x0400 - sizeof(uint16) (since head points to next free slot, not last/current slot)
  and eax, 0xFFFF

  mov %1, [eax]
  and %1, 0xFF
%endmacro

And another macro for reading the pressed control character (if any):

%macro mov_read_ctrl_flag_into 1
  mov %1, [0x0417]
  and %1, 0x04    ; Grab 3rd bit: %1 & 0b0100
%endmacro

Cursor location

Lastly we'll use some cursor APIs that that allow us to handle newlines, backspace on the first column of a line, and ctrl-a (jump to beginning of line).

%macro get_position 0
  mov ah, 0x03
  int 0x10
%endmacro

%macro set_position 0
  mov ah, 0x02
  int 0x10
%endmacro

But there's something buggy about my goto_end_of_line function. Sometimes it works and sometimes it just jumps all over the screen in an infinite loop. Part of the problem is that the editor memory is the video card. The cursor location is only stored there and not in some program state like you might do in a high-level environment/language.

goto_end_of_line:
;; Get current character
  mov ah, 0x08
  int 0x10

;; Iterate until the character is null
  cmp al, 0
  jz .done

  inc dl
  set_position
  jmp goto_end_of_line

.done:
  ret

Alright, let's put all these pieces together.

Editor with keyboard shortcuts

Start with the basics in editor.asm.

; -*- mode: nasm;-*-

bits 16
org 0x7c00

  jmp main

Then add a clear screen macro.

%macro cls 0
  mov ah, 0x00
  mov al, 0x03
  int 0x10
%endmacro

Add macros for reading and printing.

%macro read_character 0
  ;; Read character
  mov ah, 0
  int 0x16
%endmacro

%macro print_character 1
  mov ax, %1
  mov ah, 0x0e
  int 0x10
%endmacro

Add cursor utilities.

%macro get_position 0
  mov ah, 0x03
  int 0x10
%endmacro

%macro set_position 0
  mov ah, 0x02
  int 0x10
%endmacro

goto_end_of_line:
;; Get current character
  mov ah, 0x08
  int 0x10

;; Iterate until the character is null
  cmp al, 0
  jz .done

  inc dl
  set_position
  jmp goto_end_of_line

.done:
  ret

And keyboard utilities.

%macro mov_read_ctrl_flag_into 1
  mov %1, [0x0417]
  and %1, 0x04    ; Grab 3rd bit: %1 & 0b0100
%endmacro

%macro mov_read_character_into 1
  mov eax, [0x041a]
  add eax, 0x03fe   ; Offset from 0x0400 - sizeof(uint16) (since head points to next free slot, not last/current slot)
  and eax, 0xFFFF

  mov %1, [eax]
  and %1, 0xFF
%endmacro

Now we can start the editor loop where we wait for a keypress and handle it.

editor_action:
  read_character

Don't print ASCII garbage if the key pressed is an arrow key. Just do nothing. (This isn't good editor behavior in general but ours is a limited one.)

;; Ignore arrow keys
  cmp ah, 0x4b    ; Left
  jz .done
  cmp ah, 0x50    ; Down
  jz .done
  cmp ah, 0x4d    ; Right
  jz .done
  cmp ah, 0x48    ; Up
  jz .done

Next handle backspace.

;; Handle backspace
  cmp al, 0x08
  jz .is_backspace

  cmp al, 0x7F    ; For mac keyboards
  jnz .done_backspace

.is_backspace:

  get_position

If this key is pressed at the first line and the first column, do nothing.

;; Handle 0,0 coordinate (do nothing)
  mov al, dh
  add al, dl
  jz .overwrite_character

Otherwise if backspace is pressed not at the beginning of the line, just overwrite the last character with the ASCII 0 (the code 0 not the digit 0).

  cmp dl, 0
  jz .backspace_at_start_of_line
  dec dl      ; Decrement column
  set_position
  jmp .overwrite_character

Otherwise you're at the beginning of the line and you need to jump to the end of the previous line.

.backspace_at_start_of_line:
  dec dh      ; Decrement row
  set_position

  call goto_end_of_line

.overwrite_character:
  mov al, 0
  mov ah, 0x0a
  int 0x10

  jmp .done

.done_backspace:

Next we handle the Enter key. This should move the cursor onto the next line and set the column back to zero.

;; Handle enter
  mov_read_character_into ax
  cmp al, 0x0d
  jnz .done_enter

  get_position
  inc dh      ; Increment line
  mov dl, 0     ; Reset column
  set_position

  jmp .done

.done_enter:

Next we handle ctrl-a, jump to start of line.

;; Handle ctrl- shortcuts

;; Check ctrl key
  mov_read_ctrl_flag_into ax
    jz .ctrl_not_set

;; Handle ctrl-a shortcut
  mov_read_character_into ax
  cmp al, 1     ; For some reason with ctlr, these are offset from a-z
  jnz .not_ctrl_a

;; Reset column
    mov dl, 0
    set_position

    jmp .done

.not_ctrl_a:

For ctrl-e, jump to the end of the line.

;; Handle ctrl-e shortcut
  mov_read_character_into ax
  cmp al, 5
  jnz .not_ctrl_e

  call goto_end_of_line
  jmp .done

.not_ctrl_e:
  jmp .done

.ctrl_not_set:

Finally if none of these cases are met, just print the pressed character and return.

  mov_read_character_into ax
  print_character ax

.done:
  ret

Finally, create the main function that calls this editor code in a loop.

main:
  cls

.loop:
  call editor_action
  jmp .loop

times 510 - ($-$$) db 0 ; pad remaining 510 bytes with zeroes
dw 0xaa55 ; magic bootloader magic - marks this 512 byte sector bootable!

And we're done! Try it out:

$ nasm -f bin editor.asm -o editor.bin
$ qemu-system-x86_64 -fda editor.bin

Recording of a bad editor

Tedious and buggy! But I learned something, I think.