March 12, 2018

Blinking LEDs w/ MSP430 assembly pt1

Let’s blink an LED on an MSP430 LaunchPad the old-fashioned way: assembly! Don’t worry, the MSP430 has a nice, small RISC instruction set. There are only 27 core instructions, plus 24 higher-level ones emulated by the assembler.

The full user’s guide for the MSP430 microcontroller I will be covering (on an older LaunchPad) is available here.
Grab a copy, because it covers the instruction set and microcontroller completely.

First, I will present the source with the CCS-generated boilerplate for ELF sections and stack setup:

            .cdecls C,LIST,"msp430.h"       ; Include device header file
            .def    RESET                   ; Export program entry-point to
                                            ; make it known to linker.
            .text                           ; Assemble into program memory.
            .retain                         ; Override ELF conditional linking
                                            ; and retain current section.
            .retainrefs                     ; And retain any sections that have
                                            ; references to current section.

    RESET       mov.w   #__STACK_END,SP         ; Initialize stackpointer
    StopWDT     mov.w   #WDTPW|WDTHOLD,&WDTCTL  ; Stop watchdog timer

            ; our code will go here moving forward

    .global __STACK_END
    .sect   .stack
    .sect   ".reset"                ; MSP430 RESET Vector
    .short  RESET

For all intents and purposes we can ignore the setup scaffolding. It just turns off the watchdog timer and sets up the stack and reset vector. This lets you press the reset button to jump to our RESET label.

To blink an LED on a microcontroller you need to do 2 things:

  • Set the pin(s) of the PORT the LED is connected to to OUTPUT
  • Switch the output high/low

On the TI LaunchPad, there are two LEDs on PORT1. For simplicity we can set the entire port to output. To do this, mov an immediate value into the port’s memory-mapped direction register like so:

mov #0FFh, P1DIR ; P1DIR is port1 direction control. '#0FFh' is an immediate value in hex

Now the entire port is set as output. To check, you could just try to light up all the LEDs by turning all the pins on:

mov #0FFh, P1OUT

However, this isn’t very exciting because they just stay lit. We need to toggle. The easiest way is an exclusive or (xor) of the bits.

xor #0FFh, P1OUT  ; flip all bits in P1OUT

However, we need to loop over this so we don’t just change the output once and fall through:

    LOOP    xor #0FFh, P1OUT  ; flip all bits in P1OUT
            jmp LOOP

Astute readers know what’s coming next. If you run this code the LEDs will appear to be solid. That is because the loop executes so fast the human eye cannot see the difference: each pass is only a couple of instructions, so the pins toggle many thousands of times per second.

I lied when I said we only need to do 2 things. We need a delay loop. Unfortunately we do not have something as convenient as Wiring’s delay(). We need to make a delay loop by hand! This is much easier than it sounds. There are a few ways to do so; we will do the naive approach first:

  • Put a big value in a register
  • decrement the value until it hits zero
  • once it hits zero, execute our toggle code

The tricky part about this approach is figuring out how big of a number you need, which can depend on the MCU clock rate, and what instructions you effectively execute within the loop. Precise timing can be achieved with some math and nop sleds. But for our purpose, we can ballpark some visible delay.

Let’s stuff our countdown in the first general register, R4. The MSP430 is a 16-bit CPU, so let’s try the biggest 16-bit unsigned value to start with:

mov #0FFFFh, R4

Now let’s run it down to zero. We can check if it hit zero using the status register SR, which is like the x86 EFLAGS register. Its bits are set automatically after certain instructions, and the user’s guide documents which flags each instruction affects. Rather than inspecting SR by hand, we can use a conditional jump that is taken as long as the zero flag is not set: jnz.

    DELAYLOOP
            SUB #1h, R4
            jnz DELAYLOOP   ; keep looping until R4 hits zero
    ; we fall through here once delay loop is done

This will spin-wait until R4 hits zero and then fall through. Now all that’s left is to jump to the top of the loop again:

    jmp LOOP 

This will start the whole process over. On a MSP430G2553, this first try at a delay loop works out to around 250ms, making for a nice blinking rate.

For completeness, the entire body of our effective code looks like this (with some added comments):

            mov #0FFh, P1DIR    ; setup - set port1 to output
    LOOP    xor #0FFh, P1OUT    ; flip port 1 bits
            mov #0FFFFh, R4     ; R4 will be our delay counter

    DELAYLOOP
            SUB #1h, R4         ; subtract 1 from R4...
            jnz DELAYLOOP       ; if we hit zero we're done with delay loop
            jmp LOOP

This code can be simplified even further, do you see how? This is an exercise left to the reader. It’s also midnight and I’m starting to get tired in my old age.

In part two we will use the built in timer peripherals of the microcontroller and use interrupts for a cleaner approach. After that it will be on to PWM (Pulse Width Modulation) duty cycles and controlling a servo using an ADC input.

asm embedded msp430 blinky assembly
March 8, 2018

Simple sampling with Box-Muller transforms pt 1

I’m a technically uneducated idiot, so talking about math is a bit above my pay grade, but the Box-Muller transform is a straightforward, computationally simple way to turn uniform random numbers into normally distributed samples.

This is useful in many cases where you need to generate samples for simulations, so I think every programmer should have familiarity with it.

Let’s take the example code from Wikipedia and convert it to standard C. This program will just print 60 x/y samples.

First, check out the cool visualization of the transform on uniform inputs from Wikipedia! If you right click and view the image directly, you can hover on the crosses to see points there.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>
#include <time.h>

static const double epsilon = DBL_MIN;
static const double pi2 = 2.0 * M_PI;

struct sample {
    double z0;
    double z1;
};

struct sample box_muller_sample(void) {
    double u1;
    double u2;
    struct sample ret;

    /* reject u1 values too close to zero, since log(0) is undefined */
    do {
        u1 = rand() * (1.0 / RAND_MAX);
        u2 = rand() * (1.0 / RAND_MAX);
    } while (u1 <= epsilon);

    ret.z0 = sqrt(-2.0 * log(u1)) * cos(pi2 * u2);
    ret.z1 = sqrt(-2.0 * log(u1)) * sin(pi2 * u2);
    /* no sigma or mu */
    return ret;
}

int main(int argc, char **argv) {
    time_t curtime;

    /* seed the generator so each run prints different samples */
    time(&curtime);
    srand((unsigned int)curtime);

    puts("Box-Muller samples:");
    for (int line = 0; line < 10; line++) {
        for (int i = 0; i < 6; i++) {
            struct sample bm = box_muller_sample();
            printf("%0.3f:%0.3f\t", bm.z0, bm.z1);
        }
        putchar('\n');
    }
    return 0;
}

A sample output looks like this:

Box-Muller samples:
-0.559:0.786    -0.798:-0.640   -0.365:-0.140   1.417:-0.197    0.210:1.734     0.738:1.026
-2.334:0.079    -0.973:-0.634   -0.674:1.263    0.765:1.705     -0.560:0.860    -0.576:-0.044
-1.242:-1.705   1.705:-0.471    0.063:-0.079    0.732:-1.173    -0.242:-0.756   -1.798:0.853
-1.088:1.138    2.096:1.439     0.556:0.168     -0.105:0.132    -0.397:-0.728   0.090:0.109
-0.850:-0.034   0.930:0.108     -1.105:0.016    -0.592:-1.795   1.073:-0.227    -1.198:-0.517
-0.901:-1.978   -0.524:-0.090   0.821:-1.595    0.783:-0.051    0.580:-0.458    0.416:1.115
0.888:0.337     -0.074:-0.526   1.667:0.233     -1.339:0.075    0.563:-0.379    0.236:-0.229
0.222:0.472     -0.364:0.061    0.413:1.185     0.265:0.083     -1.120:1.978    0.939:0.807
-0.344:1.586    0.162:-0.274    -0.309:-0.098   -0.728:-0.441   -0.358:-0.310   1.324:-0.816
0.187:-0.160    0.035:0.083     1.386:-0.823    0.117:-1.897    0.475:-0.387    0.295:-0.382

In a future post I’ll cover an even more efficient algorithm for generating normally distributed samples, which is appropriate when a very large number of random samples is needed. It’s called the Ziggurat algorithm.

c sampling math
February 25, 2018

Creating multiboot ELF kernels with FASM

I’ve been meaning to write about this one for a while, since I hacked together some simple OSes last summer! Multiboot is magical.

Multiboot is a specification that provides a standard format for bootloaders to load kernels.

The most common (and in fact the only one I know of) implementation is, unsurprisingly, GRUB. There are actually two multiboot specifications at this time:

  • Multiboot 1 (1995)
  • Multiboot 2 (around 2007)

I will cover creating a multiboot 1 loadable 32-bit kernel using FASM. The killer nice thing is, you can boot an ELF directly with Multiboot.


Prerequisites

  • Knowledge of x86 assembly in Intel syntax
  • FASM: available on Windows and Linux. Written w/ 1.71.62
  • QEMU: qemu-system-i386 is required if you want to try booting it

A basic knowledge of ELF layout is strongly recommended. If you are not familiar with ELF sections, this may be a bit foreign. I recommend at least skimming an overview of the sections; here is a good overview. Ange Albertini has amazing poster graphics as well here.

Creating an ELF with FASM

I have to admit the only assembler I remember how to use well enough anymore is FASM, so I will cover how to generate a bootable ELF with it. It is probably easy enough to translate mnemonics to NASM or GAS if you have the docs handy. One benefit of using FASM is it handles linkage for you, so you can assemble a working binary in a single step. However, NASM is more popular.

Our first step is to generate an ELF. This will make a buildable empty ELF:

format elf ; you can use elf or binary with mb kludge
org 0x100000 ; this will be your kernel reserved memory
use32 ; 32 bit

_start:
; we will add multiboot header here!

_kstart:
; kernel here

; data here
_end_data:
; bss section here

; reserve bytes for kernel stack
rb 16384
_kstack: ; stack grows down, so its top is the end of the reserved block

_end:


Check your work pt 1

Save this file as kernel1.asm. Make sure you didn’t typo it by trying to assemble it:

$ fasm kernel1.asm
flat assembler  version 1.71.62  (1048576 kilobytes memory)
2 passes, 16683 bytes.

If you get no errors (and a kernel1.o) it worked. We can now move on to adding a multiboot header!

Header magic

The only thing you have to do for a binary to be loadable is include a correctly formatted header (and add it to your grub list). Importantly you do not need to fill out the complete header. I’ve recreated the multiboot 1 header here:

Offset  Type  Field name     Note
0       u32   magic          required
4       u32   flags          required
8       u32   checksum       required
12      u32   header_addr    present if flags[16] set
16      u32   load_addr      present if flags[16] set
20      u32   load_end_addr  present if flags[16] set
24      u32   bss_end_addr   present if flags[16] set
28      u32   entry_addr     present if flags[16] set
32      u32   mode_type      present if flags[2] set
36      u32   width          present if flags[2] set
40      u32   height         present if flags[2] set
44      u32   depth          present if flags[2] set

You can see the multiboot1 docs here, as well as the full latest specification; scroll to ‘Boot information format’.

The idea is to setup magic, flags, checksum correctly. You toggle certain flags to have multiboot fill out fields you are interested in and let it know what it is loading.

Adding multiboot header

To make the ELF we created above bootable, we need to fill out the multiboot header. It needs to be the first thing in the binary the multiboot loader will see. This means we need to stick it directly under _start and that is why there is _start and _kstart.

I will present the entire filled out header below. Add it below the _start label we created before. Save this file as kernel2.asm to keep track of our work.

; ... snip

; this is the multiboot header
mbflags=0x03 or (1 shl 16)
dd 0x1BADB002
dd mbflags   ; 4k alignment, provide meminfo
dd -0x1BADB002-mbflags  ; mb checksum
    dd _start       ; header_addr
    dd _start       ; load_addr
    dd _end_data    ; load_end_addr
    dd _end         ; bss_end_addr
    dd _kstart      ; entry point
; end mb header

Here is an explanation of what each line is accomplishing

mbflags=0x03 or (1 shl 16)

This is taking 0x03 and bitwise or’ing it with (1 << 16). 1 shifted left 16 is 65536.

In other words, it is taking these two binary values (shown as 32 bit/u32, since bit 16 does not fit in 16 bits):

(Note: most significant bit first)

0x03:         0000 0000 0000 0000 0000 0000 0000 0011
(1 shl 16):   0000 0000 0000 0001 0000 0000 0000 0000

bitwise or’d: 0000 0000 0000 0001 0000 0000 0000 0011 (0x00010003)

Following the multiboot docs,

we have set the following bits:

  • bit 0 (MULTIBOOT_ALIGN): align on page boundaries
  • bit 1 (MULTIBOOT_MEMINFO): fill out the mem_* fields of the boot information structure
  • bit 16 (MULTIBOOT_AOUT_KLUDGE): fields at 12-28 of MB header are valid, use those over the ELF header to determine loading addresses.

The big gotcha for me: the docs say bit 16 is not required for ELF format kernels:

This information does not need to be provided if the kernel image is in elf format, but it must be provided if the images is in a.out format or in some other format.
(which is known in sources as `MULTIBOOT_AOUT_KLUDGE`)

I always had to provide it even with the above ELF. I’m unsure if I built the sections incorrectly or otherwise, but all online sources I could find did the same when building ELF kernels.

We will put these flags in the header later on.

dd 0x1BADB002

This satisfies the ‘magic’ part of the header for multiboot1.

dd mbflags

This puts the flags we set above in the right spot.

dd -0x1BADB002-mbflags

This is a tricky way to set the checksum. The 3 required fields (magic, flags, checksum) must add up to 0 mod 2^32, so the checksum is simply the negated sum of the other two.

I will breeze over the rest. Loading our _start, _end_data, _end and _kstart addresses into the header with dd tells multiboot where our ELF sections are and where to jump to after loading.

That’s it (phew)!

Let’s add a hlt in _kstart:

_kstart:
    hlt

This gives our _kstart label an effective memory address for the header’s entry point.

Check your work pt 2

Make sure we can assemble:

$ fasm kernel2.asm
flat assembler  version 1.71.62  (1048576 kilobytes memory)
2 passes, 16683 bytes.

In theory, this kernel2.o is actually bootable but we won’t be able to tell because it will hang with the QEMU boot messages still visible.

Add some video functionality

Let’s add some basic video memory functionality. I will not cover this in detail; I will provide working code, but understanding it is an exercise for the reader.

It will clear the framebuffer and write a message, confirming we did actually load and jump to our _kstart code.

Here is the full listing with a hello world:

; fasm multiboot example
; this shows how to use elf (or bin if you want)
; fasm output with a multiboot header with grub

format elf ; you can use elf or binary with mb kludge
org 0x100000

_start:
; this is the multiboot header
mbflags=0x03 or (1 shl 16)
dd 0x1BADB002
dd mbflags   ; 4k alignment, provide meminfo
dd -0x1BADB002-mbflags  ; mb checksum
    dd _start       ; header_addr
    dd _start       ; load_addr
    dd _end_data    ; load_end_addr
    dd _end         ; bss_end_addr
    dd _kstart      ; entry point
; end mb header

; code
_kstart:
    ; set stack right away
    mov esp, _kstack

    ; grub sets up 80x25 mode for us
    mov edi, 0xB8000 ; video memory

    ; the screen data is left as is, showing
    ; qemu boot messages and crap, so clear it out

    ; since we dont care about the actual
    ; rows and heights we can just linearly nuke the total
    ; num of bytes:
    ; 80x25 = 2000 chars,
    ; each visible char has a value byte and display control byte
    ; so total bytes = 2000 * 2, 4000 bytes
    ; however we can simplify this by setting a full 32 bits each
    ; loop (4 bytes or 2 chars)
    mov ecx, 1000
    @@:
        ; set 1F control bytes, 0x00 text bytes
        ; (-4 so ecx = 1000..1 covers byte offsets 3996..0)
        mov dword [edi + ecx * 4 - 4], 0x1F001F00
    loop @b

    ; now display a message before halting
    mov esi,msg
    mov ecx,msglen
    cld                         ; make movsb count upwards
    @@:
        movsb                   ; copy one char, advances esi and edi
        mov byte [edi], 0x1F    ; attribute byte for that char
        inc edi
    loop @b

    hlt

; data section
msg db 'hello from a multiboot elf'
msglen = $ - msg
_end_data:

; bss uninit data here

; reserve the number of bytes of how big you want the kernel stack to be
rb 16384
_kstack:

_end:


Save it as kernel3.asm

Let’s assemble and boot it!

$ fasm kernel3.asm
flat assembler  version 1.71.62  (1048576 kilobytes memory)
2 passes, 16683 bytes.
$ qemu-system-i386 -kernel kernel3.o



If you got stuck, full sources and a script to make it bootable as an ISO using El Torito (covered later on) are available on my GitHub.

next steps and other resources

Understanding writing to the video memory above:

NASM version

os bootloader assembly elf
February 11, 2018

easier updates

I found a much easier way to blog without messing about. This is great! Not that my old setup was hard (static site gen + upload to s3), but anything easier is a welcome change.

Anyway, with that said, I’m going to start blogging smaller, more frequent ‘nuggets’ of things as I think of them. I don’t want to call it micro-blogging, but something bigger than twitter and something you wouldn’t expect to see on a blogspot.

In my day to day work I tend to come across lots of tricky stuff I may or may not lose in the annals of time, so this will be my effort to combat that while sharing it at the same time.

So expect more succinct, raw development notes, and bits here and there.

site meta
January 28, 2018

dexGame engine boogaloo

Well it finally happened, I threw in the towel on the current engine tech I was using for dexGame / BAA. I will detail why, but first some background as to how I ended up here.


I was looking to start a simple, ruthlessly barbaric shooter game. I didn’t need triple A features or crazy artist workflows. However, I did want responsive, good feeling input for a fast shooter. Here are the candidates I seriously considered:

  • Unreal Engine 4
  • Unity
  • Amazon Lumberyard (crytek)
  • Tombstone
  • Old quake engines, etc

Unity: Amazingly, I didn’t pick Unity, because I had an unfair, dated stigma about its input handling and overall ‘feel’ for fast shooters. My main concern was hitting a point where I wished I had the sources to fix an issue, or customize input or movement. It’s definitely back on the table. If nothing else, I know C# very well; it was my bread and butter professional development language for over 5 years.

Unreal: While I like Unreal and have made throwaway projects and demos with it, I had a hankering to use something more ‘lightweight’ in terms of code/cognitive load for a fresh game. If I was going to go for something with full code, I’d prefer something where I could understand the majority of the engine front to back. I’m not afraid to admit Unreal is beyond my mental capacity to even pretend to understand each subsystem. Aside from that, I could implement the game in blueprints if I really wanted, but I kept looking.

Lumberyard: I ended up at Lumberyard trying to figure out what the heck happened to the old Crytek SDK licensing. I like what Amazon is doing with it (tho the AWS integration is lukewarm for really fast shooter servers). I strongly considered this due to the amount of work and money Amazon is dumping into it, but I was put off by a few things:

  • Old crysis systems were in place next to new amazon (nicer) systems
  • Docs were being caught up/written up, complex interaction
  • Lack of industry acceptance, who knows how long it will last

With that in mind I kept looking.

Tombstone: This engine, previously known as C4, is a one man show. Written by Eric Lengyel (along with many other tools written from scratch for the entire pipeline), it boasts some interesting features and a completely integrated editor. Importantly, he heavily sells his code quality and architecture on the website. All of the model & map, sound, video, and texture formats are completely homegrown here. However, he also developed an open exchange format that no one else uses. Some island syndrome here.

Old quake engines etc: I have a lot of experience hacking on quake2, quake3 and source engines, so it was tempting to use something like that but a few issues I had with going this route include:

  • outdated tools (mapping, model importers, source’s stupid qc system)
  • not learning anything new
  • remaking quake in quake would be stupid :D

It’s also worth noting Valve doesn’t care about providing SDKs for modders at all anymore (can you blame them?). After a 5 year hiatus the source2013 SDK was dumped and left for dead, and all the tooling is held together with duct tape.

Initial Decision

Ultimately, I drank the kool-aid and went with Tombstone. If I had full sources, I wanted to have a hope of understanding most of the interesting systems, and I have to say, Eric is not exaggerating. The code quality is absolutely excellent. It’s written in modern C++ and it was the most readable code of any non-trivial system in any language I’ve ever seen (while still being C++), which says a lot! For my goal of understanding the code (well, most of the important systems) I succeeded thanks to the excellent engineering.

The systems make sense and work well together. It’s not free, but for what you get, with no royalties or runtime fees it is a serious bargain.

The content system is a little different, but you can actually create models and maps within the integrated editor, as well as 2D GUI systems, which was extremely handy. Furthermore, my game compiled and shipped was under 6MB total. It’s a very sweet package and compiles nicely (well mostly, Linux requires minor OpenGL header fuss).

Breaking point

If I am praising Tombstone so much, why am I moving away from it? While it was great overall, several small things slowly grated at my already small chances of completing a one man game. None of these are critical, but over time they made me consider switching more and more:

  • Everything is really homegrown. For format imports, you have very limited options
  • The tools and editors do not scale to 4K/hidpi, this was very annoying after upgrading monitors
  • The world editor works well, but the binds are unlike anything else I’ve used in my life
  • The editor is only usable from in game. Which is kinda a plus and minus at the same time
  • Networking is very basic. There is no proper full netcode (eg, client prediction systems)
  • Movement is done via rigid body dynamics. Having some entities ignore the system with custom movement caused issues
  • There is intense math notation in the code. Eric is a mathematician and you are reminded of this
  • Minor, but development seems to have slowed a bit.
  • No built in code-level scripting system (just basic map entity stuff)
  • Virtually no industry use :(

Not to sound like I’m picking on this engine, I really do like it and still highly recommend it if you can work within those constraints. If I was making a small single player game or visualization/simulation software I’d very likely pick it up again.

Moving forward

I’ll probably pick something that starts with U, unless Lumberyard really makes impressive strides.

dexGame engine
December 27, 2017

Happy holidays

Hope everyone had a good Christmas and a relaxing break. There will be a bit of a lapse in updates while I find something lower friction for posting them.