EricDubé.com

WebAssembly by Hand; Easy and Fun!

July 2, 2022

Why I Wrote This

Last week I got pretty sick with flu symptoms. Perhaps it was the virus affecting my brain function, but when the headaches stopped I was very motivated to work on something not particularly useful. I decided to learn how to write WebAssembly binaries by hand.

I found a few articles describing WebAssembly's text format, or other languages that compile to .wasm files, but none describing the binary format itself. I'm hoping this entry will fill that void.

WAT and WASM

WAT, the WebAssembly text format, uses s-expressions to represent constructs and instructions. If you've ever used Lisp, it will look familiar. You can even write this stack-based language in a functional style instead, which Colin Eberhardt explains in this article.

The WAT gets converted into the .wasm binary. It's pretty much a 1-to-1 translation wrapped in some additional structure.

Variable-length Integers with LEB128

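WebAssembly binaries store integers (counts, sizes, indices, and constants) in a variable-length format called LEB128. Counts, sizes, and indices use the unsigned variant; integer constants, such as the operand of i32.const, use the signed variant. It's described fairly well on Wikipedia.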

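Each byte carries 7 bits of the value in its low bits, least-significant group first, and has a 1 as its most-significant bit if another byte follows. Because a value can be padded with extra continuation bytes (up to the spec's limit, e.g. 5 bytes for a 32-bit value), some values have multiple representations.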

00000001 // the number 1
10000001 00000000 // also the number 1
10000001 10000000 00000000 // also the number 1
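To make the bit layout concrete, here is a minimal Python sketch of an unsigned LEB128 encoder and decoder. The names `encode_uleb128` and `decode_uleb128` are hypothetical helpers for this post, not part of any library. The encoder always emits the shortest form, while the decoder happily accepts padded forms like the ones above.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (shortest form)."""
    if value < 0:
        raise ValueError("unsigned LEB128 needs a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits carry the payload
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: another byte follows
        else:
            out.append(byte)  # high bit clear: this is the last byte
            return bytes(out)


def decode_uleb128(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode unsigned LEB128 starting at data[offset].

    Returns (value, number of bytes consumed).
    """
    result = 0
    shift = 0
    pos = offset
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # 7-bit groups come least-significant first
        shift += 7
        if not byte & 0x80:
            return result, pos - offset


# The three encodings of 1 shown above all decode to the same value.
for encoded in (
    bytes([0b00000001]),
    bytes([0b10000001, 0b00000000]),
    bytes([0b10000001, 0b10000000, 0b00000000]),
):
    print(decode_uleb128(encoded))  # (1, 1), then (1, 2), then (1, 3)

print(encode_uleb128(624485).hex(" "))  # e5 8e 26
```

A real decoder would also reject encodings longer than the spec allows for the target width; this sketch skips that check.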

Vector

A vector is an LEB128 count of items, followed by the items themselves. A vector is always "of" a known type, and its total byte length is not stored anywhere, so a reader can only find where the vector ends by decoding its contents.

0x02 [first item] [second item]
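As a small illustration, here is a Python sketch of vector encoding. `encode_vec` is a hypothetical helper, and the items are assumed to be already encoded; the example uses 0x7F, which turns out to be the i32 type byte later on.

```python
def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128, shortest form."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_vec(items: list[bytes]) -> bytes:
    """Item count as unsigned LEB128, followed by the already-encoded items.

    No total byte length is written; a reader must decode each item
    according to its known type to find where the vector ends.
    """
    return encode_uleb128(len(items)) + b"".join(items)


# A vector of two single-byte items.
print(encode_vec([b"\x7f", b"\x7f"]).hex(" "))  # 02 7f 7f
print(encode_vec([]).hex(" "))  # 00 (an empty vector is just its count)
```

Once a vector holds 128 or more items, the count itself grows to two bytes, which is why it has to be LEB128-decoded rather than read as a single byte.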

Module Format

The .wasm file itself starts with these 8 bytes:

0x00 0x61 0x73 0x6D // \0asm
0x01 0x00 0x00 0x00 // version 1; fixed-width 32-bit little-endian, not LEB128
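A reader can check these 8 bytes before doing anything else. This Python sketch (the `check_header` name is hypothetical) uses the standard `struct` module to read the version as a fixed-width little-endian 32-bit integer.

```python
import struct

WASM_MAGIC = b"\x00asm"  # 00 61 73 6D


def check_header(data: bytes) -> int:
    """Validate the 8-byte preamble and return the version number."""
    if data[:4] != WASM_MAGIC:
        raise ValueError("not a WebAssembly binary (bad magic)")
    # The version is a fixed-width, little-endian 32-bit value, not LEB128.
    (version,) = struct.unpack("<I", data[4:8])
    return version


header = bytes([0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00])
print(check_header(header))  # 1
```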

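What follows next is a sequence of "sections". Each section is a 1-byte "id", followed by the byte length of its contents in LEB128 format, followed by the contents. If the id is 0x00 it is a custom section and the runtime can ignore its contents; custom sections may appear anywhere. All other section IDs have a particular meaning, may appear at most once, and must occur in the order the spec prescribes (for the sections used in this post, that is increasing ID order).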
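Because every section carries its own length, a reader can list or skip sections without understanding their contents. Here is a minimal Python sketch, assuming hypothetical helpers `encode_section` and `list_sections`; the walker does no validation of section IDs or order.

```python
def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128, shortest form."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode unsigned LEB128 at data[offset]; return (value, offset after it)."""
    result, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, offset


def encode_section(section_id: int, contents: bytes) -> bytes:
    """1-byte id, LEB128 length of the contents, then the contents."""
    return bytes([section_id]) + encode_uleb128(len(contents)) + contents


def list_sections(module: bytes) -> list[tuple[int, int]]:
    """Return (id, size) for each section after the 8-byte preamble."""
    sections = []
    pos = 8  # skip magic number and version
    while pos < len(module):
        section_id = module[pos]
        size, pos = decode_uleb128(module, pos + 1)
        sections.append((section_id, size))
        pos += size  # jump over the contents without interpreting them
    return sections


# A custom section's contents start with a name ("note"), then arbitrary bytes.
custom = encode_section(0x00, b"\x04note" + b"hi")
module = b"\x00asm\x01\x00\x00\x00" + custom
print(list_sections(module))  # [(0, 7)]
```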

To hand-write a simple WASM binary we only need four sections: type, function, export, and code.

Type Section

From the spec, the type section is defined as follows:

typesec ::= section(vec(functype))
functype ::= 0x60 vec(valtype) vec(valtype)

This means we have a vector of "functype"s, where each functype is the literal byte 0x60 followed by a vector of parameter types and a vector of result types.

Here's the breakdown for a type section that defines a single function type that takes no arguments and returns an integer:

// section type and size
0x01 // this is the "type section"
0x05 // 5 more bytes in this section

// vec(functype)
0x01 // 1 item

    // functype (always starts with 0x60)
    0x60
        // vec(valtype) parameters
        0x00 // 0 parameters

        // vec(valtype) results
        0x01 // 1 result
        0x7F // byte value for int32 type

Types are encoded as a single byte. 0x7F specifies a 32-bit integer (i32); 0x7E is i64, 0x7D is f32, and 0x7C is f64. Values for the other types can be found in the WebAssembly spec.
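The type section can be generated from the grammar above instead of counted by hand. A minimal Python sketch, where `encode_functype` and the other helpers are hypothetical names for this post:

```python
I32, I64, F32, F64 = 0x7F, 0x7E, 0x7D, 0x7C  # value type bytes
TYPE_SECTION_ID = 0x01


def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128, shortest form."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_vec(items: list[bytes]) -> bytes:
    return encode_uleb128(len(items)) + b"".join(items)


def encode_section(section_id: int, contents: bytes) -> bytes:
    return bytes([section_id]) + encode_uleb128(len(contents)) + contents


def encode_functype(params: list[int], results: list[int]) -> bytes:
    """0x60, a vector of parameter types, then a vector of result types."""
    return (
        b"\x60"
        + encode_vec([bytes([t]) for t in params])
        + encode_vec([bytes([t]) for t in results])
    )


# One function type: no parameters, one i32 result.
type_section = encode_section(TYPE_SECTION_ID, encode_vec([encode_functype([], [I32])]))
print(type_section.hex(" "))  # 01 05 01 60 00 01 7f
```

The section size (0x05) falls out of `len(contents)`, which avoids the most common hand-assembly mistake.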

Function Section

The function section is just a vector of indices into the type section, one per function defined in the module. It runs in parallel with the code section: entry i gives the type of the i-th function body there.

To extend our example so far, the function section for a module with only one function would look like this:

0x03 // this is the "function section"
0x02 // 2 more bytes in this section
0x01 // there is only one entry
0x00 // type index 0: the first entry in the "type section"
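In code, the function section is a vector of LEB128 type indices. A Python sketch, with `encode_function_section` as a hypothetical helper:

```python
FUNCTION_SECTION_ID = 0x03


def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128, shortest form."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_vec(items: list[bytes]) -> bytes:
    return encode_uleb128(len(items)) + b"".join(items)


def encode_section(section_id: int, contents: bytes) -> bytes:
    return bytes([section_id]) + encode_uleb128(len(contents)) + contents


def encode_function_section(type_indices: list[int]) -> bytes:
    """One type index (unsigned LEB128) per function defined in the module."""
    return encode_section(
        FUNCTION_SECTION_ID, encode_vec([encode_uleb128(i) for i in type_indices])
    )


# Our module defines a single function whose signature is type 0.
print(encode_function_section([0]).hex(" "))  # 03 02 01 00
```

Two functions sharing the same signature would simply list index 0 twice.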

Export Section


Following the spec, the export section looks like this:

exportsec ::= section(vec(export))
export    ::= name [type of thing] [index of thing]

A name is an LEB128 byte length followed by that many bytes of UTF-8 text (plain ASCII, as used here, is valid UTF-8).

Following our example, we can export our only function as "helloWorld" like this:

07 // this is the "export section"
0E // section has 14 bytes
01 // there is one export
0A // length of identifier is 10 bytes
68 65 6C 6C 6F 57 6F 72 6C 64 // "helloWorld"
00 // type of thing; 00 means "function"
00 // index of function; we only have one, so it's 0
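Counting the bytes of a name by hand is error-prone, so here is a Python sketch that builds the export section from the string. `encode_name` and `encode_export` are hypothetical helpers; the name is encoded as UTF-8, as the spec requires.

```python
EXPORT_SECTION_ID = 0x07
EXPORT_FUNC = 0x00  # export kind: function


def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128, shortest form."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_vec(items: list[bytes]) -> bytes:
    return encode_uleb128(len(items)) + b"".join(items)


def encode_section(section_id: int, contents: bytes) -> bytes:
    return bytes([section_id]) + encode_uleb128(len(contents)) + contents


def encode_name(name: str) -> bytes:
    """Byte length (unsigned LEB128) followed by the UTF-8 bytes of the name."""
    raw = name.encode("utf-8")
    return encode_uleb128(len(raw)) + raw


def encode_export(name: str, kind: int, index: int) -> bytes:
    """Name, 1-byte export kind, then the index of the exported item."""
    return encode_name(name) + bytes([kind]) + encode_uleb128(index)


export_section = encode_section(
    EXPORT_SECTION_ID, encode_vec([encode_export("helloWorld", EXPORT_FUNC, 0)])
)
print(export_section.hex(" "))
# 07 0e 01 0a 68 65 6c 6c 6f 57 6f 72 6c 64 00 00
```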

Code Section

Finally, the most interesting section: the one with the actual instructions that will be executed.

Just like all the other sections, this one contains a vector. This vector contains function bodies, in the same order as the entries of the function section.

codesec ::= section(vec(code))
code    ::= size vec(locals) expr
expr    ::= ...instructions... 0x0B


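Each entry contains an LEB128 size (the byte length of the rest of the entry), a vector of local variable declarations (each a count followed by a value type), and a sequence of instructions terminated by the end opcode 0x0B.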

Following our example, so far we have defined a function type that takes no arguments and returns an integer, declared one function of that type, and exported it as "helloWorld". The following defines the body of this function, which returns 42 to the caller.

0A // this is the code section
06 // 6 more bytes in this section
01 // there is only one entry
04 // the function is 4 bytes long
00 // size of locals vector (there are none)
41 // i32.const instruction
2A // 42, the operand of i32.const (signed LEB128)
0B // end of code body
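The operand of i32.const is signed LEB128, which matters as soon as the constant has bit 6 set: 42 fits in one byte, but 64 needs two. A Python sketch of the code section, with `encode_sleb128` and `encode_body` as hypothetical helpers:

```python
CODE_SECTION_ID = 0x0A
OP_I32_CONST = 0x41
OP_END = 0x0B


def encode_uleb128(value: int) -> bytes:
    """Unsigned LEB128, shortest form."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_sleb128(value: int) -> bytes:
    """Signed LEB128, shortest form."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # Python's >> is arithmetic, so the sign is preserved
        # Stop when the rest is pure sign and bit 6 of this byte matches it.
        if (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def encode_vec(items: list[bytes]) -> bytes:
    return encode_uleb128(len(items)) + b"".join(items)


def encode_section(section_id: int, contents: bytes) -> bytes:
    return bytes([section_id]) + encode_uleb128(len(contents)) + contents


def encode_body(locals_vec: bytes, instructions: bytes) -> bytes:
    """Body size, locals vector, then the instructions and the end opcode."""
    body = locals_vec + instructions + bytes([OP_END])
    return encode_uleb128(len(body)) + body


no_locals = encode_vec([])  # empty locals vector: just 0x00
body = encode_body(no_locals, bytes([OP_I32_CONST]) + encode_sleb128(42))
code_section = encode_section(CODE_SECTION_ID, encode_vec([body]))
print(code_section.hex(" "))  # 0a 06 01 04 00 41 2a 0b
```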

Reviewing the Final Binary

That's it! If you followed along with the examples up to this point, you'll probably be able to read and understand the entire .wasm file we created just by reading the hex values like a true madlad.

00 61 73 6D
01 00 00 00

01 05 01
60 00 01 7F

03 02 01
00

07 0E 01
0A 68 65 6C 6C 6F 57 6F 72 6C 64 00 00

0A 06 01
04 00 41 2A 0B
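To double-check the listing, here is a Python sketch that assembles the same 43-byte module from the pieces described above and compares it with the hex. The short helper names (`uleb`, `sleb`, `vec`, `section`, `name`) are hypothetical, chosen for this post.

```python
def uleb(value: int) -> bytes:
    """Unsigned LEB128 (sizes, counts, indices)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def sleb(value: int) -> bytes:
    """Signed LEB128 (the i32.const operand)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def vec(items: list[bytes]) -> bytes:
    return uleb(len(items)) + b"".join(items)


def section(section_id: int, contents: bytes) -> bytes:
    return bytes([section_id]) + uleb(len(contents)) + contents


def name(text: str) -> bytes:
    raw = text.encode("utf-8")
    return uleb(len(raw)) + raw


# Function body: no locals, then i32.const 42, then end.
body = vec([]) + b"\x41" + sleb(42) + b"\x0b"

module = (
    b"\x00asm" + b"\x01\x00\x00\x00"  # magic + version
    + section(0x01, vec([b"\x60" + vec([]) + vec([b"\x7f"])]))  # type 0: () -> i32
    + section(0x03, vec([uleb(0)]))  # function 0 has type 0
    + section(0x07, vec([name("helloWorld") + b"\x00" + uleb(0)]))  # export function 0
    + section(0x0A, vec([uleb(len(body)) + body]))  # code for function 0
)

expected = bytes.fromhex("""
    00 61 73 6D 01 00 00 00
    01 05 01 60 00 01 7F
    03 02 01 00
    07 0E 01 0A 68 65 6C 6C 6F 57 6F 72 6C 64 00 00
    0A 06 01 04 00 41 2A 0B
""")
print(module == expected, len(module))  # True 43
```

Saving `module` to a file (for example with `pathlib.Path.write_bytes`) and passing the bytes to `WebAssembly.instantiate` in a browser or Node.js should give an instance whose `exports.helloWorld()` returns 42.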