July 2, 2022
Last week I got pretty sick with flu symptoms. Perhaps it was the virus affecting my brain function, but when the headaches stopped I was very motivated to work on something not particularly useful. I decided to learn how to write WebAssembly binaries by hand.
I found a few articles describing WebAssembly's text format, or other languages that compile to .wasm files, but did not find an article describing the binary itself. I'm hoping this entry will serve to fill that void.
A .wasm file is bytecode executed in a runtime. WebAssembly has a binary format (.wasm), and a text format (.wat).
The WAT format uses s-expressions to represent constructs and instructions. If you've never used Lisp before, it looks a bit like Lisp. You can even write this stack-based language in a functional style instead, which Colin Eberhardt explains in this article.
The WAT gets converted into the .wasm binary. It's pretty much a 1-to-1 translation wrapped in some additional structure.
WebAssembly binaries store all integers in a variable-length format called LEB128. It's described fairly well on Wikipedia.
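Each byte carries 7 bits of the value, least-significant group first, and has a 1 as the most-significant bit if there is another byte to the right. Trailing groups of zeros are allowed, which means some values have multiple representations.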
00000001 // the number 1
10000001 00000000 // also the number 1
10000001 10000000 00000000 // also the number 1
A vector is an LEB128 value representing a number of items, followed by the items. A vector is always "of" a known type, so the byte length of the vector can only be determined with knowledge of its contents.
0x02 [first item] [second item]
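To make the encoding concrete, here's a minimal Python sketch of unsigned LEB128 and the vector layout described above. The function names (`encode_uleb128`, `decode_uleb128`, `vec`) are my own choices for illustration and don't come from the spec. The encoder always emits the shortest form, while the decoder also accepts the padded forms shown earlier.

```python
from typing import List, Tuple


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as the shortest unsigned LEB128 form."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative numbers")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the top bit
        else:
            out.append(low7)         # last byte: top bit stays 0
            return bytes(out)


def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:          # top bit clear: this was the last byte
            return result, offset


def vec(items: List[bytes]) -> bytes:
    """A vector: the item count in LEB128, followed by the encoded items."""
    return encode_uleb128(len(items)) + b"".join(items)


print(encode_uleb128(624485).hex())                          # e58e26
print(decode_uleb128(bytes([0b10000001, 0b00000000])))       # (1, 2)
print(decode_uleb128(bytes([0b10000001, 0b10000000, 0])))    # (1, 3)
print(vec([b"\xAA", b"\xBB"]).hex())                         # 02aabb
```

The second value returned by `decode_uleb128` is the offset of the next unread byte, which is handy when walking through a binary one field at a time.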
The .wasm file itself starts with these 8 bytes:
0x00 0x61 0x73 0x6D // \0asm
0x01 0x00 0x00 0x00 // version; 32-bit fixed-width
What follows next is a sequence of "sections". Each section is a 1-byte "id", followed by the byte length of its contents in LEB128 format, followed by the contents themselves. If the id is 0x00 it is a custom section and the runtime can ignore its contents. All other section IDs have a particular meaning, and those sections must occur in the order the spec prescribes (ascending id for the ones used here).
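Here's a rough Python sketch of how a reader could walk that layout: check the 8-byte header, then read an id, a LEB128 size, and that many bytes of contents, over and over. `iter_sections` and `read_uleb128` are names I made up for this example, and the custom section in the demo is invented test data, not part of the module built in this post.

```python
from typing import Iterator, Tuple

MAGIC = b"\x00asm"             # 0x00 0x61 0x73 0x6D
VERSION = b"\x01\x00\x00\x00"  # version 1, 32-bit fixed-width


def read_uleb128(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, offset


def iter_sections(module: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (section_id, contents) for each section after the header."""
    if module[:4] != MAGIC:
        raise ValueError("missing \\0asm magic bytes")
    if module[4:8] != VERSION:
        raise ValueError("unsupported version")
    offset = 8
    while offset < len(module):
        section_id = module[offset]
        size, offset = read_uleb128(module, offset + 1)
        contents = module[offset:offset + size]
        if len(contents) != size:
            raise ValueError("section runs past the end of the file")
        yield section_id, contents
        offset += size


# Header followed by one made-up custom section (id 0x00, 3 bytes of contents).
demo = MAGIC + VERSION + bytes([0x00, 0x03, 0x01, 0x78, 0xFF])
print(list(iter_sections(demo)))  # [(0, b'\x01x\xff')]
```

This sketch only splits the file into sections. It does not check section order or look inside the contents.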
To hand-write a simple WASM binary we only need the following sections:
0x01 "type section" - specifies arguments and return values of functions
0x03 "function section" - maps signatures from the type section to items in the code section
0x07 "export section" - allows exporting functions so we can call them from outside (ex: from a JavaScript environment)
0x0A "code section" - function bodies go here
From the spec, the type section is defined as follows:
typesec ::= section(vec(functype))
functype ::= 0x60 vec(valtype) vec(valtype)
This means we have a vector of "functype"s, where each functype is the literal byte 0x60 followed by a vector of parameter types and then a vector of result types.
Here's the breakdown for a type section that defines a single function type that takes no arguments and returns an integer:
// section type and size
0x01 // this is the "type section"
0x05 // 5 more bytes in this section
// vec(functype)
0x01 // 1 item
// functype (always starts with 0x60)
0x60
// vec(valtype) parameters
0x00 // 0 parameters
// vec(valtype) results
0x01 // 1 result
0x7F // byte value for int32 type
Types are encoded as a single byte. 0x7F specifies a 32-bit integer. Values for other types can be found in the WebAssembly spec.
The function section is just a vector of indices into the type section. It runs in parallel with the code section: entry N gives the type of function body N.
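As a sanity check on the byte counts, here's a small Python sketch that builds a type section from a list of signatures. The helper names (`functype`, `type_section`) and the `VALTYPE` table are my own shorthand. The byte values for the four number types are the ones listed in the spec. The sketch reproduces the section above, then shows what a hypothetical (i32, i32) -> i32 signature would look like.

```python
from typing import List

# Byte values for the four number types.
VALTYPE = {"i32": 0x7F, "i64": 0x7E, "f32": 0x7D, "f64": 0x7C}


def uleb128(value: int) -> bytes:
    """Shortest unsigned LEB128 encoding of a non-negative integer."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def vec(items: List[bytes]) -> bytes:
    """Item count in LEB128, followed by the items."""
    return uleb128(len(items)) + b"".join(items)


def functype(params: List[str], results: List[str]) -> bytes:
    """0x60, then vec(valtype) for parameters, then vec(valtype) for results."""
    return (
        b"\x60"
        + vec([bytes([VALTYPE[p]]) for p in params])
        + vec([bytes([VALTYPE[r]]) for r in results])
    )


def type_section(functypes: List[bytes]) -> bytes:
    """Section id 0x01, byte length of the contents, then vec(functype)."""
    contents = vec(functypes)
    return b"\x01" + uleb128(len(contents)) + contents


# The section from the breakdown above: no parameters, one i32 result.
print(type_section([functype([], ["i32"])]).hex(" "))
# 01 05 01 60 00 01 7f

# A hypothetical signature taking two i32 values and returning one.
print(type_section([functype(["i32", "i32"], ["i32"])]).hex(" "))
# 01 07 01 60 02 7f 7f 01 7f
```

Computing the section length from the contents, rather than typing it by hand, avoids the most likely mistake when editing these bytes manually.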
To extend our example so far, the function section for a module with only one function would look like this:
0x03 // this is the "function section"
0x02 // 2 more bytes in this section
0x01 // there is only one entry
0x00 // index of the first entry in "type section"
Following the spec, the export section looks like this:
exportsec ::= section(vec(export))
export ::= name [type of thing] [index of thing]
The name type is a LEB128 size followed by that many bytes of UTF-8 encoded text (plain ASCII is valid UTF-8).
Following our example, we can export our only function as "helloWorld" like this:
07 // this is the "export section"
0E // section has 14 bytes
01 // there is one export
0A // length of identifier is 10 bytes
68 65 6C 6C 6F 57 6F 72 6C 64 // "helloWorld"
00 // type of thing; 00 means "function"
00 // index of function; we only have one, so it's 0
Finally the most interesting section - the one with the actual instructions that will be executed.
Just like all the other sections, this one contains a vector. This vector contains function bodies and runs in parallel with the function section.
codesec ::= section(vec(code))
code ::= size vec(locals) expr
expr ::= ...instructions... 0x0B
Each entry contains an LEB128 size, a vector for local variables, and a sequence of instructions.
Following our example, so far we defined a function type for a function that returns an integer, and exported it as "helloWorld". The following then defines the body of this function to return "42" to the caller.
0A // this is the code section
06 // 6 more bytes in this section
01 // there is only one entry
04 // the function is 4 bytes long
00 // size of locals vector (there are none)
41 // i32.const instruction
2A // literally "42" (the immediate is a signed LEB128 value)
0B // end of code body
That's it! If you followed along with the examples up to this point you'll probably be able to read and understand the entire .wasm file we created just by reading the hex values like a true madlad.
00 61 73 6D
01 00 00 00
01 05 01
60 00 01 7F
03 02 01
00
07 0E 01
0A 68 65 6C 6C 6F 57 6F 72 6C 64 00 00
0A 06 01
04 00 41 2A 0B
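If typing hex by hand gets old, the same module can be assembled with a few lines of Python. This is only a sketch under my own assumptions: the helper names (`uleb128`, `vec`, `section`, `name`, `build_module`, `write_module`) are invented for this post, and the output filename is arbitrary. Every length byte is computed instead of hard-coded, and the result is compared against the expected 43 bytes written out in the block.

```python
from pathlib import Path
from typing import List


def uleb128(value: int) -> bytes:
    """Shortest unsigned LEB128 encoding of a non-negative integer."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def vec(items: List[bytes]) -> bytes:
    """Item count in LEB128, followed by the items."""
    return uleb128(len(items)) + b"".join(items)


def section(section_id: int, contents: bytes) -> bytes:
    """1-byte id, byte length of the contents, then the contents."""
    return bytes([section_id]) + uleb128(len(contents)) + contents


def name(text: str) -> bytes:
    """Byte length in LEB128, followed by the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return uleb128(len(raw)) + raw


def build_module() -> bytes:
    header = b"\x00asm" + b"\x01\x00\x00\x00"
    # One function type: no parameters, one i32 (0x7F) result.
    type_sec = section(0x01, vec([b"\x60" + vec([]) + vec([b"\x7F"])]))
    # One function, using type index 0.
    func_sec = section(0x03, vec([uleb128(0)]))
    # Export function (kind 0x00) number 0 as "helloWorld".
    export_sec = section(0x07, vec([name("helloWorld") + b"\x00" + uleb128(0)]))
    # Body: no locals, then i32.const 42 (0x41 0x2A), then end (0x0B).
    body = vec([]) + b"\x41\x2A\x0B"
    code_sec = section(0x0A, vec([uleb128(len(body)) + body]))
    return header + type_sec + func_sec + export_sec + code_sec


def write_module(path: str = "helloWorld.wasm") -> None:
    """Write the assembled module to disk (the filename is arbitrary)."""
    Path(path).write_bytes(build_module())


EXPECTED_HEX = """
00 61 73 6D 01 00 00 00
01 05 01 60 00 01 7F
03 02 01 00
07 0E 01 0A 68 65 6C 6C 6F 57 6F 72 6C 64 00 00
0A 06 01 04 00 41 2A 0B
"""
EXPECTED = bytes.fromhex("".join(EXPECTED_HEX.split()))

assert build_module() == EXPECTED
print(len(build_module()))  # 43
```

Calling `write_module()` produces a file that a JavaScript environment should be able to load with `WebAssembly.instantiate`, after which `instance.exports.helloWorld()` is expected to return 42. I have not wired that step into the sketch, so treat it as a pointer, not a tested recipe.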