Around this time last year I wrote about my experience
writing a WebAssembly VM in C (for fun).
At that point in time it was effectively a prototype. It could
decode the WebAssembly binary format and execute some simple hello world programs.
I’ve continued to work on the project in my free time, and I’m happy to share that it has come
far enough that it passes the full
WebAssembly 2.0 core specification test suite
(minus the SIMD instructions).
Getting this far involved a full rewrite in Rust. This allowed me to move much faster and
restructure my code with confidence. The overall architecture of the C prototype and the
Rust port is largely similar. For the most part my code directly mirrors the semantics in
the core specification. I’m not doing any kind of lowering, register allocation, or optimization —
besides precomputing relative jump offsets and other stack metadata during module validation to
aid in the handling of control flow. I am using a pretty liberal amount of unsafe, but I haven’t
experienced a single segfault after the switch to Rust.
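To make the jump-offset precomputation more concrete, here is a minimal sketch in C over a made-up three-opcode bytecode. The opcodes (OP_NOP, OP_BLOCK, OP_END) and the function name are hypothetical and are not real WebAssembly encodings or part of my engine; the point is only to show how a single validation pass with a stack of open blocks can record where each block ends.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes, not real WebAssembly encodings. */
enum { OP_NOP = 0, OP_BLOCK = 1, OP_END = 2 };

#define MAX_DEPTH 64

/*
 * For every OP_BLOCK at index i, store the relative offset to its matching
 * OP_END in end_offset[i]; all other entries are set to 0.
 * Returns 0 on success, -1 if blocks are unbalanced or nested too deeply.
 */
int precompute_block_ends(const uint8_t *code, size_t len, uint32_t *end_offset) {
    size_t open[MAX_DEPTH]; /* indices of blocks that are still open */
    size_t depth = 0;

    for (size_t i = 0; i < len; i++) {
        end_offset[i] = 0;
        if (code[i] == OP_BLOCK) {
            if (depth == MAX_DEPTH) return -1;
            open[depth++] = i;
        } else if (code[i] == OP_END) {
            if (depth == 0) return -1;
            size_t start = open[--depth];
            end_offset[start] = (uint32_t)(i - start);
        }
    }
    return depth == 0 ? 0 : -1;
}
```

With a table like this, a branch out of a block becomes a single addition to the instruction pointer instead of a scan for the matching end.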
In retrospect, starting with a minimal language like C and then graduating to Rust was an
effective strategy for avoiding premature abstraction. The minimalism of C helped me focus
on what the program is physically doing without going through layers
of indirection. In the past, when working in Rust, I’ve ended up bogging myself down
with trait soup trying to introduce abstractions before I fully understood the problem.
Anyways… I’ve reached the point that my VM is compliant with version 2.0 of the
WebAssembly core specification (SIMD aside), and I need a large program to stress test it with.
The obvious choice for this sort of thing is a DOOM port. For those unaware, DOOM
runs on just about every computing environment under the sun.
For every calculator, satellite,
or Turing-complete system,
there’s likely a software engineer theorizing about how to get DOOM running on it.
Others have written a lot about the structure of the DOOM codebase itself, so I
won’t focus on that too much. I will be using the doomgeneric
fork as my starting point since it’s designed to be easily portable.
There are two primary concerns to address when cross-compiling a large existing C codebase to WebAssembly.
The first is how we’ll emit wasm bytecode: any C compiler with an LLVM backend should support some kind of wasm32 target
— emcc, clang, and zig cc are all viable choices here.
The second concern is how our wasm module will communicate with
the outside world: WebAssembly is a sandbox, so there is no built-in ability for reading or writing files,
receiving keyboard input, or rendering pixels — any IO capabilities need to be explicitly
provided to the module by the embedder through
function imports. There are a few prebuilt solutions for this, but we can also establish our own host interface.
Emscripten
Emscripten is the OG WebAssembly compiler (wasm itself evolved out of asm.js, which came out of the Emscripten project).
The catch (for me) is that emcc explicitly targets
web browsers. This means it compiles your C code down to a .wasm binary plus an accompanying
JavaScript bundle that provides any of the IO capabilities that your C code needs, like a
virtualized filesystem on top of the IndexedDB API. This makes it a non-starter for my purposes: my
WebAssembly VM is not on the Web, and I do not have a JavaScript runtime.
Clang + wasi-sdk
The emerging standard for a preestablished POSIX-like host interface is the WebAssembly System Interface (WASI).
You build your C code against the wasi-sdk and your wasm module will call out
to wasi:fs/open-at in order to open files, etc. There are various advantages to sticking
to a standard interface, and this is probably the best choice for many projects, but on the engine implementation
side it’s a good bit of work. As of version 0.2, WASI is defined in terms of the
WebAssembly Component Model, which is itself a rather complex specification layered on
top of the WebAssembly core specification. I currently don’t have the time to add component model support to my engine, but I may revisit
this in the future. Arguably I could implement the necessary WASI imports at the core abstraction level,
but I’d like to go through the learning exercise of designing my own interface.
Raw Clang
The remaining option is to define my own host interface. This will involve building a custom embedder in Rust
that implements the hostcalls, and then on the guest side I will need a bespoke libc implementation that
can employ this interface to perform IO. Sounds like fun, let’s get to it!
Doomgeneric is structured so that you only need to implement 6 platform-specific functions.
We need to be able to render raw pixel data to the screen, we need some basic timers, and
we need to be able to read input from the keyboard.
//Implement below functions for your platform
void DG_Init();
void DG_DrawFrame();
void DG_SleepMs(uint32_t ms);
uint32_t DG_GetTicksMs();
int DG_GetKey(int* pressed, unsigned char* key);
void DG_SetWindowTitle(const char * title);
The only dependency beyond that is the C standard library. We will need to be able to read resources (textures, level data, etc)
at runtime from a .wad file. Let’s sketch out which libc functions we’ll need by attempting to compile, and just declare any functions
that are missing. Eventually we’ll have declared enough that we’ll start seeing linker errors instead of compile errors.
int snprintf(char *str, size_t size, const char *format, ...);
int fprintf(FILE *f, const char *format, ...);
int vfprintf(FILE *stream, const char *format, va_list arg);
int vsnprintf(char *buffer, size_t size, const char *format, va_list argptr);
int printf(const char *format, ...);
int puts(const char *str);
FILE *fopen(const char *path, const char *mode);
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
int fseek(FILE *stream, long int offset, int whence);
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
int remove(const char *path);
int rename(const char *src, const char *dst);
int sscanf(const char *str, const char *format, ...);
int atoi(const char *str);
double atof(const char *str);
void *realloc(void *ptr, size_t size);
void *calloc(size_t num, size_t size);
int system(const char *command);
void *memset(void *str, int c, size_t n);
void *memcpy(void *dst, const void *src, size_t len);
void *memmove(void *dst, const void *src, size_t len);
size_t strlen(const char *s);
char *strncpy(char *dest, const char *src, size_t n);
int strcmp(const char *s1, const char *s2);
int strncmp(const char *s1, const char *s2, size_t n);
int strcasecmp(const char *s1, const char *s2);
int strncasecmp(const char *s1, const char *s2, size_t len);
char *strrchr(const char *s, int c);
char *strdup(const char *s);
char *strchr(const char *s, int c);
char *strstr(const char *haystack, const char *needle);
I’ve omitted some typedefs and macros, and in practice these declarations are split across various libc header files,
but these 43 functions are all we need to run DOOM.
Host Calls
We have enough information now that we can start sketching out the host interface.
This evolved over time as I implemented things, but ultimately here is what I ended up with:
This macro tells the linker that we are purposefully not defining an implementation for the annotated symbol, and that
we want to produce an entry for it in the imports section of our .wasm binary. These functions will all be implemented
in Rust by the embedder and passed into our module when it is instantiated.
#define WASM_IMPORT(module, name) \
__attribute__((import_module(module), import_name(name)))
These functions allow us to terminate the process with an error code, or panic with an
error message — this was useful for me during debugging to be able to crash with
a message for myself when something unexpected occurred within the guest.
// ----- proc -------- //
WASM_IMPORT("semblance", "exit")
extern void semblance_syscall_exit(int) __attribute__((noreturn));
WASM_IMPORT("semblance", "panic")
extern void semblance_syscall_panic(const char *msg) __attribute__((noreturn));
The following establishes an interface for buffered IO. This will allow us
to write to stdout and stderr, and read from our .wad file. We can let
the Rust embedder handle the buffering since our Wasm hostcalls aren’t as expensive as
a full OS kernel context switch.
WASM_IMPORT("semblance", "fopen")
extern int32_t semblance_syscall_fopen(const char *path, const char *mode);
WASM_IMPORT("semblance", "fwrite")
extern int64_t semblance_syscall_fwrite(int fd, const void *data, size_t len);
WASM_IMPORT("semblance", "ftell")
extern int64_t semblance_syscall_ftell(int fd);
WASM_IMPORT("semblance", "fseek")
extern int32_t semblance_syscall_fseek(int fd, int64_t offset, int32_t whence);
WASM_IMPORT("semblance", "fflush")
extern int32_t semblance_syscall_fflush(int fd);
WASM_IMPORT("semblance", "fread")
extern int32_t semblance_syscall_fread(int fd, void *dst, size_t size);
WASM_IMPORT("semblance", "fclose")
extern int32_t semblance_syscall_fclose(int fd);
Here we allow the guest module to open a GUI window, set the title, and
render raw pixel data to it.
// ------ gfx --------- //
WASM_IMPORT("semblance", "init_window")
extern void semblance_syscall_init_window(
const char *title, int32_t width, int32_t height);
WASM_IMPORT("semblance", "set_window_title")
extern void semblance_syscall_set_window_title(const char *title);
WASM_IMPORT("semblance", "render")
extern void semblance_syscall_render(
const uint32_t *pixels, int32_t width, int32_t height);
These functions let the guest know how long it has been running,
and allow the guest to put the thread to sleep for some specified amount of time.
// ------ timers --------- //
WASM_IMPORT("semblance", "get_ticks_ms")
extern size_t semblance_syscall_get_ticks_ms();
WASM_IMPORT("semblance", "sleep_ms")
extern void semblance_syscall_sleep_ms(size_t ms);
Finally, this allows the guest to read keyboard events.
// ------ keyboard input --------- //
typedef struct read_key_result_t {
WASM_IMPORT("semblance", "read_key")
extern read_key_result_t semblance_syscall_read_key();
Entrypoint
The last part of the host interface we need to define is the entrypoint
of the program. Conventionally this is a function called _start that
will do some secret initial prep (e.g. opening file descriptors for stdin/stdout/stderr, reading program arguments)
and subsequently call into main(argc, argv).
We’re going to diverge from that here a bit. Since I want the embedder to own the UI event loop,
we’re going to use a dual entrypoint _start which will be called once to initialize the
module and _tick which will be called repeatedly to drive the game loop. This will free up
the embedder to be able to handle windowing events (like the user requesting to close the application)
without involving the guest at all.
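As a rough illustration of the dual-entrypoint idea, here is a minimal sketch of what the guest side could look like. The names guest_start, guest_tick, game_init, and game_tick are hypothetical stand-ins, and the export macro is guarded so the same file also compiles on a non-wasm target; it is a sketch of the shape, not the actual libc entrypoint.

```c
#include <stddef.h>

/* export_name is only meaningful for wasm targets; expand to nothing elsewhere. */
#if defined(__wasm__)
#define WASM_EXPORT(name) __attribute__((export_name(name)))
#else
#define WASM_EXPORT(name)
#endif

/* Hypothetical stand-ins for the game's init and per-frame functions. */
static int initialized = 0;
static int frames_run = 0;

static void game_init(int argc, char **argv) {
    (void)argc;
    (void)argv;
    initialized = 1;
}

static void game_tick(void) {
    if (initialized) frames_run++;
}

/* Hardcoded arguments, since there is no real command line in the guest. */
static char *guest_argv[] = { "/doomgeneric.wasm", NULL };

/* Called once by the embedder right after instantiation. */
WASM_EXPORT("_start")
void guest_start(void) {
    game_init(1, guest_argv);
}

/* Called by the embedder once per iteration of its own event loop. */
WASM_EXPORT("_tick")
void guest_tick(void) {
    game_tick();
}
```

Because the guest never blocks in a loop of its own, control returns to the embedder after every frame, which is what lets it service window events between ticks.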
We’ve fully specified how our embedder and .wasm guest will communicate with each other.
Let’s dig into the libc implementation.
Entrypoint
We’ll start with the entrypoint. We’re hardcoding our command line arguments,
but it is interesting to peek behind the curtain to see what happens before main
(or in our case init). Namely we define the global stdout and stderr variables
by calling fopen to acquire a file descriptor to "/dev/stdout" and "/dev/stderr".
extern void init(int argc, char **argv);
#define WASM_EXPORT(name) __attribute__((export_name(name)))
static char *__argv[1] = { "/doomgeneric.wasm" };
int stdio_err = __stdio_init();
if (stdio_err) semblance_syscall_panic("failed to initialize stdio");
stderr = fopen("/dev/stderr", "w");
if (stderr == NULL) return 1;
stdout = fopen("/dev/stdout", "w");
if (stdout == NULL) return 2;
Allocator
I may have been able to get away with a super simple bump allocator
here, since DOOM implements its own zoned memory allocator and doesn’t rely too heavily on the libc allocator once the game loop
has started. But for completeness’ sake I opted to pull in a full memory allocator.
I’m using Andy Wingo’s walloc which is
designed specifically for use within WebAssembly.
I forked it slightly in order to extend it with basic support for calloc and realloc.
static size_t walloc_size(void *ptr) {
struct page *page = get_page(ptr);
unsigned chunk = get_chunk_index(ptr);
uint8_t kind = page->header.chunk_kinds[chunk];
if (kind == LARGE_OBJECT) {
struct large_object *obj = get_large_object(ptr);
return granules * GRANULE_SIZE;
void *realloc(void *ptr, size_t size) {
if (ptr == NULL) return malloc(size);
size_t old_size = walloc_size(ptr);
if (old_size == 0) return malloc(size);
size_t min_size = old_size < size ? old_size : size;
void *new_ptr = malloc(size);
if (new_ptr == NULL) return NULL;
__builtin_memcpy(new_ptr, ptr, min_size);
void *calloc(size_t num, size_t size) {
size_t bytes = num * size;
void *ptr = malloc(bytes);
if (ptr == NULL) return NULL;
__builtin_memset(ptr, 0, bytes);
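One thing a calloc like the one above does not guard against is num * size wrapping around, which would hand back a buffer smaller than the caller asked for. Here is a minimal sketch of an overflow-checked variant, assuming a GCC- or Clang-compatible compiler for __builtin_mul_overflow. The name checked_calloc is hypothetical and chosen to avoid clashing with the real libc symbol; it uses the hosted malloc so that it can be tried outside the guest.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* checked_calloc is a hypothetical name chosen to avoid clashing with libc. */
void *checked_calloc(size_t num, size_t size) {
    size_t bytes;
    /* GCC/Clang builtin: returns true if num * size does not fit in size_t. */
    if (__builtin_mul_overflow(num, size, &bytes)) return NULL;
    void *ptr = malloc(bytes);
    if (ptr == NULL) return NULL;
    memset(ptr, 0, bytes);
    return ptr;
}
```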
Stdio
This was by far the most involved portion to implement. DOOM makes heavy use
of snprintf to format strings as it decodes various named sections of the .wad
file. This means that our libc’s fwrite(..., FILE *f) needs to be able to write to
file descriptors (through our established host interface) but also directly to in-memory buffers.
In order to enable this, I define the FILE struct as a tagged union which either holds a
file descriptor or a reference to a contiguous region of memory.
typedef enum stream_kind_t {
typedef struct buf_stream_t {
The implementation of fwrite uses this tag to decide how to perform the
data transfer (hostcall or memcpy).
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream) {
if (stream == NULL) return 0;
written = semblance_syscall_fwrite(
written = bufwrite(&stream->data.buf_state, ptr, size * nmemb);
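To show the tagged-union idea end to end, here is a small self-contained sketch. Every name in it (sketch_file_t, sketch_write, sketch_host_write) is hypothetical, the field layout is an assumption rather than the real FILE definition, and the hostcall is replaced by a stub that pretends every byte was written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; the real FILE layout may differ. */
typedef enum { SKETCH_STREAM_FD, SKETCH_STREAM_BUF } sketch_kind_t;

typedef struct {
    sketch_kind_t kind;
    union {
        int fd;                                      /* host file descriptor */
        struct { char *base; size_t cap, pos; } buf; /* in-memory target */
    } data;
} sketch_file_t;

/* Stand-in for the fwrite hostcall: pretends every byte was written. */
static int64_t sketch_host_write(int fd, const void *src, size_t len) {
    (void)fd;
    (void)src;
    return (int64_t)len;
}

/* Writes len bytes to the stream and returns how many were accepted. */
size_t sketch_write(sketch_file_t *f, const void *src, size_t len) {
    if (f == NULL) return 0;
    if (f->kind == SKETCH_STREAM_FD) {
        int64_t n = sketch_host_write(f->data.fd, src, len);
        return n < 0 ? 0 : (size_t)n;
    }
    /* Buffer stream: copy what fits and drop the rest. */
    size_t room = f->data.buf.cap - f->data.buf.pos;
    size_t n = len < room ? len : room;
    memcpy(f->data.buf.base + f->data.buf.pos, src, n);
    f->data.buf.pos += n;
    return n;
}
```

The appeal of this design is that the formatting code only ever talks to one write function, so snprintf and fprintf can share the same implementation.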
Implementing printf was a fun experience. I started by grepping through the DOOM
source to find all the format strings ("%s", "%p", "%.3d", etc) and just
implemented enough of a parser and formatter to handle each of them. printf
had always felt like a primitive component of the language, but it is just a regular
C function after all. Try printf debugging your broken printf implementation, it’s a bit trippy!
int vfprintf(FILE *stream, const char *format, va_list args) {
char *format_end = strchr(format, '\0');
while (format < format_end) {
char *pat = strchr(format, '%');
written += fwrite(format, sizeof(char), pat - format, stream);
printf_specifier_t specifier;
format = parse_printf_specifier(pat, &specifier);
written += fwrite_printf_specifier(stream, &specifier, &args);
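For a feel of how little is needed to get started, here is a minimal sketch of a formatter that understands only %s, %d, and %%. The name mini_snprintf and its helpers are hypothetical, and it writes straight into a caller-supplied buffer rather than going through a FILE, so it is a simplification of the approach above rather than a copy of it.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Stores one byte if there is room, always reserving space for the NUL. */
static void put(char *out, size_t cap, size_t *pos, char c) {
    if (*pos + 1 < cap) out[*pos] = c;
    (*pos)++;
}

static void put_int(char *out, size_t cap, size_t *pos, int v) {
    char digits[12];
    size_t n = 0;
    /* Work in unsigned so that INT_MIN negates safely. */
    unsigned int u = v < 0 ? 0u - (unsigned int)v : (unsigned int)v;
    do {
        digits[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (v < 0) put(out, cap, pos, '-');
    while (n > 0) put(out, cap, pos, digits[--n]);
}

/* Hypothetical helper supporting only %s, %d, and %%.
 * Like snprintf, returns the length the full output would have had. */
int mini_snprintf(char *out, size_t cap, const char *fmt, ...) {
    va_list args;
    size_t pos = 0;
    va_start(args, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') {
            put(out, cap, &pos, *p);
            continue;
        }
        p++;
        if (*p == '\0') break; /* lone trailing '%': stop */
        if (*p == 's') {
            const char *s = va_arg(args, const char *);
            while (*s != '\0') put(out, cap, &pos, *s++);
        } else if (*p == 'd') {
            put_int(out, cap, &pos, va_arg(args, int));
        } else {
            put(out, cap, &pos, *p); /* "%%" and unknown specifiers echo the char */
        }
    }
    va_end(args);
    if (cap > 0) out[pos < cap ? pos : cap - 1] = '\0';
    return (int)pos;
}
```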
Getting to this point was a huge milestone because I could finally see
log output and error messages which informed me what was going wrong
when the guest crashed. I was flying blind up to here.
Z_Init: Init zone memory allocation daemon.
zone memory: 0x190108, 600000 allocated for zone
Using . for configuration and saves
V_Init: allocate screens.
M_LoadDefaults: Load system defaults.
saving config in .default.cfg
-iwad not specified, trying a few iwad names
Trying IWAD file:doom2.wad
Trying IWAD file:plutonia.wad
Trying IWAD file:doom.wad
Trying IWAD file:doom1.wad
Using ./.savegame/ for savegames
===========================================================================
===========================================================================
Doom Generic is free software, covered by the GNU General Public
License. There is NO warranty; not even for MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. You are welcome to change and distribute
copies under certain conditions. See the source for more information.
===========================================================================
I_Init: Setting up machine state.
M_Init: Init miscellaneous info.
R_Init: Init DOOM refresh daemon - ...................
P_Init: Init Playloop state.
S_Init: Setting up sound.
D_CheckNetGame: Checking network game status.
startskill 2 deathmatch: 0 startmap: 1 startepisode: 1
Emulating the behavior of the 'Doom 1.9' executable.
HU_Init: Setting up heads up display.
ST_Init: Init status bar.
I_InitGraphics: framebuffer: x_res: 640, y_res: 400, x_virtual: 640, y_virtual: 400, bpp: 32
I_InitGraphics: framebuffer: RGBA: 8888, red_off: 16, green_off: 8, blue_off: 0, transp_off: 24
I_InitGraphics: DOOM screen size: w x h: 320 x 200
I_InitGraphics: Auto-scaling factor: 2
Doomgeneric
Now that we’re out of the libc implementation, let’s take a look at those six DG_* functions we
need to implement.
Init & Tick
Here’s where we finally define our init and tick functions that get called by our libc entrypoint.
We’re effectively just passing the calls along to doomgeneric.
void init(int argc, char** argv) {
doomgeneric_Create(argc, argv);
Graphics & Timers
Again, we’re basically just adapting our hostcall interface to the doomgeneric interface.
semblance_syscall_init_window("DOOM", DOOMGENERIC_RESX, DOOMGENERIC_RESY);
void DG_SetWindowTitle(const char *title) {
semblance_syscall_set_window_title(title);
semblance_syscall_render(
void DG_SleepMs(uint32_t ms) {
semblance_syscall_sleep_ms(ms);
uint32_t DG_GetTicksMs() {
return semblance_syscall_get_ticks_ms();
Keyboard Input
Same idea, with an intermediate step to translate our
host keycode (which is an SDL2 keycode) to the encoding
that DOOM expects.
int DG_GetKey(int* pressed, unsigned char* doomKey) {
read_key_result_t result = semblance_syscall_read_key();
if (!result.read) return 0;
*pressed = result.pressed;
*doomKey = translate_key_code(result.key_code);
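The body of translate_key_code is not shown above, so here is a minimal sketch of what such a translation might look like. The function name sketch_translate_key is hypothetical, and the numeric constants are copied in from memory for illustration only; treat them as assumptions and prefer the definitions in SDL_keycode.h and doomkeys.h in real code.

```c
#include <stdint.h>

/* Values copied here for illustration; treat them as assumptions and prefer
 * the constants from SDL_keycode.h and doomkeys.h in real code. */
#define SK_RETURN 13
#define SK_ESCAPE 27
#define SK_RIGHT  0x4000004F
#define SK_LEFT   0x40000050
#define SK_DOWN   0x40000051
#define SK_UP     0x40000052

#define DK_ENTER      13
#define DK_ESCAPE     27
#define DK_LEFTARROW  0xac
#define DK_UPARROW    0xad
#define DK_RIGHTARROW 0xae
#define DK_DOWNARROW  0xaf

/* Hypothetical translation: maps a host keycode to a DOOM key byte. */
unsigned char sketch_translate_key(int32_t host_key) {
    switch (host_key) {
    case SK_RETURN: return DK_ENTER;
    case SK_ESCAPE: return DK_ESCAPE;
    case SK_LEFT:   return DK_LEFTARROW;
    case SK_UP:     return DK_UPARROW;
    case SK_RIGHT:  return DK_RIGHTARROW;
    case SK_DOWN:   return DK_DOWNARROW;
    default:
        /* Letters are lowercased; other printable ASCII passes through. */
        if (host_key >= 'A' && host_key <= 'Z')
            return (unsigned char)(host_key + ('a' - 'A'));
        if (host_key >= 32 && host_key < 127)
            return (unsigned char)host_key;
        return 0; /* unmapped key */
    }
}
```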
Build
As a result we end up with a relatively hefty 426k WebAssembly module ready
to be instantiated by my meager little VM.
ls -lh guest/doomgeneric/target/doomgeneric.wasm
# -rwxr-xr-x@ 1 taylor staff 426K Jan 24 15:16 guest/doomgeneric/target/doomgeneric.wasm
My project is set up as a cargo workspace with three different crates:
# semblance semblance-mars semblance-wast
semblance is the core crate that implements the WebAssembly engine; it depends only on the Rust standard library.
semblance-wast is a test harness for running WebAssembly scripts (the .wast files that make up the core specification test suite) against semblance.
semblance-mars is our embedder for DOOM; it depends on semblance and on sdl2 for UI.
This is where all of our hostcalls will be implemented.
Main
There’s a lot going on here, but the gist is that the embedder is going to load our wasm module,
instantiate it with our syscalls available to be imported, grab references to the exported
_start and _tick functions from the guest, and then invoke the guest’s _start function.
At this point the guest will have read the .wad file and opened our SDL2 window, so it’s now
ready to enter the game loop.
fn main() -> Result<(), Box<dyn std::error::Error>> {
let module_path = std::env::args().nth(1).expect("missing module path");
let module_path = PathBuf::from(module_path);
let mut linker = WasmLinker::new();
syscalls::add_to_linker(&mut linker);
let wmod = WasmModule::read(&module_path).expect("unable to load module");
let (mut store, externvals) = linker.link(&wmod)
.expect("unable to resolve imports");
.instantiate(Rc::new(wmod), &externvals)
.expect("failed to instantiate");
let winst = store.instances.resolve(winst_id);
.resolve_export_fn_by_name("_start")
.expect("no _start func exported");
.resolve_export_fn_by_name("_tick")
.expect("no _tick func exported");
.invoke(initfunc, Box::new([]), WasmInvokeOptions::default())
.expect("guest trapped during init");
Once the guest is fully instantiated we’ll start handling events on the SDL event loop.
Each iteration of this loop, we’ll call into the guest’s _tick function to run a frame
of the game.
fn main() -> Result<(), Box<dyn std::error::Error>> {
guest_gfx::use_sdl_context(|ctx| ctx.event_pump())
.expect("failed to get event pump");
for event in event_pump.poll_iter() {
guest_input::enqueue_key(QueuedKeyEvent {
guest_input::enqueue_key(QueuedKeyEvent {
.invoke(tickfunc, Box::new([]), WasmInvokeOptions::default())
.expect("guest trapped during _tick");
Hostcalls
Let’s see how the embedder allows the guest to perform IO.
Keep in mind that I could make the embedding API much more ergonomic by allowing
the WasmStore to carry some embedder-specific runtime state through a generic
WasmStore<T>. I haven’t gotten there yet, so for now any embedder state that
needs to be accessed by a hostcall has to be available statically. It’s a bit
messy, but it works.
Our IO hostcalls need access to a table which maps file descriptors to the
underlying reader / writer.
pub trait ReadSeek: Read + Seek {}
impl<T: Read + Seek> ReadSeek for T {}
pub struct IoTable(Vec<IoTableEntry>);
thread_local! {
    static IO_TABLE: RefCell<IoTable> = RefCell::new(IoTable::new());
}
The fopen hostcall is implemented by pushing a new entry into this table;
the index at which our entry is inserted is returned to the guest as the
file descriptor.
pub fn fopen(path: &str, mode: &str) -> i32 {
// we only allow writes to stdout/stderr
let idx = with_io_table_mut(
|io| io.push_writer(Box::new(std::io::stdout()))
let idx = with_io_table_mut(
|io| io.push_writer(Box::new(std::io::stderr()))
_ => todo!("fopen write {}", path),
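The index-as-descriptor idea is language agnostic, so here is the same scheme as a minimal sketch in C, the language of the guest. The type and function names are hypothetical, the table has a fixed capacity for simplicity, and a path string stands in for the real reader or writer object.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FILES 16

typedef struct {
    int in_use;
    const char *path; /* stand-in for the real reader/writer object */
} sketch_io_entry_t;

typedef struct {
    sketch_io_entry_t entries[SKETCH_MAX_FILES];
    size_t len;
} sketch_io_table_t;

/* Appends an entry and returns its index as the descriptor, or -1 if full. */
int32_t sketch_io_push(sketch_io_table_t *t, const char *path) {
    if (t->len == SKETCH_MAX_FILES) return -1;
    t->entries[t->len].in_use = 1;
    t->entries[t->len].path = path;
    return (int32_t)t->len++;
}

/* Looks a descriptor up, rejecting negative, out-of-range, and closed ones. */
sketch_io_entry_t *sketch_io_get(sketch_io_table_t *t, int32_t fd) {
    if (fd < 0 || (size_t)fd >= t->len) return NULL;
    return t->entries[fd].in_use ? &t->entries[fd] : NULL;
}
```

Since the descriptor comes from untrusted guest code, every lookup validates it before touching the table.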
You’ll notice that this function is accepting &str arguments, but the hostcall is dealing
with raw pointers into the WebAssembly guest’s linear memory. Here’s the actual hostcall
that’s imported by the guest which marshals the data types:
static SYSCALL_FOPEN_TYPE: LazyLock<WasmFuncType> =
LazyLock::new(|| WasmFuncType {
input_type: WasmResultType(Box::new([
WasmValueType::Num(WasmNumType::I32), // char *path
WasmValueType::Num(WasmNumType::I32), // char *mode
output_type: WasmResultType(Box::new([
WasmValueType::Num(WasmNumType::I32) // int fd
winst_id: WasmInstanceAddr,
let path = unsafe { args[0].num.i32 };
let path = guest_resolve_cstr(store, winst_id, path);
let mode = unsafe { args[1].num.i32 };
let mode = guest_resolve_cstr(store, winst_id, mode);
let fd = guest_io::fopen(path, mode);
num: WasmNumValue { i32: fd },
pub fn add_to_linker(linker: &mut WasmLinker) {
("fopen", &SYSCALL_FOPEN_TYPE, &syscall_fopen),
You can imagine how the other hostcalls are implemented: they all effectively stick
to this basic pattern.
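The one step in this pattern that deserves care is turning a guest pointer into something the host can read, because the guest can pass any 32-bit value it likes. Here is a minimal sketch of a bounds-checked C-string lookup over a linear memory buffer. It is written in C for illustration, the name sketch_resolve_cstr is hypothetical, and it is not the implementation of guest_resolve_cstr.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: resolves a guest pointer (an offset into linear memory)
 * to a host pointer, but only if a NUL terminator exists before the end of
 * memory. Returns NULL otherwise. On success the string length is stored in
 * *len_out when len_out is not NULL.
 */
const char *sketch_resolve_cstr(const uint8_t *mem, size_t mem_len,
                                uint32_t guest_ptr, size_t *len_out) {
    if ((size_t)guest_ptr >= mem_len) return NULL;
    for (size_t i = guest_ptr; i < mem_len; i++) {
        if (mem[i] == 0) {
            if (len_out != NULL) *len_out = i - guest_ptr;
            return (const char *)(mem + guest_ptr);
        }
    }
    return NULL; /* ran off the end of memory without finding a terminator */
}
```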
Let’s run it!
./target/release/semblance-mars guest/doomgeneric/target/doomgeneric.wasm
I’ve paid very little attention to performance as I’ve implemented the VM, but out of the box it’s
able to run DOOM at a playable framerate which is awesome!
Notably, I didn’t run into a single bug in my core VM implementation during this whole process of porting DOOM.
I guess that’s a testament to how comprehensive the core test suite is.
Aside
I did have to fix a Wasm trap because of an issue in the DOOM source code itself.
I_AtExit((atexit_func_t) G_CheckDemoStatus, true);
This function cast is unsafe when targeting WebAssembly (it’s casting a function that returns an int into a void function).
On exit, the call_indirect instruction hits this and traps because the function types don’t match (the core specification
requires that indirect calls are typechecked at runtime). The fix is to wrap G_CheckDemoStatus in a separate
function that returns no value.
I_AtExit(G_CheckDemoStatus_AtExit, true);
void G_CheckDemoStatus_AtExit(void) {
    (void) G_CheckDemoStatus(); // discard the int result
}
We can see in the compiled WebAssembly we’re now explicitly dropping the unused value from the operand stack.
If that dynamic typecheck were not in place, the operand stack would be silently corrupted and we’d end up
seeing undefined behavior.
(func $G_CheckDemoStatus_AtExit (;155;) (type 0)
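To illustrate why the mismatched cast traps instead of silently misbehaving, here is a toy model of the runtime check behind an indirect call. The type ids, the table layout, and all function names are hypothetical simplifications; a real engine compares full function signatures, but the shape of the check is the same.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { CALL_OK = 0, CALL_TRAP = 1 } call_status_t;

/* Hypothetical type ids standing in for entries in a module's type section. */
enum { TYPE_VOID_TO_VOID = 0, TYPE_VOID_TO_I32 = 1 };

typedef struct {
    uint32_t type_id;      /* the function's declared signature */
    int32_t (*body)(void); /* toy model: every function returns an i32 */
} table_entry_t;

/* Performs the checks an indirect call needs before jumping to the callee. */
call_status_t sketch_call_indirect(const table_entry_t *table, size_t table_len,
                                   uint32_t index, uint32_t expected_type,
                                   int32_t *result) {
    if (index >= table_len) return CALL_TRAP;         /* out of bounds */
    if (table[index].body == NULL) return CALL_TRAP;  /* uninitialized element */
    if (table[index].type_id != expected_type) return CALL_TRAP; /* mismatch */
    *result = table[index].body();
    return CALL_OK;
}

/* Stand-in for a function that returns an int, like G_CheckDemoStatus. */
int32_t sketch_check_demo_status(void) { return 1; }
```

Calling the int-returning entry while expecting the void signature yields CALL_TRAP here, which mirrors the trap I hit on exit.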
Conclusion
I have learned an incredible amount during this whole process. I had a lot of fun working
on this. Pursuing a large project for a sustained period of time has allowed
me to really push outside my comfort zone. Admittedly I’ve worked on it for a lot longer
than I initially thought I would, but I kept finding interesting problems to solve,
and seeing myself make real progress over time was incredibly satisfying.