[Translation] Compiling C to WebAssembly without Emscripten


A compiler is just one part of Emscripten. What if we stripped away all the bells and whistles and kept only the compiler?

Emscripten is a compiler toolchain for C/C++ that targets WebAssembly. But it is much more than just a compiler. The goal of Emscripten is to be a drop-in replacement for your C/C++ compiler and to run code on the web that was not originally written for the web. To achieve this, Emscripten emulates an entire POSIX operating system. If the program uses fopen(), Emscripten provides a file system emulation. If OpenGL is used, Emscripten provides a C-compatible GL context backed by WebGL. This is a lot of work, and a lot of code that has to be shipped in the final bundle. But can you just ... remove it?

The actual compiler in the Emscripten toolchain is LLVM. It is LLVM that translates the C code into WebAssembly bytecode. LLVM is a modern, modular framework for analyzing, transforming and optimizing programs. It is modular in the sense that it never compiles a language straight to machine code. Instead, a front-end compiler generates an intermediate representation (IR). This IR is called LLVM IR. LLVM originally stood for Low-Level Virtual Machine, hence the name of the project.

A back-end compiler then translates the IR into the machine code of the target. The advantage of this strict separation is that a new architecture can be supported by "simply" adding a new back-end compiler. In this sense, WebAssembly is just one of the many compilation targets that LLVM supports, and for some time it was only available behind a special flag. Starting with LLVM 8, the WebAssembly target is available by default.

On macOS, you can install LLVM using Homebrew:

  $ brew install llvm
 $ brew link --force llvm  

Check for WebAssembly support:

  $ llc --version
 LLVM (http://llvm.org/):
  LLVM version 8.0.0
  Optimized build.
  Default target: x86_64-apple-darwin18.5.0
  Host CPU: skylake

  Registered Targets:
  # ... God, how many architectures ...
  systemz - systemZ
  thumb - Thumb
  thumbeb - Thumb (big endian)
  wasm32 - WebAssembly 32-bit # Hooray!
  wasm64 - WebAssembly 64-bit
  x86 - 32-bit X86: Pentium-Pro and above
  x86-64 - 64-bit X86: EM64T and AMD64
  xcore - XCore  

It seems we are ready!

Compiling C the hard way


Note: We will be looking at some low-level formats here, such as raw WebAssembly. If you find them hard to follow, that is fine. You do not need to understand everything in this article to make good use of WebAssembly. If you are just looking for something to copy and paste, see the compiler invocation in the “Optimization” section. But if you're interested, keep reading! I previously wrote an introduction to raw WebAssembly and WAT, which covers the basics needed to understand this post.
Warning: I will bend over backwards a little here and use human-readable formats at every step (as far as possible). Our program will be very simple, to avoid edge cases and distractions:

  // Filename: add.c
  int add(int a, int b) {
    return a * a + b;
  }

What a magnificent feat of engineering! Especially because the program is called add but does not actually just add. More importantly, the program does not use the standard library and uses int as its only type.
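Before targeting WebAssembly, it can help to confirm natively what add() computes. The sketch below repeats the function from add.c and adds a hypothetical check_add() helper that is not part of the original program. It relies on assert.h, which is only available in a regular host build, not in the library-free WebAssembly build described in this article.

```c
#include <assert.h>

/* Same function as in add.c. */
int add(int a, int b) {
  return a * a + b;
}

/* Hypothetical helper (an assumption, not part of the original program):
   checks a few values natively before compiling to WebAssembly. */
void check_add(void) {
  assert(add(4, 1) == 17);  /* 4 * 4 + 1 */
  assert(add(0, 5) == 5);   /* 0 * 0 + 5 */
  assert(add(-3, 2) == 11); /* (-3) * (-3) + 2 */
}
```

Calling check_add() from a native main() aborts if any of the expectations does not hold.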

Turning C into LLVM IR


The first step is to turn our C program into LLVM IR. This is the job of the clang front-end compiler that was installed with LLVM:

  clang \
    --target=wasm32 \ # Target WebAssembly
    -emit-llvm \ # Emit LLVM IR (instead of host machine code)
    -c \ # Only compile, no linking just yet
    -S \ # Emit human-readable assembly rather than binary
    add.c

As a result, we get add.ll, which contains the LLVM IR. I am only showing it for the sake of completeness. When working with WebAssembly, or even with clang as a C developer, you never come into contact with LLVM IR.

  ; ModuleID = 'add.c'
  source_filename = "add.c"
  target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
  target triple = "wasm32"

  ; Function Attrs: norecurse nounwind readnone
  define hidden i32 @add(i32, i32) local_unnamed_addr #0 {
    %3 = mul nsw i32 %0, %0
    %4 = add nsw i32 %3, %1
    ret i32 %4
  }

  attributes #0 = { norecurse nounwind readnone "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }

  !llvm.module.flags = !{!0}
  !llvm.ident = !{!1}

  !0 = !{i32 1, !"wchar_size", i32 4}
  !1 = !{!"clang version 8.0.0 (tags/RELEASE_800/final)"}

LLVM IR is full of additional metadata and annotations, which allows the compiler to make more informed decisions when generating machine code.

Turning LLVM IR into object files


The next step is to invoke llc, LLVM's back-end compiler, to turn the LLVM IR into an object file, for example with llc -march=wasm32 -filetype=obj add.ll.

The output, add.o, is already a valid WebAssembly module and contains all the compiled code of our C file. But most of the time you will not be able to run object files, because essential parts are still missing.

If we had omitted -filetype=obj in the command, we would have got LLVM's assembly format for WebAssembly, a human-readable format that is somewhat similar to WAT. However, llvm-mc, the tool for working with such files, does not yet fully support the format and often fails to process them. So instead we disassemble the object files after the fact. To inspect these object files, you need a target-specific tool. In the case of WebAssembly, this is wasm-objdump, part of the WebAssembly Binary Toolkit, or wabt for short.

  $ brew install wabt # in case you haven’t
  $ wasm-objdump -x add.o

  add.o: file format wasm 0x1

  Section Details:

  Type[1]:
   - type[0] (i32, i32) -> i32
  Import[3]:
   - memory[0] pages: initial=0 <- env.__linear_memory
   - table[0] elem_type=funcref init=0 max=0 <- env.__indirect_function_table
   - global[0] i32 mutable=1 <- env.__stack_pointer
  Function[1]:
   - func[0] sig=0 <add>
  Code[1]:
   - func[0] size=75 <add>
  Custom:
   - name: "linking"
    - symbol table [count=2]
     - 0: F <add> func=0 binding=global vis=hidden
     - 1: G <env.__stack_pointer> global=0 undefined binding=global vis=default
  Custom:
   - name: "reloc.CODE"
    - relocations for section: 3 (Code) [1]
     - R_WASM_GLOBAL_INDEX_LEB offset=0x000006(file=0x000080) symbol=1 <env.__stack_pointer>

The output shows that our add() function is in this module, but it also contains custom sections filled with metadata and, surprisingly, several imports. In the next phase, linking, the custom sections will be analyzed and removed, and the linker will take care of the imports.

Linking


Traditionally, the linker's job is to assemble several object files into an executable. LLVM's linker is called lld, but it has to be invoked through one of its target-specific symlinks. For WebAssembly, this is wasm-ld.

  wasm-ld \
  --no-entry \ # We don’t have an entry function
  --export-all \ # Export everything (for now)
  -o add.wasm \
  add.o  

The result is a 262-byte WebAssembly module.

Start


Of course, the most important part is to see that everything really works. As in the previous article, a couple of lines of inline JavaScript are enough to load and run this WebAssembly module.

  <!DOCTYPE html>

  <script type="module">
    async function init() {
      const { instance } = await WebAssembly.instantiateStreaming(
        fetch("./add.wasm")
      );
      console.log(instance.exports.add(4, 1));
    }
    init();
  </script>

If everything went well, you will see the number 17 in the DevTools console. We have just successfully compiled C to WebAssembly without touching Emscripten. It is also worth noting that no glue code is needed to set up and load the WebAssembly module.

Compiling C the slightly easier way


It took quite a few steps to compile C to WebAssembly. As I said, we went through every step in detail for educational purposes. Let's skip the human-readable intermediate formats and use the C compiler as the Swiss Army knife it was designed to be:

  clang \
    --target=wasm32 \
    -nostdlib \ # Don’t try and link against a standard library
    -Wl,--no-entry \ # Flags passed to the linker
    -Wl,--export-all \
    -o add.wasm \
    add.c

This produces the same .wasm file, but with a single command.

Optimization


Look at the WAT of our WebAssembly module by running wasm2wat :

  (module
    (type (;0;) (func))
    (type (;1;) (func (param i32 i32) (result i32)))
    (func $__wasm_call_ctors (type 0))
    (func $add (type 1) (param i32 i32) (result i32)
      (local i32 i32 i32 i32 i32 i32 i32 i32)
      global.get 0
      local.set 2
      i32.const 16
      local.set 3
      local.get 2
      local.get 3
      i32.sub
      local.set 4
      local.get 4
      local.get 0
      i32.store offset=12
      local.get 4
      local.get 1
      i32.store offset=8
      local.get 4
      i32.load offset=12
      local.set 5
      local.get 4
      i32.load offset=12
      local.set 6
      local.get 5
      local.get 6
      i32.mul
      local.set 7
      local.get 4
      i32.load offset=8
      local.set 8
      local.get 7
      local.get 8
      i32.add
      local.set 9
      local.get 9
      return)
    (table (;0;) 1 1 anyfunc)
    (memory (;0;) 2)
    (global (;0;) (mut i32) (i32.const 66560))
    (global (;1;) i32 (i32.const 66560))
    (global (;2;) i32 (i32.const 1024))
    (global (;3;) i32 (i32.const 1024))
    (export "memory" (memory 0))
    (export "__wasm_call_ctors" (func $__wasm_call_ctors))
    (export "__heap_base" (global 1))
    (export "__data_end" (global 2))
    (export "__dso_handle" (global 3))
    (export "add" (func $add)))

Wow, that is a lot of code. To my surprise, the module uses memory (as revealed by the i32.load and i32.store operations), eight locals and several globals. You could probably write a more concise version by hand. The program is this big because we have not applied any optimizations. Let's change that:

  clang \
    --target=wasm32 \
  + -O3 \ # Aggressive optimizations
  + -flto \ # Add metadata for link-time optimizations
    -nostdlib \
    -Wl,--no-entry \
    -Wl,--export-all \
  + -Wl,--lto-O3 \ # Aggressive link-time optimizations
    -o add.wasm \
    add.c

Note: technically, link-time optimization (LTO) does not bring any advantage here, since we only link a single file. In larger projects, LTO can help to reduce the file size significantly.
After running this command, the .wasm file shrank from 262 to 197 bytes, and the WAT also became much simpler:

  (module
    (type (;0;) (func))
    (type (;1;) (func (param i32 i32) (result i32)))
    (func $__wasm_call_ctors (type 0))
    (func $add (type 1) (param i32 i32) (result i32)
      local.get 0
      local.get 0
      i32.mul
      local.get 1
      i32.add)
    (table (;0;) 1 1 anyfunc)
    (memory (;0;) 2)
    (global (;0;) (mut i32) (i32.const 66560))
    (global (;1;) i32 (i32.const 66560))
    (global (;2;) i32 (i32.const 1024))
    (global (;3;) i32 (i32.const 1024))
    (export "memory" (memory 0))
    (export "__wasm_call_ctors" (func $__wasm_call_ctors))
    (export "__heap_base" (global 1))
    (export "__data_end" (global 2))
    (export "__dso_handle" (global 3))
    (export "add" (func $add)))

Calling into the standard library


Using C without libc, the C standard library, feels rather rough. Adding it seems like the logical next step, but I will be honest: it will not be easy. In fact, we will not link against any libc in this article. There are several candidates, most notably glibc, musl and dietlibc. However, most of these libraries expect to run on a POSIX operating system that implements a specific set of system calls. Since JavaScript has no kernel interface, we would have to implement these POSIX system calls ourselves, probably by calling out to JavaScript. This is a difficult task and I am not going to do it here. The good news is that this is exactly what Emscripten does for you.

Of course, not all libc functions rely on system calls. Functions like strlen(), sin() or even memset() are implemented in plain C. This means you can use these functions, or even just copy and paste their implementations from one of the libraries mentioned above.
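To illustrate that such functions need nothing from the operating system, here is a minimal sketch of two of them written from scratch. The names my_strlen and my_memset are hypothetical, chosen to avoid clashing with a host libc, and the code is a simple byte-by-byte version rather than the optimized implementations found in glibc or musl. The only header used is stddef.h, which the compiler itself provides.

```c
#include <stddef.h>

/* Hypothetical names (an assumption for this sketch); in a -nostdlib build
   these could be called strlen and memset directly. */

/* Count the bytes before the terminating NUL character. */
size_t my_strlen(const char *s) {
  size_t n = 0;
  while (s[n] != '\0') {
    n++;
  }
  return n;
}

/* Fill `count` bytes starting at `dest` with the low byte of `value`. */
void *my_memset(void *dest, int value, size_t count) {
  unsigned char *p = dest;
  for (size_t i = 0; i < count; i++) {
    p[i] = (unsigned char)value;
  }
  return dest;
}
```

Neither function touches a file, a clock or any other system resource, which is why code like this can be compiled for wasm32 without any emulation layer.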

Dynamic Memory


Without libc, we do not have access to fundamental C APIs such as malloc() and free(). In the unoptimized WAT, we saw that the compiler makes use of memory when it needs to. This means we cannot simply use memory however we like without risking corruption. We need to understand how that memory is used.

LLVM's memory model


The way WebAssembly memory is segmented will come as a bit of a surprise to experienced programmers. First, WebAssembly technically allows address zero, but it is often still treated as an error. Second, the stack comes first and grows down (towards lower addresses), while the heap comes after it and grows up. The reason is that WebAssembly memory can grow at runtime. This means there is no fixed end at which to place the stack or the heap.

Here is the layout that wasm-ld uses: static data comes first, followed by the stack, followed by the heap.



The stack grows down, and the heap grows up. The stack region starts at __data_end, and the heap starts at __heap_base. Because the stack is placed first, it is limited to a maximum size set at compile time, namely __heap_base minus __data_end.

If you go back and look at the globals section in our WAT, you will find these values: __heap_base is set to 66560 and __data_end is set to 1024. This means the stack can grow to a maximum of 64 KiB, which is not much. Fortunately, wasm-ld lets us change this value:

  clang \
    --target=wasm32 \
    -O3 \
    -flto \
    -nostdlib \
    -Wl,--no-entry \
    -Wl,--export-all \
    -Wl,--lto-O3 \
  + -Wl,-z,stack-size=$[8 * 1024 * 1024] \ # Set maximum stack size to 8MiB
    -o add.wasm \
    add.c
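The arithmetic behind these numbers can be written out as a small sketch. The constants are copied from the globals in the WAT shown earlier. The enum and function names are hypothetical and exist only for this illustration.

```c
/* Values copied from the globals section of the WAT (not read from a
   live module). */
enum {
  DATA_END = 1024,  /* __data_end */
  HEAP_BASE = 66560 /* __heap_base */
};

/* Default stack region in bytes: the span between __data_end and
   __heap_base, i.e. 65536 bytes = 64 KiB. */
unsigned int default_stack_size(void) {
  return HEAP_BASE - DATA_END;
}

/* The byte count that 8 * 1024 * 1024 evaluates to in the stack-size
   flag: 8388608 bytes = 8 MiB. */
unsigned int requested_stack_size(void) {
  return 8u * 1024u * 1024u;
}
```

With the larger stack size, the stack region between __data_end and __heap_base grows accordingly, so __heap_base should move to a higher address.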

Building an allocator


We now know that the heap area begins at __heap_base. Since there is no malloc() function, we know that the memory from that address onwards is safe to use. We can place data there however we like and need not fear corruption, because the stack grows in the opposite direction. However, a free-for-all heap can get clogged quickly, so some kind of dynamic memory management is usually needed. One option is to pull in a full malloc() implementation, such as Doug Lea's malloc implementation, which is used by Emscripten. There are also a few smaller implementations with different tradeoffs.

But why not write our own malloc()? We are in this deep already, so it makes no difference. One of the simplest allocators is a bump allocator: it is super fast, extremely small and easy to implement. But there is a drawback: you cannot free memory. Although such an allocator seems incredibly useless at first glance, I ran into use cases while developing Squoosh where it would have been an excellent choice. The concept of a bump allocator is that we store the start address of the unused memory in a global. If the program requests n bytes of memory, we advance that marker by n and return its previous value:

  extern unsigned char __heap_base;

  // Start of unused memory, stored as a plain integer (32 bits on wasm32).
  unsigned int bump_pointer = (unsigned int)&__heap_base;
  void* malloc(int n) {
    unsigned int r = bump_pointer;
    bump_pointer += n;
    return (void *)r;
  }

  void free(void* p) {
    // lol
  }

The globals we saw in the WAT are actually defined by wasm-ld, so we can access them from our C code as ordinary variables if we declare them extern. So, we just wrote our own malloc() ... in five lines of C.

Note: our bump allocator is not fully compatible with C's malloc(). For example, we do not give any alignment guarantees. But it works well enough, so ...
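If alignment does matter, the bump allocator can round its marker up before handing out memory. The following is a minimal sketch under stated assumptions, not the allocator used in this article: a static buffer stands in for the region that starts at __heap_base so that the code also runs natively, and the name bump_alloc, the 8-byte alignment and the 4096-byte heap size are all hypothetical choices.

```c
#include <stddef.h>

/* Assumptions for this sketch: a fixed buffer replaces the memory after
   __heap_base, and 8 bytes is the chosen alignment (a power of two). */
#define HEAP_SIZE 4096u
#define ALIGNMENT 8u

static _Alignas(8) unsigned char heap[HEAP_SIZE];
static size_t bump_offset = 0;

/* Hypothetical name, to avoid clashing with the real malloc on a host. */
void *bump_alloc(size_t n) {
  /* Round the current offset up to the next multiple of ALIGNMENT. */
  size_t start = (bump_offset + (ALIGNMENT - 1u)) & ~(size_t)(ALIGNMENT - 1u);
  if (start > HEAP_SIZE || n > HEAP_SIZE - start) {
    return NULL; /* not enough room left in the buffer */
  }
  bump_offset = start + n;
  return &heap[start];
}
```

Alignment also matters on the JavaScript side: a typed array view such as Uint32Array requires its byte offset to be a multiple of the element size, so an address that is not a multiple of 4 cannot be wrapped in a Uint32Array directly.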

Dynamic Memory Usage


To try this out, let's write a C function that takes an array of numbers of arbitrary size and computes the sum. It is not very exciting, but it forces us to use dynamic memory, since we do not know the size of the array at build time:

  int sum(int a[], int len) {
    int sum = 0;
    for (int i = 0; i < len; i++) {
      sum += a[i];
    }
    return sum;
  }

The sum() function is hopefully straightforward. The more interesting question is how to pass an array from JavaScript to WebAssembly. After all, WebAssembly only understands numbers. The general idea is to call malloc() from JavaScript to allocate a chunk of memory, copy the values into it and pass the address (a number!) where the array is located:

  <!DOCTYPE html>

  <script type="module">
    async function init() {
      const { instance } = await WebAssembly.instantiateStreaming(
        fetch("./add.wasm")
      );

      const jsArray = [1, 2, 3, 4, 5];
      // Allocate memory for 5 32-bit integers
      // and get the starting address.
      const cArrayPointer = instance.exports.malloc(jsArray.length * 4);
      // Turn that sequence of 32-bit integers
      // into a Uint32Array, starting at that address.
      const cArray = new Uint32Array(
        instance.exports.memory.buffer,
        cArrayPointer,
        jsArray.length
      );
      // Copy the values from JS to C.
      cArray.set(jsArray);
      // Run the function, passing the address and length.
      console.log(instance.exports.sum(cArrayPointer, cArray.length));
    }
    init();
  </script>

After running this, you should see 15 in the DevTools console, which is indeed the sum of all numbers from 1 to 5.
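The idea that a pointer is just a number can also be shown in plain C. The sketch below is an illustration under assumptions, not code from the module itself: a byte array stands in for the module's linear memory, addresses are 32-bit offsets into it (which is how JavaScript sees them), and all names as well as the starting offset 64 are hypothetical. It performs no bounds checks.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: this array plays the role of the WebAssembly linear memory. */
static unsigned char memory[1024];
/* Hypothetical stand-in for __heap_base. */
static uint32_t bump_pointer = 64;

/* Bump allocation as before, but returning the address as a number. */
uint32_t alloc_bytes(uint32_t n) {
  uint32_t r = bump_pointer;
  bump_pointer += n;
  return r;
}

/* Sum `len` 32-bit integers stored at offset `addr` in linear memory. */
int32_t sum_at(uint32_t addr, uint32_t len) {
  int32_t total = 0;
  for (uint32_t i = 0; i < len; i++) {
    int32_t value;
    memcpy(&value, &memory[addr + i * 4], sizeof value);
    total += value;
  }
  return total;
}

/* Mirrors the JavaScript steps: allocate, copy the values in, call sum. */
int32_t sum_via_linear_memory(const int32_t *values, uint32_t len) {
  uint32_t addr = alloc_bytes(len * 4);
  memcpy(&memory[addr], values, len * sizeof(int32_t));
  return sum_at(addr, len);
}
```

Here alloc_bytes() plays the role of instance.exports.malloc, the memcpy into memory corresponds to cArray.set(jsArray), and sum_at() corresponds to instance.exports.sum.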

Conclusion


So, you made it to the end. Congratulations! Again, if you feel a bit overwhelmed, that is fine. You do not need to take in all the details. Understanding them is entirely optional for a good web developer and is not even required to make excellent use of WebAssembly. But I wanted to share this information because it lets you really appreciate all the work that a project like Emscripten does for you. At the same time, it shows how small purely computational WebAssembly modules can be. The Wasm module for summing the array came to just 230 bytes, including the dynamic memory allocator. Compiling the same code with Emscripten produces 100 bytes of WebAssembly code and 11K of JavaScript glue code. It takes some effort to get a result like this, but there are situations where it is worth it.
