Cheerp 2.7: compile C++ to WebAssembly plus JavaScript


14 March 2023 Update: Cheerp 3.0 released and relicensed to Apache 2.0 licence! Release notes here.

Today we are releasing Cheerp 2.7, a unique tool for creating WebAssembly and JavaScript-based libraries from C++ code bases.

UPDATE, 10th March: in depth article on PartialExecuter is out.

Powerful JavaScript-C++ interoperability, support for exceptions and for ES6 modules, and performance and code generation improvements built on top of LLVM’s clang are now available at these links:

GitHub | Issues | Install links | Documentation | Developer’s chat

The whole team at Leaning Technologies is proud to share the progress made over the last year of development, together with benchmark results, documentation and tutorials.


Cheerp in use

  • a Virtual Machine that runs arbitrary x86 exes in-browser -> WebVM
  • the Web-based version of an architectural CAD -> home.by.me
  • a cartoonish multiplayer game (WebRTC included!) -> TeeWorlds-web

What do these projects have in common?

They are all complex Web applications: they rely heavily on interactions with a multitude of browser APIs and external libraries (interoperability ✔), they need to be lean and performant (optimizations ✔), and their central components are C++-based (compiler ✔). And they are all made with Cheerp.

Example of Cheerp results: WebVM, home.by.me and TeeWorlds web

The examples are diverse, but the intuition is the same:

  • Browsers are among the most widely available technologies, and they act as a standardisation layer that lets both existing tools and new projects reach new users.
  • JavaScript engines (Chrome’s V8, Safari’s JavaScriptCore or Firefox’s SpiderMonkey) are an efficient sandbox mechanism that allows code to be executed directly on-device.
  • Code running in browsers is restricted to be either JavaScript or, since its introduction in 2017, WebAssembly.

How do you transform a C/C++ codebase (normally compiled as a native executable) into a WebAssembly or JavaScript-based library encoding the same logic?

Cheerp is based on LLVM’s clang, an industry-standard compiler, and it follows the same process used to compile C++ code to native:

  • code is parsed, any warnings and errors are emitted, and then an equivalent IR (Intermediate Representation) encoding the input program is generated
  • the IR is optimised by a series of transformations into more compact & more performant IR
  • the optimised IR is finally code-generated as a combination of JavaScript and WebAssembly functions and variables

Diagram of the compilation pipeline: C++ codebase -> Cheerp (with LLVM’s optimizations) -> JS + Wasm

Once the .js and .wasm files are generated, it’s a matter of serving those two static files, embedding them into the relevant HTML pages, building functionality on top of the library’s API, and testing that everything is in order.

After any change to your C++ codebase, invoking Cheerp again regenerates the files, ready to be served in place of the previous versions.

Why Cheerp?

A wide array of other tools exists to help developers create better JavaScript and WebAssembly libraries, but Cheerp is a powerful bridge between two very different models: garbage-collected JavaScript and linear-memory-based WebAssembly. Cheerp’s capabilities span the JavaScript / WebAssembly divide, allowing projects developed with Cheerp to be written entirely in C++.

Why is this fundamental?

  • It enables whole-program optimization and analysis (improving performance and footprint in ways that would otherwise be out of reach)
  • All output is generated by a single tool, which simplifies both the build process and debugging
  • The same tool performs checks across all components (emitting errors and warnings when things get out of sync). This can significantly improve developer productivity.
  • It lets you leverage both the C++ and JavaScript ecosystems
  • It lets you leverage WebAssembly as an efficient compilation target
  • It lets your libraries expose more powerful interfaces

One example to get more concrete:

#include <algorithm>
#include <cheerp/clientlib.h>

// Comparator callback supplied from JavaScript.
typedef bool(JSComparator)(client::Object* a, client::Object* b);

[[cheerp::jsexport]]
void stableSort(client::Array& array, JSComparator compareFunc)
{
    const unsigned int len = array.get_length();
    std::stable_sort(&array[0], &array[0] + len, compareFunc);
}

What is this about?

This ports the implementation of std::stable_sort to JavaScript.

Once integrated, it can be used like this:

var someArray = [];
//populate someArray
stableSort(someArray, (a,b) => (a < b));

The input is “regular” C++ with just two additional features:

  • client::Array and client::Object classes, forward declared by cheerp-provided libraries
  • the [[cheerp::jsexport]] attribute (used to signal that the function should be part of the external interface)

The output is also “regular” JavaScript, ready to be included in whatever project or library.
Here in particular we generated a function taking a JS-callback and modifying an arbitrary Array-like object.

It could just as well have been adding CSS properties or classes, interacting with the DOM when a callback is fired, creating objects of the same class in either C++ or JavaScript code, or basically whatever is required in your domain. Cheerp’s flexibility in both using and producing powerful interfaces means fewer constraints are put on how your program can behave.

The main advantage of components compiled from C++ is that the exact same logic is ported and then executed. This means that algorithms and data structures that are non-trivial to implement can easily be carried over.

Performance wise? This is likely to be the fastest user-land implementation available for a non-adaptive stable sort.

Of course, Array.sort is already provided, so that should be the recommended solution, but you may be positively surprised to discover how compilation from C++ can be a powerful, efficient and performant solution in your toolbox.
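To make the "stable" part concrete, here is a minimal sketch in plain C++ (no Cheerp headers involved) of what std::stable_sort guarantees: elements that compare equal keep their original relative order. The Item struct, the byPriority comparator and the sortedLabels helper are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type, used only for this illustration.
struct Item {
    int priority;
    std::string label;
};

// Orders by priority only; labels are deliberately ignored, so items with
// the same priority compare as equivalent.
bool byPriority(const Item& a, const Item& b) {
    return a.priority < b.priority;
}

// Returns the labels concatenated in the order produced by std::stable_sort.
std::string sortedLabels(std::vector<Item> items) {
    std::stable_sort(items.begin(), items.end(), byPriority);
    std::string out;
    for (const Item& item : items)
        out += item.label;
    return out;
}
```

With the input {2,"a"}, {1,"b"}, {2,"c"}, {1,"d"}, the two priority-1 items come first and each group keeps its input order, which is the property the JavaScript comparator in the example above relies on.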

A year of development

Like any code base, Cheerp undergoes iterations of improvements, fixes, added features and rounds of optimization.

There have been many significant contributions since release 2.6. From a user’s perspective, the most visible work has been done on:

Rebase to the upstream branch of LLVM

Cheerp is an open source fork of LLVM’s clang, so rebasing lets Cheerp-compiled codebases benefit from every upstream improvement. New C++ features (e.g. -std=c++20), new optimizations and enhanced warning messages now track the latest upstream developments.

Exception support

Cheerp now supports throwing and catching of both C++ exceptions and ‘native’ JavaScript exceptions, allowing both porting of exception-based C++ codebases and easier interaction with any JavaScript library.

Here is an example:

[[cheerp::genericjs]]
void someAlgorithm(void (*func)(int)) {
    // Some arbitrary code
}

[[cheerp::genericjs]][[cheerp::jsexport]]
void someFunction(void (*userProvidedCallback)(int))
{
    try
    {
        someAlgorithm(userProvidedCallback);
    }
    catch (cheerp::JSException& ex)
    {
        // Reached when the throw originates in JavaScript code
        client::console.log("There has been a native JavaScript throw");
    }
    catch (...)
    {
        client::console.log("There has been a throw in the C++ code");
    }
}

Once compiled, this code can be used in JavaScript like this:

someFunction(myFunction);
someFunction((a) => {if (a < 0) throw "Argh, a negative number";});

To enable exception support, pass the -fexceptions flag at compile time.
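The handler ordering in the example matters: handlers are tried top to bottom, so the specific type has to come before the catch-all. The following is a minimal sketch of that pattern in standard C++ only. ForeignException is a hypothetical stand-in for an exception coming from the JavaScript side (it is not a Cheerp type), and classifyThrow is an illustrative helper name.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an exception raised outside the C++ code
// (not part of Cheerp's API).
struct ForeignException {
    std::string message;
};

// Runs a callback and reports which kind of exception, if any, escaped it.
// The specific handler must precede catch (...), otherwise it is never used.
template <typename Callback>
std::string classifyThrow(Callback callback, int value) {
    try {
        callback(value);
        return "no throw";
    } catch (const ForeignException&) {
        return "foreign throw";
    } catch (...) {
        return "other throw";
    }
}
```

A callback that throws ForeignException lands in the first handler, while any other exception type (for example std::runtime_error) falls through to the catch-all.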

ES6 modules support

Cheerp code generation has gained the ability to generate ES6 modules. This allows Cheerp-compiled libraries to be more easily included in module-based JavaScript deployments. The command line option -cheerp-make-module=es6 allows opt-in, with other options being commonjs, closure or no-module.

Example of usage at: https://docs.leaningtech.com/cheerp/ES6-Modules

Removal of -cheerp-cfg-legacy and -cheerp-mode

We decided to discontinue two command line options.

-cheerp-cfg-legacy only affected the selection of the internal stackifier algorithm. -cheerp-mode was already replaced in Cheerp 2.6 by the -target option, which accepts either cheerp-wasm (the default) or cheerp (equivalent to -cheerp-mode=genericjs).

The complete list of clang options is available in our documentation.

Correctness, performance & size improvements

Following feedback from users and clients, we are always on the lookout for improvements to the parser, new optimizations and better code generation. In a year of work there have been plenty of local improvements that will be invisible, except that error messages will be more informative and the generated code will be faster and leaner.

The other products developed by Leaning Technologies are all based on Cheerp, which means that all the experience we acquired by using our own technology over the last year is reflected in today’s release.

Partial Executer

In every release cycle we balance expanding the scope of the compiler with new features against improving its signature optimization capabilities.

We are very proud to introduce an innovative optimization technique acting at the LLVM IR level: Partial Executer.

We will soon write a more detailed article, but the basic idea is:

Given (partial) knowledge of the call-sites, is it possible to prove that some edges in the Control Flow Graph are never taken?

If so, then it’s trivial to remove those edges and consequently remove unreachable BasicBlocks.

The basic idea expands on PreExecute, a Cheerp pass that uses LLVM’s own ExecutionEngine to try to complete the execution of the global initializers. If successful, it removes those calls while updating the global state.

Somewhat similarly, here we use the infrastructure that comes with the ExecutionEngine to execute Instructions, coupled with a new component that navigates a function’s Control Flow Graph, doing partial execution starting from a given Call Site.

Practical example: printf. Printf takes a format string, which in most cases is known at compile time, plus other arguments that generally depend on information available only at run-time. Can we partially execute the printf logic that depends on the format string while skipping over the parts that depend on the actual values to be printed? It turns out we can:

Control Flow Graph diagram for printf function before and after PartialExecuter
Control Flow Graphs before and after PartialExecuter is run. The arrows connect 3 BasicBlocks showing their position before and after. Most other BBs are proven never visited and removed.
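The reasoning behind the printf example can be illustrated with a toy model. This is only a sketch of the idea, not how PartialExecuter is implemented (the real pass works on LLVM IR): it assumes a printf-like formatter that dispatches on the single character following '%', ignores flags and width modifiers, and that every format string used at the call sites is known. The function names specifiersUsed and deadHandlers are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Collects the conversion specifiers (the character after '%') that a
// simplified printf-like formatter would dispatch on for one format string.
std::set<char> specifiersUsed(const std::string& format) {
    std::set<char> used;
    for (std::size_t i = 0; i + 1 < format.size(); ++i) {
        if (format[i] != '%')
            continue;
        const char next = format[i + 1];
        if (next != '%')  // "%%" is a literal percent sign, not a handler
            used.insert(next);
        ++i;  // skip the character that was just consumed
    }
    return used;
}

// Given every format string seen at the call sites, returns the handlers
// (among those the formatter implements) that can never be reached.
std::set<char> deadHandlers(const std::set<char>& implemented,
                            const std::vector<std::string>& knownFormats) {
    std::set<char> dead = implemented;
    for (const std::string& format : knownFormats)
        for (const char spec : specifiersUsed(format))
            dead.erase(spec);
    return dead;
}
```

If the formatter implements handlers for d, s, f and x, and the only known format strings use %d and %s, the f and x handlers are provably unreachable. In Control Flow Graph terms, the edges leading to those handlers are never taken, so the corresponding blocks can be dropped.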

The current version of PartialExecuter is centered on discovering edges that are never taken, but we believe this infrastructure will open up a whole new class of optimizations. Stay tuned.

Note: We will update the article to the detailed explanation. If you are curious, check the code: PartialExecuter.cpp.

Benchmark results

Emscripten, an alternative C++-to-Web compiler, has an extensive benchmark suite that we have adopted to provide a fair comparison.

Benchmarks have been done between Cheerp 2.7 rc1 and Emscripten 3.1.5.

There are two metrics to be considered: total size of the generated output and execution time.

For each of the test-cases the sum of JavaScript and WebAssembly output has been computed, normalised to the maximum. Lower is better.

Bar diagram of Cheerp and Emscripten total output size, normalized (lower is better)
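As a sketch of how such a size metric can be computed, the snippet below sums the JavaScript and WebAssembly sizes for two compilers on one test case and scales both totals by the larger one. It assumes the maximum is taken per test case across the two compilers, which is one reading of "normalised to the maximum". OutputSize and normalisedTotals are hypothetical names, and the inputs must not both be zero.

```cpp
#include <algorithm>
#include <utility>

// Output sizes in bytes for one test case (hypothetical container type).
struct OutputSize {
    double jsBytes;
    double wasmBytes;
};

// Returns the two totals scaled so that the larger one equals 1.0.
// Assumes at least one total is non-zero.
std::pair<double, double> normalisedTotals(const OutputSize& a, const OutputSize& b) {
    const double totalA = a.jsBytes + a.wasmBytes;
    const double totalB = b.jsBytes + b.wasmBytes;
    const double largest = std::max(totalA, totalB);
    return {totalA / largest, totalB / largest};
}
```

For instance, with made-up sizes of 10 + 40 bytes against 30 + 70 bytes, the totals are 50 and 100, so the normalised values are 0.5 and 1.0.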

Execution time has been measured against the latest available V8 (the JavaScript engine that powers Chrome and Edge) and SpiderMonkey (Firefox), using the latest available packaged versions (v8 10.0.139 and sm 98.0b7). Timing has been measured on a server with TurboBoost disabled, following standard good practices for obtaining benchmark numbers.

Here too, lower is better and, barring some inevitable noise, results should mostly be reproducible in other settings.

Bar diagram of Cheerp and Emscripten performance on V8 (lower is better)

Bar diagram of Cheerp and Emscripten performance on SpiderMonkey (lower is better)

How to install

For instructions and guidance, please visit:

Install: Ubuntu or Debian | Red Hat Linux | Windows | MacOS

Or build directly from source: Linux | Windows

To get started, why not follow one of the examples (here)? Then take a simple component in your domain, write (or adapt) a simple C++ implementation, generate an equivalent component, and substitute the old piece with the new one.

Gitter or Github’s issues are the fastest way to get in contact.

If you prefer to check the code, the Cheerp fork of LLVM is at cheerp-compiler, while cheerp-utils, cheerp-libs and cheerp-newlib are external components.

Summary

Leaning Technologies is a technology company with extensive experience in solutions for the Web as a platform.

We participate in WebAssembly standardisation, pushing the standard to enable further optimizations (branch_hinting and soon tail calls), we improve browser engines (V8 and the recently started work on JSC), and we develop and evolve three unique WebAssembly+JavaScript based solutions: Cheerp, CheerpJ and CheerpX.

With Cheerp 2.7, we are giving developers access to the same powerful tool we use to make WebVM, an in-browser Debian terminal.

Update your Cheerp version, or try it for the first time by getting started with our documentation.

Get in touch, either via the public channels or by contacting us directly, to discover how Cheerp can bring your code to any browser.

For more of this follow us on twitter and on our website. For additional information, please visit Cheerp’s documentation or our technical blog.

Thanks to Alessandro, Jules, Lorenzo, Serena, Yuri and Tom.
