Sketch: Linear memory GC with IT

This sketches an alternative way in which GC could in theory work in Wasm. It is probably crazy, but bear with me, it may have some merit.

NOTE: this is NOT meant to be a "proposal", it is merely a sketch of an idea with most details left open, that puts emphasis on different advantages compared to the existing proposal(s), intended for discussion. If some of these aspects are deemed important, then maybe they can inform future GC features. 

I will focus on these aspects of what would make for a good GC provided out of the box by Wasm:
1. Allow modules to have access to an industrial strength GC at zero code size cost.
2. Allow different languages (and the host) to exchange data easily.
3. Keep cost of GC as local and scalable as possible.
4. Allow existing language runtimes/codegen to interact with the Wasm GC.

This sketch keeps (1), arrives at (2) through different means, and improves upon (3) and (4) compared to the existing GC proposals.

The basics of the idea are simple: _A linear memory program/runtime assigns certain 64KB pages to the Wasm GC. The Wasm GC manages this memory as it sees fit. The linear program can address this memory, but only has minimal guarantees on what layout to expect. Inter module communication uses Interface Types, not `anyref`. A special kind of `i32` provides exact roots._

Now, the details. This trivially retains (1), so let's tackle (2) & (3) first:

### Inter-module data exchange.

The existing GC proposal(s) assume it is desirable that different modules can share references to GC objects freely, without incurring the cost of copying, and with the benefit of being able to collect inter-module cycles trivially. There are however issues with this approach:
* The existing GC proposals do not define language "objects", they define low level GC building blocks (structs and arrays) that can be used to represent them. Different languages may have wildly different representations, especially when it comes to e.g. vtables and other implementation techniques. This means that two languages typically cannot access objects they receive from each other directly; they can merely refer to them. For, say, Java and Haskell to exchange an object, they'll likely have to go through some form of serialization anyway. Even in the context of browsers, current JS objects (and existing JS APIs) have completely different semantics and representation from GC "structs" (yes, I am aware of the JS typed objects proposal, which will help only in limited cases). We already intend to access host APIs with IT.
* Different modules holding on to each other's memory is an anti-pattern, not a feature, the way I see it. "Leaking" memory in a GC program is very easy, and being at the mercy of every other module you interact with to not cause leaks in your module seems like a very brittle scenario. We cannot rely on programmers to write code that does not leak, so shielding modules from it at the module level allows for more predictable and stable systems.

So instead, we'll use Interface Types. The potential for Interface Types (with Shared-Nothing-Linking) to revolutionize how we build software out of parts, and how we can work with different languages is huge. If most interfacing with GC language modules is going to go over IT, we might as well go all the way, and get some of the benefits this would bring:
* GC being a per-module thing may have great performance benefits. Most GC algorithms do not scale linearly with the size of data being managed. Having 10s (or 100s) of modules, each with a small, independent GC space, allows them to be collected more quickly (with less "lag") than collecting everything at once, can allow collection to trivially happen on independent threads, can even allow different GC algorithms/configurations per module that suit the language, etc.
  * It may be able to use simpler, faster collector algorithms: collecting all objects across modules and multiple threads requires incremental and/or generational designs that come at additional cost in terms of barriers that need to be checked and synchronized, something that a collector for a smaller space (with control of threading) possibly can do without.
  * It may be more suitable for Wasm runtimes that are memory constrained (embedded and mobile).
  * It allows the cost of memory use to be assessed and controlled on a per-module basis, rather than the "tragedy of the commons" that could result in a many-module single GC space scenario.
* Each module can have a lot of freedom in designing (and changing) their data layout, since there are no constraints that we'd try to match something other languages are doing for exchange reasons, or having to stay compatible over time.
* Emphasis on copying rather than sharing in API design will in general result in simpler, more general APIs. Some of the cost of copying will be offset by the fact that once the data lands in another module, it is fully under that module's control. It will make it easier to swap out modules with new algorithms, languages and language implementations.
* Cross-module references (identity of objects) should be discouraged as much as possible, but where needed it is something that is better off becoming a feature of IT, which would mean it also becomes more general (more useful in exchanges between all languages, including non-GC ones). There will always be reasons why you'd want to establish some form of identity across modules, but this is better left to a high level feature (which is very much opt-in) than to a low level feature like `anyref` that affects every possible object, and can't be opted-out of.
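To make the copy-rather-than-share idea concrete, here is a minimal sketch in Python. It is only a simulation: `ModuleHeap`, `export_value` and `import_value` are hypothetical names standing in for a per-module GC space and an Interface Types adapter, and `copy.deepcopy` stands in for whatever lowering/lifting IT would actually perform.

```python
import copy

class ModuleHeap:
    """Hypothetical per-module GC space: objects live only in this module."""

    def __init__(self, name):
        self.name = name
        self.objects = {}      # handle -> value owned by this module
        self.next_handle = 1

    def alloc(self, value):
        handle = self.next_handle
        self.next_handle += 1
        self.objects[handle] = value
        return handle

    def export_value(self, handle):
        # Stand-in for an IT adapter: copy the value out of this heap.
        return copy.deepcopy(self.objects[handle])

    def import_value(self, value):
        # Copy the value into this module's own heap; no shared references.
        return self.alloc(copy.deepcopy(value))

    def free(self, handle):
        # Stand-in for the module's own collector reclaiming the object.
        del self.objects[handle]

producer = ModuleHeap("producer")
consumer = ModuleHeap("consumer")

h = producer.alloc({"name": "point", "coords": [1, 2]})
h2 = consumer.import_value(producer.export_value(h))

# The producer can drop its object; the consumer's copy is unaffected
# and can be mutated without the producer ever seeing the change.
producer.free(h)
consumer.objects[h2]["coords"].append(3)
```

Because nothing in `consumer` refers into `producer`, each heap can be collected (or leak) without affecting the other, which is the isolation property argued for above.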

Now for (4):

### Runtime interaction.

The vast majority of current language implementations come with some form of "runtime", usually written in C or C++. Besides the GC algorithm (which we're trying to save them from having to ship), this can include other support functions that implement the runtime semantics of the language, implementations of the built-in functionality, and general standard library implementations. Typically all that code will access (and create) GC objects directly. Under the existing GC proposals all this code would need an almost full rewrite from C/C++ into a language that is Wasm-GC aware (i.e. can emit Wasm-GC object access/creation opcodes directly from the language, to be efficient). There may be ways for this to be retrofitted on to C/C++ (using intrinsics?) but how this would flow all the way through the LLVM pipeline is an open question. Even if we had this functionality, this would still require a significant rewrite of a language runtime.

Will language implementers go through such a rewrite? Or will they throw in the towel, and port their existing GC to linear memory, not using a standard Wasm GC at all?

There is generally the question of how well a standard generic Wasm GC will cater for all languages. It is well known that many languages have intricate GC designs, whose functionality intertwines with the language's promised semantics. Arguably, letting the runtime interact with a Wasm GC directly in linear memory would expand the cases where such a GC can be used, rather than porting an existing GC. Given that the GC is per-module, it can possibly be parametrized to fit the language, whereas a GC that collects across languages generically cannot take anything about the language into account.

How would this GC/runtime interaction go? First, the linear memory program needs to assign pages to the GC, which it could do either manually, or we could also imagine a more managed system where the upper part of memory is always for the GC (such that the GC could grow it independently of the linear program, though possibly making `memory.grow` more expensive if the GC memory needs to be moved). I am sure there are many possible ways to do this, for now assume there exist linear memory addressable GC pages.
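As a rough illustration of "linear memory addressable GC pages", the following Python sketch models a linear memory in which some 64KB pages have been handed to the GC. The `LinearMemory` class and its `assign_to_gc` and `is_gc_address` methods are hypothetical; no such instruction or API exists in Wasm today, and the real mechanism is left open above.

```python
PAGE_SIZE = 64 * 1024  # Wasm page size in bytes

class LinearMemory:
    """Toy model of a linear memory with some pages owned by the GC."""

    def __init__(self, pages):
        self.data = bytearray(pages * PAGE_SIZE)
        self.gc_pages = set()  # indices of pages assigned to the GC

    def pages(self):
        return len(self.data) // PAGE_SIZE

    def assign_to_gc(self, first_page, count):
        # Hypothetical operation: the linear program gives these pages to the GC.
        if first_page < 0 or count <= 0 or first_page + count > self.pages():
            raise ValueError("page range out of bounds")
        self.gc_pages.update(range(first_page, first_page + count))

    def is_gc_address(self, addr):
        # True if addr lies inside a page currently managed by the GC.
        return 0 <= addr < len(self.data) and addr // PAGE_SIZE in self.gc_pages

mem = LinearMemory(pages=4)
mem.assign_to_gc(first_page=2, count=2)  # "upper part of memory is for the GC"
```

Here pages 2 and 3 belong to the GC, so `mem.is_gc_address(2 * PAGE_SIZE)` holds while the byte just below it is ordinary unmanaged memory. A check of this kind is also roughly what a GC would need when it is handed a plain `i32` that came from unmanaged memory.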

The GC would need to know the roots outside of its GC space, which I imagine would come in two forms: references on the Wasm stack, and optionally from a table managed by the linear program of additional roots. These roots can be used by the linear program as naked pointers (`i32`), so wouldn't be an `anyref`. To help the GC be exact (non-conservative), we'd introduce a new (sub)type of `i32` that, when present on the Wasm stack or a table, would be a root (let's say `i32r`). This type would degrade to `i32` when stored in (unmanaged) linear memory, and would then not guarantee the pointee to be retained (it is an "untraced ref" that can only be used if the linear program guarantees there is at least one active `i32r` on the stack/tables or `i32` in the GC pages).
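The difference between an `i32r` root and an untraced `i32` copy can be sketched with a tiny mark phase in Python. This is a simulation under stated assumptions: the heap is modelled as a dict from object address to the addresses it references, and the names `mark`, `stack_roots`, `table_roots` and `untraced` are made up for illustration.

```python
def mark(heap, stack_roots, table_roots):
    """Return the set of object addresses reachable from the exact roots.

    heap maps an object address to the list of addresses it references.
    """
    live = set()
    worklist = list(stack_roots) + list(table_roots)
    while worklist:
        addr = worklist.pop()
        if addr in live or addr not in heap:
            continue
        live.add(addr)
        worklist.extend(heap[addr])  # follow references stored in GC pages
    return live

# Three objects in the GC pages; the first references the second.
heap = {0x20000: [0x20010], 0x20010: [], 0x20020: []}

stack_roots = [0x20000]  # i32r values on the Wasm stack: traced
table_roots = []         # optional root table: traced
untraced = [0x20020]     # plain i32 copies in unmanaged memory: NOT traced

live = mark(heap, stack_roots, table_roots)
```

After marking, `live` contains `0x20000` and `0x20010`. The object at `0x20020` is only referenced by an untraced copy, so the collector is free to reclaim it, which is exactly the hazard the linear program has to guard against.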

Existing load/store ops can be used to access these objects, using either `i32` from linear memory or `i32r` from the stack. We'd define what a GC struct and an array must look like in the GC pages of linear memory, which, given their low level nature, doesn't sound like too much of a restriction on GC implementations (any additional data a GC wishes to store can be at negative offsets).
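One possible layout, sketched in Python over a `bytearray` standing in for the GC pages: the object address points at the first field, and a GC-private header word sits at a negative offset. The 4-byte header, the `i32`-only fields and the bump allocator are all assumptions made for this example, not something the sketch above specifies.

```python
import struct

HEADER_SIZE = 4  # hypothetical: one GC-private word before each object

def alloc_struct(memory, bump, num_fields):
    """Bump-allocate a struct of i32 fields.

    Returns (object_address, new_bump). The object address points at
    field 0; the GC header lives at object_address - HEADER_SIZE.
    """
    obj = bump + HEADER_SIZE
    struct.pack_into("<I", memory, bump, num_fields)  # GC-private header
    return obj, obj + 4 * num_fields

def store_field(memory, obj, index, value):
    # Comparable to an i32.store with a static offset of 4 * index.
    struct.pack_into("<i", memory, obj + 4 * index, value)

def load_field(memory, obj, index):
    # Comparable to an i32.load with a static offset of 4 * index.
    return struct.unpack_from("<i", memory, obj + 4 * index)[0]

memory = bytearray(64)  # stand-in for a GC page
obj, bump = alloc_struct(memory, 0, 2)
store_field(memory, obj, 0, 7)
store_field(memory, obj, 1, -1)
```

The linear program only ever relies on the field area starting at `obj`; what the GC keeps at `obj - HEADER_SIZE` is its own business and may differ between implementations.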

The GC makes NO promises as to the state of memory outside of well defined parts of structs and arrays, much like it is already undefined behavior to even access memory outside of a `malloc` returned block in C/C++.

`i32r` values can possibly be rewritten by the GC to point elsewhere, invalidating any "untraced ref" copies in linear memory. For this to function there needs to be a minimal level of cooperation between the linear memory program and the GC as to when these changes may happen. How, TBD.
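The effect of a moving collector on roots versus untraced copies can be sketched as follows. This Python simulation is deliberately simplified: objects hold opaque payloads with no references to each other, the target address `0x10000` and the 16-byte spacing are arbitrary, and `compact` is a made-up name.

```python
def compact(heap, roots):
    """Move every rooted object to a fresh address and rewrite the roots.

    heap maps address -> payload; roots is a list of i32r addresses that
    is updated in place. Returns the new heap. Unrooted objects are dropped.
    """
    forwarding = {}        # old address -> new address
    new_heap = {}
    next_addr = 0x10000    # arbitrary start of the "to-space"
    for i, addr in enumerate(roots):
        if addr not in forwarding:
            forwarding[addr] = next_addr
            new_heap[next_addr] = heap[addr]
            next_addr += 16
        roots[i] = forwarding[addr]  # the GC rewrites the i32r value
    return new_heap

heap = {0x20000: "a", 0x20040: "b"}
roots = [0x20040]           # i32r on the stack or in a root table
untraced_copy = roots[0]    # plain i32 stashed in unmanaged linear memory

heap = compact(heap, roots)
```

After the collection `roots[0]` is `0x10000` and still reaches `"b"`, while `untraced_copy` still holds `0x20040`, an address that no longer names any object. This is why the linear program and the GC need to agree on when such moves may happen.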

There is of course the ability for the linear program to access GC memory, and even corrupt it. This does not seem like a problem from a language point of view: almost all programming languages "trust" their runtime code already, and all have the potential for bugs in said code to disrupt the otherwise safe execution of the language. Given that this is now isolated to a single module, the impact of this is smaller than if there was a bug in a Wasm runtime, affecting all languages within.

It does have the downside of requiring a slightly more conservative Wasm GC implementation, i.e. `i32`s loaded from linear memory will need to be bounds-checked, whereas the existing GC proposals can directly treat `anyref` as a pointer. I think this is a small cost compared to the potential for optimisation this new system otherwise brings (see above). It generally goes along with the existing Wasm philosophy of intra-module trusted, inter-module untrusted which IT builds on, and which existing GC proposals complicate.

A final problem may be non-determinism (but only for "buggy" programs): access to GC memory outside of designated objects could result in different data under different Wasm implementations. We'd have to specify that this is undefined behavior?
