Description
In a benchmark, I discovered that AssemblyScript is significantly slower than JavaScript at filling arrays when using `--runtime incremental`.
Language | Runtime | Average | vs JS |
---|---|---|---|
JavaScript | | 233.68ms | 1.0x |
AssemblyScript | stub | 345.35ms | 1.5x |
AssemblyScript | minimal | 354.60ms | 1.5x |
AssemblyScript | incremental | 18758.50ms | 80.3x |
A `d8` trace revealed the following profile:
[Bottom up (heavy) profile]:
ticks parent name
18670 96.1% /usr/lib/system/libsystem_platform.dylib
13530 72.5% Function: *~lib/rt/itcms/__renew
13530 100.0% Function: *~lib/array/ensureSize
13530 100.0% Function: *~lib/array/Array#push
13530 100.0% Function: *binaryheap_optimized/BinaryHeap#push
13530 100.0% Function: *binaryheap_optimized/push
5119 27.4% Function: *~lib/rt/itcms/__new
5119 100.0% Function: *~lib/rt/itcms/__renew
5119 100.0% Function: *~lib/array/ensureSize
5119 100.0% Function: *~lib/array/Array#push
5119 100.0% Function: *binaryheap_optimized/BinaryHeap#push
This led me to investigate the `Array#push()` implementation, which calls `ensureSize()`. If the previous buffer is full, `ensureSize()` creates a new buffer with exactly one more slot. This means that once you have run out of capacity, every subsequent call to `push()` creates a new buffer, copies all the data into it, and creates work for the garbage collector, since the old buffer is now collectible.
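Both Rust and Go amortize that work by growing the capacity geometrically every time the buffer is exhausted (Rust's `Vec` doubles it; Go's `append` doubles small slices and uses a smaller growth factor for large ones). I’m sure C++ does the same, but I couldn’t get through the stdlib code 😅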
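To make the difference concrete, here is a rough sketch that only counts how many element copies each growth policy causes. It does not touch the actual `~lib/array` code; `countCopies`, `growByOne`, and `growByDoubling` are hypothetical names made up for this illustration, and it is written in plain TypeScript so it runs outside of AssemblyScript.

```typescript
// Count how many element copies `n` pushes cause under a given growth policy.
// `grow` maps the current capacity to the next capacity once the buffer is full.
function countCopies(n: number, grow: (capacity: number) => number): number {
  let capacity = 0;
  let length = 0;
  let copies = 0;
  for (let i = 0; i < n; i++) {
    if (length === capacity) {
      capacity = grow(capacity);
      copies += length; // every existing element is moved to the new buffer
    }
    length++;
  }
  return copies;
}

// Policy described above: exactly one more slot each time the buffer is full.
const growByOne = (capacity: number): number => capacity + 1;
// Amortized policy: double the capacity each time the buffer is full.
const growByDoubling = (capacity: number): number => Math.max(1, capacity * 2);

const n = 1024;
const linearCopies = countCopies(n, growByOne);        // 0 + 1 + ... + 1023 = 523776
const doublingCopies = countCopies(n, growByDoubling); // 1 + 2 + 4 + ... + 512 = 1023
console.log(linearCopies, doublingCopies);
```

With one-slot growth the total number of copies grows quadratically with the number of pushes, while with doubling it stays linear. That would be consistent with `__renew` dominating the profile above.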
In an experiment, I replaced `~lib/array/Array` with a `CustomArray<T>` that has exponential buffer growth, and the performance problems went away:
Language | Variant | Runtime | Average | vs JS |
---|---|---|---|---|
JavaScript | | | 233.68ms | 1.0x |
AssemblyScript | customarray | stub | 329.43ms | 1.4x |
AssemblyScript | customarray | minimal | 329.23ms | 1.4x |
AssemblyScript | customarray | incremental | 335.35ms | 1.4x |
AssemblyScript | ~lib | stub | 345.35ms | 1.5x |
AssemblyScript | ~lib | minimal | 354.60ms | 1.5x |
AssemblyScript | ~lib | incremental | 18758.50ms | 80.3x |
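For reference, an array with exponential buffer growth could look roughly like the sketch below. This is not the `CustomArray<T>` used in the benchmark. `GrowableArray`, its `reallocations` counter, and the initial capacity of 4 are assumptions made up for this example, and it is plain TypeScript backed by a regular array, whereas an AssemblyScript version would manage its own backing buffer.

```typescript
// Hypothetical growable array that doubles its backing buffer when it is full.
class GrowableArray<T> {
  private buffer: (T | undefined)[];
  private size = 0;
  // Number of times the backing buffer was replaced (for illustration only).
  reallocations = 0;

  constructor(initialCapacity: number = 4) {
    this.buffer = new Array<T | undefined>(Math.max(1, initialCapacity));
  }

  get length(): number {
    return this.size;
  }

  get capacity(): number {
    return this.buffer.length;
  }

  push(value: T): number {
    if (this.size === this.buffer.length) {
      // Buffer is full: allocate twice the capacity and copy the elements over.
      const next = new Array<T | undefined>(this.buffer.length * 2);
      for (let i = 0; i < this.size; i++) next[i] = this.buffer[i];
      this.buffer = next;
      this.reallocations++;
    }
    this.buffer[this.size++] = value;
    return this.size;
  }

  get(index: number): T {
    if (index < 0 || index >= this.size) throw new RangeError("index out of range");
    return this.buffer[index] as T;
  }
}

const heapItems = new GrowableArray<number>(4);
for (let i = 0; i < 1000; i++) heapItems.push(i);
console.log(heapItems.length, heapItems.capacity, heapItems.reallocations);
```

In this sketch, 1000 pushes replace the buffer 8 times (4 → 8 → … → 1024), instead of once per push after the first time the buffer fills up.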
I would like to PR a fix, but wanted to check first whether there are any concerns about adopting a similar approach for ASC’s `Array<T>`?