optimize the codegen of Vec::clone #290

japaric · 2022-05-10T10:50:05Z

these changes optimize `Vec<u8, 1024>::clone` down to these operations:

1. reserve the stack space (1028 bytes on 32-bit ARM) and leave it uninitialized
2. zero the `len` field
3. memcpy `len` bytes of data from the parent
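as a rough illustration of those three operations, here is a minimal sketch of a `clone` with that shape. `FixedVec` is a hypothetical stand-in written for this example, not the actual `heapless::Vec` implementation; the field names and the element-wise loop are assumptions.

```rust
use core::mem::MaybeUninit;
use core::{ptr, slice};

/// Hypothetical fixed-capacity vector (a stand-in, not `heapless::Vec`).
pub struct FixedVec<T, const N: usize> {
    len: usize,
    buffer: [MaybeUninit<T>; N],
}

impl<T, const N: usize> FixedVec<T, N> {
    pub fn new() -> Self {
        Self {
            // Operation 2: only the `len` field is written.
            len: 0,
            // Operation 1: an array of `MaybeUninit` needs no initialization,
            // so the buffer can be left as-is.
            buffer: unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() },
        }
    }

    /// Appends `value`, handing it back if the vector is full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.buffer[self.len].write(value);
        self.len += 1;
        Ok(())
    }

    pub fn as_slice(&self) -> &[T] {
        // SAFETY: the first `len` slots are always initialized.
        unsafe { slice::from_raw_parts(self.buffer.as_ptr() as *const T, self.len) }
    }
}

impl<T: Clone, const N: usize> Clone for FixedVec<T, N> {
    fn clone(&self) -> Self {
        let mut new = Self::new();
        // Operation 3: copy `len` elements. No capacity check is needed
        // because `self.len <= N` already holds for the source.
        for (slot, item) in new.buffer.iter_mut().zip(self.as_slice()) {
            slot.write(item.clone());
            // Bump `len` per element so a panicking `clone` leaves `new` consistent.
            new.len += 1;
        }
        new
    }
}

impl<T, const N: usize> Drop for FixedVec<T, N> {
    fn drop(&mut self) {
        // SAFETY: the first `len` slots are initialized and dropped exactly once.
        unsafe {
            ptr::drop_in_place(ptr::slice_from_raw_parts_mut(
                self.buffer.as_mut_ptr() as *mut T,
                self.len,
            ))
        }
    }
}
```

the point of the sketch is that nothing in `clone` touches the unused tail of the buffer, and that the loop has no failure path.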

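analyzed source code

``` rust
use core::arch::asm;

use heapless::Vec;

fn clone(vec: &Vec<u8, 1024>) {
    let mut vec = vec.clone();
    // keep the clone observable so the optimizer cannot discard it
    black_box(&mut vec);
}

// opaque to the optimizer: the pointer is handed to an `asm!` block that only emits a comment
fn black_box<T>(val: &mut T) {
    unsafe { asm!("// {0}", in(reg) val) }
}
```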

machine code with `lto = fat`, `codegen-units = 1` and `opt-level = 'z'` ('z' instead of 3 to avoid loop unrolling and keep the machine code readable)

``` armasm
00020100 <clone>:
   20100:              b5d0             push    {r4, r6, r7, lr}
   20102:              af02             add     r7, sp, #8
   20104:              f5ad 6d81        sub.w   sp, sp, #1032   ; 0x408
   20108:              2300             movs    r3, #0
   2010a:              c802             ldmia   r0!, {r1}
   2010c:              9301             str     r3, [sp, #4]
   2010e:              aa01             add     r2, sp, #4
   20110:       /--/-X b141             cbz     r1, 20124 <clone+0x24>
   20112:       |  |   4413             add     r3, r2
   20114:       |  |   f810 4b01        ldrb.w  r4, [r0], #1
   20118:       |  |   3901             subs    r1, #1
   2011a:       |  |   711c             strb    r4, [r3, #4]
   2011c:       |  |   9b01             ldr     r3, [sp, #4]
   2011e:       |  |   3301             adds    r3, #1
   20120:       |  |   9301             str     r3, [sp, #4]
   20122:       |  \-- e7f5             b.n     20110 <clone+0x10>
   20124:       \----> a801             add     r0, sp, #4
   20126:              f50d 6d81        add.w   sp, sp, #1032   ; 0x408
   2012a:              bdd0             pop     {r4, r6, r7, pc}
```

note that step (3) is not optimized to an actual `memcpy` call (it is the byte-by-byte `ldrb` / `strb` loop in the listing above) because we lack the 'trait specialization' code that libstd uses
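to make the specialization point concrete, the sketch below puts the two copy strategies side by side. both function names are hypothetical helpers written for this example and are not part of heapless or libstd. on stable Rust a generic `Clone` impl cannot pick the `Copy` path by itself based on `T`, which is the choice that specialization would make.

```rust
use core::mem::MaybeUninit;
use core::ptr;

/// Hypothetical helper: element-wise path, usable for any `T: Clone`.
/// Returns the number of slots that were initialized.
pub fn clone_into_uninit<T: Clone, const N: usize>(
    src: &[T],
    dst: &mut [MaybeUninit<T>; N],
) -> usize {
    let mut len = 0;
    for (slot, item) in dst.iter_mut().zip(src) {
        slot.write(item.clone());
        len += 1;
    }
    len
}

/// Hypothetical helper: bulk path, only valid for `T: Copy`.
/// `copy_nonoverlapping` is the kind of call that typically lowers to `memcpy`.
pub fn copy_into_uninit<T: Copy, const N: usize>(
    src: &[T],
    dst: &mut [MaybeUninit<T>; N],
) -> usize {
    // A real container would already guarantee this through its `len <= N` invariant.
    assert!(src.len() <= N);
    // SAFETY: `src` and `dst` do not overlap and `dst` has room for `src.len()` items.
    unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr() as *mut T, src.len()) };
    src.len()
}
```

both helpers leave the same bytes in the first `len` slots; they differ only in how the copy is expressed to the compiler.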

before these changes, `clone` was optimized to:

1. reserve and zero (`memclr`) 1028 (!?) bytes of stack space
2. (unnecessarily) check at runtime that `len` is less than or equal to 1024 (the capacity) -- this included a panicking branch
3. memcpy `len` bytes of data from the parent
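for comparison, here is a sketch of code with the "before" shape: a zeroed buffer, a fallible copy, and an `unwrap` on the result. this is an illustration only, not the code this PR removed; `extend_checked` and `clone_before` are hypothetical names, and the buffer is simplified to plain `u8`.

```rust
/// Hypothetical checked copy: fails if `src` does not fit in the remaining capacity.
pub fn extend_checked<const N: usize>(
    buf: &mut [u8; N],
    len: &mut usize,
    src: &[u8],
) -> Result<(), ()> {
    // Operation 2 of the "before" list: the runtime capacity check.
    if *len + src.len() > N {
        return Err(());
    }
    // Operation 3: the bulk copy.
    buf[*len..*len + src.len()].copy_from_slice(src);
    *len += src.len();
    Ok(())
}

/// Hypothetical "before" clone: returns the copied buffer and its length.
pub fn clone_before<const N: usize>(src: &[u8]) -> ([u8; N], usize) {
    // Operation 1: the whole buffer is zeroed, including the unused tail.
    let mut buf = [0u8; N];
    let mut len = 0;
    // The `unwrap` is the panicking branch. When `src` comes from a vector
    // of the same capacity it can never fail, so the check is redundant.
    extend_checked(&mut buf, &mut len, src).unwrap();
    (buf, len)
}
```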


japaric · 2022-05-10T11:47:47Z

bors r+

bors · 2022-05-10T11:52:48Z

Build succeeded:

ci

bors bot merged commit e52d483 into master May 10, 2022

bors bot deleted the vec-faster-clone branch May 10, 2022 11:52

jordens mentioned this pull request Jul 24, 2022

memset()/memclr() emitted for Vec::new() #305

Closed
