Skip to content

CONFIG_THUMB2_KERNEL fails to boot when linked w/ LLD #773

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nickdesaulniers opened this issue Nov 22, 2019 · 22 comments
Closed

CONFIG_THUMB2_KERNEL fails to boot when linked w/ LLD #773

nickdesaulniers opened this issue Nov 22, 2019 · 22 comments
Assignees
Labels
[ARCH] arm32 This bug impacts ARCH=arm [BUG] llvm A bug that should be fixed in upstream LLVM [FIXED][LLVM] 11 This bug was fixed in LLVM 11.0 [TOOL] lld The issue is relevant to LLD linker

Comments

@nickdesaulniers
Copy link
Member

MTK asked me about CONFIG_THUMB2_KERNEL support for ARCH=arm. Looks like everything in mainline is fine and boots when linked with BFD. With LLD, I get no output from QEMU. Tested on mainline, defconfig+CONFIG_THUMB2_KERNEL.

@nickdesaulniers nickdesaulniers added [BUG] Untriaged Something isn't working [TOOL] lld The issue is relevant to LLD linker [ARCH] arm32 This bug impacts ARCH=arm labels Nov 22, 2019
@nickdesaulniers
Copy link
Member Author

maybe not a bug, via https://nickdesaulniers.github.io/blog/2018/10/24/booting-a-custom-linux-kernel-in-qemu-and-debugging-it-with-gdb/, lx-dmesg looks like:

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 5.4.0-rc8-00037-gcc079039c9b6 ([email protected]) (Nick Desaulniers clang version 10.0.0 ([email protected]:llvm/llvm-project.git 0aed648649775bfe29b01f7ad9b072f92df7c448)) #10 SMP Fri Nov 22 13:30:19 PST 2019
[    0.000000] CPU: ARMv7 Processor [412fc0f1] revision 1 (ARMv7), cr=50c5387d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[    0.000000] OF: fdt: Machine model: linux,dummy-virt
[    0.000000] Memory policy: Data cache writealloc
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: UEFI not found.
[    0.000000] cma: Reserved 64 MiB at 0xbc000000
[    0.000000] On node 0 totalpages: 524288
[    0.000000]   DMA zone: 1536 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 196608 pages, LIFO batch:63
[    0.000000]   HighMem zone: 327680 pages, LIFO batch:63
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv0.2 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] percpu: Embedded 20 pages/cpu s49228 r8192 d24500 u81920
[    0.000000] pcpu-alloc: s49228 r8192 d24500 u81920 alloc=20*4096
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 522752
[    0.000000] Kernel command line: console=ttyAMA0 root=/dev/ram0
[    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
[    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 1989812K/2097152K available (10240K kernel code, 1660K rwdata, 4912K rodata, 2048K init, 395K bss, 41804K reserved, 65536K cma-reserved, 1245184K highmem)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: 	RCU event tracing is enabled.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=1.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[    0.000000] GICv2m: range[mem 0x08020000-0x08020fff], SPI[80:143]
[    0.000000] random: get_random_bytes called from start_kernel+0x19d/0x30a with crng_init=0
[    0.000000] arch_timer: cp15 timer(s) running at 62.50MHz (virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1cd42e208c, max_idle_ns: 881590405314 ns
[    0.000187] sched_clock: 56 bits at 62MHz, resolution 16ns, wraps every 4398046511096ns
[    0.000332] Switching to timer-based delay loop, resolution 16ns
[    0.029584] Console: colour dummy device 80x30
[    0.050209] Calibrating delay loop (skipped), value calculated using timer frequency.. 125.00 BogoMIPS (lpj=625000)
[    0.056956] pid_max: default: 32768 minimum: 301
[    0.143776] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
[    0.144614] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
[    0.216273] CPU: Testing write buffer coherency: ok
[    0.219825] CPU0: Spectre v2: firmware did not set auxiliary control register IBE bit, system vulnerable
[    0.245184] /cpus/cpu@0 missing clock-frequency property
[    0.246545] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[    0.252400] Setting up static identity map for 0x40300000 - 0x403000a0
[    0.254613] rcu: Hierarchical SRCU implementation.
--Type <RET> for more, q to quit, c to continue without paging--
[    0.259530] EFI services will not be available.
[    0.260752] smp: Bringing up secondary CPUs ...
[    0.260825] smp: Brought up 1 node, 1 CPU
[    0.260949] SMP: Total of 1 processors activated (125.00 BogoMIPS).
[    0.261045] CPU: All CPU(s) started in SVC mode.
[    0.271450] devtmpfs: initialized
[    0.280329] VFP support v0.3: implementor 41 architecture 4 part 30 variant f rev 0
[    0.294131] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.295057] futex hash table entries: 256 (order: 2, 16384 bytes, linear)
[    0.295558] Run 
[    0.295558]  as init process
[    0.300132] Run /etc/init as init process
[    0.300524] Run /bin/init as init process
[    0.300580] Run /bin/sh as init process
[    0.300795] Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
[    0.301085] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc8-00037-gcc079039c9b6 #10
[    0.301111] Hardware name: Generic DT based system
[    0.302015] [<c030e783>] (unwind_backtrace) from [<c030a4a9>] (show_stack+0xb/0xc)
[    0.302157] [<c030a4a9>] (show_stack) from [<c0bf8cd7>] (dump_stack+0x7d/0x9e)
[    0.302188] [<c0bf8cd7>] (dump_stack) from [<c033b00d>] (panic+0xc9/0x260)
[    0.302222] [<c033b00d>] (panic) from [<c0c0ab2d>] (kernel_init+0x1a1/0x222)
[    0.302336] [<c0c0ab2d>] (kernel_init) from [<ee0bde40>] (0xee0bde40)
[    0.303017] ---[ end Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. ]---

I can see the serial driver get loaded w/ BFD but not with LLD

[    0.094728] Serial: AMBA PL011 UART driver

bfd.txt
lld.txt

@nickdesaulniers
Copy link
Member Author

Also, BFD linked kernel loaded the initrd just fine.

@mwsealey
Copy link

It looks like anything marked with __init isn't being called - the "Freeing unused kernel memory" stanza is completely missing and always comes before ramdisk init launch, and the ramdisk is never extracted (populate_rootfs is __init and run through __define_initcall which sticks it in .init).

.. could lld be dropping initcall sections?

It's also rather curious that it seems to have lost the /init in "Run /init as init process" and replaced it with a NULL or something:

[ 0.295558] Run [ 0.295558] as init process

@ihalip
Copy link

ihalip commented Dec 19, 2019

I think I found what happens. Some __init functions do get called, up until futex_init.

futex.c runs some asm to check a runtime feature. futex_atomic_cmpxchg_inatomic (more specifically, the ldrex instruction) intentionally triggers an exception, and __do_kernel_fault gets called through the exception handler. The call stack looks like this:

#0  fixup_exception (regs=0xdb099dd0) at arch/arm/mm/extable.c:12
#1  0xc0311f64 in __do_kernel_fault (mm=0x0, addr=0, fsr=5, regs=0xdb099dd0) at arch/arm/mm/fault.c:112
#2  0xc03121f6 in do_page_fault (addr=0, fsr=5, regs=0xdb099dd0) at arch/arm/mm/fault.c:376
#3  0xc03120ec in do_translation_fault (addr=0, fsr=5, regs=0xdb099dd0) at arch/arm/mm/fault.c:415
#4  0xc0311ffe in do_DataAbort (addr=0, fsr=5, regs=0xdb099dd0) at arch/arm/mm/fault.c:533
#5  0xc0301a52 in __dabt_svc () at arch/arm/kernel/entry-armv.S:203 <------ data access exception handler

The issue is that when __dabt_svc returns, PC jumps to an incorrect location. With ld.bfd, PC points inside futex_detect_cmpxchg, while with ld.lld PC points to rest_init. As a result, the execution continues from an incorrect location.

@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Dec 20, 2019

I wonder if this is a similar issue to #282 (https://lore.kernel.org/r/[email protected]/) (I'm suspecting there may be a bug in CONFIG_BUILDTIME_EXTABLE_SORT related to orphan section placement). We should see if disabling CONFIG_BUILDTIME_EXTABLE_SORT makes a difference.

@ihalip
Copy link

ihalip commented Dec 20, 2019

No, that's not the issue. I manually disabled the exception table sorting, there's no difference. The exception table is fine.

The issue happens because the code in the exception table handlers isn't correct. This seems to happen for all handlers created through __futex_atomic_ex_table. If you look at the macro, it adds an entry to __ex_table, then creates the handler for it in .text.fixup. When building with ld.lld and ld.bfd, I see that the handler address is the same. The handler has a branch instruction which points to a wrong address with ld.lld, and the right address with ld.bfd.

AFAICT, that address should be filled by as, relative to the current instruction... but maybe ld.bfd does some extra work to replace that address.

@ihalip
Copy link

ihalip commented Dec 20, 2019

fixup_exception gets called through the data exception handler. It finds a kernel handler associated with regs->pc (the handler was created by __futex_atomic_ex_table), which is then called. That handler then jumps to a bad address. This is what I mean:

BAD (ld.lld):

(gdb) b fixup_exception 
Breakpoint 1 at 0xc031117c: file arch/arm/mm/extable.c, line 12.
(gdb) c
Continuing.

Breakpoint 1, fixup_exception (regs=0xdb049dc8) at arch/arm/mm/extable.c:12
12		fixup = search_exception_tables(instruction_pointer(regs));
(gdb) n
13		if (fixup) {
(gdb) x/20i fixup->fixup
   0xc039cf84:	mov	r3, r1
   0xc039cf86:	b.w	0xc0bb6330 <rest_init+144>    <---- incorrect address
   0xc039cf8a:	nop
...

GOOD (ld.bfd):

(gdb) b fixup_exception 
Breakpoint 1 at 0xc031117c: file arch/arm/mm/extable.c, line 12.
(gdb) c
Continuing.

Breakpoint 1, fixup_exception (regs=0xdb049dc8) at arch/arm/mm/extable.c:12
12		fixup = search_exception_tables(instruction_pointer(regs));
(gdb) n
13		if (fixup) {
(gdb) x/20i fixup->fixup
   0xc039cf84:	mov	r3, r1
   0xc039cf86:	b.w	0xc1219a58 <futex_detect_cmpxchg+116>    <---- correct address
   0xc039cf8a:	nop
...

@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Dec 20, 2019

Can you provide a dump of the __ex_tables section with relocations? Oh, looks like llvm-objdump can't: https://llvm.org/pr44357.

kernel_futext_extable.txt

Looks like just a bunch of standard R_ARM_ABS32 relocations, so it seems weird that lld would mess those up. I don't think LLD is resolving relocations incorrectly; maybe search_exception_tables is failing.

@ihalip
Copy link

ihalip commented Dec 20, 2019

Attached: kernel_futex_extable.txt

maybe search_exception_tables is failing

Don't think so. It returns the same handler for the same instruction that triggered the exception, in both cases.
lld:

(gdb) p/x *fixup
$1 = {
  insn = 0xc131bfec, 
  fixup = 0xc03a9030
}
(gdb) x/i fixup->insn
   0xc131bfec <futex_detect_cmpxchg+94>:	ldrex	r0, [lr]
(gdb) x/3i fixup->fixup
   0xc03a9030:	mov	r3, r1
   0xc03a9032:	b.w	0xc0c3e71c <rest_init+144>
   0xc03a9036:	nop

bfd:

(gdb) p/x *fixup
$1 = {
  insn = 0xc131bfec, 
  fixup = 0xc03a9030
}
(gdb) x/i fixup->insn
   0xc131bfcc <futex_detect_cmpxchg+94>:	ldrex	r0, [lr]
(gdb) x/3i fixup->fixup
   0xc03a9030:	mov	r3, r1
   0xc03a9032:	b.w	0xc131bfe2 <futex_detect_cmpxchg+116>
   0xc03a9036:	nop

@nickdesaulniers
Copy link
Member Author

I still suspect that sorting is failing. It might be interesting to print insn/fixup pairs of the __ex_table BEFORE runtime sorting, and compare+validate against bfd.

@ihalip
Copy link

ihalip commented Dec 22, 2019

I still suspect that sorting is failing

Probably not. I get a similar behavior, with the fixups pointing to other functions in kernel, by dropping --pic-veneer from the linker command line... so it's probably either an issue with the linker generated thunks, or how the linker treats Arm<-->Thumb-2 calls.

Also, while __ex_table has R_ARM_ABS32 as you mentioned, the fixups which are actually in .text.fixup are R_ARM_THM_JUMP24. An example:

  58:	460b      	mov	r3, r1
  5a:	f000 b88e 	b.w	120 <futex_exec_release+0x6>
			5a: R_ARM_THM_JUMP24	.init.text
  5e:	bf00      	nop

@ihalip
Copy link

ihalip commented Jan 7, 2020

The more time I look at this, the more I'm convinced this is related to #325.

@smithp35
Copy link

smithp35 commented Jan 22, 2020

One existing problem in LLD that I know about is that strictly it should not interwork or perform the BL to BLX transition when the symbol has type STT_NOTYPE. LLD at the moment does not have the symbol type available to it when it decides whether to do a BL or BLX so it goes on the bottom bit regardless of the Symbol type.

I've put an example below to illustrate the problem:

$ llvm-mc --triple=armv7a-linux-gnueabihf --arm-add-build-attributes inter.s -filetype=obj -o inter.o
$ ld.lld inter.o -o inter.axf
$ llvm-objdump -d inter.axf
        .syntax unified
        .section .arm_target, "ax", %progbits
        .balign 4
        .arm
        .type bar, %function
bar:    
        bx lr

        .section .thumb_target, "ax", %progbits
        .balign 4
        .thumb
        .type foo, %function
foo:    
        bx lr

        .section .arm_caller, "ax", %progbits
        .balign 4
        .arm
        bl .arm_target   // expect bl STT_SECTION (bit 0 clear anyway)
        bl .thumb_target // expect bl STT_SECTION bit 0 clear anyway)
        bl foo           // expect blx STT_FUNC bit 0 clear
        bl bar           // expect bl STT_FUNC bit 0 set

        .section .thumb_caller, "ax", %progbits
        .thumb
        .balign 4
        bl .arm_target   // expect bl STT_SECTION (bit 0 clear, so LLD does blx)
        bl .thumb_target // expect bl STT_SECTION (bit 0 clear, so LLD does blx)
        bl foo           // expect bl STT_FUNC bit 0 set
        bl bar           // expect blx STT_FUNC bit 0 clear

gives me:

Disassembly of section .arm_caller:

000110bc $a.2:
   110bc: fc ff ff eb                  	bl	#-16 <bar>
   110c0: fc ff ff eb                  	bl	#-16 <foo>
   110c4: fb ff ff fa                  	blx	#-20 <foo>
   110c8: f9 ff ff eb                  	bl	#-28 <bar>

Disassembly of section .thumb_caller:

000110cc $t.3:
   110cc: ff f7 f2 ef                  	blx	#-28
   110d0: ff f7 f2 ef                  	blx	#-28
   110d4: ff f7 f0 ff                  	bl	#-32
   110d8: ff f7 ec ef                  	blx	#-40

Whereas GNU ld correctly gives me:

Disassembly of section .arm_caller:

0001005c $a.2:
   1005c: fc ff ff eb                  	bl	#-16 <bar>
   10060: fc ff ff eb                  	bl	#-16 <foo>
   10064: fb ff ff fa                  	blx	#-20 <foo>
   10068: f9 ff ff eb                  	bl	#-28 <bar>

Disassembly of section .thumb_caller:

0001006c $t.3:
   1006c: ff f7 f2 ff                  	bl	#-28
   10070: ff f7 f2 ff                  	bl	#-28
   10074: ff f7 f0 ff                  	bl	#-32
   10078: ff f7 ec ef                  blx	#-40

If you have branches to non STT_FUNC symbols then there this a reasonable chance of LLD doing interworking when you don't want it to as most non STT_FUNC symbols will look like ARM to it.

This is fixable in LLD by threading the symbol through to the relocations. If there is a concrete use case from an important program then I think I've got a good argument that the patch is necessary.

@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Jan 22, 2020

I spoke more with @smithp35 today (cc @MaskRay ) online.

If you have branches to non STT_FUNC symbols then there this a reasonable chance of LLD doing interworking when you don't want it to as most non STT_FUNC symbols will look like ARM to it.

I suspect that may be what's happening. In particular, via this comment, this code reproduced:

#define __futex_atomic_ex_table(err_reg)			\
	"3:\n"							\
	"	.pushsection __ex_table,\"a\"\n"		\
	"	.align	3\n"					\
	"	.long	1b, 4f, 2b, 4f\n"			\
	"	.popsection\n"					\
	"	.pushsection .text.fixup,\"ax\"\n"		\
	"	.align	2\n"					\
	"4:	mov	%0, " err_reg "\n"			\
	"	b	3b\n"					\
	"	.popsection"

@smithp35 suspects the b 3b may not know that the "local label" 3's symbol type (it's not STT_FUNC, see below), and there doesn't seem to be a way to express it. (I tried ".type 3f, %%function\n" but GAS won't accept the 3; it wants a symbol).

Also, playing with CONFIG_BUILDTIME_TABLE_SORT didn't seem to make a difference.

I spent some time playing with gcc vs clang, bfd vs lld, and it seems that this is reproducible when using lld, regardless of compiler.

So to reproduce:

$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LD=ld.lld -j71 defconfig
$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LD=ld.lld -j71 menuconfig
# enable CONFIG_THUMB2_KERNEL=y and save
$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LD=ld.lld -j71
# llvm-objdump has issues w/ thumb vs arm:
# https://llvm.org/pr44626
$ arm-linux-gnueabihf-objdump -drj .text.fixup kernel/futex.o
kernel/futex.o:     file format elf32-littlearm

Disassembly of section .text.fixup:

00000000 <.text.fixup>:
   0:   4604            mov     r4, r0
   2:   f000 b862       b.w     c8 <.LC4+0x60>
                        2: R_ARM_THM_JUMP24     .text
   6:   bf00            nop
   8:   4610            mov     r0, r2
   a:   f002 beec       b.w     2ddc <do_futex+0x530>
                        a: R_ARM_THM_JUMP24     .text
   e:   bf00            nop
  10:   4610            mov     r0, r2
  12:   f002 bfae       b.w     2f60 <do_futex+0x6b4>
                        12: R_ARM_THM_JUMP24    .text
  16:   bf00            nop
  18:   4610            mov     r0, r2
  1a:   f002 bfd2       b.w     2fa8 <do_futex+0x6fc>
                        1a: R_ARM_THM_JUMP24    .text
  1e:   bf00            nop
  20:   4610            mov     r0, r2
  22:   f002 bff6       b.w     2ff0 <do_futex+0x744>
                        22: R_ARM_THM_JUMP24    .text
  26:   bf00            nop
  28:   4610            mov     r0, r2
  2a:   f003 b819       b.w     3036 <do_futex+0x78a>
                        2a: R_ARM_THM_JUMP24    .text
  2e:   bf00            nop

So there's 6 pair's of "fixups" (each followed by a nop to realign to 2^2 B). To check the symbol type:

$ llvm-readelf -S kernel/futex.o

TODO: finish comment

@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Jan 23, 2020

$ llvm-readelf -S kernel/futex.o
...
  [10] __ex_table        PROGBITS        00000000 0038d0 000060 00   A  0   0  8
  [11] .rel__ex_table    REL             00000000 043768 0000c0 08   I 45  10  4
  [12] .text.fixup       PROGBITS        00000000 003930 000030 00  AX  0   0  4
  [13] .rel.text.fixup   REL             00000000 043828 000030 08   I 45  12  4
...

Shows the interesting section id/numbers (ie. 10, 11, 12, and 13). To see the symbol type:

$ llvm-readelf -s kernel/futex.o
...
   Num:    Value  Size Type    Bind   Vis       Ndx Name
...
    14: 00000000     0 SECTION LOCAL  DEFAULT    10 __ex_table
    15: 00000000     0 NOTYPE  LOCAL  DEFAULT    10 $d
    16: 00000000     0 SECTION LOCAL  DEFAULT    12 .text.fixup
    17: 00000000     0 NOTYPE  LOCAL  DEFAULT    12 $t

And llvm-readobj seems to confirm:

$ llvm-readobj --section-symbols -s kernel/futex.o
...
  Section {
    Index: 12
    Name: .text.fixup (111)
    Type: SHT_PROGBITS (0x1)
    Flags [ (0x6)
      SHF_ALLOC (0x2)
      SHF_EXECINSTR (0x4)
    ]
    Address: 0x0
    Offset: 0x3930
    Size: 48
    Link: 0
    Info: 0
    AddressAlignment: 4
    EntrySize: 0
    Symbols [
      Symbol {
        Name: .text.fixup (0)
        Value: 0x0
        Size: 0
        Binding: Local (0x0)
        Type: Section (0x3)
        Other: 0
        Section: .text.fixup (0xC)
      }
      Symbol {
        Name: $t (9)
        Value: 0x0
        Size: 0
        Binding: Local (0x0)
        Type: None (0x0)
        Other: 0
        Section: .text.fixup (0xC)
      }
    ]
  }

Notice NOTYPE and Type: None for $t. (though I am surprised that there's not multiple symbols; one for each fixup). @smithp35 mentioned that $t had some significance vs $a or $d. ($a: arm, $t: thumb, $d: data). But I think that confirms @smithp35 's hypothesis about branches to non-STT_FUNC symbols.

Now when we repro @ihalip 's setup from above first with bfd, then lld:

BFD

rebuild:

# Also enabled CONFIG_DEBUG_INFO for debug symbols.
$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make -j71

run in qemu:

$ qemu-system-arm -kernel arch/arm/boot/zImage -nographic -m 2048 \
  --append "console=ttyAMA0 root=/dev/ram0" -machine virt \
  -initrd /android0/continuous-integration/images/arm/rootfs.cpio \
  -s -S

attach GDB with a script of @ihalip 's commands

# fixup.gdb
target remote :1234
hbreak fixup_exception
continue
n
p fixup
x/i fixup->insn
x/3i fixup->fixup
$ /android0/aosp/prebuilts/gdb/linux-x86/bin/gdb vmlinux -x fixup.gdb
...
Reading symbols from vmlinux...
0x40000000 in ?? ()
Hardware assisted breakpoint 1 at 0xc0311450: file arch/arm/mm/extable.c, line 12.

Breakpoint 1, fixup_exception (regs=0xee0bbe50) at arch/arm/mm/extable.c:12
12              fixup = search_exception_tables(instruction_pointer(regs));
13              if (fixup) {
$1 = (const struct exception_table_entry *) 0xc112d288
   0xc0397572 <cmpxchg_futex_value_locked+78>:  ldrex   r7, [r1]
   0xc039a928:  mov     r4, r0
   0xc039a92a:  b.w     0xc0397588 <cmpxchg_futex_value_locked+100>
   0xc039a92e:  nop

Then compare this to the disassembly. Looks like .text.fixup get's placed near the end of .text in the final vmlinux host executable. The interesting addresses from the above output are:

  1. 0xc0397572 (our fixup->insn)
  2. 0xc039a928 (our fixup->fixup)
  3. 0xc0397588 (our 3b from the fixup)
$ arm-linux-gnueabihf-objdump -d vmlinux
...
c0397524 <cmpxchg_futex_value_locked>:
...
c0397572:       e851 7f00       ldrex   r7, [r1]
...
c0397588:       4620            mov     r0, r4
...
c039a828 <__se_sys_futex_time32>:
...
c039a928:       4604            mov     r4, r0
c039a92a:       f7fc be2d       b.w     c0397588 <cmpxchg_futex_value_locked+0x64>
c039a92e:       bf00            nop
c039a930:       4610            mov     r0, r2
c039a932:       f7ff bcb3       b.w     c039a29c <do_futex+0x530>
c039a936:       bf00            nop
c039a938:       4610            mov     r0, r2
c039a93a:       f7ff bd71       b.w     c039a420 <do_futex+0x6b4>
c039a93e:       bf00            nop
c039a940:       4610            mov     r0, r2
c039a942:       f7ff bd91       b.w     c039a468 <do_futex+0x6fc>
c039a946:       bf00            nop
c039a948:       4610            mov     r0, r2
c039a94a:       f7ff bdb1       b.w     c039a4b0 <do_futex+0x744>
c039a94e:       bf00            nop
c039a950:       4610            mov     r0, r2
c039a952:       f7ff bdd0       b.w     c039a4f6 <do_futex+0x78a>
c039a956:       bf00            nop

Oh look, our 6 "fixups" are in some random label because include/asm-generic/vmlinux.lds.h munges .text.fixup into .text. Either way, note in the 6 fixups the instruction at c039a92a is a branch to c0397588 which is our 3b. Next, let's look at linking with LLD.

@nickdesaulniers
Copy link
Member Author

LLD

rebuild:

$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make -j71 clean
$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LD=ld.lld -j71

run in qemu (same as above), run in gdb (with same fixup.gdb script) to get:

Reading symbols from vmlinux...
0x40000000 in ?? ()
Hardware assisted breakpoint 1 at 0xc0311470: file arch/arm/mm/extable.c, line 12.

Breakpoint 1, fixup_exception (regs=0xee0bbe50) at arch/arm/mm/extable.c:12
12              fixup = search_exception_tables(instruction_pointer(regs));
13              if (fixup) {
$1 = (const struct exception_table_entry *) 0xc1142288
   0xc0397572 <cmpxchg_futex_value_locked+78>:  ldrex   r7, [r1]
   0xc039a928:  mov     r4, r0
   0xc039a92a:  b.w     0xc0b849d8 <rest_init+20>
   0xc039a92e:  nop

eek. now we have:

  1. 0xc0397572 (our fixup->insn) (literally same as bfd)
  2. 0xc039a928 (our fixup->fixup) (literally the same as bfd)
  3. 0xc0b849d8 (our 3b from the fixup) (wtf)
    and corresponding disassembly:
$ arm-linux-gnueabihf-objdump -d vmlinux
...
c0397524 <cmpxchg_futex_value_locked>:
...
c0397572:       e851 7f00       ldrex   r7, [r1]
...
c039a828 <__se_sys_futex_time32>:
...
c039a928:       4604            mov     r4, r0
c039a92a:       f3ea b055       b.w     c0b849d8 <rest_init+0x14>
c039a92e:       bf00            nop
c039a930:       4610            mov     r0, r2
c039a932:       f3ec b6e1       b.w     c0b876f8 <__ww_mutex_lock.constprop.6+0x1d4>
c039a936:       bf00            nop
c039a938:       4610            mov     r0, r2
c039a93a:       f3ec b7a5       b.w     c0b87888 <__ww_mutex_lock.constprop.6+0x364>
c039a93e:       bf00            nop
c039a940:       4610            mov     r0, r2
c039a942:       f3ec b7cb       b.w     c0b878dc <__ww_mutex_lock.constprop.6+0x3b8>
c039a946:       bf00            nop
c039a948:       4610            mov     r0, r2
c039a94a:       f3ec b7f1       b.w     c0b87930 <__ww_mutex_lock.constprop.6+0x40c>
c039a94e:       bf00            nop
c039a950:       4610            mov     r0, r2
c039a952:       f3ed b016       b.w     c0b87982 <__ww_mutex_lock.constprop.6+0x45e>
c039a956:       bf00            nop
...
c0b849c4 <rest_init>:
...
c0b849d8:       f7af d8bc       bl      c0333b54 <kernel_thread>
...

So again our fixup object is found and has valid pointers, but the 3b label corresponding to .text.fixup is just completely messed up.

@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Jan 23, 2020

If there is a concrete use case from an important program then I think I've got a good argument that the patch is necessary.

Specifically, it seems that symbols with STT_NOTYPE are being produced via local labels in inline assembly in the Linux kernel's exception table fixup return destinations, and LLD is currently not calculating the return destination correctly, which leads to aberrant control flow during early boot.

Just to try to reproduce the failure more, I've compiled kernel/futex.o, but instead of -c used -S to get the generated assembly. Then I've hacked it down to just cmpxchg_futex_value_locked. It looks like lld is generating a label called __Thumbv7ABSLongThunk_, which doesn't look like it generates the correct assembly to me.

futex.s.txt

$ arm-linux-gnueabihf-gcc futex.s -o futex.o -Wa,-mimplicit-it=always -c
$ llvm-readelf -S futex.o
...
  [ 8] .text.fixup       PROGBITS        00000000 0000e0 000008 00  AX  0   0  4
...
# ok, so .text.fixup is section id/number 8
$ llvm-readelf -s futex.o
...
    12: 00000000     0 SECTION LOCAL  DEFAULT     8 .text.fixup
    13: 00000000     0 NOTYPE  LOCAL  DEFAULT     8 $t
# and symbols for the section and $t were the only symbols in section id/number 8
$ arm-linux-gnueabihf-objdump -d futex.bfd.o
$ ld.lld futex.o -o futex.lld.o --warn-unresolved-symbols

comparing the disassemblies of futex.bfd.o to futex.lld.o:
futex.bfd.o:

...
00010074 <cmpxchg_futex_value_locked>:
...
   100d8:       4620            mov     r0, r4
...
   10108:       4604            mov     r4, r0
   1010a:       f7ff bfe5       b.w     100d8 <cmpxchg_futex_value_locked+0x64>
   1010e:       bf00            nop

futex.lld.o:

00011128 <cmpxchg_futex_value_locked>:
...
   1118c:       4620            mov     r0, r4
...
   111bc:       4604            mov     r4, r0
   111be:       f000 b833       b.w     11228 <__Thumbv7ABSLongThunk_+0x64>
   111c2:       bf00            nop

000111c4 <__Thumbv7ABSLongThunk_>:
   111c4:       f241 1c28       movw    ip, #4392       ; 0x1128
   111c8:       f2c0 0c01       movt    ip, #1
   111cc:       4760            bx      ip

IIUC, the lld produced object code jumps to a thunk (__Thumbv7ABSLongThunk_), sets %ip to 0x11128, then jumps to it. But 0x11128 is not where we should have gone, 0x1118c is.

** EDIT **

Err, the lld produced object jumps past the end of the thunk. Not sure how that works, as I cant see anything in the disassembly at that address.

@MaskRay
Copy link
Member

MaskRay commented Jan 23, 2020

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gold/arm.cc#l11897 gold can make decisions based on st_type. I don't know ARM enough to judge whether that is a reasonable behavior...

Threading symbol through sounds reasonable. I created https://reviews.llvm.org/D73250 and https://reviews.llvm.org/D73254

@smithp35
Copy link

Thanks for the investigation, and thanks for the patches above. I'll do my best to look at those today, at a glance they look like they are enough to make the necessary changes to do interworking in a strict ABI way. Will get a patch as soon as I can (sorry running a bit behind at the moment).

@smithp35
Copy link

I've submitted https://reviews.llvm.org/D73474 which is independent of https://reviews.llvm.org/D73250 https://reviews.llvm.org/D73254 . This is likely sufficient to fix the B.w problem described above. Once D73254 lands then I'll work on a patch to fix BL and BLX.

@nickdesaulniers
Copy link
Member Author

That patch allows me to boot successfully!

@nickdesaulniers nickdesaulniers added [BUG] llvm A bug that should be fixed in upstream LLVM [PATCH] Submitted A patch has been submitted for review and removed [BUG] Untriaged Something isn't working labels Jan 27, 2020
smithp35 added a commit to llvm/llvm-project that referenced this issue Jan 28, 2020
…mbols

ELF for the ARM architecture requires linkers to provide
interworking for symbols that are of type STT_FUNC. Interworking for
other symbols must be encoded directly in the object file. LLD was always
providing interworking, regardless of the symbol type, this breaks some
programs that have branches from Thumb state targeting STT_NOTYPE symbols
that have bit 0 clear, but they are in fact internal labels in a Thumb
function. LLD treats these symbols as ARM and inserts a transition to Arm.

This fixes the problem for in range branches, R_ARM_JUMP24,
R_ARM_THM_JUMP24 and R_ARM_THM_JUMP19. This is expected to be the vast
majority of problem cases as branching to an internal label close to the
function.

There is at least one follow up patch required.
- R_ARM_CALL and R_ARM_THM_CALL may do interworking via BL/BLX
  substitution.

In theory range-extension thunks can be altered to not change state when
the symbol type is not STT_FUNC. I will need to check with ld.bfd to see if
this is the case in practice.

Fixes (part of) ClangBuiltLinux/linux#773

Differential Revision: https://reviews.llvm.org/D73474
@nickdesaulniers
Copy link
Member Author

Fixed in llvm/llvm-project@4f38ab2

Closing for now as we can now boot thumb2. It sounds like there may be more work to do for arm callers calling into thumb.

@nickdesaulniers nickdesaulniers added [FIXED][LLVM] 11 This bug was fixed in LLVM 11.0 and removed [PATCH] Submitted A patch has been submitted for review labels Jan 28, 2020
zmodem pushed a commit to llvm/llvm-project that referenced this issue Jan 29, 2020
…mbols

ELF for the ARM architecture requires linkers to provide
interworking for symbols that are of type STT_FUNC. Interworking for
other symbols must be encoded directly in the object file. LLD was always
providing interworking, regardless of the symbol type, this breaks some
programs that have branches from Thumb state targeting STT_NOTYPE symbols
that have bit 0 clear, but they are in fact internal labels in a Thumb
function. LLD treats these symbols as ARM and inserts a transition to Arm.

This fixes the problem for in range branches, R_ARM_JUMP24,
R_ARM_THM_JUMP24 and R_ARM_THM_JUMP19. This is expected to be the vast
majority of problem cases as branching to an internal label close to the
function.

There is at least one follow up patch required.
- R_ARM_CALL and R_ARM_THM_CALL may do interworking via BL/BLX
  substitution.

In theory range-extension thunks can be altered to not change state when
the symbol type is not STT_FUNC. I will need to check with ld.bfd to see if
this is the case in practice.

Fixes (part of) ClangBuiltLinux/linux#773

Differential Revision: https://reviews.llvm.org/D73474

(cherry picked from commit 4f38ab2)
yeewang pushed a commit to yeewang/llvm-project that referenced this issue Feb 4, 2020
…TT_FUNC symbols

ELF for the ARM architecture requires linkers to provide
interworking for symbols that are of type STT_FUNC. Interworking for
other symbols must be encoded directly in the object file. LLD was always
providing interworking, regardless of the symbol type, this breaks some
programs that have branches from Thumb state targeting STT_NOTYPE symbols
that have bit 0 clear, but they are in fact internal labels in a Thumb
function. LLD treats these symbols as ARM and inserts a transition to Arm.

This fixes the problem for in range branches, R_ARM_JUMP24,
R_ARM_THM_JUMP24 and R_ARM_THM_JUMP19. This is expected to be the vast
majority of problem cases as branching to an internal label close to the
function.

There is at least one follow up patch required.
- R_ARM_CALL and R_ARM_THM_CALL may do interworking via BL/BLX
  substitution.

In theory range-extension thunks can be altered to not change state when
the symbol type is not STT_FUNC. I will need to check with ld.bfd to see if
this is the case in practice.

Fixes (part of) ClangBuiltLinux/linux#773

Differential Revision: https://reviews.llvm.org/D73474

(cherry picked from commit 4f38ab2)

Bug: 145916209
Change-Id: I6c9bf5576fd45109f56c918b303c470db7474b95
Signed-off-by: Nick Desaulniers <[email protected]>
mydongistiny pushed a commit to benzoClang/llvm-project that referenced this issue Feb 18, 2020
ELF for the ARM architecture requires linkers to provide
interworking for symbols that are of type STT_FUNC. Interworking for
other symbols must be encoded directly in the object file. LLD was always
providing interworking, regardless of the symbol type, this breaks some
programs that have branches from Thumb state targeting STT_NOTYPE symbols
that have bit 0 clear, but they are in fact internal labels in a Thumb
function. LLD treats these symbols as ARM and inserts a transition to Arm.

This fixes the problem for in range branches, R_ARM_JUMP24,
R_ARM_THM_JUMP24 and R_ARM_THM_JUMP19. This is expected to be the vast
majority of problem cases as branching to an internal label close to the
function.

There is at least one follow up patch required.
- R_ARM_CALL and R_ARM_THM_CALL may do interworking via BL/BLX
  substitution.

In theory range-extension thunks can be altered to not change state when
the symbol type is not STT_FUNC. I will need to check with ld.bfd to see if
this is the case in practice.

Fixes (part of) ClangBuiltLinux/linux#773

Differential Revision: https://reviews.llvm.org/D73474

Change-Id: I4282ac7f4a347926f95636d43fc404bf15589995
Signed-off-by: mydongistiny <[email protected]>
arichardson pushed a commit to arichardson/llvm-project that referenced this issue Apr 1, 2020
…mbols

ELF for the ARM architecture requires linkers to provide
interworking for symbols that are of type STT_FUNC. Interworking for
other symbols must be encoded directly in the object file. LLD was always
providing interworking, regardless of the symbol type, this breaks some
programs that have branches from Thumb state targeting STT_NOTYPE symbols
that have bit 0 clear, but they are in fact internal labels in a Thumb
function. LLD treats these symbols as ARM and inserts a transition to Arm.

This fixes the problem for in range branches, R_ARM_JUMP24,
R_ARM_THM_JUMP24 and R_ARM_THM_JUMP19. This is expected to be the vast
majority of problem cases as branching to an internal label close to the
function.

There is at least one follow up patch required.
- R_ARM_CALL and R_ARM_THM_CALL may do interworking via BL/BLX
  substitution.

In theory range-extension thunks can be altered to not change state when
the symbol type is not STT_FUNC. I will need to check with ld.bfd to see if
this is the case in practice.

Fixes (part of) ClangBuiltLinux/linux#773

Differential Revision: https://reviews.llvm.org/D73474
ajohnson-uoregon pushed a commit to ajohnson-uoregon/clang-rewrite-only that referenced this issue Jul 17, 2022
…mbols

ELF for the ARM architecture requires linkers to provide
interworking for symbols that are of type STT_FUNC. Interworking for
other symbols must be encoded directly in the object file. LLD was always
providing interworking, regardless of the symbol type, this breaks some
programs that have branches from Thumb state targeting STT_NOTYPE symbols
that have bit 0 clear, but they are in fact internal labels in a Thumb
function. LLD treats these symbols as ARM and inserts a transition to Arm.

This fixes the problem for in range branches, R_ARM_JUMP24,
R_ARM_THM_JUMP24 and R_ARM_THM_JUMP19. This is expected to be the vast
majority of problem cases as branching to an internal label close to the
function.

There is at least one follow up patch required.
- R_ARM_CALL and R_ARM_THM_CALL may do interworking via BL/BLX
  substitution.

In theory range-extension thunks can be altered to not change state when
the symbol type is not STT_FUNC. I will need to check with ld.bfd to see if
this is the case in practice.

Fixes (part of) ClangBuiltLinux/linux#773

Differential Revision: https://reviews.llvm.org/D73474

(cherry picked from commit 992b5ff)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[ARCH] arm32 This bug impacts ARCH=arm [BUG] llvm A bug that should be fixed in upstream LLVM [FIXED][LLVM] 11 This bug was fixed in LLVM 11.0 [TOOL] lld The issue is relevant to LLD linker
Projects
None yet
Development

No branches or pull requests

5 participants