Closed Bug 1824544 Opened 2 years ago Closed 1 year ago

Thunderbird built with rustc-1.68.0 and LLVM-16 segfaults on startup

Categories

(Thunderbird :: Untriaged, defect)

Thunderbird 102
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: renodr, Unassigned)

Details

(Whiteboard: [closeme 2023-10-01])

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0

Steps to reproduce:

Build Thunderbird with rustc-1.68.0 and LLVM-16. The configuration used for Linux From Scratch can be found here: https://linuxfromscratch.org/blfs/view/systemd/xsoft/thunderbird.html

Note that the build also requires a patch for the rust-bindgen crate when using LLVM-16, which can be downloaded from: https://linuxfromscratch.org/patches/blfs/svn/firefox-102.9.0-upstream_fixes-1.patch (it applies cleanly to the tree and fixes the build error there). The more "proper" way to handle this would be to update the rust-bindgen crate, though that isn't trivial.

Our ticket for this issue at BLFS can be found at https://wiki.linuxfromscratch.org/blfs/ticket/17794, where I've been documenting my various attempts to debug this issue.

Actual results:

When Thunderbird-102.9.0 is built with rustc-1.68.0, the application crashes immediately on startup. Running 'ldd' on the rustc command confirms that it's using libLLVM-16.so. If I use rustc-1.67.1 (which uses LLVM-15 on a Linux From Scratch system), it builds and functions properly.
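The 'ldd' check above can be scripted when comparing several toolchains. The following is only a sketch: the helper name and the sample line are made up for illustration, and it assumes the library is named in the libLLVM-16.so style mentioned above.

```python
import re
from typing import Optional


def llvm_major_from_ldd(ldd_output: str) -> Optional[int]:
    """Return the major version of the first libLLVM library named in ldd output."""
    # Matches names such as libLLVM-16.so or libLLVM-15.so.
    match = re.search(r"libLLVM-(\d+)", ldd_output)
    return int(match.group(1)) if match else None


# Hypothetical sample line; real ldd output will differ in paths and addresses.
sample = "\tlibLLVM-16.so => /usr/lib/libLLVM-16.so (0x00007f0000000000)\n"
print(llvm_major_from_ldd(sample))
```

To use it on a real system, pass in the text produced by running 'ldd' on the rustc binary.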

In terms of console output, I get:

[ImapModuleLoader] Using nsImapService.cpp
[NntpModuleLoader] Using NntpService.jsm
[Pop3ModuleLoader] Using Pop3Service.jsm
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
[calBackendLoader] Using Thunderbird's ical.js backend
Segmentation fault (core dumped)

And for a backtrace, I get:

(gdb) bt full
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=11, no_tid=no_tid@entry=0) at pthread_kill.c:44
tid = <optimized out>
ret = 0
pd = <optimized out>
old_mask = {__val = {0}}
ret = <optimized out>
#1 0x00007f7e8687c0ff in __pthread_kill_internal (signo=11, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007f7e8682e462 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
ret = <optimized out>
#3 0x00007f7e80b06b4c in () at /usr/lib/thunderbird/libxul.so
#4 0x00007f7e815ca9a7 in () at /usr/lib/thunderbird/libxul.so
#5 0x00007f7e8682e500 in <signal handler called> () at /usr/lib/libc.so.6
#6 0x00007f7e81d17fb6 in () at /usr/lib/thunderbird/libxul.so
#7 0x00007f7e81863586 in () at /usr/lib/thunderbird/libxul.so
#8 0x00007f7e81862e4c in () at /usr/lib/thunderbird/libxul.so
#9 0x0000000000000000 in ()

To get more information, I attempted a build with debugging symbols enabled. That build failed with the following output when it tried to compile gkrust:

27:10.54 error: Cannot represent a difference across sections
27:32.84 error: could not compile gkrust due to previous error

I then attempted a build with stripping turned off in the hope of getting more information, but that didn't yield anything useful either.

My colleagues and I checked whether there were any differences between the Firefox-102.9.0esr Mozilla code and what Thunderbird is using, and we weren't able to find any, so we think that something is going on with the Thunderbird-specific code in this case. We've been able to do some rudimentary debugging, but haven't been able to go as in-depth as desired because we cannot get a debug build to complete.

One of my colleagues, Xi Ruoyao, was able to narrow the crash down to style::custom_properties::CustomPropertiesBuilder::build::he418231f9106fe2e (or _ZN5style17custom_properties23CustomPropertiesBuilder5build17h), and the instruction sequence at the crash is:

0x00007ffff0445706: mov (%r14),%rax
0x00007ffff0445709: cmp $0xffffffffffffffff,%rax
0x00007ffff044570d: je 0x7ffff0445912
0x00007ffff0445713: lock incq (%r14)
0x00007ffff0445717: jg 0x7ffff0445912

... and r14 contains "0xe5e5e5e5e5e5e5e5", which according to Mozilla's documentation is the pattern written over freed memory, so this points to a use-after-free.
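For anyone repeating this analysis, here is a rough sketch of two checks used above: decoding the length-prefixed components of the mangled symbol, and testing whether a register value is a repeated fill byte. Both helper names are made up for illustration. The 0xE5 fill byte is taken from the register value in this report and is an assumption about the allocator's poison pattern. The mangled name is passed in truncated, exactly as quoted, so the final component comes back incomplete.

```python
def demangle_legacy(symbol: str) -> str:
    """Decode the length-prefixed components of a legacy-mangled (_ZN...) symbol.

    Escape sequences such as $LT$ are not translated, and a final component
    cut short by truncation is returned as far as it goes.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("expected a symbol starting with _ZN")
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        # Read the decimal length prefix of the next path component.
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"expected a length prefix at offset {start}")
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length
    return "::".join(parts)


def is_poison_fill(value: int, fill: int = 0xE5, width: int = 8) -> bool:
    """Return True if value consists of the fill byte repeated width times."""
    # Assumption: 0xE5 is the byte written over freed memory.
    return value == int.from_bytes(bytes([fill]) * width, "little")


# The mangled name is truncated in the report, so the hash component is partial.
print(demangle_legacy("_ZN5style17custom_properties23CustomPropertiesBuilder5build17h"))
print(is_poison_fill(0xE5E5E5E5E5E5E5E5))
```

The decoded path matches the demangled name quoted above up to the truncated hash, and the r14 value passes the fill-byte check.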

Note that I've also tried a fresh profile as part of normal troubleshooting, and it made no difference.

Expected results:

Thunderbird should build and execute correctly.

I'm another developer of the linuxfromscratch books, and I'd say it is not a problem with the build instructions, since the same build instructions work with llvm-15+rust-1.67.1, and even llvm-15+rust-1.68.1. The failure comes from something new in llvm-16. Whether it is a bug in llvm-16 or a bug in thunderbird revealed by the change in llvm (or even a bug in rust revealed by a change in llvm), I cannot tell, but there is something going on.

Interestingly, if ac_add_options --disable-release is added to the mozconfig (without changing anything else from what is on https://linuxfromscratch.org/blfs/view/systemd/xsoft/thunderbird.html), then the crash is gone.

This is now tracked under Bug 1831242 as well.

See Also: → 1831242

Does this also reproduce when using version 115 started in Help > Troubleshoot Mode?
If it does, and you have not already done so, please list complete steps to reproduce.

Whiteboard: [closeme 2023-10-01]

It does not reproduce with 115. You can close it as fixed.

Status: UNCONFIRMED → RESOLVED
Closed: 1 year ago
Resolution: --- → WORKSFORME