Open Bug 1873619 Opened 5 months ago Updated 27 days ago

Crash in [@ boot.oat@0x3844a0]

Categories

(Fenix :: General, defect, P5)

Unspecified
Android

Tracking

(firefox124 affected, firefox125 affected, firefox126 affected)

Tracking Status
firefox124 --- affected
firefox125 --- affected
firefox126 --- affected

People

(Reporter: towhite, Unassigned)

Details

(Keywords: crash)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/58838faf-4b03-4655-8c0c-6181c0240108

Reason: SIGSEGV / SEGV_MAPERR

Top 3 frames of crashing thread:

0  libutils.so  libutils.so@0xede0  
1  boot.oat  boot.oat@0x3844a0  
2  ?  @0x00006e90148652e4  

The severity field is not set for this bug.
:amejia, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(amejiamarmol)

I looked at a few crashes. The crashes were happening on the thread named RenderThread. I'm not sure if that means this is the graphics driver or what. All of the crashes look to be on Xiaomi devices.

The crashes I looked at had stacks that were mostly junk, so maybe something is going wrong with stack walking. The crashes I looked at had mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) and basically nothing else on the Gecko thread. I do see some actual frames on other threads, but still mixed in with a bunch of junk.

There's something very funny going on here. I'll try to disassemble the crashing area tomorrow to figure out what is happening at the crash point and what's really on the stack. The fact that they all come from Xiaomi devices and all seem to be on Qualcomm 8xx variants suggests that there might be something wrong with this particular combination. I'd be inclined to think there's a problem at the kernel level where a thread state gets messed up (we've seen those in the past), but the reports cover a bunch of different kernel versions, so that doesn't sound like it.

It's worth noting that while most crashes appear to be NULL pointer accesses, those that aren't exhibit a peculiar aspect. See this crash, or this one or even this one: all three crash on an address that isn't near NULL, and if you check the registers you'll notice that the crashing address is always in x9, with the upper 16 bits of the address set to 0x43. That part of the address should be zero as per AArch64 address space conventions. The presence of non-zero bits smells like pointer tagging or something along those lines.
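To make the check described above concrete, here is a minimal sketch of splitting a crash address into its upper 16 bits and lower 48 bits. It assumes 48-bit user-space virtual addresses, and the example address is made up for illustration (it is not taken from any of the linked reports). The helper names are hypothetical.

```python
# Assumption: user-space virtual addresses are 48 bits wide, so bits 63..48
# are expected to be zero for an untagged user-space pointer.
ADDRESS_BITS = 48
LOW_MASK = (1 << ADDRESS_BITS) - 1


def split_address(addr):
    """Return (upper_16_bits, lower_48_bits) of a 64-bit address."""
    return addr >> ADDRESS_BITS, addr & LOW_MASK


def has_upper_bits_set(addr):
    """True when the upper 16 bits are non-zero, which hints at tagging."""
    upper, _ = split_address(addr)
    return upper != 0


# Hypothetical crash address with 0x43 in the upper 16 bits.
example = 0x0043007F12345678
upper, lower = split_address(example)
print(hex(upper), hex(lower))  # prints: 0x43 0x7f12345678
```

Running this kind of check over the crash address and register values in a report is a quick way to separate the NULL-ish crashes from the ones that carry extra upper bits.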

Flags: needinfo?(gsvelto)

Wait a sec, I know what's going on with those upper bits. More recent versions of the Linux kernel can be made to expose those bits to signal handlers. The reason most crashes appear NULL-ish is that they're on kernels where those bits get cleared before the address is delivered to our signal handler. Those bits are used by the Memory Tagging Extension (MTE), which Android does use for security purposes. Chances are something is amiss with Xiaomi's Android builds, but it would be interesting to understand whether we're triggering it from our side and how.

Here's something interesting. I've tried looking for crashes similar to the ones we see here but from another vendor... and I found them. Here's an example. This is the same exact crash but from a Sony device. The signature is different because the Android build is different, but this is clearly the same crash, so this isn't a Xiaomi-specific issue. I'll find some time tomorrow to dig out all the related signatures.

In the meantime, this really looks like we're tripping some Android security facility. Assuming that all the pointers involved have been tagged with MTE, we can infer that the crash is caused by a piece of code accessing an object that it's not supposed to access. That is, the tag in the pointer belongs to another piece of code (a different Android library maybe?) and it's the pointer tag check failing that's causing the crash, not the access per se.
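As a rough illustration of that inference, the following toy model mimics how a tag check can fail even though the target memory is mapped. It is only a sketch: real MTE checks happen in hardware, and the class and method names here are invented for the example. The model follows the architectural layout in which a 4-bit tag sits in pointer bits 59..56 and each 16-byte granule of memory carries its own tag.

```python
GRANULE = 16      # MTE tags memory in 16-byte granules
TAG_SHIFT = 56    # the pointer's 4-bit tag lives in bits 59..56
TAG_MASK = 0xF


class TagMismatch(Exception):
    """Raised by the toy model when pointer tag and memory tag differ."""


class ToyTaggedMemory:
    """Hypothetical software model of tag-checked memory (not real MTE)."""

    def __init__(self):
        self._tags = {}  # granule index -> allocation tag

    def allocate(self, base, size, tag):
        """Tag every granule in [base, base + size) and return a tagged pointer."""
        first = base // GRANULE
        last = (base + size + GRANULE - 1) // GRANULE
        for granule in range(first, last):
            self._tags[granule] = tag & TAG_MASK
        return ((tag & TAG_MASK) << TAG_SHIFT) | base

    def check_access(self, pointer):
        """Return the untagged address, or raise if the tags do not match."""
        tag = (pointer >> TAG_SHIFT) & TAG_MASK
        address = pointer & ((1 << TAG_SHIFT) - 1)
        expected = self._tags.get(address // GRANULE, 0)
        if tag != expected:
            raise TagMismatch(
                f"pointer tag {tag:#x} != memory tag {expected:#x} at {address:#x}"
            )
        return address


memory = ToyTaggedMemory()
good = memory.allocate(0x1000, 32, tag=0x3)
memory.check_access(good)  # tags match, so the access is allowed

# Same address, but carrying a tag that belongs to some other allocation.
stale = (good & ~(TAG_MASK << TAG_SHIFT)) | (0x5 << TAG_SHIFT)
try:
    memory.check_access(stale)
except TagMismatch as error:
    print("access rejected:", error)
```

The point of the model is that the second access targets perfectly valid memory; it is rejected only because the tag carried by the pointer does not match the tag recorded for that memory.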

Thanks, Gabriele, for investigating. Are there any next steps that you suggest?
Thanks in advance!

Flags: needinfo?(amejiamarmol)

For starters, here are a few related signatures; my gut feeling is that there are more, with different stack traces for different Android versions. The next step is to figure out what's going on, but it's very hard to tell from the crash reports without user comments to help us.

Crash Signature: [@ boot.oat@0x3844a0] → [@ boot.oat@0x36dbe0] [@ boot.oat@0x36ebe0] [@ boot.oat@0x36fbf0] [@ boot.oat@0x370bf0] [@ boot.oat@0x371be0] [@ boot.oat@0x372bf0] [@ boot.oat@0x3834a0] [@ boot.oat@0x3834a0] [@ boot.oat@0x3844a0]
Flags: needinfo?(gsvelto)

Keeping track of this thing is tricky due to the changing signatures. Unfortunately, I still haven't had time to dig further into the minidumps.

Crash Signature: [@ boot.oat@0x36dbe0] [@ boot.oat@0x36ebe0] [@ boot.oat@0x36fbf0] [@ boot.oat@0x370bf0] [@ boot.oat@0x371be0] [@ boot.oat@0x372bf0] [@ boot.oat@0x3834a0] [@ boot.oat@0x3834a0] [@ boot.oat@0x3844a0] → [@ boot.oat@0x10bbe0] [@ boot.oat@0x31c890] [@ boot.oat@0x32c050] [@ boot.oat@0x32f050] [@ boot.oat@0x365be0] [@ boot.oat@0x36bbe0] [@ boot.oat@0x36dbe0] [@ boot.oat@0x36ebe0] [@ boot.oat@0x36fbe0] [@ boot.oat@0x36fbf0] [@ boot.oat@0x370bf0] [@…
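Since the signatures differ only in the offset within boot.oat, one way to keep track of them is to group them by module name with the offsets stripped. The sketch below is a hypothetical helper for reading a Crash Signature field like the ones above; it is not how crash-stats generates or matches signatures.

```python
import re
from collections import defaultdict

# Matches one "[@ ... ]" entry and captures the text inside it.
SIGNATURE_RE = re.compile(r"\[@\s*([^\]]+?)\s*\]")
# Matches the "@0x..." offset that follows a module name.
OFFSET_RE = re.compile(r"@0x[0-9a-fA-F]+")


def parse_signatures(field):
    """Extract the individual signatures from a Crash Signature field."""
    return SIGNATURE_RE.findall(field)


def group_by_module(signatures):
    """Group signatures by their text with all offsets removed."""
    groups = defaultdict(set)
    for signature in signatures:
        groups[OFFSET_RE.sub("", signature)].add(signature)
    return {key: sorted(values) for key, values in groups.items()}


field = (
    "[@ boot.oat@0x36dbe0] [@ boot.oat@0x3834a0] [@ boot.oat@0x3834a0] "
    "[@ libc.so@0x6b410 | boot.oat@0x1fac38]"
)
for key, members in group_by_module(parse_signatures(field)).items():
    print(key, "->", members)
```

Duplicated entries collapse into one, and every offset variant of the same frame sequence ends up under a single key.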

It's worth noting that the vast majority of the crashes here are on Android API 31 (Android 12), so chances are it's a bug in that specific version.

Severity: -- → S2
Crash Signature: boot.oat@0x3854a0] → boot.oat@0x3854a0] [@ libc.so@0x6b410 | boot.oat@0x1fac38]
Priority: -- → P5

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on release (startup)


Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.


Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.


Severity: S2 → S3

This is a signature change, adjusting the signatures.

Crash Signature: boot.oat@0x3854a0] [@ libc.so@0x6b410 | boot.oat@0x1fac38] → boot.oat@0x3854a0] [@ boot.oat]