Open Bug 1873619 Opened 5 months ago Updated 27 days ago

Crash in [@ boot.oat@0x3844a0]

Tracking

(firefox124 affected, firefox125 affected, firefox126 affected)

Status:

NEW

Tracking Flags:

             Tracking    Status
firefox124   ---         affected
firefox125   ---         affected
firefox126   ---         affected

People

(Reporter: towhite, Unassigned)

Details

(Keywords: crash)

Crash Data

twhite

Reporter

Description

•

5 months ago

Crash report: https://crash-stats.mozilla.org/report/index/58838faf-4b03-4655-8c0c-6181c0240108

Reason: SIGSEGV / SEGV_MAPERR

Top 3 frames of crashing thread:

0  libutils.so  libutils.so@0xede0  
1  boot.oat  boot.oat@0x3844a0  
2  ?  @0x00006e90148652e4

BugBot [:suhaib / :marco/ :calixte]

Comment 1

•

5 months ago

The severity field is not set for this bug.
:amejia, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(amejiamarmol)

Andrew McCreight [:mccr8]

Comment 2

•

4 months ago

I looked at a few crashes. The crashes were happening on the thread named RenderThread. I'm not sure if that means this is the graphics driver or what. All of the crashes look to be on Xiaomi devices.

The crashes I looked at had stacks that were mostly junk, so maybe something is going wrong with stack walking. The crashes I looked at had mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) and basically nothing else on the Gecko thread. I do see some actual frames on other threads, but still mixed in with a bunch of junk.

Gabriele Svelto [:gsvelto]

Comment 3

•

4 months ago

There's something very funny going on here. I'll try to disassemble the crashing area tomorrow to figure out what is happening at the crash point and what's really on the stack. The fact that they all come from Xiaomi devices and all seem to be on Qualcomm 8xx variants suggests that there might be something wrong with this particular combination. I'd be inclined to think there's a problem at the kernel level where a thread state gets messed up (we've seen those in the past), but the crashes span a bunch of different kernel versions, so that doesn't sound like it.

It's worth noting that while most crashes appear to be NULL pointer accesses, those that aren't exhibit a peculiar aspect. See this crash, or this one or even this one: the three of them are crashing on an address that isn't near NULL, but if you check among the registers you'll notice that the crashing address is always in x9, with the upper 16 bits of the address set to 0x43. That part of the address should be zero as per AArch64 address space conventions. The presence of non-zero bits smells like pointer tagging or something along those lines.
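The check described above, whether the top 16 bits of a faulting address are non-zero, can be sketched as follows. This is only an illustration: it assumes a 48-bit user-space virtual address layout, the helper names are hypothetical, and the sample address is made up rather than taken from any of the crash reports.

```python
# Assumption: user-space virtual addresses use the low 48 bits,
# so the top 16 bits of an untagged pointer are expected to be zero.
VA_BITS = 48
LOW_MASK = (1 << VA_BITS) - 1


def split_address(addr):
    """Split a 64-bit address into (upper 16 bits, lower 48 bits)."""
    addr &= (1 << 64) - 1  # clamp to 64 bits
    return addr >> VA_BITS, addr & LOW_MASK


def has_upper_bits(addr):
    """Return True if any of the top 16 bits are set."""
    upper, _ = split_address(addr)
    return upper != 0


# Made-up example value with 0x43 in the upper 16 bits;
# it is not copied from a real crash report.
sample = 0x0043_0000_1234_5678
upper, lower = split_address(sample)
print(f"upper bits: {upper:#x}, untagged address: {lower:#x}")
```

Running the same check over the crashing address and the x9 register of each report would be one way to separate the NULL-looking crashes from the ones that still carry the upper bits.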

Flags: needinfo?(gsvelto)

Gabriele Svelto [:gsvelto]

Comment 4

•

4 months ago

Wait a sec, I know what's going on with those upper bits. More recent versions of the Linux kernel can be made to expose those bits to signal handlers. The reason why most crashes appear NULL-ish is that they're on kernels where those bits get cleared before the address is delivered to our signal handler. Those bits are used by the Memory Tagging Extension (MTE), which Android does use for security purposes. Chances are something is amiss with Xiaomi's Android builds, but it would be interesting to understand if we're triggering it from our side and how.

Gabriele Svelto [:gsvelto]

Comment 5

•

4 months ago

Here's something interesting. I've tried looking for crashes similar to the ones we see here but from another vendor... and I found them. Here's an example. This is the same exact crash but from a Sony device. The signature is different because the Android build is different, but this is clearly the same crash, so this isn't a Xiaomi-specific issue. I'll find some time tomorrow to dig out all the related signatures.

In the meantime, this really looks like we're tripping some Android security facility. Assuming that all the pointers involved have been tagged with MTE, we can infer that the crash is caused by a piece of code accessing an object that it's not supposed to access. That is, the tag in the pointer belongs to another piece of code (a different Android library maybe?), and it's the pointer tag check failing that's causing the crash, not the access per se.

Arturo Mejia [:amejia]

Comment 6

•

4 months ago

Thanks Gabriele for investigating. Are there any next steps that you suggest?
Thanks in advance!

Flags: needinfo?(amejiamarmol)

Gabriele Svelto [:gsvelto]

Comment 7

•

4 months ago

For starters, here are a few related signatures; my gut feeling is that there are more, with different stack traces for different Android versions. The next step is to figure out what's going on, but it's very hard to tell from the crash reports without user comments to help us.

Crash Signature: [@ boot.oat@0x3844a0] → [@ boot.oat@0x36dbe0] [@ boot.oat@0x36ebe0] [@ boot.oat@0x36fbf0] [@ boot.oat@0x370bf0] [@ boot.oat@0x371be0] [@ boot.oat@0x372bf0] [@ boot.oat@0x3834a0] [@ boot.oat@0x3834a0] [@ boot.oat@0x3844a0]
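Since the signatures above differ only in the offset within boot.oat, splitting the field into (module, offset) pairs makes duplicates and near-misses easier to spot. A minimal sketch, assuming the `[@ module@0xOFFSET]` layout seen in this bug; the function name and regular expression are illustrative and not part of any Bugzilla or crash-stats tooling.

```python
import re

# Matches a single "[@ module@0xOFFSET]" entry. The layout is inferred
# from this bug's signature field; entries with several frames joined
# by " | " are deliberately not handled here.
SIGNATURE_RE = re.compile(r"\[@ ([^\]@\s]+)@(0x[0-9a-fA-F]+)\]")


def parse_signatures(field):
    """Return unique (module, offset) pairs in first-seen order."""
    seen = []
    for module, offset in SIGNATURE_RE.findall(field):
        entry = (module, int(offset, 16))
        if entry not in seen:
            seen.append(entry)
    return seen


# The updated signature field from the change above.
FIELD = (
    "[@ boot.oat@0x36dbe0] [@ boot.oat@0x36ebe0] [@ boot.oat@0x36fbf0] "
    "[@ boot.oat@0x370bf0] [@ boot.oat@0x371be0] [@ boot.oat@0x372bf0] "
    "[@ boot.oat@0x3834a0] [@ boot.oat@0x3834a0] [@ boot.oat@0x3844a0]"
)

signatures = parse_signatures(FIELD)
for module, offset in signatures:
    print(f"{module} @ {offset:#x}")
```

Keeping the offsets as integers rather than strings also makes it straightforward to sort them or compare them across Android builds.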

Flags: needinfo?(gsvelto)

Gabriele Svelto [:gsvelto]

Comment 8

•

3 months ago

Keeping track of this thing is tricky due to the changing signatures. I still haven't had time to dig further into the minidumps, unfortunately.

Crash Signature: [@ boot.oat@0x36dbe0] [@ boot.oat@0x36ebe0] [@ boot.oat@0x36fbf0] [@ boot.oat@0x370bf0] [@ boot.oat@0x371be0] [@ boot.oat@0x372bf0] [@ boot.oat@0x3834a0] [@ boot.oat@0x3834a0] [@ boot.oat@0x3844a0] → [@ boot.oat@0x10bbe0] [@ boot.oat@0x31c890] [@ boot.oat@0x32c050] [@ boot.oat@0x32f050] [@ boot.oat@0x365be0] [@ boot.oat@0x36bbe0] [@ boot.oat@0x36dbe0] [@ boot.oat@0x36ebe0] [@ boot.oat@0x36fbe0] [@ boot.oat@0x36fbf0] [@ boot.oat@0x370bf0] [@…

Gabriele Svelto [:gsvelto]

Comment 9

•

3 months ago

It's worth noting that the vast majority of the crashes here are on Android API 31 (i.e. Android 12), so chances are it's a bug in that specific version.

Chris Peterson [:cpeterson]

Updated

•

2 months ago

Severity: -- → S2

Crash Signature: boot.oat@0x3854a0] → boot.oat@0x3854a0] [@ libc.so@0x6b410 | boot.oat@0x1fac38]

status-firefox124: --- → affected

status-firefox125: --- → affected

status-firefox126: --- → affected

Priority: -- → P5

BugBot [:suhaib / :marco/ :calixte]

Comment 10

•

2 months ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 10 AArch64 and ARM crashes on release (startup)

For more information, please visit BugBot documentation.

Keywords: topcrash, topcrash-startup

BugBot [:suhaib / :marco/ :calixte]

Comment 11

•

1 month ago

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash, topcrash-startup (removed)

BugBot [:suhaib / :marco/ :calixte]

Comment 12

•

27 days ago

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit BugBot documentation.

Severity: S2 → S3

Gabriele Svelto [:gsvelto]

Comment 13

•

27 days ago

This is a signature change, adjusting the signatures.

Crash Signature: boot.oat@0x3854a0] [@ libc.so@0x6b410 | boot.oat@0x1fac38] → boot.oat@0x3854a0] [@ boot.oat]

Categories

(Fenix :: General, defect, P5)

Security

(public)
