Crash in [@ boot.oat@0x3844a0]
Categories
(Firefox for Android :: General, defect, P5)
People
(Reporter: towhite, Unassigned)
Details
(Keywords: crash)
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/58838faf-4b03-4655-8c0c-6181c0240108
Reason: SIGSEGV / SEGV_MAPERR
Top 3 frames of crashing thread:
0 libutils.so libutils.so@0xede0
1 boot.oat boot.oat@0x3844a0
2 ? @0x00006e90148652e4
Comment 1•2 years ago
The severity field is not set for this bug.
:amejia, could you have a look please?
For more information, please visit BugBot documentation.
Comment 2•2 years ago
I looked at a few crashes. The crashes were happening on the thread named RenderThread. I'm not sure if that means this is the graphics driver or what. All of the crashes look to be on Xiaomi devices.
The crashes I looked at had stacks that were mostly junk, so maybe something is going wrong with stack walking. The crashes I looked at had mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) and basically nothing else on the Gecko thread. I do see some actual frames on other threads, but still mixed in with a bunch of junk.
Comment 3•2 years ago
There's something very funny going on here. I'll try to disassemble the crashing area tomorrow to figure out what is happening at the crash point and what's really on the stack. The fact that they all come from Xiaomi devices and they all seem to be on Qualcomm 8xx variants suggests that there might be something wrong with this particular combination. I'd be inclined to think there's a problem at the kernel level where a thread state gets messed up (we've seen those in the past), but a bunch of different kernel versions are involved, so that doesn't sound like it.
It's worth noting that while most crashes appear to be NULL pointer accesses, those that aren't exhibit a peculiar aspect. See this crash, or this one or even this one: the three of them are crashing on an address that isn't near NULL, but if you check the registers you'll notice that the crashing address is always in x9, with the first 16 bits of the address set to 0x43. That part of the address should be zero as per AArch64 address space conventions. The presence of non-zero bits smells like pointer tagging or something along those lines.
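To make the observation about the upper bits concrete, here is a minimal sketch of how a 64-bit register value can be split into its upper 16 bits and the remaining 48-bit address. It assumes a 48-bit virtual address space; the function name `split_address` and the sample `x9` value are hypothetical and are not taken from any of the crash reports.

```python
# Assumption: 48-bit virtual addresses, so bits 48-63 are the "upper" bits.
ADDRESS_BITS = 48


def split_address(value):
    """Return (upper_bits, low_address) for a 64-bit register value."""
    value &= (1 << 64) - 1                        # clamp to 64 bits
    upper = value >> ADDRESS_BITS                 # top 16 bits
    low = value & ((1 << ADDRESS_BITS) - 1)       # low 48 bits
    return upper, low


# Hypothetical x9 value for illustration only.
x9 = 0x0043_0000_0000_0010
upper, low = split_address(x9)
print(hex(upper), hex(low))  # 0x43 0x10
```

With the upper bits masked off, a value like this would look like a near-NULL address, which is consistent with most reports appearing to be NULL pointer accesses.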
Comment 4•2 years ago
Wait a sec, I know what's going on with those upper bits. More recent versions of the Linux kernel can be made to expose those bits to signal handlers. The reason most crashes appear NULL-ish is that they're on kernels where those bits get cleared before the address is delivered to our signal handler. Those bits are used by the Memory Tagging Extension (MTE), which Android does use for security purposes. Chances are something is amiss with Xiaomi's Android builds, but it would be interesting to understand whether we're triggering it from our side and how.
Comment 5•2 years ago
Here's something interesting. I've tried looking for crashes similar to the ones we see here but from another vendor... and I found them. Here's an example. This is the same exact crash but from a Sony device. The signature is different because the Android build is different, but this is clearly the same crash, so this isn't a Xiaomi-specific issue. I'll find some time tomorrow to dig out all the related signatures.
In the meantime, this really looks like we're tripping some Android security facility. Assuming that all the pointers involved have been tagged with MTE, we can infer that the crash is caused by a piece of code accessing an object that it's not supposed to access. That is, the tag in the pointer belongs to another piece of code (a different Android library maybe?) and it's the pointer tag check failing that's causing the crash, not the access per se.
Comment 6•2 years ago
Thanks Gabriele for investigating. Are there any next steps that you suggest?
Thanks in advance!
Comment 7•2 years ago
For starters, here are a few related signatures; my gut feeling is that there are more, with different stack traces for different Android versions. The next step is to figure out what's going on, but it's very hard to tell from the crash reports without user comments to help us.
Comment 8•2 years ago
Keeping track of this thing is tricky due to the changing signatures. Unfortunately, I still haven't had time to dig further into the minidumps.
Comment 9•2 years ago
It's worth noting that the vast majority of the crashes here are on Android API 31 (i.e. Android 12), so chances are it's a bug in that specific version.
Comment 10•1 year ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on release (startup)
For more information, please visit BugBot documentation.
Comment 11•1 year ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 12•1 year ago
Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.
For more information, please visit BugBot documentation.
Comment 13•1 year ago
This is a signature change, adjusting the signatures.
Comment 14•1 year ago
The Bugbug bot thinks this bug should belong to the 'Fenix::Crash Reporting' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 15•1 year ago
This is not a crash reporting bug, moving it back to General because we haven't quite figured out what causes it, though chances are that it's a bug in Android libraries and not in our code.
Comment 16•1 year ago
Would it be worth reporting in the Google issue tracker? Or do you want to wait till you have more certainty?
Comment 17•1 year ago
I think it would, though we have very little to go by besides that it happens in libutils.so and always happens in our rendering thread (but it's unclear what code is calling it, as our stack walker can't find our code during unwinding). If we had some symbol information for libutils.so it'd already be an improvement, as we'd know at least the name of the function where the crash is happening.