Crash in [@ std::_Load_relaxed_8] or [@ std::_Load_relaxed_4] or [@ mozilla::detail::IntrinsicMemoryOps<T>::load ]
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
People
(Reporter: RyanVM, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: crash, topcrash, topcrash-startup)
Crash Data
Appears to be a new crash signature in 106. For the two combined signatures, it's roughly a top 10 content process crash.
Crash report: https://crash-stats.mozilla.org/report/index/9baa6ad8-12f4-4476-8e7d-67e260221004
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 10 frames of crashing thread:
0 xul.dll std::_Load_relaxed_8 /builds/worker/fetches/vs/vc/tools/msvc/14.16.27023/include/xatomic.h:1863
0 xul.dll std::_Atomic_load_8 /builds/worker/fetches/vs/vc/tools/msvc/14.16.27023/include/xatomic.h:1891
0 xul.dll std::atomic_load_explicit /builds/worker/fetches/vs/vc/tools/msvc/14.16.27023/include/xxatomic:495
0 xul.dll std::_Atomic_ullong::load const /builds/worker/fetches/vs/vc/tools/msvc/14.16.27023/include/xxatomic:629
0 xul.dll mozilla::detail::IntrinsicMemoryOps<unsigned long long, mozilla::Relaxed>::load mfbt/Atomics.h:195
0 xul.dll mozilla::detail::AtomicBaseIncDec<unsigned long long, mozilla::Relaxed>::operator unsigned long long const mfbt/Atomics.h:340
0 xul.dll js::gc::CellWithLengthAndFlags::headerFlagsField const js/src/gc/Cell.h:620
0 xul.dll JSString::flags const js/src/vm/StringType.h:201
0 xul.dll JSString::isRope const js/src/vm/StringType.h:474
0 xul.dll JSRope::flattenInternal js/src/vm/StringType.cpp:783
Comment 2•2 years ago
|
||
This is probably a signature change due to inlined frames now being added. Two common prior signatures for crashes where the proto signature contain flattenInternal are [@ js::GCMarker::traverse<T> ] (eg bp-73f481ba-d3ff-4713-b5c6-fcfab0221004 ) and [@ js::gc::detail::CellHasStoreBuffer ] (eg bp-209c9369-21c0-4f1c-a753-f13cb0221004 ).
Comment 3•2 years ago
|
||
Bug 1791509 is on file for adding at least some atomic std methods to the prefix list. I don't think that will make this change to one of the existing signatures, but ideally it would at least get us in a state where more obvious GC-related stuff is in the signature.
Comment 4•2 years ago
|
||
When I search for these signatures and facet on proto signature they all do just look like the existing soup of GC traverse and sweep crashes.
Comment 5•2 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #4)
When I search for these signatures and facet on proto signature they all do just look like the existing soup of GC traverse and sweep crashes.
Yes, I opened a bunch of them by hand and they all looked GC-related too (i.e. most likely machines with bad memory).
Comment 6•2 years ago
|
||
The bug is linked to topcrash signatures, which match the following criteria:
- Top 20 desktop browser crashes on beta (startup)
- Top 10 content process crashes on beta
For more information, please visit auto_nag documentation.
Comment 7•2 years ago
|
||
This is a very generic signature. Most of the stacks look like existing GC bad memory / heap corruption crashes though.
Comment 8•2 years ago
|
||
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 9•2 years ago
|
||
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 10•2 years ago
|
||
Looking at the crash addresses, I see different things, including ones which are looking like an integer field with bit flags sets.
Looking at the stack report I see the name CellWithLengthAndFlags
, which seems to be the kind of content that we are looking for, but I would not expect us to de-reference this kind of data.
Could this be a cast of the wrong offset or a miss-interpreted register?
Comment 11•2 years ago
|
||
What is the likelihood of having these (comment 10) kind of crash addresses?
Comment 12•2 years ago
|
||
It looks like general corruption: some kind of GC pointer field that has the wrong thing in it, perhaps because the structure that stores the pointer is itself not what the code thinks it is. So the crash is when dereferencing a non-Cell* as a Cell* (specifically trying to load the length and flags from something that isn't even a pointer). If the crash address appears to be an integer field with bit flags set, then I would guess it's because the owning struct is being misinterpreted, and we're looking at some expected-to-be-Cell-pointer field when it's actually a flags field or similar.
(Because you're right, no code should be dereferencing a CellWithLengthAndFlags
value, except maybe when following forwarding pointers but the stacks are not within a minor GC so that doesn't appear to be happening.)
I notice that the crash addresses are all aligned. Or at least, the ones that aren't 0xffffffffffffffff are aligned. Maybe the unaligned ones are getting reported as 0xffffffffffffffff? I would check another OS, except 100% of the crashes are on Windows.
If this is the case, then it should be more common to not make it as far as the _Load_relaxed_8 or whatever, and instead crash earlier. But that would show up as a different signature, and would be one of our other buckets that look like generalized corruption. This signature would be for the general corruption where we happen to interpret the corrupt value as something containing a Cell*
to be examined. (If instead the pointer to the owning struct wasn't a pointer, it would crash when dereferencing that pointer in just about any code, before it made it here.)
Feel free to point out where my analysis doesn't make sense!
Comment 13•2 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #11)
What is the likelihood of having these (comment 10) kind of crash addresses?
Agree with Steve, these looks like the normal sort of GC crash addresses.
Comment 14•2 years ago
|
||
Note that Socorro now ignores the functions starting with std::_Load
so the crashes under this signature should disappear and re-appear under new more useful signatures.
Comment 15•2 years ago
|
||
Since the crash volume is low (less than 5 per week), the severity is downgraded to S3
. Feel free to change it back if you think the bug is still critical.
For more information, please visit auto_nag documentation.
Comment 16•2 years ago
|
||
mozilla::detail::IntrinsicMemoryOps<T>::load
Seems to be the new name for these crashes. It got wrongly associated to Bug 1702010.
Updated•2 years ago
|
Comment 17•2 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 20 desktop browser crashes on beta (startup)
- Top 10 desktop browser crashes on nightly
- Top 10 content process crashes on beta
- Top 10 content process crashes on release
:sdetar, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 18•2 years ago
|
||
We likely want all mozilla::detail::IntrinsicMemoryOps*
prefixes to be ignored during crash signature generation, shall we drop them?
Comment 19•2 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #18)
We likely want all
mozilla::detail::IntrinsicMemoryOps*
prefixes to be ignored during crash signature generation, shall we drop them?
This sounds good to me, then make sure to reassign this particular bug to js::gc::CellWithLengthAndFlags::headerFlagsField
, or the next function which appears to be the new crash location given the stack trace from comment 0.
Updated•2 years ago
|
Comment 20•2 years ago
|
||
From my understanding, this signature covered multiple different bugs. As the signatures are being divided in their own individual bugs, I will mark it as WONTFIX. Feel free to undo this change.
Comment 21•2 years ago
|
||
Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.
Description
•