Closed Bug 1793644 Opened 2 years ago Closed 2 years ago

Crash in [@ std::_Load_relaxed_8] or [@ std::_Load_relaxed_4] or [@ mozilla::detail::IntrinsicMemoryOps<T>::load ]

Categories

(Core :: JavaScript: GC, defect)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: RyanVM, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, topcrash, topcrash-startup)

Crash Data

Appears to be a new crash signature in 106. For the two combined signatures, it's roughly a top 10 content process crash.

Crash report: https://crash-stats.mozilla.org/report/index/9baa6ad8-12f4-4476-8e7d-67e260221004

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0 xul.dll std::_Load_relaxed_8 /builds/worker/fetches/vs/vc/tools/msvc/14.16.27023/include/xatomic.h:1863
0 xul.dll std::_Atomic_load_8 /builds/worker/fetches/vs/vc/tools/msvc/14.16.27023/include/xatomic.h:1891
0 xul.dll std::atomic_load_explicit /builds/worker/fetches/vs/vc/tools/msvc/14.16.27023/include/xxatomic:495
0 xul.dll std::_Atomic_ullong::load const /builds/worker/fetches/vs/vc/tools/msvc/14.16.27023/include/xxatomic:629
0 xul.dll mozilla::detail::IntrinsicMemoryOps<unsigned long long, mozilla::Relaxed>::load mfbt/Atomics.h:195
0 xul.dll mozilla::detail::AtomicBaseIncDec<unsigned long long, mozilla::Relaxed>::operator unsigned long long const mfbt/Atomics.h:340
0 xul.dll js::gc::CellWithLengthAndFlags::headerFlagsField const js/src/gc/Cell.h:620
0 xul.dll JSString::flags const js/src/vm/StringType.h:201
0 xul.dll JSString::isRope const js/src/vm/StringType.h:474
0 xul.dll JSRope::flattenInternal js/src/vm/StringType.cpp:783

:sfink, could you investigate this new crash?

Flags: needinfo?(sphink)

This is probably a signature change due to inlined frames now being added. Two common prior signatures for crashes where the proto signature contain flattenInternal are [@ js::GCMarker::traverse<T> ] (eg bp-73f481ba-d3ff-4713-b5c6-fcfab0221004 ) and [@ js::gc::detail::CellHasStoreBuffer ] (eg bp-209c9369-21c0-4f1c-a753-f13cb0221004 ).

Bug 1791509 is on file for adding at least some atomic std methods to the prefix list. I don't think that will make this change to one of the existing signatures, but ideally it would at least get us in a state where more obvious GC-related stuff is in the signature.

Depends on: 1791509

When I search for these signatures and facet on proto signature they all do just look like the existing soup of GC traverse and sweep crashes.

(In reply to Andrew McCreight [:mccr8] from comment #4)

When I search for these signatures and facet on proto signature they all do just look like the existing soup of GC traverse and sweep crashes.

Yes, I opened a bunch of them by hand and they all looked GC-related too (i.e. most likely machines with bad memory).

The bug is linked to topcrash signatures, which match the following criteria:

  • Top 20 desktop browser crashes on beta (startup)
  • Top 10 content process crashes on beta

For more information, please visit auto_nag documentation.

This is a very generic signature. Most of the stacks look like existing GC bad memory / heap corruption crashes though.

Blocks: GCCrashes
Flags: needinfo?(sphink)
Keywords: stalled

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

Looking at the crash addresses, I see different things, including ones which are looking like an integer field with bit flags sets.
Looking at the stack report I see the name CellWithLengthAndFlags, which seems to be the kind of content that we are looking for, but I would not expect us to de-reference this kind of data.

Could this be a cast of the wrong offset or a miss-interpreted register?

What is the likelihood of having these (comment 10) kind of crash addresses?

Flags: needinfo?(jcoppeard)

It looks like general corruption: some kind of GC pointer field that has the wrong thing in it, perhaps because the structure that stores the pointer is itself not what the code thinks it is. So the crash is when dereferencing a non-Cell* as a Cell* (specifically trying to load the length and flags from something that isn't even a pointer). If the crash address appears to be an integer field with bit flags set, then I would guess it's because the owning struct is being misinterpreted, and we're looking at some expected-to-be-Cell-pointer field when it's actually a flags field or similar.

(Because you're right, no code should be dereferencing a CellWithLengthAndFlags value, except maybe when following forwarding pointers but the stacks are not within a minor GC so that doesn't appear to be happening.)

I notice that the crash addresses are all aligned. Or at least, the ones that aren't 0xffffffffffffffff are aligned. Maybe the unaligned ones are getting reported as 0xffffffffffffffff? I would check another OS, except 100% of the crashes are on Windows.

If this is the case, then it should be more common to not make it as far as the _Load_relaxed_8 or whatever, and instead crash earlier. But that would show up as a different signature, and would be one of our other buckets that look like generalized corruption. This signature would be for the general corruption where we happen to interpret the corrupt value as something containing a Cell* to be examined. (If instead the pointer to the owning struct wasn't a pointer, it would crash when dereferencing that pointer in just about any code, before it made it here.)

Feel free to point out where my analysis doesn't make sense!

(In reply to Nicolas B. Pierron [:nbp] from comment #11)

What is the likelihood of having these (comment 10) kind of crash addresses?
Agree with Steve, these looks like the normal sort of GC crash addresses.

Flags: needinfo?(jcoppeard)

Note that Socorro now ignores the functions starting with std::_Load so the crashes under this signature should disappear and re-appear under new more useful signatures.

Since the crash volume is low (less than 5 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: S2 → S3

mozilla::detail::IntrinsicMemoryOps<T>::load Seems to be the new name for these crashes. It got wrongly associated to Bug 1702010.

Crash Signature: [@ std::_Load_relaxed_8] [@ std::_Load_relaxed_4] → [@ std::_Load_relaxed_8] [@ std::_Load_relaxed_4] [@ mozilla::detail::IntrinsicMemoryOps<T>::load]
Summary: Crash in [@ std::_Load_relaxed_8] or [@ std::_Load_relaxed_4] → Crash in [@ std::_Load_relaxed_8] or [@ std::_Load_relaxed_4] or [@ mozilla::detail::IntrinsicMemoryOps<T>::load ]

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on release (startup)
  • Top 20 desktop browser crashes on beta (startup)
  • Top 10 desktop browser crashes on nightly
  • Top 10 content process crashes on beta
  • Top 10 content process crashes on release

:sdetar, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(sdetar)

We likely want all mozilla::detail::IntrinsicMemoryOps* prefixes to be ignored during crash signature generation, shall we drop them?

Flags: needinfo?(nicolas.b.pierron)

(In reply to Gabriele Svelto [:gsvelto] from comment #18)

We likely want all mozilla::detail::IntrinsicMemoryOps* prefixes to be ignored during crash signature generation, shall we drop them?

This sounds good to me, then make sure to reassign this particular bug to js::gc::CellWithLengthAndFlags::headerFlagsField, or the next function which appears to be the new crash location given the stack trace from comment 0.

Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(sdetar)
Depends on: 1798479

From my understanding, this signature covered multiple different bugs. As the signatures are being divided in their own individual bugs, I will mark it as WONTFIX. Feel free to undo this change.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
You need to log in before you can comment on or make changes to this bug.