Open Bug 1815788 Opened 2 years ago Updated 2 years ago

Crash in [@ core::core_arch::x86::sse2::_mm_loadu_si128] under hashbrown::HashMap::get_inner (from CSS/neqo/cert-storage/...)

Categories

(Core :: General, defect)

Tracking

Tracking Status
firefox-esr102 --- unaffected
firefox109 --- wontfix
firefox110 --- affected
firefox111 --- affected

People

(Reporter: aryx, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash)

Crash Data

This crash signature affects Ubuntu 20.04 the most (38% of crashes), and 46% of all crashes occur in the first minute. The crash count per release branch is increasing (106: 1, 107: 20, 108: 69, 109: 159). Interestingly, the dot releases (e.g. x.0.1) have more crash reports than the initial release (x.0). At least for branch 109, there should not be such a difference (23 vs. 136).

Crash report: https://crash-stats.mozilla.org/report/index/4032edab-6e51-4423-84e6-6413e0230208

Reason: SIGSEGV / SI_KERNEL

Top 7 frames of crashing thread:

0  libxul.so  core::intrinsics::copy_nonoverlapping  /build/rustc-8Y3gKi/rustc-1.63.0+dfsg0ubuntu1~llvm/library/core/src/intrinsics.rs:2137
0  libxul.so  core::core_arch::x86::sse2::_mm_loadu_si128  /build/rustc-8Y3gKi/rustc-1.63.0+dfsg0ubuntu1~llvm/library/stdarch/crates/core_arch/src/x86/sse2.rs:1196
0  libxul.so  hashbrown::raw::sse2::Group::load  /build/rustc-8Y3gKi/rustc-1.63.0+dfsg0ubuntu1~llvm/vendor/hashbrown/src/raw/sse2.rs:50
0  libxul.so  hashbrown::raw::RawTableInner<A>::find_inner  /build/rustc-8Y3gKi/rustc-1.63.0+dfsg0ubuntu1~llvm/vendor/hashbrown/src/raw/mod.rs:1174
0  libxul.so  hashbrown::raw::RawTable<T, A>::find  /build/rustc-8Y3gKi/rustc-1.63.0+dfsg0ubuntu1~llvm/vendor/hashbrown/src/raw/mod.rs:816
0  libxul.so  hashbrown::rustc_entry::<impl hashbrown::map::HashMap<K, V, S, A>>::rustc_entry  /build/rustc-8Y3gKi/rustc-1.63.0+dfsg0ubuntu1~llvm/vendor/hashbrown/src/rustc_entry.rs:36
0  libxul.so  std::collections::hash::map::HashMap<K, V, S>::entry  /build/rustc-8Y3gKi/rustc-1.63.0+dfsg0ubuntu1~llvm/library/std/src/collections/hash/map.rs:853

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 5 desktop browser crashes on Linux on release

For more information, please visit auto_nag documentation.

Keywords: topcrash

Setting a component for this issue.

Component: Untriaged → Graphics: WebRender
Product: Firefox → Core

Looks like the crashes are happening from accessing a hash map in the style resolver. Changing component.

Component: Graphics: WebRender → CSS Parsing and Computation

The severity field is not set for this bug.
:dholbert, could you have a look please?

Flags: needinfo?(dholbert)

(In reply to Brad Werth [:bradwerth] from comment #3)

Looks like the crashes are happening from accessing a hash map in the style resolver. Changing component.

Seems like bug 1406996. emilio, should we dupe there, or leave this open for investigation like bug 1797655?

(It looks like crash volume here is pretty low, fortunately, aside from the spike on Feb 7th, which is what prompted this bug to be filed I think.)

Flags: needinfo?(dholbert) → needinfo?(emilio)

This seems a bit more general, i.e. it's about Rust hashmaps in general. It's not only style that's affected: e.g., bp-ddcb20cb-cabe-4fd6-aae2-ad3d80230223 is in cert-storage, and there's another crash in neqo.

Ultimately I believe the root cause is going to be the same tho (bit flips).
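
To illustrate why a single flipped bit in a hash map's metadata can surface as a SIGSEGV in the 16-byte group load, here is a minimal sketch. It is a simplified model and not hashbrown's actual code: `probe_start`, `load_group`, and `flip_bit` are hypothetical names, the table layout (bucket count plus one trailing group of control bytes) is assumed, and the bounds check exists only so the demo can report the bad access. The real lookup does an unchecked unaligned load, so the same stray offset would read unmapped memory.

```rust
// Simplified, hypothetical model of a SwissTable-style control-byte probe.
// Names and layout are illustrative assumptions, not hashbrown's API.

const GROUP_WIDTH: usize = 16; // bytes read by one SSE2 group load

/// Offset of the first control byte a probe for `hash` would read.
fn probe_start(hash: u64, bucket_mask: usize) -> usize {
    (hash as usize) & bucket_mask
}

/// Checked stand-in for the 16-byte group load. Returns None where an
/// unchecked load would read outside the control-byte array.
fn load_group(ctrl: &[u8], pos: usize) -> Option<[u8; GROUP_WIDTH]> {
    let end = pos.checked_add(GROUP_WIDTH)?;
    let bytes = ctrl.get(pos..end)?;
    let mut group = [0u8; GROUP_WIDTH];
    group.copy_from_slice(bytes);
    Some(group)
}

/// Simulates a hardware bit flip in a stored value.
fn flip_bit(value: usize, bit: u32) -> usize {
    value ^ (1usize << bit)
}

fn main() {
    let buckets = 64usize; // power of two, so the mask is buckets - 1
    let bucket_mask = buckets - 1;
    // Assumed layout: one control byte per bucket plus one trailing group.
    let ctrl = vec![0xFFu8; buckets + GROUP_WIDTH];
    let hash = 0x1234_5678_9abc_def0u64; // arbitrary hash with bit 20 set

    // With an intact mask the probe stays inside the control bytes.
    let good = probe_start(hash, bucket_mask);
    assert!(load_group(&ctrl, good).is_some());

    // One flipped bit in the mask sends the probe far past the allocation.
    let bad_mask = flip_bit(bucket_mask, 20);
    let bad = probe_start(hash, bad_mask);
    assert!(load_group(&ctrl, bad).is_none());
    println!("intact offset: {good}, offset after bit flip: {bad}");
}
```

The same reasoning applies to a flipped bit in the control-byte pointer itself, which would also explain why the crash shows up in unrelated callers.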

Component: CSS Parsing and Computation → General
Flags: needinfo?(emilio)
Summary: Crash in [@ core::core_arch::x86::sse2::_mm_loadu_si128] → Crash in [@ core::core_arch::x86::sse2::_mm_loadu_si128] under hashbrown::HashMap::get_inner (from CSS/neqo/cert-storage/...)
Severity: -- → S3

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #0)

Interestingly, the dot releases (e.g. x.0.1) have more crash reports than the initial one (x.0).

(In reply to Emilio Cobos Álvarez (:emilio) from comment #6)

Ultimately I believe the root cause is going to be the same tho (bit flips).

Superficially, it's hard to harmonize these^ two statements -- it doesn't seem like there should be any correlation between bit flips and dot releases / version-number.

But, guessing/hand-waving a bit... I could imagine that there are classes of users (e.g. maybe enterprise installations) that intentionally skip the initial release of each major version, and choose to wait for the first dot-release before updating. And it's conceivable that bitflip-prone RAM might be more common among this set of users, for economic or hardware-lifecycle reasons. If this is true (again, I'm totally guessing), then that could explain the higher prevalence of bitflip-type crashes on dot releases vs. the initial release.

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

Keywords: topcrash (removed)