Bug 1797655 (Open). Opened 2 years ago; updated 1 year ago.

Crash in [@ servo_arc::thin_to_thick]

Categories

(Core :: CSS Parsing and Computation, defect)

Other Branch
Desktop
All
defect

Tracking


REOPENED
Tracking Status
firefox-esr102 --- unaffected
firefox111 --- wontfix
firefox112 --- wontfix
firefox113 --- wontfix
firefox114 --- fix-optional

People

(Reporter: ash153311, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

Crash report: https://crash-stats.mozilla.org/report/index/4528a7eb-af00-4d4d-bb71-5b4570221027

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0  xul.dll  servo_arc::thin_to_thick  servo/components/servo_arc/lib.rs:911
0  xul.dll  servo_arc::Arc<servo_arc::HeaderSlice<servo_arc::HeaderWithLength<selectors::builder::SpecificityAndFlags>, slice$<enum$<selectors::parser::Component<style::gecko::selector_parser::SelectorImpl> > > > >::from_thin  servo/components/servo_arc/lib.rs:1041
0  xul.dll  servo_arc::impl$37::drop  servo/components/servo_arc/lib.rs:1007
0  xul.dll  core::ptr::drop_in_place  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487
0  xul.dll  core::ptr::drop_in_place  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487
0  xul.dll  core::ptr::drop_in_place<style::invalidation::element::invalidation_map::Dependency>  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487
1  xul.dll  core::ptr::drop_in_place  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487
1  xul.dll  alloc::vec::impl$28::drop  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/vec/mod.rs:2920
1  xul.dll  core::ptr::drop_in_place  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487
1  xul.dll  smallvec::impl$36::drop  third_party/rust/smallvec/src/lib.rs:1815
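The frames show a `Dependency` being dropped. That drop releases a thin `Arc` over selector components (`HeaderSlice<HeaderWithLength<SpecificityAndFlags>, [Component]>`), which is first turned back into a thick `Arc` via `from_thin` and then `thin_to_thick`.

The sketch below is a simplified model of what a thin-to-thick conversion involves. It assumes a hypothetical layout in which a one-word pointer addresses a buffer whose first word is the element count. The names `ThinPtr`, `make_thin` and this `thin_to_thick` are illustrative only and do not reproduce servo_arc's code. The point is that the slice length has to be read through the pointer. If that pointer is null or corrupted, the very first access is a read fault, which is consistent with the EXCEPTION_ACCESS_VIOLATION_READ above.

```rust
/// Hypothetical "thin" pointer: one machine word pointing at a buffer whose
/// first word is a header holding the element count, followed by the
/// elements. Simplified illustration; not servo_arc's real layout.
#[derive(Clone, Copy)]
struct ThinPtr(*const usize);

/// Build a `[len, elems...]` buffer and return it with a thin pointer to its
/// start. The Vec owns the memory and must outlive every use of the pointer;
/// moving the Vec does not move its heap buffer.
fn make_thin(elems: &[usize]) -> (Vec<usize>, ThinPtr) {
    let mut buf = Vec::with_capacity(elems.len() + 1);
    buf.push(elems.len()); // header: element count
    buf.extend_from_slice(elems);
    let thin = ThinPtr(buf.as_ptr());
    (buf, thin)
}

/// Sketch of a thin-to-thick conversion: rebuild a fat (pointer, length)
/// slice from the thin pointer. The length is read *through* the pointer,
/// so this header read is the first memory access; with a null or corrupted
/// pointer, a real program would fault right here.
///
/// Safety: `thin` must point at a live buffer created by `make_thin`.
unsafe fn thin_to_thick<'a>(thin: ThinPtr) -> &'a [usize] {
    unsafe {
        let len = *thin.0; // header read
        std::slice::from_raw_parts(thin.0.add(1), len)
    }
}

fn main() {
    let (_owner, thin) = make_thin(&[10, 20, 30]);
    // SAFETY: `_owner` keeps the buffer alive for the rest of `main`.
    let thick = unsafe { thin_to_thick(thin) };
    assert_eq!(thick, &[10, 20, 30]);
    println!("{:?} (len {})", thick, thick.len());
}
```

servo_arc's real types also carry a reference count and a typed header. The failure mode sketched here is the same in spirit: the conversion cannot learn the slice length without dereferencing the pointer it was given.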

The Bugbug bot thinks this bug should belong to the 'Core::CSS Parsing and Computation' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: General → CSS Parsing and Computation

Can you reproduce this somewhat consistently? We've seen some related crashes but they seem to be caused by some hardware failure / external memory corruption.

Flags: needinfo?(ash153311)

I can't reproduce this crash precisely, because I've reported several crashes with different signatures.
However, all the crash reports show the same crashing thread.

https://hg.mozilla.org/releases/mozilla-release/file/4b58b98809817ef444fab57234d4702e4581320d/servo/components/servo_arc/lib.rs#l911

Flags: needinfo?(ash153311)

Well, that's the main thread, so that's not terribly surprising (it's where most work happens). Out of curiosity, could you attach your about:support information?

Also, if you have time, could you run memtest on your machine and report whether it finds any errors?

We've seen some of these correlate with particular CPU models, but that doesn't seem to be the case here. So it'd be great to confirm that it's not bad memory either, which is a common cause when a single user hits crashes in many places that look unrelated at first glance.

We've found that the style system hash maps, particularly on big pages, are very prone to crashing on bitflips (because they allocate massive contiguous pages of memory that are read and written a lot). See all the investigation and diagnostics in bug 1406996 and dependent bugs.
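When the corrupted word happens to be a pointer, a single flipped bit can move it very far. The minimal sketch below models this with a hypothetical `flip_bit` helper applied to a made-up 64-bit address. It only does integer arithmetic, never dereferences anything, and is not taken from the style system.

```rust
/// Hypothetical helper: model a single-bit memory error by flipping bit
/// `bit` (0 = least significant) of a 64-bit word.
fn flip_bit(word: u64, bit: u32) -> u64 {
    word ^ (1u64 << bit)
}

fn main() {
    // A made-up, plausible-looking heap address (illustrative only).
    let ptr: u64 = 0x0000_7ff6_1234_5000;

    // Flipping bit 40 moves the pointer by 2^40 bytes (1 TiB), very likely
    // into unmapped memory.
    let far = flip_bit(ptr, 40);
    assert_eq!(far, 0x0000_7ef6_1234_5000);
    assert_eq!(ptr.abs_diff(far), 1u64 << 40);

    // Flipping bit 0 leaves the pointer misaligned for any type larger
    // than one byte.
    let odd = flip_bit(ptr, 0);
    assert_eq!(odd % 8, 1);

    println!("{ptr:#x} -> {far:#x} (bit 40), {odd:#x} (bit 0)");
}
```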

Thanks.

Flags: needinfo?(ash153311)
Attached image memtest64ramtest.png

I ran memtest64 for 10 loops, and it reported no errors.

Flags: needinfo?(ash153311)
See Also: → 1797657
Severity: -- → S3

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on release

:emilio, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(emilio)
Keywords: topcrash

All the crashes seem to happen while clearing the style hashmaps, hitting null pointers in things that shouldn't be null. I think it's effectively a variant of bug 1406996; it's just that we now get inlined frames in the signatures.

Status: NEW → RESOLVED
Closed: 2 years ago
Duplicate of bug: stylo-hashmap-crashes
Flags: needinfo?(emilio)
Resolution: --- → DUPLICATE

Un-duping since this signature seems to be a substantial crasher right now and merits its own investigation.

Status: RESOLVED → REOPENED
No longer duplicate of bug: stylo-hashmap-crashes
Resolution: DUPLICATE → ---

(In reply to Ryan VanderMeulen [:RyanVM] from bug 1406996 comment #76)

Manifestation of bug 1801006 maybe?

This one doesn't seem board-specific like that one was, so it's a bit mysterious.

The fact that it doesn't seem to happen on beta/nightly at such high volume is also a bit surprising. I'd disassemble one of the dumps to check for odd generated code, but I'm more familiar with x86 assembly... Mike, do you know if any specific toolchain change could explain this spike?

Flags: needinfo?(mh+mozilla)

Without a time frame for the spike (I'm not sure where to look), I can't tell much. We upgrade Rust frequently, and depending on how far back it goes, we also updated clang.

Flags: needinfo?(mh+mozilla) → needinfo?(emilio)

I see a massive spike from February 13 to February 26, only on the release channel, with most crashes coming from Fenix 110.0.1. That build seems to differ from FF 110.0.1, which was just tagged; its build ID seems to be 20230213213738, so it dates from February 13. The crash rate on Fenix 110.0 is much lower, if I'm reading the data correctly...

Flags: needinfo?(emilio) → needinfo?(mh+mozilla)

(We see some crashes with this signature in release before that, but the crash rate is fairly similar to what we were used to seeing in bug 1406996)

Between FF 109 and 110 (which is what I would expect to be in Fenix 110.0.1), we had bug 1797419 (update to Rust 1.66 from 1.65).

Flags: needinfo?(mh+mozilla)

So it sounds like this might be associated with the rust toolchain-upgrade.

(I imagine that would have been in Fenix 110.0 as well; I'm not clear why 110.0 had low crash volume with respect to 110.0.1. Maybe that's due to most users getting upgraded to 110.0.1 relatively quickly, due to release hold-backs and/or 110.0.1 coming out early in the release cycle, and hence 110.0 not having a long time to experience crashes?)

In any case, it seems like comment 4 and comment 8 contain our latest assessments here, which unfortunately make this not very actionable.

Flags: needinfo?(emilio)
Flags: needinfo?(dholbert)

Note that this could be an issue similar to bug 1831242, whereby some optimization breaks things because of blatant undefined behavior. LLVM 16 made things much worse, but we can't exclude earlier versions of LLVM breaking things in smaller ways.

This shouldn't be related to bug 1831242; there are no FFI shenanigans involved here. We added a bunch of diagnostics for these hashmaps in the past (see bug 1406996) and got nowhere.

Flags: needinfo?(emilio)
Flags: needinfo?(dholbert)

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash
