Open Bug 1797655 Opened 2 years ago Updated 1 year ago

Crash in [@ servo_arc::thin_to_thick]

Tracking

Status: REOPENED

Tracking Flags:

Flag             Tracking   Status
firefox-esr102   ---        unaffected
firefox111       ---        wontfix
firefox112       ---        wontfix
firefox113       ---        wontfix
firefox114       ---        fix-optional

People: Reporter ash153311, Unassigned

References: Depends on 1 open bug

Details: Keywords: crash, regression

Attachments (2 files)

about:support file about this crash, 2 years ago, Taegeon Lee, 63.59 KB, text/plain
memtest64ramtest.png, 2 years ago, Taegeon Lee, 36.25 KB, image/png

Taegeon Lee (Reporter)
Description, 2 years ago

Crash report: https://crash-stats.mozilla.org/report/index/4528a7eb-af00-4d4d-bb71-5b4570221027

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0  xul.dll  servo_arc::thin_to_thick  servo/components/servo_arc/lib.rs:911
0  xul.dll  servo_arc::Arc<servo_arc::HeaderSlice<servo_arc::HeaderWithLength<selectors::builder::SpecificityAndFlags>, slice$<enum$<selectors::parser::Component<style::gecko::selector_parser::SelectorImpl> > > > >::from_thin  servo/components/servo_arc/lib.rs:1041
0  xul.dll  servo_arc::impl$37::drop  servo/components/servo_arc/lib.rs:1007
0  xul.dll  core::ptr::drop_in_place  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487
0  xul.dll  core::ptr::drop_in_place  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487
0  xul.dll  core::ptr::drop_in_place<style::invalidation::element::invalidation_map::Dependency>  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487
1  xul.dll  core::ptr::drop_in_place  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487
1  xul.dll  alloc::vec::impl$28::drop  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/vec/mod.rs:2920
1  xul.dll  core::ptr::drop_in_place  ../a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487
1  xul.dll  smallvec::impl$36::drop  third_party/rust/smallvec/src/lib.rs:1815
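The top frames show a `Dependency` being dropped, which drops a servo_arc `Arc` via `from_thin`, and the read access violation happens in `thin_to_thick`. A plausible reading (an assumption for illustration, not a diagnosis) is that turning a thin pointer back into a thick one must first read a slice length stored in the allocation's header, so a null or corrupted pointer faults on that very first read. The sketch below is a minimal safe-Rust model of that idea only: `Heap`, `Thin`, `alloc`, and this `thin_to_thick` are hypothetical names, the layout is simplified rather than servo_arc's actual code, and returning `None` stands in for the access violation.

```rust
// Hypothetical, simplified model of a "thin" pointer; NOT servo_arc's real code.
// The slice length lives in a header inside the allocation, so the handle is
// a single word. Rebuilding a "thick" (start, length) view means reading that
// header first.

/// Simulated heap: each allocation is laid out as [len, elem0, elem1, ...].
struct Heap {
    words: Vec<u64>,
}

/// A thin handle: just an offset into the heap (stands in for a raw pointer).
#[derive(Clone, Copy, Debug)]
struct Thin(usize);

impl Heap {
    /// Store the length header followed by the elements; return a thin handle.
    fn alloc(&mut self, elems: &[u64]) -> Thin {
        let at = self.words.len();
        self.words.push(elems.len() as u64); // header: slice length
        self.words.extend_from_slice(elems);
        Thin(at)
    }

    /// Model of thin-to-thick: read the length from the header, then build a
    /// slice of that length. Returns None where real code would fault with a
    /// read access violation.
    fn thin_to_thick(&self, p: Thin) -> Option<&[u64]> {
        let len = *self.words.get(p.0)? as usize;
        let start = p.0 + 1;
        let end = start.checked_add(len)?;
        self.words.get(start..end)
    }
}

fn main() {
    let mut heap = Heap { words: Vec::new() };
    let a = heap.alloc(&[10, 20, 30]);
    let b = heap.alloc(&[40, 50]);

    assert_eq!(heap.thin_to_thick(a), Some(&[10, 20, 30][..]));
    assert_eq!(heap.thin_to_thick(b), Some(&[40, 50][..]));

    // Flip one bit in the stored handle: the header read now lands outside
    // the heap, the analogue of EXCEPTION_ACCESS_VIOLATION_READ.
    let corrupted = Thin(b.0 ^ (1 << 20));
    assert_eq!(heap.thin_to_thick(corrupted), None);

    // Flip one bit in a header instead: the length becomes too large, so the
    // reconstructed slice would run past the end of the heap.
    heap.words[a.0] ^= 1 << 10;
    assert_eq!(heap.thin_to_thick(a), None);

    println!("ok");
}
```

In this model, a single flipped bit in either the handle or the stored length is enough to make the header-driven reconstruction fail at the point of use rather than at the point of corruption, which is consistent with (though not proof of) the memory-corruption hypothesis discussed in this bug.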

BugBot [:suhaib / :marco/ :calixte]
Comment 1, 2 years ago

The Bugbug bot thinks this bug should belong to the 'Core::CSS Parsing and Computation' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: General → CSS Parsing and Computation

Emilio Cobos Álvarez (:emilio)
Comment 2, 2 years ago

Can you reproduce this somewhat consistently? We've seen some related crashes but they seem to be caused by some hardware failure / external memory corruption.

Flags: needinfo?(ash153311)

Taegeon Lee (Reporter)
Comment 3, 2 years ago (edited)

I cannot reproduce this crash reliably, and I have reported several crashes with different signatures.
However, all of the crash reports show the same crashing thread.

https://hg.mozilla.org/releases/mozilla-release/file/4b58b98809817ef444fab57234d4702e4581320d/servo/components/servo_arc/lib.rs#l911

Flags: needinfo?(ash153311)

Emilio Cobos Álvarez (:emilio)
Comment 4, 2 years ago

Well, that's the main thread, so that's not terribly surprising (it's where most work happens). Out of curiosity, can you attach your about:support information?

Also, if you have the time, could you run memtest on your machine and report whether there are any errors?

We've seen some of these correlate with particular CPU models, but that doesn't seem to be the case here, so it'd be great to confirm it's not bad memory either (a common cause when a single user hits crashes in a bunch of different places that don't seem related to each other at a glance).

We've found that the style system's hash maps in particular are very prone to crashing on bit flips in big pages (because they allocate massive contiguous chunks of memory that are read and written a lot). See all the investigation and diagnostics in bug 1406996 and its dependent bugs.

Thanks.

Flags: needinfo?(ash153311)

Taegeon Lee (Reporter)
Comment 5, 2 years ago

Attached file: about:support file about this crash

Taegeon Lee (Reporter)
Comment 6, 2 years ago

Attached image: memtest64ramtest.png

I ran memtest64 for 10 loops, and it reported no errors.

Flags: needinfo?(ash153311)

Tiaan Louw
Updated, 2 years ago

Updated, 2 years ago

Severity: -- → S3

BugBot [:suhaib / :marco/ :calixte]
Comment 7, 2 years ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 10 AArch64 and ARM crashes on release

:emilio, could you consider increasing the severity of this top-crash bug?

For more information, please see the auto_nag documentation.

Flags: needinfo?(emilio)

Keywords: topcrash

Emilio Cobos Álvarez (:emilio)
Comment 8, 2 years ago

All the crashes seem to happen while clearing the style hashmaps, hitting null pointers on things that shouldn't be null. I think it's effectively a version of bug 1406996; it's just that we now get inlined frames in the signatures.

Status: NEW → RESOLVED

Closed: 2 years ago

Duplicate of bug: stylo-hashmap-crashes

Flags: needinfo?(emilio)

Resolution: --- → DUPLICATE

Daniel Holbert [:dholbert]
Comment 9, 2 years ago

Un-duping since this signature seems to be a substantial crasher right now and merits its own investigation.

Status: RESOLVED → REOPENED

No longer duplicate of bug: stylo-hashmap-crashes

Resolution: DUPLICATE → ---

Daniel Holbert [:dholbert]
Comment 10, 2 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from bug 1406996 comment #76)

> Manifestation of bug 1801006 maybe?

This one doesn't seem board-specific like that one was. So, a bit mysterious.

Emilio Cobos Álvarez (:emilio)
Comment 11, 2 years ago

The fact that it doesn't seem to happen on beta/nightly with such high volume is also a bit surprising. I'd disassemble one of the dumps to see if there's some funny code going on, but I'm more familiar with x86 assembly... Mike, do you know if there's any specific toolchain change that could explain this spike?

Flags: needinfo?(mh+mozilla)

Mike Hommey [:glandium]
Comment 12, 2 years ago

Without a time frame for the spike (I'm not sure where to look), I can't tell much. We upgrade Rust frequently, and depending on how far back, we also updated clang.

Flags: needinfo?(mh+mozilla) → needinfo?(emilio)

Emilio Cobos Álvarez (:emilio)
Comment 13, 2 years ago

I see a massive spike from February 13 to February 26, only on the release channel, with most crashes coming from Fenix 110.0.1 (which seems to be different from Firefox 110.0.1, which was just tagged; its build ID seems to be 20230213213738, so it's from that date). The crash rate on Fenix 110.0 is much lower, if I'm reading the data correctly...

Flags: needinfo?(emilio) → needinfo?(mh+mozilla)

Emilio Cobos Álvarez (:emilio)
Comment 14, 2 years ago

(We see some crashes with this signature in release before that, but the crash rate is fairly similar to what we were used to seeing in bug 1406996)

Daniel Holbert [:dholbert]
Updated, 2 years ago

Depends on: stylo-hashmap-crashes

Mike Hommey [:glandium]
Comment 15, 2 years ago

Between Firefox 109 and 110 (which is what I would expect to be in Fenix 110.0.1), we had bug 1797419 (update to Rust 1.66 from 1.65).

Flags: needinfo?(mh+mozilla)

Chris Peterson [:cpeterson]
Updated, 2 years ago

status-firefox111: --- → affected

status-firefox112: --- → affected

status-firefox113: --- → affected

status-firefox-esr102: --- → affected

Keywords: regression

Donal Meehan [:dmeehan]
Updated, 2 years ago

status-firefox111: affected → wontfix

Suhaib Mujahid [:suhaib]
Updated, 2 years ago

status-firefox-esr102: affected → unaffected

Daniel Holbert [:dholbert]
Comment 16, 2 years ago (edited)

So it sounds like this might be associated with the rust toolchain-upgrade.

(I imagine that would have been in Fenix 110.0 as well; I'm not sure why 110.0 had low crash volume relative to 110.0.1. Maybe that's because most users were upgraded to 110.0.1 relatively quickly, due to release hold-backs and/or 110.0.1 coming out early in the release cycle, so 110.0 didn't have long to accumulate crashes?)

In any case, it seems like comment 4 and comment 8 contain our latest assessments here (which unfortunately make this not very actionable).

Suhaib Mujahid [:suhaib]
Updated, 2 years ago

status-firefox112: affected → wontfix

Ryan VanderMeulen [:RyanVM]
Updated, 2 years ago

status-firefox113: affected → wontfix

status-firefox114: --- → fix-optional

Randell Jesup [:jesup] (needinfo me)
Updated, 1 year ago

Flags: needinfo?(emilio)

Flags: needinfo?(dholbert)

Mike Hommey [:glandium]
Comment 18, 1 year ago

Note that this could be an issue similar to bug 1831242, whereby some optimization breaks things because of blatant undefined behavior. LLVM 16 made things much worse, but we can't exclude earlier versions of LLVM breaking things in smaller ways.

Emilio Cobos Álvarez (:emilio)
Comment 19, 1 year ago

This shouldn't be related to bug 1831242; there are no FFI shenanigans involved here. We added a bunch of diagnostics for these hashmaps in the past (see bug 1406996) and got nowhere.

Flags: needinfo?(emilio)

Daniel Holbert [:dholbert]
Updated, 1 year ago

Flags: needinfo?(dholbert)


BugBot [:suhaib / :marco/ :calixte]
Comment 21, 1 year ago

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please see the BugBot documentation.

Keywords: topcrash
