Open Bug 1731142 Opened 3 years ago Updated 2 months ago

Crash in [@ style::stylist::CascadeData::add_rule_list<T>]

Categories

(Core :: CSS Parsing and Computation, defect)

Unspecified
Windows
defect

Tracking

()

Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox92 --- unaffected
firefox93 --- wontfix
firefox94 --- wontfix
firefox95 --- wontfix

People

(Reporter: aryx, Assigned: emilio)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

8 crashes from 7 different installations, all Windows

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/fb9f038d-8903-4e13-9e70-6cd180210916

MOZ_CRASH Reason: Locked::read_with called with a guard from an unrelated SharedRwLock

Top 10 frames of crashing thread:

0 xul.dll RustMozCrash mozglue/static/rust/wrappers.cpp:17
1 xul.dll mozglue_static::panic_hook mozglue/static/rust/lib.rs:91
2 xul.dll core::ops::function::Fn::call<fn ../a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/ops/function.rs:227
3 xul.dll std::panicking::rust_panic_with_hook ../a178d0322ce20e33eac124758e837cbd80a6f633//library/std/src/panicking.rs:626
4 xul.dll std::panicking::begin_panic::{{closure}}<str> ../a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:542
5 xul.dll std::sys_common::backtrace::__rust_end_short_backtrace<closure-0, !> ../a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/sys_common/backtrace.rs:141
6 xul.dll std::panicking::begin_panic<str> ../a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:541
7 xul.dll style::stylist::CascadeData::add_rule_list<style::gecko::data::GeckoStyleSheet> servo/components/style/stylist.rs:2394
8 xul.dll style::stylist::CascadeData::add_stylesheet<style::gecko::data::GeckoStyleSheet> servo/components/style/stylist.rs:2453
9 xul.dll style::stylist::CascadeData::rebuild<style::gecko::data::GeckoStyleSheet> servo/components/style/stylist.rs:2042
Flags: needinfo?(emilio)

I think this probably just changed signatures with those changes, so shouldn't be a regression per se... But I'll add some more diagnostic information.

Flags: needinfo?(emilio)

Note that the crash reason is sanitized so we're not exposing anything
sensitive.

I think my patch just changed the signature of the stack, as it didn't
change anything related to guards or what not. But without knowing why
is failing or a repro it's hard to know what's going on.

Printing the address at list would give us some indication of what might
be going wrong (perhaps we're using a static lock when we don't expect
one or such?).

Assignee: nobody → emilio
Status: NEW → ASSIGNED
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d0661e7dd0c9
Print lock address on assert. r=firefox-style-system-reviewers,layout-reviewers,boris
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch

(It is only a diagnostic patch)

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Low volume crasher on beta and we are nearing the end of the beta cycle, fix-optional 93.

Status: REOPENED → NEW
Target Milestone: 94 Branch → ---

Anything useful in the Fx94 reports w/ diagnostic patch?

Flags: needinfo?(emilio)

Looking at some of the crashes I see messages like:

  • Locked::read_with called with a guard from an unrelated SharedRwLock: 0x4010 vs. 0x168e0680
  • Locked::read_with called with a guard from an unrelated SharedRwLock: 0x20 vs. 0x13f1ca8

That looks really weird, the locks are static and shouldn't change address and should be alive forever... I see other crashes that don't have crash reason and just are a very-near-null crash...

So nothing particularly actionable, I suspect some sort of corruption... I'll try to chkimg some of the reports...

Flags: needinfo?(emilio)
Has Regression Range: --- → yes

(In reply to Emilio Cobos Álvarez (:emilio) from comment #1)

I think this probably just changed signatures with those changes, so shouldn't be a regression per se...

--> Dropping 'regression' keyword and moving 'regressed by' bug to 'depends on' instead.

Has Regression Range: yes → ---
Depends on: 1728754
Keywords: regression
No longer regressed by: 1728754

Found another signature and indeed it looks like memory corruption. The lock address is either totally bogus (e.g. 329c7dfa-3136-4135-993f-a4c3e0220414 SharedRwLock: 0x100000011 vs. 0x10c960ad0) or obvious bit flips (e.g. b127a677-6123-4f60-bb29-3b4200220411 SharedRwLock: 0x79fc8738e0 vs. 0x79fc873ae0).

Crash Signature: [@ style::stylist::CascadeData::add_rule_list<T>] → [@ style::stylist::CascadeData::add_rule_list<T>] [@ style::stylist::CascadeData::add_rule_list]

(In reply to Gabriele Svelto [:gsvelto] from comment #10)

Found another signature and indeed it looks like memory corruption.

S2 doesn't seem appropriate then. Let's call this S4 I guess, given that there's presumably not much we can do aside from hypothetically shipping new memory modules to our users. :)

(gsvelto, do we have any other sort of resolution or end-state for bugs for crash signatures with ongoing crash volume that seem to be memory-corruption?)

Severity: S2 → S4
Flags: needinfo?(gsvelto)

Agreed, we can re-evaluate these bugs once we have bit-flip detection and we can finally tell most of the just bad ones apart.

Flags: needinfo?(gsvelto)

It looks like we've got folks reliably reproducing a crash with a backtrace that matches comment 0 (in TorBrowser on FreeBSD, I think?) over in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272445

See in particular the first attachment there, which is a backtrace that matches comment 0 exactly. I'll quote the relevant bits of the backtrace from that attachment (after the core::panicking::panic_fmt invocation):

#15 0x000036208ee0b752 in style::stylist::CascadeData::add_rule_list::h10605d5720993ba3 () at /usr/local/lib/tor-browser/libxul.so
#16 0x000036208ee0bac1 in style::stylist::CascadeData::add_stylesheet::hd116a1b1152580f6 () at /usr/local/lib/tor-browser/libxul.so
#17 0x000036208ee0c13c in style::stylist::CascadeData::rebuild::hec212d139969252d () at /usr/local/lib/tor-browser/libxul.so
#18 0x000036208ee5c643 in Servo_StyleSet_FlushStyleSheets () at /usr/local/lib/tor-browser/libxul.so
#19 0x000036208ca731b2 in mozilla::ServoStyleSet::UpdateStylist() () at /usr/local/lib/tor-browser/libxul.so
#20 0x000036208cad318e in mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush) () at /usr/local/lib/tor-browser/libxul.so
#21 0x000036208caaa4cb in nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsRefreshDriver::IsExtraTick) () at /usr/local/lib/tor-browser/libxul.so
#22 0x000036208cab04b9 in mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&) ()

ni=emilio to be sure he sees comment 13. Seems like we might have some kinda-reliable STR (in TorBrowser on FreeBSD) over on that above-linked bug; though it probably requires a bit of setup to get into a state where we can poke. Still: that might be our best clue yet about how to proceed here.

(For the record, it does look like we continue to see crash volume in recent versions here. If I add "major version" on the aggregations page for the add_rule_list signature -- no <T> -- I see we've had 37 crashes for v117 (current release), and 1 crash for v118 (currently on beta, though the one crash was from when 118 was nightly). There are zero crashes for 119 Nightly, but that's not too surprising given the lower nightly vs. release population. Not sure if there was anything in the current 119 Nightly cycle that would've helped here; I assume probably-not.)

Flags: needinfo?(emilio)

Thank you,

(In reply to Daniel Holbert [:dholbert] from comment #13)

… (in TorBrowser on FreeBSD, I think?) …

Correct. Orientation: https://www.freshports.org/www/tor-browser/ below the description, alongside the home icon there are four icons for mirrors of the part of the FreeBSD 'ports' tree that relates to this port.

At https://github.com/freebsd/freebsd-ports/tree/main/www/tor-browser (in the GitHub mirror) the files subdirectory contains patches, and so on.

I'll happily do what I can to debug, although I'm not a developer.

I can reproduce various crashes if I run their tor-browser package on a FreeBSD VM.

Questions are:

  • Are there debug symbols for these packages?
  • It seems there are fixes to build with rust 1.70, but tor browser is based on ESR. There were a bunch of build fixes when we updated to rust 1.70 that fixed various memory issues with that compiler, see dependencies of bug 1821228. But those don't seem to be on the Tor repo. If it's built with rust 1.70 I suspect it's one of the reasons for this. Would there be a chance to use the same rust compiler version as our official builds?
  • Does this reproduce on Firefox ESR package? I couldn't repro at a glance. Is there any significant difference on how the firefox-esr port and the tor-browser port are built that could explain this?
Flags: needinfo?(emilio)
  • Are there debug symbols for these packages?

Re: https://www.freshports.org/www/tor-browser/#config, FreeBSD-provided packages for this port are built with DEBUG off by default.

I built with DEBUG on, the resulting package is unexpectedly DEBUG off, I'll seek an explanation in a FreeBSD area.

See Also: → 1880581
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: