Closed Bug 1672560 Opened 4 years ago Closed 4 years ago

Crash in [@ nsRefreshDriver::AddRefreshObserver]

Categories

(Core :: Layout, defect)

defect

Tracking

()

RESOLVED FIXED
84 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox81 --- unaffected
firefox82 --- unaffected
firefox83 --- unaffected
firefox84 + fixed

People

(Reporter: aryx, Assigned: bgrins)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

If the regressing change is identified, this could need a backout (22 installations affected so far, there is one from 2 days ago, something from earlier but haven't checked that).

Emilio, could this be from bug 1665476?

Push log: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5b5ffbe4add97c98003026e0d9a730302161b646&tochange=d8861d51b01e9489672f998648d67662a60a8b3a

Crash report: https://crash-stats.mozilla.org/report/index/2810b931-9f8d-4202-8b38-cca100201021

MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(mPresContext)

Top 10 frames of crashing thread:

0 xul.dll nsRefreshDriver::AddRefreshObserver layout/base/nsRefreshDriver.cpp:1260
1 xul.dll mozilla::dom::L10nMutations::StartRefreshObserver dom/l10n/L10nMutations.cpp:179
2 xul.dll mozilla::dom::L10nMutations::L10nElementChanged dom/l10n/L10nMutations.cpp:114
3 xul.dll static mozilla::dom::MutationObservers::NotifyAttributeChanged dom/base/MutationObservers.cpp:163
4 xul.dll mozilla::dom::Element::UnsetAttr dom/base/Element.cpp:2721
5 xul.dll mozilla::dom::Element::RemoveAttribute dom/base/Element.cpp:1369
6 xul.dll mozilla::dom::Element_Binding::removeAttribute dom/bindings/ElementBinding.cpp:1469
7 xul.dll mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions> dom/bindings/BindingUtils.cpp:3229
8 xul.dll js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:598
9 xul.dll Interpret js/src/vm/Interpreter.cpp:3336
Flags: needinfo?(emilio)

Not really, not related to bug 1665476 whatsoever. Zibi has anything localization-related changed recently which could explain this?

Flags: needinfo?(emilio) → needinfo?(gandalf)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #1)

Not really, not related to bug 1665476 whatsoever. Zibi has anything localization-related changed recently which could explain this?

bug 1660393 looks plausible?

Dan can you take a look?

Flags: needinfo?(dglastonbury)

:mhoye had steps to repro that are moderately reliable - enable fission, open customize mode, switch themes, and then quickly switch tabs / mouse around.

This crashes earlier builds for me, so I'm clearing needinfo for the l10n folks.

Earliest I've seen crash so far was https://hg.mozilla.org/integration/autoland/rev/40bc2dbc5201def6aebc34113cd9c422c82f85ff , which given https://bugzilla.mozilla.org/show_bug.cgi?id=1668875#c26 and the STR seems moderately plausible as a regressor?

Edit: slack thread for reference: https://mozilla.slack.com/archives/C4D3JFF26/p1603389242479700

Flags: needinfo?(gandalf)
Flags: needinfo?(dglastonbury)

I was wrong, Mike can crash the build from cb8c18a63c9028b0ea8181bcc4053b425a7562d0, which narrows the window a bit further.

Because I'm struggling to mozregression effectively, I passed that on to Mike, who can hopefully find us a culprit.

Flags: needinfo?(mhoye)
Fission Milestone: --- → ?

This crash affects Fission and non-Fission, so it's probably not a Fission bug. Perhaps Fission process timing makes the crash more likely?

No longer blocks: fission-dogfooding
Fission Milestone: ? → ---

Mike can reliably crash 38627c09131de26fac7d842bf3cc8572987c3129 but nothing before that, which given the STR is also plausible (and definitely well before 40bc2dbc5201def6aebc34113cd9c422c82f85ff ).

Brian, Emilio, I guess it's plausible that after bug 1671619, we could be destroying a prescontext at an inopportune time, or something? Maybe the callsite at https://hg.mozilla.org/mozilla-central/file/03de9a8a6f7c949b046b5a1197988391ede9e84f/dom/l10n/L10nMutations.cpp#l179 needs to be updated to do more checks before trying to add the refresh observer? Or maybe we're destroying a prescontext but should be destroying the refresh driver as well, but aren't, or something?

Flags: needinfo?(emilio)
Flags: needinfo?(bgrinstead)
Flags: needinfo?(mhoye)

I'm going to mark this as a regression. You can see similar looking asserts on older builds, very rarely, but Nightly went from 0 or 1 crashes per build to more than 20 crashes per build, so something got a lot worse.

Keywords: regression

Will have to wait for Emilio's input on Comment 7, but if it makes things easier I don't have any problem with Bug 1671619 getting backed out in the meantime.

Flags: needinfo?(bgrinstead)

Bug 1671619 got backed out from central, the currently building nightly will have the change reverted.

I think easiest fix for the crash is indeed just adding a mRefreshDriver->GetPresContext() check too... But we probably want to audit what exactly is the L10Mutations code trying to do, because setting hidden back and forth will potentially recreate the pres-context if the document is in the same process (which I suspect we should avoid, that was my concern with bug 167619 comment 12...). I bet if the page is in-process bug 1671619 causes all sorts of havoc (including potentially failing its own tests).

Flags: needinfo?(emilio)

Setting S2, though looks like the crash has stopped since the backout.

Severity: -- → S2

Confirmed fixed by backout.

Assignee: nobody → bgrinstead
Status: NEW → RESOLVED
Closed: 4 years ago
Regressed by: 1671619
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.