Closed Bug 1488886 Opened 5 years ago Closed 2 months ago

Crash in mozilla::layers::APZCTreeManagerParent::RecvUpdateZoomConstraints

Categories

(Core :: Panning and Zooming, defect, P3)

defect

Tracking

()

RESOLVED FIXED
110 Branch
Tracking Status
thunderbird_esr102 --- fixed
firefox-esr60 --- wontfix
firefox-esr102 110+ fixed
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox108 --- wontfix
firefox109 --- wontfix
firefox110 --- fixed

People

(Reporter: marcia, Assigned: dlrobertson)

References

Details

(Keywords: crash, regression, Whiteboard: [tbird crash])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is
report bp-7314e0a6-2bd8-4eb4-bab0-ad0190180904.
=============================================================

Seen while looking at nightly crash stats: https://bit.ly/2MTAkqk. Cross Platform crash which is more visible in 63 - and a few crashes in Linux in the latest 64 build. Around 22 crashes in the last 7 days (63/64).

No comments and - the 2 URLs that come up are a Slack message and moz-extension://65a29e7a-8dda-4a4c-a11e-2d4dd86e7d79/data/slider/slider.html 

Top 10 frames of crashing thread:

0 xul.dll mozilla::layers::APZCTreeManagerParent::RecvUpdateZoomConstraints gfx/layers/ipc/APZCTreeManagerParent.cpp:128
1 xul.dll mozilla::layers::PAPZCTreeManagerParent::OnMessageReceived ipc/ipdl/PAPZCTreeManagerParent.cpp:326
2 xul.dll void mozilla::ipc::MessageChannel::DispatchMessageW ipc/glue/MessageChannel.cpp:2166
3 xul.dll mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:2045
4 xul.dll bool MessageLoop::DeferOrRunPendingTask ipc/chromium/src/base/message_loop.cc:459
5 xul.dll MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:534
6 xul.dll base::MessagePumpForUI::DoRunLoop ipc/chromium/src/base/message_pump_win.cc:210
7 xul.dll base::MessagePumpWin::Run ipc/chromium/src/base/message_pump_win.h:80
8 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:318
9 xul.dll base::Thread::ThreadMain ipc/chromium/src/base/thread.cc:181

=============================================================
Component: Graphics → Panning and Zooming
Flags: needinfo?(botond)
Priority: -- → P3
The crash reports suggest that mTreeManager is null in APZCTreeManagerParent::RecvUpdateZoomConstraints().

Based on code inspection, some possible ways this could occur are:

  1) APZCTreeManagerParent was given a null APZCTreeManager in
     the constructor or in ChildAdopted().

     This would cause any subsequent RecvXXX() call on the
     APZCTreeManagerParent to crash, so RecvUpdateZoomConstraints()
     would have to be the first such call, which is possible
     given that this is a startup crash.

  2) The APZCTreeManagerParent object itself has been destroyed 
     and its fields overwritten (as far as I can tell, the 
     destructor itself doesn't clear mTreeManager) at the time of 
     the RecvUpdateZoomConstraints() call (i.e., a use-after-free).

Neither of these should happen, but I haven't been able to rule out either by inspection. #1 seems more likely than #2 (in particular, if it were #2, I think we'd see some crashes at addresses other than 0x0).

We could be defensive and add a check for a null mApzcTreeManager in release builds (we already assert in the places where the field is set in debug builds), but we could be covering up a deeper problem.

Ryan, are you aware of any IPC-related changes in the past couple of months that may have caused a spike in this crash?
Flags: needinfo?(botond) → needinfo?(rhunt)
Unfortunately I can't think of any general changes. I also don't think specifically the PAPZCTreeManager code has been touched in a while.
Flags: needinfo?(rhunt)
17 crashes/20 installs across all platforms in the last 7 days. No real useful comments or URLs. One URL appears to be a slack message and the other is a link to amazon music album.
Marking fix-optional for 64. We could still take a patch for 65, and if it's verified and doesn't seem risky, could still take fixes for 64 as well.
I started Nightly (on Linux) this morning, checked for updates, an update was found and downloaded, I closed the browser and started it again. It crashed after applying the update when the UI was starting to show up, the crash ID is 462236b9-5f74-4fd6-8fba-c57740181223. My comment in the crash report might say something different about when it crashed, just ignore that. I looked into the crash report and found this bug as related.

Note that I have "privacy.resistFingerprinting=true", so that the window is resized during startup. Start page is just about:blank. If you have further questions (e.g. regarding my setup), I would be happy to answer them.
Marking fix-optional for 65 and 66 so that these already triaged issues don't show up repeatedly in weekly regression triage. Happy to take a patch in nightly.

It looks like Florian figured out a way to reproduce this crash reliably. See bug 1742797 comment 85.

See Also: → 1779309, 1742797

Those steps to reproduce worked, but as soon as another browser chrome test was added (less than a day after that bug landed) they no longer reproduced. The new test must have changed the chunking or order or something about the tests. Still possible to debug by pushing to try on a month+ old tip though.

Severity: critical → S2

FWIW the release of Thunderbird 102 seems to have resulted in an increase in the frequency of this crash.

Did some debugging and found that we're getting a null mApzcTreeManager in the call to CompositorBridgeParent::AllocateAPZCTreeManagerParent in ContentCompositorBridgeParent::AllocPAPZCTreeManagerParent. I haven't verified this, but I'm guessing this occurs when CompositorBridgeParent::StopAndClearResources sets the mApzcTreeManager to null here. Perhaps we should enter this block if mPaused is set or mCanSend has been set by CompositorBridgeParent::ActorDestroy or CompositorBridgeParent::RecvWillClose.

Assignee: nobody → drobertson

(In reply to Dan Robertson (:dlrobertson) from comment #11)

Did some debugging and found that we're getting a null mApzcTreeManager in the call to CompositorBridgeParent::AllocateAPZCTreeManagerParent in ContentCompositorBridgeParent::AllocPAPZCTreeManagerParent. I haven't verified this, but I'm guessing this occurs when CompositorBridgeParent::StopAndClearResources sets the mApzcTreeManager to null here. Perhaps we should enter this block if mPaused is set or mCanSend has been set by CompositorBridgeParent::ActorDestroy or CompositorBridgeParent::RecvWillClose.

This theory appears to be incorrect, as I'd expect the assert to be different in this try run

Fix static-analysis warnings in ContentCompositorBridgeParent.

See Also: → 1805957
Pushed by drobertson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7fc8e79f2ff0
Fix static-analysis warnings. r=botond
https://hg.mozilla.org/integration/autoland/rev/8cdb673361ab
AllocPAPZCTreeManager should gracefully fail if APZ is not enabled. r=botond

Backed out for causing build bustages on ContentCompositorBridgeParent.cpp

Flags: needinfo?(drobertson)

I guess we need to do this sort of thing.

(In reply to Botond Ballo [:botond] from comment #17)

I guess we need to do this sort of thing.

That would work, but I may just use a standard iterator and use it.second. Is there another reason to unwrap the iterator like that?

Flags: needinfo?(drobertson)

(In reply to Dan Robertson (:dlrobertson) from comment #18)

(In reply to Botond Ballo [:botond] from comment #17)

I guess we need to do this sort of thing.

That would work, but I may just use a standard iterator and use it.second. Is there another reason to unwrap the iterator like that?

That works too. (One tiny nit, I would call it something like entry, since an "iterator" is something that dereferences to an element of the range being iterated, while the range-for gives you the result of dereferencing a (hidden) iterator.)

Pushed by drobertson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c220d03cfb7f
Fix static-analysis warnings. r=botond
https://hg.mozilla.org/integration/autoland/rev/00fb6658b130
AllocPAPZCTreeManager should gracefully fail if APZ is not enabled. r=botond
Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 110 Branch

Not sure this is really worth backporting to Beta since Firefox doesn't appear to be crashing at all and TB is only crashing on 102, but I'm open to a nomination if y'all think it'd be worth it. It does graft cleanly to Beta & ESR.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #22)

Not sure this is really worth backporting to Beta since Firefox doesn't appear to be crashing at all and TB is only crashing on 102, but I'm open to a nomination if y'all think it'd be worth it. It does graft cleanly to Beta & ESR.

Thunderbird is based on ESR right? If so it might be worthwhile to backport to ESR, since the next ESR that would contain this isn't for some time. I would like to let this sit in nightly for a bit before a potential backport to ESR. IMO this is a low-risk patch, but I'd still like to have it tested in nightly a bit first.

Yeah, let's revisit next cycle.

Regressions: 1807221

(In reply to Jonathan Watt [:jwatt] from comment #10)

FWIW the release of Thunderbird 102 seems to have resulted in an increase in the frequency of this crash.

Yes. Thanks for catching that.

And an interesting, odd peak at 102.3.x, then dropped off weeks later. https://crash-stats.mozilla.org/signature/?signature=mozilla%3A%3Alayers%3A%3AAPZCTreeManagerParent%3A%3ARecvUpdateZoomConstraints&date=%3E%3D2022-06-30T15%3A14%3A00.000Z&date=%3C2022-12-31T15%3A14%3A00.000Z#graphs Which is curious, because were were still massively migrating users from version 91, even in November.

Whiteboard: [tbird crash]

(In reply to Ryan VanderMeulen [:RyanVM] from comment #24)

Yeah, let's revisit next cycle.

The patch didn't seem to cause any regressions in nightly, so it might be worth a uplift to esr to check if it fixes the issue for thunderbird.

Frequency does seem lower for TB these days, but feel free to nominate.

Flags: needinfo?(drobertson)

Also disproportionately affects linux users of Thunderbird.

Comment on attachment 9308274 [details]
Bug 1488886 - AllocPAPZCTreeManager should gracefully fail if APZ is not enabled. r=botond

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Thunderbird is based on ESR and this bug largely impacts thunderbird. The next ESR release that would include this doesn't occur for quite some time.
  • User impact if declined: Crashes may occur from time to time on thunderbird.
  • Fix Landed on Version: 110
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Added a check to AllocPAPZCTreeManager to gracefully fail if APZ is not enabled.
Flags: needinfo?(drobertson)
Attachment #9308274 - Flags: approval-mozilla-esr102?
Attachment #9308273 - Flags: approval-mozilla-esr102?

Comment on attachment 9308273 [details]
Bug 1488886 - Fix static-analysis warnings. r=botond

Approved for 102.8esr.

Attachment #9308273 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+
Attachment #9308274 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+
You need to log in before you can comment on or make changes to this bug.