Closed Bug 1644254 Opened 7 months ago Closed 7 months ago

Crash in [@ RtlAcquireSRWLockExclusive | mozilla::detail::MutexImpl::lock | mozilla::layers::ImageBridgeParent::ShutdownInternal]

Categories

(Core :: Graphics, defect)

78 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla79
Tracking Status
firefox-esr68 --- unaffected
firefox77 --- unaffected
firefox78 + fixed
firefox79 --- fixed

People

(Reporter: philipp, Assigned: jya)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-ac6b49eb-22fe-47c2-99e2-022eb0200608.

Top 10 frames of crashing thread:

0 ntdll.dll RtlAcquireSRWLockExclusive 
1 mozglue.dll mozilla::detail::MutexImpl::lock mozglue/misc/Mutex_windows.cpp:22
2 xul.dll static mozilla::layers::ImageBridgeParent::ShutdownInternal gfx/layers/ipc/ImageBridgeParent.cpp:119
3 xul.dll mozilla::detail::RunnableFunction<`lambda at /builds/worker/checkouts/gecko/gfx/layers/ipc/ImageBridgeParent.cpp:137:7'>::Run xpcom/threads/nsThreadUtils.h:574
4 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1211
5 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:332
6 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
7 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
8 xul.dll static nsThread::ThreadFunc xpcom/threads/nsThread.cpp:444
9 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:399

these cross-platform crashes are regressing in firefox 78. the first affected nightly was build 20200521093657.

Are we trying to take a lock on shutdown after it's already been cleared?

Severity: -- → S3
Flags: needinfo?(kats)

This looks like the same thing as bug 1639131. I already did a bit of investigation there but didn't find any smoking gun. The timing doesn't quite line up but :jya touched some of this threading code in bug 1634253 last month, ni? to see if he has any thoughts on this or spots anything I missed.

Flags: needinfo?(kats) → needinfo?(jyavenard)
See Also: → 1639131

CompositorThreadHolder::Shutdown will only spin an event loop while sFinishedCompositorShutDown is false [1] which will set in the CompositorThreadHolder destructor. And right above we set sCompositorThreadHolder = nullptr; to null [2]
So if there's no other piece holding a reference to the CompositorHolder the destructor will be called pretty early, right after we've dispatched tasks to the CompositorThread that will call ImageBridgeParent::ShutdownInternal() [3]

So we continue the XPCOM shutdown immediately after, which will clear the StaticAutoPtr holding the sImageBridgesLock during the ClearOnShutdown call [4]

[1] https://searchfox.org/mozilla-central/rev/0e09b9191c02097034e46b193930f91c45b7885d/gfx/layers/ipc/CompositorThread.cpp#132
[2] https://searchfox.org/mozilla-central/rev/0e09b9191c02097034e46b193930f91c45b7885d/gfx/layers/ipc/CompositorThread.cpp#127
[3] https://searchfox.org/mozilla-central/rev/0e09b9191c02097034e46b193930f91c45b7885d/gfx/layers/ipc/ImageBridgeParent.cpp#134-138
[4] https://searchfox.org/mozilla-central/rev/0e09b9191c02097034e46b193930f91c45b7885d/gfx/layers/ipc/ImageBridgeParent.cpp#60

Kats, maybe you missed that the blocking of the main thread is only happening while the CompositoThreadHolder is alive; and there could be no other reference than the one in the sCompositorThreadHolder refptr.

The obvious solution would be to have the CompositorThreadHolder StaticRefPtr only be cleared once all tasks on the CompositorThread have been run

Flags: needinfo?(jyavenard)
Assignee: nobody → jyavenard
Duplicate of this bug: 1639131

Ah, interesting. For some reason I assumed that after CompositorThreadHolder::Shutdown returned, the CompositorThreadHolder and in particular mCompositorThread would both be destroyed and so the compositor thread itself would have been terminated, and scheduled tasks wouldn't even run any more. But I guess that's not the case, and the ShutdownInternal call scheduled still gets to run later even though there's no reference to the nsIThread anymore?

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)

But I guess that's not the case, and the ShutdownInternal call scheduled still gets to run later even though there's no reference to the nsIThread anymore?

To answer my own question: looks like other pieces of code might be holding on to the underlying nsIThread, which they can by using static "get current thread" accessors in various places.

Crash Signature: mozilla::detail::MutexImpl::lock | mozilla::layers::ImageBridgeParent::ShutdownInternal] → mozilla::detail::MutexImpl::lock | mozilla::layers::ImageBridgeParent::ShutdownInternal] [@ mozilla::BlockingResourceBase::CheckAcquire()]

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)

Ah, interesting. For some reason I assumed that after CompositorThreadHolder::Shutdown returned, the CompositorThreadHolder and in particular mCompositorThread would both be destroyed and so the compositor thread itself would have been terminated, and scheduled tasks wouldn't even run any more. But I guess that's not the case, and the ShutdownInternal call scheduled still gets to run later even though there's no reference to the nsIThread anymore?

A task dispatched on a thread keeps a reference to that thread.
So there's a reference to the nsIThread until the last task has run.

What makes things more complicated here; is that some code keep a reference to the CompositorThreadHolder; and some others on the actual thread managed by the CompositorThreadHolder.

Imho, the CompositorThreadHolder shouldn't be refcounted. Particularly because the singleton that is used by static methods gets cleared in shutdown.
And so code around that (particularly the APZ and VR code) try to get around that by keeping a reference when they shouldn't.

Accessing the actual thread after CompositorThread::Shutdown has been called is one thing, attempting to keep CompositorThreadHolder alive even after it's been shutdown is unreasonable. It makes the code hard to follow and bugs are easily introduced here.

Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b21ea1dcb21c
Keep a reference to the CompositorThreadHolder until gfx shutdown tasks have run. r=kats
See Also: → 1644883

Comment on attachment 9156493 [details]
Bug 1644254 - Keep a reference to the CompositorThreadHolder until gfx shutdown tasks have run. r?kats

Beta/Release Uplift Approval Request

  • User impact if declined: crash at shutdown
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: Check on the crash report
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): We make sure the compositor thread holder is kept alive until the shutdown tasks dispatched above have all been run.

At a glance, it seems that there's a risk of a hang, but the risk in the previous code was the same.

  • String changes made/needed: none
Attachment #9156493 - Flags: approval-mozilla-beta?
Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79

Comment on attachment 9156493 [details]
Bug 1644254 - Keep a reference to the CompositorThreadHolder until gfx shutdown tasks have run. r?kats

approved for 78.0b8

Attachment #9156493 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Crash Signature: mozilla::detail::MutexImpl::lock | mozilla::layers::ImageBridgeParent::ShutdownInternal] [@ mozilla::BlockingResourceBase::CheckAcquire()] → mozilla::detail::MutexImpl::lock | mozilla::layers::ImageBridgeParent::ShutdownInternal] [@ mozilla::BlockingResourceBase::CheckAcquire()]
You need to log in before you can comment on or make changes to this bug.