Crash in [@ shutdownhang | PlatformThread::Join]
Categories
(Core :: Graphics, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox107 | --- | wontfix |
firefox108 | + | fixed |
firefox109 | --- | fixed |
People
(Reporter: aryx, Assigned: bradwerth)
References
(Regression)
Details
(Keywords: crash, regression, topcrash)
Crash Data
Attachments
(1 file, 1 obsolete file)
48 bytes,
text/x-phabricator-request
|
diannaS
:
approval-mozilla-beta+
|
Details | Review |
87 of 268 crash reports with this signature for the last week have mozilla::widget::WinWindowOcclusionTracker::ShutDown()` in their stack. The crash signature is new in Firefox 106 (no indication for it being a signature change due to crash-stats changes).
Brad, because you worked on bug 1798651 could you also investigate this one?
Jens, could you check if anything needs to be done for the crashes with this signature in general?
Crash report: https://crash-stats.mozilla.org/report/index/996b4243-4841-4b64-b585-bc0210221121
MOZ_CRASH Reason: Shutdown hanging at step XPCOMShutdown. Something is blocking the main-thread.
Top 10 frames of crashing thread:
0 ntdll.dll ZwWaitForSingleObject
1 KERNELBASE.dll WaitForSingleObjectEx
2 xul.dll PlatformThread::Join ipc/chromium/src/base/platform_thread_win.cc:76
2 xul.dll base::Thread::Stop ipc/chromium/src/base/thread.cc:124
3 xul.dll mozilla::widget::WinWindowOcclusionTracker::ShutDown widget/windows/WinWindowOcclusionTracker.cpp:388
4 xul.dll gfxPlatform::ShutdownLayersIPC gfx/thebes/gfxPlatform.cpp:1365
5 xul.dll mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:589
6 xul.dll ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:2066
7 xul.dll mozilla::DefaultDelete<ScopedXPCOMStartup>::operator const mfbt/UniquePtr.h:459
7 xul.dll mozilla::UniquePtr<ScopedXPCOMStartup, mozilla::DefaultDelete<ScopedXPCOMStartup> >::reset mfbt/UniquePtr.h:301
Comment 1•2 years ago
|
||
Brad, looking at https://searchfox.org/mozilla-central/rev/650c19c96529eb28d081062c1ca274bc50ef3635/nsprpub/pr/src/pthreads/ptsynch.c#268 I wonder if we actually ever get a non NS_OK if we time out?
Comment 2•2 years ago
|
||
But the reports I looked at show WinWindowOcclusionCalc
thread just waiting for events?
Comment 3•2 years ago
•
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #0)
Jens, could you check if anything needs to be done for the crashes with this signature in general?
It would be nice if we could bucket them by the first stack frame called by ShutdownXPCOM
. For example here it would be nsComponentManagerImpl::Shutdown()
(ok, we would not know which component, but still), here it'd be gfxPlatform::ShutdownLayersIPC()
(would contain the WinWindowOcclusionTracker case) and so forth.
Hmm, or we could bucket by the frames that actually call base::Thread::Stop()
? That would give us probably the most details and a direct pointer to the thread in question. In the first example that would be D3DVsyncSource::Shutdown()
.
Comment 4•2 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 20 desktop browser crashes on beta
:bhood, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 5•2 years ago
|
||
BTW, what puzzles me is that we actually see any shutdown hang at step CCPostLastCycleCollection
- with default setting 1 for toolkit.shutdown.fastShutdownStage
we should never get past this line in ShutdownXPCOM()
. Do people change this setting such often?
![]() |
Reporter | |
Comment 6•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #3)
It would be nice if we could bucket them by the first stack frame called by
ShutdownXPCOM
. For example here it would bensComponentManagerImpl::Shutdown()
(ok, we would not know which component, but still), here it'd begfxPlatform::ShutdownLayersIPC()
(would contain the WinWindowOcclusionTracker case) and so forth.
Will, is this feasible?
Comment 7•2 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #6)
(In reply to Jens Stutte [:jstutte] from comment #3)
It would be nice if we could bucket them by the first stack frame called by
ShutdownXPCOM
. For example here it would bensComponentManagerImpl::Shutdown()
(ok, we would not know which component, but still), here it'd begfxPlatform::ShutdownLayersIPC()
(would contain the WinWindowOcclusionTracker case) and so forth.Will, is this feasible?
Signature generation can't do this currently--it iterates over the frames and doesn't have the ability to look ahead.
We could add more frames to the prefix list. For 712687a3-aaf0-4130-9638-474420221122, the frames are:
- PlatformThread::Join
- base::Thread::Stop
- D3DVsyncSource::Shutdown
- gfxPlatform::Shutdown
- nsLayoutModuleDtor
- nsComponentManagerImpl::Shutdown
- mozilla::ShutdownXPCOM
So we could add PlatformThread::Join
, base::Thread::Stop
, D3DVsyncSource::Shutdown
, gfxPlatform::Shutdown
, and nsLayoutModuleDtor
to the prefix list and that would get:
app@socorro:/app$ socorro-cmd signature 712687a3-aaf0-4130-9638-474420221122
Crash id: 712687a3-aaf0-4130-9638-474420221122
Original: shutdownhang | PlatformThread::Join
New: shutdownhang | PlatformThread::Join | base::Thread::Stop | D3DVsyncSource::Shutdown | gfxPlatform::Shutdown | nsLayoutModuleDtor | nsComponentManagerImpl::Shutdown
Same?: False
Does that work here?
If not, we should write up a bug for this in Socorro :: Signature and Gabriele and I can scheme. Maybe we turn shutdownhang into more of a special case with a different signature generation algorithm?
Comment 8•2 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #7)
Does that work here?
If not, we should write up a bug for this in Socorro :: Signature and Gabriele and I can scheme. Maybe we turn shutdownhang into more of a special case with a different signature generation algorithm?
I think we can just prefix up to base::Thread::Stop
and keep the remaining detail. Actually as I think of it it seems to be the better approach, pointing directly to the thread in question. Thanks!
Assignee | ||
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 9•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #1)
Brad, looking at https://searchfox.org/mozilla-central/rev/650c19c96529eb28d081062c1ca274bc50ef3635/nsprpub/pr/src/pthreads/ptsynch.c#268 I wonder if we actually ever get a non NS_OK if we time out?
This is the root cause. ReentrantMonitor
calls down to pt_TimedWait
, which elides the return value. Unfortunately, this means the changes in D161195 need to be undone since the new timed wait function will return NS_OK even in timeout. One potential fix is to build an alternative to SynchronousTask
which instead uses Monitor
. I'll pursue that.
Assignee | ||
Comment 10•2 years ago
|
||
ReentrantMonitor (used by SynchronousTask) calls through to pt_TimedWait
which elides timeout information. This patch creates a workalike
NonReentrantTimedSynchronousTask which instead uses Monitor. It also
changes the Windows thread Shutdown methods to use this new task.
Updated•2 years ago
|
Assignee | ||
Comment 11•2 years ago
|
||
Updated•2 years ago
|
Comment 12•2 years ago
|
||
Updated•2 years ago
|
![]() |
||
Comment 13•2 years ago
|
||
Brad, is this something we might uplift to 108? If not, please update the tracking flag to wontfix. Thanks!
Updated•2 years ago
|
![]() |
||
Updated•2 years ago
|
Assignee | ||
Comment 14•2 years ago
|
||
Comment on attachment 9304956 [details]
Bug 1801819: Make SynchronousTask double-check its completion flag to identify timeouts.
Beta/Release Uplift Approval Request
- User impact if declined: In unusual conditions, Windows clients may crash on shutdown.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce: Unknown conditions to reproduce.
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This enables an already-landed less-risky path for thread shutdown that may leak memory but definitely won't crash.
- String changes made/needed:
- Is Android affected?: No
Assignee | ||
Updated•2 years ago
|
Comment 15•2 years ago
|
||
bugherder |
Comment 16•2 years ago
|
||
Comment on attachment 9304956 [details]
Bug 1801819: Make SynchronousTask double-check its completion flag to identify timeouts.
Approved for 108.0b9
Comment 17•2 years ago
|
||
bugherder uplift |
Description
•