Closed Bug 1679368 Opened 3 years ago Closed 3 years ago

Crash in [@ mozilla::detail::MutexImpl::~MutexImpl | nsChildView::~nsChildView]

Categories

(Core :: Graphics: WebRender, defect)

Unspecified
macOS
defect

Tracking

()

RESOLVED DUPLICATE of bug 1700848
Tracking Status
firefox-esr78 --- unaffected
firefox83 --- affected
firefox84 --- affected
firefox85 --- affected

People

(Reporter: aryx, Assigned: bradwerth)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Hit release builds for the first time with 83.0 (crash stats has 52 crashes stored for that version). Related to bug 1665411?

Crash report: https://crash-stats.mozilla.org/report/index/d590969d-36e5-4411-9437-446e80201126

MOZ_CRASH Reason: MOZ_CRASH(mozilla::detail::MutexImpl::~MutexImpl: pthread_mutex_destroy failed)

Top 10 frames of crashing thread:

0 libmozglue.dylib mozilla::detail::MutexImpl::~MutexImpl mozglue/misc/Mutex_posix.cpp:90
1 XUL nsChildView::~nsChildView widget/cocoa/nsChildView.mm:274
2 XUL nsChildView::~nsChildView widget/cocoa/nsChildView.mm:241
3 XUL nsBaseWidget::Release widget/nsBaseWidget.cpp:146
4 XUL mozilla::detail::RunnableMethodImpl<nsChildView*, void  xpcom/threads/nsThreadUtils.h:1215
5 XUL XUL@0xeca68f 
6 XUL mozilla::CancelableRunnable::Release xpcom/threads/nsThreadUtils.cpp:90
7 XUL XUL@0x3fce9f 
8 XUL mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:515
9 XUL NS_HasPendingEvents xpcom/threads/nsThreadUtils.cpp:491
Flags: needinfo?(gsvelto)

I already saw these kind of crashes in the past so it's likely that bug 1665411 just made it more visible. Unfortunately we don't record the return value for the error in the crash reason which makes it harder to debug this. Looking at Apple documentation pthread_mutex_destroy() can only fail with EINVALID - if you pass it something that's not a mutex - or EBUSY - if the mutex you're trying to destroy is still held by another thread. My guess is that this is the latter, let me check the affected code.

Flags: needinfo?(gsvelto)

It looks like EBUSY is indeed the cause here. Have a look at this crash: https://crash-stats.mozilla.org/report/index/d590969d-36e5-4411-9437-446e80201126

Thread 13 is in this call to RenderThread::UpdateAndRender(). Thread 22 is also reacting to some compositor IPC call. I noticed that nsChildView::PreRender() acquires the affected lock and is called by compositor code.

Markus is it possible that we're invoking nsChildView's destructor in between a call to nsChildView::PreRender() and nsChildView::PostRender()? That would explain why pthread_mutex_destroy() is failing.

Flags: needinfo?(mstange.moz)

Interesting, yes, that sounds like it's probably what's happening.

Initially I thought we could address this with more locking in the destructor. However, if ~nsChildView can run between PreRender and PostRender, that means that PostRender will be called on a dangling pointer. So we actually have a bigger problem here.

We need to make sure that the widget outlives any compositing that happens on it. And we also need to make sure that the widget's destructor runs on the main thread.

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #0)

Hit release builds for the first time with 83.0 (crash stats has 52 crashes stored for that version). Related to bug 1665411?

83 is the first version that has WebRender enabled by default on macOS. RenderThread::UpdateAndRender() is part of WebRender.

Blocks: wr-mac
Flags: needinfo?(mstange.moz)

Moving over to WebRender to prioritize appropriately.

Component: Widget: Cocoa → Graphics: WebRender
Severity: -- → S3

One option would be to make a new threadsafe-refcounted object that both the InProcessCompositorWidget and nsChildView hold on to, and move the lock and any other needed data into that object. That object would not have a pointer back to the nsChildView.

Or maybe InProcessCompositorWidget should be that object. So then the change would be to make nsChildView supply InProcessCompositorWidget with all the needed data (widget size etc.) and notify it whenever that data changes, so that InProcessCompositorWidget no longer needs a pointer to nsChildView.
And then get that change to build on all platforms.

Blocks: 1674142

(In reply to Markus Stange [:mstange] from comment #8)

Or maybe InProcessCompositorWidget should be that object. So then the change would be to make nsChildView supply InProcessCompositorWidget with all the needed data (widget size etc.) and notify it whenever that data changes, so that InProcessCompositorWidget no longer needs a pointer to nsChildView.
And then get that change to build on all platforms.

I'll try to implement this.

Assignee: nobody → bwerth

(In reply to Brad Werth [:bradwerth] from comment #9)

(In reply to Markus Stange [:mstange] from comment #8)

Or maybe InProcessCompositorWidget should be that object. So then the change would be to make nsChildView supply InProcessCompositorWidget with all the needed data (widget size etc.) and notify it whenever that data changes, so that InProcessCompositorWidget no longer needs a pointer to nsChildView.
And then get that change to build on all platforms.

I'll try to implement this.

crashes died out at same time/rate as bug 1674142. So we're done here?

Flags: needinfo?(bwerth)

(In reply to Wayne Mery (:wsmwk) from comment #10)

crashes died out at same time/rate as bug 1674142. So we're done here?

Yes, thanks for following up. Bug 1700848 was the one that resolved this.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwerth)
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.