Intermittent usercss/usercss-moz-document.html | application terminated with exit code 11 and "mozilla::detail::MutexImpl::~MutexImpl: pthread_mutex_destroy failed: Device or resource busy"

RESOLVED FIXED in Firefox 60

Status

()

defect
P5
normal
RESOLVED FIXED
Last year
Last year

People

(Reporter: intermittent-bug-filer, Assigned: sotaro)

Tracking

({intermittent-failure})

unspecified
mozilla60
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox60 fixed)

Details

Attachments

(2 attachments, 2 obsolete attachments)

The log's first symptom of an issue here is:
 "mozilla::detail::MutexImpl::~MutexImpl: pthread_mutex_destroy failed: Device or resource busy"
Summary: Intermittent usercss/usercss-moz-document.html | application terminated with exit code 11 → Intermittent usercss/usercss-moz-document.html | application terminated with exit code 11 and "mozilla::detail::MutexImpl::~MutexImpl: pthread_mutex_destroy failed: Device or resource busy"
Depends on: 1442730
Assignee: nobody → sotaro.ikeda.g
Attachment #8956327 - Attachment is obsolete: true
Attachment #8956332 - Attachment description: patch - Use lock for accessing mCompositorVsyncDispatcher → patch - Protect mCompositorVsyncDispatcher by lock
Attachment #8956354 - Flags: review?(bugmail)
Comment on attachment 8956354 [details] [diff] [review]
patch - Protect mCompositorVsyncDispatcher by lock

Review of attachment 8956354 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see how this will help. The CompositorVsyncDispatcher implementation uses threadsafe refcounting [1] and the functions you're adding the lock to are all using proper RefPtr handling, as far as I can tell, so we should never end up with dangling pointers. In other words, I think that no matter when nsBaseWidget::GetCompositorVsyncDispatcher() is called, it should either return a strong reference to a valid CompositorVsyncDispatcher, or it should return a null pointer.

[1] https://searchfox.org/mozilla-central/rev/bffd3e0225b65943364be721881470590b9377c1/widget/VsyncDispatcher.h#48
Attachment #8956354 - Flags: review?(bugmail) → review-
attachment 8956354 [details] [diff] [review] should help to address the problem. attachment 8956354 [details] [diff] [review] protects for accessing mCompositorVsyncDispatcher.

"Threadsafe refcounting" makes mRefCnt of refcounted object to be thread safe. It does not make RefPtr<CompositorVsyncDispatcher> to be thread safe. RefPtr::mRawPtr is not protected atomically.
Therefore if mCompositorVsyncDispatcher is update on main thread and its RefPtr is copied on Compositor Thread, mCompositorVsyncDispatcher needs to be protected by lock.
attachment 8956731 [details] [diff] [review] is a reduced test case by using gtest. By running the test, I could reproduced the similar crash. The gtest log was like the following.

--------------------------------------------------------

Running GTest tests...
Note: Google Test filter = RefPtr.*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from RefPtr
[ RUN      ] RefPtr.MultiThreadedTest
mozilla::detail::MutexImpl::lock: pthread_mutex_lock failed: Invalid argument
ExceptionHandler::GenerateDump cloned child 16556
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
With attachment 8956354 [details] [diff] [review], I did not saw the crash like  Comment 11.
I run attachment 8956731 [details] [diff] [review] on linux for testing with "./mach gtest RefPtr.*"
Component: Layout → Graphics
:kats, can you comment to Comment 8?
Flags: needinfo?(bugmail)
Aha, you're right. Thanks for making the gtest, that helped me understand the problem better!

Just to state my understanding: GetCompositorVsyncDispatcher() [1] runs dispatcher.assign_with_AddRef(mCompositorVsyncDispatcher.mRawPtr) as one of the steps in creating the RefPtr copy. And DestroyCompositor() [2] runs mCompositorVsyncDispatcher.assign_assuming_AddRef(nullptr) when it nulls out mCompositorVsyncDispatcher. If the first call gets interrupted before it actually executes any of the function, say at [3], and the second call runs at that point, it can release and free the object whose pointer was passed to the first call. So when the first call resumes it continues with what is now a bad pointer, on which it does an AddRef. But that doesn't change the fact that the object is already gone. So with this interleaving you can end up with a RefPtr that has a garbage mRawPtr in this scenario, and that would result in the mutex locking failures we've been seeing.

[1] https://searchfox.org/mozilla-central/rev/efce4ceddfbe5fe4e2d74f1aed93894bada9b273/widget/nsBaseWidget.cpp#1277
[2] https://searchfox.org/mozilla-central/rev/efce4ceddfbe5fe4e2d74f1aed93894bada9b273/widget/nsBaseWidget.cpp#267
[3] https://searchfox.org/mozilla-central/rev/efce4ceddfbe5fe4e2d74f1aed93894bada9b273/mfbt/RefPtr.h#53
Flags: needinfo?(bugmail)
Comment on attachment 8956354 [details] [diff] [review]
patch - Protect mCompositorVsyncDispatcher by lock

Review of attachment 8956354 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this patch should do the job then

::: widget/nsBaseWidget.h
@@ +7,5 @@
>  
>  #include "InputData.h"
>  #include "mozilla/EventForwards.h"
>  #include "mozilla/RefPtr.h"
> +#include "mozilla/Mutex.h"

Move this include up one line to keep it alphabetical
Attachment #8956354 - Flags: review- → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/76a8f710da41
Protect mCompositorVsyncDispatcher by lock. r=kats
^ Went ahead and landed it, with the change in comment 16 plus fixing a typo in the commit message.
Although... is there any particular reason you used a UniquePtr instead of just keeping the mutex as an instance variable on nsBaseWidget?
https://hg.mozilla.org/mozilla-central/rev/76a8f710da41
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #19)
> Although... is there any particular reason you used a UniquePtr instead of
> just keeping the mutex as an instance variable on nsBaseWidget?

It is for reducing Mutex instances.

nsBaseWidget needs the Mutex only when the nsBaseWidget owns CompositorSession. nsBaseWidget is ancestor of PuppetWidget and nsWindow. PuppetWidget does not own CompositorSession, then the Mutex is not necessary. nsWindow not always own CompositorSession. Only nsWindow created by kCChildCID has a change to own CompositorSession like the following.

  https://github.com/sotaroikeda/firefox-diagrams/blob/master/widget/widget_nsWindow_FirefoxOS_2_2.pdf
You need to log in before you can comment on or make changes to this bug.