Filed by: aiakab [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=164551776&repo=mozilla-inbound https://queue.taskcluster.net/v1/task/JydRIJBjQSq7A5_vpZmZVA/runs/0/artifacts/public/logs/live_backing.log https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/JydRIJBjQSq7A5_vpZmZVA/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
The log's first symptom of an issue here is: "mozilla::detail::MutexImpl::~MutexImpl: pthread_mutex_destroy failed: Device or resource busy"
Summary: Intermittent usercss/usercss-moz-document.html | application terminated with exit code 11 → Intermittent usercss/usercss-moz-document.html | application terminated with exit code 11 and "mozilla::detail::MutexImpl::~MutexImpl: pthread_mutex_destroy failed: Device or resource busy"
Attachment #8956332 - Attachment description: patch - Use lock for accessing mCompositorVsyncDispatcher → patch - Protect mCompositorVsyncDispatcher by lock
Comment on attachment 8956354 [details] [diff] [review] patch - Protect mCompositorVsyncDispatcher by lock Review of attachment 8956354 [details] [diff] [review]: ----------------------------------------------------------------- I don't see how this will help. The CompositorVsyncDispatcher implementation uses threadsafe refcounting  and the functions you're adding the lock to are all using proper RefPtr handling, as far as I can tell, so we should never end up with dangling pointers. In other words, I think that no matter when nsBaseWidget::GetCompositorVsyncDispatcher() is called, it should either return a strong reference to a valid CompositorVsyncDispatcher, or it should return a null pointer.  https://searchfox.org/mozilla-central/rev/bffd3e0225b65943364be721881470590b9377c1/widget/VsyncDispatcher.h#48
Attachment #8956354 - Flags: review?(bugmail) → review-
attachment 8956354 [details] [diff] [review] should help to address the problem. attachment 8956354 [details] [diff] [review] protects for accessing mCompositorVsyncDispatcher. "Threadsafe refcounting" makes mRefCnt of refcounted object to be thread safe. It does not make RefPtr<CompositorVsyncDispatcher> to be thread safe. RefPtr::mRawPtr is not protected atomically.
Therefore if mCompositorVsyncDispatcher is update on main thread and its RefPtr is copied on Compositor Thread, mCompositorVsyncDispatcher needs to be protected by lock.
attachment 8956731 [details] [diff] [review] is a reduced test case by using gtest. By running the test, I could reproduced the similar crash. The gtest log was like the following. -------------------------------------------------------- Running GTest tests... Note: Google Test filter = RefPtr.* [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from RefPtr [ RUN ] RefPtr.MultiThreadedTest mozilla::detail::MutexImpl::lock: pthread_mutex_lock failed: Invalid argument ExceptionHandler::GenerateDump cloned child 16556 ExceptionHandler::SendContinueSignalToChild sent continue signal to child ExceptionHandler::WaitForContinueSignal waiting for continue signal...
I run attachment 8956731 [details] [diff] [review] on linux for testing with "./mach gtest RefPtr.*"
:kats, can you comment to Comment 8?
Aha, you're right. Thanks for making the gtest, that helped me understand the problem better! Just to state my understanding: GetCompositorVsyncDispatcher()  runs dispatcher.assign_with_AddRef(mCompositorVsyncDispatcher.mRawPtr) as one of the steps in creating the RefPtr copy. And DestroyCompositor()  runs mCompositorVsyncDispatcher.assign_assuming_AddRef(nullptr) when it nulls out mCompositorVsyncDispatcher. If the first call gets interrupted before it actually executes any of the function, say at , and the second call runs at that point, it can release and free the object whose pointer was passed to the first call. So when the first call resumes it continues with what is now a bad pointer, on which it does an AddRef. But that doesn't change the fact that the object is already gone. So with this interleaving you can end up with a RefPtr that has a garbage mRawPtr in this scenario, and that would result in the mutex locking failures we've been seeing.  https://searchfox.org/mozilla-central/rev/efce4ceddfbe5fe4e2d74f1aed93894bada9b273/widget/nsBaseWidget.cpp#1277  https://searchfox.org/mozilla-central/rev/efce4ceddfbe5fe4e2d74f1aed93894bada9b273/widget/nsBaseWidget.cpp#267  https://searchfox.org/mozilla-central/rev/efce4ceddfbe5fe4e2d74f1aed93894bada9b273/mfbt/RefPtr.h#53
Comment on attachment 8956354 [details] [diff] [review] patch - Protect mCompositorVsyncDispatcher by lock Review of attachment 8956354 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, this patch should do the job then ::: widget/nsBaseWidget.h @@ +7,5 @@ > > #include "InputData.h" > #include "mozilla/EventForwards.h" > #include "mozilla/RefPtr.h" > +#include "mozilla/Mutex.h" Move this include up one line to keep it alphabetical
Attachment #8956354 - Flags: review- → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/76a8f710da41 Protect mCompositorVsyncDispatcher by lock. r=kats
^ Went ahead and landed it, with the change in comment 16 plus fixing a typo in the commit message.
Although... is there any particular reason you used a UniquePtr instead of just keeping the mutex as an instance variable on nsBaseWidget?
(In reply to Kartikaya Gupta (email:firstname.lastname@example.org) from comment #19) > Although... is there any particular reason you used a UniquePtr instead of > just keeping the mutex as an instance variable on nsBaseWidget? It is for reducing Mutex instances. nsBaseWidget needs the Mutex only when the nsBaseWidget owns CompositorSession. nsBaseWidget is ancestor of PuppetWidget and nsWindow. PuppetWidget does not own CompositorSession, then the Mutex is not necessary. nsWindow not always own CompositorSession. Only nsWindow created by kCChildCID has a change to own CompositorSession like the following. https://github.com/sotaroikeda/firefox-diagrams/blob/master/widget/widget_nsWindow_FirefoxOS_2_2.pdf
Makes sense, thanks!
You need to log in before you can comment on or make changes to this bug.