replace a bunch of Chromium IPC locks with mozilla::Mutex
Categories
(Core :: IPC, defect, P3)
Tracking
()
People
(Reporter: jaas, Assigned: froydnj)
References
Details
(Keywords: leave-open)
Attachments
(7 files, 4 obsolete files)
9.74 KB,
patch
|
mccr8
:
review+
|
Details | Diff | Splinter Review |
3.49 KB,
patch
|
mccr8
:
review+
|
Details | Diff | Splinter Review |
2.91 KB,
patch
|
mccr8
:
review+
|
Details | Diff | Splinter Review |
3.30 KB,
patch
|
mccr8
:
review+
|
Details | Diff | Splinter Review |
2.44 KB,
patch
|
mccr8
:
review+
|
Details | Diff | Splinter Review |
6.19 KB,
patch
|
padenot
:
review+
|
Details | Diff | Splinter Review |
22.52 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
Better to be using our own lock types where we can, and this paves the way toward removing the Chromium lock classes. The remaining barrier to removing the Chromium lock classes is that some code depends on being able to get the pthread lock backing a higher-level Lock object. That can't be done with mozilla::Mutex right now.
Comment on attachment 8483714 [details] [diff] [review] fix v1.0 Review of attachment 8483714 [details] [diff] [review]: ----------------------------------------------------------------- Please check out the static locks in particular, e.g. "static mozilla::Mutex list_lock_;"
![]() |
Assignee | |
Comment 2•6 years ago
|
||
Comment on attachment 8483714 [details] [diff] [review] fix v1.0 Review of attachment 8483714 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/chromium/src/base/atomicops_internals_mutex.cc @@ +9,5 @@ > > namespace base { > namespace subtle { > > +mozilla::Mutex gAtomicsMutex("AtomicOps Global"); Bonus points if you nuke all this atomicops stuff in a followup bug. ::: ipc/chromium/src/base/histogram.h @@ +768,5 @@ > > static HistogramMap* histograms_; > > // lock protects access to the above map. > + static mozilla::Mutex* lock_; Static mutexes like this (and gAtomicsMutex) should really be mozilla::StaticMutex, with the corresponding StaticMutexAuto* RAII classes. But I think that can be a followup bug. ::: ipc/chromium/src/base/tracked_objects.cc @@ +78,5 @@ > > // static > ThreadData* ThreadData::first_ = NULL; > // static > +mozilla::Mutex ThreadData::list_lock_("ThreadData List"); StaticMutex here too, in a followup. ::: ipc/chromium/src/base/waitable_event_posix.cc @@ +143,5 @@ > bool WaitableEvent::TimedWait(const TimeDelta& max_time) { > const TimeTicks end_time(TimeTicks::Now() + max_time); > const bool finite_time = max_time.ToInternalValue() >= 0; > > + kernel_->lock_.Lock(); It's strange that this code doesn't use AutoLock a little more liberally. (For avoidance of doubt, I think changing that, if we do it at all, should be left for a followup patch.) ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp @@ +10,3 @@ > #include "vcm.h" > #include "CSFLog.h" > +#include "base/histogram.h" For my own edification, why does this header get moved in the #include order?
(In reply to Nathan Froyd (:froydnj) from comment #2) > Bonus points if you nuke all this atomicops stuff in a followup bug. I already have most of a patch written, just wanted to land this before finishing. > Static mutexes like this (and gAtomicsMutex) should really be > mozilla::StaticMutex, with the corresponding StaticMutexAuto* RAII classes. > But I think that can be a followup bug. If you don't mind I'd like to take care of this in a revision of this patch. > ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp > @@ +10,3 @@ > > #include "vcm.h" > > #include "CSFLog.h" > > +#include "base/histogram.h" > > For my own edification, why does this header get moved in the #include order? There is some include order dependency that involves our IPC code, not doing this causes a build failure with the rest of the patch.
![]() |
Assignee | |
Comment 4•6 years ago
|
||
(In reply to Josh Aas (Mozilla Corporation) from comment #3) > (In reply to Nathan Froyd (:froydnj) from comment #2) > > > Bonus points if you nuke all this atomicops stuff in a followup bug. > > I already have most of a patch written, just wanted to land this before > finishing. Great! > > Static mutexes like this (and gAtomicsMutex) should really be > > mozilla::StaticMutex, with the corresponding StaticMutexAuto* RAII classes. > > But I think that can be a followup bug. > > If you don't mind I'd like to take care of this in a revision of this patch. I think that would be fine. > > ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp > > @@ +10,3 @@ > > > #include "vcm.h" > > > #include "CSFLog.h" > > > +#include "base/histogram.h" > > > > For my own edification, why does this header get moved in the #include order? > > There is some include order dependency that involves our IPC code, not doing > this causes a build failure with the rest of the patch. Lovely.
This updates the patch to trunk, but it has issues. It causes crashes on OS X, and a compilation error on Windows. https://tbpl.mozilla.org/?tree=Try&rev=b781eea5416e I think some of the problematic code can just go away, I'll do that first and then re-visit this patch.
![]() |
||
Updated•4 years ago
|
![]() |
Assignee | |
Comment 6•2 years ago
|
||
I tried updating this patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3a01daf77c9a100c9b4a37bb5809a2fb2e17e98f and ran into some OS X leak checking problems. I also found out retriggering OS X tests from treeherder is broken, so we'd need to do a push with Mutex leak checks tracked, so we can figure out what is leaking the mutex. But Windows and Linux tests look really good.
Comment 7•2 years ago
|
||
The Chrome locks don't seem to be leak checked, so presumably we're just leaking them all of the time. There's no LSan for OSX, so it is possible that we'd just not be noticing these leaks at all.
![]() |
Assignee | |
Comment 8•2 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #7) > The Chrome locks don't seem to be leak checked, so presumably we're just > leaking them all of the time. There's no LSan for OSX, so it is possible > that we'd just not be noticing these leaks at all. Well, the leaks are not particularly problematic... Apparently the asan test runs that I did in comment 6 didn't include our leak checking, so retriggering on Linux also showed the leak. Some selective reversion said that it's the ipc_channel_posix changes that were causing leaks: https://treeherder.mozilla.org/#/jobs?repo=try&revision=284dd8bb8f73701a877a2b559187a4cd3f0e8994 which makes sense, because the PipeMap class in that file is statically allocated, which means that it's going to live until libxul is unloaded, thereby leaking the Mutex it owns. What puzzles me is that time_win.cc has essentially the same pattern, but we're not seeing leaks on Windows: https://hg.mozilla.org/try/rev/3a01daf77c9a100c9b4a37bb5809a2fb2e17e98f#l8.24 so either we never call that code, or the tests that I've run happen to not trigger the code. Either possibility is likely. The usual route to fix this would be a StaticMutex, but StaticMutex can't be a class member. So for the PipeMap case, we'd have to do something like: static PipeMap& instance() { static StaticMutex mutex; static PipeMap map(mutex); return map; } which is a little gross, but maybe that's OK?
![]() |
Assignee | |
Comment 9•2 years ago
|
||
And indeed, Windows is showing a leak: https://treeherder.mozilla.org/#/jobs?repo=try&revision=284dd8bb8f73701a877a2b559187a4cd3f0e8994&selectedJob=207776341 which, dollars to donuts, is probably the time_win.cc changes quoted in comment 8.
![]() |
Assignee | |
Comment 10•2 years ago
|
||
Other cases will require some special handling.
![]() |
Assignee | |
Updated•2 years ago
|
![]() |
Assignee | |
Comment 11•2 years ago
|
||
![]() |
Assignee | |
Comment 12•2 years ago
|
||
![]() |
Assignee | |
Comment 13•2 years ago
|
||
Chromium IPC locks have this and the API gets used, so we need to expose it ourselves if we're going to use our locks in place of the Chromium IPC locks. This patch changes the mozglue parts; tweaking the xpcom parts is the next patch.
![]() |
Assignee | |
Comment 14•2 years ago
|
||
This plumbs the code from PlatformMutex up to the xpcom level.
![]() |
Assignee | |
Comment 15•2 years ago
|
||
We are cargo-culting ipc/chromium/ code for this, and ReverbConvolver is essentially the only outside consumer of Chromium IPC locks at this point.
![]() |
Assignee | |
Comment 16•2 years ago
|
||
Some of the code here is more complicated than the straight substitutions from part 1, so it seemed prudent to separate it out into its own patch.
![]() |
Assignee | |
Comment 17•2 years ago
|
||
We can use XPCOM mutexes instead, which feature niceties like leak checking and deadlock detection.
Comment 18•2 years ago
|
||
Comment on attachment 9020177 [details] [diff] [review] part 1 - convert easy cases of chromium IPC locks to mozilla mutexes Review of attachment 9020177 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/chromium/src/base/histogram.cc @@ -26,5 @@ > #define DVLOG(x) CHROMIUM_LOG(ERROR) > #define CHECK_GT DCHECK_GT > #define CHECK_LT DCHECK_LT > -typedef ::Lock Lock; > -typedef ::AutoLock AutoLock; These are just unused? ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp @@ +10,5 @@ > #include <sstream> > #include <vector> > > #include "CSFLog.h" > +#include "base/histogram.h" What is this for?
Comment 19•2 years ago
|
||
Comment on attachment 9020178 [details] [diff] [review] part 2 - use StaticMutex for PipeMap to workaround the leak checker Review of attachment 9020178 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/chromium/src/chrome/common/ipc_channel_posix.cc @@ +154,5 @@ > > private: > + PipeMap(mozilla::StaticMutex& aMutex) > + : lock_(aMutex) > + {} Could we use a debug only static variable to assert that only one is ever created? I suppose there's no real danger of making more of these given that it is never even defined in a header.
Comment 20•2 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #19) > Could we use a debug only static variable to assert that only one is ever > created? I suppose there's no real danger of making more of these given that > it is never even defined in a header. Oh I see, because of the way this is set up, we'd still only ever have one instance of this mutex, even if we had more than one of the outer class.
Updated•2 years ago
|
Comment 21•2 years ago
|
||
Comment on attachment 9020180 [details] [diff] [review] part 4 - expose a tryLock method for PlatformMutex Review of attachment 9020180 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/misc/Mutex_windows.cpp @@ +35,5 @@ > + > +bool > +mozilla::detail::MutexImpl::mutexTryLock() > +{ > + return !!TryAcquireSRWLockExclusive(&platformData()->lock); If this needs stronger review than "hey, these have the same name" then please get somebody else to review, because I'm unfamiliar with the technical details of these mutexes. ::: mozglue/misc/PlatformMutex.h @@ +33,5 @@ > protected: > MFBT_API void lock(); > MFBT_API void unlock(); > + // We have a separate, forwarding API so internal uses don't have to go > + // through the PLT. What is "the PLT"?
Updated•2 years ago
|
Updated•2 years ago
|
Comment 22•2 years ago
|
||
Comment on attachment 9020183 [details] [diff] [review] part 7 - convert chromium ipc locks in waitable_event_posix.{h,cc} Review of attachment 9020183 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/chromium/src/base/waitable_event_posix.cc @@ +89,5 @@ > signaling_event_(NULL) { > } > > bool Fire(WaitableEvent *signaling_event) override { > + monitor_->Lock(); Would you mind changing this code to use blocks to instead of this magical indentation throughout this file? I mean like this: { m->Lock(); ... m->Unlock(); } The ... part doesn't have to change at all. @@ +99,5 @@ > > if (previous_value) > return false; > > + monitor_->NotifyAll(); Hmm does this really not need to hold the lock when it does the notify?
Comment 23•2 years ago
|
||
Comment on attachment 9020184 [details] [diff] [review] part 8 - remove chromium ipc lock code Review of attachment 9020184 [details] [diff] [review]: ----------------------------------------------------------------- Nice!
![]() |
Assignee | |
Comment 24•2 years ago
|
||
Sadly, it looks like Linux64 debug builds are very unhappy about some timeout issues: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a32604e7c4fb1fb2f8683b71b08abebd20033af&selectedJob=207872952 Curiously, it appears to be *Linux64* only and no other platforms, though Mac debug builds are kind of orange.
![]() |
Assignee | |
Comment 25•2 years ago
|
||
The waitable_event_posix changes appear to be responsible. Taking them out produces a green field as far as the eye can see: https://bugzilla.mozilla.org/show_bug.cgi?id=1499808 I'm not sure what would specifically induce timeouts, unless something is getting scrambled in translation here: https://hg.mozilla.org/try/rev/cdb3023fb51042fecda8c6e6faa3c0a6bfeba87f#l2.181 or there are small differences between Chromium's condition variable layer and ours that are causing issues.
Updated•2 years ago
|
Comment 26•2 years ago
|
||
Pushed by nfroyd@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/51cd91552f13 part 1 - convert easy cases of chromium IPC locks to mozilla mutexes; r=mccr8 https://hg.mozilla.org/integration/mozilla-inbound/rev/3ec91f0aa186 part 2 - use StaticMutex for PipeMap to workaround the leak checker; r=mccr8 https://hg.mozilla.org/integration/mozilla-inbound/rev/d3ff5919c6c1 part 3 - use StaticMutex for NowSingleton to work around the leak checker; r=mccr8 https://hg.mozilla.org/integration/mozilla-inbound/rev/c275c8909c57 part 4 - expose a tryLock method for PlatformMutex; r=mccr8 https://hg.mozilla.org/integration/mozilla-inbound/rev/5feee7eb8703 part 5 - add {Mutex,Monitor}::TryLock methods; r=mccr8 https://hg.mozilla.org/integration/mozilla-inbound/rev/785c68522094 part 6 - change ReverbConvolver to use mozilla locking constructs; r=padenot
![]() |
Assignee | |
Comment 27•2 years ago
|
||
I landed the first six parts. The waitable_event_posix changes are left to debug at some later point.
Comment 28•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/51cd91552f13
https://hg.mozilla.org/mozilla-central/rev/3ec91f0aa186
https://hg.mozilla.org/mozilla-central/rev/d3ff5919c6c1
https://hg.mozilla.org/mozilla-central/rev/c275c8909c57
https://hg.mozilla.org/mozilla-central/rev/5feee7eb8703
https://hg.mozilla.org/mozilla-central/rev/785c68522094
![]() |
Assignee | |
Comment 29•2 years ago
|
||
The waitable event issues are being dealt with separately in bug 1524072, where the proposal is to just remove WaitableEvent entirely. Once that's done, we can nuke all the lock code, which this updated commit does. Carrying over r+.
![]() |
Assignee | |
Updated•2 years ago
|
The leave-open keyword is there and there is no activity for 6 months.
:froydnj, maybe it's time to close this bug?
![]() |
Assignee | |
Comment 31•2 years ago
|
||
This is waiting for bug 1524072 to get figured out, which is on the list of things to do, but way down the list.
The leave-open keyword is there and there is no activity for 6 months.
:froydnj, maybe it's time to close this bug?
![]() |
Assignee | |
Updated•11 months ago
|
The leave-open keyword is there and there is no activity for 6 months.
:froydnj, maybe it's time to close this bug?
![]() |
Assignee | |
Updated•5 months ago
|
Description
•