Open Bug 1062533 Opened 7 years ago Updated 5 months ago

replace a bunch of Chromium IPC locks with mozilla::Mutex

Categories

(Core :: IPC, defect, P3)

defect

Tracking

()

People

(Reporter: jaas, Assigned: froydnj)

References

Details

(Keywords: leave-open)

Attachments

(7 files, 4 obsolete files)

9.74 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
3.49 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
2.91 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
3.30 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
2.44 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
6.19 KB, patch
padenot
: review+
Details | Diff | Splinter Review
22.52 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
Attached patch fix v1.0 (obsolete) — Splinter Review
Better to be using our own lock types where we can, and this paves the way toward removing the Chromium lock classes.

The remaining barrier to removing the Chromium lock classes is that some code depends on being able to get the pthread lock backing a higher-level Lock object. That can't be done with mozilla::Mutex right now.
Comment on attachment 8483714 [details] [diff] [review]
fix v1.0

Review of attachment 8483714 [details] [diff] [review]:
-----------------------------------------------------------------

Please check out the static locks in particular, e.g. "static mozilla::Mutex list_lock_;"
Attachment #8483714 - Flags: review?(bent.mozilla)
Attachment #8483714 - Flags: review?(bent.mozilla) → review?(nfroyd)
Comment on attachment 8483714 [details] [diff] [review]
fix v1.0

Review of attachment 8483714 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/chromium/src/base/atomicops_internals_mutex.cc
@@ +9,5 @@
>  
>  namespace base {
>  namespace subtle {
>  
> +mozilla::Mutex gAtomicsMutex("AtomicOps Global");

Bonus points if you nuke all this atomicops stuff in a followup bug.

::: ipc/chromium/src/base/histogram.h
@@ +768,5 @@
>  
>    static HistogramMap* histograms_;
>  
>    // lock protects access to the above map.
> +  static mozilla::Mutex* lock_;

Static mutexes like this (and gAtomicsMutex) should really be mozilla::StaticMutex, with the corresponding StaticMutexAuto* RAII classes.  But I think that can be a followup bug.

::: ipc/chromium/src/base/tracked_objects.cc
@@ +78,5 @@
>  
>  // static
>  ThreadData* ThreadData::first_ = NULL;
>  // static
> +mozilla::Mutex ThreadData::list_lock_("ThreadData List");

StaticMutex here too, in a followup.

::: ipc/chromium/src/base/waitable_event_posix.cc
@@ +143,5 @@
>  bool WaitableEvent::TimedWait(const TimeDelta& max_time) {
>    const TimeTicks end_time(TimeTicks::Now() + max_time);
>    const bool finite_time = max_time.ToInternalValue() >= 0;
>  
> +  kernel_->lock_.Lock();

It's strange that this code doesn't use AutoLock a little more liberally.  (For avoidance of doubt, I think changing that, if we do it at all, should be left for a followup patch.)

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +10,3 @@
>  #include "vcm.h"
>  #include "CSFLog.h"
> +#include "base/histogram.h"

For my own edification, why does this header get moved in the #include order?
Attachment #8483714 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd (:froydnj) from comment #2)

> Bonus points if you nuke all this atomicops stuff in a followup bug.

I already have most of a patch written, just wanted to land this before finishing.

> Static mutexes like this (and gAtomicsMutex) should really be
> mozilla::StaticMutex, with the corresponding StaticMutexAuto* RAII classes. 
> But I think that can be a followup bug.

If you don't mind I'd like to take care of this in a revision of this patch.

> ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
> @@ +10,3 @@
> >  #include "vcm.h"
> >  #include "CSFLog.h"
> > +#include "base/histogram.h"
> 
> For my own edification, why does this header get moved in the #include order?

There is some include order dependency that involves our IPC code, not doing this causes a build failure with the rest of the patch.
(In reply to Josh Aas (Mozilla Corporation) from comment #3)
> (In reply to Nathan Froyd (:froydnj) from comment #2)
> 
> > Bonus points if you nuke all this atomicops stuff in a followup bug.
> 
> I already have most of a patch written, just wanted to land this before
> finishing.

Great!

> > Static mutexes like this (and gAtomicsMutex) should really be
> > mozilla::StaticMutex, with the corresponding StaticMutexAuto* RAII classes. 
> > But I think that can be a followup bug.
> 
> If you don't mind I'd like to take care of this in a revision of this patch.

I think that would be fine.

> > ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
> > @@ +10,3 @@
> > >  #include "vcm.h"
> > >  #include "CSFLog.h"
> > > +#include "base/histogram.h"
> > 
> > For my own edification, why does this header get moved in the #include order?
> 
> There is some include order dependency that involves our IPC code, not doing
> this causes a build failure with the rest of the patch.

Lovely.
Attached patch fix v1.1 (obsolete) — Splinter Review
This updates the patch to trunk, but it has issues. It causes crashes on OS X, and a compilation error on Windows.

https://tbpl.mozilla.org/?tree=Try&rev=b781eea5416e

I think some of the problematic code can just go away, I'll do that first and then re-visit this patch.
Attachment #8483714 - Attachment is obsolete: true
Assignee: jaas → nobody
Priority: -- → P3
I tried updating this patch:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3a01daf77c9a100c9b4a37bb5809a2fb2e17e98f

and ran into some OS X leak checking problems.  I also found out retriggering OS X tests from treeherder is broken, so we'd need to do a push with Mutex leak checks tracked, so we can figure out what is leaking the mutex.  But Windows and Linux tests look really good.
The Chrome locks don't seem to be leak checked, so presumably we're just leaking them all of the time. There's no LSan for OSX, so it is possible that we'd just not be noticing these leaks at all.
(In reply to Andrew McCreight [:mccr8] from comment #7)
> The Chrome locks don't seem to be leak checked, so presumably we're just
> leaking them all of the time. There's no LSan for OSX, so it is possible
> that we'd just not be noticing these leaks at all.

Well, the leaks are not particularly problematic...

Apparently the asan test runs that I did in comment 6 didn't include our leak checking, so retriggering on Linux also showed the leak.  Some selective reversion said that it's the ipc_channel_posix changes that were causing leaks:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=284dd8bb8f73701a877a2b559187a4cd3f0e8994

which makes sense, because the PipeMap class in that file is statically allocated, which means that it's going to live until libxul is unloaded, thereby leaking the Mutex it owns.  What puzzles me is that time_win.cc has essentially the same pattern, but we're not seeing leaks on Windows:

https://hg.mozilla.org/try/rev/3a01daf77c9a100c9b4a37bb5809a2fb2e17e98f#l8.24

so either we never call that code, or the tests that I've run happen to not trigger the code.  Either possibility is likely.

The usual route to fix this would be a StaticMutex, but StaticMutex can't be a class member.  So for the PipeMap case, we'd have to do something like:

  static PipeMap& instance()
  {
    static StaticMutex mutex;
    static PipeMap map(mutex);
    return map;
  }

which is a little gross, but maybe that's OK?
And indeed, Windows is showing a leak:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=284dd8bb8f73701a877a2b559187a4cd3f0e8994&selectedJob=207776341

which, dollars to donuts, is probably the time_win.cc changes quoted in comment 8.
Other cases will require some special handling.
Attachment #9020177 - Flags: review?(continuation)
Attachment #8492685 - Attachment is obsolete: true
Chromium IPC locks have this and the API gets used, so we need to expose
it ourselves if we're going to use our locks in place of the Chromium
IPC locks.  This patch changes the mozglue parts; tweaking the xpcom
parts is the next patch.
Attachment #9020180 - Flags: review?(continuation)
This plumbs the code from PlatformMutex up to the xpcom level.
Attachment #9020181 - Flags: review?(continuation)
We are cargo-culting ipc/chromium/ code for this, and ReverbConvolver is
essentially the only outside consumer of Chromium IPC locks at this
point.
Attachment #9020182 - Flags: review?(padenot)
Some of the code here is more complicated than the straight
substitutions from part 1, so it seemed prudent to separate it out into
its own patch.
Attachment #9020183 - Flags: review?(continuation)
We can use XPCOM mutexes instead, which feature niceties like leak
checking and deadlock detection.
Attachment #9020184 - Flags: review?(continuation)
Comment on attachment 9020177 [details] [diff] [review]
part 1 - convert easy cases of chromium IPC locks to mozilla mutexes

Review of attachment 9020177 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/chromium/src/base/histogram.cc
@@ -26,5 @@
>  #define DVLOG(x) CHROMIUM_LOG(ERROR)
>  #define CHECK_GT DCHECK_GT
>  #define CHECK_LT DCHECK_LT
> -typedef ::Lock Lock;
> -typedef ::AutoLock AutoLock;

These are just unused?

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +10,5 @@
>  #include <sstream>
>  #include <vector>
>  
>  #include "CSFLog.h"
> +#include "base/histogram.h"

What is this for?
Attachment #9020177 - Flags: review?(continuation) → review+
Comment on attachment 9020178 [details] [diff] [review]
part 2 - use StaticMutex for PipeMap to workaround the leak checker

Review of attachment 9020178 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/chromium/src/chrome/common/ipc_channel_posix.cc
@@ +154,5 @@
>  
>   private:
> +  PipeMap(mozilla::StaticMutex& aMutex)
> +    : lock_(aMutex)
> +  {}

Could we use a debug only static variable to assert that only one is ever created? I suppose there's no real danger of making more of these given that it is never even defined in a header.
Attachment #9020178 - Flags: review?(continuation) → review+
(In reply to Andrew McCreight [:mccr8] from comment #19)
> Could we use a debug only static variable to assert that only one is ever
> created? I suppose there's no real danger of making more of these given that
> it is never even defined in a header.

Oh I see, because of the way this is set up, we'd still only ever have one instance of this mutex, even if we had more than one of the outer class.
Attachment #9020179 - Flags: review?(continuation) → review+
Comment on attachment 9020180 [details] [diff] [review]
part 4 - expose a tryLock method for PlatformMutex

Review of attachment 9020180 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/misc/Mutex_windows.cpp
@@ +35,5 @@
> +
> +bool
> +mozilla::detail::MutexImpl::mutexTryLock()
> +{
> +  return !!TryAcquireSRWLockExclusive(&platformData()->lock);

If this needs stronger review than "hey, these have the same name" then please get somebody else to review, because I'm unfamiliar with the technical details of these mutexes.

::: mozglue/misc/PlatformMutex.h
@@ +33,5 @@
>  protected:
>    MFBT_API void lock();
>    MFBT_API void unlock();
> +  // We have a separate, forwarding API so internal uses don't have to go
> +  // through the PLT.

What is "the PLT"?
Attachment #9020180 - Flags: review?(continuation) → review+
Attachment #9020181 - Flags: review?(continuation) → review+
Assignee: nobody → nfroyd
Comment on attachment 9020183 [details] [diff] [review]
part 7 - convert chromium ipc locks in waitable_event_posix.{h,cc}

Review of attachment 9020183 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/chromium/src/base/waitable_event_posix.cc
@@ +89,5 @@
>          signaling_event_(NULL) {
>    }
>  
>    bool Fire(WaitableEvent *signaling_event) override {
> +    monitor_->Lock();

Would you mind changing this code to use blocks to instead of this magical indentation throughout this file? I mean like this:
{
  m->Lock();
  ...
  m->Unlock();
}
The ... part doesn't have to change at all.

@@ +99,5 @@
>  
>      if (previous_value)
>        return false;
>  
> +    monitor_->NotifyAll();

Hmm does this really not need to hold the lock when it does the notify?
Attachment #9020183 - Flags: review?(continuation) → review+
Comment on attachment 9020184 [details] [diff] [review]
part 8 - remove chromium ipc lock code

Review of attachment 9020184 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!
Attachment #9020184 - Flags: review?(continuation) → review+
Sadly, it looks like Linux64 debug builds are very unhappy about some timeout issues:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a32604e7c4fb1fb2f8683b71b08abebd20033af&selectedJob=207872952

Curiously, it appears to be *Linux64* only and no other platforms, though Mac debug builds are kind of orange.
The waitable_event_posix changes appear to be responsible.  Taking them out produces a green field as far as the eye can see:

https://bugzilla.mozilla.org/show_bug.cgi?id=1499808

I'm not sure what would specifically induce timeouts, unless something is getting scrambled in translation here:

https://hg.mozilla.org/try/rev/cdb3023fb51042fecda8c6e6faa3c0a6bfeba87f#l2.181

or there are small differences between Chromium's condition variable layer and ours that are causing issues.
Attachment #9020182 - Flags: review?(padenot) → review+
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/51cd91552f13
part 1 - convert easy cases of chromium IPC locks to mozilla mutexes; r=mccr8
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ec91f0aa186
part 2 - use StaticMutex for PipeMap to workaround the leak checker; r=mccr8
https://hg.mozilla.org/integration/mozilla-inbound/rev/d3ff5919c6c1
part 3 - use StaticMutex for NowSingleton to work around the leak checker; r=mccr8
https://hg.mozilla.org/integration/mozilla-inbound/rev/c275c8909c57
part 4 - expose a tryLock method for PlatformMutex; r=mccr8
https://hg.mozilla.org/integration/mozilla-inbound/rev/5feee7eb8703
part 5 - add {Mutex,Monitor}::TryLock methods; r=mccr8
https://hg.mozilla.org/integration/mozilla-inbound/rev/785c68522094
part 6 - change ReverbConvolver to use mozilla locking constructs; r=padenot

I landed the first six parts. The waitable_event_posix changes are left to debug at some later point.

Keywords: leave-open
The waitable event issues are being dealt with separately in bug 1524072, where
the proposal is to just remove WaitableEvent entirely.  Once that's done, we
can nuke all the lock code, which this updated commit does.  Carrying over r+.
Attachment #9041196 - Flags: review+
Attachment #9020183 - Attachment is obsolete: true
Attachment #9020184 - Attachment is obsolete: true

The leave-open keyword is there and there is no activity for 6 months.
:froydnj, maybe it's time to close this bug?

Flags: needinfo?(nfroyd)

This is waiting for bug 1524072 to get figured out, which is on the list of things to do, but way down the list.

Flags: needinfo?(nfroyd)

The leave-open keyword is there and there is no activity for 6 months.
:froydnj, maybe it's time to close this bug?

Flags: needinfo?(nfroyd)
Flags: needinfo?(nfroyd)

The leave-open keyword is there and there is no activity for 6 months.
:froydnj, maybe it's time to close this bug?

Flags: needinfo?(nfroyd)
Flags: needinfo?(nfroyd)
You need to log in before you can comment on or make changes to this bug.