Use a lock to protect the list of top-level actors

RESOLVED FIXED in Firefox 40

Status

()

Core
IPC
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: billm, Assigned: billm)

Tracking

unspecified
mozilla40
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox40 fixed)

Details

Attachments

(2 attachments, 4 obsolete attachments)

Created attachment 8549150 [details] [diff] [review]
top-level-proto-lock

In bug 1057908, we want to split the GMP protocol into PGMPChrome and PGMPContent, both of which would be top-level actors. In single-process Firefox, the main process would use both actors. We'd create the PGMPChrome actor when starting the process and then use that to open PGMPContent. Both actors would run on the GMP thread in the main process.

Peter ran into a threading problem while working on this. Calling PGMPContent::Open echoes the opener message for PGMPContentParent to the PGMPChromeParent actor in the main process. When we receive that opener message in PGMPChromeParent, the generated code calls AddOpenedActor. AddOpenedActor asserts that we're on the main thread. In this case we're on the GMP thread. There's no way to really get around this since AddOpenedActor is called from generated code on the actor's worker thread.

The main thread restriction is kinda crummy anyway. In bug 1057908 it was proposed that we use a lock, but it sounded hard. However, I've prototyped the lock approach and it seems okay. What do you think, Ben?
Attachment #8549150 - Flags: review?(bent.mozilla)
Created attachment 8549180 [details] [diff] [review]
top-level-proto-lock v2

Try shows I missed one use.
Attachment #8549150 - Attachment is obsolete: true
Attachment #8549150 - Flags: review?(bent.mozilla)
Attachment #8549180 - Flags: review?(bent.mozilla)
This is "leaking" in b2g because we're adding an extra Monitor to each top-level protocol, and b2g already leaks one PSharedBufferManagerParent and one PSharedBufferManagerChild. So I'll have to increase the leak threshold a bit.
This is also failing consistently here:

633 INFO TEST-UNEXPECTED-FAIL | dom/ipc/tests/test_NuwaProcessCreation.html | Nuwa process is not launched - expected PASS

642 INFO TEST-UNEXPECTED-FAIL | dom/ipc/tests/test_NuwaProcessDeadlock.html | Nuwa process is not launched - expected PASS

It seems like the patch shouldn't change behavior at all. I'll try adding some assertions and see if anything pops up.
You shouldn't need a Monitor, right? (Monitor == Mutex + CondVar). Just a Mutex should be enough I think.
Created attachment 8565819 [details] [diff] [review]
top-level-proto-lock v3

I finally reproduced the test failure locally (thanks for the b2g advice Ben). Apparently there's some phase of Nuwa where we can't allocate memory or something. That's the best I can figure base on what's happening. Luckily, all threads are stopped during that phase, and Nuwa already imposes a hard limit of 8 top-level protocols. So I was able to work around the issue pretty easily.
Attachment #8549180 - Attachment is obsolete: true
Attachment #8565819 - Flags: review?(bent.mozilla)
Heh, I just figured this out on a local build too. Maybe add a comment to GetProtoFdInfos that it assumes that all threads are stopped before it's called?
Comment on attachment 8565819 [details] [diff] [review]
top-level-proto-lock v3

Ugh. Now this is failing for a different reason.
Attachment #8565819 - Flags: review?(bent.mozilla)
Created attachment 8566056 [details] [diff] [review]
top-level-proto-lock v4

The previous problem was just a typo. I also had to increase the leak threshold since top-level protocols are now a little bigger and we leak three of them.
Attachment #8565819 - Attachment is obsolete: true
Attachment #8566056 - Flags: review?(bent.mozilla)
Comment on attachment 8566056 [details] [diff] [review]
top-level-proto-lock v4

Review of attachment 8566056 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this is sufficient. LinkedListElement will auto-remove itself from its list when it gets destroyed, and I don't think there's any protection here against modifying the list with the lock held.

::: dom/ipc/ContentChild.cpp
@@ +2607,5 @@
>          content->GetTransport()->GetFileDescriptor();
>      i++;
>  
> +    IToplevelProtocol* actors[NUWA_TOPLEVEL_MAX];
> +    size_t count = content->GetOpenedActorsUnsafe(actors, NUWA_TOPLEVEL_MAX);

Maybe s/NUWA_TOPLEVEL_MAX/ArrayLength(actors)/ to be future-safe.

::: ipc/glue/ProtocolUtils.cpp
@@ +20,3 @@
>  IToplevelProtocol::~IToplevelProtocol()
>  {
> +  MutexAutoLock al(mOpenActorsLock);

I'm not sure you really need this. If we have two threads racing call the destructor we're in bad shape anyway.

@@ +56,5 @@
> +  size_t count = 0;
> +  for (IToplevelProtocol* actor = mOpenActors.getFirst();
> +       actor;
> +       actor = actor->getNext()) {
> +    MOZ_RELEASE_ASSERT(count < aActorsMax);

Wouldn't it be safer to just make the condition |actor && count < aActorsMax|?

@@ +81,5 @@
> +  nsTArray<IToplevelProtocol*> actors;
> +  aTemplate->GetOpenedActors(actors);
> +
> +  for (size_t i = 0; i < actors.Length(); i++) {
> +    IToplevelProtocol* newactor = actors[i]->CloneToplevel(aFds, aPeerProcess, aCtx);

This worries me... What guarantees that each actor stays alive after you've dropped the lock at the end of GetOpenedActors?

::: ipc/glue/ProtocolUtils.h
@@ +190,3 @@
>  protected:
>      explicit IToplevelProtocol(ProtocolId aProtoId)
> +        : mOpenActorsLock("OpenActorsLock")

Nit "IToplevelProtocol::mOpenActorsLock" maybe?
Attachment #8566056 - Flags: review?(bent.mozilla) → review-
Bill - this blocks bug 1057908 where EME conflicts with e10s. We want to enable Primetime on nightly in the next few days and it would be best if we can avoid the need to disable e10s to make it work. Do you have an ETA?
Flags: needinfo?(wmccloskey)
Created attachment 8580035 [details] [diff] [review]
top-level-proto-lock v5

This is the new version. Having one lock makes things easier, but it's very complicated to manage the destruction of the lock. It gets the job done though.

I stuck with the assertion that we don't overflow NUWA_TOPLEVEL_MAX. I'd rather find that out by crashing rather than having the function silently fail to return all the top-level actors.
Attachment #8566056 - Attachment is obsolete: true
Flags: needinfo?(wmccloskey)
Attachment #8580035 - Flags: review?(bent.mozilla)
Comment on attachment 8580035 [details] [diff] [review]
top-level-proto-lock v5

Review of attachment 8580035 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great!
Attachment #8580035 - Flags: review?(bent.mozilla) → review+
Somehow today I randomly ran into the StaticMutex class, which would have made this simpler. I'll fix this soon.
Flags: needinfo?(wmccloskey)
Keywords: leave-open
Created attachment 8581988 [details] [diff] [review]
use static mutex
Flags: needinfo?(wmccloskey)
Attachment #8581988 - Flags: review?(bent.mozilla)
Attachment #8581988 - Flags: review?(bent.mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/4bc92ae5c50b
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox40: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Blocks: 1183972
You need to log in before you can comment on or make changes to this bug.