Closed
Bug 1121676
Opened 9 years ago
Closed 9 years ago
Use a lock to protect the list of top-level actors
Categories
(Core :: IPC, defect)
Tracking
()
RESOLVED
FIXED
mozilla40
Tracking | Status | |
---|---|---|
firefox40 | --- | fixed |
People
(Reporter: billm, Assigned: billm)
References
Details
Attachments
(2 files, 4 obsolete files)
13.21 KB,
patch
|
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
4.25 KB,
patch
|
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
In bug 1057908, we want to split the GMP protocol into PGMPChrome and PGMPContent, both of which would be top-level actors. In single-process Firefox, the main process would use both actors. We'd create the PGMPChrome actor when starting the process and then use that to open PGMPContent. Both actors would run on the GMP thread in the main process. Peter ran into a threading problem while working on this. Calling PGMPContent::Open echoes the opener message for PGMPContentParent to the PGMPChromeParent actor in the main process. When we receive that opener message in PGMPChromeParent, the generated code calls AddOpenedActor. AddOpenedActor asserts that we're on the main thread. In this case we're on the GMP thread. There's no way to really get around this since AddOpenedActor is called from generated code on the actor's worker thread. The main thread restriction is kinda crummy anyway. In bug 1057908 it was proposed that we use a lock, but it sounded hard. However, I've prototyped the lock approach and it seems okay. What do you think, Ben?
Attachment #8549150 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 1•9 years ago
|
||
Try shows I missed one use.
Attachment #8549150 -
Attachment is obsolete: true
Attachment #8549150 -
Flags: review?(bent.mozilla)
Attachment #8549180 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 2•9 years ago
|
||
This is "leaking" in b2g because we're adding an extra Monitor to each top-level protocol, and b2g already leaks one PSharedBufferManagerParent and one PSharedBufferManagerChild. So I'll have to increase the leak threshold a bit.
Assignee | ||
Comment 3•9 years ago
|
||
This is also failing consistently here: 633 INFO TEST-UNEXPECTED-FAIL | dom/ipc/tests/test_NuwaProcessCreation.html | Nuwa process is not launched - expected PASS 642 INFO TEST-UNEXPECTED-FAIL | dom/ipc/tests/test_NuwaProcessDeadlock.html | Nuwa process is not launched - expected PASS It seems like the patch shouldn't change behavior at all. I'll try adding some assertions and see if anything pops up.
You shouldn't need a Monitor, right? (Monitor == Mutex + CondVar). Just a Mutex should be enough I think.
Attachment #8549180 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 5•9 years ago
|
||
I finally reproduced the test failure locally (thanks for the b2g advice Ben). Apparently there's some phase of Nuwa where we can't allocate memory or something. That's the best I can figure base on what's happening. Luckily, all threads are stopped during that phase, and Nuwa already imposes a hard limit of 8 top-level protocols. So I was able to work around the issue pretty easily.
Attachment #8549180 -
Attachment is obsolete: true
Attachment #8565819 -
Flags: review?(bent.mozilla)
Comment 6•9 years ago
|
||
Heh, I just figured this out on a local build too. Maybe add a comment to GetProtoFdInfos that it assumes that all threads are stopped before it's called?
Assignee | ||
Comment 7•9 years ago
|
||
Comment on attachment 8565819 [details] [diff] [review] top-level-proto-lock v3 Ugh. Now this is failing for a different reason.
Attachment #8565819 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 8•9 years ago
|
||
The previous problem was just a typo. I also had to increase the leak threshold since top-level protocols are now a little bigger and we leak three of them.
Attachment #8565819 -
Attachment is obsolete: true
Attachment #8566056 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 9•9 years ago
|
||
This is finally green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=49e15648f333
Comment on attachment 8566056 [details] [diff] [review] top-level-proto-lock v4 Review of attachment 8566056 [details] [diff] [review]: ----------------------------------------------------------------- I don't think this is sufficient. LinkedListElement will auto-remove itself from its list when it gets destroyed, and I don't think there's any protection here against modifying the list with the lock held. ::: dom/ipc/ContentChild.cpp @@ +2607,5 @@ > content->GetTransport()->GetFileDescriptor(); > i++; > > + IToplevelProtocol* actors[NUWA_TOPLEVEL_MAX]; > + size_t count = content->GetOpenedActorsUnsafe(actors, NUWA_TOPLEVEL_MAX); Maybe s/NUWA_TOPLEVEL_MAX/ArrayLength(actors)/ to be future-safe. ::: ipc/glue/ProtocolUtils.cpp @@ +20,3 @@ > IToplevelProtocol::~IToplevelProtocol() > { > + MutexAutoLock al(mOpenActorsLock); I'm not sure you really need this. If we have two threads racing call the destructor we're in bad shape anyway. @@ +56,5 @@ > + size_t count = 0; > + for (IToplevelProtocol* actor = mOpenActors.getFirst(); > + actor; > + actor = actor->getNext()) { > + MOZ_RELEASE_ASSERT(count < aActorsMax); Wouldn't it be safer to just make the condition |actor && count < aActorsMax|? @@ +81,5 @@ > + nsTArray<IToplevelProtocol*> actors; > + aTemplate->GetOpenedActors(actors); > + > + for (size_t i = 0; i < actors.Length(); i++) { > + IToplevelProtocol* newactor = actors[i]->CloneToplevel(aFds, aPeerProcess, aCtx); This worries me... What guarantees that each actor stays alive after you've dropped the lock at the end of GetOpenedActors? ::: ipc/glue/ProtocolUtils.h @@ +190,3 @@ > protected: > explicit IToplevelProtocol(ProtocolId aProtoId) > + : mOpenActorsLock("OpenActorsLock") Nit "IToplevelProtocol::mOpenActorsLock" maybe?
Attachment #8566056 -
Flags: review?(bent.mozilla) → review-
Bill - this blocks bug 1057908 where EME conflicts with e10s. We want to enable Primetime on nightly in the next few days and it would be best if we can avoid the need to disable e10s to make it work. Do you have an ETA?
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 12•9 years ago
|
||
This is the new version. Having one lock makes things easier, but it's very complicated to manage the destruction of the lock. It gets the job done though. I stuck with the assertion that we don't overflow NUWA_TOPLEVEL_MAX. I'd rather find that out by crashing rather than having the function silently fail to return all the top-level actors.
Attachment #8566056 -
Attachment is obsolete: true
Flags: needinfo?(wmccloskey)
Attachment #8580035 -
Flags: review?(bent.mozilla)
Comment on attachment 8580035 [details] [diff] [review] top-level-proto-lock v5 Review of attachment 8580035 [details] [diff] [review]: ----------------------------------------------------------------- This looks great!
Attachment #8580035 -
Flags: review?(bent.mozilla) → review+
Assignee | ||
Comment 14•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/ca7617c40ed6
Assignee | ||
Comment 15•9 years ago
|
||
Somehow today I randomly ran into the StaticMutex class, which would have made this simpler. I'll fix this soon.
Flags: needinfo?(wmccloskey)
Keywords: leave-open
Assignee | ||
Comment 17•9 years ago
|
||
Flags: needinfo?(wmccloskey)
Attachment #8581988 -
Flags: review?(bent.mozilla)
Attachment #8581988 -
Flags: review?(bent.mozilla) → review+
Assignee | ||
Comment 18•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/4bc92ae5c50b
Keywords: leave-open
Comment 19•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/4bc92ae5c50b
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox40:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Blocks: 1183972
You need to log in
before you can comment on or make changes to this bug.
Description
•