Last Comment Bug 699319 - Implement cross-thread communication
: Implement cross-thread communication
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: IPC (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla11
Assigned To: Niko Matsakis [:nmatsakis]
:
Mentors:
Depends on: 703317
Blocks: omtc 703320 703322 703323
  Show dependency treegraph
 
Reported: 2011-11-02 20:29 PDT by Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
Modified: 2012-05-19 09:00 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Intermediate patch: sketch how open/send can work (31.31 KB, patch)
2011-11-03 13:57 PDT, Niko Matsakis [:nmatsakis]
no flags Details | Diff | Review
Abstract out the mTransport and I/O thread into the Link abstraction (34.63 KB, patch)
2011-11-17 10:47 PST, Niko Matsakis [:nmatsakis]
no flags Details | Diff | Review
Create threaded version of the Link class (8.02 KB, patch)
2011-11-17 10:50 PST, Niko Matsakis [:nmatsakis]
cjones.bugs: review+
Details | Diff | Review
Update test infrastructure to run tests in either threaded or process mode (25.00 KB, patch)
2011-11-17 10:51 PST, Niko Matsakis [:nmatsakis]
cjones.bugs: review+
Details | Diff | Review
Abstract out the mTransport and I/O thread into the Link abstraction (34.56 KB, patch)
2011-11-23 22:06 PST, Niko Matsakis [:nmatsakis]
cjones.bugs: review+
Details | Diff | Review
Create threaded version of the Link class (8.98 KB, patch)
2011-11-23 22:07 PST, Niko Matsakis [:nmatsakis]
no flags Details | Diff | Review
Create threaded version of the Link class (8.94 KB, patch)
2011-11-23 22:09 PST, Niko Matsakis [:nmatsakis]
cjones.bugs: review+
Details | Diff | Review
Update test infrastructure to run tests in either threaded or process mode (25.50 KB, patch)
2011-11-24 11:23 PST, Niko Matsakis [:nmatsakis]
cjones.bugs: review+
Details | Diff | Review
(unbitrot) Abstract out the mTransport and I/O thread into the Link abstraction (46.11 KB, patch)
2011-11-29 15:31 PST, Benoit Girard (:BenWa)
no flags Details | Diff | Review
Abstract out the mTransport and I/O thread into the Link abstraction (50.16 KB, patch)
2011-11-30 08:26 PST, Niko Matsakis [:nmatsakis]
no flags Details | Diff | Review
Create threaded version of the Link class (12.01 KB, patch)
2011-11-30 08:27 PST, Niko Matsakis [:nmatsakis]
no flags Details | Diff | Review
Update test infrastructure to run tests in either threaded or process mode (33.63 KB, patch)
2011-11-30 08:29 PST, Niko Matsakis [:nmatsakis]
no flags Details | Diff | Review

Description Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-11-02 20:29:56 PDT
The general approach here is to abstract the parts of the *Channel code that deal with the IO thread and socket IO into a "cross-process" implementation of a more abstract class, and then add a "cross-thread" implementation of that class that directly posts tasks wrapping Messages between the two *Channel endpoints (skipping the IO thread indirection).
Comment 1 Niko Matsakis [:nmatsakis] 2011-11-03 13:57:10 PDT
Created attachment 571768 [details] [diff] [review]
Intermediate patch: sketch how open/send can work

At Andreas's suggestion, I'm just posting a snapshot of the current state of the branch.  This sketches out how one might open and link two channels between two different threads.  Trying now to get a test up-and-running to validate the design a bit more.
Comment 2 Niko Matsakis [:nmatsakis] 2011-11-17 10:47:37 PST
Created attachment 575228 [details] [diff] [review]
Abstract out the mTransport and I/O thread into the Link abstraction
Comment 3 Niko Matsakis [:nmatsakis] 2011-11-17 10:50:45 PST
Created attachment 575229 [details] [diff] [review]
Create threaded version of the Link class
Comment 4 Niko Matsakis [:nmatsakis] 2011-11-17 10:51:23 PST
Created attachment 575232 [details] [diff] [review]
Update test infrastructure to run tests in either threaded or process mode
Comment 5 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-11-21 21:52:26 PST
Comment on attachment 575228 [details] [diff] [review]
Abstract out the mTransport and I/O thread into the Link abstraction

>diff -r c56c430b2c88 -r 05156fdf0165 ipc/glue/AsyncChannel.cpp

>+AsyncChannel::Link::Link(AsyncChannel *aChan)
> {
>+    mChan = aChan;

Nit: use initializer syntax.

>diff -r c56c430b2c88 -r 05156fdf0165 ipc/glue/AsyncChannel.h

>+class RefCountedMonitor : public Monitor
>+{
>+public:

Use NS_INLINE_DECL_THREADSAFE_REFCOUNTING(RefCountedMonitor) here.
You can delete pretty much everything else except the superclass ctor
call.

>+    class ProcessLink : public Link, public Transport::Listener {
>+    protected:
>+        mozilla::ipc::Transport* mTransport;

Nit: You don't need the full qualification here.

>+    // Runs on the link thread. Invoked either from the I/O thread methods above
>+    // or directly from the other actor if using a thread-based link.
>+    // 
>+    // n.b.: mMonitor is always held when these methods are invoked.
>+    // In the case of a ProcessLink, it is acquired by the ProcessLink.
>+    // In the case of a ThreadLink, it is acquired by the other actor, 
>+    // which then invokes these methods directly.
>+    NS_OVERRIDE virtual void OnMessageReceivedFromLinkSync(const Message& msg);
>+    NS_OVERRIDE virtual void OnChannelErrorFromLinkSync();

Nit-y, but "Sync" is a bit confusing here.  I think by "Sync", you
mean "synchronized by the monitor"?  Can we call them *Locked()
instead?

Why are these NS_OVERRIDE?  I don't see what decls they're overriding.

>diff -r c56c430b2c88 -r 05156fdf0165 ipc/glue/RPCChannel.cpp

> 
>     if (mChild)
>         NS_RUNTIMEABORT("child tried to block parent");
>+
>+    MonitorAutoLock lock(*mMonitor);
>     SendSpecialMessage(new BlockChildMessage());

Has it become an invariant that the monitor must be held during
SendMessage()?  (Haven't read the thread-link patch yet.)  If so, that
needs to be documented, and those methods need to assert
appropriately.  (The monitor doesn't strictly need to be held here, or
below, or in some other spots, but if it makes other code simpler it's
fine.)

>+RPCChannel::OnChannelErrorFromLinkSync()

Please assert that the monitor is held here.

Nice refactoring :).

>diff -r c56c430b2c88 -r 05156fdf0165 ipc/glue/SyncChannel.cpp

> void
>-SyncChannel::OnMessageReceived(const Message& msg)
>+SyncChannel::OnMessageReceivedFromLinkSync(const Message& msg)
> {
>-    AssertIOThread();
>+    AssertLinkThread();

Assert that the monitor is held too, plz.

Nice patch!  This ended up cleaner than I feared it might be :).
Would like to see a second version with answers to the questions
above, and then let's get this landed.
Comment 6 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-11-21 22:15:47 PST
Comment on attachment 575229 [details] [diff] [review]
Create threaded version of the Link class

>diff -r 05156fdf0165 -r c3f826a29bcb ipc/glue/AsyncChannel.cpp

>+AsyncChannel::ThreadLink::ThreadLink(AsyncChannel *aChan,
>+                                         AsyncChannel *aTargetChan)

Nit: indentation is weird here.

>+void
>+AsyncChannel::ThreadLink::EchoMessage(Message *msg)
>+{
>+    mChan->AssertWorkerThread();
>+    mChan->OnMessageReceivedFromLinkSync(*msg);
>+    delete msg;

This, and the other *Link impl methods below, should assert that the
monitor is held.  Right?

>+bool
>+AsyncChannel::Open(AsyncChannel *aTargetChan, 
>+                   MessageLoop *aTargetLoop,
>+                   AsyncChannel::Side aSide)
>+{

>+    mMonitor = already_AddRefed<RefCountedMonitor>(new RefCountedMonitor());
>+

With the changes in the previous patch, this will be |mMonitor = new
RefCountedMonitor()|.

>diff -r 05156fdf0165 -r c3f826a29bcb ipc/glue/AsyncChannel.h

>+    class ThreadLink : public Link {

>+        virtual void EchoMessage(Message *msg);
>+        virtual void SendMessage(Message *msg);
>+        virtual void SendClose();

NS_OVERRIDE

Nice patch.
Comment 7 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-11-21 22:24:48 PST
Comment on attachment 575232 [details] [diff] [review]
Update test infrastructure to run tests in either threaded or process mode

>diff -r c3f826a29bcb -r fb894d60c282 ipc/ipdl/test/cxx/IPDLUnitTests.template.cpp
>@@ -26,6 +27,13 @@
> 
> void* mozilla::_ipdltest::gChildActor;
> 
>+// Note: in threaded mode, this will be non-null (for both parent and
>+// child, since they share one set of globals).  
>+base::Thread* mozilla::_ipdltest::gChildThread;

Nit: |using namespace base| please.

>+volatile bool gParentDone;
>+volatile bool gChildDone;
>+

gParentDone doesn't need to be |volative|.  Instead of making
gChildDone |volatile| and introducing a (benign) data race, let's have
QuitChild() post a task that sets gChildDone=true and then calls
TryThreadedShutdown().  helgrind will be happier with us.

>diff -r c3f826a29bcb -r fb894d60c282 ipc/ipdl/test/cxx/TestDataStructures.cpp
>+    if (OtherProcess() != 0) {
>+        //FIXME allocation of nsIntRegion uses a global region pool which breaks threads

"FIXME/bug 703317", to leave a breadcrumb.

>diff -r c3f826a29bcb -r fb894d60c282 ipc/ipdl/test/cxx/TestFailedCtor.h

>+    // FIXME Disabled because child calls exit() to end test,

Similarly for here ...

>diff -r c3f826a29bcb -r fb894d60c282 ipc/ipdl/test/cxx/TestHangs.h

>+    // FIXME Disabled because parent kills child proc, not clear

And here

r=me with those changes.
Comment 8 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-11-21 22:25:20 PST
Also, \o/.  Nice work :).
Comment 9 Niko Matsakis [:nmatsakis] 2011-11-22 22:18:59 PST
I am working on incorporating those patches.  One question: I could not find any definition for NS_INLINE_DECL_THREADSAFE_REFCOUNTING.  Was this added recently?  I found a non-threadsafe variant.
Comment 10 Niko Matsakis [:nmatsakis] 2011-11-22 22:26:17 PST
Um, never mind. I see it now.
Comment 11 Niko Matsakis [:nmatsakis] 2011-11-23 22:06:18 PST
Created attachment 576677 [details] [diff] [review]
Abstract out the mTransport and I/O thread into the Link abstraction

Updated to include feedback.  I just dropped the Sync suffix; the fact that the lock should be held is communicated by the assertions.
Comment 12 Niko Matsakis [:nmatsakis] 2011-11-23 22:07:31 PST
Created attachment 576678 [details] [diff] [review]
Create threaded version of the Link class

Updated version including feedback.
Comment 13 Niko Matsakis [:nmatsakis] 2011-11-23 22:09:39 PST
Created attachment 576679 [details] [diff] [review]
Create threaded version of the Link class

Sorry, this is the correct patch that incorporates feedback from review.
Comment 14 Niko Matsakis [:nmatsakis] 2011-11-24 11:23:33 PST
Created attachment 576806 [details] [diff] [review]
Update test infrastructure to run tests in either threaded or process mode

Incorporated feedback.
Comment 15 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-11-26 17:20:20 PST
Comment on attachment 576677 [details] [diff] [review]
Abstract out the mTransport and I/O thread into the Link abstraction

ProcessLink::EchoMessage and ProcessLink::SendMessage need to assert that the monitor is locked.

r=me with that.
Comment 16 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-11-26 17:27:52 PST
Alrighty!  Let's get this landed.

First up, we need to land the leak fix, since there's an incidental dependency here.  To do so, push your patch to our tryserver, and if it's clean then you can set the "checkin-needed" flag on the bug.  After that, same process here.
Comment 17 Benoit Girard (:BenWa) 2011-11-29 15:31:03 PST
Created attachment 577776 [details] [diff] [review]
(unbitrot) Abstract out the mTransport and I/O thread into the Link abstraction

Sharing the simple bitrot fix I had to do to land on OMTC repo.
Comment 18 Benoit Girard (:BenWa) 2011-11-29 16:32:46 PST
Got a build error building with the 3 patches. I'll punt on this piece for now.
Comment 19 Mozilla RelEng Bot 2011-11-29 19:20:58 PST
Try run for 35739a8920fe is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=35739a8920fe
Results (out of 188 total builds):
    exception: 1
    success: 159
    warnings: 20
    failure: 8
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nmatsakis@mozilla.com-35739a8920fe
Comment 20 Mozilla RelEng Bot 2011-11-30 00:40:21 PST
Try run for a19835a8b436 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=a19835a8b436
Results (out of 248 total builds):
    success: 223
    warnings: 19
    failure: 6
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nmatsakis@mozilla.com-a19835a8b436
Comment 21 Niko Matsakis [:nmatsakis] 2011-11-30 08:26:00 PST
Created attachment 577970 [details] [diff] [review]
Abstract out the mTransport and I/O thread into the Link abstraction

Update with build fixes for windows.
Comment 22 Niko Matsakis [:nmatsakis] 2011-11-30 08:27:01 PST
Created attachment 577971 [details] [diff] [review]
Create threaded version of the Link class

New version w/ correct metadata and windows build fixes.
Comment 23 Niko Matsakis [:nmatsakis] 2011-11-30 08:29:18 PST
Created attachment 577973 [details] [diff] [review]
Update test infrastructure to run tests in either threaded or process mode

Updated version w/ correct metadata and windows build fix.
Comment 24 Niko Matsakis [:nmatsakis] 2011-11-30 08:30:13 PST
Comment on attachment 577776 [details] [diff] [review]
(unbitrot) Abstract out the mTransport and I/O thread into the Link abstraction

(Marked "unbitrot" patch as obsolete per discussion w/ Benoit)
Comment 25 Niko Matsakis [:nmatsakis] 2011-11-30 17:23:55 PST
If I understand the try output, in the latest run, all of the failures also had associated bugs indicating intermittent failures.  I requested re-runs but suspect the patch could be safely committed.

Note You need to log in before you can comment on or make changes to this bug.