Closed Bug 545338 Opened 10 years ago Closed 10 years ago

RPCChannel should use events rather than thread messages for NotifyWokerThread

Categories

(Core :: Plug-ins, defect)

x86
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- .4-fixed

People

(Reporter: jimm, Assigned: jimm)

References

Details

(Whiteboard: [fixed-lorentz])

Attachments

(1 file, 2 obsolete files)

RPCChannel's Call currently assumes it's 'ok' to process all pending messages after a single event loop message is received in WaitForNotify. We need a win impl. of EventOccurred() that removes mPending.size() event loop messages since the IO thread posts one event per message. This bug is the cause of some of the current test failures on e10s.
Would it be any easier to (approximately) remove a windows message per mPending.pop()?  EventOccurred() is currently side-effect free; RPCChannel relies on this assumption and will call it N times after receiving N messages.

Regardless of implementation, there are two special cases to keep in mind

 (1) If the IO thread receives a message sent to an RPCChannel when the worker thread isn't waiting, it enqueues the Message into mPending, but doesn't Notify(); instead it enqueues a worker-thread callback event |MaybeDequeueOne()| that ends up calling mPending.pop().  The worker thread doesn't call EventOccurred() in this case either (it just returns if mPending is empty).

 (2) When the sync channel is notified of a sync reply, the reply is directly written to syncChannel.mRecvd, and then Notify() is called.  For sync replies sent to an RPCChannel, the RPCChannel never touches mPending.  Also SyncChannel has its own EventOccurred() method that only looks at mRecvd and connection errors.
Also worth noting that this bug was likely responsible for the old Great White Whale, bug 538239.
There's another special case I just remembered

   590 void
   591 RPCChannel::OnMessageReceived(const Message& msg)
   592 {
[snip]
   605     mPending.push(msg);
   606 
   607     if (0 == StackDepth() && !mBlockedOnParent)
   608         // the worker thread might be idle, make sure it wakes up
   609         mWorkerLoop->PostTask(
   610             FROM_HERE,
   611             NewRunnableMethod(this, &RPCChannel::OnMaybeDequeueOne));
   612     else if (!AwaitingSyncReply())
   613         NotifyWorkerThread();
   614 }

When we're awaiting a sync reply, but get some other Message, we *don't* notify the worker thread.  I made this change for the hang detector.

We'll need to figure something out here for windows.  Maybe we should always notify (i.e. revert this change), but use lparam/wparam to distinguish sync replies from other Messages, and then not return from SyncChannel::WaitForNotify() for non-sync-reply Messages?
(In reply to comment #1)
>  (1) If the IO thread receives a message sent to an RPCChannel when the worker
> thread isn't waiting, it enqueues the Message into mPending, but doesn't
> Notify(); instead it enqueues a worker-thread callback event
> |MaybeDequeueOne()| that ends up calling mPending.pop().  The worker thread
> doesn't call EventOccurred() in this case either (it just returns if mPending
> is empty).
> 

I think we can "fix" this for windows by always Notify()ing, even if we end up enqueuing the main-thread MaybeDequeueOne() task.
Each channel on the stack calls NotifyWorkerThread() 'n' times, then WaitForNotify() once, so if we balance the number of windows events in the queue based on 'n' when exiting WaitForNotify() fix this without WindowsMessageLoop code needing to know what's going on in the core message processing of the channel?
There are cases where the worker thread can WaitForNotify(), be Notify()d of one Message (mPending.size() == 1), but then while processing that Message be Notify()d of K other messages but never call WaitForNotifyAgain() while processing all K+1.  So the next time WaitForNotify() is called, mPending would be empty.  I guess you could keep another counter in the windows code to track this, but this sounds tricky to me.

It seems simpler to me to create an invariant of Notify()-per-Message (that doesn't quite hold currently), then dequeue a windows message per Message dequeue.  But I may not be understanding what you propose, I'll catch up on IRC.
Pushed http://hg.mozilla.org/projects/electrolysis/rev/45dcc1e10d76 as stopgap, until we figure out what we want to do in the long term.
Summary: RPCChannel's EventOccurred needs to sync incoming event loop signal events with incoming pending messages → RPCChannel should sync NotifyWorkerThread calls with pending messages
Depends on: LorentzBeta2
Attached patch signal events v.1 (obsolete) — Splinter Review
Assignee: nobody → jmathies
Comment on attachment 434917 [details] [diff] [review]
signal events  v.1

oops, need to nix a bunch of debug code.
Attachment #434917 - Attachment is obsolete: true
Attached patch signal events v.1 (obsolete) — Splinter Review
Attachment #434919 - Attachment is obsolete: true
Attachment #434966 - Flags: review?(bent.mozilla)
Comment on attachment 434966 [details] [diff] [review]
signal events v.2

>+#ifdef OS_WIN
>+    mEvent(NULL),
>+#endif
>     mTimeoutMs(kNoTimeout)
> {
>   MOZ_COUNT_CTOR(SyncChannel);
>+#ifdef OS_WIN
>+    mEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
>+    NS_ASSERTION(mEvent, "CreateEvent failed! Nothing is going to work!");
>+#endif

No need to double-initialize mEvent here. (also maybe fix indent of the ctor logging?)

>+    // Wait for UI events or a signal from the io thread.
>+    DWORD result = MsgWaitForMultipleObjects(1, &mEvent, FALSE, INFINITE,
>+                                             QS_ALLINPUT);
>+    if (result == WAIT_OBJECT_0) {
>+      // Our NotifyWorkerThread event was signaled
>+      ExitSpinLoop();
>+      return;

Should you reset mEvent here?
Attachment #434966 - Flags: review?(bent.mozilla) → review+
Summary: RPCChannel should sync NotifyWorkerThread calls with pending messages → RPCChannel should use events rather than thread messages for NotifyWokerThread
http://hg.mozilla.org/mozilla-central/rev/e448bbd5f8a9
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: LorentzBeta2
No longer depends on: LorentzBeta2
Blanket approval for Lorentz merge to mozilla-1.9.2
a=beltzner for 1.9.2.4 - please make sure to mark status1.9.2:.4-fixed
You need to log in before you can comment on or make changes to this bug.