Closed Bug 538918 Opened 14 years ago Closed 14 years ago

[OOPP] Modal system dialogs cause UI freeze w/windowless controls

Categories

(Core Graveyard :: Plug-ins, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(status1.9.2 .4-fixed)

VERIFIED FIXED
Tracking Status
status1.9.2 --- .4-fixed

People

(Reporter: jimm, Assigned: jimm)

References

Details

(Whiteboard: [fixed-lorentz])

Attachments

(3 files, 15 obsolete files)

53.36 KB, patch
bent.mozilla
: review+
Details | Diff | Splinter Review
2.97 KB, patch
bent.mozilla
: review+
Details | Diff | Splinter Review
1.02 KB, patch
benjamin
: review+
Details | Diff | Splinter Review
Title is self explanatory. I also get a series of dropped message warnings:

###!!! ASSERTION: Received "nonqueued" message 31 during a synchronous IPC message for window 266472 ("MozillaUIWindowClass"), sending it to DefWindowProc inste
ad of the normal window procedure.: 'Error', file f:/Mozilla/firefox/mozilla-central/ipc/glue/WindowsMessageLoop.cpp, line 282
###!!! ASSERTION: Received "nonqueued" message 10 during a synchronous IPC message for window 266472 ("MozillaUIWindowClass"), sending it to DefWindowProc inste
ad of the normal window procedure.: 'Error', file f:/Mozilla/firefox/mozilla-central/ipc/glue/WindowsMessageLoop.cpp, line 282
###!!! ASSERTION: GetRegionData failed!: 'Error', file f:/Mozilla/firefox/mozilla-central/ipc/glue/WindowsMessageLoop.cpp, line 698
Attached image screen grab (obsolete) —
There also seems to be a problem with common controls, we're getting the ancient user interface in UI generated by the runtime.
The common-controls thing is a separate bug, and probably caused by a lack of a manifest in mozilla-runtime.exe. Please file and cc ted on that one.
(In reply to comment #2)
> The common-controls thing is a separate bug, and probably caused by a lack of a
> manifest in mozilla-runtime.exe. Please file and cc ted on that one.

split out as bug 538990.
(In reply to comment #0)
> Title is self explanatory. I also get a series of dropped message warnings:
> 
> ###!!! ASSERTION: Received "nonqueued" message 31 during a synchronous IPC
> message for window 266472 ("MozillaUIWindowClass"), sending it to DefWindowProc
> inste
> ad of the normal window procedure.: 'Error', file
> f:/Mozilla/firefox/mozilla-central/ipc/glue/WindowsMessageLoop.cpp, line 282
> ###!!! ASSERTION: Received "nonqueued" message 10 during a synchronous IPC
> message for window 266472 ("MozillaUIWindowClass"), sending it to DefWindowProc
> inste
> ad of the normal window procedure.: 'Error', file
> f:/Mozilla/firefox/mozilla-central/ipc/glue/WindowsMessageLoop.cpp, line 282
> ###!!! ASSERTION: GetRegionData failed!: 'Error', file
> f:/Mozilla/firefox/mozilla-central/ipc/glue/WindowsMessageLoop.cpp, line 698

These are WM_CANCELMODE and WM_ENABLE. Unfortunately adding these to the deferred message list doesn't solve the problem, probably due to the delayed WM_ENABLE.
Attachment #421000 - Attachment is obsolete: true
Attached patch use async for user input (obsolete) — Splinter Review
First rev, this addresses print in flash. I think though there are other events (like key input) that we could send asynchronously as well. I need to do a little digging to see what else to include.

The two dropped messages previously mentioned do cause problems, but they are unrelated to the problem this patch addresses. (I'll spin those out as a separate bug.) Basically, modal loops initiated in the child process trigger our deferred event processing which can remain in effect for long periods of time. This totally borks firefox and the plugin process. By sending user input async, we avoid this problem.
Assignee: nobody → jmathies
Blocks: LorentzBeta1
Attached patch use async for user input (obsolete) — Splinter Review
some cleanup / commenting and added keyboard input.
Attachment #421105 - Attachment is obsolete: true
> These are WM_CANCELMODE and WM_ENABLE. Unfortunately adding these to the
> deferred message list doesn't solve the problem, probably due to the delayed
> WM_ENABLE.

Split out to bug 539061.
bent - so the two different thread are in the following state. The parent is in WaitForNotify() and is actively processing / deferring messages. The runtime thread has dropped into flash, which drops into the print dialog routine, and is wrapped up in a modal message processing loop. On creation of the dialog, the following events get deferred - 

8 - 0x001F defer: WM_CANCELMODE
9 - 0x000A defer: WM_ENABLE
10 - 0x0086 defer: WM_NCACTIVATE
11 - 0x0006 defer: WM_ACTIVATE
12 - 0x001C defer: WM_ACTIVATEAPP
13 - 0x001C defer: WM_ACTIVATEAPP
14 - 0x001C defer: WM_ACTIVATEAPP
15 - 0x001C defer: WM_ACTIVATEAPP
16 - 0x001C defer: WM_ACTIVATEAPP
17 - 0x0008 defer: WM_KILLFOCUS
18 - 0x0281 defer: WM_IME_SETCONTEXT

In general windows events are still getting pumped, but the focus is all messed up. The print dialog is app modal, but never receives proper focus, causing what feels like a freeze, but is actually just a modal dialog the user can't interact with.

I can try not deferring some of these events, I'd guess the problem relates to  focus related events. But I'm still leaning toward the async solution. Seems like it might protect us from all sorts of UI related issues like this.
Depends on: 535036
Attachment #421141 - Flags: review?(bent.mozilla)
Attachment #421141 - Flags: review?(jst)
Comment on attachment 421141 [details] [diff] [review]
use async for user input

Tested a number of other system dialogs that use modal loops and found other issues. So I'm going to have to come up with some other solution.
Attachment #421141 - Flags: review?(jst)
Attachment #421141 - Flags: review?(bent.mozilla)
Blocks: 539955
No longer blocks: 539955
Summary: OOPP: Selecting "Print..." option in Flash context menu locks up firefox → [OOPP] Modal system dialogs cause UI freeze w/windowless controls
I've managed to put together a fix that addresses the freezing. It's caused by a set of events (non-queued events that get deferred, and queued events that back up in the browser's main event queue). Handing some of these to defwndproc or deferring queued messages solves the freeze problem. However this doesn't address issues listed in bug 539835, or bug 543103, both of which a pretty serious.

Which makes me think the original proposed solution would be better than a patch to address the specific issue of freezing: we should send certain ui events through as async, and deal with the fall out from that separately. I doubt regressions from async mouse clicks will be as serious as all the issues mentioned in 539835, and 543103.
(In reply to comment #10)
> I've managed to put together a fix that addresses the freezing. It's caused by
> a set of events (non-queued events that get deferred, and queued events that
> back up in the browser's main event queue). Handing some of these to defwndproc
> or deferring queued messages solves the freeze problem. However this doesn't
> address issues listed in bug 539835, or bug 543103, both of which a pretty
> serious.
> 
> Which makes me think the original proposed solution would be better than a
> patch to address the specific issue of freezing: we should send certain ui
> events through as async, and deal with the fall out from that separately. I
> doubt regressions from async mouse clicks will be as serious as all the issues
> mentioned in 539835, and 543103.

Sorry, scratch bug 543103 from that list, I meant bug 543479 - "[OOPP] Deferred event processing causes back-log & sudden flush of windows events".
Attached patch spin msg loop wip v.1 (obsolete) — Splinter Review
Here's the work in progress patch. Unfortunately I'm still seeing deadlocks in synchronous win32 apis calls made by the child procedure that end up messaging back to the parent while the internal event loop in parent is spinning.
If we can't get this working, an alternative to this approach might be to go back to something I was experimenting with last week - basically break the parent/child relationship between the parent (browser) window and plugin. Essentially, create a top level hidden window in child, and hand that into the plugin as the parent for NPNVnetscapeWindow. There are a number of side effects in doing this related to focus and positioning, but we might be able work those out.
Attached patch spin msg loop wip v.2 (obsolete) — Splinter Review
Improved, but there are still some issues with deadlocks on creation and destruction of ui to work on tomorrow.
Attachment #424827 - Attachment is obsolete: true
Attached patch spin msg loop wip v.3 (obsolete) — Splinter Review
Attachment #424927 - Attachment is obsolete: true
Attached patch spin msg loop wip v.4 (obsolete) — Splinter Review
Attachment #425055 - Attachment is obsolete: true
Attached patch spin msg loop wip v.5 (obsolete) — Splinter Review
This new rev deals better with expected race issues. When modal ui is being displayed and the spin loop is running, out-calls can easily race in the parent. This is expected, and RPCChannel deals with the situation pretty well. mStack and mOutOfTurnReplies can grow temporarily when this happens since the channel waits for a reply to the last-in out-call, but ultimately it'll tear things back down. WaitForNotify also has to be smarter about checking stack, pending, and out-of-turn reply stacks when deciding whether or not deferred processing should be turned back on after dropping out of spin loop.
Attachment #425264 - Attachment is obsolete: true
Attached patch spin msg loop wip v.6 (obsolete) — Splinter Review
Addresses a tricky message ordering issue with mStack/mOutOfTurnReplies during parent races.
Attachment #425500 - Attachment is obsolete: true
Blocks: 543479
Blocks: 539835
Attached patch spin msg loop wip v.7 (obsolete) — Splinter Review
This looks to be a final. I'll kick off some reviews tomorrow after running with it for a bit.
Attachment #425675 - Attachment is obsolete: true
Attached patch merged to m-c (obsolete) — Splinter Review
Attachment #425725 - Attachment is obsolete: true
Attachment #425812 - Flags: review?(bent.mozilla)
Comment on attachment 425812 [details] [diff] [review]
merged to m-c

cjones - for the channel changes
Attachment #425812 - Flags: review?(jones.chris.g)
Comment on attachment 425812 [details] [diff] [review]
merged to m-c

Just to be up front, I left the onus of the review of Windows-only
stuff to bent.  I didn't really look over it at all.

>diff --git a/ipc/glue/RPCChannel.cpp b/ipc/glue/RPCChannel.cpp
>--- a/ipc/glue/RPCChannel.cpp
>+++ b/ipc/glue/RPCChannel.cpp
>@@ -95,32 +95,43 @@ RPCChannel::Call(Message* msg, Message* 
>         ReportConnectionError("RPCChannel");
>         return false;
>     }
> 
>     msg->set_seqno(NextSeqno());
>     msg->set_rpc_remote_stack_depth_guess(mRemoteStackDepthGuess);
>     msg->set_rpc_local_stack_depth(1 + StackDepth());
>     mStack.push(*msg);
>+    ReqType aType = msg->is_sync() ? SYNC_REQ:RPC_REQ;
> 

This is always going to be RPC_REQ.  Sync messages are sent through
SyncChannel::Send().

>     mIOLoop->PostTask(
>         FROM_HERE,
>         NewRunnableMethod(this, &RPCChannel::OnSend, msg));
> 
>     while (1) {
>         // now might be the time to process a message deferred because
>         // of race resolution
>         MaybeProcessDeferredIncall();
> 
>         // here we're waiting for something to happen. see long
>         // comment about the queue in RPCChannel.h
>+
>+        // mStack             - current out-calls awaiting reply
>+        // mPending           - pending msgs received from the io thread
>+        // mOutOfTurnReplies  - out-of-turn replies where 
>+        //                      reply.seqno's < mStack.top().seqno
>+        // mDeferred          - deferred in-calls to be processed
>+

Are the (long) comments about these in RPCChannel.h insufficient?  I'd
prefer to keep the documentation in a single place, so it's less
likely to get out of sync with reality.

>         while (Connected() && mPending.empty() &&
>                (mOutOfTurnReplies.empty() ||
>                 mOutOfTurnReplies.top().seqno() < mStack.top().seqno())) {
>-            WaitForNotify();
>+            if (aType == SYNC_REQ)
>+                SyncChannel::WaitForNotify();
>+            else
>+                RPCChannel::WaitForNotify();

This isn't necessary, per above.

> void
> RPCChannel::MaybeProcessDeferredIncall()
> {
>     AssertWorkerThread();
>     mMutex.AssertCurrentThreadOwns();
> 
>     if (mDeferred.empty())
>         return;
>@@ -302,25 +321,28 @@ RPCChannel::Incall(const Message& call, 
>     AssertWorkerThread();
>     mMutex.AssertNotCurrentThreadOwns();
>     RPC_ASSERT(call.is_rpc() && !call.is_reply(), "wrong message type");
> 
>     // Race detection: see the long comment near
>     // mRemoteStackDepthGuess in RPCChannel.h.  "Remote" stack depth
>     // means our side, and "local" means other side.
>     if (call.rpc_remote_stack_depth_guess() != stackDepth) {
>+#ifndef OS_WIN
>         NS_WARNING("RPC in-calls have raced!");
> 
>         RPC_ASSERT(call.rpc_remote_stack_depth_guess() < stackDepth,
>                    "fatal logic error");
>         RPC_ASSERT(1 == (stackDepth - call.rpc_remote_stack_depth_guess()),
>                    "got more than 1 RPC message out of sync???");
>-        RPC_ASSERT(1 == (call.rpc_local_stack_depth() -mRemoteStackDepthGuess),
>+        RPC_ASSERT(1 == (call.rpc_local_stack_depth() - mRemoteStackDepthGuess),
>                    "RPC unexpected not symmetric");
>-

Can you add a short comment here describing why these checks are
skipped on windows, or pointing elsewhere where it's documented?

>+#else
>+        NS_WARNING("RPC in-calls have raced!");
>+#endif

Nit: this warning shouldn't be in the #ifdef'd section.

>diff --git a/ipc/glue/RPCChannel.h b/ipc/glue/RPCChannel.h
>--- a/ipc/glue/RPCChannel.h
>+++ b/ipc/glue/RPCChannel.h
>@@ -47,16 +47,19 @@
> 
> namespace mozilla {
> namespace ipc {
> //-----------------------------------------------------------------------------
> 
> class RPCChannel : public SyncChannel
> {
> public:
>+    typedef enum { SYNC_REQ, RPC_REQ } ReqType;
>+

Nit: use C++-style |enum ReqType { ... };|.

>@@ -81,21 +84,29 @@ public:
>     // Make an RPC to the other side of the channel
>     bool Call(Message* msg, Message* reply);
> 
>     // Override the SyncChannel handler so we can dispatch RPC
>     // messages.  Called on the IO thread only.
>     NS_OVERRIDE virtual void OnMessageReceived(const Message& msg);
>     NS_OVERRIDE virtual void OnChannelError();
> 
>+#ifdef OS_WIN
>+protected:
>+    virtual void WaitForNotify();

Why does this need to virtual?

>diff --git a/ipc/glue/SyncChannel.cpp b/ipc/glue/SyncChannel.cpp
>--- a/ipc/glue/SyncChannel.cpp
>+++ b/ipc/glue/SyncChannel.cpp
>@@ -98,17 +98,17 @@ SyncChannel::Send(Message* msg, Message*
> 
>     // NB: this is a do-while loop instead of a single wait because if
>     // there's a pending RPC out- or in-call below us, and the sync
>     // message handler on the other side sends us an async message,
>     // the IO thread will Notify() this thread of the async message.
>     // See https://bugzilla.mozilla.org/show_bug.cgi?id=538239.
>     do {
>         // wait for the next sync message to arrive
>-        WaitForNotify();
>+        SyncChannel::WaitForNotify();
>     } while(Connected() &&
>             mPendingReply != mRecvd.type() && !mRecvd.is_reply_error());
> 
>     if (!Connected()) {
>         ReportConnectionError("SyncChannel");
>         return false;
>     }
> 
>diff --git a/ipc/glue/SyncChannel.h b/ipc/glue/SyncChannel.h
>--- a/ipc/glue/SyncChannel.h
>+++ b/ipc/glue/SyncChannel.h
>@@ -88,17 +88,17 @@ public:
> 
> protected:
>     // Executed on the worker thread
>     bool ProcessingSyncMessage() {
>         return mProcessingSyncMessage;
>     }
> 
>     void OnDispatchMessage(const Message& aMsg);
>-    void WaitForNotify();
>+    virtual void WaitForNotify();
> 

And here.

>diff --git a/ipc/glue/WindowsMessageLoop.cpp b/ipc/glue/WindowsMessageLoop.cpp
>--- a/ipc/glue/WindowsMessageLoop.cpp
>+++ b/ipc/glue/WindowsMessageLoop.cpp
>+void
>+RPCChannel::PushToOutOfTurnReplies(Message& msg)
>+{
>+  // Under normal circumstances, mOutOfTurnReplies is ordered sequentially in
>+  // descending order. However, we break the RPC rules when children make use
>+  // of modal loops - parent races after a message deep down in mStack is
>+  // blocked indefinitely. The reply to this blocked call can return at any
>+  // point. If the reply (which is considered by RPCChannel out-of-turn) is
>+  // received with messages already stored in mOutOfTurnReplies, those messages
>+  // below the reply will get trapped:
>+  //
>+  //  1) parent races by sending M1, M2, M3 w/M1 getting blocked by child
>+  //    mStack: M3 M2 M1
>+  //    mOutOfTurnReplies: (empty)
>+  //  2) child replies to M2 w/R2, parent waits for M3
>+  //    mStack: M3 M2 M1
>+  //    mOutOfTurnReplies: R2
>+  //  3) child replies to M1 w/R1
>+  //    mStack: M3 M2 M1
>+  //    mOutOfTurnReplies: R1 R2
>+  //  3) child replies to M3 w/R3
>+  //    mStack: M2 M1
>+  //    mOutOfTurnReplies: R1 R2
>+  //    M2.seqno() > R1.seqno(), R2 is trapped
>+  //
>+  // The number of messages trapped under R1 can be large if parent is racing
>+  // heavily. To prevent this, we check the top seqno of mOutOfTurnReplies when
>+  // a reply is pushed. If the ordering is not correct, we flush 'R1' replies
>+  // down the stack to thier proper position.
>+
>+  if (!mOutOfTurnReplies.empty() && mOutOfTurnReplies.top().seqno() > msg.seqno()) {
>+    // Out of turn is out of order. Push msg on the stack and flush it down to the
>+    // correct position.
>+    mOutOfTurnReplies.push(msg);
>+    std::stack<Message> stack;
>+    while (!mOutOfTurnReplies.empty()) {
>+        Message pending = mOutOfTurnReplies.top();
>+        mOutOfTurnReplies.pop();
>+        if (!mOutOfTurnReplies.empty() && pending.seqno() < mOutOfTurnReplies.top().seqno()) {
>+          Message top = mOutOfTurnReplies.top();
>+          mOutOfTurnReplies.pop();
>+          mOutOfTurnReplies.push(pending);
>+          pending = top;
>+        }
>+        stack.push(pending);
>     }
>+    while (!stack.empty()) {
>+      mOutOfTurnReplies.push(stack.top());
>+      stack.pop();
>+    }
>+  }
>+  else {
>+    // Everything is ok, just push msg.
>+    mOutOfTurnReplies.push(msg);
>   }
> }
> 

Hmm, ouch.  Given this insanity, it might be easier to just make
|mOutOfTurnReplies| be a |std::map<int, Message>|.

I'd like to look over v2.
Comment on attachment 425812 [details] [diff] [review]
merged to m-c

Jim, this is looking really good. Most of these comments are minor.

>+PluginInstanceChild::IsUserInputEvent(UINT msg)

I think we need WM_CONTEXTMENU here as well. I just hit that.

>+// these, an ugly stall condition occurs. To insure child stays in sync, we use

Nit: s/insure/ensure

>+#define CHILD_MODALPUMPTIMEOUT 50
>+#define CHILD_MODALLOOPTIMER   654321

Let's move these to the top of the file where people expect to find #define's.

>+  //PLUGIN_LOG_DEBUG(("PumpTimerProc"));

Uncomment?

>+PluginInstanceChild::NestedInputPumpHook(int code,

The docs say you should only do processing here if |code| >= 0, so please put the ScheduleWork call inside an if test for that.

>+// no issue, if ui is created, the parent can't start processing messages in

Nit: New sentence after "issue".

>+static PluginInstanceChild* gTempChildPointer = NULL;

No need to initialize statics.

>+PluginInstanceChild::NestedInputEventHook(int code,

Need to make sure we don't do anything if code < 0 here too.

>+    gTempChildPointer->InternalCallSetNestedEventState(true);
>+    

Nit: extra spaces instead of simple newline

>+PluginInstanceChild::ReleaseNestedEventHooks()

Seems to me this method does much more than only dealing with the nested event hook. Maybe rename it to "ResetNestedHooks"? Also, if you're taking care to clear everything here then why do the SetNestedInputPumpHook and SetNestedInputEventHook not assert that these hooks are unset? Can we reenter those without hitting this reset function?

>+    ::KillTimer(NULL, CHILD_MODALLOOPTIMER);

I've never been a huge fan of prepending :: to system calls, but if you're going to do it yu should apply it consistently (e.g. UnhookWindowsHookEx and others).

>+PluginInstanceChild::WinlessHandleEvent(NPEvent& event)
> ...
>+  if (mNestedEventLevelDepth <= 0) {

This should really never be < 0, might want to assert that.

>diff --git a/dom/plugins/PluginInstanceChild.h b/dom/plugins/PluginInstanceChild.h
> ...
>+    bool IsUserInputEvent(UINT msg);

This doesn't seem like it needs to be a method. Maybe just a static func in the cpp?

> PluginInstanceParent::ActorDestroy(ActorDestroyReason why)
> ...
>+            PostThreadMessage(GetCurrentThreadId(), gOOPPStopNativeLoopEvent, 0, 0);

Nit: You're past 80 chars here.

>+PluginInstanceParent::RecvSetNestedEventState(const bool& aState)
> ...
>+    PostThreadMessage(GetCurrentThreadId(), aState ?
>+      gOOPPSpinNativeLoopEvent : gOOPPStopNativeLoopEvent, 0, 0);

Nit: we're 4 space indented in this file.

>+    NS_NOTREACHED("PluginInstanceParent::AnswerSetNestedEventState not implemented!");

Nit: You're past 80 chars here.

>+        NS_WARNING("RPC in-calls have raced!");

While you're here can you remove this? It happens all the time and spews too much.

> class RPCChannel : public SyncChannel
> {
> public:
>+    typedef enum { SYNC_REQ, RPC_REQ } ReqType;
>+
>+public:

I don't think the extra |public| is useful.

>diff --git a/ipc/glue/WindowsMessageLoop.cpp b/ipc/glue/WindowsMessageLoop.cpp
> ...
>+bool gInnerEventLoopRunning = false;

Add this to the anonymous namespace.

And, I think it should be an int that increases/decreases as I think these calls can nest.

>+void PurgeDeferredMessages()

Nit: |void| should be on its own line.

And I think we should rename this. It doesn't purge anything, just makes it so the deferred messages are scheduled to run.

>+      if (!Connected()) {
>+        gInnerEventLoopRunning = false;
>+        return false;

This will put us back into the deferred loop, right? Is that what we want? Do we need to fake a gEventLoopMessage message here?

>+    if (::PeekMessageW(&msg, NULL, 0, 0, PM_REMOVE)) {

Again, I think inconsistent use of :: prepend is annoying.

>+RPCChannel::WaitForNotify()

First, I don't think this should be a virtual function. We only ever have one channel, and it's an RPC channel. You just want a different function altogether.

Second, there's a lot of duplicated code/comments here that need to be consolidated somehow. Since it mostly uses globals I think it shouldn't be too hard.

>+    MutexAutoLock lock(mMutex);
>+    if (mStack.empty())
>+      return;
> 
>+    if (!mOutOfTurnReplies.empty() && 
>+        mOutOfTurnReplies.top().seqno() == mStack.top().seqno())
>+      return;
>+
>+    if (!mPending.empty() &&
>+        mPending.front().seqno() == mStack.top().seqno())
>+      return;

This block is duplicated below, you should consolidate this in another function too.

>+          // See comments above - never assume MsgWaitForMultipleObjects will
>+          // get signaled.

Crazy idea, would it make code flow easier if the result of this (consolidated) function posted a thred message so that we *could* assume MsgWaitForMultipleObjects will trigger?

>+      bool haveSentMessagesPending =

This needs to happen after the MsgWaitForMultipleObjects call and before the first PeekMessage call. You added a PeekMessage so this needs to move above it.

> void nsWindow::DispatchPendingEvents()
> {
>   if (mPainting) {
>+    // Bug 514061

Is this supposed to be part of another patch?

>+nsWindow::IsAsyncResponseEvent(UINT aMsg, LRESULT& aResult)
>+{
>+  switch(aMsg) {
> ...
>+    case WM_MOUSEACTIVATE:
>+    aResult = 0;
>+    return true;
>+    break;
>+  }

Style in this file looks like you should indent again beginning with |aResult| there. Also, no need for a |break| after |return|.

>+// defined in ipc\glue\WindowsMessageLoop.cpp
>+extern bool gInnerEventLoopRunning;

This feels a little icky... Should we expose this with a getter like |IsPumpingMessages|?

>+nsWindow::IPCWindowProcHandler(HWND& hWnd, UINT& msg, WPARAM& wParam, LPARAM& lParam)
>+    if (!IsAsyncResponseEvent(msg, res)) {
>+      char szBuf[200];
>+      sprintf(szBuf, "An unhandled ISMEX_SEND message was received during spin loop! (%X)", msg);
>+      NS_WARNING(szBuf);

This should have some kind of #ifdef DEBUG wrapping because there's no need to do the sprintf in release builds. And I think NS_ERROR is more appropriate, that should pop a dialog box.

>+  return true;

Why is this returning a value? Doesn't look like you ever return anything other than true... Did you forget a |return false| in the error case?

>+  if (gInnerEventLoopRunning && mPainting)

Ok, I'm sold - let's make this a RPCChannel static getter, please.
Most of the this has been addressed. Comments on some issues below - 

(In reply to comment #24)
> (From update of attachment 425812 [details] [diff] [review])
> 
> >+RPCChannel::WaitForNotify()
> 
> Second, there's a lot of duplicated code/comments here that need to be
> consolidated somehow. Since it mostly uses globals I think it shouldn't be too
> hard.

I pulled some of the commenting from the second method, split the stack check code out, and added some simple "see above" comments. I think though for now splitting the wait/peek code out into functions might be a pain if we need to debug something in one of the wait methods. So I've left that alone for the most part. (I didn't see a clean way to break it out anyway after I moved the queue call up.)
 
> 
> >+    MutexAutoLock lock(mMutex);
> >+    if (mStack.empty())
> >+      return;
> > 
> >+    if (!mOutOfTurnReplies.empty() && 
> >+        mOutOfTurnReplies.top().seqno() == mStack.top().seqno())
> >+      return;
> >+
> >+    if (!mPending.empty() &&
> >+        mPending.front().seqno() == mStack.top().seqno())
> >+      return;
> 
> This block is duplicated below, you should consolidate this in another function
> too.
> 
> >+          // See comments above - never assume MsgWaitForMultipleObjects will
> >+          // get signaled.
> 
> Crazy idea, would it make code flow easier if the result of this (consolidated)
> function posted a thred message so that we *could* assume
> MsgWaitForMultipleObjects will trigger?

I didn't see a good reason for this, why message the loop and drop back down into deferred processing when we can just exit from there directly?

> 
> > void nsWindow::DispatchPendingEvents()
> > {
> >   if (mPainting) {
> >+    // Bug 514061
> 
> Is this supposed to be part of another patch?

Actually the drawing issues I found while testing were related. but no big deal, I pulled it. I have it in my patches if i need it.

> 
> >+nsWindow::IPCWindowProcHandler(HWND& hWnd, UINT& msg, WPARAM& wParam, LPARAM& lParam)
> >+    if (!IsAsyncResponseEvent(msg, res)) {
> >+      char szBuf[200];
> >+      sprintf(szBuf, "An unhandled ISMEX_SEND message was received during spin loop! (%X)", msg);
> >+      NS_WARNING(szBuf);
> 
> This should have some kind of #ifdef DEBUG wrapping because there's no need to
> do the sprintf in release builds. And I think NS_ERROR is more appropriate,
> that should pop a dialog box.

Put this in an ifdef debug, if we run into problems with testers we might consider putting it in a release build temporarily. Knowing the message number is useful. NS_ERROR wouldn't work though, I pass certain send events here untouched that don't interfere with windowing.
 
> 
> >+  return true;
> 
> Why is this returning a value? Doesn't look like you ever return anything other
> than true... Did you forget a |return false| in the error case?

I pulled this, originally it was for dumping certain events to DefWndProc, but as it turned out that wasn't unneeded.

> 
> >+  if (gInnerEventLoopRunning && mPainting)
> 
> Ok, I'm sold - let's make this a RPCChannel static getter, please.

done.
Attached patch spin msg loop wip v.9 (obsolete) — Splinter Review
Attachment #425812 - Attachment is obsolete: true
Attachment #425930 - Flags: review?(jones.chris.g)
Attachment #425812 - Flags: review?(jones.chris.g)
Attachment #425812 - Flags: review?(bent.mozilla)
Attachment #425930 - Flags: review?(bent.mozilla)
Comment on attachment 425930 [details] [diff] [review]
spin msg loop wip v.9

Disclaimer again: I'm only reviewing the non-windows *Channel changes.

>diff --git a/ipc/glue/RPCChannel.h b/ipc/glue/RPCChannel.h
>--- a/ipc/glue/RPCChannel.h
>+++ b/ipc/glue/RPCChannel.h
>@@ -81,17 +81,39 @@ public:
>+    static void PluginModalLoopIsActive(bool aIsActive) {
>+        if (aIsActive)
>+            sInnerEventLoopDepth++;
>+        else
>+            sInnerEventLoopDepth--;
>+        NS_ASSERTION(!(sInnerEventLoopDepth < 0),
>+            "sInnerEventLoopDepth dropped below zero!");
>+    }
>+

Nit: this is generic code, for better or worse, not really plugin
specific.  I'd go with "ModalLoopIsActive()".  We might need to use
some of these hacks for other e10s work.

Nit 2: I also prefer separate "EnterModalLoop()/ExitModalLoop()"
methods.
Attachment #425930 - Flags: review?(jones.chris.g) → review+
Attached patch spin msg loop wip v.10 (obsolete) — Splinter Review
updated to comments
Attachment #425930 - Attachment is obsolete: true
Attachment #425987 - Flags: review?(bent.mozilla)
Attachment #425930 - Flags: review?(bent.mozilla)
Attachment #425987 - Attachment is patch: true
Comment on attachment 425987 [details] [diff] [review]
spin msg loop wip v.10

>+PluginInstanceChild::PumpTimerProc(HWND hwnd,

Nit: extra newline after this method

>+PluginInstanceChild::NestedInputEventHook(int nCode,
>+                                          WPARAM wParam,
>+                                          LPARAM lParam)
>+{
>+    if (!gTempChildPointer) {
>+        NS_ERROR("!gTempChildPointer");
>+        return CallNextHookEx(NULL, nCode, wParam, lParam);
>+    }

I think this should only appear inside the |nCode >= 0| block. Popping a dialog in a hook when we're not supposed to do any processing sounds... bad.

>+  gTempChildPointer = this;

Seems like we should assert !gTempChildPointer before setting here. Just to be safe. Or, actually... Since we can reenter here maybe you should save the old value of gTempChildPointer on the stack and set it back on unwind like we do with SetNestableTasksAllowed?

>+PluginInstanceParent::RecvSetNestedEventState(const bool& aState)
> ...
>+    return true;

Can you move this inside the OS_WIN block?

>+    static void ExitModalLoop() {
>+        sInnerEventLoopDepth--;
>+        NS_ASSERTION(!(sInnerEventLoopDepth < 0),
>+            "sInnerEventLoopDepth dropped below zero!");
>+    }

Ha! Let's assert |sInnerEventLoopDepth >= 0| :)
Attached patch spin msg loop wip v.10 (obsolete) — Splinter Review
Attachment #425987 - Attachment is obsolete: true
Attachment #425987 - Flags: review?(bent.mozilla)
Attachment #421141 - Attachment is obsolete: true
Attached patch merged to m-cSplinter Review
doesn't look like bsmedberg's draw patch is on m-c.
Attachment #426067 - Attachment is obsolete: true
Comment on attachment 426069 [details] [diff] [review]
merged to m-c

r=me with the little assertion fix we talked about on irc.
Attachment #426069 - Flags: review+
Attachment #426351 - Flags: review?(jmathies)
Attachment #426351 - Flags: review?(jmathies) → review+
When we first landed this, I had some stack check code in here that returned early from WaitForNotify() if there were pending messages that could be processed and no corresponding event loop message. When we first landed the hang detector, this caused rpc assertions that aborted, so I pulled to stack checks. Later we ended up pulling the asserts due to other issues, which left this patch with a nasty hang due to a missing final gEventLoopMessage when the stack unwinds from spin loop. Rather than put the stack check code back in, we might as well just fall out of wfn() early, rpc channel will call back in if it needs to. This code could be put back in if we ever decide to fix bug 545338.
Attachment #426351 - Attachment is obsolete: true
Attachment #426499 - Flags: review?(bent.mozilla)
Attachment #426499 - Flags: review?(bent.mozilla) → review+
This may also address: 

bug 543479
bug 539835
bug 545544
Verified Fixed on Trunk.

I do notice that the browser can hang on shutdown with one or more plugins open, but that could just be a fluke on my end or a separate bug.  It does appear to have addressed the three bugs in the last comment.

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.3a2pre) Gecko/20100218 Minefield/3.7a2pre (.NET CLR 3.5.30729) ID:20100218045818
Status: RESOLVED → VERIFIED
Flags: in-litmus?
Blanket approval for Lorentz merge to mozilla-1.9.2
a=beltzner for 1.9.2.4 - please make sure to mark status1.9.2:.4-fixed
We had a bad merge on this and ended up with duplicated code in the OnPaint handler of Windows. (Lines 213 and 247)

http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsWindowGfx.cpp#211
Attachment #529780 - Flags: review?(benjamin)
Attachment #529780 - Flags: review?(benjamin) → review+
(In reply to comment #41)
> Created attachment 529780 [details] [diff] [review] [review]
> remove duplicate code
> 
> We had a bad merge on this and ended up with duplicated code in the OnPaint
> handler of Windows. (Lines 213 and 247)
> 
> http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsWindowGfx.
> cpp#211

http://hg.mozilla.org/mozilla-central/rev/495dc2761754
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.