<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Assignee

Comment 1

•

16 years ago

I can do some testing on this as I have a tablet handy from our touch input work.

Blocks: OOPP

Assignee

Comment 2

•

16 years ago

Related stack: http://crash-stats.mozilla.com/report/index/862f7a05-232b-4297-9184-c821d2100418

Assignee

Comment 3

•

16 years ago

another: http://crash-stats.mozilla.com/report/index/a523936d-4a1b-46c1-ac1f-00ae42100414

timeless

Updated

•

16 years ago

Component: Plug-ins → Silverlight (Microsoft)

Product: Core → Plugins

QA Contact: plugins → microsoft-silverlight

Version: Trunk → unspecified

Assignee

Updated

•

16 years ago

Summary: crash [@ ntdll.dll@0x200fd ] → crash [@ ntdll.dll@0x200fd ] Crashes in TSF/Silverlight on tablets

Assignee

Updated

•

16 years ago

Component: Silverlight (Microsoft) → Plug-ins

Product: Plugins → Core

QA Contact: microsoft-silverlight → plugins

Summary: crash [@ ntdll.dll@0x200fd ] Crashes in TSF/Silverlight on tablets → crash [@ ntdll.dll@0x200fd ] Crashes in TSF with Flash/Silverlight on tablets

Version: unspecified → Trunk

Comment 4

•

15 years ago

Blocking? Based on hang stats, this comes up as about 3.5% of our hangs from yesterday. I'm not so worried about that number directly, but I am worried that people with tablets or other special system drivers will either get this hang all the time, or a significant portion of the time.

blocking1.9.2: --- → ?

https://crash-stats.mozilla.com/report/index/83385e1c-56c7-4730-9737-978bb2100505

Comment 5

•

15 years ago

Updated

•

15 years ago

Summary: crash [@ ntdll.dll@0x200fd ] Crashes in TSF with Flash/Silverlight on tablets → crash [@ ntdll.dll@0x200fd ] [@ hang | KiFastSystemCallRet | NtUserPeekMessage]Crashes in TSF with Flash/Silverlight on tablets

juan becerra [:juanb]

Comment 6

•

15 years ago

I've this today on a Windows 7 vm, so it may not be on tablets only: http://crash-stats.mozilla.com/report/index/bp-7269b4df-783f-4974-9b89-73fac2100506 I was trying to reproduce bug 563936 when I encountered this hang. I had an existing Silverglight 3 installation, and I uninstalled it using the option in the Control Panel. Then I ran a Silverlight 4 installer I had previously downloaded. Then I launched Fx 3.6.4build3 and tried to play one of the videos from the link below, then while it seemed to be loading I tried right clicking to bring up the context menu. Generally that's how I got into that hang. The browser window froze, but I was able to launch another window by double clicking on my Fx desktop icon. http://www.silverlight.net/learn/videos/silverlight-4-videos/

Comment 7

•

15 years ago

That's a different bug yet again, and worth filing separately. Let's keep this one focused on OLE32 hangs where it's spinning a nested PeekMessage loop to handle multi-process COM.

Summary: crash [@ ntdll.dll@0x200fd ] [@ hang | KiFastSystemCallRet | NtUserPeekMessage]Crashes in TSF with Flash/Silverlight on tablets → crash [@ ntdll.dll@0x200fd ] [@ hang | KiFastSystemCallRet | NtUserPeekMessage]Crashes in TSF with Flash/Silverlight on tablets (OLE/COM nested event loop)

Comment 9

•

15 years ago

From the duplicate bug, this happens when trying to exit fullscreen videos on Spike.com, flash 10.1

Comment 10

•

15 years ago

->me for investigation, I don't know how we'd know that this nested loop is entered when it only peeks the special OLE messages.

Assignee: nobody → benjamin

Comment 11

•

15 years ago

Not sure if this is biting us but it sure doesn't fill me with confidence: "Programs May Function Erratically or Close Unexpectedly on the Tablet PC" http://support.microsoft.com/kb/827220

Comment 12

•

15 years ago

Note possibly similar chrome bugs: http://code.google.com/p/chromium/issues/detail?id=3011 http://code.google.com/p/chromium/issues/detail?id=29138

Comment 13

•

15 years ago

Just to clarify, the basic cause of this hang is that CARET::UpdateMSAAEditFieldState (tiptsf.dll) is calling into accessibility code. That code is calling QueryInterface on a COM object which is hosted in a different process. The OLE RPC code communicates using windows messages and enters a nested event loop of sorts here: CRpcChannelBuffer::SwitchAptAndDispatchCall WdtpInterfacePointer_UserUnmarshalWorker ModalLoop CCliModalLoop::BlockFn That code is *probably* waiting for Mozilla to respond to the OLE call, but Mozilla is blocked. It doesn't appear that the plugin is processing the incoming RPC call when the hangs happens, although I'm not sure of that because the stack gets lost in Flash. This means the nested event loop detection mechanism that normally operates within an RPC call is not active, and this is basically just a race where both processes are waiting for RPC responses from the other process, using different RPC mechanisms.

Comment 14

•

15 years ago

That's very helpful thanks. (Aside: the stacks I posted seemed similar to the duped spike.com bug)

nightman362

Comment 15

•

15 years ago

My crash bug is on an HP touchsmart 300-1000 all in one PC. It has a touchscreen, but I was not using it. To my knowlege, I was not running any other program, such as IE. My background programs were: Rainmeter COMODO internet security Microsoft Security Essentuals ATI Caytlist control center v. 10.2

nightman362

Comment 16

•

15 years ago

I meant to say I was not using any other browser. Firefox's UI was also slow, and CPU Usage spiked at 50-60%, while on Opera and Chrome, CPU usage was a good 10-20%. I also had Task manager open, Firefox, and not Flash, via the plugin container was using the extra CPU.

Andy Rivas

Comment 17

•

15 years ago

Does someone have a reproducible crash with a Silvelright application in the stack?

Comment 18

•

15 years ago

STR: * Open the Tablet PC Input Panel and write something in it * Launch Firefox, go to spike.com * View one of the movie trailiers, open it full-screen * click out of full-screen mode I still really really don't know how to fix it.

chris hofmann

Reporter

Comment 19

•

15 years ago

(In reply to comment #11) > Not sure if this is biting us but it sure doesn't fill me with confidence: > "Programs May Function Erratically or Close Unexpectedly on the Tablet PC" > http://support.microsoft.com/kb/827220 maybe the same, but that seems to be talking about an XP tablet version. The ntdll.dll@0x200fd is 100% Windows NT 6.1.7600 (windows 7) for all 152 reports we have seen since the first of may. The hang | KiFastSystemCallRet signature is spread over a larger number of OS versions. 8305 hang | KiFastSystemCallRet Windows NT 5.1.2600 2081 hang | KiFastSystemCallRet Windows NT 6.0.6002 1937 hang | KiFastSystemCallRet Windows NT 6.1.7600 575 hang | KiFastSystemCallRet Windows NT 6.0.6001 390 KiFastSystemCallRet Windows NT 5.1.2600 339 hang | KiFastSystemCallRet Windows NT 6.0.6000 49 hang | KiFastSystemCallRet Windows NT 6.1.7100 23 hang | KiFastSystemCallRet Windows NT 5.2.3790 10 KiFastSystemCallRet Windows NT 6.0.6002 8 KiFastSystemCallRet Windows NT 6.1.7600 7 KiFastSystemCallRet Windows NT 6.0.6001 4 KiFastSystemCallRet Windows NT 6.0.6000

Comment 20

•

15 years ago

Yeah, this is not a crash, and isn't related to that KB article or the chromium crashes much.

Updated

•

15 years ago

status1.9.2: --- → wanted

Comment 21

•

15 years ago

http://crash-stats.mozilla.com/report/index/52248f1d-747e-4d94-9dc5-2828f2100507 Interesting but normal stack soon before the hang, in the parent process: > xul.dll!nsAccessibleWrap::QueryInterface(iid={...}, ppv=0x003dea34) Line 107 C++ xul.dll!nsHyperTextAccessibleWrap::QueryInterface(iid={...}, ppv=0x003dea34) Line 49 C++ xul.dll!nsDocAccessibleWrap::QueryInterface(iid={...}, ppv=0x003dea34) Line 90 C++ ole32.dll!CDestObjectWrapper::MarshalInterface() ole32.dll!_CoMarshalInterface@24() OLEACC.dll!MarshalInterface() OLEACC.dll!_LresultFromObject@12() xul.dll!nsWindow::LresultFromObject(riid={...}, wParam=0x00000b80, pAcc=0x049d055c) Line 6515 C++ xul.dll!__SEH_epilog4_GS() C++ xul.dll!nsWindow::WindowProc(hWnd=0x00000001, msg=0x0000003d, wParam=0x00000b80, lParam=0xfffffffc) Line 3725 C++ USER32.dll!_InternalCallWinProc@20() USER32.dll!_UserCallWinProcCheckWow@32() USER32.dll!_DispatchClientMessage@20() USER32.dll!___fnDWORD@4() ntdll.dll!_KiUserCallbackDispatcher@12() USER32.dll!_NtUserPeekMessage@20() USER32.dll!__PeekMessage@24() USER32.dll!_PeekMessageW@20() ole32.dll!CCliModalLoop::MyPeekMessage() ole32.dll!CCliModalLoop::PeekRPCAndDDEMessage() ole32.dll!CCliModalLoop::FindMessage() ole32.dll!CCliModalLoop::HandleWakeForMsg() ole32.dll!CCliModalLoop::BlockFn() ole32.dll!ModalLoop() ole32.dll!ThreadSendReceive() ole32.dll!CRpcChannelBuffer::SwitchAptAndDispatchCall() ole32.dll!CRpcChannelBuffer::SendReceive2() ole32.dll!CCliModalLoop::SendReceive() ole32.dll!CAptRpcChnl::SendReceive() ole32.dll!CCtxComChnl::SendReceive() RPCRT4.dll!_NdrProxySendReceive@8() RPCRT4.dll!@NdrpProxySendReceive@4() RPCRT4.dll!_NdrClientCall2() RPCRT4.dll!_ObjectStublessClient@8() RPCRT4.dll!_ObjectStubless@0() tiptsf.dll!CCPenIMX::UpdateStateSink() tiptsf.dll!CCPenIMX::ActivateUI() tiptsf.dll!CPenIMX_WindowProc() USER32.dll!_InternalCallWinProc@20() USER32.dll!_UserCallWinProcCheckWow@32() USER32.dll!_DispatchMessageWorker@8() USER32.dll!_DispatchMessageW@4() xul.dll!nsAppShell::ProcessNextNativeEvent(mayWait=0x00000000) Line 179 C++

Updated

•

15 years ago

blocking1.9.2: ? → .4+

Comment 22

•

15 years ago

(In reply to comment #13) > It doesn't appear that the plugin is processing the incoming RPC call when the > hangs happens, although I'm not sure of that because the stack gets lost in > Flash. Can we get debugging help from Adobe to confirm this?

Comment 23

•

15 years ago

So, I tried to understand what this bug report is all about, and how I can help with that. So, first, let me try to explain what I understood from this bug report, because if my understanding is not correct, the rest of my message can be fed to birds (or pigs). The stack in comment 0 is coming from our child process. Our child process's message loop is dispatching a message which is handled by the Tablet PC Input control thing, which calls IAccessible::get_accState, which leads to a call being dispatched to our parent process. That is done using a sync COM message to the message loop (on our main thread?) which is probably sent using some clone of SendMessage, which blocks the caller, and hence our child process. However, in the mean time, the parent is also blocked on something (I don't know what that is, is it the child process? I also don't know how we're being blocked, like are we waiting on a handle?), which leads to a deadlock, which users experience as a hang. If that is the case, we're being bitten by the COM threading model suckiness! I think we can fix this using the approach laid out in this thread (read the whole thread): http://www.tech-archive.net/Archive/Development/microsoft.public.win32.programmer.ole/2007-05/msg00070.html To summarize, it means implementing IMessageFilter in our parent process, and registering it with COM so that it doesn't expect us to respond to messages when we can't. Am I on the right track here?

Mike Beltzner [:beltzner, not reading bugmail]

Comment 24

•

15 years ago

Yes, you are! I have a patch which implements this using IMessageFilter::MessagePending, although it currently does it in a hamfisted way by spinning the chromium event loop in the child if an incoming message races with the COM call. Not only does this sound dangerous, but MessageLoop::RunAllPending doesn't work (it never exits). So I'm going to try a more targeted approach which detects RPC races specifically and only processes those.

Comment 25

•

15 years ago

So is Benjamin or Ehsan working on this particular bug? Or, hope against hope, both?

Comment 26

•

15 years ago

(In reply to comment #24) > Yes, you are! > > I have a patch which implements this using IMessageFilter::MessagePending, > although it currently does it in a hamfisted way by spinning the chromium event > loop in the child if an incoming message races with the COM call. Not only does > this sound dangerous, but MessageLoop::RunAllPending doesn't work (it never > exits). So I'm going to try a more targeted approach which detects RPC races > specifically and only processes those. Could you attach your patch, so that I can take a look at it? (In reply to comment #25) > So is Benjamin or Ehsan working on this particular bug? Or, hope against hope, > both? Benjamin is, but I can step in too if needed. I may need someone to hold my hand a bit as I go through our ipc code for the first time, though. :-) Benjamin, do we have a set of STR to reporduce this problem on non-tablet PCs? Because I don't have access to one...

Comment 27

•

15 years ago

comment 18 is still correct, you don't need a tablet PC (and I don't have one either)

Comment 28

•

15 years ago

Attached patch WIP, broken (obsolete) — Details — Splinter Review

I'm seeing some very strange behavior currently, so I tried #if 0 all the interesting code. So you'll need to remove the #if 0 bits.

Comment 29

•

15 years ago

I'm testing with http://www.spike.com/video/getting-comfy-with/3387077

URL: http://www.spike.com/video/getting-co...

Comment 30

•

15 years ago

Some discussions on the problem and possible solution: http://etherpad.mozilla.org:9000/win32-plugin-rpc-deadlock

Comment 31

•

15 years ago

Attached patch Appears to work!, rev. 2 — Details — Splinter Review

Attachment #445127 - Attachment is obsolete: true

Attachment #445185 - Flags: review?(jones.chris.g)

Attachment #445185 - Flags: review?(jmathies)

Attachment #445185 - Flags: review?(ehsan)

Comment 32

•

15 years ago

Comment on attachment 445185 [details] [diff] [review] Appears to work!, rev. 2 >diff --git a/dom/plugins/COMMessageFilter.cpp b/dom/plugins/COMMessageFilter.cpp >+DWORD >+COMMessageFilter::RetryRejectedCall(HTASK htaskCallee, >+ DWORD dwTickCount, >+ DWORD dwRejectType) >+{ >+ if (mPreviousFilter) >+ return mPreviousFilter->RetryRejectedCall(htaskCallee, dwTickCount, >+ dwRejectType); >+ return 0; MSDN says that the default value here is DWORD(-1). I know that this doesn't matter because we hand this off to the previous filter (which is the OS provided one), but can you change this return value for safety? >diff --git a/dom/plugins/PluginModuleChild.cpp b/dom/plugins/PluginModuleChild.cpp >--- a/dom/plugins/PluginModuleChild.cpp >+++ b/dom/plugins/PluginModuleChild.cpp >@@ -59,16 +59,20 @@ > #include "mozilla/plugins/StreamNotifyChild.h" > #include "mozilla/plugins/BrowserStreamChild.h" > #include "mozilla/plugins/PluginStreamChild.h" > #include "mozilla/plugins/PluginThreadChild.h" > #include "PluginIdentifierChild.h" > > #include "nsNPAPIPlugin.h" > >+#ifdef XP_WIN >+#include "COMMessageFilter.h" >+#endif >+ > using namespace mozilla::plugins; > > #if defined(XP_WIN) > #ifndef WM_MOUSEHWHEEL > #define WM_MOUSEHWHEEL 0x020E > #endif > #endif > >@@ -120,16 +124,18 @@ PluginModuleChild::current() > bool > PluginModuleChild::Init(const std::string& aPluginFilename, > base::ProcessHandle aParentProcessHandle, > MessageLoop* aIOLoop, > IPC::Channel* aChannel) > { > PLUGIN_LOG_DEBUG_METHOD; > >+ COMMessageFilter::Initialize(this); >+ Won't this fail to compile on Mac/Linux? You need to wrap it in a XP_WIN check I think. r=me on with these two issues addressed. I only reviewed the messagefilter code thoroughly, and barely reviewed the message loop change. I didn't look at the rest because I don't understand that code anyways. :-)

Attachment #445185 - Flags: review?(ehsan) → review+

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

15 years ago

Attachment #445185 - Flags: review?(jmathies) → review+

Comment 33

•

15 years ago

Comment on attachment 445185 [details] [diff] [review] Appears to work!, rev. 2 >diff --git a/ipc/glue/RPCChannel.cpp b/ipc/glue/RPCChannel.cpp >--- a/ipc/glue/RPCChannel.cpp >+++ b/ipc/glue/RPCChannel.cpp > void >+RPCChannel::FlushPendingRPCQueue() >+{ >+ AssertWorkerThread(); >+ mMutex.AssertNotCurrentThreadOwns(); >+ >+ { >+ MutexAutoLock lock(mMutex); >+ >+ if (mDeferred.empty()) { >+ if (mPending.empty()) >+ return; >+ >+ Message last = mPending.back(); This needs to be |const Message& last = mPending.back();| or else you're going to copy the entire message payload. >+ if (!last.is_rpc() || last.is_reply()) >+ return; This test makes me a bit nervous because of Windows wakeup insanity. IIRC it's possible on the browser side to wake up from an RPC wait because of a windows message (in the spin loop) and then dispatch any arbitrary XPCOM task. If the task sends an an async message, say stream data delivery, then the back of the queue might be an async message even though there's an RPC message earlier in the queue. Also note that this wakeup insanity might result in there being more than one RPC message in the queue. >+bool > RPCChannel::OnMaybeDequeueOne() > { > // XXX performance tuning knob: could process all or k pending > // messages here > > AssertWorkerThread(); > mMutex.AssertNotCurrentThreadOwns(); > > Message recvd; > { > MutexAutoLock lock(mMutex); > > if (!Connected()) { > ReportConnectionError("RPCChannel"); >- return; >+ return false; > } > > if (!mDeferred.empty()) > return MaybeProcessDeferredIncall(); > I think this may be a bug carried over from the patch I sent you. If in the wonderful world of Windows insanity the following sequence of events is possible browser plugin ------------ --------------- rpc Foo--> <--rpc Bar [plugin wins] [defer Foo] process Bar asyc A--> [wakeup for A] call Baz--> <--OLE RPC then mDeferred will be non-empty, but (deferred) Foo won't be processed because the time isn't right. Then we'll return here without processing Baz and deadlock on the OLE RPC. So, we could (1) File this theoretical case away until we see it in the wild (2) Change the logic to if (!mDeferred.empty() && MaybeProcessDeferredIncall()) return true; I'm fine with either. >--- a/ipc/glue/RPCChannel.h >+++ b/ipc/glue/RPCChannel.h >@@ -161,16 +161,22 @@ public: > > // Override the SyncChannel handler so we can dispatch RPC > // messages. Called on the IO thread only. > NS_OVERRIDE > virtual void OnMessageReceived(const Message& msg); > NS_OVERRIDE > virtual void OnChannelError(); > >+ /** >+ * When we detect that an outbound OLE RPC call is being made, we may need to unblock the parent >+ * by processing inbound RPC calls (and any preceding asynchronous messages). >+ */ Two points: first, this method name doesn't describe what your patch does, wrt the comment above. Second, for better or worse, this is a generic interface now, and the comment should describe what the method does rather than how it's used. >+ /** >+ * Process one deferred or pending RPC message. >+ * @return true if a message was processed >+ */ >+ bool OnMaybeDequeueOne(); >+ This comment is wrong, async and sync messages are dequeued and processed as well. Leaving r? pending how you want to address the review comments above. As it stands I would r+ with the Message copy fixed and inaccurate comments and method names edited.

Comment 34

•

15 years ago

> >+ if (!last.is_rpc() || last.is_reply()) > >+ return; > > This test makes me a bit nervous because of Windows wakeup insanity. IIRC it's > possible on the browser side to wake up from an RPC wait because of a windows > message (in the spin loop) and then dispatch any arbitrary XPCOM task. Hrm, I really don't think this is possible. The small set of sent messages that we allow to be processed shouldn't ever re-enter XPCOM tasks or send more messages. And we can't enter the full-on nested loop until we've processed the incoming RPC message (in which case it will no longer be pending!) > I think this may be a bug carried over from the patch I sent you. If in the > wonderful world of Windows insanity the following sequence of events is > possible > > browser plugin > ------------ --------------- > rpc Foo--> <--rpc Bar > [plugin wins] [defer Foo] > process Bar > asyc A--> [wakeup for A] > call Baz--> <--OLE RPC > > then mDeferred will be non-empty, but (deferred) Foo won't be processed because > the time isn't right. Then we'll return here without processing Baz and > deadlock on the OLE RPC. So, we could > > (1) File this theoretical case away until we see it in the wild > (2) Change the logic to > > if (!mDeferred.empty() && MaybeProcessDeferredIncall()) > return true; Unless you're extremely confident that #2 won't mess up in-order delivery, I choose #1 here: I think that the chances of a plugin ending up in an OLE call while in an RPC frame are pretty slim to begin with.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 35

•

15 years ago

Attached patch Updated, rev. 2.1 (obsolete) — Details — Splinter Review

Attachment #445354 - Flags: review?(jones.chris.g)

Comment 36

•

15 years ago

(In reply to comment #34) > > the following sequence of events is > > possible > > > > browser plugin > > ------------ --------------- > > rpc Foo--> <--rpc Bar > > [plugin wins] [defer Foo] > > process Bar > > asyc A--> [wakeup for A] > > call Baz--> <--OLE RPC > > > > then mDeferred will be non-empty, but (deferred) Foo won't be processed because > > the time isn't right. Then we'll return here without processing Baz and > > deadlock on the OLE RPC. So, we could > > > > (1) File this theoretical case away until we see it in the wild > > (2) Change the logic to > > > > if (!mDeferred.empty() && MaybeProcessDeferredIncall()) > > return true; > > Unless you're extremely confident that #2 won't mess up in-order delivery, I > choose #1 here: I think that the chances of a plugin ending up in an OLE call > while in an RPC frame are pretty slim to begin with. It wouldn't, but I'm perfectly fine playing it safe here until crash-stats dictates otherwise.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

15 years ago

Attachment #445354 - Flags: review?(jones.chris.g) → review+

Assignee

Comment 37

•

15 years ago

Attached patch merged to trunk (obsolete) — Details — Splinter Review

- added RetryRejectedCall fix

Attachment #445354 - Attachment is obsolete: true

Assignee

Updated

•

15 years ago

Attachment #445185 - Flags: review?(jones.chris.g)

http://hg.mozilla.org/mozilla-central/rev/65af0d77ff88

Assignee

Comment 38

•

15 years ago

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Assignee

Updated

•

15 years ago

Attachment #445725 - Flags: approval1.9.2.4?

Assignee

Comment 39

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/78923ef103cb follow up build bustage fix for linux/osx.

Daniel Veditz [:dveditz]

Comment 40

•

15 years ago

Comment on attachment 445725 [details] [diff] [review] merged to trunk >+ * The Initial Developer of the Original Code is >+ * the Mozilla Foundation <http://www.mozilla.org/>. Almost all the new files I see seem to use Mozilla Corporation, which seems more accurate (though ultimately equivalent). Did we change our policy?

Comment 41

•

15 years ago

The policy has always been to use "The Mozilla Foundation", which is the actual copyright holder/assignee for all MoCo code.

Assignee

Comment 42

•

15 years ago

Backed out due to various test failure issues. This needs some try server goodness first!

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Assignee

Comment 43

•

15 years ago

Bug 566395 - intermittent orange: modules/plugin/test/test_windowed_invalidate.html | Test timed out. Bug 566392 - REFTEST TEST-UNEXPECTED-FAIL | file:///e:/builds/moz2_slave/mozilla-central-win32-debug-unittest-crashtest/build/reftest/tests/layout/base/crashtests/325967-1.html | timed out waiting for reftest-wait to be removed (after onload fired)

Comment 44

•

15 years ago

... and many of the debug mochitest runs had leaks.

Comment 46

•

15 years ago

Comment on attachment 445725 [details] [diff] [review] merged to trunk a=LegNeato for 1.9.2.4. Please land on both mozilla-1.9.2 default and GECKO1924_20100413_RELBRANCH

Attachment #445725 - Flags: approval1.9.2.4? → approval1.9.2.4+

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 47

•

15 years ago

Also please remember to include/land the follow-up bustage fix.

Comment 49

•

15 years ago

Looks like this also caused a 50% (!) Tp4 regression on Windows.

Updated

•

15 years ago

Assignee: benjamin → jmathies

Assignee

Comment 50

•

15 years ago

Looks like this is our current remaining blocker. I'll do a little investigating today to see if there's something obvious regarding the Tp4 regression. If not I suppose we'll have to look for a different approach, possible an api hook into the call chain.

Comment 51

•

15 years ago

The only thing which could realistically effect Tp is posting WM_NULL in MessagePumpForUI::ScheduleWork.

Assignee

Comment 52

•

15 years ago

(In reply to comment #51) > The only thing which could realistically effect Tp is posting WM_NULL in > MessagePumpForUI::ScheduleWork. I was wondering if there might be heavy COM overhead added when installing the filter?

Comment 53

•

15 years ago

The filter is only installed once per plugin process: even if there were heavy overhead, it wouldn't show up in Tp4.

Assignee

Comment 54

•

15 years ago

(In reply to comment #53) > The filter is only installed once per plugin process: even if there were heavy > overhead, it wouldn't show up in Tp4. Couldn't we test that theory by removing it a posting to try for Tp4 runs? bsmedberg, curious if you're working on this or if you want me to look int it? I have the theming work to continue on so i don't want to overlap here..

Comment 55

•

15 years ago

I just got back in, I probably won't have a chance to look at it until tomorrow. If you have theories, please test them.

Comment 56

•

15 years ago

Also, seems strange this would regress only on XP...

Assignee

Comment 57

•

15 years ago

(In reply to comment #54) > (In reply to comment #53) > > The filter is only installed once per plugin process: even if there were heavy > > overhead, it wouldn't show up in Tp4. > > Couldn't we test that theory by removing it a posting to try for Tp4 runs? with PostMessage(WM_NULL): WINNT 6.0 try talos tp4 tp4: 859.03 (details) WINNT 5.1 try talos tp4 tp4: 554.97 (details) without PostMessage(WM_NULL): WINNT 6.0 try talos tp4 tp4: 769.42 (details) WINNT 5.1 try talos tp4 tp4: 443.22 (details) Theory proven!

Comment 58

•

15 years ago

Do we have any data on how many times that message is posted? Also, do we have any data on how expensive handling it is? Can we special case its handling and avoid doing anything (even calling DefWindowProc)?

Comment 59

•

15 years ago

I pushed http://hg.mozilla.org/try/rev/a1d7df2d945f WINNT 6.0 try talos tp4 tp4: 921.81 WINNT 5.1 try talos tp4 tp4: 523.48 That avoids the rest of the patch and just posts WM_NULL and special-cases WM_NULL when we get it in the chromium hidden windowproc. Life is still bad. We post this WM_NULL at the same frequency we post kMsgHaveWork (WM_USER + 1 in chromium-land): chromium is designed so that there is only ever one of these in the queue at a time. I really don't know where to go from here.

Comment 60

•

15 years ago

You're returning 1 from the window proc on WM_NULL. According to the MSDN <http://msdn.microsoft.com/en-us/library/ms632637%28VS.85%29.aspx>, you're supposed to be returning 0 if you "process" the null message (what's there to process anyway?!). I'm not sure what kinds of crazy things this might be triggering inside windows, but it might be worth trying a version of the patch which returns 0.

Comment 61

•

15 years ago

Here is another idea. Try registering a custom message using RegisterWindowMessage and use that message code instead of WM_NULL.

Comment 62

•

15 years ago

(In reply to comment #61) > Here is another idea. Try registering a custom message using > RegisterWindowMessage and use that message code instead of WM_NULL. The reason I'm suggesting this is because it seems that WM_NULL actually has grown to have a meaning in Windows. Nowadays, apparently task manager sends a WM_NULL message to an application's main thread with a timeout to determine if its "hung" or not. It still doesn't explain the tp4 regression, but it suggests to me that there may be custom processing going on inside windows for WM_NULL.

Comment 63

•

15 years ago

Well, the private message we're currently using (WM_USER + 1) doesn't wake up the MsgWaitForMultipleObjects call that the OLE code is blocked on. It doesn't *look* like RegisterWindowMessage would unblock that call either, although I'm not certain.

Comment 64

•

15 years ago

(In reply to comment #63) > Well, the private message we're currently using (WM_USER + 1) doesn't wake up > the MsgWaitForMultipleObjects call that the OLE code is blocked on. It doesn't > *look* like RegisterWindowMessage would unblock that call either, although I'm > not certain. Ah, yes, you're right. :( I spent a lot of time searching the web for possible problems with WM_NULL, and I found multiple places where WM_NULL overflowing the message queue size was discussed. Like this one: <http://trac.wxwidgets.org/ticket/9053>. Could this be an issue? Could we peek the message queue beforehand, and for example log whether there are WM_NULL messages pending to determine if this could be an issue? Also, do we have any profile data on what happens with and without the WM_NULL message sending?

Assignee

Comment 65

•

15 years ago

By posting two messages here, I think we're basically flooding the message queue with useless wm_nulls. (see ProcessPumpReplacementMessage()) A possible solution would be to get rid of the wm_null post message and instead, change out the kMsgHaveWork schedule work message with something that will wake up COM's MsgWaitForMultipleObjects, which wm_user+1 clearly isn't doing.

Comment 66

•

15 years ago

How are we "flooding" it? The chromium logic guarantees that there is only ever one pending kMsgHaveWork at a time, and therefore there should also be only one pending WM_NULL (since they are posted at the same time and processed at roughly the same time).

Assignee

Comment 67

•

15 years ago

(In reply to comment #66) > How are we "flooding" it? The chromium logic guarantees that there is only ever > one pending kMsgHaveWork at a time, and therefore there should also be only one > pending WM_NULL (since they are posted at the same time and processed at > roughly the same time). Look at ProcessPumpReplacementMessage() - http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/message_pump_win.cc#368 Aren't we always going to get a WM_NULL in that peek call with the current patch? ProcessMessageHelper() is the main dispatch loop. A less intrusive fix than the previous one might be: in ProcessPumpReplacementMessage(), if we receive a wm_null message in that peek, dump it a peek again for the next message?

Nick Thomas [:nthomas] (UTC+12)

Comment 68

•

15 years ago

Is the 50% regression for all users or only users affected buy the original bug? Does our automation use the accessibility apis such that this always manifests for our tests? Would users without accessibility support enabled still see the regression?

Comment 69

•

15 years ago

I talked to Alice about the talos setup on XP and she said the OS install is pretty vanilla, so whatever accessibility settings are default there apply. We also set a few Firefox prefs http://mxr.mozilla.org/mozilla/source/testing/performance/talos/sample.config#29 To summarize the Talos Tp increase - using [1,2] to look up the test results for the revision before the landing in comment #39 (30960da07185) and the bustage fix(78923ef103cb) yields WINNT 5.1 WINNT 6.1 MacOSX 10.5.8 MacOSX 10.6.2 before 283 661 488 491 after 423 628 489 481 And to break that down by site in the suite (for XP) you can look at http://people.mozilla.org/~jmuizelaar/graph-server/detail.html?host=graphs&old=4018370&new=4018400 So that looks like a slowdown across the board, although nine sites are faster. [1] http://graphs.mozilla.org/api/test/runs/revisions?revision=30960da07185 [2] http://graphs.mozilla.org/api/test/runs/revisions?revision=78923ef103cb

Serge Gautherie (:sgautherie)

Assignee

Comment 70

•

15 years ago

I have a patch that may address this. Unfortunately try server seems to be having a fit at the moment, once things clear up I'll get some tp4 stats on my patch. I've got some other ideas too if that doesn't work.

Updated

•

15 years ago

Blocks: 438871

Serge Gautherie (:sgautherie)

Updated

•

15 years ago

Whiteboard: [orange]

Assignee

Comment 71

•

15 years ago

(In reply to comment #57) > with PostMessage(WM_NULL): > WINNT 6.0 try talos tp4 > tp4: 769.42 (details) > WINNT 5.1 try talos tp4 > tp4: 443.22 (details) Seems we lost winnt 6.0 try talos runs at a rather inopportune time, but we still have 5.1 builds to compare. I think my slight modification addressed the problem: old patch: WINNT 6.0 try talos tp4 tp4: 859.03 (details) WINNT 5.1 try talos tp4 tp4: 554.97 (details) new patch: WINNT 6.1 tryserver talos tp4 tp4: 626.44 (details) WINNT 5.1 tryserver talos tp4 tp4: 279.93 (details) some other tryserver build: WINNT 6.1 tryserver talos tp4 tp4: 647.96 (details) WINNT 5.1 tryserver talos tp4 tp4: 269.65 (details)

Assignee

Comment 72

•

15 years ago

Attached patch modified patch v.1 — Details — Splinter Review

On try, this experienced one test failure, bug 559981. Not sure if this patch caused that or if it was random orange. I submitted it again to see if it repeats. The addition I made was: MSG msg; bool have_message = (0 != PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)); + + if (have_message && msg.message == WM_NULL) + have_message = (0 != PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)); + DCHECK(!have_message || kMsgHaveWork != msg.message || msg.hwnd != message_hwnd_); Which skips over the null event in ProcessPumpReplacementMessage(). http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/message_pump_win.cc#367

Attachment #445725 - Attachment is obsolete: true

Attachment #446693 - Flags: review?(benjamin)

Updated

•

15 years ago

Attachment #446693 - Flags: review?(benjamin) → review+

Wayne Mery (:wsmwk)

Updated

•

15 years ago

Severity: normal → critical

http://hg.mozilla.org/mozilla-central/rev/abedbe690837

Comment 73

•

15 years ago

Status: REOPENED → RESOLVED

Closed: 15 years ago → 15 years ago

Resolution: --- → FIXED

Serge Gautherie (:sgautherie)

Updated

•

15 years ago

Attachment #446693 - Flags: approval1.9.2.4?

Updated

•

15 years ago

Target Milestone: --- → mozilla1.9.3a5

Al Billings [:abillings - ex-MoCo]

Comment 74

•

15 years ago

Is this going into 1.9.2 and the 1.9.2.4 release branch today since we were going to start building at 5:00 PM pacific?

Al Billings [:abillings - ex-MoCo]

Assignee

Comment 75

•

15 years ago

(In reply to comment #74) > Is this going into 1.9.2 and the 1.9.2.4 release branch today since we were > going to start building at 5:00 PM pacific? We need a driver to a+ the patch. I'll be happy to land it this afternoon.

Comment 76

•

15 years ago

Comment on attachment 446693 [details] [diff] [review] modified patch v.1 In lieu of any other available parties, I'll put on my driver hat and approve this. We need this for 1.9.2.4 release.

Attachment #446693 - Flags: approval1.9.2.4? → approval1.9.2.4+

Nick Thomas [:nthomas] (UTC+12)

Comment 77

•

15 years ago

default and GECKO1924_20100413_RELBRANCH branches in mozilla-1.9.2 please.