Closed Bug 1000364 Opened 11 years ago Closed 11 years ago

crash in mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) | mozilla::ipc::MessageChannel::InterruptCall(IPC::Message*, IPC::Message*)

Categories

(Core Graveyard :: Plug-ins, defect)

x86
Windows NT
defect
Not set
critical

Tracking

(firefox29- wontfix, firefox30+ verified, firefox31+ verified, firefox32 verified, b2g-v1.4 fixed)

VERIFIED FIXED
mozilla32
Tracking Status
firefox29 - wontfix
firefox30 + verified
firefox31 + verified
firefox32 --- verified
b2g-v1.4 --- fixed

People

(Reporter: jbecerra, Assigned: away)

Details

(Keywords: crash, reproducible, Whiteboard: DUPEME)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is report bp-683ec0c1-8856-4233-bdc6-326ad2140423. ============================================================= The QT plugin crashes while testing bug 1000108. I'm using a Surface Pro 2 (hiDPI) and QuickTime 7.7.5 Steps: 1. On Fx29 and above to to trailers.apple.com 2. Try to play any of the videos - If prompted to allow the QT to load, opt to allow. Expected: You can play the trailers for any of the movies listed Actual: The video seems to load and then the plugin crashes. I tried this on Fx29b2, b9, RC as well as nightly. Playing the video by going to directly to its link (http://movietrailers.apple.com/movies/ifc_films/luckythem/luckythem-tlr1_720p.mov) works. It's just within the trailers.apple.com page that the plugin crashes. It works in Firefox 28.
"cannot issue Interrupt call whiel blocked on sync or urgent" http://hg.mozilla.org/releases/mozilla-beta/annotate/da279b9f523a/ipc/glue/MessageChannel.cpp#l730 I don't see any pending sync/urgent request, although the stack is broken without Qt symbols. Please don't ever file bugs in the Plugins product. dmajor, you wanted a fun debugging challenge: can you take a look at this?
Component: QuickTime (Apple) → Plug-ins
Flags: needinfo?(dmajor)
Product: Plugins → Core
Version: 7.x → unspecified
Confirmed repro on latest nightly. Looking.
Flags: needinfo?(dmajor)
IPC_ASSERT(!AwaitingSyncReply() && !AwaitingUrgentReply(), "cannot issue Interrupt call whiel blocked on sync or urgent"); bool AwaitingSyncReply() const { mMonitor->AssertCurrentThreadOwns(); return mPendingSyncReplies > 0; } +0x034 mPendingSyncReplies : 1
(In reply to juan becerra [:juanb] from comment #0) > It works in Firefox 28. 28 also crashes for me, in the same way as 31. These were my steps: 1. Load http://trailers.apple.com/trailers/ifcfilms/luckythem/ 2. Click "Play" under "Trailer" 3. Leave mouse outside the video area. It plays OK. 4. Mouse over the video area. It crashes within a second or two. According to supersearch this started in v27. Juan can you try 28 again to confirm we're seeing the same thing? Maybe your last attempt didn't have the mouse in the video area?
Flags: needinfo?(jbecerra)
mPendingSyncReplies is 1 because there is a MessageChannel::Send frame on the stack. Below is a reconstructed stack from a debug no-opt build. Ben, is this stuff supposed to happen? There is a "nonqueued" message being processed during PeekMessage and coming back into xul, tripping an IPC_ASSERT(!AwaitingSyncReply()) in the plugin process. I see several changes in this area for v27, bug 901789 and friends, which matches up with the rough timeframe that this started. KERNELBASE!DebugBreak xul!RealBreak xul!NS_DebugBreak xul!mozilla::ipc::MessageChannel::DebugAbort xul!mozilla::ipc::MessageChannel::InterruptCall xul!mozilla::ipc::MessageChannel::Call xul!mozilla::plugins::PPluginInstanceChild::CallNPN_GetValue_NPNVWindowNPObject xul!mozilla::plugins::PluginInstanceChild::InternalGetNPObjectForValue xul!mozilla::plugins::PluginInstanceChild::NPN_GetValue xul!mozilla::plugins::child::_getvalue [...Quicktime frames...] USER32!_InternalCallWinProc USER32!UserCallWinProcCheckWow USER32!DispatchClientMessage USER32!__fnDWORD ntdll!KiUserCallbackDispatcher USER32!PeekMessageW xul!mozilla::ipc::MessageChannel::WaitForSyncNotify xul!mozilla::ipc::MessageChannel::SendAndWait > xul!mozilla::ipc::MessageChannel::Send xul!mozilla::plugins::PPluginInstanceChild::SendShow xul!mozilla::plugins::PluginInstanceChild::ShowPluginFrame xul!mozilla::plugins::PluginInstanceChild::InvalidateRectDelayed xul!DispatchToMethod<mozilla::plugins::PluginInstanceChild,void (__thiscall mozilla::plugins::PluginInstanceChild::*)(void)> xul!RunnableMethod<mozilla::plugins::PluginInstanceChild,void (__thiscall mozilla::plugins::PluginInstanceChild::*)(void),Tuple0>::Run xul!MessageLoop::RunTask xul!MessageLoop::DeferOrRunPendingTask xul!MessageLoop::DoWork xul!base::MessagePumpForUI::DoRunLoop xul!base::MessagePumpWin::RunWithDispatcher xul!base::MessagePumpWin::Run xul!MessageLoop::RunInternal xul!MessageLoop::RunHandler xul!MessageLoop::Run xul!XRE_InitChildProcess plugin_container!NS_internal_main plugin_container!wmain plugin_container!__tmainCRTStartup plugin_container!wmainCRTStartup KERNEL32!BaseThreadInitThunk ntdll!__RtlUserThreadStart ntdll!_RtlUserThreadStart
Flags: needinfo?(bent.mozilla)
We're supposed to have hooked the window procedure for the child window there, see the comment at http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#798 Assuming that our hook is succeeding we may just need to add this Quicktime window class to the whitelist in http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#353
Flags: needinfo?(bent.mozilla)
Which window is this message being sent to? And is our mGlobalCallWndProcHook catching this and letting it through anyway? What is the message ID?
(In reply to David Major [:dmajor] (UTC+12) from comment #4) > (In reply to juan becerra [:juanb] from comment #0) > > It works in Firefox 28. > > 28 also crashes for me, in the same way as 31. These were my steps: > 1. Load http://trailers.apple.com/trailers/ifcfilms/luckythem/ > 2. Click "Play" under "Trailer" > 3. Leave mouse outside the video area. It plays OK. > 4. Mouse over the video area. It crashes within a second or two. > > According to supersearch this started in v27. > > Juan can you try 28 again to confirm we're seeing the same thing? Maybe your > last attempt didn't have the mouse in the video area? Having the mouse hover over the window area or player controls didn't seem to make a difference in my case. I cannot reproduce the plugin crash on Fx28 on this machine. It's running Windows 8.1. I also tried this on an old Mac Mini running Windows 7. Fx27 and Fx28 work in that machine as well, but it took a few more times to see the crash on Fx29.
Flags: needinfo?(jbecerra)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #7) > Which window is this message being sent to? And is our > mGlobalCallWndProcHook catching this and letting it through anyway? What is > the message ID? The message is 0xc1f3. I see some references to CallWindowProcHook on the stack but I'm not sure if it's actually being called or if those are just leftovers. The address that user32 passes around on the stack points to a function preamble in QT code which I assume is their wndproc. So I guess that means it was sent to their window? By the way, which hook (if any) is supposed to be used here? Nearby in the codebase I see both a CallWindowProcHook and a CallWindowProcedureHook.
Assignee: nobody → dmajor
I'm seeing two names come through WindowIsDeferredWindow, "QTNSHIDDEN" and "QTIdle: NNN" where NNN are three hex digits that change with each launch of the browser (but appear unrelated to the HWND value). It looks like the QTIdle message causes a message to QTNSHIDDEN, which causes the call back into xul that hits the assertion. (Neutering QTIdle also stops QTNSHIDDEN from reaching WindowIsDeferredWindow, but not vice versa) I can stop the crashes by adding either class name to WindowIsDeferredWindow, but I don't really know if that's a proper fix or just breaking stuff.
> "QTIdle: NNN" where NNN are three hex digits that change with each launch of > the browser (but appear unrelated to the HWND value). Ah, those hex digits are the ID of plugin-container.exe's main thread.
ISTM that our options are: 1) figure out a way to block/delay this message while we're in sync dispatch using hooks. 2) Neuter most or all of NPAPI while we're using sync dispatch. I do think that neutering is the more expedient approach. What I'm not sure is why we aren't neutering all windows. Jmathies, do you remember context about that? Also, did anything about the sync dispatch actually change around bug 901789 ? Or is this just subtle timing issues exposing a latent bug?
Flags: needinfo?(jmathies)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #12) > ISTM that our options are: > > 1) figure out a way to block/delay this message while we're in sync dispatch > using hooks. > 2) Neuter most or all of NPAPI while we're using sync dispatch. > > I do think that neutering is the more expedient approach. What I'm not sure > is why we aren't neutering all windows. Jmathies, do you remember context > about that? > > Also, did anything about the sync dispatch actually change around bug 901789 > ? Or is this just subtle timing issues exposing a latent bug? Ben had it right in comment 6 - we neuter our windows (based on string serches of the class name) but we don't neuter 3rd party windows except in the few cases where a message sent to a plugin window tirggered a re-entrant call that caused a lock or crash - http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#391 We can't neuter everything because we don't know what plugins create, so we've only dealt with issues we know of. As you can see we haven't had a lot of problems with this. (I also think you're going to break plugins in subtle ways if we just white list everything we know of in plugins like acrobat and flash.) We should make sure the call that crashes is originating from a 3rd party windowing event. In a couple cases moz engineers have added windows with weird class names and not updated the WindowsMessageLoop causing problems. If it is a 3rd party window, try adding the window in question to the deferred detection routine to see if it fixes it. You might have to add additional message handlers if you need to queue up an event our deferred message processing doesn't handle, although that's unlikely - http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#164
Flags: needinfo?(jmathies)
(In reply to Jim Mathies [:jimm] from comment #13) > We should make sure the call that crashes is originating from a 3rd party > windowing event. In a couple cases moz engineers have added windows with > weird class names and not updated the WindowsMessageLoop causing problems. > > If it is a 3rd party window, try adding the window in question to the > deferred detection routine to see if it fixes it. You might have to add > additional message handlers if you need to queue up an event our deferred > message processing doesn't handle, although that's unlikely - These are Quicktime windows. First there is a "QTIdle" window class getting a nonqueued message. If I add that to WindowIsDeferredWindow here: http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#353 then, NeuterWindowProcedure's call to SetWindowLongPtr here: http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#452 returns some strange return value for the old wndproc, of the form 0xffff????. That doesn't look like a valid wndproc address but it doesn't seem to cause any harm. Eventually we reach NeuteredWindowProc and ProcessOrDeferMessage: http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#164 where the uMsg parameter is in the 0xC000 range from RegisterWindowMessage("QTGenerateNullEvents"). If I don't neuter QTIdle, then there is another nonqueued message that comes through for the QTNSHIDDEN class. This one also has a strange 0xffff???? wndproc address. When the message reaches ProcessOrDeferMessgae, the uMsg is 0x7fff (WM_APP-1). Is neutering one or both of these wndprocs the right thing to do? Would I also need to create a new DeferredSendMessage for them?
Flags: needinfo?(jmathies)
(In reply to David Major [:dmajor] (UTC+12) from comment #14) > returns some strange return value for the old wndproc, of the form > 0xffff????. That doesn't look like a valid wndproc address but it doesn't > seem to cause any harm. I think what you're getting here is a psuedo handle caused by a mismatched unicode/ansi window procedure / hook. I think this is ok as windows handles the translation under the hood. > Eventually we reach NeuteredWindowProc and > ProcessOrDeferMessage: > > http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop. > cpp#164 > > where the uMsg parameter is in the 0xC000 range from > RegisterWindowMessage("QTGenerateNullEvents"). > > If I don't neuter QTIdle, then there is another nonqueued message that comes > through for the QTNSHIDDEN class. This one also has a strange 0xffff???? > wndproc address. When the message reaches ProcessOrDeferMessgae, the uMsg is > 0x7fff (WM_APP-1). Odd, but that's not "wrong" since it's in the WM_USER band - http://msdn.microsoft.com/en-us/library/windows/desktop/ms644930(v=vs.85).aspx > Is neutering one or both of these wndprocs the right thing to do? Would I > also need to create a new DeferredSendMessage for them? We don't have a WM_USER+n handler, and this might be hard to support. What happens if you just drop that message? Also, what's in wParam and lParam? Do they look like they carry some sort of data (like a 'this' pointer or buffer)? We could defer these, although it's a bit risky since we don't know what they carry. Another thing we could look at is failing plugin NPN_GetValue calls when we're in a sync call. Although dumping that WM_APP-1 message might be simplest assuming it doesn't totally break quicktime.
Flags: needinfo?(jmathies)
Attached patch Drop WM_APP-1 (obsolete) — Splinter Review
Is this what you had in mind? From brief testing I don't see anything obviously broken with Quicktime, but that's not enough to make me feel comfortable. Is there a QT expert who can test this further?
Attachment #8417196 - Flags: feedback?(jmathies)
Forgot to mention: the wParam/lParam for this message are always 0, at least in the cases I've seen.
Too late for 29 but tracking for 30 in case a patch lands during this cycle.
Comment on attachment 8417196 [details] [diff] [review] Drop WM_APP-1 Yep, that's basically the fix.
Attachment #8417196 - Flags: feedback?(jmathies) → feedback+
Juan, when this build finishes, can you spend some time with it and tell me what I've broken? (Or if there is a more appropriate QuickTime/Plugins expert then feel free to pass it on) https://tbpl.mozilla.org/?tree=Try&rev=cd37d3318b11 For comparison, the baseline for that patch is: ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/05/2014-05-05-03-02-02-mozilla-central/
Flags: needinfo?(jbecerra)
I haven't been able to crash the QT plugin using the try build above. I am able to play the trailer videos, in different resolutions (after reloading the page) with no problems. I also tried playing Flash videos, in a similar way as the QT videos. In comparison, the nightly build from central crashes almost instantly while I play any of the trailers and I hover the mouse cursor.
Flags: needinfo?(jbecerra)
Attached patch Drop WM_APP-1Splinter Review
Just some comment tweaks. I still feel a little hesitant, not being an expert on this code. Jim is there any testing beyond Juan's comment 21 that you'd like to see done before proceeding? (Keeping in mind that this may be uplifted to aurora and/or beta)
Attachment #8418438 - Flags: review?(jmathies)
Comment on attachment 8418438 [details] [diff] [review] Drop WM_APP-1 One thing that could make it safer is to only drop this message if the class matches the QT one...
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #23) > Comment on attachment 8418438 [details] [diff] [review] > Drop WM_APP-1 > > One thing that could make it safer is to only drop this message if the class > matches the QT one... Was thinking about that too. Although I think this is still pretty safe since our list of deferred 3rd party windows is pretty small: flash and silverlight fullscreen windows and google earth message windows.
Attachment #8418438 - Flags: review?(jmathies) → review+
Attachment #8417196 - Attachment is obsolete: true
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/321376574687 I'll request some testing on those plugins and keep an eye out on the forums.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Juan can you verify the QT fix on mozilla-central and play around with Flash/Silverlight/Google Earth to make sure they haven't regressed? For ideas on things to try, you can find various STR scattered around bug 626975 and bug 550784.
Flags: needinfo?(jbecerra)
Please nominate for uplift, we can wait for Juan's input before approving but good to have it ready.
Comment on attachment 8418438 [details] [diff] [review] Drop WM_APP-1 [Approval Request Comment] Bug caused by (feature/regressing bug #): Caused by an interaction between plugin-container and QuickTime. Crash-stats says this has been happening since FF 27. User impact if declined: Crashes when playing QT videos Testing completed (on m-c, etc.): This has been on m-c for a few days with nothing alarming on the forums. I'd like to wait for the targeted testing in comment 27 before proceeding. Risk to taking this patch (and alternatives if risky): Uncertain. There may be cases where QuickTime does not react well to our discarding their messages. At first glance, things seem okay, but it's possible we may have missed something. String or IDL/UUID changes made by this patch: None
Attachment #8418438 - Flags: approval-mozilla-beta?
Attachment #8418438 - Flags: approval-mozilla-aurora?
Attachment #8418438 - Flags: approval-mozilla-beta?
Attachment #8418438 - Flags: approval-mozilla-beta+
Attachment #8418438 - Flags: approval-mozilla-aurora?
Attachment #8418438 - Flags: approval-mozilla-aurora+
Comment on attachment 8418438 [details] [diff] [review] Drop WM_APP-1 Let's wait for the results of testing m-c. Resetting the flags so that this doesn't get auto-uplifted by the scripts.
Attachment #8418438 - Flags: approval-mozilla-beta?
Attachment #8418438 - Flags: approval-mozilla-beta+
Attachment #8418438 - Flags: approval-mozilla-aurora?
Attachment #8418438 - Flags: approval-mozilla-aurora+
I have been testing this on Win7 and Win8 with the latest nightly, comparing the behavior to nightlies before 5/7, while using Flash/Silverlight/QT/Google Earth in different sites, but mostly Netflix, Hulu, iTunes trailer site, also taking into account some of the scenarios in bug 626975 and bug 550784, like using multiple monitors, clicking through adds, changing focus between windows. I haven't been able to crash the QT plugin on the latest nightly, and I haven't seen any change in behavior between nightlies with the other plugins (they work in basic use cases). The only thing I can note is that QT is a little choppy at higher resolutions, but this is no different than before the fix. I also spot checked on Mac for good measure.
Flags: needinfo?(jbecerra)
Attachment #8418438 - Flags: approval-mozilla-beta?
Attachment #8418438 - Flags: approval-mozilla-beta+
Attachment #8418438 - Flags: approval-mozilla-aurora?
Attachment #8418438 - Flags: approval-mozilla-aurora+
Keywords: verifyme
Reproduced the crash a few times with the STR from the description using Firefox 30 beta 3 Firefox 29.0.1 on a Surface Pro 2 device running Windows 8.1 . I was not able to crash using Firefox 30 beta 5 (build ID: 20140515140857) on several attempts. I will leave this bug Resolved for a couple of days more so I can monitor the crash reports and assure that no crashes with this signature are submitted for the fixed versions.
There still are several crashes for Beta during the last week: - 30.0b4: 19 crashes - 30.0b5: 9 crashes - 30.0b3: 10 crashes However, the crash numbers for Nightly and Aurora have decreased for the same time period: - Aurora 31.0a2: 4 crashes - Nightly 32.0a1: 1 crash
No crashes for Aurora 31.0a2 during the last week. The things are not quite good for Firefox 30 beta as there still are several crashes reported. - 30.0b7: 11 crashes I'm concerned that the number of crashes will greatly increase if this heads to the release in this state. Any thoughts?
Flags: needinfo?(dmajor)
The fix will not make the crash signature disappear completely. Other plugins can cause the same crash signature. The goal of this bug was only to fix the specific crash with Quicktime.
Flags: needinfo?(dmajor)
Thank you David! I wasn't able to crash the Quicktime plugin on Firefox 30 Beta 7 (build ID: 20140522105902) and Beta 8 (build ID: 20140527133511) to obtain this signature. Marking this issue verified and will continue to monitor Socorro.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: