Closed
Bug 1000364
Opened 11 years ago
Closed 11 years ago
crash in mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::MessageChannel::DebugAbort(char const*, int, char const*, char const*, bool) | mozilla::ipc::MessageChannel::InterruptCall(IPC::Message*, IPC::Message*)
Categories
(Core Graveyard :: Plug-ins, defect)
Tracking
(firefox29- wontfix, firefox30+ verified, firefox31+ verified, firefox32 verified, b2g-v1.4 fixed)
People
(Reporter: jbecerra, Assigned: away)
Details
(Keywords: crash, reproducible, Whiteboard: DUPEME)
Crash Data
Attachments
(1 file, 1 obsolete file)
2.08 KB,
patch
|
jimm
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-683ec0c1-8856-4233-bdc6-326ad2140423.
=============================================================
The QT plugin crashes while testing bug 1000108. I'm using a Surface Pro 2 (hiDPI) and QuickTime 7.7.5
Steps:
1. On Fx29 and above to to trailers.apple.com
2. Try to play any of the videos
- If prompted to allow the QT to load, opt to allow.
Expected: You can play the trailers for any of the movies listed
Actual: The video seems to load and then the plugin crashes.
I tried this on Fx29b2, b9, RC as well as nightly.
Playing the video by going to directly to its link (http://movietrailers.apple.com/movies/ifc_films/luckythem/luckythem-tlr1_720p.mov) works. It's just within the trailers.apple.com page that the plugin crashes.
It works in Firefox 28.
Reporter | ||
Updated•11 years ago
|
status-firefox29:
--- → affected
tracking-firefox29:
--- → ?
Comment 1•11 years ago
|
||
"cannot issue Interrupt call whiel blocked on sync or urgent"
http://hg.mozilla.org/releases/mozilla-beta/annotate/da279b9f523a/ipc/glue/MessageChannel.cpp#l730
I don't see any pending sync/urgent request, although the stack is broken without Qt symbols.
Please don't ever file bugs in the Plugins product.
dmajor, you wanted a fun debugging challenge: can you take a look at this?
Component: QuickTime (Apple) → Plug-ins
Flags: needinfo?(dmajor)
Product: Plugins → Core
Version: 7.x → unspecified
Confirmed repro on latest nightly. Looking.
Flags: needinfo?(dmajor)
IPC_ASSERT(!AwaitingSyncReply() && !AwaitingUrgentReply(),
"cannot issue Interrupt call whiel blocked on sync or urgent");
bool AwaitingSyncReply() const {
mMonitor->AssertCurrentThreadOwns();
return mPendingSyncReplies > 0;
}
+0x034 mPendingSyncReplies : 1
(In reply to juan becerra [:juanb] from comment #0)
> It works in Firefox 28.
28 also crashes for me, in the same way as 31. These were my steps:
1. Load http://trailers.apple.com/trailers/ifcfilms/luckythem/
2. Click "Play" under "Trailer"
3. Leave mouse outside the video area. It plays OK.
4. Mouse over the video area. It crashes within a second or two.
According to supersearch this started in v27.
Juan can you try 28 again to confirm we're seeing the same thing? Maybe your last attempt didn't have the mouse in the video area?
Flags: needinfo?(jbecerra)
mPendingSyncReplies is 1 because there is a MessageChannel::Send frame on the stack. Below is a reconstructed stack from a debug no-opt build.
Ben, is this stuff supposed to happen? There is a "nonqueued" message being processed during PeekMessage and coming back into xul, tripping an IPC_ASSERT(!AwaitingSyncReply()) in the plugin process.
I see several changes in this area for v27, bug 901789 and friends, which matches up with the rough timeframe that this started.
KERNELBASE!DebugBreak
xul!RealBreak
xul!NS_DebugBreak
xul!mozilla::ipc::MessageChannel::DebugAbort
xul!mozilla::ipc::MessageChannel::InterruptCall
xul!mozilla::ipc::MessageChannel::Call
xul!mozilla::plugins::PPluginInstanceChild::CallNPN_GetValue_NPNVWindowNPObject
xul!mozilla::plugins::PluginInstanceChild::InternalGetNPObjectForValue
xul!mozilla::plugins::PluginInstanceChild::NPN_GetValue
xul!mozilla::plugins::child::_getvalue
[...Quicktime frames...]
USER32!_InternalCallWinProc
USER32!UserCallWinProcCheckWow
USER32!DispatchClientMessage
USER32!__fnDWORD
ntdll!KiUserCallbackDispatcher
USER32!PeekMessageW
xul!mozilla::ipc::MessageChannel::WaitForSyncNotify
xul!mozilla::ipc::MessageChannel::SendAndWait
> xul!mozilla::ipc::MessageChannel::Send
xul!mozilla::plugins::PPluginInstanceChild::SendShow
xul!mozilla::plugins::PluginInstanceChild::ShowPluginFrame
xul!mozilla::plugins::PluginInstanceChild::InvalidateRectDelayed
xul!DispatchToMethod<mozilla::plugins::PluginInstanceChild,void (__thiscall mozilla::plugins::PluginInstanceChild::*)(void)>
xul!RunnableMethod<mozilla::plugins::PluginInstanceChild,void (__thiscall mozilla::plugins::PluginInstanceChild::*)(void),Tuple0>::Run
xul!MessageLoop::RunTask
xul!MessageLoop::DeferOrRunPendingTask
xul!MessageLoop::DoWork
xul!base::MessagePumpForUI::DoRunLoop
xul!base::MessagePumpWin::RunWithDispatcher
xul!base::MessagePumpWin::Run
xul!MessageLoop::RunInternal
xul!MessageLoop::RunHandler
xul!MessageLoop::Run
xul!XRE_InitChildProcess
plugin_container!NS_internal_main
plugin_container!wmain
plugin_container!__tmainCRTStartup
plugin_container!wmainCRTStartup
KERNEL32!BaseThreadInitThunk
ntdll!__RtlUserThreadStart
ntdll!_RtlUserThreadStart
Flags: needinfo?(bent.mozilla)
We're supposed to have hooked the window procedure for the child window there, see the comment at http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#798
Assuming that our hook is succeeding we may just need to add this Quicktime window class to the whitelist in http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#353
Flags: needinfo?(bent.mozilla)
Comment 7•11 years ago
|
||
Which window is this message being sent to? And is our mGlobalCallWndProcHook catching this and letting it through anyway? What is the message ID?
Reporter | ||
Comment 8•11 years ago
|
||
(In reply to David Major [:dmajor] (UTC+12) from comment #4)
> (In reply to juan becerra [:juanb] from comment #0)
> > It works in Firefox 28.
>
> 28 also crashes for me, in the same way as 31. These were my steps:
> 1. Load http://trailers.apple.com/trailers/ifcfilms/luckythem/
> 2. Click "Play" under "Trailer"
> 3. Leave mouse outside the video area. It plays OK.
> 4. Mouse over the video area. It crashes within a second or two.
>
> According to supersearch this started in v27.
>
> Juan can you try 28 again to confirm we're seeing the same thing? Maybe your
> last attempt didn't have the mouse in the video area?
Having the mouse hover over the window area or player controls didn't seem to make a difference in my case. I cannot reproduce the plugin crash on Fx28 on this machine. It's running Windows 8.1.
I also tried this on an old Mac Mini running Windows 7. Fx27 and Fx28 work in that machine as well, but it took a few more times to see the crash on Fx29.
Flags: needinfo?(jbecerra)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #7)
> Which window is this message being sent to? And is our
> mGlobalCallWndProcHook catching this and letting it through anyway? What is
> the message ID?
The message is 0xc1f3. I see some references to CallWindowProcHook on the stack but I'm not sure if it's actually being called or if those are just leftovers. The address that user32 passes around on the stack points to a function preamble in QT code which I assume is their wndproc. So I guess that means it was sent to their window?
By the way, which hook (if any) is supposed to be used here? Nearby in the codebase I see both a CallWindowProcHook and a CallWindowProcedureHook.
Updated•11 years ago
|
Assignee: nobody → dmajor
![]() |
Assignee | |
Comment 10•11 years ago
|
||
I'm seeing two names come through WindowIsDeferredWindow, "QTNSHIDDEN" and "QTIdle: NNN" where NNN are three hex digits that change with each launch of the browser (but appear unrelated to the HWND value).
It looks like the QTIdle message causes a message to QTNSHIDDEN, which causes the call back into xul that hits the assertion. (Neutering QTIdle also stops QTNSHIDDEN from reaching WindowIsDeferredWindow, but not vice versa)
I can stop the crashes by adding either class name to WindowIsDeferredWindow, but I don't really know if that's a proper fix or just breaking stuff.
![]() |
Assignee | |
Comment 11•11 years ago
|
||
> "QTIdle: NNN" where NNN are three hex digits that change with each launch of
> the browser (but appear unrelated to the HWND value).
Ah, those hex digits are the ID of plugin-container.exe's main thread.
Comment 12•11 years ago
|
||
ISTM that our options are:
1) figure out a way to block/delay this message while we're in sync dispatch using hooks.
2) Neuter most or all of NPAPI while we're using sync dispatch.
I do think that neutering is the more expedient approach. What I'm not sure is why we aren't neutering all windows. Jmathies, do you remember context about that?
Also, did anything about the sync dispatch actually change around bug 901789 ? Or is this just subtle timing issues exposing a latent bug?
Flags: needinfo?(jmathies)
![]() |
||
Comment 13•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #12)
> ISTM that our options are:
>
> 1) figure out a way to block/delay this message while we're in sync dispatch
> using hooks.
> 2) Neuter most or all of NPAPI while we're using sync dispatch.
>
> I do think that neutering is the more expedient approach. What I'm not sure
> is why we aren't neutering all windows. Jmathies, do you remember context
> about that?
>
> Also, did anything about the sync dispatch actually change around bug 901789
> ? Or is this just subtle timing issues exposing a latent bug?
Ben had it right in comment 6 - we neuter our windows (based on string serches of the class name) but we don't neuter 3rd party windows except in the few cases where a message sent to a plugin window tirggered a re-entrant call that caused a lock or crash -
http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#391
We can't neuter everything because we don't know what plugins create, so we've only dealt with issues we know of. As you can see we haven't had a lot of problems with this. (I also think you're going to break plugins in subtle ways if we just white list everything we know of in plugins like acrobat and flash.)
We should make sure the call that crashes is originating from a 3rd party windowing event. In a couple cases moz engineers have added windows with weird class names and not updated the WindowsMessageLoop causing problems.
If it is a 3rd party window, try adding the window in question to the deferred detection routine to see if it fixes it. You might have to add additional message handlers if you need to queue up an event our deferred message processing doesn't handle, although that's unlikely -
http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#164
Flags: needinfo?(jmathies)
![]() |
Assignee | |
Comment 14•11 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #13)
> We should make sure the call that crashes is originating from a 3rd party
> windowing event. In a couple cases moz engineers have added windows with
> weird class names and not updated the WindowsMessageLoop causing problems.
>
> If it is a 3rd party window, try adding the window in question to the
> deferred detection routine to see if it fixes it. You might have to add
> additional message handlers if you need to queue up an event our deferred
> message processing doesn't handle, although that's unlikely -
These are Quicktime windows. First there is a "QTIdle" window class getting a nonqueued message. If I add that to WindowIsDeferredWindow here:
http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#353
then, NeuterWindowProcedure's call to SetWindowLongPtr here:
http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#452
returns some strange return value for the old wndproc, of the form 0xffff????. That doesn't look like a valid wndproc address but it doesn't seem to cause any harm. Eventually we reach NeuteredWindowProc and ProcessOrDeferMessage:
http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#164
where the uMsg parameter is in the 0xC000 range from RegisterWindowMessage("QTGenerateNullEvents").
If I don't neuter QTIdle, then there is another nonqueued message that comes through for the QTNSHIDDEN class. This one also has a strange 0xffff???? wndproc address. When the message reaches ProcessOrDeferMessgae, the uMsg is 0x7fff (WM_APP-1).
Is neutering one or both of these wndprocs the right thing to do? Would I also need to create a new DeferredSendMessage for them?
Flags: needinfo?(jmathies)
![]() |
||
Comment 15•11 years ago
|
||
(In reply to David Major [:dmajor] (UTC+12) from comment #14)
> returns some strange return value for the old wndproc, of the form
> 0xffff????. That doesn't look like a valid wndproc address but it doesn't
> seem to cause any harm.
I think what you're getting here is a psuedo handle caused by a mismatched unicode/ansi window procedure / hook. I think this is ok as windows handles the translation under the hood.
> Eventually we reach NeuteredWindowProc and
> ProcessOrDeferMessage:
>
> http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.
> cpp#164
>
> where the uMsg parameter is in the 0xC000 range from
> RegisterWindowMessage("QTGenerateNullEvents").
>
> If I don't neuter QTIdle, then there is another nonqueued message that comes
> through for the QTNSHIDDEN class. This one also has a strange 0xffff????
> wndproc address. When the message reaches ProcessOrDeferMessgae, the uMsg is
> 0x7fff (WM_APP-1).
Odd, but that's not "wrong" since it's in the WM_USER band -
http://msdn.microsoft.com/en-us/library/windows/desktop/ms644930(v=vs.85).aspx
> Is neutering one or both of these wndprocs the right thing to do? Would I
> also need to create a new DeferredSendMessage for them?
We don't have a WM_USER+n handler, and this might be hard to support. What happens if you just drop that message? Also, what's in wParam and lParam? Do they look like they carry some sort of data (like a 'this' pointer or buffer)? We could defer these, although it's a bit risky since we don't know what they carry.
Another thing we could look at is failing plugin NPN_GetValue calls when we're in a sync call. Although dumping that WM_APP-1 message might be simplest assuming it doesn't totally break quicktime.
Flags: needinfo?(jmathies)
![]() |
Assignee | |
Comment 16•11 years ago
|
||
Is this what you had in mind?
From brief testing I don't see anything obviously broken with Quicktime, but that's not enough to make me feel comfortable. Is there a QT expert who can test this further?
Attachment #8417196 -
Flags: feedback?(jmathies)
![]() |
Assignee | |
Comment 17•11 years ago
|
||
Forgot to mention: the wParam/lParam for this message are always 0, at least in the cases I've seen.
Comment 18•11 years ago
|
||
Too late for 29 but tracking for 30 in case a patch lands during this cycle.
status-firefox30:
--- → affected
status-firefox31:
--- → affected
status-firefox32:
--- → affected
tracking-firefox30:
--- → +
tracking-firefox31:
--- → +
![]() |
||
Comment 19•11 years ago
|
||
Comment on attachment 8417196 [details] [diff] [review]
Drop WM_APP-1
Yep, that's basically the fix.
Attachment #8417196 -
Flags: feedback?(jmathies) → feedback+
![]() |
Assignee | |
Comment 20•11 years ago
|
||
Juan, when this build finishes, can you spend some time with it and tell me what I've broken? (Or if there is a more appropriate QuickTime/Plugins expert then feel free to pass it on)
https://tbpl.mozilla.org/?tree=Try&rev=cd37d3318b11
For comparison, the baseline for that patch is: ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/05/2014-05-05-03-02-02-mozilla-central/
Flags: needinfo?(jbecerra)
Reporter | ||
Comment 21•11 years ago
|
||
I haven't been able to crash the QT plugin using the try build above. I am able to play the trailer videos, in different resolutions (after reloading the page) with no problems.
I also tried playing Flash videos, in a similar way as the QT videos.
In comparison, the nightly build from central crashes almost instantly while I play any of the trailers and I hover the mouse cursor.
Flags: needinfo?(jbecerra)
![]() |
Assignee | |
Comment 22•11 years ago
|
||
Just some comment tweaks. I still feel a little hesitant, not being an expert on this code. Jim is there any testing beyond Juan's comment 21 that you'd like to see done before proceeding? (Keeping in mind that this may be uplifted to aurora and/or beta)
Attachment #8418438 -
Flags: review?(jmathies)
Comment on attachment 8418438 [details] [diff] [review]
Drop WM_APP-1
One thing that could make it safer is to only drop this message if the class matches the QT one...
![]() |
||
Comment 24•11 years ago
|
||
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #23)
> Comment on attachment 8418438 [details] [diff] [review]
> Drop WM_APP-1
>
> One thing that could make it safer is to only drop this message if the class
> matches the QT one...
Was thinking about that too. Although I think this is still pretty safe since our list of deferred 3rd party windows is pretty small: flash and silverlight fullscreen windows and google earth message windows.
![]() |
||
Updated•11 years ago
|
Attachment #8418438 -
Flags: review?(jmathies) → review+
![]() |
||
Updated•11 years ago
|
Attachment #8417196 -
Attachment is obsolete: true
![]() |
Assignee | |
Comment 25•11 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/321376574687
I'll request some testing on those plugins and keep an eye out on the forums.
Comment 26•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
![]() |
Assignee | |
Comment 27•11 years ago
|
||
Juan can you verify the QT fix on mozilla-central and play around with Flash/Silverlight/Google Earth to make sure they haven't regressed? For ideas on things to try, you can find various STR scattered around bug 626975 and bug 550784.
Flags: needinfo?(jbecerra)
Comment 28•11 years ago
|
||
Please nominate for uplift, we can wait for Juan's input before approving but good to have it ready.
![]() |
Assignee | |
Comment 29•11 years ago
|
||
Comment on attachment 8418438 [details] [diff] [review]
Drop WM_APP-1
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Caused by an interaction between plugin-container and QuickTime. Crash-stats says this has been happening since FF 27.
User impact if declined: Crashes when playing QT videos
Testing completed (on m-c, etc.): This has been on m-c for a few days with nothing alarming on the forums. I'd like to wait for the targeted testing in comment 27 before proceeding.
Risk to taking this patch (and alternatives if risky): Uncertain. There may be cases where QuickTime does not react well to our discarding their messages. At first glance, things seem okay, but it's possible we may have missed something.
String or IDL/UUID changes made by this patch: None
Attachment #8418438 -
Flags: approval-mozilla-beta?
Attachment #8418438 -
Flags: approval-mozilla-aurora?
Updated•11 years ago
|
Attachment #8418438 -
Flags: approval-mozilla-beta?
Attachment #8418438 -
Flags: approval-mozilla-beta+
Attachment #8418438 -
Flags: approval-mozilla-aurora?
Attachment #8418438 -
Flags: approval-mozilla-aurora+
![]() |
Assignee | |
Comment 30•11 years ago
|
||
Comment on attachment 8418438 [details] [diff] [review]
Drop WM_APP-1
Let's wait for the results of testing m-c. Resetting the flags so that this doesn't get auto-uplifted by the scripts.
Attachment #8418438 -
Flags: approval-mozilla-beta?
Attachment #8418438 -
Flags: approval-mozilla-beta+
Attachment #8418438 -
Flags: approval-mozilla-aurora?
Attachment #8418438 -
Flags: approval-mozilla-aurora+
Reporter | ||
Comment 31•11 years ago
|
||
I have been testing this on Win7 and Win8 with the latest nightly, comparing the behavior to nightlies before 5/7, while using Flash/Silverlight/QT/Google Earth in different sites, but mostly Netflix, Hulu, iTunes trailer site, also taking into account some of the scenarios in bug 626975 and bug 550784, like using multiple monitors, clicking through adds, changing focus between windows.
I haven't been able to crash the QT plugin on the latest nightly, and I haven't seen any change in behavior between nightlies with the other plugins (they work in basic use cases).
The only thing I can note is that QT is a little choppy at higher resolutions, but this is no different than before the fix.
I also spot checked on Mac for good measure.
Flags: needinfo?(jbecerra)
Updated•11 years ago
|
Attachment #8418438 -
Flags: approval-mozilla-beta?
Attachment #8418438 -
Flags: approval-mozilla-beta+
Attachment #8418438 -
Flags: approval-mozilla-aurora?
Attachment #8418438 -
Flags: approval-mozilla-aurora+
Comment 32•11 years ago
|
||
Comment 33•11 years ago
|
||
Updated•11 years ago
|
status-b2g-v1.4:
--- → fixed
Comment 34•11 years ago
|
||
Reproduced the crash a few times with the STR from the description using Firefox 30 beta 3 Firefox 29.0.1 on a Surface Pro 2 device running Windows 8.1 .
I was not able to crash using Firefox 30 beta 5 (build ID: 20140515140857) on several attempts.
I will leave this bug Resolved for a couple of days more so I can monitor the crash reports and assure that no crashes with this signature are submitted for the fixed versions.
Comment 35•11 years ago
|
||
per comment 31
Comment 36•11 years ago
|
||
There still are several crashes for Beta during the last week:
- 30.0b4: 19 crashes
- 30.0b5: 9 crashes
- 30.0b3: 10 crashes
However, the crash numbers for Nightly and Aurora have decreased for the same time period:
- Aurora 31.0a2: 4 crashes
- Nightly 32.0a1: 1 crash
Comment 37•11 years ago
|
||
No crashes for Aurora 31.0a2 during the last week.
The things are not quite good for Firefox 30 beta as there still are several crashes reported.
- 30.0b7: 11 crashes
I'm concerned that the number of crashes will greatly increase if this heads to the release in this state. Any thoughts?
Flags: needinfo?(dmajor)
![]() |
Assignee | |
Comment 38•11 years ago
|
||
The fix will not make the crash signature disappear completely. Other plugins can cause the same crash signature. The goal of this bug was only to fix the specific crash with Quicktime.
Flags: needinfo?(dmajor)
Comment 39•11 years ago
|
||
Thank you David!
I wasn't able to crash the Quicktime plugin on Firefox 30 Beta 7 (build ID: 20140522105902) and Beta 8 (build ID: 20140527133511) to obtain this signature.
Marking this issue verified and will continue to monitor Socorro.
Updated•3 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•