Last Comment Bug 753274 - Television app crashes runtime
: Television app crashes runtime
Status: VERIFIED FIXED
[appreview-blocker], [qa!]
: crash
Product: Firefox Graveyard
Classification: Graveyard
Component: Webapp Runtime (show other bugs)
: unspecified
: x86_64 Windows 7
: P2 critical
: Firefox 15
Assigned To: Tim Abraldes [:TimAbraldes] [:tabraldes]
: Jason Smith [:jsmith]
Mentors:
https://marketplace.mozilla.org/en-US...
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-05-09 04:06 PDT by Andrew Williamson [:eviljeff]
Modified: 2016-03-21 12:39 PDT (History)
7 users (show)
jsmith: in‑moztrap-
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Patch v1 - Make `nsWindow::SubclassWindow` deal more gracefully with windows that have already been destroyed (2.59 KB, patch)
2012-05-31 16:17 PDT, Tim Abraldes [:TimAbraldes] [:tabraldes]
no flags Details | Diff | Review
Patch v2 - Cleaned up nsWindow::SubclassWindow (3.06 KB, patch)
2012-06-01 13:07 PDT, Tim Abraldes [:TimAbraldes] [:tabraldes]
jmathies: review+
Details | Diff | Review

Description Andrew Williamson [:eviljeff] 2012-05-09 04:06:11 PDT
I can consistently crash the Television app.

Steps to reproduce:
1) Install & run Television app
2) load a live stream (e.g. RT America)
3) wait for it to start
4) click one of the VOD links on the right

Windows reports Nightly has crashed.

As the crash reporter isn't in the runtime yet (bug 745980) I don't have the normal Mozilla crash reports.  I uploaded the windows crash logs here though:
http://www.2shared.com/file/NytoRsNQ/TV_app_crash_logs.html

If I go to the url (http://internet-tv.appspot.com/) in Nightly and try the same STR the live flash stream doesn't play, but I can then switch to a vodcast without Nightly crashing.  
I don't have any plugins disabled in Nightly (I just have Flash and an MS
Office one).
Comment 1 Jason Smith [:jsmith] 2012-05-09 07:45:32 PDT
Myk - Is there enough information here to determine what happened? If not, what else should Andrew get to figure out the root cause of the crash?
Comment 2 Robert Kaiser (not working on stability any more) 2012-05-09 07:57:26 PDT
From my POV, we really need the crash reporter in the runtime to diagnose such things nicely.

That said, you can probably try running the runtime/app in a debugger and try getting a stack that way.
Comment 3 Jason Smith [:jsmith] 2012-05-11 19:29:12 PDT
Flagging for stackwanted - we need a stack trace for this bug.
Comment 4 Tim Abraldes [:TimAbraldes] [:tabraldes] 2012-05-18 15:49:21 PDT
KERNELBASE!DebugBreak+0x2
xul!RealBreak(void)+0xa
xul!Break(char * aMsg = 0x00000000`0032d4e0 "###!!! ASSERTION: Invalid window handle: 'Error', file c:/src/central/obj/widget/windows/../../../widget/windows/nsWindow.cpp, line 904")+0x255
xul!NS_DebugBreak_P(unsigned int aSeverity = 1, char * aStr = 0x000007fe`e42fd5f0 "Invalid window handle", char * aExpr = 0x000007fe`e42fc5a4 "Error", char * aFile = 0x000007fe`e42fd590 "c:/src/central/obj/widget/windows/../../../widget/windows/nsWindow.cpp", int aLine = 904)+0x35a
xul!nsWindow::SubclassWindow(int bState = 0)+0x5d
xul!nsWindow::OnDestroy(void)+0x65
xul!nsWindow::Destroy(void)+0x129
xul!nsPluginInstanceOwner::Destroy(void)+0xf0b
xul!nsObjectLoadingContent::DoStopPlugin(class nsPluginInstanceOwner * aInstanceOwner = 0x00000000`09359ef0, bool aDelayedStop = false)+0x11a
xul!nsObjectLoadingContent::StopPluginInstance(void)+0x148
xul!InDocCheckEvent::Run(void)+0x82
xul!nsBaseAppShell::RunSyncSectionsInternal(bool aStable = true, unsigned int aThreadRecursionLevel = 0)+0x14d
xul!nsBaseAppShell::RunSyncSections(bool stable = true, unsigned int threadRecursionLevel = 0)+0x3e
xul!nsBaseAppShell::OnProcessNextEvent(class nsIThreadInternal * thr = 0x00000000`02281b70, bool mayWait = true, unsigned int recursionDepth = 0)+0x252
xul!nsThread::ProcessNextEvent(bool mayWait = true, bool * result = 0x00000000`0032e110)+0x293
xul!NS_ProcessNextEvent_P(class nsIThread * thread = 0x00000000`02281b70, bool mayWait = true)+0x6b
xul!mozilla::ipc::MessagePump::Run(class base::MessagePump::Delegate * aDelegate = 0x00000000`02271bc0)+0x2db
xul!MessageLoop::RunInternal(void)+0x68
xul!MessageLoop::RunHandler(void)+0x3b
xul!MessageLoop::Run(void)+0x22
xul!nsBaseAppShell::Run(void)+0x63
xul!nsAppStartup::Run(void)+0x8a
xul!XREMain::XRE_mainRun(void)+0xf31
xul!XREMain::XRE_main(int argc = 1, char ** argv = 0x00000000`004c7a80, struct nsXREAppData * aAppData = 0x00000000`004c1950)+0x34b
xul!XRE_main(int argc = 1, char ** argv = 0x00000000`004c7a80, struct nsXREAppData * aAppData = 0x00000000`004c1950)+0x52
TV!`anonymous namespace'::AttemptGRELoadAndLaunch(char * greDir = 0x00000000`0032f1f0 "C:\src\central\obj\dist\bin")+0x635
TV!`anonymous namespace'::AttemptLoadFromDir(char * firefoxDir = 0x00000000`0032f1f0 "C:\src\central\obj\dist\bin")+0x19f
TV!NS_internal_main(int argc = 1, char ** argv = 0x00000000`004c7a80)+0x320
TV!wmain(int argc = 1, wchar_t ** argv = 0x00000000`004d28f0)+0x196
TV!__tmainCRTStartup(void)+0xd2
TV!wmainCRTStartup(void)+0xe
kernel32!BaseThreadInitThunk+0xd
ntdll!RtlUserThreadStart+0x21


Not sure if this is the appropriate place to paste it (maybe pastebin is better?), but here's a stack trace of this error.
Comment 5 Jason Smith [:jsmith] 2012-05-18 16:29:47 PDT
Thanks Tim. Looking at the stack, do you have any ideas of what the root cause of the problem is? I see mention of the plugins & windows in the above stack.
Comment 6 Tim Abraldes [:TimAbraldes] [:tabraldes] 2012-05-21 23:57:29 PDT
From what I can tell, a window related to the flash plugin is being destroyed.  By the time we get to the `SubclassWindow()` call in `nsWindow::OnDestroy`, `mWnd` is no longer a valid window handle.

I'm not sure what causes the `mWnd` handle to become invalid; I'd probably have to set some breakpoints and run through the STR in the debugger a few times to get a better understanding of what's going on here.

I'll try to take another look at this soon but I'll be out of town May 24th-29th and may not get to this until afterward.
Comment 7 Tim Abraldes [:TimAbraldes] [:tabraldes] 2012-05-31 14:48:29 PDT
This bug appears to be timing related.

See https://mxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp?rev=bea5301cf21f#636

When the bug does NOT express itself:
  `nsWindow::Destroy()` is called
  `nsWindow::Destroy()` calls `::DestroyWindow()`
  `nsWindow`s wndProc handles the `WM_DESTROY` message by calling `nsWindow::OnDestroy()`
  `nsWindow::OnDestroy()` validates `mWnd` by calling `::IsWindow()`
  `nsWindow::OnDestroy()` successfully destroys the window
  `nsWindow::Destroy()` returns successfully

When the bug DOES express itself:
  `nsWindow::Destroy()` is called
  `nsWindow::Destroy()` calls `::DestroyWindow()`
  `nsWindow::Destroy()` calls `nsWindow::OnDestroy()`
  `nsWindow::OnDestroy()` attempts to validate `mWnd` by calling `::IsWindow()`
  Assertion fires

My guess is that something along the following lines is happening:
  `::DestroyWindow()` invalidates the `HWND` that `mWnd` points to
  `::DestroyWindow()` posts a `WM_DESTROY` message to the window's message queue
  In the crash case, `nsWindow::Destroy()` continues executing before the `WM_DESTROY` has been processed
Comment 8 Tim Abraldes [:TimAbraldes] [:tabraldes] 2012-05-31 16:17:22 PDT
Created attachment 628947 [details] [diff] [review]
Patch v1 - Make `nsWindow::SubclassWindow` deal more gracefully with windows that have already been destroyed

The attached patch seems to resolve this issue.  It works by modifying `nsWindow::SubclassWindow()` to be more lenient about an invalid `mWnd` if we are removing a subclass rather than adding one.

The attached patch modifies `nsWindow` which is slightly terrifying.  Jimm: Do you see any obvious issues with this approach?
Comment 9 Jim Mathies [:jimm] 2012-05-31 16:52:14 PDT
Comment on attachment 628947 [details] [diff] [review]
Patch v1 - Make `nsWindow::SubclassWindow` deal more gracefully with windows that have already been destroyed

Review of attachment 628947 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/windows/nsWindow.cpp
@@ +850,5 @@
>  // Subclass (or remove the subclass from) this component's nsWindow
>  void nsWindow::SubclassWindow(BOOL bState)
>  {
> +  if (bState) {
> +    if (NULL == mWnd || !::IsWindow(mWnd)) {

lets clean this up while we're here - 

(!mWnd || !IsWindow(mWnd))

@@ +856,5 @@
>      }
>  
> +    if (mUnicodeWidget)
> +      mPrevWndProc = (WNDPROC)::SetWindowLongPtrW(mWnd, GWLP_WNDPROC,
> +                                              (LONG_PTR)nsWindow::WindowProc);

nix the '::' throughout and line things up right. Also bracket this if statement.

@@ +864,5 @@
> +    NS_ASSERTION(mPrevWndProc, "Null standard window procedure");
> +    // connect the this pointer to the nsWindow handle
> +    WinUtils::SetNSWindowPtr(mWnd, this);
> +  } else {
> +    if (::IsWindow(mWnd)) {

This appears to be the only real change here right? This should be ok. Please run this through try.
Comment 10 Jim Mathies [:jimm] 2012-05-31 16:57:24 PDT
(In reply to Tim Abraldes from comment #7)
> When the bug DOES express itself:
>   `nsWindow::Destroy()` is called
>   `nsWindow::Destroy()` calls `::DestroyWindow()`
>   `nsWindow::Destroy()` calls `nsWindow::OnDestroy()`
>   `nsWindow::OnDestroy()` attempts to validate `mWnd` by calling
> `::IsWindow()`
>   Assertion fires
> 
> My guess is that something along the following lines is happening:
>   `::DestroyWindow()` invalidates the `HWND` that `mWnd` points to
>   `::DestroyWindow()` posts a `WM_DESTROY` message to the window's message
> queue
>   In the crash case, `nsWindow::Destroy()` continues executing before the
> `WM_DESTROY` has been processed

I don't see how this is possible, according the MSDN this is a synchronous call. Is it possible something in the window has subclassed our window and is stealing the WM_DESTROY message?
Comment 11 Tim Abraldes [:TimAbraldes] [:tabraldes] 2012-05-31 18:15:30 PDT
(In reply to Jim Mathies [:jimm] from comment #10)
> (In reply to Tim Abraldes from comment #7)
> > When the bug DOES express itself:
> >   `nsWindow::Destroy()` is called
> >   `nsWindow::Destroy()` calls `::DestroyWindow()`
> >   `nsWindow::Destroy()` calls `nsWindow::OnDestroy()`
> >   `nsWindow::OnDestroy()` attempts to validate `mWnd` by calling
> > `::IsWindow()`
> >   Assertion fires
> > 
> > My guess is that something along the following lines is happening:
> >   `::DestroyWindow()` invalidates the `HWND` that `mWnd` points to
> >   `::DestroyWindow()` posts a `WM_DESTROY` message to the window's message
> > queue
> >   In the crash case, `nsWindow::Destroy()` continues executing before the
> > `WM_DESTROY` has been processed
> 
> I don't see how this is possible, according the MSDN this is a synchronous
> call.

I couldn't find anything on the MSDN pages for `DestroyWindow`[1] or `WM_DESTROY`[2] saying that the message would be processed synchronously.  I planned to set breakpoints in `DestroyWindow` to see if it calls `SendMessage` or `PostMessage` but I'd have to mess with my symbol settings to get export symbols for user32.dll.

> Is it possible something in the window has subclassed our window and
> is stealing the WM_DESTROY message?

Yes, I believe that this is possible, though it would have to be happening non-deterministically (sometimes the window gets subclassed, sometimes it doesn't).  I think the attached patch would still be appropriate if this is the case:
  In [3], the window will be destroyed by `DestroyWindow()`
  `mOnDestroyCalled` will be false, so `OnDestroy` will be called from [3]
  `OnDestroy` will call `SubclassWindow(FALSE)` which will now run without asserting

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/ms632682%28v=vs.85%29.aspx
[2] http://msdn.microsoft.com/en-us/library/windows/desktop/ms632620%28v=vs.85%29.aspx
[3] https://mxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp?rev=bea5301cf21f#636
Comment 12 Tim Abraldes [:TimAbraldes] [:tabraldes] 2012-06-01 13:07:48 PDT
Created attachment 629308 [details] [diff] [review]
Patch v2 - Cleaned up nsWindow::SubclassWindow

(In reply to Jim Mathies [:jimm] from comment #9)
> Comment on attachment 628947 [details] [diff] [review]
> Patch v1 - Make `nsWindow::SubclassWindow` deal more gracefully with windows
> that have already been destroyed
> 
> Review of attachment 628947 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: widget/windows/nsWindow.cpp
> @@ +850,5 @@
> >  // Subclass (or remove the subclass from) this component's nsWindow
> >  void nsWindow::SubclassWindow(BOOL bState)
> >  {
> > +  if (bState) {
> > +    if (NULL == mWnd || !::IsWindow(mWnd)) {
> 
> lets clean this up while we're here - 
> 
> (!mWnd || !IsWindow(mWnd))

Done.

> @@ +856,5 @@
> >      }
> >  
> > +    if (mUnicodeWidget)
> > +      mPrevWndProc = (WNDPROC)::SetWindowLongPtrW(mWnd, GWLP_WNDPROC,
> > +                                              (LONG_PTR)nsWindow::WindowProc);
> 
> nix the '::' throughout and line things up right. Also bracket this if
> statement.

Done. Removed '::' in front of system calls and bracketed all if statements.

> @@ +864,5 @@
> > +    NS_ASSERTION(mPrevWndProc, "Null standard window procedure");
> > +    // connect the this pointer to the nsWindow handle
> > +    WinUtils::SetNSWindowPtr(mWnd, this);
> > +  } else {
> > +    if (::IsWindow(mWnd)) {
> 
> This appears to be the only real change here right? This should be ok.
> Please run this through try.

Running this through try:
  https://tbpl.mozilla.org/?tree=Try&rev=978a52a598d9

I had actually run the previous patch through try but it's taking forEVER:
  https://tbpl.mozilla.org/?tree=Try&rev=1e4aa67b9db5

Also I changed the C-style casts to reinterpret_casts.
Comment 13 Tim Abraldes [:TimAbraldes] [:tabraldes] 2012-06-04 00:19:38 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/a7948da916f2
Comment 14 :Ehsan Akhgari (busy, don't ask for review please) 2012-06-04 09:31:28 PDT
https://hg.mozilla.org/mozilla-central/rev/a7948da916f2
Comment 15 Jason Smith [:jsmith] 2012-06-04 09:57:07 PDT
Andrew - Could you do a re-review of the Television app and see if this bug is fixed?
Comment 16 Andrew Williamson [:eviljeff] 2012-06-05 15:13:03 PDT
(In reply to Jason Smith [:jsmith] from comment #15)
> Andrew - Could you do a re-review of the Television app and see if this bug
> is fixed?

Its fixed for me on my laptop, but I won't be able to test on my desktop machine (that originally exhibited the bug) for a few weeks.  Its Win7, 64 bit also so its probably okay.

Note You need to log in before you can comment on or make changes to this bug.