Closed Bug 535004 Opened 12 years ago Closed 12 years ago

When closing 'cavern' demo while it's loading: "ABORT: actively animated document w/ no window: 'win'", or crash [@ libxul.so@0x7fa124 ], [@ nsSMILCSSProperty::GetBaseValue() const ]

Categories

(Core :: SVG, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: dholbert, Assigned: dholbert)

References

()

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

STEPS TO REPRODUCE:
 1. Load http://svg.kvalitne.cz/cavern/088/cavern.svg as your only Firefox tab
 2. Ctrl-Q or Ctrl-W to close that tab (and quit Firefox) while it's loading.

ACTUAL RESULTS:
* In debug build:
> ###!!! ABORT: actively animated document w/ no window: 'win', file ../../../mozilla/content/smil/nsSMILCSSProperty.cpp, line 62

* In opt build: Crash  [@ libxul.so@0x7fa124 ]

Some crash reports:
http://crash-stats.mozilla.com/report/index/74e67fa0-cfc5-4402-b439-5a36b2091215
http://crash-stats.mozilla.com/report/index/d408ec02-0c25-41df-bb13-206272091215
http://crash-stats.mozilla.com/report/index/fd037351-517d-4c97-b3fc-6893a2091215

The opt builds are almost certainly crashing on a null-deref, on the line right after the debug build's abort:
> 296   NS_ABORT_IF_FALSE(doc, "active animations should only be able to "
> 297                     "target elements that are in a document");
> 298   nsIPresShell* shell = doc->GetPrimaryShell();
Assignee: nobody → dholbert
Actually, the crash is [@ nsSMILCSSProperty::GetBaseValue() const ].  I got the other signature (libxul.so@0x7fa124) only because today's nightly doesn't seem to get symbols in crash reporter for some reason. (filed as Bug 535071)

Here's a crash using yesterday's nightly, with symbols:
http://crash-stats.mozilla.com/report/index/bp-960fd911-0d3b-4cdd-aad4-6706f2091215
Summary: When closing 'cavern' demo while it's loading: "ABORT: actively animated document w/ no window: 'win'", or crash [@ libxul.so@0x7fa124 ] → When closing 'cavern' demo while it's loading: "ABORT: actively animated document w/ no window: 'win'", or crash [@ libxul.so@0x7fa124 ], [@ nsSMILCSSProperty::GetBaseValue() const ]
Status: NEW → ASSIGNED
So this is similar to bug 534975 -- namely:
 (1) We're loading external resources for a document that's being animated
 (2) User closes Firefox
 (3) We call nsDocShell::Destroy, which (through several stack levels):
   (a) calls nsDocument::OnPageHide  (--> pauses animations)
   (b) calls nsLoadGroup::Cancel to cancel the resources that we're loading
     -- THIS CALLS nsDocument::OnPageShow, which resumes the animation.
   (c) Later on (still in Destroy), we clear the document's mWindow pointer
 (4) The animation (running because it was resumed in step 3b) tries to do a sample, and we die because it's got a null window pointer.

In bug 534975, we fixed this by checking "docShell->IsBeingDestroyed()" before calling OnPageShow (step 3b).

However, that doesn't work in this case, because the DocShell that receives that call is *different* from the one that's being destroyed.  Specifically, it's a child of the destroyed docshell.  Here's the (abridged) stack trace:

>#0  nsSMILAnimationController::Resume (this=0xac0fc300, aType=4) at ../../../mozilla/content/smil/nsSMILAnimationController.cpp:135
>#1  0xb282db94 in nsSMILAnimationController::OnPageShow (this=0xac0fc300) at ../../../mozilla/content/smil/nsSMILAnimationController.cpp:174
>#2  0xb2194f38 in nsDocument::OnPageShow (this=0xaa90f000, aPersisted=0, aDispatchStartTarget=0x0) at ../../../../mozilla/content/base/src/nsDocument.cpp:7216
>#3  0xb1e8ff5c in DocumentViewerImpl::LoadComplete (this=0xac0e02c0, aStatus=2152398850) at ../../../mozilla/layout/base/nsDocumentViewer.cpp:1074
>#4  0xb1ca0d94 in nsDocShell::EndPageLoad (this=0xb43536d0, aProgress=0xb43536e4, aChannel=0xae981e50, aStatus=2152398850) at ../../../mozilla/docshell/base/nsDocShell.cpp:5735
[SNIP]
>#12 0xb1cdd2b3 in nsDocLoader::Stop (this=0xb43536d0) at ../../../mozilla/uriloader/base/nsDocLoader.cpp:328
>#13 0xb1cb0795 in nsDocShell::Stop (this=0xb43536d0) at ../../../mozilla/docshell/base/nsDocShell.h:256
>#14 0xb1cdd26c in nsDocLoader::Stop (this=0xb4350280) at ../../../mozilla/uriloader/base/nsDocLoader.cpp:323
>#15 0xb1cb0795 in nsDocShell::Stop (this=0xb4350280) at ../../../mozilla/docshell/base/nsDocShell.h:256
>#16 0xb1c894c1 in nsDocShell::Stop (this=0xb4350280, aStopFlags=3) at ../../../mozilla/docshell/base/nsDocShell.cpp:3972
>#17 0xb1ca3566 in nsDocShell::Destroy (this=0xb4350280) at ../../../mozilla/docshell/base/nsDocShell.cpp:4249

So in stack level 14, we call "loader->Stop()" on all of our child DocLoaders.  So when we receive the call to DocumentViewerImpl::LoadComplete, we're in the context of the *child* docshell.  The child isn't marked as being destroyed, so we happily make the call to OnPageShow (resuming animations) and prepare to shoot ourselves in the foot.
(In reply to comment #2)
> So this is similar to bug 534975 -- namely:
[SNIP]
> In bug 534975, we fixed this by checking "docShell->IsBeingDestroyed()" 

Sorry -- I meant to refer to bug 525153 (not 534975) in that last comment.
More details:
 - The outermost nsDocShell::Destroy is for the chrome UI's docshell.  (Its mContentViewer->mDocument is a nsXULDocument). This outermost nsDocShell::Destroy() makes the following calls, in order:

  (a) FirePageHideNotification(PR_TRUE);
       This calls FirePageHideNotification on each of its DocShell children, including our SVG document's docshell.  This is how we get our nsSMILAnimationController::OnPageHide() call.

  (b) Stop(nsIWebNavigation::STOP_ALL);
      This calls Stop() on each of our child docshells, which ultimately makes the nsSMILAnimationController::OnPageShow() call.

  (c) mContentViewer->Destroy();
      This calls nsDocument::Destroy for the our chrome XUL document, which (down 16 stack levels) calls nsDocShell::Destroy on our SVG document's doc-shell.

Now, you'd think that the 16-stack-level-down call to Destroy (for SVG document's docshell) would make another OnPageHide call and make everything work out okay.  But it doesn't, because its FirePageHideNotification method was already called (in (a)), so mFiredUnloadEvent is true, and so this new call skips the meat of the function.
I wonder if we get anything reasonable as a parameter to DocumentViewerImpl::LoadComplete.
Could we somehow use aStatus to detect whether to call OnPageShow.

Or do we just need to iterate all the ancestor docshells and check if they are being destroyed. That would be easy, but a bit silly, IMO.
This fix (suggested by bz) just broadens the condition for calling OnPageShow.  It checks docShell->GetIsInUnload() (an accessor for mFiredUnloadEvent), instead of checking docShell->IsBeingDestroyed.

This new check will catch not only DocShells that are being destroyed, but also DocShells whose ancestors are being destroyed.  (because the ancestor will have called FirePageHideNotification, which sets mFiredUnloadEvent on all its children)

I also add an ABORT_IF_FALSE to the place where we clear nsDocument::mWindow, to catch this sort of bug a little bit earlier.
Attachment #418034 - Flags: review?(Olli.Pettay)
Blocks: 535403
Blocks: 534975
What happens to pageshow event if load event causes a new page to be loaded?
Though, I think preventing pageshow is the right thing to do even in that case, 
if we're in unload.
Could you still test that case.
(In reply to comment #7)
> What happens to pageshow event if load event causes a new page to be loaded?

We'll still get the pageshow event, because DocumentViewerImpl:::LoadComplete gets called before FirePageHideNotification does.  Details below.

I tested a simple SVG document with <svg onload="window.location='about:blank'"> with an <image> element.  Here's what happens when it loads:

- We get a call to nsDocShell::LoadURI (for the redirect in onload handler)
  * ...and that cancels our image-loading (via nsDocLoader::Stop & nsLoadGroup::Cancel)
  * ...and a few levels down, that puts us in DocumentViewerImpl::LoadComplete
  * ...and that calls mDocument->OnPageShow (passing the GetIsInUnload() check, because we haven't received any FirePageHideNotification calls yet).

- A little later, when we start to receive data from the target URL, we get a call to nsInputStreamPump::OnInputStreamReady, which (several stack-levels down) ends up having us call nsDSURIContentListener::DoContent.
  * ...and that calls nsDocShell::CreateContentViewer
  * ...and that calls FirePageHideNotification
  * ...and that's when we set mFiredUnloadEvent

Incidentally, CreateContentViewer actually clears mFiredUnloadEvent immediately after its call to FirePageHideNotification, since we've got a new document.

But anyway, the key point for the purposes of your question is that we interrupt the old document (and call OnPageShow) **before** we fire unload.
No longer blocks: 535403
Attachment #418034 - Flags: review?(Olli.Pettay) → review+
Comment on attachment 418034 [details] [diff] [review]
fix: Check "GetIsInUnload" instead of "IsBeingDestroyed"

Ok, let's try this then.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
I backed this out because the NS_ABORT_IF_FALSE fired during reftest and crashtest (in the only debug run so far since this patch landed):

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1261619959.1261626808.25409.gz

http://hg.mozilla.org/mozilla-central/rev/03bec0d87154
http://hg.mozilla.org/mozilla-central/rev/7a68e56bb96a
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
D'oh -- thanks for backing out.  Pretty sure I tryserver-tested this, but yeah, that wouldn't catch the ABORT_IF_FALSE failure.
So I hit this NS_ABORT_IF_FALSE if I run reftests on layout/reftests/reftest-sanity/reftest.list.  In particular, if I remove everything in that file except for the "filter-1.xhtml" or "filter-2.xhtml" line, then I hit this ABORT_IF_FALSE on shutdown.

We fail because we calling SetScriptGlobalObject(nsnull) on an nsSVGDocument* that's unpaused (i.e. has mAnimationController::mPauseState = 0).  The stack looks like this:

{
#2  0xb43474b6 in nsDocument::SetScriptGlobalObject (this=0xaf02c000, aScriptGlobalObject=0x0) at ../../../../mozilla/content/base/src/nsDocument.cpp:3612
#3  0xb402ca19 in DocumentViewerImpl::Close (this=0xb56f1580, aSHEntry=0x0) at /scratch/work/builds/mozilla-central/mozilla-central.09-04-02.16-24/mozilla/layout/base/nsDocumentViewer.cpp:1427
#4  0xb4335f23 in ~ExternalResource (this=0xaf036550, __in_chrg=<value optimized out>) at ../../../../mozilla/content/base/src/nsDocument.cpp:1203
#5  0xb4358921 in ~nsAutoPtr (this=0xb3a6c6bc, __in_chrg=<value optimized out>) at ../../../dist/include/nsAutoPtr.h:104
#6  0xb436094a in ~nsBaseHashtableET (this=0xb3a6c6b4, __in_chrg=<value optimized out>) at ../../../dist/include/nsBaseHashtable.h:312
#7  0xb4360969 in nsTHashtable<nsBaseHashtableET<nsURIHashKey, nsAutoPtr<nsExternalResourceMap::ExternalResource> > >::s_ClearEntry (table=0xb2cd9628, entry=0xb3a6c6b4) at ../../../dist/include/nsTHashtable.h:397
#8  0x001ad96c in ?? ()
#9  0x001adac1 in ?? ()
#10 0xb435d8b3 in nsTHashtable<nsBaseHashtableET<nsURIHashKey, nsAutoPtr<nsExternalResourceMap::ExternalResource> > >::Clear (this=0xb2cd9628) at ../../../dist/include/nsTHashtable.h:251
#11 0xb435d8cb in nsBaseHashtable<nsURIHashKey, nsAutoPtr<nsExternalResourceMap::ExternalResource>, nsExternalResourceMap::ExternalResource*>::Clear (this=0xb2cd9628) at ../../../dist/include/nsBaseHashtable.h:227
#12 0xb435d8ed in nsExternalResourceMap::Shutdown (this=0xb2cd9628) at ../../../../mozilla/content/base/src/nsDocument.h:439
#13 0xb433ce4b in nsDocument::Destroy (this=0xb2cd9400) at ../../../../mozilla/content/base/src/nsDocument.cpp:6987
#14 0xb4032739 in DocumentViewerImpl::Destroy (this=0xb56f0680) at /scratch/work/builds/mozilla-central/mozilla-central.09-04-02.16-24/mozilla/layout/base/nsDocumentViewer.cpp:1590
#15 0x026d380f in nsSHistory::EvictContentViewersInRange (this=0xb3d7c4c0, aStart=0, aEnd=3) at /scratch/work/builds/mozilla-central/mozilla-central.09-04-02.16-24/mozilla/docshell/shistory/src/nsSHistory.cpp:881
#16 0x026d3b3f in nsSHistory::EvictAllContentViewers (this=0xb3d7c4c0) at /scratch/work/builds/mozilla-central/mozilla-central.09-04-02.16-24/mozilla/docshell/shistory/src/nsSHistory.cpp:672
#17 0x0265d7b0 in nsDocShell::Destroy (this=0xb57447d0) at /scratch/work/builds/mozilla-central/mozilla-central.09-04-02.16-24/mozilla/docshell/base/nsDocShell.cpp:4291
[...]
#34 0xb4659056 in nsGlobalWindow::Close (this=0xb7599090) at /scratch/work/builds/mozilla-central/mozilla-central.09-04-02.16-24/mozilla/dom/base/nsGlobalWindow.cpp:5676
#35 0x08f2c566 in nsAppStartup::CloseAllWindows (this=0xb3b7df10) at /scratch/work/builds/mozilla-central/mozilla-central.09-04-02.16-24/mozilla/toolkit/components/startup/src/nsAppStartup.cpp:357
#36 0x08f2c8af in nsAppStartup::Quit (this=0xb3b7df10, aMode=3) at /scratch/work/builds/mozilla-central/mozilla-central.09-04-02.16-24/mozilla/toolkit/components/startup/src/nsAppStartup.cpp:257
}
Re-landed without the NS_ABORT_IF_FALSE, since other issues beyond just this bug (and External Resources in particular) can make us fail that, as indicated by comment 11 thru comment 13.
http://hg.mozilla.org/mozilla-central/rev/80047349a610

I'm filing a new bug on the situation with External Resources that makes us fail this NS_ABORT_IF_FALSE...
Filed bug 536834.
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Crash Signature: [@ libxul.so@0x7fa124 ] [@ nsSMILCSSProperty::GetBaseValue() const ]
You need to log in before you can comment on or make changes to this bug.