Closed Bug 525153 Opened 10 years ago Closed 10 years ago

Crash [@nsSMILCSSProperty::GetBaseValue() const ] when closing tab for Adobe "wonder" SVG demo while it's loading

Categories

(Core :: SVG, defect)

x86
Linux
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: dholbert, Unassigned)

Details

(Keywords: crash)

Crash Data

Attachments

(4 files, 1 obsolete file)

Steps to reproduce:
 1. Have some tabs open (so that closing a tab won't quit out of Firefox)
 2. In a new tab, load URL (or attached testcase)
 3. Shift-reload the animation, and close tab (with Ctrl+W) when it's around 50% loaded according to the pie chart on the top of the tab.
 4. Wait a few seconds.  If you're still alive, repeat steps 2-3.

ACTUAL RESULTS:
Crash [@nsSMILCSSProperty::GetBaseValue() const ]

I can reproduce this pretty reliably. Sample crash report:
http://crash-stats.mozilla.com/report/index/bp-a9d030ae-1b72-465b-b6c6-145b02091028

Crash report says we're dying at this line:
> 75   nsRefPtr<nsComputedDOMStyle>
> 76     computedStyle(doc->GetWindow()->LookupComputedStyleFor(aElem));
http://mxr.mozilla.org/mozilla-central/source/content/smil/nsSMILCSSProperty.cpp?mark=75-76#63
during a timer callback (in nsTimerEvent::Run)

I just caught this in GDB -- we're crashing because doc->GetWindow() (aka doc->mWindow) is null.

I'm not sure if this is because mWindow hasn't been set yet (because the document is loading), or because mWindow was set but is now cleared because the tab was closed... I suspect the latter.

Either way, we should be able to fix this by adding a null-check on doc->GetWindow(), and we can just return PR_FALSE if that fails.
This patch adds the null-check, and that indeed fixes the crash.  However, instead of a crash, I now get thousands of consecutive instances of this warning spammed to the console:
> WARNING: nsISMILAttr::GetBaseValue failed: file /mozilla/content/smil/nsSMILCompositor.cpp, line 159

I got 1316 instances the first time, and 6538 the next time -- both times, the warnings are followed by:
--DOMWINDOW == 12 (0xb2ad0370) [serial = 18] [outer = 0xb2acfe00] [url = http://svg.kvalitne.cz/adobe/wonder.svg]
--DOMWINDOW == 11 (0xb2acfe30) [serial = 16] [outer = (nil)] [url = http://svg.kvalitne.cz/adobe/wonder.svg]
--DOCSHELL 0xb28145e0 == 6
--DOMWINDOW == 10 (0xb2ad01b0) [serial = 19] [outer = (nil)] [url = http://svg.kvalitne.cz/adobe/wonder.svg]

Of course, it makes sense that GetBaseValue would be failing, but I'm not sure why it'd be failing (and trying again) that many times.  Looking into that...
Attachment #409011 - Attachment description: WIP fix → WIP fix (with diff -w)
Attached image testcase 2
Here's a reduced testcase, using a random Fennec image that I found already on bugzilla.

I can reproduce this bug 100% reliably if I close the tab **after** the Fennec logo-image has started loading but **before** it's finished loading. (i.e. when the top of the image is visible, but the bottom isn't.)

To make this easier, you can increase the image's load-time using a traffic-shaper -- I'm using "wondershaper", e.g. "sudo wondershaper eth0 500 500" [and "sudo wondershaper clear eth0" to go back to normal)
Ok, I investigated this a bit more, and here's what I've got.

When we close a tab containing an SVG Animation, nsDocShell::Destroy gets invoked.  This normally does the following (possibly through several indirect stack-levels):
  (1) It sends our nsDocument gets a PageHide notification, which it passes to the nsSMILAnimationController::OnPageHide.  This function Pauses our animations and cancels the pending sample.
  (2) It calls SetScriptGlobalObject(nsnull) on our document, clearing its mWindow pointer.
  (3) A little while later, the <animate> nodes get cycle-collected.

The mWindow-clearing in step (2) normally doesn't cause any problems, because we don't have any samples after that point.

However, in this bug's testcase, we get an additional step that **resumes** the animation after it's been paused.  I'm calling this step "1.5":

  (1.5) nsDocShell::Destroy calls nsDocLoader::Stop to halt the in-progress image-download.
    -- This gives us a nested series of calls including "EndPageLoad", and below that, "OnPageShow".  The OnPageShow call makes us call nsSMILAnimationController::Resume, so we start sampling again!!

We don't get any further OnPadeHide notifications after this point, so our animations just keep on running -- nothing pauses them.  So when we get to step (2) which clears the mWindow pointer, we just crash on the next sample.

The attached WIP patch works around this crash by null-checking the window before using it, but the problem remains that we're still sampling on a now-defunct document.  This is why we get the warnings that I mentioned in comment 1.  The warnings eventually stop when we cycle-collect the <animate> node, but that takes at least a few seconds to happen.
Attachment #409011 - Attachment description: WIP fix (with diff -w) → null-check workaround (with diff -w)
Attached patch fix (obsolete) — Splinter Review
Here's a fix suggested by smaug in IRC -- have DocumentViewer **not** call OnPageShow, if the docshell is being destroyed.

The patch has an NS_ENSURE_TRUE for the docshell as follows:
> +    nsIDocShell *docShell = window->GetDocShell();
> +    NS_ENSURE_TRUE(docShell, NS_ERROR_UNEXPECTED);

I'm including that because the docshell-checking code in the clause above does the same thing:
http://mxr.mozilla.org/mozilla-central/source/layout/base/nsDocumentViewer.cpp?mark=1024-1025#1010

Just submitted this to the try-server. Assuming it passes muster there, I'll request review.
Attached patch fixSplinter Review
This version adds two fixes that smaug suggested in IRC:
 - remove the NS_ENSURE_TRUE, and instead just add |docShell| as a condition for the "if" statement
 - Re-get |window|, since the load event might have changed it

It also adds a build-warning fix in the same file ("warning: format '%p' expects type 'void*', but argument 2 has type 'nsPrintEngine*'"), so that the file compiles without warnings.
Attachment #409375 - Attachment is obsolete: true
Comment on attachment 409393 [details] [diff] [review]
fix

This passed tests on Windows and Mac try-servers.  The Linux builder died with network issues, so it didn't get a Linux cycle. I'll re-push-to-try to get a Linux cycle, though, for thoroughness' sake.

Anyway, given the Mac & Windows success, I'm confident enough that this doesn't break stuff. :)  Requesting review.
Attachment #409393 - Flags: superreview?(bzbarsky)
Attachment #409393 - Flags: review?(Olli.Pettay)
Attachment #409393 - Flags: superreview?(bzbarsky) → superreview+
Attachment #409393 - Flags: review?(Olli.Pettay) → review+
(In reply to comment #6)
> The Linux builder died with
> network issues, so it didn't get a Linux cycle. I'll re-push-to-try to get a
> Linux cycle, though, for thoroughness' sake.

(FWIW, the followup push-to-try passed tests on all platforms)

Pushed to mozilla-central: http://hg.mozilla.org/mozilla-central/rev/2801b05bb7cc
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Crash Signature: [@nsSMILCSSProperty::GetBaseValue() const ]
You need to log in before you can comment on or make changes to this bug.