Closed Bug 1288228 Opened 4 years ago Closed 4 years ago

Crash in nsSVGElement::WillChangeLength, with CC + setTimeout callback to svg length DOM API

Categories

(Core :: SVG, defect, critical)

34 Branch
Unspecified
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- fixed
firefox-esr45 49+ fixed
firefox50 --- fixed

People

(Reporter: dholbert, Assigned: dholbert)

References

Details

(Keywords: crash, regression, testcase)

Crash Data

Attachments

(5 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-83af8d63-d8fc-40e5-8f22-b54492160720.
=============================================================

STR A:
 1. Load "testcase 1" (which I'll attach shortly)
 2. Reload after a second or so.
 3. Repeat.

STR B:
 1. Load attached "automated wrapper for testcase 1"
 2. Wait a few seconds.

ACTUAL RESULTS:
Crash with null deref [@ nsSVGElement::WillChangeLength]

It looks like we must be doing Cycle Collection (and nulling out the DOMSVGLength's "mSVGElement" pointer), and then after that, we run one more setTimeout callback, which tries to do some work with the DOMSVGLength (and in particular, we call convertToSpecifiedUnits in the setTimeout call, and  DOMSVGLength::ConvertToSpecifiedUnits passes in mSVGElement to a SVG length API that implicitly assumes the passed-in pointer is non-null.
If I load the "automated wrapper", I crash after about ~30 seconds, using current Nightly.  (It's nondeterministic, because it depends on exact timing of the cycle collection.)

"testcase 1" by itself is harder to trigger the crash with; it takes on the order of 20 manual reloads, maybe more.  (Fewer in a debug build, I think, though I'm not sure.)

(CC'ing Jesse, in case he's interested in trying to construct a more deterministic testcase using his fuzzers/domfuzz-extension with explicit calls into CC/GC.)
It seems like the problem is that savedBaseVal should be keeping alive whatever the nsSVGElement is.

Offhand, this comment in SVGAnimatedAngle seems wrong:
   nsSVGAngle* mVal; // kept alive because it belongs to content
In other words, I think mVal should be a strong reference.
Summary: Crash in nsSVGElement::WillChangeLength → Crash in nsSVGElement::WillChangeLength, with CC + setTimeout callback to svg length DOM API
Note that SVGAnimatedAngle doesn't come into play at all with this testcase -- but there's a similar line/comment in the SVG Length code:
>  nsSVGLength2* mVal; // kept alive because it belongs to mSVGElement

Presumably that needs to be made strong.
...though nsSVGLength2 isn't refcounted. Maybe it needs to be?

(jwatt, you understand the plan/organization of the SVG DOM wrappers better than I do -- maybe you could take a look at this? Hmm, can't needinfo you at the moment, so I'll just CC you. :))
This seems to be a regression, based on my local testing. (Old builds don't crash after loading the automated wrapper, even if I wait several minutes).

Regression range:
Last good revision: 6cbdd4d523a7 (2014-08-06)
First bad revision: 2f198e81ed98 (2014-08-07)
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6cbdd4d523a7&tochange=2f198e81ed98
Hmm, I noticed that we seem to *only* use mSVGElement (the thing that's null here) inside of a mVal null-check.

So: if we null out mVal (a raw pointer, owned by the object that we were previously tracking in mSVGElement), then we will fail its null-checks and avoid dereferencing mSVGElement. And that lets us avoid this crash, it seems.

Robert, do you know anything about these wrappers & does this seem sane to you?

(This patch perhaps isn't quite complete, because e.g. ~DOMSVGLength has some static-map cleanup code that depends on mVal, and we perhaps need to run that code before we forget about mVal here. So, just posting this as a strawman for now.)
Attachment #8773052 - Flags: feedback?(longsonr)
(In reply to Andrew McCreight [:mccr8] from comment #4)
> Offhand, this comment in SVGAnimatedAngle seems wrong:
>    nsSVGAngle* mVal; // kept alive because it belongs to content

That comment should be correct.

(In reply to Daniel Holbert [:dholbert] from comment #7)
> ...though nsSVGLength2 isn't refcounted. Maybe it needs to be?

This code has changed since I originally wrote it, but the idea is that the referenced nsSVGLength2 is part of the nsSVGElement object. If the nsSVGElement is kept alive, the nsSVGLength2 will be alive too. If we drop our reference to mSVGElement then we should consider it unsafe to access mVal.

(In reply to Daniel Holbert [:dholbert] from comment #10)
> Created attachment 8773052 [details] [diff] [review]
> strawman fix: null out mVal, when we null out the mSVGElement that owns it

So this makes sense to me.

As a side note, the SVG 2 spec changes to SVG*List attributes should allow us to vastly simplify the code. It would be great if someone could find the time to do that.
(In reply to Daniel Holbert [:dholbert] from comment #10)
> (This patch perhaps isn't quite complete, because e.g. ~DOMSVGLength has
> some static-map cleanup code that depends on mVal, and we perhaps need to
> run that code before we forget about mVal here. So, just posting this as a
> strawman for now.)

The s*SVGLengthTearOffTable code is new to me, but it does indeed look like you'd need to remove mVal from the appropriate table.
(In reply to Jonathan Watt [:jwatt] (catching up after vacation) from comment #11)
>  If the
> nsSVGElement is kept alive, the nsSVGLength2 will be alive too.
I guess a question is that if nsSVGLength2 is kept alive, should nsSVGElement kept alive too?

Is it web-observable if element isn't kept alive, but length is?
Version: Trunk → 34 Branch
Assignee: nobody → dholbert
Status: NEW → ASSIGNED
(In reply to Olli Pettay [:smaug] from comment #13)
> I guess a question is that if nsSVGLength2 is kept alive, should
> nsSVGElement kept alive too?

That's not something that actually happens. nsSVGElement strictly owns the nsSVGLength2 object in question.  If a DOM wrapper exists for the element, that DOM wrapper keeps the element alive, with a refcounted reference to the element (and by extension, it keeps the length alive).

> Is it web-observable if element isn't kept alive, but length is?

Again, that shouldn't actually happen, in practice.

(In actual fact, I think that *sort of is* happening in this bug, but it shouldn't be, and I'll be fixing that. More detail in my next comment.)
(In reply to Daniel Holbert [:dholbert] from comment #14)
> nsSVGElement strictly owns the nsSVGLength2 object in question.  If a DOM wrapper exists for the element

Sorry, meant to say "if a DOM wrapper exists for the *length*"
Comment on attachment 8773406 [details]
Bug 1288228 part 1: Move cleanup code from DOMSVGLength destructor into a helper-function.

https://reviewboard.mozilla.org/r/66084/#review63016
Attachment #8773406 - Flags: review?(jwatt) → review+
(In reply to Daniel Holbert [:dholbert] from comment #14)
> > Is it web-observable if element isn't kept alive, but length is?
> 
> Again, that shouldn't actually happen, in practice.
> 
> (In actual fact, I think that *sort of is* happening in this bug, but it
> shouldn't be, and I'll be fixing that. More detail in my next comment.)

Expanding on this: in particular, when we do the CC Unlink() operation on the DOMSVGLength, the expectation is that JS should not have a way to reference that DOMSVGLength object anymore.

HOWEVER, in actual fact, the DOMSVGLength is still stored in the tearoff table, and JS is still able to run and access the tearoff table and pull out the pointer for the old DOMSVGLength.

In particular:
 - So on the final setTimeout operation (after the CC unlink has happened), JS actually has a *different* "SVGAnimatedLength" object (at a different pointer address).  That new SVGAnimatedLength is associated with the same  old nsSVGElement & same old nsSVGLength2, though.
 - To provide the |savedBaseVal| variable for the fatal setTimeout callback execution, that SVGAnimatedLength object gets queried for its baseVal, which ends up querying the tearoff table (via nsSVGLength2::ToDOMBaseVal, which calls static DOMSVGLength::GetTearOff).  That returns the old (unlinked) DOMSVGLength.

So: In part 2, technically we don't even need to clear mVal -- simply calling CleanupWeakRefs is sufficient to fix the bug (based on my local testing).  However, in the spirit of comment 11, I still think it's worth clearing mVal.
(In reply to Olli Pettay [:smaug] from comment #13)
> I guess a question is that if nsSVGLength2 is kept alive, should
> nsSVGElement kept alive too?

The nsSVGLength2 is part of the nsSVGElement:

https://dxr.mozilla.org/mozilla-central/source/dom/svg/SVGSVGElement.h?q=SVGSVGElement&redirect_type=direct#370

so the lifetime of the nsSVGLength2 is strictly tied to the lifetime of the nsSVGElement. So not keeping the nsSVGElement alive is not an option given the current code setup.

> Is it web-observable if element isn't kept alive, but length is?

Not per the specs.
(My "in particular" bullet-points in comment 19 are based on gdb backtraces from the most 2 recent calls to nsSVGAttrTearoffTable::GetTearoff() just before the crash, for the pair of DOMSVGLength/nsSVGLength2 pointers that are involved in the crash. Looking at the backtraces, I can see that the mozilla::dom::SVGAnimatedLength pointer is different, but the DOMSVGLength, nsSVGElement, and nsSVGLength2 pointers are the same.)
(In reply to Daniel Holbert [:dholbert] from comment #19)
> Expanding on this: in particular, when we do the CC Unlink() operation on
> the DOMSVGLength, the expectation is that JS should not have a way to
> reference that DOMSVGLength object anymore.

I still isn't clear to me whether calling Unlink on the object that |window.savedBaseVal| points to is okay given that such an object is clearly still reachable by script. Maybe it's something to do with the fact that the window has been unloaded, and it's only _then_ the setTimeout that accesses |window.savedBaseVal| is run. If the window has been unloaded, should that setTimeout fire though?

An any rate, the other things that you observe and the patch makes sense.
Comment on attachment 8773407 [details]
Bug 1288228 part 2: When a DOMSVGLength is CC'd, remove it from tearoff table and drop its weak ref to wrapped val.

https://reviewboard.mozilla.org/r/66086/#review63018

This seems sane to me.
Attachment #8773407 - Flags: review?(jwatt) → review+
I haven't been able to come up with a reliably-crashing automated regression-test for this bug, but I do have an occasionally-crashing automated testcase (crashes occasionally in unpatched builds)

I sanity-checked that I can get it to crash (occasionally) by pushing to Try and doing a bunch of retriggers:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=ead6fc8c3a55&selectedJob=24621924

I successfully triggered the crash at least once, here:
  https://treeherder.mozilla.org/logviewer.html#?job_id=24621924&repo=try#L10871
(jwatt, I'd appreciate it if you could sanity-check the test; it's a bit hacky/hand-wavy, so I probably shouldn't land it unreviewed. I included some comments in the HTML, which I think explain the idea behind it.)
Comment on attachment 8775357 [details]
Bug 1288228 part 3: Add regression mochitest for this bug.

https://reviewboard.mozilla.org/r/67518/#review64678

The comments seem sufficient to me, and I don't have any worthwhile suggestions on improving the test so this'll have to do. Thanks, Daniel.
Attachment #8775357 - Flags: review?(jwatt) → review+
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/45d06c2c094d
part 1: Move cleanup code from DOMSVGLength destructor into a helper-function. r=jwatt
https://hg.mozilla.org/integration/autoland/rev/4d20e25a4438
part 2: When a DOMSVGLength is CC'd, remove it from tearoff table and drop its weak ref to wrapped val. r=jwatt
https://hg.mozilla.org/integration/autoland/rev/ffc7874fcb7e
part 3: Add regression mochitest for this bug. r=jwatt
Comment on attachment 8773406 [details]
Bug 1288228 part 1: Move cleanup code from DOMSVGLength destructor into a helper-function.

This landed in Firefox 50. Let's backport this patch-stack to Firefox 49 as well (which I guess means "beta" approval since we've just bumped the version-numbers).

Approval Request Comment
[Feature/regressing bug #]: Not a recent regression (may have regressed back in 2014, per comment 9).  "Feature" is our JavaScript-exposed SVG DOM APIs.
[User impact if declined]: Crashes/instability.
[Describe test coverage new/current, TreeHerder]: Automated test included with patch. We have decent test coverage for this region of SVG code as well, I think.
[Risks and why]: Low-risk. This just makes us more aggressively/consistently clean up objects that are in the process of being destroyed.
[String/UUID change made/needed]: None.
Attachment #8773406 - Flags: approval-mozilla-beta?
Comment on attachment 8773406 [details]
Bug 1288228 part 1: Move cleanup code from DOMSVGLength destructor into a helper-function.

This patch fixes a crash and add a test. Let's take it in 49 beta.
Attachment #8773406 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
We should perhaps consider this for ESR45, too, but let's give it some time on beta first.

[Setting needinfo=me to request ESR45 uplift at some point soonish.]
Flags: needinfo?(dholbert)
Comment on attachment 8773406 [details]
Bug 1288228 part 1: Move cleanup code from DOMSVGLength destructor into a helper-function.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
This is a fix for a not-known-to-be-security-sensitive crash, but the crash seems like it *might* be able to be massaged into a Use-After-Free.  In other words: this isn't *known* to be sec:crit, but there's a chance there might be something sec:crit-ish hiding here.  My analysis so far says that we happen to be saved from use-after-free, but it's possible I missed something, since the objects' lifetimes here are complex.

User impact if declined: Possible crash (probably just DOS, but as noted above maybe there's a chance of a sec vulnerability lurking here as well).

Fix Landed on Version: Landed in 50; backported to 49 beta.

Risk to taking this patch (and alternatives if risky): Low-risk.  This just makes us more aggressively/consistently clean up objects that are in the process of being destroyed. This has been baking on various branches for 1-2 weeks already, too.

String or UUID changes made by this patch: None.
Attachment #8773406 - Flags: approval-mozilla-esr45?
Attachment #8773406 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
You need to log in before you can comment on or make changes to this bug.