Closed Bug 1000185 (CVE-2014-1541) Opened 10 years ago Closed 10 years ago

ASAN heap-use-after-free in RefreshDriverTimer::TickDriver

Categories

(Core :: SVG, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla32
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 + verified
firefox31 + fixed
firefox32 + fixed
firefox-esr24 30+ verified
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed
seamonkey2.26 --- fixed

People

(Reporter: nils, Assigned: dholbert)

Details

(Keywords: csectype-uaf, sec-critical, Whiteboard: [asan][adv-main30+][adv-esr24.6+])

Attachments

(7 files, 3 obsolete files)

627 bytes, text/html
Details
168 bytes, image/svg+xml
Details
9.48 KB, text/plain
Details
527 bytes, text/html
Details
1.08 KB, patch
dholbert
: review+
abillings
: sec-approval+
Details | Diff | Splinter Review
4.63 KB, patch
dholbert
: review+
abillings
: sec-approval+
Details | Diff | Splinter Review
2.53 KB, patch
Details | Diff | Splinter Review
Attached file crash.html
The attached testcase crashes the latest ASAN build. Crash was also confirmed on windows with the current release build. The attached testcase requires ovaling_fast.svg. Takes about 20-30 seconds to trigger.

ASAN output attached in asan.txt
Attached image ovaling_fast.svg
Attached file asan.txt
So nsSMILAnimationController doesn't remove itself from RefreshDriver?
Patch coming, I hope.
Or actually, I don't quite understand the setup nsSMILAnimationController uses for sampling.
Daniel, could you look at this?
Component: DOM → SVG
Flags: needinfo?(dholbert)
Keywords: csectype-uaf
Whiteboard: [asan]
Here's a reduced standalone testcase that reproduces this for me (crashing my debug build, without ASAN required).

STR:
 1. Load testcase.
 2. Close tab (but don't quit browser).
 3. Wait 10 seconds.

ACTUAL RESULTS:
###!!! ABORT: observers should have unregistered: 'ObserverCount() == 0', file layout/base/nsRefreshDriver.cpp, line 698

I abort if I swap my steps 2 and 3, too. (i.e. if I wait and then close the tab, I get an immediate abort)
Flags: needinfo?(dholbert)
From poking around in GDB, it looks like the same nsSMILAnimationController gets registered *twice* with the nsRefreshDriver:
 - First in nsSMILAnimationController::NotifyRefreshDriverCreated
 - Second in nsSMILAnimationController::OnPageShow

And it tries to unregister twice, too, in the teardown equivalent of each of those methods. (NotifyRefreshDriverDestroyed & OnPageHide).

But OnPageHide fails to actually contact the refresh driver, because at that point mDocument::GetPresShell() returns null. So it can't unregister, which is why we we end up still in the observer list.

Anyway -- I think the problem is that we shouldn't be registering twice in the first place.
And we should ensure that the controller will be unregistered if it has been registered.
(In reply to Olli Pettay [:smaug] from comment #7)
> And we should ensure that the controller will be unregistered if it has been
> registered.

We do. That's what the NotifyRefreshDriverCreated/Destroyed code is for. It just doesn't handle the case where we confuse ourselves and accidentally register twice. :)
Actually, this is much more interesting.

* First, we get the call to this stack:
 nsSMILAnimationController::Resume()
 nsSMILAnimationController::OnPageShow()
 nsDocument::OnPageShow()
 nsDocumentViewer::LoadComplete()

At this point, our document *does not have a pres shell* (because it's got 'display:none').  So we're expecting that when Resume() calls MaybeStartSampling(GetRefreshDriver()), it'll be passing in null, and we won't actually register to start sampling.

HOWEVER, before we actually get to that MaybeStartSampling() call, we do a synchronous Sample() call, which sets off a chain of events:
 - Sample() flushes style...
 - which flushes layout in the parent document...
 - which makes us honor the "display" change on the <iframe> in the outer document
 - which makes us call nsSubDocumentFrame::ShowViewer() on the <iframe>'s frame
 - which makes us *create a pres shell* for the inner document
 - which makes us create a refresh driver and call NotifyRefreshDriverCreated() on the nsSMILAnimationController
 - which makes us register with the refresh driver.

Then, we pop off all of that stack, and get back up to our ::Resume() call. Importantly, during its synchronous Sample() call, its document created itself a pres shell. So when it runs the next line of code -- MaybeStartSampling(GetRefreshDriver()) -- it does actually get a refresh driver, and it successfully registers with it. (a second time)
Here's the code I'm talking about in comment 9:

> 82 void
> 83 nsSMILAnimationController::Resume(uint32_t aType)
> 84 {
[...]
> 92   if (wasPaused && !mPauseState && mChildContainerTable.Count()) {
> 93     Sample(); // Run the first sample manually
> 94     MaybeStartSampling(GetRefreshDriver());
> 95   }
> 96 }
http://mxr.mozilla.org/mozilla-central/source/dom/smil/nsSMILAnimationController.cpp#82

With this testcase, GetRefreshDriver() returns null before the Sample() call in line 93, but it returns non-null after that Sample() call. (because Sample()'s style flush has the unexpected consequence of creating a pres shell for us)

Elsewhere in this file, we call Sample() _after_ MaybeStartSampling. I think we might want to do that here, too; I think that might fix this. (I mad a similar change to another call in bug 699143 comment 12, "for consistency", and didn't expect much functional effect. So I'd expect that should be fine to do here, too.)
Attached patch part 1 (obsolete) — Splinter Review
Per previous comment, this makes us do the [Maybe]StartSampling() / Sample() steps in an order that's more consistent with the other places where we start sampling & do a synchronous sample.

In particular, this makes the ordering consistent with these two spots:
http://mxr.mozilla.org/mozilla-central/source/dom/smil/nsSMILAnimationController.cpp?rev=aaabd2d39060#173
http://mxr.mozilla.org/mozilla-central/source/dom/smil/nsSMILAnimationController.cpp?rev=aaabd2d39060#824

This has the happy consequence that Sample()'s side effects won't impact MaybeStartSampling's success/failure. (and won't let it turn into a double-registration)

This fixes the crashes/aborts that I see in both testcases in my [non-ASAN] debug build.
Assignee: nobody → dholbert
Status: NEW → ASSIGNED
Attachment #8411422 - Flags: review?(birtles)
Comment on attachment 8411422 [details] [diff] [review]
part 1

Review of attachment 8411422 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Daniel for the thorough investigation. r=me
Attachment #8411422 - Flags: review?(birtles) → review+
Flags: in-testsuite?
Thanks, Brian!

FWIW, I've satisfied myself that the attached patch fixes this bug's crash. In particular:

 - without this patch, I hit a crash or abort after loading the original testcase once or twice. Tested an ASAN opt build, ASAN debug+opt build, & non-ASAN debug build.

 - *with* this patch, I was unable to trigger any crash or abort on the original testcase (tested same suite of builds)


I also tested with my simplified "testcase 2"; that testcase only aborts in debug builds. (It has no symptoms in opt builds).  I verified that the patch prevents that debug-build abort.
Per IRC chat with smaug: for some more defense-in-depth (and to make bugs like this easier to discover), I think it'd probably be worth adding a bool member-var to nsSMILAnimationController (perhaps debug-only) that tracks whether we're currently registered with a refresh driver. Then, we can assert if we try to register twice, and if we're destructed while still registered.

I'll intend to attach that as a "part 2" patch here.
This patch adds a bool to track whether we're registered with a refresh driver.

It asserts fatally if we ever try to register more than once. (Both of this bug's testcases would've immediately aborted debug builds, if we had this patch applied [without the other patch] -- this makes it easier to catch any bugs events of a similar sort in the future.

This also asserts fatally if we're destructed without unregistering.

Finally, it lets the StopSampling() method be a bit faster, in cases where we make a redundant call to that method (which we do, to be on the safe side & make sure we get unregistered). Now we'll only bother calling RemoveRefreshObserver if we actually know we're registered (which means we can avoid an unnecessary traversal over the array of refresh observers).
Attachment #8411643 - Flags: review?(birtles)
How far back does this go?
It affects all supported branches, I think. At least, I was easily able to make the original testcase crash an old Nightly, ver 20.0a1 (2012-12-01), by loading it in several tabs simultaneously.

UA string: Mozilla/5.0 (X11; Linux x86_64; rv:20.0) Gecko/20121201 Firefox/20.0

If I had to guess about the exact regression, I'd bet it dates back at least as far as http://hg.mozilla.org/mozilla-central/rev/2ea2a3b3df0d for Bug 590425 (in 2010) which is where we added the second way of getting added as a refresh observer (which then lets us get registered twice and only unregistered once). But it probably goes back even further than that -- i.e. I'll bet this testcase might've even made us fail to unregister the first time, before that bug was fixed. (We'd fail because the pres shell would be disconnected when we go to unregister, so we can't get a handle on the refresh driver.)

Anyway, given that it goes back as far as Firefox 20, I think we can assume it affects all supported branches.
Comment on attachment 8411643 [details] [diff] [review]
part 2: track whether we're registered with a refresh driver

Review of attachment 8411643 [details] [diff] [review]:
-----------------------------------------------------------------

Unless you think this part is too security sensitive, a comment in the header explaining what the patch is doing would be helpful (a sanitized version of comment 15 would be fine). Otherwise looks great. Thanks Daniel!

::: dom/smil/nsSMILAnimationController.cpp
@@ -277,2 @@
>      // NOTE: The document might already have been detached from its PresContext
>      // (and RefreshDriver), which would make GetRefreshDriverForDoc return null.

While you're at it, perhaps update the comment s/GetRefreshDriverForDoc/GetRefreshDriver/ ?
Attachment #8411643 - Flags: review?(birtles) → review+
Attachment #8411422 - Attachment description: fix v1 → part 1
Here's part 1 (the fix), ready-to-land with commit message included.

Requesting security approval. (I'm guessing it might not be granted right away, given that we're about to ship a release and we don't want to zero-day ourselves?)

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

 Not easily.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

 No.

Which older supported branches are affected by this flaw?

 All, per comment 17. :(

If not all supported branches, which bug introduced the flaw?

 N/A

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

 This patch should apply cleanly on all affected branches. The lines of code touched by the patch are the same in ESR24 as it is in trunk. (though in ESR24, the file lives in /content/smil instead of in /dom/smil )

How likely is this patch to cause regressions; how much testing does it need?

 Unlikely. Per end of comment 10, this reordering makes us more consistent with a few other pieces of code (so it's been sanity-checked in those other spots). I also did a try run earlier this week (without any hint that it was for a sec bug), which was green aside from a few unrelated intermittent test failures: https://tbpl.mozilla.org/?tree=Try&rev=c5032c8866f1
Attachment #8411422 - Attachment is obsolete: true
Attachment #8412954 - Flags: sec-approval?
Attachment #8412954 - Flags: review+
[Security approval request comment]
How easily could an exploit be constructed based on the patch?

 Not too easily.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

  Yes; this patch suggests that it may be possible to have a nsSMILAnimationController stay in the refresh driver's list (and continue to receive calls to the virtual method WillRefresh()), after the nsSMILAnimationController has been deleted.  (which is a use-after-free, which is the security problem here)

  It doesn't make it clear how you might get into that sort of a situation, though.

Which older supported branches are affected by this flaw?
If not all supported branches, which bug introduced the flaw?
 (see comment 19)
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
 Should be easy / non-risky; this code hasn't changed much recently.

How likely is this patch to cause regressions; how much testing does it need?

 Unlikely to cause regressions. This patch isn't necessary to fix this bug, and it has marginal effect on opt builds; it mostly just makes it easier for us to *catch* this sort of issue (registering twice and/or being destructed without unregistering) in debug builds, with assertions. So maybe we don't want to bother backporting this part (but it's probably worth doing, so that the changes we're backporting for this bug are as close as possible to the changes that get nightly testing).

NOTE: If it's preferred, instead of landing this patch as part of this bug (and to avoid leaking information about this bug), I could also spin this patch off into a public bug, with a bug description like "here's a hypothetical situation that shouldn't be able to happen, but let's help ourselves catch it with assertions just in case", and scope this bug here to just "part 1".

(I suppose a sufficiently-sneaky attacker could still infer a connection to a security bug, by looking at nsSMILAnimationController commit log and noticing the proximity of this bug's fix and the public bug's fix, so maybe it's not worth it to spin this off.)
Attachment #8411643 - Attachment is obsolete: true
Attachment #8412970 - Flags: sec-approval?
Attachment #8412970 - Flags: review+
Attachment #8412970 - Attachment description: part 2 (making this sort of bug easier to catch): track whether we're registered with a refresh driver [r=birtles] → part 2 v2 (making this sort of bug easier to catch): track whether we're registered with a refresh driver [r=birtles]
(clarified commit message for part 2 slightly. see previous comment for sec-approval request)
Attachment #8412970 - Attachment is obsolete: true
Attachment #8412970 - Flags: sec-approval?
Attachment #8412977 - Flags: sec-approval?
Attachment #8412977 - Flags: review+
(In reply to Brian Birtles (:birtles) from comment #18)
> Unless you think this part is too security sensitive, a comment in the
> header explaining what the patch is doing would be helpful (a sanitized
> version of comment 15 would be fine). Otherwise looks great. Thanks Daniel!

I've added a sanitized commit message in the latest version of this patch. (I'm pretty sure that's what you meant by "comment in the header explaining what the patch is doing" [as opposed to e.g. a comment in the .h header-file])

Thanks for the reviews!
I'm giving this sec-approval+ to checkin on 5/20 (or later). We'll want it on the then Aurora and Beta then too.
Whiteboard: [asan] → [asan][checkin on 5/20]
Attachment #8412954 - Flags: sec-approval? → sec-approval+
Attachment #8412977 - Flags: sec-approval? → sec-approval+
needinfo=me to land this on or after 5/20. [Added a calendar reminder to help me remember, too.]
Flags: needinfo?(dholbert)
(In reply to Al Billings [:abillings] from comment #23)
> I'm giving this sec-approval+ to checkin on 5/20 (or later). We'll want it
> on the then Aurora and Beta then too.

Should I request aurora/beta approval now, or after 5/20 (when it's landed on trunk)?
Flags: needinfo?(abillings)
(In reply to Daniel Holbert [:dholbert] from comment #25)
> Should I request aurora/beta approval now, or after 5/20 (when it's landed on trunk)?
You should just wait.
Flags: needinfo?(abillings)
I'd do it then or it becomes confusing since Aurora now and Aurora then are different versions...
Sounds good, thanks.
Flags: needinfo?(dholbert)
Flags: needinfo?(dholbert)
Here's a crashtest patch, to be landed once we've shipped the fix to all release builds & are ready to open up this bug.

This includes both the original test (with the setInterval call removed) and my reduced test. I verified that each of these two tests (on its own) is effective as a crashtest for this bug.

Specifically: in an unpatched build, each of these crashtests trigger this fatal assertion at shutdown, when I run their directory of crashtests:
{
###!!! ABORT: observers should have unregistered: 'ObserverCount() == 0', file /mozilla/layout/base/nsRefreshDriver.cpp, line 698
}

Even better: with my "part 2" patch applied, each crashtest aborts *on load* (instead of at shutdown), with my new "Redundantly registering with refresh driver" assertion.

And if I also apply "part 1" (the fix), then these crashtests don't abort or assert at all.
Group: layout-core-security
https://hg.mozilla.org/integration/mozilla-inbound/rev/66992fa55eb6
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd0a38a9b756

Please nominate this for Aurora/Beta/ESR24 approval as soon as you get a chance :)
Flags: needinfo?(dholbert)
Whiteboard: [asan][checkin on 5/20] → [asan]
Flags: needinfo?(dholbert)
Comment on attachment 8412954 [details] [diff] [review]
part 1 (the fix), now with commit message [r=birtles]

Requesting beta/aurora approval.

[Approval Request Comment]
Bug caused by (feature/regressing bug #):

 Not known, but probably bug 590425 or something earlier than that (in 2010). See comment 17 for more info.

User impact if declined:

 Security risk (use-after-free)

Testing completed (on m-c, etc.):

 Local testing; Try run (without any bug # or security-hinting comments); landed on m-i, so will get m-c/nightly testing shortly.

Risk to taking this patch (and alternatives if risky):

 Low. Targeted change, with no anticipated functional impact (aside from avoiding getting into the unexpected situation triggered by this bug's testcases). See end of comment 19.

String or IDL/UUID changes made by this patch:

 None.
Attachment #8412954 - Flags: approval-mozilla-beta?
Attachment #8412954 - Flags: approval-mozilla-aurora?
Comment on attachment 8412954 [details] [diff] [review]
part 1 (the fix), now with commit message [r=birtles]

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:

 (sec-critical)

User impact if declined:

 Security risk.

Fix Landed on Version:

 32

Risk to taking this patch (and alternatives if risky):

 Low; see previous comment & end of comment 19.

String or UUID changes made by this patch: 

 None.
Attachment #8412954 - Flags: approval-mozilla-esr24?
Flags: needinfo?(dholbert)
Comment on attachment 8412977 [details] [diff] [review]
part 2 v3 (making this sort of bug easier to catch): track whether we're registered with a refresh driver [r=birtles]

Requesting aurora/beta/esr24 approval for part 2.

See previous 2 comments for answers to approval-questions. Though, this part has a slightly different risk/reward calculus than "part 1". Quoting end of comment 20, about this part:
{
This patch isn't necessary to fix this bug, and it has marginal effect on opt builds; it mostly just makes it easier for us to *catch* this sort of issue (registering twice and/or being destructed without unregistering) in debug builds, with assertions. So maybe we don't want to bother backporting this part (but it's probably worth doing, so that the changes we're backporting for this bug are as close as possible to the changes that get nightly testing).
}

I lean on the side of the parenthetical "but it's probably worth doing [backporting]" at the end there, to keep this code in sync on our various branches.  Hence, requesting backport approval.
Attachment #8412977 - Flags: approval-mozilla-esr24?
Attachment #8412977 - Flags: approval-mozilla-beta?
Attachment #8412977 - Flags: approval-mozilla-aurora?
Attachment #8412977 - Flags: approval-mozilla-esr24?
Attachment #8412977 - Flags: approval-mozilla-esr24+
Attachment #8412977 - Flags: approval-mozilla-beta?
Attachment #8412977 - Flags: approval-mozilla-beta+
Attachment #8412977 - Flags: approval-mozilla-aurora?
Attachment #8412977 - Flags: approval-mozilla-aurora+
Attachment #8412954 - Flags: approval-mozilla-esr24?
Attachment #8412954 - Flags: approval-mozilla-esr24+
Attachment #8412954 - Flags: approval-mozilla-beta?
Attachment #8412954 - Flags: approval-mozilla-beta+
Attachment #8412954 - Flags: approval-mozilla-aurora?
Attachment #8412954 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/66992fa55eb6
https://hg.mozilla.org/mozilla-central/rev/bd0a38a9b756
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Group: layout-core-security
Confirmed crash in 2014-05-15, ASan build of Fx30.
Verified fixed in 2014-06-03, ASan builds of Fx24.6.0esr and Fx30.
Status: RESOLVED → VERIFIED
Whiteboard: [asan] → [asan][adv-main30+][adv-esr24.6+]
Alias: CVE-2014-1541
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: