Last Comment Bug 786111 - (CVE-2012-4181) Heap-use-after-free in nsSMILAnimationController::DoSample
(CVE-2012-4181)
: Heap-use-after-free in nsSMILAnimationController::DoSample
Status: VERIFIED FIXED
[asan][qa-][advisory-tracking+]
: assertion, crash, reproducible, sec-critical, testcase
Product: Core
Classification: Components
Component: SVG (show other bugs)
: Trunk
: x86_64 All
: -- critical (vote)
: mozilla18
Assigned To: Daniel Holbert [:dholbert]
: Anthony Hughes (:ashughes) [GFX][QA][Mentor]
Mentors:
Depends on:
Blocks: 775965
  Show dependency treegraph
 
Reported: 2012-08-27 17:07 PDT by Abhishek Arya
Modified: 2014-07-24 13:43 PDT (History)
15 users (show)
abillings: sec‑bounty+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
verified
verified
16+
fixed


Attachments
Testcase (3.07 KB, image/svg+xml)
2012-08-27 17:07 PDT, Abhishek Arya
no flags Details
backtrace of crash (7.06 KB, text/plain)
2012-08-27 22:57 PDT, Daniel Holbert [:dholbert]
no flags Details
backtrace of recursive DoSample() call (10.92 KB, text/plain)
2012-08-28 16:24 PDT, Daniel Holbert [:dholbert]
no flags Details
fix: recognize recursive DoSample calls and bail out (876 bytes, patch)
2012-08-28 16:44 PDT, Daniel Holbert [:dholbert]
bbirtles: review+
roc: superreview+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
lukasblakk+bugs: approval‑mozilla‑esr10+
Details | Diff | Splinter Review

Description Abhishek Arya 2012-08-27 17:07:23 PDT
Created attachment 655825 [details]
Testcase

Reproduces on trunk. Reliable reproduce this by launching multiple instances of firefox with the same testcase. on my fast box, i needed like 15.

=================================================================
==10320== ERROR: AddressSanitizer heap-use-after-free on address 0x7f2197da95d0 at pc 0x7f21c591cd12 bp 0x7ffff4c086b0 sp 0x7ffff4c086a8
READ of size 4 at 0x7f2197da95d0 thread T0
    #0 0x7f21c591cd11 in PL_DHashTableEnumerate src/objdir-ff-asan-sym/xpcom/build/pldhash.cpp:714
    #1 0x7f21c0d411d2 in nsTHashtable<nsSMILCompositor>::EnumerateEntries(PLDHashOperator (*)(nsSMILCompositor*, void*), void*) src/../../dist/include/nsTHashtable.h:237
    #2 0x7f21c0d43762 in nsSMILAnimationController::DoSample(bool) src/content/smil/nsSMILAnimationController.cpp:437
    #3 0x7f21baadbc47 in nsSMILAnimationController::Resample() src/../../../../dist/include/nsSMILAnimationController.h:66
    #4 0x7f21baa4930d in nsSMILAnimationController::FlushResampleRequests() src/../../../../dist/include/nsSMILAnimationController.h:84
    #5 0x7f21baa47fbe in PresShell::FlushPendingNotifications(mozFlushType) src/layout/base/nsPresShell.cpp:3846
    #6 0x7f21bb16d5aa in nsHideViewer::Run() src/layout/generic/nsSubDocumentFrame.cpp:810
    #7 0x7f21bc09650d in nsContentUtils::RemoveScriptBlocker() src/content/base/src/nsContentUtils.cpp:4961
    #8 0x7f21ba679a3a in ~nsAutoScriptBlocker src/../../../dist/include/nsContentUtils.h:2276
    #9 0x7f21ba664432 in ~nsAutoScriptBlocker src/../../../dist/include/nsContentUtils.h:2275
    #10 0x7f21bc2a34a1 in nsDocument::AdoptNode(nsIDOMNode*, nsIDOMNode**) src/content/base/src/nsDocument.cpp:6227
    #11 0x7f21c133969d in nsIDOMDocument_AdoptNode(JSContext*, unsigned int, JS::Value*) src/objdir-ff-asan-sym/js/xpconnect/src/dom_quickstubs.cpp:3467
    #12 0x7f21cc8bf951 in js::CallJSNative(JSContext*, int (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) src/js/src/jscntxtinlines.h:372
    #13 0x7f21cc84ca41 in js::Interpret(JSContext*, js::StackFrame*, js::InterpMode) src/js/src/jsinterp.cpp:2413
    #14 0x7f21cc7b3a35 in js::RunScript(JSContext*, JSScript*, js::StackFrame*) src/js/src/jsinterp.cpp:309
    #15 0x7f21cc8ccc66 in js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::ExecuteType, js::StackFrame*, JS::Value*) src/js/src/jsinterp.cpp:494
    #16 0x7f21cc8cec1e in js::Execute(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value*) src/js/src/jsinterp.cpp:531
    #17 0x7f21cc026334 in JS::Evaluate(JSContext*, JS::Handle<JSObject*>, JS::CompileOptions, unsigned short const*, unsigned long, JS::Value*) src/js/src/jsapi.cpp:5665
    #18 0x7f21cc02b2d1 in JS_EvaluateUCScriptForPrincipalsVersionOrigin src/js/src/jsapi.cpp:5746
    #19 0x7f21be34f62f in nsJSContext::EvaluateString(nsAString_internal const&, JSObject*, nsIPrincipal*, nsIPrincipal*, char const*, unsigned int, JSVersion, nsAString_internal*, bool*) src/dom/base/nsJSEnvironment.cpp:1497
    #20 0x7f21be4ff1bf in nsGlobalWindow::RunTimeoutHandler(nsTimeout*, nsIScriptContext*) src/dom/base/nsGlobalWindow.cpp:9590
    #21 0x7f21be4b6b39 in nsGlobalWindow::RunTimeout(nsTimeout*) src/dom/base/nsGlobalWindow.cpp:9851
    #22 0x7f21be4fd21a in nsGlobalWindow::TimerCallback(nsITimer*, void*) src/dom/base/nsGlobalWindow.cpp:10118
    #23 0x7f21c5ca0d22 in nsTimerImpl::Fire() src/xpcom/threads/nsTimerImpl.cpp:473
    #24 0x7f21c5ca25d8 in nsTimerEvent::Run() src/xpcom/threads/nsTimerImpl.cpp:556
    #25 0x7f21c5c659ce in nsThread::ProcessNextEvent(bool, bool*) src/xpcom/threads/nsThread.cpp:624
    #26 0x7f21c5906e27 in NS_ProcessNextEvent_P(nsIThread*, bool) src/objdir-ff-asan-sym/xpcom/build/nsThreadUtils.cpp:220
    #27 0x7f21c46e3595 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:82
    #28 0x7f21c5f12bd9 in MessageLoop::RunInternal() src/ipc/chromium/src/base/message_loop.cc:208
    #29 0x7f21c5f12a22 in MessageLoop::RunHandler() src/ipc/chromium/src/base/message_loop.cc:201
    #30 0x7f21c5f12907 in MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:175
    #31 0x7f21c3bad7ae in nsBaseAppShell::Run() src/widget/xpwidgets/nsBaseAppShell.cpp:163
    #32 0x7f21c281e448 in nsAppStartup::Run() src/toolkit/components/startup/nsAppStartup.cpp:273
    #33 0x7f21b90259b0 in XREMain::XRE_mainRun() src/toolkit/xre/nsAppRunner.cpp:3800
    #34 0x7f21b902bc24 in XREMain::XRE_main(int, char**, nsXREAppData const*) src/toolkit/xre/nsAppRunner.cpp:3877
    #35 0x7f21b902ecee in XRE_main src/toolkit/xre/nsAppRunner.cpp:3953
    #36 0x40c5bb in do_main(int, char**) src/browser/app/nsBrowserApp.cpp:174
    #37 0x409e1f in main src/browser/app/nsBrowserApp.cpp:279
    #38 0x7f21d60e6c4d in ?? ??:0
0x7f2197da95d0 is located 336 bytes inside of 896-byte region [0x7f2197da9480,0x7f2197da9800)
freed by thread T0 here:
    #0 0x4c3e30 in free ??:0
    #1 0x7f21d2f84572 in moz_free src/memory/mozalloc/mozalloc.cpp:51
    #2 0x7f21c59125d0 in PL_DHashFreeTable src/objdir-ff-asan-sym/xpcom/build/pldhash.cpp:88
    #3 0x7f21c5915ae1 in PL_DHashTableFinish src/objdir-ff-asan-sym/xpcom/build/pldhash.cpp:370
    #4 0x7f21c0d533f1 in ~nsTHashtable src/../../dist/include/nsTHashtable.h:384
    #5 0x7f21c0d53272 in ~nsTHashtable src/../../dist/include/nsTHashtable.h:382
    #6 0x7f21c0d53093 in nsAutoPtr<nsTHashtable<nsSMILCompositor> >::assign(nsTHashtable<nsSMILCompositor>*) src/../../dist/include/nsAutoPtr.h:38
    #7 0x7f21c0d416f7 in nsAutoPtr<nsTHashtable<nsSMILCompositor> >::operator=(nsTHashtable<nsSMILCompositor>*) src/../../dist/include/nsAutoPtr.h:101
    #8 0x7f21c0d43816 in nsSMILAnimationController::DoSample(bool) src/content/smil/nsSMILAnimationController.cpp:449
    #9 0x7f21c0d42e97 in nsSMILAnimationController::DoSample() src/content/smil/nsSMILAnimationController.cpp:351
    #10 0x7f21c0db7c95 in nsSMILTimeContainer::Sample() src/content/smil/nsSMILTimeContainer.cpp:176
    #11 0x7f21c0d3e5c6 in nsSMILAnimationController::Resume(unsigned int) src/content/smil/nsSMILAnimationController.cpp:94
    #12 0x7f21c0d40987 in nsSMILAnimationController::OnPageShow() src/content/smil/nsSMILAnimationController.cpp:193
    #13 0x7f21bc2bf2a8 in nsDocument::OnPageShow(bool, nsIDOMEventTarget*) src/content/base/src/nsDocument.cpp:7306
    #14 0x7f21ba862325 in DocumentViewerImpl::LoadComplete(unsigned int) src/layout/base/nsDocumentViewer.cpp:1045
    #15 0x7f21c229a6da in nsDocShell::EndPageLoad(nsIWebProgress*, nsIChannel*, unsigned int) src/docshell/base/nsDocShell.cpp:6415
    #16 0x7f21c22925e5 in nsDocShell::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, unsigned int) src/docshell/base/nsDocShell.cpp:6246
    #17 0x7f21c22935d4 in non-virtual thunk to nsDocShell::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, unsigned int) src/build/unix/stdc++compat/stdc++compat.cpp:0
    #18 0x7f21c23993e6 in nsDocLoader::DoFireOnStateChange(nsIWebProgress*, nsIRequest*, int&, unsigned int) src/uriloader/base/nsDocLoader.cpp:1351
    #19 0x7f21c2396d13 in nsDocLoader::doStopDocumentLoad(nsIRequest*, unsigned int) src/uriloader/base/nsDocLoader.cpp:931
    #20 0x7f21c2390316 in nsDocLoader::DocLoaderIsEmpty(bool) src/uriloader/base/nsDocLoader.cpp:820
    #21 0x7f21c2397329 in nsDocLoader::ChildDoneWithOnload(nsIDocumentLoader*) src/uriloader/base/nsDocLoader.h:193
    #22 0x7f21c2390356 in nsDocLoader::DocLoaderIsEmpty(bool) src/uriloader/base/nsDocLoader.cpp:823
    #23 0x7f21c2394724 in nsDocLoader::OnStopRequest(nsIRequest*, nsISupports*, unsigned int) src/uriloader/base/nsDocLoader.cpp:704
    #24 0x7f21c2395f7c in non-virtual thunk to nsDocLoader::OnStopRequest(nsIRequest*, nsISupports*, unsigned int) src/build/unix/stdc++compat/stdc++compat.cpp:0
    #25 0x7f21b9255852 in nsLoadGroup::RemoveRequest(nsIRequest*, nsISupports*, unsigned int) src/netwerk/base/src/nsLoadGroup.cpp:698
    #26 0x7f21b924c2bc in nsLoadGroup::Cancel(unsigned int) src/netwerk/base/src/nsLoadGroup.cpp:304
    #27 0x7f21c238eeef in nsDocLoader::Stop() src/uriloader/base/nsDocLoader.cpp:328
    #28 0x7f21c2319965 in nsDocShell::Stop() src/docshell/base/nsDocShell.h:189
    #29 0x7f21c2267db7 in nsDocShell::Stop(unsigned int) src/docshell/base/nsDocShell.cpp:4541
previously allocated by thread T0 here:
    #0 0x4c3ef0 in __interceptor_malloc ??:0
    #1 0x7f21d2f849d2 in moz_malloc src/memory/mozalloc/mozalloc.cpp:67
    #2 0x7f21c591245d in PL_DHashAllocTable src/objdir-ff-asan-sym/xpcom/build/pldhash.cpp:82
    #3 0x7f21c5914bd3 in PL_DHashTableInit src/objdir-ff-asan-sym/xpcom/build/pldhash.cpp:229
    #4 0x7f21c0d4e36c in nsTHashtable<nsSMILCompositor>::Init(unsigned int, mozilla::fallible_t const&) src/../../dist/include/nsTHashtable.h:414
    #5 0x7f21c0d45e25 in nsTHashtable<nsSMILCompositor>::Init(unsigned int) src/../../dist/include/nsTHashtable.h:98
    #6 0x7f21c0d435d1 in nsSMILAnimationController::DoSample(bool) src/content/smil/nsSMILAnimationController.cpp:413
    #7 0x7f21baadbc47 in nsSMILAnimationController::Resample() src/../../../../dist/include/nsSMILAnimationController.h:66
    #8 0x7f21baa4930d in nsSMILAnimationController::FlushResampleRequests() src/../../../../dist/include/nsSMILAnimationController.h:84
    #9 0x7f21baa47fbe in PresShell::FlushPendingNotifications(mozFlushType) src/layout/base/nsPresShell.cpp:3846
    #10 0x7f21baaea5e7 in nsRefreshDriver::Notify(nsITimer*) src/layout/base/nsRefreshDriver.cpp:378
    #11 0x7f21c5ca0de6 in nsTimerImpl::Fire() src/xpcom/threads/nsTimerImpl.cpp:476
    #12 0x7f21c5ca25d8 in nsTimerEvent::Run() src/xpcom/threads/nsTimerImpl.cpp:556
    #13 0x7f21c5c659ce in nsThread::ProcessNextEvent(bool, bool*) src/xpcom/threads/nsThread.cpp:624
    #14 0x7f21c5906e27 in NS_ProcessNextEvent_P(nsIThread*, bool) src/objdir-ff-asan-sym/xpcom/build/nsThreadUtils.cpp:220
    #15 0x7f21c46e3595 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:82
    #16 0x7f21c5f12bd9 in MessageLoop::RunInternal() src/ipc/chromium/src/base/message_loop.cc:208
    #17 0x7f21c5f12a22 in MessageLoop::RunHandler() src/ipc/chromium/src/base/message_loop.cc:201
    #18 0x7f21c5f12907 in MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:175
    #19 0x7f21c3bad7ae in nsBaseAppShell::Run() src/widget/xpwidgets/nsBaseAppShell.cpp:163
    #20 0x7f21c281e448 in nsAppStartup::Run() src/toolkit/components/startup/nsAppStartup.cpp:273
    #21 0x7f21b90259b0 in XREMain::XRE_mainRun() src/toolkit/xre/nsAppRunner.cpp:3800
    #22 0x7f21b902bc24 in XREMain::XRE_main(int, char**, nsXREAppData const*) src/toolkit/xre/nsAppRunner.cpp:3877
    #23 0x7f21b902ecef in XRE_main src/toolkit/xre/nsAppRunner.cpp:3953
Shadow byte and word:
  0x1fe432fb52ba: fd
  0x1fe432fb52b8: fd fd fd fd fd fd fd fd
More shadow bytes:
  0x1fe432fb5298: fd fd fd fd fd fd fd fd
  0x1fe432fb52a0: fd fd fd fd fd fd fd fd
  0x1fe432fb52a8: fd fd fd fd fd fd fd fd
  0x1fe432fb52b0: fd fd fd fd fd fd fd fd
=>0x1fe432fb52b8: fd fd fd fd fd fd fd fd
  0x1fe432fb52c0: fd fd fd fd fd fd fd fd
  0x1fe432fb52c8: fd fd fd fd fd fd fd fd
  0x1fe432fb52d0: fd fd fd fd fd fd fd fd
  0x1fe432fb52d8: fd fd fd fd fd fd fd fd
Stats: 220M malloced (219M for red zones) by 289499 calls
Stats: 41M realloced by 14348 calls
Stats: 192M freed by 181706 calls
Stats: 57M really freed by 116880 calls
Stats: 412M (105547 full pages) mmaped in 103 calls
  mmaps   by size class: 8:196596; 9:40955; 10:16380; 11:14329; 12:2048; 13:1536; 14:1280; 15:256; 16:320; 17:1248; 18:160; 19:40; 20:16;
  mallocs by size class: 8:205900; 9:44146; 10:15477; 11:16402; 12:2266; 13:1677; 14:1399; 15:305; 16:403; 17:1293; 18:173; 19:41; 20:17;
  frees   by size class: 8:114002; 9:35563; 10:12435; 11:13502; 12:1490; 13:1439; 14:1233; 15:266; 16:306; 17:1281; 18:136; 19:39; 20:14;
  rfrees  by size class: 8:78488; 9:19256; 10:8038; 11:9093; 12:676; 13:448; 14:316; 15:147; 16:183; 17:201; 18:28; 19:5; 20:1;
Stats: malloc large: 1524 small slow: 1617
==10320== ABORTING
Comment 1 Mats Palmgren (vacation) 2012-08-27 19:52:14 PDT
I can reproduce the ASAN error easily in a Linux64 debug asan build.

There's an assertion just before the error occurs:
###!!! ASSERTION: RECURSION_LEVEL_SAFE_TO_FINISH(table): 'RECURSION_LEVEL_SAFE_TO_FINISH(table)', file xpcom/build/pldhash.cpp, line 367
Comment 2 Daniel Holbert [:dholbert] 2012-08-27 22:57:35 PDT
Created attachment 655892 [details]
backtrace of crash

I can reproduce the crash, and the assertion that mats mentioned, in a standard debug build.  (I opened three tabs, each visiting a local copy of the testcase, and reloaded them each a few times.  Crashed within 10 sec.)

Here's the backtrace of the crash. (Looks the same as the backtrace in comment 0, but a few levels deeper, because I don't have ASAN turned on, so my build runs until it actually crashes.)

We're crashing because the nsSMILCompositor (for whom we're calling ::CreateSMILAttr() in my stack) has been deleted.  At stack level 2:

(gdb) p this
$9 = (nsSMILCompositor * const) 0x7fd6b13ee038
(gdb) p *this
$10 = {
  <PLDHashEntryHdr> = {
    keyHash = 1515870810
  }, 
  members of nsSMILCompositor: 
  mKey = {
    mElement = {
      mRawPtr = 0x5a5a5a5a5a5a5a5a
    }, 
    mAttributeName = {
      mRawPtr = 0x5a5a5a5a5a5a5a5a
    }, 
    mAttributeNamespaceID = 1515870810, 
    mIsCSS = 90
  }, 
  mAnimationFunctions = {
    <nsTArray_base<nsTArrayDefaultAllocator>> = {
      mHdr = 0x5a5a5a5a5a5a5a5a
    }, 
    <nsTArray_SafeElementAtHelper<nsSMILAnimationFunction*, nsTArray<nsSMILAnimationFunction*, nsTArrayDefaultAllocator> >> = {<No data fields>}, <No data fields>}, 
  mForceCompositing = 90, 
  mCachedBaseValue = {
    mRawPtr = 0x5a5a5a5a5a5a5a5a
  }
}
Comment 3 Daniel Holbert [:dholbert] 2012-08-27 23:05:49 PDT
(I suspect we've freed I suspect we've already deleted nsSMILAnimationController::mLastCompositorTable somehow, based on comment 0 and comment 2)

Flagging as sg:critical.  This also is likely to affect ESR, since this code has remained basically the same for a while. :-/
Comment 4 Mats Palmgren (vacation) 2012-08-28 03:55:06 PDT
FYI, we have security keywords now:
https://wiki.mozilla.org/Security_Severity_Ratings
Comment 5 Daniel Holbert [:dholbert] 2012-08-28 15:54:47 PDT
(ah, thanks)

So what I didn't initially see but do see now is: we're recurring into DoSample(), while we're iterating over mLastCompositorTable.  And in the recursive call, we delete the old mLastCompositorTable. (That's what we do at the end of every sample.)  Then we unwind back up to the upper level, which is still iterating over mLastCompositorTable.

So the underlying problem here is that we're getting a recursive call to DoSample in the middle of a sample.
Comment 6 Daniel Holbert [:dholbert] 2012-08-28 16:24:30 PDT
Created attachment 656250 [details]
backtrace of recursive DoSample() call

Here's the backtrace of the recursive call to DoSample.

Basically: in the upper DoSample() call, we clear animations on a previously-animated attribute, "stop-color".
 * This triggers a call to  nsDOMCSSDeclaration::RemoveProperty()
    ...which instantiates a mozAutoDocConditionalContentUpdateBatch
    ...which (upon destruction) calls nsDocument::EndUpdate()
    ...which in this case, thinks it's a good time to do finalize frame loaders
    ...which destroys the DocShell
    ...which interrupts our loading
    ...which triggers a synchronous sample (in OnPageShow)
Comment 7 Daniel Holbert [:dholbert] 2012-08-28 16:36:27 PDT
So, there are two facts here:

 (a) We shouldn't be receiving recursive DoSample() calls.

 (b) If DoSample() does somehow receive a recursive call, it can & should assert & bail out.

I'm not sure about the Part (a) aspect here -- which part of the stack from the previous comment is busted.  (Maybe bz can help with that?)

But part (b) is really easy to implement -- we already have a member-bool that indicates whether we're currently running a sample, and we can just assert & bail out if we ever hit DoSample with that bool already set.  That should neutralize this attack-vector (and any other attack vectors that can make us recurse like this).

With part (b) fixed, we can file a followup bug in the part (a) department -- figuring out _why_ we're recursing, and hopefully that followup won't be as scary since (b) should neutralize its ill effects.
Comment 8 Daniel Holbert [:dholbert] 2012-08-28 16:44:58 PDT
Created attachment 656261 [details] [diff] [review]
fix: recognize recursive DoSample calls and bail out

This patch is part (b) from the previous comment.
Comment 9 Daniel Holbert [:dholbert] 2012-08-28 16:47:22 PDT
Comment on attachment 656261 [details] [diff] [review]
fix: recognize recursive DoSample calls and bail out

[requesting sr?roc, since fixes for security bugs required sr at some point in the past (not sure if they still do?), and because he might be interested in this bug]
Comment 10 Brian Birtles (:birtles) 2012-08-28 17:25:40 PDT
Comment on attachment 656261 [details] [diff] [review]
fix: recognize recursive DoSample calls and bail out

That seems reasonable to me. Providing the safety net here and splitting off the fix for the underlying issue means we can just land the safety net on ESR etc. which is nice.

I almost wonder if this shouldn't be a fatal assertion but I guess they are only for situations that should never happen and we know that this situation currently can arise?
Comment 11 Daniel Holbert [:dholbert] 2012-08-28 22:37:20 PDT
Landed: https://hg.mozilla.org/integration/mozilla-inbound/rev/ff4523236513
Comment 12 Boris Zbarsky [:bz] 2012-08-28 22:38:36 PDT
> which part of the stack from the previous comment is busted

Well, the part that tears down the frameloader under you from the EndUpdate when you set style around frames 20-27.

I don't quite understand why we're ending up with a document that wants to be finalized before we start running scriptrunners.  Ideally, we'd wrap up that sort of thing before scriptrunners happen!
Comment 13 Boris Zbarsky [:bz] 2012-08-28 22:38:54 PDT
sicking, smaug, thoughts on comment 12?
Comment 14 Jonas Sicking (:sicking) PTO Until July 5th 2012-08-28 23:06:14 PDT
I don't really recall how the frameloader finalization stuff works.

I have a vague recollection that I wanted to move more of it from EndUpdate to scriptrunners, however the stack in comment 0 isn't showing EndUpdate anywhere, though comment 6 is talking about EndUpdate.

What's the relationship between the stack in comment 0 and the one in comment 6?
Comment 15 Boris Zbarsky [:bz] 2012-08-28 23:27:45 PDT
The stack in comment 0 is what you get if you unwind the stack in comment 6.  See comment 5.
Comment 16 Jonas Sicking (:sicking) PTO Until July 5th 2012-08-28 23:44:32 PDT
Ah. Sounds like it would possibly help to move frame finalization to scriptrunners then since we'd do it in frame 7 in the stack in comment 0, which is much after we exit DoSample.

That might not be an easy task though. But I think it's something we really should do. (EndUpdate should really be removed in favor of scriptblockers in general).
Comment 17 Olli Pettay [:smaug] 2012-08-28 23:56:21 PDT
(In reply to Boris Zbarsky (:bz) from comment #12)
> I don't quite understand why we're ending up with a document that wants to
> be finalized before we start running scriptrunners.  Ideally, we'd wrap up
> that sort of thing before scriptrunners happen!
I don't understand this comment.

There is no scriptblocker on the stack so finalization can happen.
Comment 18 Olli Pettay [:smaug] 2012-08-28 23:57:16 PDT
Note, MaybeInitializeFinalizeFrameLoaders does have a check that it is safe to run scripts.
Comment 19 Boris Zbarsky [:bz] 2012-08-29 00:01:49 PDT
The question is why it hasn't _already_ happened before we start running scriptrunners.
Comment 20 Olli Pettay [:smaug] 2012-08-29 00:05:13 PDT
It needs to happen only when it is safe to run scripts, and EndUpdate removes a script blocker
before MaybeInitializeFinalizeFrameLoaders is called...
I'm missing something here.
Comment 21 Boris Zbarsky [:bz] 2012-08-29 00:19:44 PDT
The point is the last scripblocker was removed way before this.  And the question is why we wait until EndUpdate instead of just using a scriptrunner.
Comment 22 Olli Pettay [:smaug] 2012-08-29 00:29:36 PDT
The last scriptblocker is removed in EndUpdate...

We should be able to remove mUpdateNestLevel != 0 check because there is always script blockers when mUpdateNestLevel != 0.
Comment 23 Boris Zbarsky [:bz] 2012-08-29 00:34:38 PDT
> The last scriptblocker is removed in EndUpdate...

What happens in the stack is that the last scriptblocker is removed, scriptrunners start running, one of these scriptrunners does begin/endupdate and boom(!) the frameloader goes away and deletes the class the code is currently running in.

Basically, the current frameloader teardown setup means that any time you're in a scriptrunner (so there are no scriptblockers on stack), doing a no-op begin/end of an update can run arbitrary script or something.  That's _really_ hard to work with.  :(
Comment 24 Olli Pettay [:smaug] 2012-08-29 00:47:12 PDT
Ah, now I see what you mean.

Yeah, the setup is quite complex and haven't been changed for years.
When I was hacking it, it turned out to be very regression risky stuff.
But I guess we could try to force MaybeInitializeFinalizeFrameLoaders to happen only
using scriptrunners.
Comment 25 Olli Pettay [:smaug] 2012-08-29 00:53:42 PDT
Filed Bug 786581
Comment 26 Ed Morley [:emorley] 2012-08-29 07:04:24 PDT
https://hg.mozilla.org/mozilla-central/rev/ff4523236513
Comment 27 Daniel Holbert [:dholbert] 2012-08-29 12:56:13 PDT
OK, so RE whether we should port this to branches -- I tested this on up-to-date debug builds of esr/beta/aurora, with the following results:

 * On ESR (cset 42ee0c5b46ec), I can't reproduce, but that appears to be because the testcase trips over something that ESR doesn't support, or some other sort of incompatibility. I get this output in my terminal when I load the testcase in ESR:
{
JavaScript error: , line 0: uncaught exception: [Exception... "Component returned failure code: 0x80004003 (NS_ERROR_INVALID_POINTER) [nsIDOMHTMLHtmlElement.appendChild]"  nsresult: "0x80004003 (NS_ERROR_INVALID_POINTER)"  location: "JS frame :: file:///path/to/test.svg :: CFcrash :: line 26"  data: no]
}

 * On Beta (cset b88ab611becb), I can't reproduce, but I don't get that ESR js error.  Just, no crash.

 * On Aurora (cset b02ba8b87cc48), I can reproduce very reliably. (within a few seconds, after opening 3-5 tabs of the testcase)

So: Aurora (17) is definitely affected.  As for beta -- I'm guessing it's unaffected because there was a code-change in non-SMIL code (frame finalizer, docshell, etc) between Beta and Aurora that lets us get into this situation, and without that change, we don't get the PageShow-during-a-SMIL-sample behavior.  And ESR is unaffected because (at least) the testcase appears to rely on a feature that it doesn't support, or something.

I wouldn't say we're absolutely certain that ESR and Beta are safe, though.  I think it'd still be worth landing the fix there, since there's very low risk and it gives us a line of defense just in case.
Comment 28 Alex Keybl [:akeybl] 2012-08-29 16:25:21 PDT
Although it sounds like this wouldn't block the 16 release, we'll track for 16 and ESR 16+ since this is low risk and expected to be nominated shortly. Sounds like it would be best to keep this from falling through the cracks.
Comment 29 Daniel Holbert [:dholbert] 2012-08-29 17:10:40 PDT
I did some targeted debug builds to find out what changed between our Beta code & our Aurora code (making Aurora crash but Beta not crash, in Comment 27).

LAST GOOD BUILD: https://hg.mozilla.org/mozilla-central/rev/9c75f0428f8a

FIRST BAD BUILD: https://hg.mozilla.org/mozilla-central/rev/bcac58cbf328
> Mon Aug 13 21:07:09 2012 -0700	bcac58cbf328	Chris Pearce — Bug 775965 - Ensure presentation persists across nsSubDocumentFrame reframes. r=roc

That cset touched nsFrameLoader.h, which seems relevant per comment 12.  Flagging this as a regression from that bug.
Comment 30 Daniel Holbert [:dholbert] 2012-08-29 17:13:33 PDT
RE lsblakk's flagging of "status-firefox-esr10: --- → affected": It actually appears that ESR is _not_ affected, as noted in comment 27.  Current Aurora (FF17) appears to be the first release where the testcase crashes. (and in fact, this seems to have regressed in the last few weeks, per comment 29)
Comment 31 Daniel Holbert [:dholbert] 2012-08-29 17:20:13 PDT
Comment on attachment 656261 [details] [diff] [review]
fix: recognize recursive DoSample calls and bail out

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  It's sec-crit on trunk, at least.
User impact if declined:
 On Aurora: users exposed to exploitable crashes from this bug.
 On Beta/ESR: If there another testcase could trigger this code-path (hopefully not, but unknown), then users would be exposed to similar exploitable crashes.
Fix Landed on Version: 18 (current trunk)
Risk to taking this patch (and alternatives if risky):
 Very low. Adds an early-return for a condition that should never happen (and if it does happen, we'll be about to crash unsafely, and this fix saves us from that.)
String or UUID changes made by this patch:  none

COMMENTARY:
On Aurora, this is a no-brainer -- low-risk patch, fixes known vulnerability.
On Beta/ESR, this is more of a defense-in-depth thing -- there's no known vulnerability that this would be fixing there, but it's low-risk enough that I still think we should still take it.
Comment 32 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-30 16:16:08 PDT
Comment on attachment 656261 [details] [diff] [review]
fix: recognize recursive DoSample calls and bail out

If it turned out this was affecting Beta/ESR that would be quite unfortunate so let's cover all our bases and take it on beta/esr anyway.
Comment 34 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-09-21 14:27:37 PDT
Verified reproducible with an ASan build created from the parent of the changeset in comment 26.

Verified fixed with:
 * Firefox 18.0a1 2012-09-20 ASan debug
 * Firefox 17.0a2 2012-09-20 ASan debug

Flagging qa- for verification against Beta and ESR since we don't have builds.
Comment 35 Al Billings [:abillings] 2013-01-14 15:14:18 PST
Created attachment 702023 [details]
Bounty Awarded $3000 [paid]

Note You need to log in before you can comment on or make changes to this bug.