233 bytes, image/svg+xml
20.16 KB, text/plain
79.54 KB, image/png
1.10 KB, patch
|Details | Diff | Splinter Review|
14.36 KB, patch
|Details | Diff | Splinter Review|
8.96 KB, patch
|Details | Diff | Splinter Review|
1.32 KB, patch
|Details | Diff | Splinter Review|
Created attachment 574456 [details] Minimized SVG file (load from local file to test) When opening a specific SVG file (created by a mutation fuzzer), the browser will crash with EIP=0x00000000. This was tested on Firefox 3.6.24 / Ubuntu 10.04 LTS and Firefox 8.0 / WinXP SP3. Under Linux : a call to *(%eax) at _ZN20txStylesheetCompiler15flushCharactersEv+26 will set EIP to 0x000000. An annotated gdb session is attached. A similar behavior is constated on Firefox 8.0 (a screenshot of windbg !exploitable is attached too).
I'm not seeing a crash on current trunk (on Mac). Just an XML parser error from the failed XSLT load....
I initially just saw an XML parser error, too, but then I was able to crash. bp-890b1eac-dd1e-4bd9-b3d8-2660e2111114 With a locally-saved version of the testcase, I was able to crash on the first try w/ a fresh profile, in my nightly as well as in my debug build.
We crash here: > 494 rv = (mHandlerTable->mTextHandler)(mCharacters, *this); http://hg.mozilla.org/mozilla-central/annotate/b51eb4007926/content/xslt/src/xslt/txStylesheetCompiler.cpp#l494 ...because mHandlerTable->mTextHandler is a null pointer.
OK, this looks kinda bad... We get the faulty mHandlerTable (mentioned in comment 5) from a call to "popHandlerTable", which pops a raw pointer off of "mOtherStack" (a stack of void*'s) and casts this raw pointer to a txHandlerTable*. However, the raw pointer is *not* really a txHandlerTable* -- it's in fact a txElementHandler*, which was pushed onto the stack here: http://hg.mozilla.org/mozilla-central/annotate/eb84780783ed/content/xslt/src/xslt/txStylesheetCompiler.cpp#l345 So we're effectively casting a txElementHandler* to a txHandlerTable*, and then trying to work with the txHandlerTable* pointer, and crashing.
Note that we never actually get a call to "pushHandlerTable" on this testcase (though we do get popHandlerTable as noted above. I don't know our XSLT code very well, but I'd expect that we'd intend for our pushHandlerTable/popHandlerTable calls to be balanced...
So the evil popHandlerTable() call comes from txFnEndEmbed(), which calls txFnEndStylesheet(), which calls popHandlerTable(). Balancing this, we have an earlier call to txFnStartEmbed(), which in an ideal world would call txFnStartStylesheet(), which would call pushHandlerTable(). (this is all nicely balanced, with just "start" vs "end" and "push" vs "pop") However, with this bug's testcase, txFnStartEmbed takes an early-return, because 'aState.handleEmbeddedSheet()' fails. So basically, we're in a sort of error state, and we bail out early & never get to the pushHandlerTable() call. So, it looks like we need to detect this same error state when we're cleaning up, so we can skip the unbalanced popHandlerTable call there.
Ah, so we actually do have the same "handleEmbeddedSheet()" check in txFnEndEmbed() -- but we pass that check there, because the state has changed between txFnStartEmbed and txFnEndEmbed. In particular, txStylesheetCompiler::startElementInternal() sets mEmbedStatus = eInEmbed... http://hg.mozilla.org/mozilla-central/annotate/eb84780783ed/content/xslt/src/xslt/txStylesheetCompiler.cpp#l317 ...which will make "handleEmbeddedSheet()" pass, when we get to txFnEndEmbed().
Beyond the above, I don't really understand what's going on here or what the right fix is, since this XSLT code is pretty complex and not-really-touched-for-a-long-time. I'm hoping that perhaps sicking knows what to do (or can suggest what to do)... Hopefully the info I've posted above will help him or someone else familiar with this code to figure out what needs fixing. :)
Is it possible to get a non-null txElementHandler* ? This is a very small testcase so it's hard to tell if it's null because that's the only bogus value possible or just due to the testcase.
John will be looking into this bug, re-assigning.
Sicking agrees this is probably exploitable -> sg:critical
Unrealistic to fix for fx10 with the release next week (builds already in QA hands I think) but we need to fix by Fx11. Discovered independently by two researchers on the same weekend, we should assume it's fairly easy to fine should anyone else also be poking in this area.
I want to understand something here: Are we planning to ship with this bug unfixed because of the train schedule?
This affects 126.96.36.199 as well. We should respin to take this if we're going to take it in the other simultaneous releases.
Created attachment 592312 [details] [diff] [review] WIP - Add type checking to pushPtr/popPtr in XSLT code I'm not sure I like this, but it works. Right now it just calls NS_ABORT_IF_FALSE, but it should ideally gracefully fail on the xslt compile. The string typenames should probably be swapped with an enum or somesuch. I had a nicer version that used macros to avoid having any additional arguments to pushPtr, but I gave up on finding a simple way to tokenize the typenames in a macro.
Comment on attachment 592312 [details] [diff] [review] WIP - Add type checking to pushPtr/popPtr in XSLT code Yeah, something like an enum is needed here. These string operations will be too slow.
Created attachment 592333 [details] [diff] [review] Fix state corruption when embedded stylesheet is of wrong node type The issue as I understand it is this: 1) mEmbedStatus is eNeedEmbed, as the stylesheet declaration referenced an embedded stylesheet 2) txFnStartEmbed handler is matched for a parent element, but because eInEmbed was not set it leaves without touching it. 3) An element with the ID matching said embedded stylesheet is found, and mEmbedStatus is set to eInEmbed 4) The element does not otherwise match an embedded stylesheet, so txFnStartEmbed is not one of the handlers that gets called. mEmbedStatus is still eInEmbed 5) The stack begins unwinding, calling endElement on all elements 6) Because mEmbedStatus is still eInEmbed, the exit handler for the embed handler on the stack from part 2 believes it is exiting from *it's* embed, and calls txFnEndStylesheet. 7) Mismatched stack. This patch should fix this by adding a new state, eTryEmbed, which is set before calling the element handlers. If a startembed handler is reached, it calls startEmbedding() to move state into inEmbed, otherwise the startElement call moves state back to eNeedEmbed to attempt to match other elements (the XML document is invalid at this point, so we could also move to eNoEmbed). ... That is, assuming I fully understand what is going on here.
Created attachment 592334 [details] [diff] [review] Fix state corruption when embedded stylesheet is of wrong node type (Attached wrong version of patch)
Created attachment 592350 [details] [diff] [review] Fix state corruption when embedded stylesheet is of wrong node type
Created attachment 592357 [details] [diff] [review] Fix state corruption by directing LRE nodes through the handler After talking to Jonas, he believes the only way the handler is skipped is due to LREs. Because the handler checks namespace, directing LREs to the embed handler should also fix this issue. This patch updates the HandlerTable to point at Stop/EndEmbed regardless.
Created attachment 592358 [details] [diff] [review] Validate stack usage throughout the compiler This validates the stack pushes/pops throughout the parser - which would catch other similar unforseen edge cases.
Comment on attachment 592357 [details] [diff] [review] Fix state corruption by directing LRE nodes through the handler Review of attachment 592357 [details] [diff] [review]: ----------------------------------------------------------------- Yeah, this should work fine. I'm pretty sure this was how I intended it to be written originally.
Comment on attachment 592358 [details] [diff] [review] Validate stack usage throughout the compiler Review of attachment 592358 [details] [diff] [review]: ----------------------------------------------------------------- r=me
Either of these patches seem very safe. Safe enough that I think we should take them in FF10. The variable-stack patch is probably slightly safer in that it won't affect any behavior other than when things have gone really really bad and we'd likely crash (exploitably) anyway. The LRE patch will only adjust behavior in very rare conditions, even for XSLT usage. It only affects embedded XSLT stylesheets which I don't think I've never seen in the wild, and it only does so with erroneous such stylesheets. In fact, we've likely always crashed for such stylesheets so it's unlikely going to affect anyone in a negative way.
(In reply to Jonas Sicking (:sicking) from comment #30) > Either of these patches seem very safe. Safe enough that I think we should > take them in FF10. Are both of these patches together necessary to plug the exploit hole? It's not clear if they're both targeting the same problem or serving different purposes. If it's the same problem, I'd prefer to take less code change. We should also check to make sure that these patches apply to the 1.9.2 branch (marked as affected).
Either patch by itself will fix the security hole, but both should eventually be landed. The "Validate stack usage" patch adds more safety to the XSLT compiler by type checking all elements pushed/pop'd from it's stack. Thus, stack desync in the compiler (such as caused by this bug) just cause an un-exploitable browser abort: > ###!!! ABORT: Expected type does not match top element type on stack: 'type == aType', file /home/johns/moz/mozilla-central/content/xslt/src/xslt/txStylesheetCompiler.cpp, line 727 The other patch fixes this specific bug, but doesn't address the lack of safety for the stack, meaning any undiscovered bugs in the compiler could easily become exploitable. I'll ensure these apply to 1.9.2 and attach an alternate version if necessary
Created attachment 592363 [details] [diff] [review] Validate stack usage throughout the compiler. r=sicking Fix commit message
Created attachment 592367 [details] [diff] [review] [1.9.2] Validate stack usage throughout the compiler Rebased the validate patch on 1.9.2 -- I don't have a working 3.6 build environment right now, and it's 11pm, so I can't test it other than the-involved-files-still-compile. The other patch applies cleanly to 1.9.2
Pushed the "Validate stack usage throughout the compiler" patch as https://hg.mozilla.org/mozilla-central/rev/dba0b31edfbf
I applied the 1.9.2 patch and it does convert the crash from this exploitable one to the ABORT mentioned in comment 32 -- in a debug build. but NS_ABORT_IF_FALSE is debug-only so I'm not sure this fixed anything. About to test the LRE patch.
Created attachment 592444 [details] [diff] [review] Abort in release builds too
Comment on attachment 592444 [details] [diff] [review] Abort in release builds too r=dbaron (trusting Jonas that the *intent* was to abort in release builds)
Checked in the "Abort in release builds" patch. https://hg.mozilla.org/mozilla-central/rev/79d0b6168a53
I don't think the LRE patch made any difference. Either I crash on the abort added in the first patch, or if I comment that out I get basically the same stack as the one Nicolas got on Ubuntu with 3.6.24. The only difference is mine ends at txStylesheetCompiler::endElement (txStylesheetCompiler.cpp:379), his goes one more frame to txStylesheetCompiler::flushCharacters. If you convert the NS_ABORT_IF_FALSE to NS_RUNTIMEABORT as in Jonas's patch then that's where we die, only under control. This is throwing ourselves on the grenade to save our users. Should we leave this bug open until we get a fix for whatever is causing the stack to get unbalanced, clone this into a new bug for the now non-exploitable DoS (my preference), or just forget it as too rare to worry about?
Comment on attachment 592444 [details] [diff] [review] Abort in release builds too [Triage Comment] Approved for 188.8.131.52 (applies with fuzz), a=dveditz Approved for Aurora and Beta.
Comment on attachment 592367 [details] [diff] [review] [1.9.2] Validate stack usage throughout the compiler r=dveditz for the backport approved for 184.108.40.206, a=dveditz
Daniel - the crash in this bug should be fixed entirely by the LRE patch, whereas any other bugs that break the stack trigger the fall-on-the-grenade behaviour (that is, the abort should never be 'proper' behaviour). If the LRE patch on 3.6 is still reaching an abort case, it's possible there's a separate issue there, which we should spin off into a new bug. The abort patch should mean it is a DoS at worst.
Landed on 1.9.2 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/4e2214da4a82 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/45b10d6e1176 and GECKO19226_2012012406_RELBRANCH http://hg.mozilla.org/releases/mozilla-1.9.2/rev/2202afecc4e7 http://hg.mozilla.org/releases/mozilla-1.9.2/rev/308e9289894d Marking status1.9.2 and status-firefox12 "fixed" to represent the security bug fixes that landed.
Thanks for fixing the patches and verifying that the security hole is now plugged up, Dan! John - would you mind putting together a list of areas (and strategies) that QA should focus on when testing for functional regressions?
(In reply to John Schoenick [:johns] from comment #43) > If the LRE patch on 3.6 is still reaching an abort case, Turns out I hand-applied it wrong ("just one line," I thought, "how hard can it be?"). Actually using 'patch' did work to stop the crash on 3.6.
Comment on attachment 592357 [details] [diff] [review] Fix state corruption by directing LRE nodes through the handler [Triage Comment] Talked this over with akeybl and we're going to approve the LRE fix for all the branches, too.
Comment on attachment 592363 [details] [diff] [review] Validate stack usage throughout the compiler. r=sicking [Triage Comment] Approved for Aurora and Beta.
Checked in the LRE fix [attachment 592357 [details] [diff] [review]] on 1.9.2 and the 220.127.116.11 relbranch http://hg.mozilla.org/releases/mozilla-1.9.2/rev/177e42a7014e http://hg.mozilla.org/releases/mozilla-1.9.2/rev/18ac94cc350c
And the LRE fix checked into m-c https://hg.mozilla.org/mozilla-central/rev/aee879a3190a both m-c and 1.9.2 now have three patches applied and should be done.
http://hg.mozilla.org/releases/mozilla-aurora/rev/4a961cfc4408 http://hg.mozilla.org/releases/mozilla-aurora/rev/1c2291b526cb http://hg.mozilla.org/releases/mozilla-aurora/rev/95dd0a69c672 http://hg.mozilla.org/releases/mozilla-beta/rev/97e6aa5f0523 http://hg.mozilla.org/releases/mozilla-beta/rev/26ec9f4367b2 http://hg.mozilla.org/releases/mozilla-beta/rev/c5898e7a8d36
(In reply to Alex Keybl [:akeybl] from comment #45) > Thanks for fixing the patches and verifying that the security hole is now > plugged up, Dan! > > John - would you mind putting together a list of areas (and strategies) that > QA should focus on when testing for functional regressions? The crash does not occur in 100% of cases, but I've been able to reproduce it nearly 100% via 1) Save the crash .svg file locally 2) Visit the local file in firefox on a fresh profile Make sure you can reproduce the crash reliably on a known-affected build - the builds with all three patches should merely print a parse error message. For regressions - This code is 100% in XSLT parsing, so any regressions would be in XSLT documents. It specifically affects embedded stylesheets inside of XSLT documents - something which is extremely rare. I can't even find a page using them as an example, and a regression there would probably not be particularly severe. XSLT in general is actually pretty rare - you'll mostly find it on tutorial sites explaining how to use it, e.g. http://www.w3schools.com/xsl/xsl_examples.asp The most likely regressions would be: 1 - Valid XSLT documents with embedded stylesheets fail to render, instead giving a parse error. 2 - XSLT documents of any kind cause the browser to immediately close with an ABORT message on console. Such a crash, if found, would not be a security risk -- It would indicate our patches had exposed another bug in the XSLT compiler.
(In reply to John Schoenick [:johns] from comment #52) > an ABORT message on console. Such a crash, if found, would not be a security > risk -- It would indicate our patches had exposed another bug in the XSLT > compiler. Such a crash would signal the presence of a previously unknown lurking vulnerability mitigated by this patch, in other words also a success.
Sorry if the accidental flag unsetting alarmed folks. I was trying to agree with John that regressions like #2 would not be a security problem but in fact would be the opposite: symptoms of a security problem that has been de-fanged.
Verified on latest nightly, aurora, and 10RC on Win7, Mac OS X, I tried the test case on previous builds and the application crashed, then I tried on the latest, fixed, builds, and I got an XSLT parsing error message. I tried several of the examples in comment #52 aftewards, and I saw no parsing problems, and all of the XSLT edits applied as expected on the sample XML files.
And linux, ubuntu.
Verified fixed for Firefox 3.6.26. As with Juan's testing in comment 55, I used the attached testcase and some of the examples in comment 52. I confirmed the behaviour in Firefox 3.6.25 and verified that it is indeed fixed in Firefox 3.6.26. Tested Windows 7, Mac OS X 10.6.8, and Ubuntu 11.10.
Sorry for the bug spam. I think this can be called VERIFIED FIXED based on recent comments.
Verified fixed with Firefox 10.0.6esrpre 2012-06-21.