Closed Bug 702466 (CVE-2012-0449) Opened 13 years ago Closed 12 years ago

Crash with EIP=0x00000000 when parsing SVG+XSLT

Categories

(Core :: XSLT, defect)

defect
Not set
major

Tracking

()

VERIFIED FIXED
Tracking Status
firefox9 --- wontfix
firefox10 + fixed
firefox11 + verified
firefox12 + verified
firefox-esr10 10+ verified
blocking1.9.2 --- .26+
status1.9.2 --- .26-fixed

People

(Reporter: nicolas.gregoire, Assigned: johns)

References

Details

(Keywords: verified-aurora, verified-beta, verified1.9.2, Whiteboard: [sg:critical][see comment 6 - comment 9 for what's busted][qa!])

Crash Data

Attachments

(7 files, 5 obsolete files)

233 bytes, image/svg+xml
Details
20.16 KB, text/plain
Details
79.54 KB, image/png
Details
1.10 KB, patch
sicking
: review+
Details | Diff | Splinter Review
14.36 KB, patch
Details | Diff | Splinter Review
8.96 KB, patch
dveditz
: review+
Details | Diff | Splinter Review
1.32 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
When opening a specific SVG file (created by a mutation fuzzer), the browser will crash with EIP=0x00000000. This was tested on Firefox 3.6.24 / Ubuntu 10.04 LTS and Firefox 8.0 / WinXP SP3.

Under Linux : a call to *(%eax) at  _ZN20txStylesheetCompiler15flushCharactersEv+26 will set EIP to 0x000000. An annotated gdb session is attached. A similar behavior is constated on Firefox 8.0 (a screenshot of windbg !exploitable is attached too).
Attached image WinGDB !exploitable screenshot —
I'm not seeing a crash on current trunk (on Mac).  Just an XML parser error from the failed XSLT load....
Component: SVG → XSLT
QA Contact: general → xslt
I initially just saw an XML parser error, too, but then I was able to crash.
bp-890b1eac-dd1e-4bd9-b3d8-2660e2111114

With a locally-saved version of the testcase, I was able to crash on the first try w/ a fresh profile, in my nightly as well as in my debug build.
Status: UNCONFIRMED → NEW
Ever confirmed: true
We crash here:
> 494         rv = (mHandlerTable->mTextHandler)(mCharacters, *this);
http://hg.mozilla.org/mozilla-central/annotate/b51eb4007926/content/xslt/src/xslt/txStylesheetCompiler.cpp#l494

...because mHandlerTable->mTextHandler is a null pointer.
Hardware: x86 → All
Version: 8 Branch → Trunk
OK, this looks kinda bad...

We get the faulty mHandlerTable (mentioned in comment 5) from a call to "popHandlerTable", which pops a raw pointer off of "mOtherStack" (a stack of void*'s) and casts this raw pointer to a txHandlerTable*.

However, the raw pointer is *not* really a txHandlerTable* -- it's in fact a txElementHandler*, which was pushed onto the stack here:
http://hg.mozilla.org/mozilla-central/annotate/eb84780783ed/content/xslt/src/xslt/txStylesheetCompiler.cpp#l345

So we're effectively casting a txElementHandler* to a txHandlerTable*, and then trying to work with the txHandlerTable* pointer, and crashing.
Note that we never actually get a call to "pushHandlerTable" on this testcase (though we do get popHandlerTable as noted above.  I don't know our XSLT code very well, but I'd expect that we'd intend for our pushHandlerTable/popHandlerTable calls to be balanced...
So the evil popHandlerTable() call comes from txFnEndEmbed(), which calls txFnEndStylesheet(), which calls popHandlerTable().

Balancing this, we have an earlier call to txFnStartEmbed(), which in an ideal world would call txFnStartStylesheet(), which would call pushHandlerTable().

(this is all nicely balanced, with just "start" vs "end" and "push" vs "pop")

However, with this bug's testcase, txFnStartEmbed takes an early-return, because 'aState.handleEmbeddedSheet()' fails.  So basically, we're in a sort of error state, and we bail out early & never get to the pushHandlerTable() call.

So, it looks like we need to detect this same error state when we're cleaning up, so we can skip the unbalanced popHandlerTable call there.
Ah, so we actually do have the same "handleEmbeddedSheet()" check in txFnEndEmbed() -- but we pass that check there, because the state has changed between txFnStartEmbed and txFnEndEmbed.

In particular, txStylesheetCompiler::startElementInternal() sets mEmbedStatus = eInEmbed...
http://hg.mozilla.org/mozilla-central/annotate/eb84780783ed/content/xslt/src/xslt/txStylesheetCompiler.cpp#l317
...which will make "handleEmbeddedSheet()" pass, when we get to txFnEndEmbed().
Beyond the above, I don't really understand what's going on here or what the right fix is, since this XSLT code is pretty complex and not-really-touched-for-a-long-time.

I'm hoping that perhaps sicking knows what to do (or can suggest what to do)... Hopefully the info I've posted above will help him or someone else familiar with this code to figure out what needs fixing. :)
Whiteboard: [see comment 6 - comment 9 for what's busted]
Is it possible to get a non-null txElementHandler* ? This is a very small testcase so it's hard to tell if it's null because that's the only bogus value possible or just due to the testcase.
Crash Signature: [@ @0x0 | txStylesheetCompiler::flushCharacters ]
John will be looking into this bug, re-assigning.
Assignee: nobody → jschoenick
Sicking agrees this is probably exploitable -> sg:critical
Whiteboard: [see comment 6 - comment 9 for what's busted] → [sg:critical][see comment 6 - comment 9 for what's busted]
Unrealistic to fix for fx10 with the release next week (builds already in QA hands I think) but we need to fix by Fx11. Discovered independently by two researchers on the same weekend, we should assume it's fairly easy to fine should anyone else also be poking in this area.
I want to understand something here:  Are we planning to ship with this bug unfixed because of the train schedule?
This affects 1.9.2.26 as well. We should respin to take this if we're going to take it in the other simultaneous releases.
I'm not sure I like this, but it works. Right now it just calls NS_ABORT_IF_FALSE, but it should ideally gracefully fail on the xslt compile. The string typenames should probably be swapped with an enum or somesuch.

I had a nicer version that used macros to avoid having any additional arguments to pushPtr, but I gave up on finding a simple way to tokenize the typenames in a macro.
Comment on attachment 592312 [details] [diff] [review]
WIP - Add type checking to pushPtr/popPtr in XSLT code

Yeah, something like an enum is needed here. These string operations will be too slow.
The issue as I understand it is this:
1) mEmbedStatus is eNeedEmbed, as the stylesheet declaration referenced an embedded stylesheet
2) txFnStartEmbed handler is matched for a parent element, but because eInEmbed was not set it leaves without touching it.
3) An element with the ID matching said embedded stylesheet is found, and mEmbedStatus is set to eInEmbed
4) The element does not otherwise match an embedded stylesheet, so txFnStartEmbed is not one of the handlers that gets called. mEmbedStatus is still eInEmbed
5) The stack begins unwinding, calling endElement on all elements
6) Because mEmbedStatus is still eInEmbed, the exit handler for the embed handler on the stack from part 2 believes it is exiting from *it's* embed, and calls txFnEndStylesheet.
7) Mismatched stack.

This patch should fix this by adding a new state, eTryEmbed, which is set before calling the element handlers. If a startembed handler is reached, it calls startEmbedding() to move state into inEmbed, otherwise the startElement call moves state back to eNeedEmbed to attempt to match other elements (the XML document is invalid at this point, so we could also move to eNoEmbed).

... That is, assuming I fully understand what is going on here.
Attachment #592333 - Flags: feedback?(jonas)
(Attached wrong version of patch)
Attachment #592333 - Attachment is obsolete: true
Attachment #592333 - Flags: feedback?(jonas)
Attachment #592334 - Flags: feedback?(jonas)
Attachment #592334 - Attachment is obsolete: true
Attachment #592334 - Flags: feedback?(jonas)
After talking to Jonas, he believes the only way the handler is skipped is due to LREs. Because the handler checks namespace, directing LREs to the embed handler should also fix this issue. This patch updates the HandlerTable to point at Stop/EndEmbed regardless.
Attached patch Validate stack usage throughout the compiler (obsolete) — — Splinter Review
This validates the stack pushes/pops throughout the parser - which would catch other similar unforseen edge cases.
Attachment #592312 - Attachment is obsolete: true
Attachment #592350 - Attachment is obsolete: true
Attachment #592357 - Flags: review?(jonas)
Attachment #592358 - Flags: review?(jonas)
Comment on attachment 592357 [details] [diff] [review]
Fix state corruption by directing LRE nodes through the handler

Review of attachment 592357 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, this should work fine. I'm pretty sure this was how I intended it to be written originally.
Attachment #592357 - Flags: review?(jonas) → review+
Comment on attachment 592358 [details] [diff] [review]
Validate stack usage throughout the compiler

Review of attachment 592358 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #592358 - Flags: review?(jonas) → review+
Either of these patches seem very safe. Safe enough that I think we should take them in FF10.

The variable-stack patch is probably slightly safer in that it won't affect any behavior other than when things have gone really really bad and we'd likely crash (exploitably) anyway.

The LRE patch will only adjust behavior in very rare conditions, even for XSLT usage. It only affects embedded XSLT stylesheets which I don't think I've never seen in the wild, and it only does so with erroneous such stylesheets. In fact, we've likely always crashed for such stylesheets so it's unlikely going to affect anyone in a negative way.
(In reply to Jonas Sicking (:sicking) from comment #30)
> Either of these patches seem very safe. Safe enough that I think we should
> take them in FF10.

Are both of these patches together necessary to plug the exploit hole? It's not clear if they're both targeting the same problem or serving different purposes. If it's the same problem, I'd prefer to take less code change.

We should also check to make sure that these patches apply to the 1.9.2 branch (marked as affected).
Either patch by itself will fix the security hole, but both should eventually be landed.

The "Validate stack usage" patch adds more safety to the XSLT compiler by type checking all elements pushed/pop'd from it's stack. Thus, stack desync in the compiler (such as caused by this bug) just cause an un-exploitable browser abort:

> ###!!! ABORT: Expected type does not match top element type on stack: 'type == aType', file /home/johns/moz/mozilla-central/content/xslt/src/xslt/txStylesheetCompiler.cpp, line 727

The other patch fixes this specific bug, but doesn't address the lack of safety for the stack, meaning any undiscovered bugs in the compiler could easily become exploitable.

I'll ensure these apply to 1.9.2 and attach an alternate version if necessary
Fix commit message
Attachment #592358 - Attachment is obsolete: true
Rebased the validate patch on 1.9.2 -- I don't have a working 3.6 build environment right now, and it's 11pm, so I can't test it other than the-involved-files-still-compile.

The other patch applies cleanly to 1.9.2
Pushed the "Validate stack usage throughout the compiler" patch as https://hg.mozilla.org/mozilla-central/rev/dba0b31edfbf
Attachment #574456 - Attachment description: Minimized SVG file → Minimized SVG file (load from local file to test)
I applied the 1.9.2 patch and it does convert the crash from this exploitable one to the ABORT mentioned in comment 32 -- in a debug build. but NS_ABORT_IF_FALSE is debug-only so I'm not sure this fixed anything. About to test the LRE patch.
Comment on attachment 592444 [details] [diff] [review]
Abort in release builds too

r=dbaron (trusting Jonas that the *intent* was to abort in release builds)
Attachment #592444 - Flags: review?(dbaron) → review+
I don't think the LRE patch made any difference. Either I crash on the abort added in the first patch, or if I comment that out I get basically the same stack as the one Nicolas got on Ubuntu with 3.6.24. The only difference is mine ends at txStylesheetCompiler::endElement (txStylesheetCompiler.cpp:379), his goes one more frame to txStylesheetCompiler::flushCharacters.

If you convert the NS_ABORT_IF_FALSE to NS_RUNTIMEABORT as in Jonas's patch then that's where we die, only under control.

This is throwing ourselves on the grenade to save our users. Should we leave this bug open until we get a fix for whatever is causing the stack to get unbalanced, clone this into a new bug for the now non-exploitable DoS (my preference), or just forget it as too rare to worry about?
Comment on attachment 592444 [details] [diff] [review]
Abort in release builds too

[Triage Comment]
Approved for 1.9.2.26 (applies with fuzz), a=dveditz
Approved for Aurora and Beta.
Attachment #592444 - Flags: approval1.9.2.26+
Attachment #592444 - Flags: approval-mozilla-beta+
Attachment #592444 - Flags: approval-mozilla-aurora+
Comment on attachment 592367 [details] [diff] [review]
[1.9.2] Validate stack usage throughout the compiler

r=dveditz for the backport
approved for 1.9.2.26, a=dveditz
Attachment #592367 - Flags: review+
Attachment #592367 - Flags: approval1.9.2.26+
Daniel - the crash in this bug should be fixed entirely by the LRE patch, whereas any other bugs that break the stack trigger the fall-on-the-grenade behaviour (that is, the abort should never be 'proper' behaviour).

If the LRE patch on 3.6 is still reaching an abort case, it's possible there's a separate issue there, which we should spin off into a new bug. The abort patch should mean it is a DoS at worst.
Thanks for fixing the patches and verifying that the security hole is now plugged up, Dan!

John - would you mind putting together a list of areas (and strategies) that QA should focus on when testing for functional regressions?
(In reply to John Schoenick [:johns] from comment #43)
> If the LRE patch on 3.6 is still reaching an abort case,

Turns out I hand-applied it wrong ("just one line," I thought, "how hard can it be?"). Actually using 'patch' did work to stop the crash on 3.6.
Comment on attachment 592357 [details] [diff] [review]
Fix state corruption by directing LRE nodes through the handler

[Triage Comment]
Talked this over with akeybl and we're going to approve the LRE fix for all the branches, too.
Attachment #592357 - Flags: approval1.9.2.26+
Attachment #592357 - Flags: approval-mozilla-beta+
Attachment #592357 - Flags: approval-mozilla-aurora+
Comment on attachment 592363 [details] [diff] [review]
Validate stack usage throughout the compiler. r=sicking

[Triage Comment]
Approved for Aurora and Beta.
Attachment #592363 - Flags: approval-mozilla-beta+
Attachment #592363 - Flags: approval-mozilla-aurora+
And the LRE fix checked into m-c
https://hg.mozilla.org/mozilla-central/rev/aee879a3190a

both m-c and 1.9.2 now have three patches applied and should be done.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Alias: CVE-2012-0449
(In reply to Alex Keybl [:akeybl] from comment #45)
> Thanks for fixing the patches and verifying that the security hole is now
> plugged up, Dan!
> 
> John - would you mind putting together a list of areas (and strategies) that
> QA should focus on when testing for functional regressions?

The crash does not occur in 100% of cases, but I've been able to reproduce it nearly 100% via
1) Save the crash .svg file locally
2) Visit the local file in firefox on a fresh profile

Make sure you can reproduce the crash reliably on a known-affected build - the builds with all three patches should merely print a parse error message.

For regressions - This code is 100% in XSLT parsing, so any regressions would be in XSLT documents. It specifically affects embedded stylesheets inside of XSLT documents - something which is extremely rare. I can't even find a page using them as an example, and a regression there would probably not be particularly severe.

XSLT in general is actually pretty rare - you'll mostly find it on tutorial sites explaining how to use it, e.g. http://www.w3schools.com/xsl/xsl_examples.asp

The most likely regressions would be:
1 - Valid XSLT documents with embedded stylesheets fail to render, instead giving a parse error.
2 - XSLT documents of any kind cause the browser to immediately close with an ABORT message on console. Such a crash, if found, would not be a security risk -- It would indicate our patches had exposed another bug in the XSLT compiler.
(In reply to John Schoenick [:johns] from comment #52)
> an ABORT message on console. Such a crash, if found, would not be a security
> risk -- It would indicate our patches had exposed another bug in the XSLT
> compiler.

Such a crash would signal the presence of a previously unknown lurking vulnerability mitigated by this patch, in other words also a success.
Sorry if the accidental flag unsetting alarmed folks. I was trying to agree with John that regressions like #2 would not be a security problem but in fact would be the opposite: symptoms of a security problem that has been de-fanged.
Verified on latest nightly, aurora, and 10RC on Win7, Mac OS X,  I tried the test case on previous builds and the application crashed, then I tried on the latest, fixed, builds, and I got an XSLT parsing error message. 

I tried several of the examples in comment #52 aftewards, and I saw no parsing problems, and all of the XSLT edits applied as expected on the sample XML files.
And linux, ubuntu.
Verified fixed for Firefox 3.6.26. As with Juan's testing in comment 55, I used the attached testcase and some of the examples in comment 52. I confirmed the behaviour in Firefox 3.6.25 and verified that it is indeed fixed in Firefox 3.6.26. Tested Windows 7, Mac OS X 10.6.8, and Ubuntu 11.10.
Keywords: verified1.9.2
Whiteboard: [sg:critical][see comment 6 - comment 9 for what's busted] → [sg:critical][see comment 6 - comment 9 for what's busted][qa!]
Sorry for the bug spam. I think this can be called VERIFIED FIXED based on recent comments.
Status: RESOLVED → VERIFIED
Group: core-security
Verified fixed with Firefox 10.0.6esrpre 2012-06-21.
rforbes-bugspam-for-setting-that-bounty-flag-20130719
Flags: sec-bounty+
You need to log in before you can comment on or make changes to this bug.