Closed Bug 607222 (CVE-2010-3765) Opened 14 years ago Closed 14 years ago

Interleaving document.write and appendChild can lead to duplicate text frames and overrunning of text run buffers

Categories

(Core :: DOM: Core & HTML, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla2.0b7
Tracking Status
blocking2.0 --- beta7+
blocking1.9.2 --- .12+
status1.9.2 --- .12-fixed
blocking1.9.1 --- .15+
status1.9.1 --- .15-fixed

People

(Reporter: reed, Assigned: smaug)

References

Details

(4 keywords, Whiteboard: [sg:critical] 0day)

Attachments

(6 files, 2 obsolete files)

Morten Kråkvik of Telenor SOC reported an in-the-wild 0day affecting Firefox 3.6.x on Windows to security@:

=========================================================================

A compromised site is currently redirecting visitors to
hxxp://l-3com.dyndns-work.com/admissions/admin.php, which contains
exploits directed at Firefox users on Windows.

The exploit is confirmed successful on Windows XP SP3 + Firefox
3.6.11.
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
blocking2.0: --- → ?
status1.9.1: --- → ?
status1.9.2: --- → ?
status2.0: --- → ?
Attached file stack
stack and variables from winxp 20101024 1.9.2 build on winxp with 3.6.11 ua string.
Forgot to mention. The PoC works/crashes if you change your Firefox version to 3.6.11. No need to go to the exploit site.
Here's one crash I get with a minimized testcase (on mac, 3.6.12pre)
bp-957a753d-474c-4907-b0b4-1d41a2101025
scvhost.exe : Not detected by Sandbox (Signature: NO_VIRUS)


 [ DetectionInfo ]
    * Filename: C:\analyzer\scan\scvhost.exe.
    * Sandbox name: NO_MALWARE
    * Signature name: NO_VIRUS.
    * Compressed: NO.
    * TLS hooks: NO.
    * Executable type: Application.
    * Executable file structure: OK.
    * Filetype: PE_I386.

 [ General information ]
    * File length:        48640 bytes.
    * MD5 hash: e8ead7641f68822c8fbfe53ad7f1bf52.
    * SHA1 hash: 244860d5c40d8d13c16fa8bba133c7608a09a276.

 [ Changes to filesystem ]
    * Creates file C:\WINDOWS\temp\symantec.exe.

 [ Changes to registry ]
    * Creates value "Microsoft Windows Update"="C:\WINDOWS\temp\symantec.exe" in key "HKCU\Software\Microsoft\Windows\CurrentVersion\Run".
    * Creates value "Microsoft Windows Update"="C:\WINDOWS\temp\symantec.exe" in key "HKLM\Software\Microsoft\Windows\CurrentVersion\Run".

 [ Network services ]
    * Connects to "update.microsoft.com" on port 80 (TCP).
    * Connects to "l-3com.dyndns-work.com" on port 443 (TCP).

 [ Process/window information ]
    * Creates process "reg.EXE".
    * Will automatically restart after boot (I'll be back...).
    * Creates process "cmd.exe".

 [ Signature Scanning ]
    * C:\WINDOWS\temp\symantec.exe (48640 bytes) : no signature detection.
Blocks: 607228
bp-b1f1a539-9eec-4c62-a204-cee8f2101025
bp-66776ae0-e584-47f6-85c9-8e4292101025

two more crashes, the second one seems kind of odd.

I don't think the malware is that interesting to us, the fact that it gets to run at all is a fail. We should send it on to A-V companies, of course -- they'll want to see it.
Blocks: 607240
###!!! ASSERTION: Invalid offset: 'aOffset <= mSkipChars->mCharCount', file ../../../../gfx/thebes/src/gfxSkipChars.cpp, line 92
###!!! ASSERTION: Overflow: 'aStart + aLength <= mCharacterCount', file ../../../../gfx/thebes/src/gfxFont.cpp, line 1890
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901
WARNING: Break suggested inside cluster!: file ../../../../gfx/thebes/src/gfxFont.cpp, line 1901

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6a56d50 in nsFontCache::GetMetricsFor (this=0x7fffd4b3d130, aFont=..., aLangGroup=0x7fffe0de7248, 
    aUserFontSet=0x0, aMetrics=@0x7fffffff7cc0) at ../../../../gfx/src/thebes/nsThebesDeviceContext.cpp:163
163	                tfm->GetThebesFontGroup()->UpdateFontList();
(gdb) bt
Assuming this will block branch releases. Doesn't seem to crash Firefox 4/trunk.
blocking1.9.1: ? → .15+
blocking1.9.2: ? → .12+
blocking2.0: ? → ---
Affects all OSes on both Firefox 3.6.x and Firefox 3.5.x.
blocking1.9.1: .16+ → ?
blocking1.9.2: .13+ → ?
OS: Windows XP → All
Priority: -- → P1
Hardware: x86 → All
Version: 1.9.2 Branch → unspecified
blocking1.9.1: ? → .15+
blocking1.9.2: ? → .12+
blocking1.9.1: .15+ → ?
blocking1.9.2: .12+ → ?
blocking2.0: --- → ?
OS: All → Windows XP
Priority: P1 → --
Hardware: All → x86
Version: unspecified → 1.9.2 Branch
blocking2.0: ? → ---
OS: Windows XP → All
Priority: -- → P1
Hardware: x86 → All
Version: 1.9.2 Branch → unspecified
blocking1.9.1: ? → .15+
blocking1.9.2: ? → .12+
Google's safe-browsing service now blocks the malware site.
Submitted to Panda Security, McAfee, Microsoft, Symantec, ClamAV. Trend Micro's upload form throws an error.
(And Google obviously)
Attached file valgrind log
OK, I've debugged Jesse's testcase to understanding the last step of the problem:  we're crashing because we've double-constructed frames for the "abarBAR" text (originally "a"), so when we notify the primary frame about the character data append in nsCSSFrameConstructor::CharacterDataChanged (when appending "barBAR"), we only notify the first of the two.  Then we crash when dealing with the second one, since its text run has an incorrect cached length (of 1 instead of 7), and we overrun data.  (At least that explains the first assertion.)
First ContentAppended is here:

#0  nsCSSFrameConstructor::ContentAppended (this=0x14d8e30, 
    aContainer=0x14dc0f0, aNewIndexInContainer=7)
    at /home/dbaron/builds/1.9.2/mozilla/layout/base/nsCSSFrameConstructor.cpp:6250
#1  0x00007feaee9d3656 in PresShell::ContentAppended (this=0x18ec780, 
    aDocument=0x18d2fc0, aContainer=0x14dc0f0, aNewIndexInContainer=7)
    at /home/dbaron/builds/1.9.2/mozilla/layout/base/nsPresShell.cpp:5044
#2  0x00007feaeecf1303 in nsNodeUtils::ContentAppended (aContainer=0x14dc0f0, 
    aNewIndexInContainer=7)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsNodeUtils.cpp:137
#3  0x00007feaeec5486f in nsContentSink::NotifyAppend (this=0x14d1620, 
    aContainer=0x14dc0f0, aStartIndex=7)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsContentSink.cpp:1380
#4  0x00007feaeee2933a in SinkContext::FlushTags (this=0x14f0810)
    at /home/dbaron/builds/1.9.2/mozilla/content/html/document/src/nsHTMLContentSink.cpp:1379
#5  0x00007feaeee2f8ed in HTMLContentSink::FlushTags (this=0x14d1620)
    at /home/dbaron/builds/1.9.2/mozilla/content/html/document/src/nsHTMLContentSink.cpp:3199
#6  0x00007feaeec551d2 in nsContentSink::BeginUpdate (this=0x14d1620, 
    aDocument=0x18d2fc0, aUpdateType=1)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsContentSink.cpp:1625
#7  0x00007feaeec9b93a in nsDocument::BeginUpdate (this=0x18d2fc0, 
    aUpdateType=1)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsDocument.cpp:3917
#8  0x00007feaeea464e9 in mozAutoDocUpdate::mozAutoDocUpdate (
    this=0x7fff3e43f370, aDocument=0x18d2fc0, aUpdateType=1, aNotify=1)
    at /home/dbaron/builds/1.9.2/mozilla/layout/generic/../../content/base/src/mozAutoDocUpdate.h:53
#9  0x00007feaeecd673f in nsGenericElement::doInsertChildAt (aKid=0x193df50, 
    aIndex=7, aNotify=1, aParent=0x14dc0f0, aDocument=0x18d2fc0, 
    aChildArray=...)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsGenericElement.cpp:3219
#10 0x00007feaeecd620e in nsGenericElement::InsertChildAt (this=0x14dc0f0, 
    aKid=0x193df50, aIndex=7, aNotify=1)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsGenericElement.cpp:3169
#11 0x00007feaeecd9110 in nsGenericElement::doReplaceOrInsertBefore (
    aReplace=0, aNewChild=0x193df98, aRefChild=0x0, aParent=0x14dc0f0, 
    aDocument=0x18d2fc0, aReturn=0x7fff3e43f780)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsGenericElement.cpp:3953
#12 0x00007feaeecd75d1 in nsGenericElement::InsertBefore (this=0x14dc0f0, 
    aNewChild=0x193df98, aRefChild=0x0, aReturn=0x7fff3e43f780)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsGenericElement.cpp:3488
#13 0x00007feaeedb0df8 in nsHTMLBodyElement::InsertBefore (this=0x14dc0f0, 
    newChild=0x193df98, refChild=0x0, _retval=0x7fff3e43f780)
    at /home/dbaron/builds/1.9.2/mozilla/content/html/content/src/nsHTMLBodyElement.cpp:95
---Type <return> to continue, or q <return> to quit---
#14 0x00007feaeecbf528 in nsGenericElement::AppendChild (this=0x14dc0f0, 
    aNewChild=0x193df98, aReturn=0x7fff3e43f780)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsGenericElement.h:500
#15 0x00007feaeedb0ea7 in nsHTMLBodyElement::AppendChild (this=0x14dc0f0, 
    newChild=0x193df98, _retval=0x7fff3e43f780)
    at /home/dbaron/builds/1.9.2/mozilla/content/html/content/src/nsHTMLBodyElement.cpp:95
#16 0x00007feaee6a76ad in nsIDOMNode_AppendChild (cx=0x17b24e0, argc=1, 
    vp=0x17cea28) at dom_quickstubs.cpp:2617


and the second is here:

#0  nsCSSFrameConstructor::ContentAppended (this=0x14d8e30, 
    aContainer=0x14dc0f0, aNewIndexInContainer=7)
    at /home/dbaron/builds/1.9.2/mozilla/layout/base/nsCSSFrameConstructor.cpp:6250
#1  0x00007feaee9d3656 in PresShell::ContentAppended (this=0x18ec780, 
    aDocument=0x18d2fc0, aContainer=0x14dc0f0, aNewIndexInContainer=7)
    at /home/dbaron/builds/1.9.2/mozilla/layout/base/nsPresShell.cpp:5044
#2  0x00007feaeecf1303 in nsNodeUtils::ContentAppended (aContainer=0x14dc0f0, 
    aNewIndexInContainer=7)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsNodeUtils.cpp:137
#3  0x00007feaeecd68de in nsGenericElement::doInsertChildAt (aKid=0x193df50, 
    aIndex=7, aNotify=1, aParent=0x14dc0f0, aDocument=0x18d2fc0, 
    aChildArray=...)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsGenericElement.cpp:3238
#4  0x00007feaeecd620e in nsGenericElement::InsertChildAt (this=0x14dc0f0, 
    aKid=0x193df50, aIndex=7, aNotify=1)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsGenericElement.cpp:3169
#5  0x00007feaeecd9110 in nsGenericElement::doReplaceOrInsertBefore (
    aReplace=0, aNewChild=0x193df98, aRefChild=0x0, aParent=0x14dc0f0, 
    aDocument=0x18d2fc0, aReturn=0x7fff3e43f780)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsGenericElement.cpp:3953
#6  0x00007feaeecd75d1 in nsGenericElement::InsertBefore (this=0x14dc0f0, 
    aNewChild=0x193df98, aRefChild=0x0, aReturn=0x7fff3e43f780)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsGenericElement.cpp:3488
#7  0x00007feaeedb0df8 in nsHTMLBodyElement::InsertBefore (this=0x14dc0f0, 
    newChild=0x193df98, refChild=0x0, _retval=0x7fff3e43f780)
    at /home/dbaron/builds/1.9.2/mozilla/content/html/content/src/nsHTMLBodyElement.cpp:95
#8  0x00007feaeecbf528 in nsGenericElement::AppendChild (this=0x14dc0f0, 
    aNewChild=0x193df98, aReturn=0x7fff3e43f780)
    at /home/dbaron/builds/1.9.2/mozilla/content/base/src/nsGenericElement.h:500
#9  0x00007feaeedb0ea7 in nsHTMLBodyElement::AppendChild (this=0x14dc0f0, 
    newChild=0x193df98, _retval=0x7fff3e43f780)
    at /home/dbaron/builds/1.9.2/mozilla/content/html/content/src/nsHTMLBodyElement.cpp:95
#10 0x00007feaee6a76ad in nsIDOMNode_AppendChild (cx=0x17b24e0, argc=1, 
    vp=0x17cea28) at dom_quickstubs.cpp:2617
And the "a" text and the <base> element also look like they're out-of-order in the document tree.
I tend to think nsGenericElement::doReplaceOrInsertBefore needs to flush the content sink *before* determining the index (insPos) to use for the insertion (when aRefChild is null).
(In reply to comment #20)
> I tend to think nsGenericElement::doReplaceOrInsertBefore needs to flush the
> content sink *before* determining the index (insPos) to use for the insertion
> (when aRefChild is null).

I should have said nsGenericElement::doReplaceOrInsertBefore or one of its callers needs ...
Inserting:
  mozAutoDocUpdate updateBatch(aDocument, UPDATE_CONTENT_MODEL, PR_TRUE);
right before the if (aRefChild) about 20 lines into that function does in fact fix the assertions/crash with both attachment 486011 [details] and attachment 485985 [details].

But that's probably not quite the right place, and it sort of looks to me like there might be invariants about other stuff that needs to be done *before* doing that (although maybe the calls to nsMutationGuard::DidMutate aren't really an invariant).
Would that kind of updateBatch regress Bug 423269?
mozAutoDocUpdate updateBatch(aDocument, UPDATE_CONTENT_MODEL, PR_TRUE)
seems to cause a hang here.
Or it does fixes the assertions I see with Jesse's testcase (which doesn't crash here), but https://bugzilla.mozilla.org/attachment.cgi?id=485985 hangs.
Ah, the hang is probably because of the long loop.
I don't understand 
http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsGenericElement.cpp?mark=4086-4086#4083
Why it is ok to not notify other cases? Jonas, I think you added
that code long ago in Bug 325730.

If the parameter is always true, I don't get crashes nor assertions using
the testcases in this bug.
(In reply to comment #22)
> Inserting:
>   mozAutoDocUpdate updateBatch(aDocument, UPDATE_CONTENT_MODEL, PR_TRUE);
> right before the if (aRefChild) about 20 lines into that function does in fact
> fix the assertions/crash with both attachment 486011 [details] and attachment 485985 [details].

Flushing the document right before returning from nsHTMLDocument::WriteCommon
fixes the crashes too, but I wonder if data coming from the necko to
parser could trigger this too.
Ah, so nsGenericElement::doInsertChildAt does have mozAutoDocUpdate.
So yes, I agree with dbaron.
We should have the update earlier, so that it flushes before we call
container->IndexOf or container->GetChildCount() .
Attached patch WIPSplinter Review
This is almost what dbaron suggested in comment 22.
But I don't quite like this, since this may cause a small performance regression 
because of double batching.
Another option is to check mutation guards around the update batch starts (both of them) and reget insPos as needed like we do in the aReplace case when RemoveChildAt triggers mutations, right?
That might be a faster solution. Less extra virtual calls.
So we *should* not need to change the second argument to the auto-batch-guard. We definitely should move it to above getting the insPos, but we should be able to only do it when doing a replace or inserting a fragment. Which means that such a fix would not affect the testcase as it doesn't use replace or fragment.

The reason that should be enough is that every time we flush we should be expecting random mutations to the tree. So I think the fix is figuring out why that part isn't working currently. I.e. why doesn't the current mutation guards catch the mutation here and update all indexes as needed.
This has been made public @ http://www.norman.com/security_center/virus_description_archive/129146/. The press is picking it up FYI.
I should be clearer...

I have not seen anyone talk in specifics about this bug, only that it exists. Researchers have also got their hands on the malware so it is safe to assume they (and bad guys) got the page that was serving the 0day as well.
I don't think we should be too picky about small performance regressions here, if it adds risk (complexity) or delay.  We have an in-the-wild exploit, and we can replace with a faster version in a subsequent update if we think it's critical to restore the missing perf.

(Not that it's clear exactly what the performance impact would be, and on what loads, but a couple of virtual calls per batch or similar sounds like something we should just eat.)
There is no mutation guard around the part where the html sink ends up adding stuff to the DOM.  That part being the BeginUpdate() call...
Einar Oftedal, Head of Malware Detection Team (MDT), Norman R&D had this to add about the vulnerability:

===============================================================================

So the vulnerability appears to some kind of race condition issue in the 
handling of DOM element (tags) properties. The exploit enumerates all 
properties of the tags AUDIO, A, and BASE and sets them to 'a' one by one 
in a loop that spins may times. Unfortunately I'm not too familar with the 
Firefox code base, but my theory is that at some point in time, Firefox 
uses an object in two separate places, but only has incremented the 
object's reference count once. Consequently, when either part of the code 
is done with the object, the reference count reaches zero and the object 
is freed. This triggers a use-after-free condition upon subsequent use of 
the object, such as seen below. In order for an attacker to exploit this 
issue, a heap spray would need to be created in order to fill the freed 
memory. In this case, it appears that a very big heap spray (>1GB) is 
needed because of the high address. Hence, low memory machines would 
probably fail at running the exploit (I also had some initial problems 
reproducing in a VM due to low memory). However, this could probably be 
addressed which leads me to think that whoever made this exploit did not 
put much effort into it. I also believe the vulnerability was found by 
fuzzing as the author seems to just have bruteforced his way through all 
the properties until some strange application behavior was triggered. Btw, 
I replaced the heap spray with my own (just long strings of 'A's) and was 
able to hit EIP 0x41414141 without much effort, hence it seems to be 
highly exploitable.

0:000> r  
eax=438a0ee0 ebx=00000000 ecx=038fdb80 edx=00000000 esi=0012ee78 
edi=00000000
eip=100ec811 esp=0012e8c8 ebp=0012ea74 iopl=0         nv up ei pl nz na po 
nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000 efl=00010202
xul!XPCWrappedNative::GetNewOrUsed+0x21:
100ec811 8b08            mov     ecx,dword ptr [eax] 
ds:0023:438a0ee0=????????

0:000> kb  
ChildEBP RetAddr  Args to Child 
0012ea74 100f77eb 0012ee78 038fdb80 02ab9c80 
xul!XPCWrappedNative::GetNewOrUsed+0x21 
[e:\builds\moz2_slave\win32_build\build\js\src\xpconnect\src\xpcwrappednative.cpp 
@ 327]
0012eb0c 100f6e58 0012ecf0 0012ebe0 00000000 
xul!XPCConvert::NativeInterface2JSObject+0x28b 
[e:\builds\moz2_slave\win32_build\build\js\src\xpconnect\src\xpcconvert.cpp 
@ 1199]
0012ebac 100efe12 0012ecf0 0012ebe0 0012ec60 
xul!XPCConvert::NativeData2JS+0x98 
[e:\builds\moz2_slave\win32_build\build\js\src\xpconnect\src\xpcconvert.cpp 
@ 471]
0012ee48 100fc072 0012ee78 00000001 00000000 
xul!XPCWrappedNative::CallMethod+0x5e2 
[e:\builds\moz2_slave\win32_build\build\js\src\xpconnect\src\xpcwrappednative.cpp 
@ 2810]
0012ef14 00335c4d 02cc4000 02e93640 00000000 xul!XPC_WN_GetterSetter+0x1b2 
[e:\builds\moz2_slave\win32_build\build\js\src\xpconnect\src\xpcwrappednativejsops.cpp 
@ 1784]
0012efc8 0032a533 02cc4000 00000000 03d3d0f0 js3250!js_Invoke+0x42d 
[e:\builds\moz2_slave\win32_build\build\js\src\jsinterp.cpp @ 1360]
0012effc 00337de0 02cc4000 02e93640 02e7a580 
js3250!js_InternalInvoke+0x103 
[e:\builds\moz2_slave\win32_build\build\js\src\jsinterp.cpp @ 1423]
0012f068 0033911a 02cc4000 02e93640 00000000 
js3250!js_GetPropertyHelper+0x310 
[e:\builds\moz2_slave\win32_build\build\js\src\jsobj.cpp @ 4275]
0012f2b4 003169c5 02cc4000 0012f364 01953090 js3250!js_Interpret+0x115a 
[e:\builds\moz2_slave\win32_build\build\js\src\jsops.cpp @ 1904]
0012f338 003041a1 02df8760 01953090 00000000 js3250!js_Execute+0x1a5 
[e:\builds\moz2_slave\win32_build\build\js\src\jsinterp.cpp @ 1601]
0012f364 100606f3 02cc4000 02df8760 01c6f7e4 
js3250!JS_EvaluateUCScriptForPrincipals+0x61 
[e:\builds\moz2_slave\win32_build\build\js\src\jsapi.cpp @ 5073]
0012f3d8 1018230e 0012f4ac 02df8760 01c6f7e0 
xul!nsJSContext::EvaluateString+0x15e 
[e:\builds\moz2_slave\win32_build\build\dom\base\nsjsenvironment.cpp @ 
1764]
0012f490 10014d80 02e992b0 0012f4ac 02e992b0 
xul!nsScriptLoader::EvaluateScript+0x177 
[e:\builds\moz2_slave\win32_build\build\content\base\src\nsscriptloader.cpp 
@ 711]
0012f544 1018f596 02e992b0 02e992b0 02beb5e4 
xul!nsScriptLoader::ProcessRequest+0x6f 
[e:\builds\moz2_slave\win32_build\build\content\base\src\nsscriptloader.cpp 
@ 625]
0012f8e4 100505ab 02beb5e4 02beb5e4 03d91c00 
xul!nsScriptLoader::ProcessScriptElement+0x2e6 
[e:\builds\moz2_slave\win32_build\build\content\base\src\nsscriptloader.cpp 
@ 577]
0012f8fc 10014b17 03d91cbc 03d91c00 02beb5c0 
xul!nsScriptElement::MaybeProcessScript+0x74 
[e:\builds\moz2_slave\win32_build\build\content\base\src\nsscriptelement.cpp 
@ 193]
0012f9a8 10014af7 1004227f 00000001 02e99370 
xul!nsHTMLScriptElement::MaybeProcessScript+0x1d 
[e:\builds\moz2_slave\win32_build\build\content\html\content\src\nshtmlscriptelement.cpp 
@ 565]
0012f9ac 1004227f 00000001 02e99370 00000000 
xul!nsHTMLScriptElement::DoneAddingChildren+0xc 
[e:\builds\moz2_slave\win32_build\build\content\html\content\src\nshtmlscriptelement.cpp 
@ 490]
0012f9c4 1009c49d 02beb5e4 00000000 03d56380 
xul!HTMLContentSink::ProcessSCRIPTEndTag+0x63 
[e:\builds\moz2_slave\win32_build\build\content\html\document\src\nshtmlcontentsink.cpp 
@ 3116]
0012f9e4 1009bce2 00000053 00000000 00000000 
xul!SinkContext::CloseContainer+0x11d 
[e:\builds\moz2_slave\win32_build\build\content\html\document\src\nshtmlcontentsink.cpp 
@ 1018]
> but a couple of virtual calls per batch or similar

The number depends on the number of observers; it ranges from a few to tens of thousands, depending on the page.  But yes, if I don't have a patch per comment 31 in the next 10 mins (which I hope I will), or if it's not safe, then we should just eat that cost.
OK, it's more complicated than I thought, because the BeginUpdate that triggers the flush is in nsGenericElement::doInsertChildAt, which just has an index to work with, not reference nodes and the like.

So I think the safe fix is probably Smaug's patch...
Indeed it is the inner batch that causes the mutation, since it currently is the only batch which is created.

Smaugs patch frontloads the mutations that the sink causes. But I'm worried that we can still get mutations when the inner (current) batch starts. Though I guess that is less likely to happen since we tend to only do work when entering and exiting the outer-most batch. Except for mutation events of course, but we tend to guard somewhat well against those these days.

Another fix would be to change nsGenericElement::doInsertChildAt to make it more stable in the face of mutations. The problem in this testcase is that we calculate isAppend before starting the batch. If we calculated it after starting the batch we'd simply insert the content before the last node. Technically wrong per specs, but shouldn't crash.
The best thing is probably to do both.
This should also fix the crash. Though probably worth taking both on branch. And definitely take this on m-c.
Note that on m-c this is not a problem in HTML because the HTML5 parser doesn't do weird stuff from BeginUpdate.  Is it a problem in XML?
Could it be problem with XHTML even in m-c if the code does something like?

// While page is being loaded...
element.appendChild(foobar);
initiateSyncXHR(); // Could parser modify DOM here?
element.appendChild(foobar2);
removing status2.0 'unaffected' due to comment 43. It may or may not be exploitable on trunk, let's just fix it.
blocking2.0: --- → ?
blocking2.0: ? → beta7+
status2.0: ? → ---
Attached patch a bit faster patch (obsolete) — Splinter Review
This shouldn't add new virtual nor new Addref/Release calls in the common case
where container is an element.
Attachment #486178 - Flags: review?(jonas)
Comment on attachment 486128 [details] [diff] [review]
Alternative/complimentary patch

I don't understand why ::DisMutate needs to be moved.

Other than that, r=me
Attachment #486128 - Flags: review+
Attachment #486178 - Flags: review?(bzbarsky)
Attached patch same for 191 if needed (obsolete) — Splinter Review
Any chance of review for this soon? I know jonas isn't around until 5...can someone else pick this up?
Alias: CVE-2010-3765
I don't grok which data structure has become corrupt here, but does this indicate that something that should have been covered by frame poisoning wasn't?  (in 3.6)  Or that the content tree could benefit from similar protection?
I don't think frame poisoning should have helped here; that said, I did file some followup bugs (bug 607478 and bug 607494) on other remediation that might.

Here's a basic description of what went wrong:

We have the following sequence of events (in Jesse's testcase):

  document.write("<a></a>a");
  document.body.appendChild(base_element);
  // get scrollTop to flush layout
  document.write("barBAR");

After the first of these document.write() calls, the a *element* has been created and nsIDocumentObserver/nsIMutationObservers have been notified.  However, the "a" is still buffered somewhere and a text node has not yet been created for it.

Inside the appendChild implementation, we share code (I guess that's the purpose) by converting the append to an appropriate insert, doing common stuff, and then later converting those insertions that are appends back to appends for notification purposes.

Before we handle the append child, we clearly need to flush anything buffered in the parser/content sink (Flush_Content).  The problem is that we do this too late, *after* converting the append to the concept "insertion of node at index 7" and "is an append".  Then we flush, creating a text node that ends up in the content tree at index 7 within its parent, and notifying the document observers that an append has occured.  The signature of ContentAppended says "content has been appended to this parent's child list, and the index of the first new child is 7".  This causes a text frame to be created.

Then we actually do the insertion.  We insert the base element at index 7, incorrectly before the "a" text.  Then we send another ContentAppended (since we'd already determined it was an append), with the index of 7.  This notification means we go create frames for the content nodes at index 7 (base) and 8 ("a").  base is display:none, but now we create a *second* frame for the "a" text.  This one doesn't end up being the primary frame.

Then there's a layout flush, which causes text frame reflow and text run creation.

Then we come to the third document.write.  Since the child list of the body ends with text, we handle this write by appending text to an existing text node.  document observers are notified (CharacterDataChanged).  The frame constructor handles this by getting the primary frame for the text node (which gets only the *first* of the two text frames), and tells the text frame that its data changed, which makes it drop its text run.

Then, when we get to reflow, we get to the second text frame, which was not told to drop its text from, so it still has a text run mapping one character.  However, it knows that it has 7 characters, so it advances iterators within its text run's data by 7 characters, which is too far.  (That gets to the first assertion, at least, which is where I started debugging.)
Comment on attachment 486178 [details] [diff] [review]
a bit faster patch

>@@ -3211,17 +3212,18 @@ nsGenericElement::doInsertChildAt(nsICon
> 
>   PRUint32 childCount = aChildArray.ChildCount();
>   NS_ENSURE_TRUE(aIndex <= childCount, NS_ERROR_ILLEGAL_VALUE);
> 
>   nsMutationGuard::DidMutate();
> 
>   PRBool isAppend = (aIndex == childCount);
> 
>-  mozAutoDocUpdate updateBatch(aDocument, UPDATE_CONTENT_MODEL, aNotify);
>+  mozAutoDocUpdate updateBatch(aNoBeginEndUpdate ? nsnull : aDocument,
>+                               UPDATE_CONTENT_MODEL, aNotify);

Would be more readable as

mozAutoDocUpdate updateBatch(aDocument, UPDATE_CONTENT_MODEL,
                             aNotify && !aNoBeginEndUpdate);

r=me with that.
Attachment #486178 - Flags: review?(jonas) → review+
Thanks for the explanation.  Just to make sure I get it: the memory corruption that allows executing data is not from lifecycle problems but from running pointers off the end of an array (in the text run object)?
(In reply to comment #55)
> Thanks for the explanation.  Just to make sure I get it: the memory corruption
> that allows executing data is not from lifecycle problems but from running
> pointers off the end of an array (in the text run object)?

Yes, I believe so.
Comment on attachment 486178 [details] [diff] [review]
a bit faster patch

You can't make that optimization, because we have element classes that override InsertChildAt to do some things in addition to calling nsGenericElement::InsertChildAt.
Attachment #486178 - Flags: review?(bzbarsky) → review-
In particular, html select and optgroup do, as does svg switch (and xtf, and maybe some others; I stopped looking).
Do be clear, I think we should take smaug's "wip" patch from earlier in this bug.
Comment on attachment 486189 [details] [diff] [review]
same for 191 if needed

It's needed.

r=dveditz based on sicking's r+. The only difference (besides line numbers) is the old branch didn't name the mozAutoDocConditionContentUpdateBatch, but either way it's removed on both branches.
Attachment #486189 - Flags: review+
Attachment #486085 - Flags: review?(jonas)
Comment on attachment 486189 [details] [diff] [review]
same for 191 if needed

r-, for the same reason
Attachment #486189 - Flags: review-
The one time a mid-air might have saved me...  My r- did mid-air with your r- so I guess attachments can only mid-air with themselves.
Attachment #486085 - Flags: review?(jonas) → review+
Comment on attachment 486128 [details] [diff] [review]
Alternative/complimentary patch

r=me
Attachment #486128 - Flags: review+
Attachment #486128 - Flags: approval1.9.2.12+
Attachment #486128 - Flags: approval1.9.1.15+
Attachment #486085 - Flags: approval1.9.2.12+
Attachment #486085 - Flags: approval1.9.1.15+
Attachment #486189 - Flags: review+
Approved. Please land on mozilla-1.9.2 (default and the GECKO19211_20100930_RELBRANCH branch) and mozilla-1.9.1 (default and GECKO19114_20100930_RELBRANCH branch)
Assignee: nobody → Olli.Pettay
Attachment #486178 - Attachment is obsolete: true
Attachment #486189 - Attachment is obsolete: true
Attachment #486011 - Attachment is private: true
Attachment #485985 - Attachment is private: true
I've observed that 3.6.11 across platforms crashes on the simplified tet cases immediately on WinXP, and after a few tries on Mac 10.6 and Linux 10.10, and I have verified that the 3.6.12 candidate builds do not crash after several times of trying (about 5:1 tries). Instead you get an unresponsive-script dialog which you can choose to stop or continue.

However, I noticed that if you check the "don't ask me again" checkbox, you hang the browser and the only option is to force quit.

I'll check 3.5.15 builds next.
(In reply to comment #45)
> Note that on m-c this is not a problem in HTML because the HTML5 parser doesn't
> do weird stuff from BeginUpdate.

Specifically, the HTML5 parser flushes trailing text before returning from document.write and makes sure all parser-inserted nodes have notified before running a script and before returning from document.write().
I checked in Mac 10.6, XP, and Ubuntu 10.10

3.5.14: crashes immediately
3.5.15 build candidates: You get an unresponsive script dialog, which you can stop or continue, but it doesn't crash.

You can reproduce the hang if you check the "don't ask me again" and select to continue, but otherwise this bug fix is verified on 1.9.1 and 1.9.2.
(In reply to comment #57)
> You can't make that optimization, because we have element classes that override
> InsertChildAt to do some things in addition to calling
> nsGenericElement::InsertChildAt.
Argh, I was searching only in content/base
Depends on: 607576
(In reply to comment #69)
> I checked in Mac 10.6, XP, and Ubuntu 10.10
> 
> 3.5.14: crashes immediately
> 3.5.15 build candidates: You get an unresponsive script dialog, which you can
> stop or continue, but it doesn't crash.

 I assume that you did this with 3.6.11 and 3.6.12 too since you marked this as verified1.9.2?
(In reply to comment #71)

Yes, I checked 3.6.11/3.6.12 and 3.5.14/3.5.15 (see comment #67 and comment #69 respectively).
Thanks, Juan. I missed comment 67 in the rush. I appreciate you doing the verification for QA.
Attachment #485985 - Attachment is private: false
Attachment #486011 - Attachment is private: false
Attachment #486011 - Attachment is private: true
Attachment #486011 - Attachment is private: false
Group: core-security
Summary: In-the-wild 0day affecting Firefox 3.6.x on Windows → Interleaving document.write and appendChild can lead to duplicate text frames and overrunning of text run buffers
Component: Security → DOM
QA Contact: toolkit → general
Firefox 3.6.12 patch this vulnerable.
I reported document.write vulnerable one day before

https://bugzilla.mozilla.org/show_bug.cgi?id=607222

but now Firefox is safe.
Bug 606790 is not the same as this bug.
Flags: in-testsuite? → in-testsuite+
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: