Closed Bug 371321 Opened 17 years ago Closed 17 years ago

memory corruption when onUnload is mixed with document.write()s

Categories

(Core :: Security, defect)

x86
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: lcamtuf, Assigned: dveditz)

References

()

Details

(4 keywords, Whiteboard: [sg:critical])

Attachments

(1 file)

Firefox is susceptible to a seemingly pretty nasty, and apparently
easily exploitable memory corruption vulnerability. When a location
transition occurs and the structure of a document is modified from within
onUnload event handler, freed DOM-related memory structures are left in inconsistent state, possibly leading to a remote compromise.

A quick test case that crashes while trying to follow corrupted pointers
(can be forced to write, too; tested on Windows XP, IA32):

  http://lcamtuf.coredump.cx/ietrap/testme.html

Cheers,
/mz
Component: Security → General
Keywords: crash
Product: Firefox → Core
QA Contact: firefox → general
Version: 2.0 Branch → Trunk
Stack:

   dcdcd20  
>  JS_ClearScope(cx=0x03ff52a0, obj=0x0521cde0) Line 3220
   nsJSContext::ClearScope(aGlobalObj=0x0521cde0, aClearFromProtoChain=1) Line 2971
   nsGlobalWindow::ClearWindowScope(aWindow=0x05221028) Line 6433
   nsJSContext::ScriptEvaluated(aTerminated=1) Line 3044
   nsJSContext::CallEventHandler(aTarget=0x04afeb8c, aScope=0x04af1240, aHandler=0x04b07f00, aargv=0x04802850, arv=0x0012f044) Line 1804
   nsJSEventListener::HandleEvent(aEvent=0x04801b70) Line 208
   nsEventListenerManager::HandleEventSubType(aListenerStruct=0x04b1fa60, aListener=0x04b1fa10, aDOMEvent=0x04801b70, aCurrentTarget=0x03ff4f70, aPhaseFlags=6) Line 1230
   nsEventListenerManager::HandleEvent(aPresContext=0x04b01be8, aEvent=0x0012f39c, aDOMEvent=0x0012f304, aCurrentTarget=0x03ff4f70, aFlags=6, aEventStatus=0x0012f308) Line 1350
   nsEventTargetChainItem::HandleEvent(aVisitor={...}, aFlags=6) Line 356
   nsEventTargetChainItem::HandleEventTargetChain(aVisitor={...}, aFlags=6, aCallback=0x00000000) Line 433
   nsEventDispatcher::Dispatch(aTarget=0x03ff4f70, aPresContext=0x04b01be8, aEvent=0x0012f39c, aDOMEvent=0x00000000, aEventStatus=0x0012f398, aCallback=0x00000000, aTargetIsChromeHandler=0) Line 639
   DocumentViewerImpl::PageHide(aIsUnload=1) Line 1261
   nsDocShell::FirePageHideNotification(aIsUnload=1) Line 964
   nsDocShell::CreateContentViewer(aContentType=0x04809280, request=0x0484a954, aContentHandler=0x047c421c) Line 5705
   nsDSURIContentListener::DoContent(aContentType=0x04809280, aIsContentPreferred=1, request=0x0484a954, aContentHandler=0x047c421c, aAbortProcess=0x0012f540) Line 138
   nsDocumentOpenInfo::TryContentListener(aListener=0x03fddc48, aChannel=0x0484a954) Line 789
   nsDocumentOpenInfo::DispatchContent(request=0x0484a954, aCtxt=0x00000000) Line 488
   nsDocumentOpenInfo::OnStartRequest(request=0x0484a954, aCtxt=0x00000000) Line 333
   nsHttpChannel::CallOnStartRequest() Line 712
   nsHttpChannel::ProcessNormal() Line 883
   nsHttpChannel::ProcessResponse() Line 767
   nsHttpChannel::OnStartRequest(request=0x0475d298, ctxt=0x00000000) Line 3954
   nsInputStreamPump::OnStateStart() Line 434
   nsInputStreamPump::OnInputStreamReady(stream=0x048cd2b8) Line 390
   nsInputStreamReadyEvent::Run() Line 112
   nsThread::ProcessNextEvent(mayWait=1, result=0x0012fbc0) Line 483
   NS_ProcessNextEvent_P(thread=0x00b6c438, mayWait=1) Line 225
   nsBaseAppShell::Run() Line 153
   nsAppStartup::Run() Line 171
   XRE_main(argc=5, argv=0x00b6a360, aAppData=0x00403094) Line 2838
   main(argc=5, argv=0x00b6a360) Line 61
   mainCRTStartup() Line 398
(In reply to comment #1)
> Stack:
> 
>    dcdcd20

Oops, this is a copy/paste error. Obviously that should be 0xcdcdcd20.
Assignee: nobody → dveditz
Component: General → Security
QA Contact: general → toolkit
Keywords: testcase
Whiteboard: [sg:critical]
This bug also effects Firefox under Linux(Ubuntu).

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)
Version: Trunk → 1.8 Branch
It looks like this was originally reported against Firefox 2.0.0.x.  What other versions show the problem?  Is it in 1.5.0.x?  Is it in the trunk?
(In reply to comment #4)
> Is it in the trunk?

Yes. The stack I posted in comment 1 is from a Windows debug build from about a week ago.
First we call SetTerminationFunction(ClearWindowScope, currentInner) here:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/dom/src/base/nsGlobalWindow.cpp&rev=1.911&root=/cvsroot&mark=1510-1511#1510

Then while it sits in that queue there is a JS GC run which deallocates the
JS arena that holds the JSObject* which ClearWindowScope will eventually use:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/dom/src/base/nsGlobalWindow.cpp&rev=1.911&root=/cvsroot&mark=6421#6412

I think we need to hold some kind of "strong ref" on the script globals
until the termination functions have run.
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/dom/src/base/nsGlobalWindow.h&rev=1.301&root=/cvsroot&mark=658,684#637
(In reply to comment #4)
> What other versions show the problem?  Is it in 1.5.0.x?  Is it in the trunk?

SeaMonkey 1.5a (Trunk) Win XP crashes nicely, see TB29619085K (for the current
trunk build I got no working TB). There is also a FF trunk incident report. 1.8 Branch (SM 1.1.x) and 1.8.0 Branch (SM 1.0.x) I see only page corruption and the bogus URL, but no crash. After the test you will see a lot of
"wyciwyg://xxxx/http://lcamtuf.coredump.cx/ietrap/testme.html" entries for last
visited pages (GO menu), where xxxx are increasing numbers.   
OS: Windows XP → All
Version: 1.8 Branch → Trunk
Is GetScriptGlobal returning a non-null pointer?  I'd hope that finalizing that JSObject would clear out pointers to it....  If that's not happening, it's a problem for more than just the termination functions, I would think.

Note that this code is _very_ different on branch (though the problem may be the same).
Flags: blocking1.9?
Flags: blocking1.8.1.3?
Flags: blocking1.8.0.11?
(In reply to comment #8)
> Is GetScriptGlobal returning a non-null pointer? 

Yes, it's a pointer to a once valid JSObject that is now filled with
0xdadadada since the JS arena was deallocated (I traced DestroyGCArena()
so I know this is what happens, at least with some certainty ;-)
I don't see how the JS engine can "clear out pointers" like the members
I pointed out in the last link above.
So, would making the TerminationFuncClosure (in nsJSEnvironment.{h,cpp}) hold a strong ref to the going-away inner window's script object fix this?
Well, we could look for the finalize hook.

But in this case, I think the outer window should perhaps tell the inner to forget its script global when it drops it from mInnerWindowHolders.  Or something.  Do we have consumers that expect to access the inner window's JSObject when it's no longer the current inner?  If so, then we may need to do some sort of finalize-like hooks for script globals in general...

Note that we _could_ hack in a JS-only object locking for the termination func, but the problem could pop up with some other language runtime too, so if we can fix it in general that would be nice.
Blake, we'd need to hold a strong ref to the JSObjectHolder to do that.
(In reply to comment #11)
> But in this case, I think the outer window should perhaps tell the inner to
> forget its script global when it drops it from mInnerWindowHolders.  Or
> Do we have consumers that expect to access the inner window's
> JSObject when it's no longer the current inner?

The point of the termination function stuff is to allow scripts running in the currently dying inner window to run to completion, and they do expect to be able to access global variables, etc. which is why we're deferring the JS_ClearScope until later.

It seems to me that having a JS-specific holder in the TerminationFuncClosure correctly mirrors the mInnerWindowHolder that the outer window is dropping, which I think is causing this trouble. If Python or another language needs similar treatment, the outer window would end up having to hold onto more language-specific things anyway.
The outer window's mInnerWindowHolders (on trunk) keeps the python stuff alive too.

Sounds like setting a termination func should perhaps just make a copy of the outer window's mInnerWindowHolders in the TerminationFuncClosure struct?  That would make sense to me...
(In reply to comment #14)
> Sounds like setting a termination func should perhaps just make a copy of the
> outer window's mInnerWindowHolders in the TerminationFuncClosure struct?

FWIW, I did a quick hack to test that and it fixed the crash (on trunk).
The odd thing is we've already got a finalize hook that clears out mJSObject and the right mScriptGlobals[]. Why that's not enough here is beyond me at this point.
DOM window objects need the finalize hook to clean up internal pointers etc, the call to that hook was removed when the XPCOM cycle collector changes landed (bug 333078). This restores the necessary calls to the finalize hook. This fixes the JS_ClearScope() crash on the trunk, and seems to make this testcase unable to crash a trunk build.
Attachment #256240 - Flags: superreview?(bzbarsky)
Attachment #256240 - Flags: review?(mrbkap)
The only crash I've seen with a branch build so far was with this on the top of the stack:

0x05362c08 in nsVoidArray::SafeElementAt (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:92
92          return mImpl->mArray[aIndex];
(gdb) bt 10
#0  0x05362c08 in nsVoidArray::SafeElementAt (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:92
#1  0x05362c84 in nsVoidArray::ElementAt (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:81
#2  0x05362cad in nsVoidArray::operator[] (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:95
#3  0x053b7779 in PresShell::ProcessReflowCommands (this=0x8cc5a68,
    aInterruptible=1) at ../../../mozilla/layout/base/nsPresShell.cpp:6895
#4  0x053b7bdd in PresShell::WillPaint (this=0x8cc5a68)
    at ../../../mozilla/layout/base/nsPresShell.cpp:6564
#5  0x057c025e in nsViewManager::FlushPendingInvalidates (this=0x8abb2e0)
    at ../../../mozilla/view/src/nsViewManager.cpp:4409
#6  0x057c414f in nsViewManager::EnableRefresh (this=0x8abb2e0, aUpdateFlags=0)
    at ../../../mozilla/view/src/nsViewManager.cpp:3442
#7  0x057c0474 in nsViewManager::EndUpdateViewBatch (this=0x8abb2e0,
    aUpdateFlags=0) at ../../../mozilla/view/src/nsViewManager.cpp:3487
#8  0x053b3569 in PresShell::InitialReflow (this=0x8cc5a68, aWidth=15015,
    aHeight=13410) at ../../../mozilla/layout/base/nsPresShell.cpp:2936
#9  0x055e4cca in nsContentSink::StartLayout (this=0x8c5c478, aIsFrameset=0)
    at ../../../../mozilla/content/base/src/nsContentSink.cpp:921

Dbaron, any clues?

And just to be clear, I have *not* seen a crash with a branch build yet that was anywhere near the JS_ClearScope code that the trunk was crashing in. 
Attachment #256240 - Flags: review?(mrbkap) → review+
Comment on attachment 256240 [details] [diff] [review]
Trunk fix, restore WANT_FINALIZE for window objects.

sr=dbaron.  This probably explains some random crashes I've been seeing in the past few weks using trunk builds.
Attachment #256240 - Flags: superreview?(bzbarsky) → superreview+
I tried getting a 1.8 branch build to crash (this morning, on x86_64 Linux, under valgrind), and couldn't get it to crash.

(In reply to comment #18)
> Dbaron, any clues?

One theory might be that the pres shell could have been destroyed in the middle of one of the methods that's currently on the stack.  However, that doesn't make sense since PresShell::Destroy nulls out mViewManager, and the mViewManager pointer is clearly good in frame #8.  Though perhaps nsViewManager::FlushPendingInvalidates called WillPaint on another pres shell that somehow flushed something causing this pres shell to be destroyed?  It's hard to guess; it would be great to see what valgrind says about a crash on the 1.8 branch.
sr=bzbarsky too, fwiw.
I couldn't get a 1.8 branch build on Windows to crash either.  (That was a debug build, trying an optimized non-debug build now.)


Has anybody who's seen the crash on 1.8 branch builds submitted talkback reports?  They might be useful.
Doesn't crash in my optimized Windows 1.8 branch build either.
Aha!

Back on Linux (i686, this time), I tried the testcase in the 2.0.0.1 release three times, and it crashed all three times.  But I tried it again in the 2.0.0.2 release, downloaded and unpackaged the exact same way, again three times, and I didn't crash at all.

So I think the bug originally being reported here (the branch bug) was already fixed in the 2.0.0.2 release.
Yeah, indeed this seems to be fixed on the branch.
I can see the crash in a 2007-01-15 branch build, but not anymore in a 2007-01-17 branch build:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=MOZILLA_1_8_BRANCH&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-01-15+04&maxdate=2007-01-17+09&cvsroot=%2Fcvsroot
Fixed by the fix for bug 364692? (that would mean that Firefox2.0.0 don't have this bug)
Fixed on 1.8 branch between Linux nightly builds:
2007-01-16-04-mozilla1.8
and
2007-01-17-04-mozilla1.8
For what it's worth, the talkback reports on branch builds seem to pretty much match jst's stack in comment 18.  I sent 4 on Linux while binary searching: TB29654874, TB29654981, TB29655072, and TB29655112, and somebody else sent one on Windows yesterday, TB29623727.
Ok, with a firefox 1.5.0.x build of 2006-11-04 I don't crash, but with a firefox 2 build of 2006-11-11, I do crash, so it regressed on branch between those dates, which is similar to bug 364692.
Fixed on trunk as of yesterday evening (thanks, jst), so marking FIXED.

Fixed on 1.8 branch (for Firefox 2.0.0.2) by bug 364692 (confirmed by local backout on Linux x86_64 in addition to nightly builds per comment 25 and comment 26), so adding fixed1.8.1.2 keyword.

Worksforme on Linux nightly 2007-02-16-04-mozilla1.8.0 (Firefox 1.5.0.10pre), before bug 364692 landed on the 1.8.0 branch.

Also works for me, Linux 1.5.0.10 release (1.8.0 branch).

Clearing nominations for blocking1.8.1.3, blocking1.8.0.11, and blocking1.9.
Status: NEW → RESOLVED
Closed: 17 years ago
Flags: blocking1.9?
Flags: blocking1.8.1.3?
Flags: blocking1.8.0.11?
Keywords: fixed1.8.1.2
Resolution: --- → FIXED
Well, I crash with a 1.8.0.x build of 2007-01-16, but not anymore with a 1.8.0.x build of 2007-01-18, so I'm also adding the fixed1.8.0.10 keyword.
Keywords: fixed1.8.0.10
I'm curious; if the loop is length more like 25,000 instead of 250, does the number of inner windows actually increase to 25,000?  I see 250 inner windows with the testcase for sure...
This is still a problem in 1.5.0.10 though, right?
(In reply to comment #32)
> This is still a problem in 1.5.0.10 though, right?

Not according to comment 30. I also can't reproduce the problem in a Windows 1.5.0.10.
True indeed. I saw this in a 1.5 build, but it was out of date and updating made the crash go away.
So for future clarity. There are two bugs involved in this bug:

1)
On branch (and possibly trunk) there was a problem that apparently bug 364692 fixed. This problem hasn't really been debugged, but backouts has confirmed that bug 364692 indeed does fix it.

2)
On trunk there was a recent regression that was debugged and fixed by the attachment in this bug. This problem did never exist on branches.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: