Last Comment Bug 371321 - memory corruption when onUnload is mixed with document.write()s
: memory corruption when onUnload is mixed with document.write()s
Status: RESOLVED FIXED
[sg:critical]
: crash, fixed1.8.0.10, fixed1.8.1.2, testcase
Product: Core
Classification: Components
Component: Security (show other bugs)
: Trunk
: x86 All
: -- critical with 1 vote (vote)
: ---
Assigned To: Daniel Veditz [:dveditz]
:
Mentors:
http://lcamtuf.coredump.cx/ietrap/tes...
Depends on: 364692
Blocks: cycle-collector
  Show dependency treegraph
 
Reported: 2007-02-22 17:35 PST by Michal Zalewski
Modified: 2007-03-08 13:56 PST (History)
31 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Trunk fix, restore WANT_FINALIZE for window objects. (1.04 KB, patch)
2007-02-23 18:16 PST, Johnny Stenback (:jst, jst@mozilla.com)
mrbkap: review+
dbaron: superreview+
Details | Diff | Splinter Review

Description Michal Zalewski 2007-02-22 17:35:49 PST
Firefox is susceptible to a seemingly pretty nasty, and apparently
easily exploitable memory corruption vulnerability. When a location
transition occurs and the structure of a document is modified from within
onUnload event handler, freed DOM-related memory structures are left in inconsistent state, possibly leading to a remote compromise.

A quick test case that crashes while trying to follow corrupted pointers
(can be forced to write, too; tested on Windows XP, IA32):

  http://lcamtuf.coredump.cx/ietrap/testme.html

Cheers,
/mz
Comment 1 :Gavin Sharp [email: gavin@gavinsharp.com] 2007-02-22 17:41:45 PST
Stack:

   dcdcd20  
>  JS_ClearScope(cx=0x03ff52a0, obj=0x0521cde0) Line 3220
   nsJSContext::ClearScope(aGlobalObj=0x0521cde0, aClearFromProtoChain=1) Line 2971
   nsGlobalWindow::ClearWindowScope(aWindow=0x05221028) Line 6433
   nsJSContext::ScriptEvaluated(aTerminated=1) Line 3044
   nsJSContext::CallEventHandler(aTarget=0x04afeb8c, aScope=0x04af1240, aHandler=0x04b07f00, aargv=0x04802850, arv=0x0012f044) Line 1804
   nsJSEventListener::HandleEvent(aEvent=0x04801b70) Line 208
   nsEventListenerManager::HandleEventSubType(aListenerStruct=0x04b1fa60, aListener=0x04b1fa10, aDOMEvent=0x04801b70, aCurrentTarget=0x03ff4f70, aPhaseFlags=6) Line 1230
   nsEventListenerManager::HandleEvent(aPresContext=0x04b01be8, aEvent=0x0012f39c, aDOMEvent=0x0012f304, aCurrentTarget=0x03ff4f70, aFlags=6, aEventStatus=0x0012f308) Line 1350
   nsEventTargetChainItem::HandleEvent(aVisitor={...}, aFlags=6) Line 356
   nsEventTargetChainItem::HandleEventTargetChain(aVisitor={...}, aFlags=6, aCallback=0x00000000) Line 433
   nsEventDispatcher::Dispatch(aTarget=0x03ff4f70, aPresContext=0x04b01be8, aEvent=0x0012f39c, aDOMEvent=0x00000000, aEventStatus=0x0012f398, aCallback=0x00000000, aTargetIsChromeHandler=0) Line 639
   DocumentViewerImpl::PageHide(aIsUnload=1) Line 1261
   nsDocShell::FirePageHideNotification(aIsUnload=1) Line 964
   nsDocShell::CreateContentViewer(aContentType=0x04809280, request=0x0484a954, aContentHandler=0x047c421c) Line 5705
   nsDSURIContentListener::DoContent(aContentType=0x04809280, aIsContentPreferred=1, request=0x0484a954, aContentHandler=0x047c421c, aAbortProcess=0x0012f540) Line 138
   nsDocumentOpenInfo::TryContentListener(aListener=0x03fddc48, aChannel=0x0484a954) Line 789
   nsDocumentOpenInfo::DispatchContent(request=0x0484a954, aCtxt=0x00000000) Line 488
   nsDocumentOpenInfo::OnStartRequest(request=0x0484a954, aCtxt=0x00000000) Line 333
   nsHttpChannel::CallOnStartRequest() Line 712
   nsHttpChannel::ProcessNormal() Line 883
   nsHttpChannel::ProcessResponse() Line 767
   nsHttpChannel::OnStartRequest(request=0x0475d298, ctxt=0x00000000) Line 3954
   nsInputStreamPump::OnStateStart() Line 434
   nsInputStreamPump::OnInputStreamReady(stream=0x048cd2b8) Line 390
   nsInputStreamReadyEvent::Run() Line 112
   nsThread::ProcessNextEvent(mayWait=1, result=0x0012fbc0) Line 483
   NS_ProcessNextEvent_P(thread=0x00b6c438, mayWait=1) Line 225
   nsBaseAppShell::Run() Line 153
   nsAppStartup::Run() Line 171
   XRE_main(argc=5, argv=0x00b6a360, aAppData=0x00403094) Line 2838
   main(argc=5, argv=0x00b6a360) Line 61
   mainCRTStartup() Line 398
Comment 2 :Gavin Sharp [email: gavin@gavinsharp.com] 2007-02-22 17:43:09 PST
(In reply to comment #1)
> Stack:
> 
>    dcdcd20

Oops, this is a copy/paste error. Obviously that should be 0xcdcdcd20.
Comment 3 Dieter Komendera 2007-02-23 04:18:33 PST
This bug also effects Firefox under Linux(Ubuntu).

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)
Comment 4 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2007-02-23 11:43:46 PST
It looks like this was originally reported against Firefox 2.0.0.x.  What other versions show the problem?  Is it in 1.5.0.x?  Is it in the trunk?
Comment 5 :Gavin Sharp [email: gavin@gavinsharp.com] 2007-02-23 11:46:10 PST
(In reply to comment #4)
> Is it in the trunk?

Yes. The stack I posted in comment 1 is from a Windows debug build from about a week ago.
Comment 6 Mats Palmgren (vacation) 2007-02-23 12:27:56 PST
First we call SetTerminationFunction(ClearWindowScope, currentInner) here:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/dom/src/base/nsGlobalWindow.cpp&rev=1.911&root=/cvsroot&mark=1510-1511#1510

Then while it sits in that queue there is a JS GC run which deallocates the
JS arena that holds the JSObject* which ClearWindowScope will eventually use:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/dom/src/base/nsGlobalWindow.cpp&rev=1.911&root=/cvsroot&mark=6421#6412

I think we need to hold some kind of "strong ref" on the script globals
until the termination functions have run.
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/dom/src/base/nsGlobalWindow.h&rev=1.301&root=/cvsroot&mark=658,684#637
Comment 7 OstGote! 2007-02-23 12:35:19 PST
(In reply to comment #4)
> What other versions show the problem?  Is it in 1.5.0.x?  Is it in the trunk?

SeaMonkey 1.5a (Trunk) Win XP crashes nicely, see TB29619085K (for the current
trunk build I got no working TB). There is also a FF trunk incident report. 1.8 Branch (SM 1.1.x) and 1.8.0 Branch (SM 1.0.x) I see only page corruption and the bogus URL, but no crash. After the test you will see a lot of
"wyciwyg://xxxx/http://lcamtuf.coredump.cx/ietrap/testme.html" entries for last
visited pages (GO menu), where xxxx are increasing numbers.   
Comment 8 Boris Zbarsky [:bz] 2007-02-23 12:45:48 PST
Is GetScriptGlobal returning a non-null pointer?  I'd hope that finalizing that JSObject would clear out pointers to it....  If that's not happening, it's a problem for more than just the termination functions, I would think.

Note that this code is _very_ different on branch (though the problem may be the same).
Comment 9 Mats Palmgren (vacation) 2007-02-23 12:55:18 PST
(In reply to comment #8)
> Is GetScriptGlobal returning a non-null pointer? 

Yes, it's a pointer to a once valid JSObject that is now filled with
0xdadadada since the JS arena was deallocated (I traced DestroyGCArena()
so I know this is what happens, at least with some certainty ;-)
I don't see how the JS engine can "clear out pointers" like the members
I pointed out in the last link above.
Comment 10 Blake Kaplan (:mrbkap) 2007-02-23 13:12:08 PST
So, would making the TerminationFuncClosure (in nsJSEnvironment.{h,cpp}) hold a strong ref to the going-away inner window's script object fix this?
Comment 11 Boris Zbarsky [:bz] 2007-02-23 13:14:51 PST
Well, we could look for the finalize hook.

But in this case, I think the outer window should perhaps tell the inner to forget its script global when it drops it from mInnerWindowHolders.  Or something.  Do we have consumers that expect to access the inner window's JSObject when it's no longer the current inner?  If so, then we may need to do some sort of finalize-like hooks for script globals in general...

Note that we _could_ hack in a JS-only object locking for the termination func, but the problem could pop up with some other language runtime too, so if we can fix it in general that would be nice.
Comment 12 Boris Zbarsky [:bz] 2007-02-23 13:16:32 PST
Blake, we'd need to hold a strong ref to the JSObjectHolder to do that.
Comment 13 Blake Kaplan (:mrbkap) 2007-02-23 13:30:48 PST
(In reply to comment #11)
> But in this case, I think the outer window should perhaps tell the inner to
> forget its script global when it drops it from mInnerWindowHolders.  Or
> Do we have consumers that expect to access the inner window's
> JSObject when it's no longer the current inner?

The point of the termination function stuff is to allow scripts running in the currently dying inner window to run to completion, and they do expect to be able to access global variables, etc. which is why we're deferring the JS_ClearScope until later.

It seems to me that having a JS-specific holder in the TerminationFuncClosure correctly mirrors the mInnerWindowHolder that the outer window is dropping, which I think is causing this trouble. If Python or another language needs similar treatment, the outer window would end up having to hold onto more language-specific things anyway.
Comment 14 Boris Zbarsky [:bz] 2007-02-23 13:34:49 PST
The outer window's mInnerWindowHolders (on trunk) keeps the python stuff alive too.

Sounds like setting a termination func should perhaps just make a copy of the outer window's mInnerWindowHolders in the TerminationFuncClosure struct?  That would make sense to me...
Comment 15 Mats Palmgren (vacation) 2007-02-23 13:58:53 PST
(In reply to comment #14)
> Sounds like setting a termination func should perhaps just make a copy of the
> outer window's mInnerWindowHolders in the TerminationFuncClosure struct?

FWIW, I did a quick hack to test that and it fixed the crash (on trunk).
Comment 16 Johnny Stenback (:jst, jst@mozilla.com) 2007-02-23 17:21:07 PST
The odd thing is we've already got a finalize hook that clears out mJSObject and the right mScriptGlobals[]. Why that's not enough here is beyond me at this point.
Comment 17 Johnny Stenback (:jst, jst@mozilla.com) 2007-02-23 18:16:04 PST
Created attachment 256240 [details] [diff] [review]
Trunk fix, restore WANT_FINALIZE for window objects.

DOM window objects need the finalize hook to clean up internal pointers etc, the call to that hook was removed when the XPCOM cycle collector changes landed (bug 333078). This restores the necessary calls to the finalize hook. This fixes the JS_ClearScope() crash on the trunk, and seems to make this testcase unable to crash a trunk build.
Comment 18 Johnny Stenback (:jst, jst@mozilla.com) 2007-02-23 18:18:48 PST
The only crash I've seen with a branch build so far was with this on the top of the stack:

0x05362c08 in nsVoidArray::SafeElementAt (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:92
92          return mImpl->mArray[aIndex];
(gdb) bt 10
#0  0x05362c08 in nsVoidArray::SafeElementAt (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:92
#1  0x05362c84 in nsVoidArray::ElementAt (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:81
#2  0x05362cad in nsVoidArray::operator[] (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:95
#3  0x053b7779 in PresShell::ProcessReflowCommands (this=0x8cc5a68,
    aInterruptible=1) at ../../../mozilla/layout/base/nsPresShell.cpp:6895
#4  0x053b7bdd in PresShell::WillPaint (this=0x8cc5a68)
    at ../../../mozilla/layout/base/nsPresShell.cpp:6564
#5  0x057c025e in nsViewManager::FlushPendingInvalidates (this=0x8abb2e0)
    at ../../../mozilla/view/src/nsViewManager.cpp:4409
#6  0x057c414f in nsViewManager::EnableRefresh (this=0x8abb2e0, aUpdateFlags=0)
    at ../../../mozilla/view/src/nsViewManager.cpp:3442
#7  0x057c0474 in nsViewManager::EndUpdateViewBatch (this=0x8abb2e0,
    aUpdateFlags=0) at ../../../mozilla/view/src/nsViewManager.cpp:3487
#8  0x053b3569 in PresShell::InitialReflow (this=0x8cc5a68, aWidth=15015,
    aHeight=13410) at ../../../mozilla/layout/base/nsPresShell.cpp:2936
#9  0x055e4cca in nsContentSink::StartLayout (this=0x8c5c478, aIsFrameset=0)
    at ../../../../mozilla/content/base/src/nsContentSink.cpp:921

Dbaron, any clues?

And just to be clear, I have *not* seen a crash with a branch build yet that was anywhere near the JS_ClearScope code that the trunk was crashing in. 
Comment 19 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2007-02-23 19:16:38 PST
Comment on attachment 256240 [details] [diff] [review]
Trunk fix, restore WANT_FINALIZE for window objects.

sr=dbaron.  This probably explains some random crashes I've been seeing in the past few weks using trunk builds.
Comment 20 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2007-02-23 20:18:28 PST
I tried getting a 1.8 branch build to crash (this morning, on x86_64 Linux, under valgrind), and couldn't get it to crash.

(In reply to comment #18)
> Dbaron, any clues?

One theory might be that the pres shell could have been destroyed in the middle of one of the methods that's currently on the stack.  However, that doesn't make sense since PresShell::Destroy nulls out mViewManager, and the mViewManager pointer is clearly good in frame #8.  Though perhaps nsViewManager::FlushPendingInvalidates called WillPaint on another pres shell that somehow flushed something causing this pres shell to be destroyed?  It's hard to guess; it would be great to see what valgrind says about a crash on the 1.8 branch.
Comment 21 Boris Zbarsky [:bz] 2007-02-23 21:18:53 PST
sr=bzbarsky too, fwiw.
Comment 22 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2007-02-24 12:38:29 PST
I couldn't get a 1.8 branch build on Windows to crash either.  (That was a debug build, trying an optimized non-debug build now.)


Has anybody who's seen the crash on 1.8 branch builds submitted talkback reports?  They might be useful.
Comment 23 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2007-02-24 13:32:59 PST
Doesn't crash in my optimized Windows 1.8 branch build either.
Comment 24 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2007-02-24 13:39:08 PST
Aha!

Back on Linux (i686, this time), I tried the testcase in the 2.0.0.1 release three times, and it crashed all three times.  But I tried it again in the 2.0.0.2 release, downloaded and unpackaged the exact same way, again three times, and I didn't crash at all.

So I think the bug originally being reported here (the branch bug) was already fixed in the 2.0.0.2 release.
Comment 25 Martijn Wargers [:mwargers] (not working for Mozilla) 2007-02-24 13:51:33 PST
Yeah, indeed this seems to be fixed on the branch.
I can see the crash in a 2007-01-15 branch build, but not anymore in a 2007-01-17 branch build:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=MOZILLA_1_8_BRANCH&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-01-15+04&maxdate=2007-01-17+09&cvsroot=%2Fcvsroot
Fixed by the fix for bug 364692? (that would mean that Firefox2.0.0 don't have this bug)
Comment 26 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2007-02-24 13:55:41 PST
Fixed on 1.8 branch between Linux nightly builds:
2007-01-16-04-mozilla1.8
and
2007-01-17-04-mozilla1.8
Comment 27 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2007-02-24 13:59:47 PST
For what it's worth, the talkback reports on branch builds seem to pretty much match jst's stack in comment 18.  I sent 4 on Linux while binary searching: TB29654874, TB29654981, TB29655072, and TB29655112, and somebody else sent one on Windows yesterday, TB29623727.
Comment 28 Martijn Wargers [:mwargers] (not working for Mozilla) 2007-02-24 14:01:34 PST
Ok, with a firefox 1.5.0.x build of 2006-11-04 I don't crash, but with a firefox 2 build of 2006-11-11, I do crash, so it regressed on branch between those dates, which is similar to bug 364692.
Comment 29 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2007-02-24 14:38:52 PST
Fixed on trunk as of yesterday evening (thanks, jst), so marking FIXED.

Fixed on 1.8 branch (for Firefox 2.0.0.2) by bug 364692 (confirmed by local backout on Linux x86_64 in addition to nightly builds per comment 25 and comment 26), so adding fixed1.8.1.2 keyword.

Worksforme on Linux nightly 2007-02-16-04-mozilla1.8.0 (Firefox 1.5.0.10pre), before bug 364692 landed on the 1.8.0 branch.

Also works for me, Linux 1.5.0.10 release (1.8.0 branch).

Clearing nominations for blocking1.8.1.3, blocking1.8.0.11, and blocking1.9.
Comment 30 Martijn Wargers [:mwargers] (not working for Mozilla) 2007-02-24 15:10:22 PST
Well, I crash with a 1.8.0.x build of 2007-01-16, but not anymore with a 1.8.0.x build of 2007-01-18, so I'm also adding the fixed1.8.0.10 keyword.
Comment 31 Boris Zbarsky [:bz] 2007-02-25 14:50:52 PST
I'm curious; if the loop is length more like 25,000 instead of 250, does the number of inner windows actually increase to 25,000?  I see 250 inner windows with the testcase for sure...
Comment 32 Johnny Stenback (:jst, jst@mozilla.com) 2007-02-26 18:03:35 PST
This is still a problem in 1.5.0.10 though, right?
Comment 33 :Gavin Sharp [email: gavin@gavinsharp.com] 2007-02-26 18:12:18 PST
(In reply to comment #32)
> This is still a problem in 1.5.0.10 though, right?

Not according to comment 30. I also can't reproduce the problem in a Windows 1.5.0.10.
Comment 34 Johnny Stenback (:jst, jst@mozilla.com) 2007-02-27 16:23:40 PST
True indeed. I saw this in a 1.5 build, but it was out of date and updating made the crash go away.
Comment 35 Jonas Sicking (:sicking) PTO Until July 5th 2007-03-08 13:56:14 PST
So for future clarity. There are two bugs involved in this bug:

1)
On branch (and possibly trunk) there was a problem that apparently bug 364692 fixed. This problem hasn't really been debugged, but backouts has confirmed that bug 364692 indeed does fix it.

2)
On trunk there was a recent regression that was debugged and fixed by the attachment in this bug. This problem did never exist on branches.

Note You need to log in before you can comment on or make changes to this bug.