371321 - memory corruption when onUnload is mixed with document.write()s

Reporter

Description

•

17 years ago

Firefox is susceptible to a seemingly pretty nasty, and apparently
easily exploitable memory corruption vulnerability. When a location
transition occurs and the structure of a document is modified from within
onUnload event handler, freed DOM-related memory structures are left in inconsistent state, possibly leading to a remote compromise.

A quick test case that crashes while trying to follow corrupted pointers
(can be forced to write, too; tested on Windows XP, IA32):

  http://lcamtuf.coredump.cx/ietrap/testme.html

Cheers,
/mz

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

17 years ago

Component: Security → General

Keywords: crash

Product: Firefox → Core

QA Contact: firefox → general

Version: 2.0 Branch → Trunk

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 1

•

17 years ago

Stack:

   dcdcd20  
>  JS_ClearScope(cx=0x03ff52a0, obj=0x0521cde0) Line 3220
   nsJSContext::ClearScope(aGlobalObj=0x0521cde0, aClearFromProtoChain=1) Line 2971
   nsGlobalWindow::ClearWindowScope(aWindow=0x05221028) Line 6433
   nsJSContext::ScriptEvaluated(aTerminated=1) Line 3044
   nsJSContext::CallEventHandler(aTarget=0x04afeb8c, aScope=0x04af1240, aHandler=0x04b07f00, aargv=0x04802850, arv=0x0012f044) Line 1804
   nsJSEventListener::HandleEvent(aEvent=0x04801b70) Line 208
   nsEventListenerManager::HandleEventSubType(aListenerStruct=0x04b1fa60, aListener=0x04b1fa10, aDOMEvent=0x04801b70, aCurrentTarget=0x03ff4f70, aPhaseFlags=6) Line 1230
   nsEventListenerManager::HandleEvent(aPresContext=0x04b01be8, aEvent=0x0012f39c, aDOMEvent=0x0012f304, aCurrentTarget=0x03ff4f70, aFlags=6, aEventStatus=0x0012f308) Line 1350
   nsEventTargetChainItem::HandleEvent(aVisitor={...}, aFlags=6) Line 356
   nsEventTargetChainItem::HandleEventTargetChain(aVisitor={...}, aFlags=6, aCallback=0x00000000) Line 433
   nsEventDispatcher::Dispatch(aTarget=0x03ff4f70, aPresContext=0x04b01be8, aEvent=0x0012f39c, aDOMEvent=0x00000000, aEventStatus=0x0012f398, aCallback=0x00000000, aTargetIsChromeHandler=0) Line 639
   DocumentViewerImpl::PageHide(aIsUnload=1) Line 1261
   nsDocShell::FirePageHideNotification(aIsUnload=1) Line 964
   nsDocShell::CreateContentViewer(aContentType=0x04809280, request=0x0484a954, aContentHandler=0x047c421c) Line 5705
   nsDSURIContentListener::DoContent(aContentType=0x04809280, aIsContentPreferred=1, request=0x0484a954, aContentHandler=0x047c421c, aAbortProcess=0x0012f540) Line 138
   nsDocumentOpenInfo::TryContentListener(aListener=0x03fddc48, aChannel=0x0484a954) Line 789
   nsDocumentOpenInfo::DispatchContent(request=0x0484a954, aCtxt=0x00000000) Line 488
   nsDocumentOpenInfo::OnStartRequest(request=0x0484a954, aCtxt=0x00000000) Line 333
   nsHttpChannel::CallOnStartRequest() Line 712
   nsHttpChannel::ProcessNormal() Line 883
   nsHttpChannel::ProcessResponse() Line 767
   nsHttpChannel::OnStartRequest(request=0x0475d298, ctxt=0x00000000) Line 3954
   nsInputStreamPump::OnStateStart() Line 434
   nsInputStreamPump::OnInputStreamReady(stream=0x048cd2b8) Line 390
   nsInputStreamReadyEvent::Run() Line 112
   nsThread::ProcessNextEvent(mayWait=1, result=0x0012fbc0) Line 483
   NS_ProcessNextEvent_P(thread=0x00b6c438, mayWait=1) Line 225
   nsBaseAppShell::Run() Line 153
   nsAppStartup::Run() Line 171
   XRE_main(argc=5, argv=0x00b6a360, aAppData=0x00403094) Line 2838
   main(argc=5, argv=0x00b6a360) Line 61
   mainCRTStartup() Line 398

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 2

•

17 years ago

(In reply to comment #1)
> Stack:
> 
>    dcdcd20

Oops, this is a copy/paste error. Obviously that should be 0xcdcdcd20.

Ben Bucksch (:BenB)

Updated

•

17 years ago

Assignee: nobody → dveditz

Component: General → Security

QA Contact: general → toolkit

Jesse Ruderman

Updated

•

17 years ago

Keywords: testcase

Whiteboard: [sg:critical]

Dieter Komendera

Comment 3

•

17 years ago

This bug also effects Firefox under Linux(Ubuntu).

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

17 years ago

Version: Trunk → 1.8 Branch

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 4

•

17 years ago

It looks like this was originally reported against Firefox 2.0.0.x.  What other versions show the problem?  Is it in 1.5.0.x?  Is it in the trunk?

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 5

•

17 years ago

(In reply to comment #4)
> Is it in the trunk?

Yes. The stack I posted in comment 1 is from a Windows debug build from about a week ago.

Mats Palmgren (inactive)

Comment 6

•

17 years ago

First we call SetTerminationFunction(ClearWindowScope, currentInner) here:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/dom/src/base/nsGlobalWindow.cpp&rev=1.911&root=/cvsroot&mark=1510-1511#1510

Then while it sits in that queue there is a JS GC run which deallocates the
JS arena that holds the JSObject* which ClearWindowScope will eventually use:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/dom/src/base/nsGlobalWindow.cpp&rev=1.911&root=/cvsroot&mark=6421#6412

I think we need to hold some kind of "strong ref" on the script globals
until the termination functions have run.
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/dom/src/base/nsGlobalWindow.h&rev=1.301&root=/cvsroot&mark=658,684#637

OstGote!

Comment 7

•

17 years ago

(In reply to comment #4)
> What other versions show the problem?  Is it in 1.5.0.x?  Is it in the trunk?

SeaMonkey 1.5a (Trunk) Win XP crashes nicely, see TB29619085K (for the current
trunk build I got no working TB). There is also a FF trunk incident report. 1.8 Branch (SM 1.1.x) and 1.8.0 Branch (SM 1.0.x) I see only page corruption and the bogus URL, but no crash. After the test you will see a lot of
"wyciwyg://xxxx/http://lcamtuf.coredump.cx/ietrap/testme.html" entries for last
visited pages (GO menu), where xxxx are increasing numbers.

OS: Windows XP → All

Version: 1.8 Branch → Trunk

Boris Zbarsky [:bzbarsky]

Comment 8

•

17 years ago

Is GetScriptGlobal returning a non-null pointer?  I'd hope that finalizing that JSObject would clear out pointers to it....  If that's not happening, it's a problem for more than just the termination functions, I would think.

Note that this code is _very_ different on branch (though the problem may be the same).

Flags: blocking1.9?

Flags: blocking1.8.1.3?

Flags: blocking1.8.0.11?

Mats Palmgren (inactive)

Comment 9

•

17 years ago

(In reply to comment #8)
> Is GetScriptGlobal returning a non-null pointer? 

Yes, it's a pointer to a once valid JSObject that is now filled with
0xdadadada since the JS arena was deallocated (I traced DestroyGCArena()
so I know this is what happens, at least with some certainty ;-)
I don't see how the JS engine can "clear out pointers" like the members
I pointed out in the last link above.

Blake Kaplan (:mrbkap) (inactive)

Comment 10

•

17 years ago

So, would making the TerminationFuncClosure (in nsJSEnvironment.{h,cpp}) hold a strong ref to the going-away inner window's script object fix this?

Boris Zbarsky [:bzbarsky]

Comment 11

•

17 years ago

Well, we could look for the finalize hook.

But in this case, I think the outer window should perhaps tell the inner to forget its script global when it drops it from mInnerWindowHolders.  Or something.  Do we have consumers that expect to access the inner window's JSObject when it's no longer the current inner?  If so, then we may need to do some sort of finalize-like hooks for script globals in general...

Note that we _could_ hack in a JS-only object locking for the termination func, but the problem could pop up with some other language runtime too, so if we can fix it in general that would be nice.

Boris Zbarsky [:bzbarsky]

Comment 12

•

17 years ago

Blake, we'd need to hold a strong ref to the JSObjectHolder to do that.

Blake Kaplan (:mrbkap) (inactive)

Comment 13

•

17 years ago

(In reply to comment #11)
> But in this case, I think the outer window should perhaps tell the inner to
> forget its script global when it drops it from mInnerWindowHolders.  Or
> Do we have consumers that expect to access the inner window's
> JSObject when it's no longer the current inner?

The point of the termination function stuff is to allow scripts running in the currently dying inner window to run to completion, and they do expect to be able to access global variables, etc. which is why we're deferring the JS_ClearScope until later.

It seems to me that having a JS-specific holder in the TerminationFuncClosure correctly mirrors the mInnerWindowHolder that the outer window is dropping, which I think is causing this trouble. If Python or another language needs similar treatment, the outer window would end up having to hold onto more language-specific things anyway.

Boris Zbarsky [:bzbarsky]

Comment 14

•

17 years ago

The outer window's mInnerWindowHolders (on trunk) keeps the python stuff alive too.

Sounds like setting a termination func should perhaps just make a copy of the outer window's mInnerWindowHolders in the TerminationFuncClosure struct?  That would make sense to me...

Mats Palmgren (inactive)

Comment 15

•

17 years ago

(In reply to comment #14)
> Sounds like setting a termination func should perhaps just make a copy of the
> outer window's mInnerWindowHolders in the TerminationFuncClosure struct?

FWIW, I did a quick hack to test that and it fixed the crash (on trunk).

Johnny Stenback (:jst)

Comment 16

•

17 years ago

The odd thing is we've already got a finalize hook that clears out mJSObject and the right mScriptGlobals[]. Why that's not enough here is beyond me at this point.

Johnny Stenback (:jst)

Comment 17

•

17 years ago

Attached patch Trunk fix, restore WANT_FINALIZE for window objects. — Details — Splinter Review

DOM window objects need the finalize hook to clean up internal pointers etc, the call to that hook was removed when the XPCOM cycle collector changes landed (bug 333078). This restores the necessary calls to the finalize hook. This fixes the JS_ClearScope() crash on the trunk, and seems to make this testcase unable to crash a trunk build.

Attachment #256240 - Flags: superreview?(bzbarsky)

Attachment #256240 - Flags: review?(mrbkap)

Johnny Stenback (:jst)

Comment 18

•

17 years ago

The only crash I've seen with a branch build so far was with this on the top of the stack:

0x05362c08 in nsVoidArray::SafeElementAt (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:92
92          return mImpl->mArray[aIndex];
(gdb) bt 10
#0  0x05362c08 in nsVoidArray::SafeElementAt (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:92
#1  0x05362c84 in nsVoidArray::ElementAt (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:81
#2  0x05362cad in nsVoidArray::operator[] (this=0x8cc5b1c, aIndex=5664791)
    at ../../../../../dist/include/xpcom/nsVoidArray.h:95
#3  0x053b7779 in PresShell::ProcessReflowCommands (this=0x8cc5a68,
    aInterruptible=1) at ../../../mozilla/layout/base/nsPresShell.cpp:6895
#4  0x053b7bdd in PresShell::WillPaint (this=0x8cc5a68)
    at ../../../mozilla/layout/base/nsPresShell.cpp:6564
#5  0x057c025e in nsViewManager::FlushPendingInvalidates (this=0x8abb2e0)
    at ../../../mozilla/view/src/nsViewManager.cpp:4409
#6  0x057c414f in nsViewManager::EnableRefresh (this=0x8abb2e0, aUpdateFlags=0)
    at ../../../mozilla/view/src/nsViewManager.cpp:3442
#7  0x057c0474 in nsViewManager::EndUpdateViewBatch (this=0x8abb2e0,
    aUpdateFlags=0) at ../../../mozilla/view/src/nsViewManager.cpp:3487
#8  0x053b3569 in PresShell::InitialReflow (this=0x8cc5a68, aWidth=15015,
    aHeight=13410) at ../../../mozilla/layout/base/nsPresShell.cpp:2936
#9  0x055e4cca in nsContentSink::StartLayout (this=0x8c5c478, aIsFrameset=0)
    at ../../../../mozilla/content/base/src/nsContentSink.cpp:921

Dbaron, any clues?

And just to be clear, I have *not* seen a crash with a branch build yet that was anywhere near the JS_ClearScope code that the trunk was crashing in.

Blake Kaplan (:mrbkap) (inactive)

Updated

•

17 years ago

Attachment #256240 - Flags: review?(mrbkap) → review+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 19

•

17 years ago

Comment on attachment 256240 [details] [diff] [review]
Trunk fix, restore WANT_FINALIZE for window objects.

sr=dbaron.  This probably explains some random crashes I've been seeing in the past few weks using trunk builds.

Attachment #256240 - Flags: superreview?(bzbarsky) → superreview+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 20

•

17 years ago

I tried getting a 1.8 branch build to crash (this morning, on x86_64 Linux, under valgrind), and couldn't get it to crash.

(In reply to comment #18)
> Dbaron, any clues?

One theory might be that the pres shell could have been destroyed in the middle of one of the methods that's currently on the stack.  However, that doesn't make sense since PresShell::Destroy nulls out mViewManager, and the mViewManager pointer is clearly good in frame #8.  Though perhaps nsViewManager::FlushPendingInvalidates called WillPaint on another pres shell that somehow flushed something causing this pres shell to be destroyed?  It's hard to guess; it would be great to see what valgrind says about a crash on the 1.8 branch.

Boris Zbarsky [:bzbarsky]

Updated

•

17 years ago

Blocks: cycle-collector

Boris Zbarsky [:bzbarsky]

Comment 21

•

17 years ago

sr=bzbarsky too, fwiw.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 22

•

17 years ago

I couldn't get a 1.8 branch build on Windows to crash either.  (That was a debug build, trying an optimized non-debug build now.)


Has anybody who's seen the crash on 1.8 branch builds submitted talkback reports?  They might be useful.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 23

•

17 years ago

Doesn't crash in my optimized Windows 1.8 branch build either.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 24

•

17 years ago

Aha!

Back on Linux (i686, this time), I tried the testcase in the 2.0.0.1 release three times, and it crashed all three times.  But I tried it again in the 2.0.0.2 release, downloaded and unpackaged the exact same way, again three times, and I didn't crash at all.

So I think the bug originally being reported here (the branch bug) was already fixed in the 2.0.0.2 release.

Martijn Wargers (dead)

Comment 25

•

17 years ago

Yeah, indeed this seems to be fixed on the branch.
I can see the crash in a 2007-01-15 branch build, but not anymore in a 2007-01-17 branch build:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=MOZILLA_1_8_BRANCH&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-01-15+04&maxdate=2007-01-17+09&cvsroot=%2Fcvsroot
Fixed by the fix for bug 364692? (that would mean that Firefox2.0.0 don't have this bug)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 26

•

17 years ago

Fixed on 1.8 branch between Linux nightly builds:
2007-01-16-04-mozilla1.8
and
2007-01-17-04-mozilla1.8

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 27

•

17 years ago

For what it's worth, the talkback reports on branch builds seem to pretty much match jst's stack in comment 18.  I sent 4 on Linux while binary searching: TB29654874, TB29654981, TB29655072, and TB29655112, and somebody else sent one on Windows yesterday, TB29623727.

Martijn Wargers (dead)

Comment 28

•

17 years ago

Ok, with a firefox 1.5.0.x build of 2006-11-04 I don't crash, but with a firefox 2 build of 2006-11-11, I do crash, so it regressed on branch between those dates, which is similar to bug 364692.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 29

•

17 years ago

Fixed on trunk as of yesterday evening (thanks, jst), so marking FIXED.

Fixed on 1.8 branch (for Firefox 2.0.0.2) by bug 364692 (confirmed by local backout on Linux x86_64 in addition to nightly builds per comment 25 and comment 26), so adding fixed1.8.1.2 keyword.

Worksforme on Linux nightly 2007-02-16-04-mozilla1.8.0 (Firefox 1.5.0.10pre), before bug 364692 landed on the 1.8.0 branch.

Also works for me, Linux 1.5.0.10 release (1.8.0 branch).

Clearing nominations for blocking1.8.1.3, blocking1.8.0.11, and blocking1.9.

Status: NEW → RESOLVED

Closed: 17 years ago

Flags: blocking1.9?

Flags: blocking1.8.1.3?

Flags: blocking1.8.0.11?

Keywords: fixed1.8.1.2

Resolution: --- → FIXED

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

17 years ago

Depends on: 364692

Martijn Wargers (dead)

Comment 30

•

17 years ago

Well, I crash with a 1.8.0.x build of 2007-01-16, but not anymore with a 1.8.0.x build of 2007-01-18, so I'm also adding the fixed1.8.0.10 keyword.

Keywords: fixed1.8.0.10

Boris Zbarsky [:bzbarsky]

Comment 31

•

17 years ago

I'm curious; if the loop is length more like 25,000 instead of 250, does the number of inner windows actually increase to 25,000?  I see 250 inner windows with the testcase for sure...

Johnny Stenback (:jst)

Comment 32

•

17 years ago

This is still a problem in 1.5.0.10 though, right?

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 33

•

17 years ago

(In reply to comment #32)
> This is still a problem in 1.5.0.10 though, right?

Not according to comment 30. I also can't reproduce the problem in a Windows 1.5.0.10.

Johnny Stenback (:jst)

Comment 34

•

17 years ago

True indeed. I saw this in a 1.5 build, but it was out of date and updating made the crash go away.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 35

•

17 years ago

So for future clarity. There are two bugs involved in this bug:

1)
On branch (and possibly trunk) there was a problem that apparently bug 364692 fixed. This problem hasn't really been debugged, but backouts has confirmed that bug 364692 indeed does fix it.

2)
On trunk there was a recent regression that was debugged and fixed by the attachment in this bug. This problem did never exist on branches.