Closed Bug 460706 Opened 16 years ago Closed 16 years ago

Crash [@ little2_updatePosition] with xmlhttprequest and large xml file

Categories

(Core :: XML, defect, P2)

x86
Windows XP
defect

Tracking


VERIFIED FIXED
mozilla1.9.1b3

People

(Reporter: martijn.martijn, Assigned: mrbkap)

References

Details

(5 keywords, Whiteboard: [sg:critical?] post 1.8-branch)

Crash Data

Attachments

(2 files)

Attached file testcase
See testcase, which crashes current trunk build and Firefox 3 on load for me.
This might be related to bug 314660, although the testcase doesn't crash in Firefox 2.

http://crash-stats.mozilla.com/report/index/cc37c568-9e3c-11dd-835e-001cc45a2ce4?p=1
0  xul.dll  little2_updatePosition  parser/expat/lib/xmltok_impl.c:1745
1  xul.dll  MOZ_XML_ResumeParser    parser/expat/lib/xmlparse.c:1787
2  xul.dll  xul.dll@0x2ab720
Flags: blocking1.9.1?
Based on the testcase, this looks very scary, quite possibly exploitable.

Peterv: Can you look into this since this looks like an expat bug?
Group: core-security
Whiteboard: [sg:critical?]
Peter, see above comment.
Assignee: nobody → peterv
Flags: blocking1.9.1? → blocking1.9.1+
On second thought, we won't block the release on this. Peter, please do work on this as soon as you have time, though.
Flags: blocking1.9.1+ → blocking1.9.1-
Can't reproduce on OS X, might be Windows only?
Actually, nevermind. Need to make sure to load as XML.
Doesn't look like an Expat bug.

Looks like the first script will block the parser, we'll evaluate the script (javascript:// URL), unblock the parser and post a ContinueInterruptedParsing event. Then we'll evaluate the next script which does a synchronous XMLHttpRequest, which spins the event loop and we'll process the ContinueInterruptedParsing event. This will make us parse the original document to the end, but the Expat code that's on the stack (waiting for the script evaluation to end) doesn't know that. It seems odd that we post a ContinueInterruptedParsing event when synchronously evaluating a script (javascript:// URL).

Jonas or Boris, feel free to chime in :-).
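The sequence described above can be modeled with a toy event queue. This is illustrative Python only, not Gecko code; every name here (EventQueue, Parser, the script callbacks) is made up for the sketch. It shows how a synchronous operation that spins the event loop can run a posted "continue parsing" event while the parser frame is still on the stack:

```python
# Toy model of the re-entrancy hazard; all names are hypothetical.

class EventQueue:
    def __init__(self):
        self.events = []

    def post(self, fn):
        self.events.append(fn)

    def spin(self):
        # Models a synchronous XMLHttpRequest.send() spinning the
        # event loop and draining pending events.
        while self.events:
            self.events.pop(0)()

queue = EventQueue()
log = []

class Parser:
    def __init__(self):
        self.on_stack = False

    def parse_chunk(self, scripts):
        # The Expat frame is live for the duration of this call.
        self.on_stack = True
        for script in scripts:
            script()
        self.on_stack = False

parser = Parser()

def continue_interrupted_parsing():
    # Running this while parser.on_stack is True is the bug:
    # it would re-enter Expat underneath its own live frame.
    log.append("continuation ran while parser on stack: %s"
               % parser.on_stack)

def first_script():
    # The first script finishes; the parser is unblocked and a
    # ContinueInterruptedParsing-style event is posted.
    queue.post(continue_interrupted_parsing)

def second_script():
    # The second script does a synchronous XHR, which spins the
    # event loop with the parser frame still on the stack.
    queue.spin()

parser.parse_chunk([first_script, second_script])
print(log)
```

Running the model, the continuation observes `on_stack == True`, i.e. it fires inside the nested spin rather than after the parser has returned, which is exactly the unsafe interleaving.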
Wait, the javascript url isn't synchronously being evaluated, is it? It's when we execute the second <script> that we will re-enter expat in unholy ways. I'm not sure why the javascript:// thing is needed at all.

I think that mrbkap's patch in bug 444322 might actually fix this bug, unless there is further interaction with the javascript url.
(In reply to comment #7)
> Wait, the javascript url isn't synchronously being evaluated, is it?

You're right, need some more debugging.
We get a ScriptAvailable for the javascript:// URL with aResult==NS_ERROR_DOM_RETVAL_UNDEFINED. ScriptAvailable unblocks the parser and posts a ContinueInterruptedParsing event. Not 100% sure yet about the next part, looks like we get more data at that point (there's a nsInputStreamReadyEvent in the queue), so we parse some more. That makes us evaluate the inline script, which spins the event queue (XMLHttpRequest.send) and we run the ContinueInterruptedParsing event which was after the nsInputStreamReadyEvent in the queue.

It seems wrong to enable the parser and then post the event. Making the event enable the parser if the script load failed does fix this bug, but I need to think a bit more about it. I wonder if we could hit the same problem with a successful script load, but in that case we probably need to enable the parser from ScriptAvailable (for document.write?). Maybe we should block the parser in ScriptEvaluated then and let the event unblock there too?
Hmm, we seem to get NS_ERROR_DOM_RETVAL_UNDEFINED because "WARNING: No principal to execute JS with: file /Users/peterv/source/mozilla/central/mozilla/dom/src/jsurl/nsJSProtocolHandler.cpp, line 181"
Replacing the javascript:// URL with data:text/javascript, still gets very confused.
Note that the fix for bug 444322 will make us not parse anything while waiting for the XHR to finish loading. Maybe that is enough here?
Depends on: 444322
Yeah, looks like this will be fixed by bug 444322.
I still crash with the testcase, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20081212 Minefield/3.2a1pre
But I had to test the testcase locally; online, it caused a hang for me.
Remarking as blocking then :(

Martijn, you were using a build with bug 444322 fixed, right?
Flags: blocking1.9.1- → blocking1.9.1+
I tested with the 20081212 build; I think it has the fix for 444322 in, afaict. Anyway, I'll try to retest tomorrow.
Also crashes, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20081214 Minefield/3.2a1pre
I had thought this was impossible, but it wasn't. What happens is that we process the first <script src />, then the content sink does a ContinueInterruptedParsingAsync. This returns to the event loop, hoping to get the newly posted event. But we've already received an OnDataAvailable, so we enter the parser, which does a synchronous XMLHttpRequest. This processes events, one of which is the "continue your interrupted parsing" event, re-entering the parser, and we lose.
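One common way to make that sequence safe, sketched abstractly (hypothetical names only, not the actual patch in attachment 353118), is to have the continuation event check whether the parser frame is still live and defer itself instead of re-entering:

```python
# Sketch of a re-entrancy guard; names are made up for illustration.

class EventQueue:
    def __init__(self):
        self.events = []

    def post(self, fn):
        self.events.append(fn)

    def spin(self):
        while self.events:
            self.events.pop(0)()

class GuardedParser:
    def __init__(self, queue):
        self.queue = queue
        self.on_stack = False    # true while the parse frame is live
        self.deferred = False    # a continuation arrived mid-parse
        self.chunks_parsed = 0

    def continue_event(self):
        if self.on_stack:
            # A nested event loop (e.g. sync XHR) delivered us while
            # the parser is still on the stack: record the request
            # and bail out rather than re-entering Expat.
            self.deferred = True
            return
        self.parse_chunk([])

    def parse_chunk(self, scripts):
        assert not self.on_stack, "re-entered the parser!"
        self.on_stack = True
        self.chunks_parsed += 1
        for script in scripts:
            script()
        self.on_stack = False
        if self.deferred:
            # Re-post the continuation now that the frame is gone.
            self.deferred = False
            self.queue.post(self.continue_event)

queue = EventQueue()
parser = GuardedParser(queue)

def first_script():
    queue.post(parser.continue_event)   # unblock + post continuation

def second_script():
    queue.spin()                        # sync XHR spins the loop

parser.parse_chunk([first_script, second_script])
queue.spin()  # the deferred continuation now runs safely
print(parser.chunks_parsed)
```

In the model, the nested spin hits the guard and defers; the continuation then runs from the outer event loop and parses the second chunk with no live parser frame underneath it.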
Assignee: peterv → mrbkap
Status: NEW → ASSIGNED
Attachment #353118 - Flags: superreview?(jonas)
Attachment #353118 - Flags: review?(jonas)
Attachment #353118 - Flags: superreview?(jonas)
Attachment #353118 - Flags: superreview+
Attachment #353118 - Flags: review?(jonas)
Attachment #353118 - Flags: review+
http://hg.mozilla.org/mozilla-central/rev/8159ae6a8046
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.6+
Given the now-public crashtest we'd better make sure we land the fix on 3.0.x sooner rather than later.
At least some part of this was backed out by Serge.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #20)
> http://hg.mozilla.org/mozilla-central/rev/9990da98d7b7 contains a crashtest,
> too.

This test crashes randomly, at least on Linux and Windows...

Backed out:
http://hg.mozilla.org/mozilla-central/rev/8a97d8909df3
http://hg.mozilla.org/mozilla-central/rev/050ae62b7d9d

See for example:
{
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229740533.1229744280.21191.gz
Linux mozilla-central moz2-linux-slave08 dep unit test on 2008/12/19 18:35:33

REFTEST TEST-PASS | file:///builds/slave/trunk_linux-8/build/parser/htmlparser/tests/crashtests/423373-1.html | (LOAD ONLY)
command timed out: 1200 seconds without output, killing pid 15188
}
Whiteboard: [sg:critical?] → [sg:critical?][needs new patch?]
We placed this on our Top Security Bugs list... can we get a new patch as soon as possible?  Thanks!
Not going to get a baked patch in time for 3.0.6, aim for next time.
Flags: blocking1.9.0.6+ → blocking1.9.0.7+
I'm assuming we don't want this on 1.8.1 since it doesn't crash there. Blake, correct me if I'm wrong.
Flags: wanted1.8.1.x-
Priority: -- → P2
Blake: Where are you on a patch for this?
I have no clue why the crashtest crashes on tinderbox. Nobody else can seem to make it crash, either. IMO we should just land this patch on all of the branches, since it does fix the crash that I *was* able to reproduce.
Let's try re-landing it on mozilla-central during a quiet period and seeing if it sticks this time. If not, we need to either disable that test or find a better one. If this sticks on m-c, we'll approve it for 1.9.0.7, given the relative safety of the patch.
I tried again. It did not stick.
I think we've got to push this out until we figure out the test situation. When you've tried reproducing the crash (with a fixed build) do you run all the tests before it or just this one test? It could be the previous tests are creating some bad state that this test is merely exposing.
Status: REOPENED → NEW
Flags: blocking1.9.0.7+ → blocking1.9.0.8+
Today I spent a couple of hours running all of the tests. I didn't crash in my test, but I did find bug 476975. Still no clue what's going on with my test.
Comment on attachment 353118 [details] [diff] [review]
Fix
[Checkin: Comment 19]

This is causing bug 478277 on the 1.9.0 branch. It has been baking on trunk for a while with no ill effects. I think we should get the patch in and deal with the test stupidity later.
Attachment #353118 - Flags: approval1.9.1?
Attachment #353118 - Flags: approval1.9.0.7?
mrbkap wants this on the respin shortlist, nom-ing for 1.9.0.7 should that come to pass.
Flags: blocking1.9.0.7?
Are we just waiting on approval here?
Pretty much (I'm asking for explicit approval for the 1.9.1 branch because of the confusion with the crashtest). This bug has been FIXED (as far as I know) for a while, but it's been in-testsuite? because the crashtest apparently crashes on tinderbox and nowhere else.
Attachment #353118 - Attachment description: Fix → Fix [Checkin: Comment 19]
mrbkap: you get your re-spin ride-along wish. Please land on the 1.8 branch and add the fixed1.9.0.8 keyword, and on the GECKO190_20090217_RELBRANCH and add fixed1.9.0.7
Flags: blocking1.9.0.7? → blocking1.9.0.7+
I meant 1.9.0 branch (CVS HEAD) with the fixed1.9.0.8 keyword, of course
Attachment #353118 - Flags: approval1.9.0.7? → approval1.9.0.7+
Comment on attachment 353118 [details] [diff] [review]
Fix
[Checkin: Comment 19]

Approved for 1.9.0.7, a=dveditz for release-drivers
(In reply to comment #20)
> http://hg.mozilla.org/mozilla-central/rev/9990da98d7b7 contains a crashtest,
> too.

Could you attach this test patch here, ftr.

(In reply to comment #23)
> This test crashes randomly, at least on Linux and Windows...

Fwiw, I wondered about:
does this test patch crash without the code patch ?
Is the fix not enough ? Is it wrong ?
Is the test wrong ? Does it trigger some other bug/crash ?
Flags: in-testsuite?
Target Milestone: --- → mozilla1.9.2a1
(In reply to comment #40)
> Could you attach this test patch here, ftr.

Will do, shortly (it's just the testcase in this bug in crashtest form).

> does this test patch crash without the code patch ?

Yes.

> Is the fix not enough ? Is it wrong ?

That would be the million dollar question. I haven't been able to reproduce the crash that the tinderboxes have been seeing, and the fix here is correct as far as it goes, so it's really hard to answer definitively.

> Is the test wrong ? Does it trigger some other bug/crash ?

This is my suspicion, but until tinderbox can report the stack for its crash, we can't answer it.
Fix checked in everywhere. I'm going to file a new bug on getting the crashtest into the tree.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Whiteboard: [sg:critical?][needs new patch?] → [sg:critical?]
(In reply to comment #41)
> (In reply to comment #40)
> > Could you attach this test patch here, ftr.
> 
> Will do, shortly (it's just the testcase in this bug in crashtest form).

Bug 478978 contains the testcase.
Target Milestone: mozilla1.9.2a1 → ---
Target Milestone: --- → mozilla1.9.1b3
Attachment #353118 - Flags: approval1.9.1?
Crashes 1.9.0.6 cleanly on Windows XP. On the 1.9.0 nightly from last night (since I'm waiting for Windows 1.9.0.7 bits), it takes my CPU to 98% and sits there for more than five minutes. 

There is no crash though. Does this count as a "fix" since it doesn't crash?

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7pre) Gecko/2009021705 GranParadiso/3.0.7pre (.NET CLR 3.5.30729)
Yeah, I've noticed the hang too, I filed bug 460706 for it.
This bug was indeed about the crash, so this counts as a fix; marking verified in the corresponding areas.
Status: RESOLVED → VERIFIED
I let it run for a couple of hours too. It pegged my CPU and just sat there the whole time.
(In reply to comment #46)
> Yeah, I've noticed the hang too, I filed bug 460706 for it.

Sorry, I meant I filed bug 479499 for it.
Whiteboard: [sg:critical?] → [sg:critical?] post 1.8-branch
I relanded the crashtest since bug 479499 is now fixed.
Flags: in-testsuite? → in-testsuite+
we've found a regression caused by this bug (verified by bc).

regression range was February 17-18th and is in 1.9.1.

Filed a separate bug 482293 on the regression.
sorry for the spamming, meant to say that the regression is in 1.9.1, 1.9.0 and trunk.
(In reply to comment #1)
> Based on testcase this looks very scary, quite possibly exploitable.

How can this be exploited? I don't see how the original crash or fix is security related. Rather, the problem is basically threading-related (as is 482293).
(In reply to comment #52)
> How can this be exploited? I don't see how the original crash or fix is
> security related.  Rather the problem is basically threading related (as is
> 482293).

The concern is that since we're re-entering expat, we could end up stomping over freed memory when we unwind.
No crash, but I do receive the crash on 1.9.1:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2pre) Gecko/20090723 Shiretoko/3.5.2pre (.NET CLR 3.5.30729) ID:20090723050841
Sigh, thanks for the heads up, ss. The second "crash" is supposed to be "hang"
Group: core-security
Crash Signature: [@ little2_updatePosition]