FTP use-after-free crash [@nsInputStreamPump::Cancel]

VERIFIED FIXED

Status

()

defect
--
critical
VERIFIED FIXED
7 years ago
3 years ago

People

(Reporter: posidron, Assigned: sworkman)

Tracking

({csectype-uaf, sec-critical})

Trunk
x86_64
macOS
Points:
---

Firefox Tracking Flags

(firefox17 wontfix, firefox18 wontfix, firefox19- wontfix, firefox20+ wontfix, firefox21+ verified, firefox22+ verified, firefox23+ verified, firefox-esr10 wontfix, firefox-esr1721+ fixed, b2g18+ fixed, b2g18-v1.0.1 wontfix)

Details

(Whiteboard: crash [leave open][adv-main21+][adv-esr1706+], )

Attachments

(5 attachments)

Posted file callstack
This is a reproducible use-after-free crash which occurred while testing FTP server responses. Haven't managed it to get an external testcase for this bug but it crashes reliably in the fuzzer after some time. (The flow of the requests/responses is included in the file of the callstack.)

The fake server accepts FTP connections from Firefox, some of the responses have no CRLF token in the response (FTP banner) when Firefox tries to handle upcoming connections it shows an alert() message of the previous response where we didn't provide a CRLF token, clicking that alert() box results in this crash. I am guessing that this seems to be a problem in handling async connections and their state.

In the callstack of the free'ed thread we see:

    #26 0x1051b0541 in nsFtpState::StopProcessing nsFtpConnectionThread.cpp:1866
    #27 0x1051b90e4 in nsFtpState::CloseWithStatus nsFtpConnectionThread.cpp:2153

therefore I have put this into the FTP category.

The following assertions are shown before the crash occurs:

###!!! ASSERTION: ftp read state mixup: 'Error', file /Users/cdiehl/Code/Mozilla/mc-asan-opt/netwerk/protocol/ftp/nsFtpConnectionThread.cpp, line 195

###!!! ASSERTION: Read buffer doesn't include response code: 'line.Length() > 4 && startNum', file /Users/cdiehl/Code/Mozilla/mc-asan-opt/netwerk/protocol/ftp/nsFtpConnectionThread.cpp, line 184


alloc: netwerk/base/src/nsInputStreamPump.cpp:64

nsRefPtr<nsInputStreamPump> pump = new nsInputStreamPump();


free: netwerk/base/src/nsInputStreamPump.cpp:141

// although this class can only be accessed from one thread at a time, we do
// allow its ownership to move from thread to thread, assuming the consumer
// understands the limitations of this.
NS_IMPL_THREADSAFE_ISUPPORTS3(nsInputStreamPump,
                              nsIRequest,
                              nsIInputStreamCallback,
                              nsIInputStreamPump)


re-use: netwerk/base/src/nsInputStreamPump.cpp:188

185:   // close input stream
186:    if (mAsyncStream) {
187:        mAsyncStream->CloseWithStatus(status);
188:        if (mSuspendCount == 0) 
189:            EnsureWaiting();


Tested on m-c changeset: 112115:56d3db021482
cdiehl:  is it likely that you'll be able to provide a reproducible testcase for this (can we figure out what fuzzer case causes it to crash)?  That would be very helpful here.
Jason: It's a custom fork of Peach. If you have a GitHub account and some time to configure it on your system then I could give you access to it. Feel free to ping me on IRC. 
Regarding the testcase - I could look at it again on the weekend till then I am bit busy with my Q4 goals.
Would the built-in NSPR logging help narrow down the traffic to the part near where it crashes?\

I'm going to guess that if it's a bug in our FTP code it's old and affects all branches.
Christoph: since this isn't going anywhere without more information I'm assigning to you to see if the logging trick in comment 3 might help.
Assignee: nobody → cdiehl
Christoph,

To get an NSPR log for FTP, follow instructions here

  https://developer.mozilla.org/en-US/docs/HTTP_Logging

except use for NSPR_LOG_MODULES use

  NSPR_LOG_MODULES=timestamp,nsFtp:5

A log could be very helpful here.
Posted file NSPR_LOG_MODULES
Attached is the NSPR_LOG of a session with such a crash.
(In reply to Jason Duell (:jduell) from comment #5)
> Christoph,
> 
> To get an NSPR log for FTP, follow instructions here
> 
>   https://developer.mozilla.org/en-US/docs/HTTP_Logging
> 
> except use for NSPR_LOG_MODULES use
> 
>   NSPR_LOG_MODULES=timestamp,nsFtp:5
> 
> A log could be very helpful here.

Any updates here? We have the requested log info as per comment 6. Is there something low risk we can do to resolve this issue in FF18 timeframe ?
I'm booked with B2G work right now.  cc-ing Josh in case he knows of necko folks who've got cycles for this.
meanwhile a reproducible test case (steps to reliably generate the crash) is definitely the most useful thing we could have here, if someone can get that.  If it's too hard to figure out which fuzzer case is blowing things up then a wireshark log of the crash would probably be a good start.  Or even just a recipe for a developer on how to get the fuzzer working.
(In reply to Jason Duell (:jduell) from comment #9)
> Or even just a recipe for a developer on how to get the fuzzer working.

Please take a look the callstack - at the end of the file. That is the whole session which produced the crash.
Between the assertion output and the sent and received bytes the attachments seem to have what jduell was asking for.
Assignee: cdiehl → jduell.mcbugs
(In reply to Daniel Veditz [:dveditz] from comment #11)
> Between the assertion output and the sent and received bytes the attachments
> seem to have what jduell was asking for.

Jason noted he didn't have time for this bug due to B2G. Sending over to Josh to re-assign.
Assignee: jduell.mcbugs → joshmoz
Flags: needinfo?(joshmoz)
I don't recall that conversation :)

I'm still planning to work on this very soon.  Josh, if michal or someone can take sooner, you can reassign.
Assignee: joshmoz → jduell.mcbugs
Flags: needinfo?(joshmoz)
Great, thanks Jason.
Given Firefox 19 ships in two weeks it's unrealistic to expect we can jump a fix (should we get one in time) into a late beta; guess we'll stop tracking 19 as a release target. Then again if it turns out to be something simple (null check, kungFuDeathGrip) we might be able to.
(In reply to Jason Duell (:jduell) from comment #13)
> I don't recall that conversation :)

comment 8? :D
cdiehl: can you have a peek at this file to see if you see anything obviously wrong with my reconstruction of the server session here?  I reconstructed it from the the info at the end of the callstack file, per comment 10.  

Alas, running 'nc -l 21 <file' doesn't cause my nightly build to crash (I do see a dialog pop up with lots of "AAAA" characters, but clicking "OK" doesn't crash as in comment 0).  Could be a timing thing, or maybe I don't have the exact same responses from the server.

The stack trace from the crash seems to indicate we're hitting refcount =0 for the nsInputStreamPump in nsInputStreamReadyEvent::Run(), but then the LoadGroup Cancel() -> nsBaseChannel::Cancel() is thinking it still has a valid pointer to the nsInputStreamPump.  So refcounting has gone awry somewhere.
This is kind of the right way. We need to trigger somehow two message boxes. The other message box needs to pop up immediately after the first one was closed by clicking "Ok", than the crash should occur.
As you see in the video, it's not necessarily needed to have two pop up boxes. I am a bit clueless to make a testcase here since it happens randomly or at least I can't determine a reproducible flow of the FTP commands. I am not even sure if it's a problem because of the FTP commands sent to the browser. Some how I think it's about an unfinished state (no QUIT command?) and initiating a new FTP connection to host.
Batch edit: Bugs marked status-b2g18: affected after 2/13 branching of v1.0.1 are now also status-b2g18-v1.0.1: affected
Christoph, 

Thanks for the video and info. I expect the best way forward is for me to install a copy of your peach framework and see if I can catch this in a debugger.  I know this doesn't happen reproducibly, but does it happen on a particular test?  About how often (10%, 1%, etc)?
It happens 100% of the time but not on a specific testcase. At least I wasn't able to determine a repeatable pattern.

I have added now a detailed callstack with all threads. If that doesn't help you and you still want to test it with the fuzzer, please let know your GitHub account.

Note, if you are going to debug it with an ASan build make sure to set a breakpoint at __asan_report_error.
Assignee: jduell.mcbugs → sworkman
I think I have located the problem in the code.

The ASan built I've been using locally identifies the freed memory as nsInputStreamPump:
-- it is allocated in nsBaseChannel::BeginPumpingData, which is called in nsFtpChannel::AsyncOpen (really, nsBaseChannel::AsyncOpen).
-- it is freed in nsBaseChannel::OnStopRequest.

The problem is with the alert box that pops up. It is called in the following code path:

nsBaseChannel::Cancel
nsInputStreamPump::Cancel
nsFtpState::CloseWithStatus
nsFtpState::StopProcessing
nsIPrompt::Alert.

At this point, it looks like the Alert causes an async wait, so these functions don't return until the alert window is dismissed. In the meantime, nsBaseChannel::OnStopRequest (among other functions) has been called, and the nsInputStreamPump is freed. Then, when the alert window is dismissed, we eventually get back to nsInputStreamPump::Cancel, where it tries to do a compare on one of its member vars, a use-after-free error which causes the ASan build to crash.

I notice in the code, in nsFtpConnectionThread.cpp, the home of nsFtpState, that nsIPrompt::Alert is called in two places, both of which have the following comment:

"XXX(darin): this code should not be dictating UI like this!"

but then just before it:

"check to see if the control status is bad.
 web shell wont throw an alert.  we better:"

There has to be a way to dispatch the error message to the shell without losing serialization here. Or some kind of async alert that means we don't block here. That is what I'll look into now.

Btw, thanks to Christoph for helping set up peach fuzzer and the Asan builds.
We'll mark this for a specific ESR release once this is resolved/fixed.
Moved the call to Alert to an async nsRunnable class. Dispatched on the main thread. This will allow nsFtpState::StopProcessing to run to completion, and thus allow nsInputStreamPump::Cancel to complete well before nsInputStreamPump is freed.

Christoph, can you check verify that the crash doesn't happen, please?
Attachment #725565 - Flags: review?(jduell.mcbugs)
Attachment #725565 - Flags: feedback?(cdiehl)
Comment on attachment 725565 [details] [diff] [review]
v1.0 Make call to nsIPromtp::Alert from nsFtpState::StopProcessing async

feedback+
I can no longer reproduce it.
Attachment #725565 - Flags: feedback?(cdiehl) → feedback+
Awesome, thanks Christoph!
Comment on attachment 725565 [details] [diff] [review]
v1.0 Make call to nsIPromtp::Alert from nsFtpState::StopProcessing async

Review of attachment 725565 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work Steve!

We need to hold off landing this until sec team gives us green light. I'll set the sec-approval ? flag for that.
Attachment #725565 - Flags: review?(jduell.mcbugs) → review+
Comment on attachment 725565 [details] [diff] [review]
v1.0 Make call to nsIPromtp::Alert from nsFtpState::StopProcessing async

[Security approval request comment]
How easily could an exploit be constructed based on the patch? Not obvious w/o being able to see comments in bug.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No

Which older supported branches are affected by this flaw?  ancient.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?  Easy and low risk.

How likely is this patch to cause regressions; how much testing does it need? No automated FTP tests :( but is now passing fuzzer!
Attachment #725565 - Flags: sec-approval?
Thanks, Jason!
I need comments from the Release Management team before I can give sec-approval+ for trunk. We're two weeks and one day from shipping. If we can't take this in beta and, potentially, ESR, we'll need to wait until 4/23 to take this (the middle of the next cycle).
Why don't we get some automated FTP tests then and take this in the first FF21 beta.
We would need to write (or find) a JS/Python FTP server and integrate it into the build system to get automated FTP tests.  That's a worthy but non-trivial goal.  Do we really want to block this secfix for that?  I wouldn't.

FTP is on life-support at this point: I'd rather see us remove support for the entire protocol than spend significant cycles developing test infrastructure.  Neither is realistic yet, probably, so I think just landing security fixes like this is the way to go.

That said, this isn't an obvious exploit, so it doesn't need to block anything IMO.  But the patch is ready, simple, and unlikely to be improved.  Let's just land it?
Flags: needinfo?(lsblakk)
To be clear, I don't have a strong opinion on whether this should land now or in ff21. I just don't want to block on automated testing.
So, given that we're unlikely to have automated tests here, what are we going to do to get this checked in and when?
It's passing the fuzzer that exposed the problem, and I assume Steve has hand-tested that regular FTP browsing/downloading works.  (needinfo-ing him to confirm).  Other than that, I think we just need a green flag to land (and on which branches) from security/release.
Flags: needinfo?(lsblakk) → needinfo?(sworkman)
Comment on attachment 725565 [details] [diff] [review]
v1.0 Make call to nsIPromtp::Alert from nsFtpState::StopProcessing async

Sec-approval+ to land after 4/2 (when we branch and ship).
Attachment #725565 - Flags: sec-approval? → sec-approval+
Flags: needinfo?(sworkman)
For the record, here is a list of the tests I've run manually:

- get list of files available remotely
- download a file
- download and open a txt file, an HTML file, and a gzip file
- cancel a download
- resume a download
- sftp connection and login

These are basic, but all is working fine. I'll land the patch after 4/2, as requested.
Today is 4/3, so landing on inbound:
  https://hg.mozilla.org/integration/mozilla-inbound/rev/56c37ce3f358
Status: NEW → ASSIGNED
Whiteboard: crash → crash [leave open]
Comment on attachment 725565 [details] [diff] [review]
v1.0 Make call to nsIPromtp::Alert from nsFtpState::StopProcessing async

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment] B2G
Bug caused by (feature/regressing bug #):
  FTP error handling code.
User impact if declined:
  Use-after-free exploits.
Testing completed:
  Yes - manual testing, comment 36 and comment 38.
Risk to taking this patch (and alternatives if risky):
  Low risk.
String or UUID changes made by this patch:
  None.

[Approval Request Comment] ESR
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  --sec-crit, so ignoring these questions.
User impact if declined: 
Fix Landed on Version:
Risk to taking this patch (and alternatives if risky): 
String or UUID changes made by this patch: 

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

[Approval Request Comment] Aurora, Beta.
Bug caused by (feature/regressing bug #): 
  FTP error handling code.
User impact if declined:
  Use-after-free exploits
Testing completed (on m-c, etc.):
  Landed on inbound today.
Risk to taking this patch (and alternatives if risky):
  Low risk.
String or IDL/UUID changes made by this patch:
  None.
Attachment #725565 - Flags: approval-mozilla-esr17?
Attachment #725565 - Flags: approval-mozilla-beta?
Attachment #725565 - Flags: approval-mozilla-b2g18?
Attachment #725565 - Flags: approval-mozilla-aurora?
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Attachment #725565 - Flags: approval-mozilla-esr17?
Attachment #725565 - Flags: approval-mozilla-esr17+
Attachment #725565 - Flags: approval-mozilla-beta?
Attachment #725565 - Flags: approval-mozilla-beta+
Attachment #725565 - Flags: approval-mozilla-b2g18?
Attachment #725565 - Flags: approval-mozilla-b2g18+
Attachment #725565 - Flags: approval-mozilla-aurora?
Attachment #725565 - Flags: approval-mozilla-aurora+
Lukas, status-b2g18-v1.0.1 is affected - I'm not sure if there's an uplift process here - should I land on that branch as well?
Flags: needinfo?(lsblakk)
Whiteboard: crash [leave open] → crash [leave open][adv-main21+][adv-main1706+]
Whiteboard: crash [leave open][adv-main21+][adv-main1706+] → crash [leave open][adv-main21+][adv-esr1706+]
Based on comment 26 and comment 38, QA is going to call this verified.
(In reply to Matt Wobensmith from comment #44)
> Based on comment 26 and comment 38, QA is going to call this verified.

Marking this verified across the board based on this comment.
We're not taking fixes to v1.0.1 anymore, just 1.1 so clearing this.
Flags: needinfo?(lsblakk)
Group: core-security
You need to log in before you can comment on or make changes to this bug.