Closed Bug 1328642 Opened 3 years ago Closed 3 years ago

Crash in FileClose

Categories

(Core :: DOM: Workers, defect, critical)

50 Branch
All
Windows
defect
Not set
critical

Tracking

RESOLVED FIXED
mozilla53
Tracking Status
firefox-esr45 51+ fixed
firefox50 --- wontfix
firefox51 --- fixed
firefox52 --- fixed
firefox53 --- fixed

People

(Reporter: marco, Assigned: bkelly)

References

Details

(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [adv-main51+][adv-esr45.7+])

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-13855a66-0e50-4177-83a0-2fdf32170104.
=============================================================

(40.43% in signature vs 00.65% overall) address = 0xffffffffe5e5e5e5
These sorts of crashes should be filed as security bugs.
Group: core-security
I added the csectype-uaf keyword but forgot to mark the bug as core-security :(
Ben, can you look at these crashes and see if there's something we can do? Thanks.
No crash reports in FF53 yet.  Some small hope it's already been fixed.

So far nothing is jumping out at me unfortunately.  I might need to push some diagnostic asserts.
One weird thing is that the reports started Dec 28, 2016, but across a variety of releases.  Zero reports before Dec 28, but now we are getting them on FF47 to FF52.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Also, some of the crashes don't go through dom/cache.  It appears to be an issue with nsFileInputStream closing files on Windows:

https://crash-stats.mozilla.com/report/index/91f698ca-c487-4573-bc49-bc5522161229
(In reply to Ben Kelly [:bkelly] from comment #5)
> One weird thing is that the reports started Dec 28, 2016, but across a
> variety of releases.  Zero reports before Dec 28, but now we are getting
> them on FF47 to FF52.

Which reports started on Dec 28?

There are reports earlier than Dec 28 if you restrict the search to product=Firefox and address contains 'e5':
https://crash-stats.mozilla.com/signature/?product=Firefox&address=~e5&signature=FileClose&date=%3E%3D2016-10-04T19%3A29%3A00.000Z&date=%3C2017-01-04T19%3A29%3A00.000Z#graphs

Did you restrict the search with other parameters?
I think we might need to push some diagnostic assertions to try to figure out what's going on.
(In reply to Marco Castelluccio [:marco] from comment #7)
> (In reply to Ben Kelly [:bkelly] from comment #5)
> > One weird thing is that the reports started Dec 28, 2016, but across a
> > variety of releases.  Zero reports before Dec 28, but now we are getting
> > them on FF47 to FF52.
> 
> Which reports started on Dec 28?

I just clicked on "more reports" link from the crash report in comment 0.  Sorry for my confusion.
BTW, my theory is that the CacheStreamControlChild is being UAF'd in some way.  Bug 1328686 will add some assertions that will try to show if this is the case.  Not linking the bug due to security issues, though.
Keywords: sec-high
I pushed some diagnostic asserts to m-i in bug 1328686.  I'll uplift them to aurora next week.
I was looking through FileClose crash comments and noticed this one referenced weather.com:

https://crash-stats.mozilla.com/report/index/7563555a-a6fd-48a9-b15c-101b82160819

Bug 1329989 is a UAF bug that is triggered by bughunter on weather.com.  Perhaps the issue over there is stomping on memory leading to this bug.
See Also: → 1329989
Duplicate of this bug: 1329447
Group: core-security → dom-core-security
Unfortunately this is still happening in FF51.0b14 which has the fix for bug 1329989:

https://crash-stats.mozilla.com/report/index/568dc20c-bee3-4680-a80b-d20ea2170116
I have a new theory:

1) The Cache API ReadStream is basically a wrapper around nsFileInputStream
2) There are places we read and close the ReadStream from an IO thread like StreamTransportService, etc.
3) Cache API also tries to close the nsFileInputStream on its owning thread under some circumstances
4) nsFileInputStream does not provide any cross-thread protection for its underlying mFD

So if we have a Cache ReadStream being read in a worker thread, we can get a situation where the stream is closed on the IO thread right when the worker thread is terminated and also tries to close the stream.  This could result in one of the two closes operating on garbage data.
In particular, we have a narrow window where nsFileStreamBase's mFD is non-null, but has already had PR_FreeFileDesc() called on it.  This internally frees its memory.

I'm going to see if I can add some locking to nsFileStream.  The tricky part is to avoid blocking the main thread.
Attached patch patch (obsolete) — Splinter Review
In the end I decided it made more sense to put the locking in the ReadStream.  This Cache API class is purposely designed to be accessed on an IO thread while simultaneously being closed on the owning thread.  Therefore the locking belongs in this object.  (It also makes this patch less risky than changing all file streams.)

I decided to make the mutex clearly scoped to only the mStream and mSnappyStream objects.  I did not try to pull in any of the other ReadStream state into the mutex.  It seemed safer to leave the current runnables and atomic<> variables used to handle that state.  Also, I wanted to minimize the chance for a recursive lock or other unexpected contention.

Overall this lock should have very little perf impact.  Typically there should only be one thread accessing the stream.  We only need to potentially block in the case where shutdown or a worker thread is forcing the stream to close.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=cd58c73fc83cff3d7fb02c67df7f4c57ece56f13
Attachment #8827654 - Flags: review?(bugmail)
Oh, I also fixed a stupid bug where mHasEverBeenRead was not being initialized.
Comment on attachment 8827654 [details] [diff] [review]
patch

Review of attachment 8827654 [details] [diff] [review]:
-----------------------------------------------------------------

I also verified that this covers all uses of mStream/mSnappyStream in the file.  (Noting that of course NoteClosedRunnable and ForgetRunnable have their own logically separate mStream which refers to a ReadStream::Inner.)

::: dom/cache/ReadStream.cpp
@@ +285,5 @@
>    // stream ops can happen on any thread
> +  nsresult rv = NS_OK;
> +  {
> +    MutexAutoLock lock(mMutex);
> +    rv = mSnappyStream->Close();

Affirming that I see the change from mStream to mSnappyStream and that this makes sense and is a semantic improvement.
Attachment #8827654 - Flags: review?(bugmail) → review+
Updated with commit message.
Attachment #8827654 - Attachment is obsolete: true
Attachment #8827689 - Flags: review+
Attachment #8827689 - Attachment is patch: true
Comment on attachment 8827689 [details] [diff] [review]
Protect multi-thread stream access with a mutex. r=asuth

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

The patch is clearly about protecting shared data from multi-threaded access.  With digging you could probably figure out mFD is a prime target here.

That being said, you also would need to figure out how to force two threads to access the stream simultaneously from content script.

You can get the ReadStream::Close() called on the owning thread by creating it in a worker thread and calling worker.terminate().  That is not too hard.

The harder part is getting the other thread to access it simultaneously.  You can try using Response.text(), although that does not call Close() automatically, so it may not trigger there.  You could try using ServiceWorker respondWith(), but that would be harder since you cannot then use worker.terminate().

It's probably possible, but it would take some effort and would still be quite racy.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

The comments and commit message mention protecting against cross-thread access.  That's pretty clear from the code, though, since it's adding a Mutex.

Which older supported branches are affected by this flaw?

All branches with Cache API enabled.  This includes 45ESR even though service workers themselves are disabled there.  It may be harder to trigger without service workers to use with it, though.

If not all supported branches, which bug introduced the flaw?

All branches basically.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

This code has not changed recently and should be relatively easy to backport.

How likely is this patch to cause regressions; how much testing does it need?

I think the risk is low.  I scoped the mutex locks to be as small as possible.  This code also has a good test suite and it passed a try build.

Also note that we won't really know if this fixes the problem until we get the patch into a high volume branch.
Attachment #8827689 - Flags: sec-approval?
Attachment #8827689 - Flags: approval-mozilla-release?
Attachment #8827689 - Flags: approval-mozilla-esr45?
Attachment #8827689 - Flags: approval-mozilla-beta?
Attachment #8827689 - Flags: approval-mozilla-aurora?
Attachment #8827689 - Flags: sec-approval? → sec-approval+
Comment on attachment 8827689 [details] [diff] [review]
Protect multi-thread stream access with a mutex. r=asuth

Another fix for service workers crash, let's take it.
Attachment #8827689 - Flags: approval-mozilla-release?
Attachment #8827689 - Flags: approval-mozilla-release+
Attachment #8827689 - Flags: approval-mozilla-esr45?
Attachment #8827689 - Flags: approval-mozilla-esr45+
Attachment #8827689 - Flags: approval-mozilla-beta?
Attachment #8827689 - Flags: approval-mozilla-beta+
Attachment #8827689 - Flags: approval-mozilla-aurora?
Attachment #8827689 - Flags: approval-mozilla-aurora+
I'll land this in the next hour or two.  Sorry I didn't see the flag approvals last night.
I had to rebase around my MOZ_DIAGNOSTIC_ASSERT changes to ReadStream.  I'm doing a local build and test before landing.
Apparently there are conflicts on ESR also.  I am checking it out to do the landing there as well.
https://hg.mozilla.org/mozilla-central/rev/c6e104aef85f
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Whiteboard: [adv-main51+][adv-esr45.7+]
We still need to watch crash stats to verify this fixed it.
Group: dom-core-security → core-security-release
We've started to get crash reports for the new RC, but I don't see any instances of this crash.  So far so good.
Group: core-security-release