Closed Bug 1466577 Opened 2 years ago Closed Last year

Crash in arena_dalloc | Allocator<T>::free | MOZ_Z_inflateEnd

Categories

(Core :: Networking: WebSockets, defect, P3, critical)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 62+ fixed
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 + fixed
firefox63 + fixed

People

(Reporter: mats, Assigned: michal)

References

Details

(4 keywords, Whiteboard: [necko-triaged][post-critsmash-triage][adv-main62+][adv-esr60.2+])

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is
report bp-d2714192-2567-45ce-af04-afb540180602.
=============================================================


MOZ_CRASH Reason: 	MOZ_RELEASE_ASSERT((mapelm->bits & ((size_t)0x01U)) != 0) (Double-free?)

All crashes are on Android.


Top 10 frames of crashing thread:

0 libmozglue.so arena_dalloc memory/build/mozjemalloc.cpp:3502
1 libmozglue.so Allocator<MozJemallocBase>::free memory/build/malloc_decls.h:40
2 libmozglue.so MOZ_Z_inflateEnd modules/zlib/src/inflate.c:1285
3 libxul.so mozilla::net::PMCECompression::~PMCECompression netwerk/protocol/websocket/WebSocketChannel.cpp:811
4 libxul.so nsAutoPtr<mozilla::net::PMCECompression>::assign xpcom/base/nsAutoPtr.h:45
5 libxul.so mozilla::net::WebSocketChannel::StopSession xpcom/base/nsAutoPtr.h:129
6 libxul.so wcsrtombs 
7 libxul.so nsTimerImpl::InitCommon xpcom/threads/nsTimerImpl.cpp:416
8 libmozglue.so mozilla::detail::MutexImpl::unlock mozglue/misc/Mutex_posix.cpp:181
9 libxul.so nsTimerImpl::InitHighResolutionWithCallback xpcom/threads/nsTimerImpl.cpp:488

=============================================================
Group: core-security → network-core-security
By itself this doesn't appear exploitable (release assert), but it's definitely new in 60. Would it be better to unhide and treat as a topcrash/stability bug?
Flags: needinfo?(mats)
If it's a double-free then I guess that it'd be kinda random
if the condition is true or not?  I haven't analyzed the code though.

I'm not sure unhiding it would get more people that knows this code
looking at it.  We might as well CC everyone we think might be able
to help.
Flags: needinfo?(mats)
Assignee: nobody → michal.novotny
Priority: -- → P2
Whiteboard: [necko-triaged]
We call inflateEnd() only in the destructor and only if mInflater was successfully initialized with inflateInit2(), so I don't see how we could call it twice.
Assignee: michal.novotny → nobody
Priority: P2 → P3
Perhaps adding canaries in the relevant code might help pin down the bug?
https://searchfox.org/mozilla-central/source/mfbt/Poison.h#82

Before and after the regular members in PMCECompression for example:
https://searchfox.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannel.cpp#946
then sprinkle Check() calls on those canaries liberally in the relevant code.

Before WebSocketChannel::mPMCECompressor too maybe?
Flags: needinfo?(michal.novotny)
Attached patch patch v1 (obsolete) — Splinter Review
I've added the checks to the code path where it's crashing according to crash-stats reports.
Assignee: nobody → michal.novotny
Flags: needinfo?(michal.novotny)
Attachment #8986461 - Flags: review?(mcmanus)
There seems to be race condition when stopping session on the socket thread using AbortSession() and calling Close() on the main thread. When the socket thread nulls out mTransaction we can call StopSession() on the main thread from Close(). Comment in StopSession() says that it shouldn't be called after StartWebsocketData() was called. Jason, Patrick, is the purpose of mTransaction check at https://searchfox.org/mozilla-central/rev/8aaaaf0f98cfec766f8be28c390bc0dcfbb0ea03/netwerk/protocol/websocket/WebSocketChannel.cpp#2354 to check that StartWebsocketData() hasn't been called yet?

I think the right thing to do here is to synchronize this code using a lock (having several atomic bools doesn't seem to be enough) and
- call StopSession() if StartWebsocketData() hasn't been called yet
- send fin message if websocket channel is alive 
- do nothing and return a failure if StopSession() was already called on the socket thread
Flags: needinfo?(mcmanus)
Flags: needinfo?(jduell.mcbugs)
I'm not sure of the answer to the question you asked.. your approach seems reasonable - if you need more eyes I would suggest looping nick in.
Flags: needinfo?(mcmanus)
The new lock protects access to mStopped and mDataStarted. This ensures that Close() won't call StopSession() once the websocket data started (StopSession() is then expected to be called only on socket thread) and also that data cannot start at the same time when Close() is being executed on the main thread.

Flag mStopped is now set iff the session is really stopped. It used to be set together with mRequestedClose in AbortSession() when fin packet was queued, which was IMO wrong.

Redundant mTCPClosed=true were removed from OnInputStreamReady() because the flag is set at the beginning of AbortSession().
Attachment #8986461 - Attachment is obsolete: true
Attachment #8986461 - Flags: review?(mcmanus)
Flags: needinfo?(jduell.mcbugs)
Attachment #8991829 - Flags: review?(hurley)
Comment on attachment 8991829 [details] [diff] [review]
diff-bug1466577-v2.patch

Review of attachment 8991829 [details] [diff] [review]:
-----------------------------------------------------------------

Looks pretty good overall, but I have a couple questions/concerns to address before landing.

::: netwerk/protocol/websocket/WebSocketChannel.cpp
@@ +2412,4 @@
>    // normally this should be called on socket thread, but it is ok to call it
>    // from OnStartRequest before the socket thread machine has gotten underway
>  
> +  MOZ_ASSERT(mStopped);

I know this only happens on debug builds, but should we (debug-only) lock the mutex before accessing? This is the only case of unprotected access, and it just has a bad smell.

@@ -3612,5 @@
> -  mRequestedClose = true;
> -  mScriptCloseReason = reason;
> -  mScriptCloseCode = code;
> -
> -  if (!mTransport || mConnecting != NOT_CONNECTING) {

Can you explain why this condition was removed? The log lines (which you left in) appear to be correct only if !mTransport, but I'm not seeing anything that ensures that's the case. If we can assume it to be so, could you add a comment explaining why?
Attachment #8991829 - Flags: review?(hurley) → review+
(In reply to Nicholas Hurley [:nwgh][:hurley] from comment #10)
> Comment on attachment 8991829 [details] [diff] [review]
> diff-bug1466577-v2.patch
> 
> Review of attachment 8991829 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks pretty good overall, but I have a couple questions/concerns to address
> before landing.
> 
> ::: netwerk/protocol/websocket/WebSocketChannel.cpp
> @@ +2412,4 @@
> >    // normally this should be called on socket thread, but it is ok to call it
> >    // from OnStartRequest before the socket thread machine has gotten underway
> >  
> > +  MOZ_ASSERT(mStopped);
> 
> I know this only happens on debug builds, but should we (debug-only) lock
> the mutex before accessing? This is the only case of unprotected access, and
> it just has a bad smell.

It's not needed because mStopped is atomic. It's just a sanity check that DoStopSession() is called properly after setting mStopped to true and this is done under a lock due to synchronization of StopSession(), AbortSession() and Close() methods. Once set to true, it never reverts back, so no lock is needed here.


> @@ -3612,5 @@
> > -  mRequestedClose = true;
> > -  mScriptCloseReason = reason;
> > -  mScriptCloseCode = code;
> > -
> > -  if (!mTransport || mConnecting != NOT_CONNECTING) {
> 
> Can you explain why this condition was removed? The log lines (which you
> left in) appear to be correct only if !mTransport, but I'm not seeing
> anything that ensures that's the case. If we can assume it to be so, could
> you add a comment explaining why?

The condition has changed to "if (mDataStarted)" above in the patch. It's equivalent, but IMO easier to understand. As described in comment #7, if the data already started, try to send fin packet, otherwise call DoStopSession().
Comment on attachment 8991829 [details] [diff] [review]
diff-bug1466577-v2.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Close() method must be called by the script at the same time when server aborts the connection. It's probably doable, but the result is double free of an object that performs compression. I don't think it can be easily used for anything else than crashing the browser.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
It's obvious that the patch solves a race condition, but as I noted above, I don't think this is really a security problem.

Which older supported branches are affected by this flaw?
Race condition in Websocket closing was probably from the beginning. This particular crash is possible since landing bug 792831.

If not all supported branches, which bug introduced the flaw?
See above.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
It shouldn't be hard to backport the patch if it's not applicable.

How likely is this patch to cause regressions; how much testing does it need?
I don't expect regressions but it's hard to say. Try push looked good.
Attachment #8991829 - Flags: sec-approval?
Comment on attachment 8991829 [details] [diff] [review]
diff-bug1466577-v2.patch

[Triage Comment]

sec-approval=dveditz

a=dveditz for beta and esr60. Please land on branches soon so we have bake time.
Attachment #8991829 - Flags: sec-approval?
Attachment #8991829 - Flags: sec-approval+
Attachment #8991829 - Flags: approval-mozilla-esr60+
Attachment #8991829 - Flags: approval-mozilla-beta+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/5ea7bb29e22e
Group: network-core-security → core-security-release
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Flags: qe-verify-
Whiteboard: [necko-triaged] → [necko-triaged][post-critsmash-triage]
This needs a rebased patch for ESR60 uplift.
Flags: needinfo?(aryx.bugmail)
Whiteboard: [necko-triaged][post-critsmash-triage] → [necko-triaged][post-critsmash-triage][adv-main62+][adv-esr60.2+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.