Closed Bug 1400554 Opened 2 years ago Closed 2 years ago

Crash in mozilla::net::nsHttpConnection::EnsureNPNComplete

Categories

(Core :: Networking: HTTP, defect, P2, critical)

52 Branch
x86
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 57+ fixed
firefox55 --- wontfix
firefox56 + wontfix
firefox57 + fixed
firefox58 + fixed

People

(Reporter: jesup, Assigned: dragana)

Details

(4 keywords, Whiteboard: [necko-triaged][adv-main57+][adv-esr52.5+][post-critsmash-triage])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-bb6ca798-7d2a-4fe7-b180-674950170827.
=============================================================

UAF crash in EnsureNPNComplete() - crashes appear to start in 52, and continue into 57.  Some of the crashes show clear UAF signatures, such as this one (which even worse is an EXEC of a UAF address).  It appears that something has deleted the nsHttpConnection, apparently while this is the middle of executing (they're not all at the start, they're scattered throughout the function, though most are at/near the beginning).  The implication would be that it was freed on another thread, though that's not guaranteed.

if you look at a 57 crash, you can see it's crashing on SocketThread:
https://crash-stats.mozilla.com/report/index/59cbc88a-c387-4383-8587-0422a0170912
Group: core-security
This was a low-level crash (~1 a day) for ages that jumped to 5-15 a day toward the end of June (we shipped Firefox 54 on June 13).

The bulk of the crashes are EXCEPTION_ACCESS_VIOLATION_READ at 0x0 which isn't as troubling. If you strip those out there rest are pretty troubling -- UAFs, execs and writes to random addresses, etc.
Group: core-security → network-core-security
->jduell to triage
Flags: needinfo?(jduell.mcbugs)
Track 56+/57+ as sec-critical.
Patrick, looks like we have one crash report here for 56 dev edition. 
Can you help find someone to investigate? Thanks.
Flags: needinfo?(mcmanus)
Flags: needinfo?(mcmanus)
not sure where to start, but plausibly some of the tfo arrangements..
Flags: needinfo?(dd.mozilla)
Dragana, see comment 5.
Flags: needinfo?(jduell.mcbugs)
This is what I think is happening:

we cancel transaction and set mTransaction to nullptr.

TLSFilterTransaction is closed but it does not cancel the timer.

The timer is canceled in Cleanup() which gets call during TLSFilterTransaction destructor, but the destructor will not be called ever because the timer hold reference to the TLSFilterTransaction (so that part of the code in Cleanup() is not useful).
Flags: needinfo?(dd.mozilla)
Assignee: nobody → dd.mozilla
Status: NEW → ASSIGNED
Attachment #8914680 - Flags: review?(mcmanus)
Comment on attachment 8914680 [details] [diff] [review]
bug_1400554_v1.patch

Review of attachment 8914680 [details] [diff] [review]:
-----------------------------------------------------------------

ty!
Attachment #8914680 - Flags: review?(mcmanus) → review+
Comment on attachment 8914680 [details] [diff] [review]
bug_1400554_v1.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch? The pointer is nulled and memory is not freed. so to exploitable. This is true for crashes in 56 and 55. There are some older crashes.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? the patch is self.

Which older supported branches are affected by this flaw? 57, 56, 52esr. This is very old problem - bug 378637.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? The patch will apply on old branches as well. The patch is not risky.

How likely is this patch to cause regressions; how much testing does it need? no risk. The patch only cancels a timer if the tunnel is closed.
Attachment #8914680 - Flags: sec-approval?
> The pointer is nulled and memory is not freed. so to exploitable. 

This is a little confusing.  The current bug shows clear UAF signatures, and even an EXEC UAF, which is very bad.  The patch cancels the timer and nulls the pointer.  Based on the patch someone would know that whatever bug we're fixing would involve that timer.  It would not be simple to figure out what that would cause, or connect to the crash signatures here.
(In reply to Randell Jesup [:jesup] from comment #11)
> > The pointer is nulled and memory is not freed. so to exploitable. 
> 
> This is a little confusing.  The current bug shows clear UAF signatures, and
> even an EXEC UAF, which is very bad.  The patch cancels the timer and nulls
> the pointer.  Based on the patch someone would know that whatever bug we're
> fixing would involve that timer.  It would not be simple to figure out what
> that would cause, or connect to the crash signatures here.

I have check all crashes for 56 and the most for 55 from last week, I have not seen any uaf.
only crash at address 0 (mTransaction == nullptr):
0 	xul.dll 	mozilla::net::nsHttpConnection::EnsureNPNComplete(nsresult&, unsigned int&) 	netwerk/protocol/http/nsHttpConnection.cpp:385
1 	xul.dll 	mozilla::net::nsHttpConnection::OnSocketWritable() 	netwerk/protocol/http/nsHttpConnection.cpp:1696
2 	xul.dll 	mozilla::net::nsHttpConnection::OnTunnelNudged(mozilla::net::TLSFilterTransaction*) 	netwerk/protocol/http/nsHttpConnection.cpp:574
3 	xul.dll 	mozilla::net::TLSFilterTransaction::StartTimerCallback() 	netwerk/protocol/http/TunnelUtils.cpp:460
4 	xul.dll 	mozilla::net::TLSFilterTransaction::Notify(nsITimer*) 	netwerk/protocol/http/TunnelUtils.cpp:445


I looked older crashes and I see some. I think that is a separate problem. I will investigate further.
There are several for 57 in the last month with UAF signatures, such as https://crash-stats.mozilla.com/report/index/59cbc88a-c387-4383-8587-0422a0170912 (UAF EXEC crash)
Thanks for explaining the situation.

I note that there are a total; of 3 crashes for 57; two appear to be the same install/crash with the EXEC crash.  The other 57 crash was a nullptr, from a considerably earlier build (there's been at least one change landed to nsHttpConnection.cpp between the 56beta/early57 code and the late 57nightly (EXEC crash); that might be a clue.  However, there are other non-nullptr crashes in 56 and 55, etc, though it's *usually* nullptr.  And these all are at the same line of that file, so normally I'd assume they're the same bug.

If there's no way this bug (and patch) could be the cause of the non-nullptr crashes, then please mark the bug as leave-open so we don't lose track and think the sec-critical is fixed.  Thanks
once again a summary:

55 and earlier:
there are 2 kind of crashes: crashes with address 0 and uaf crashes.
The crashes with address 0 will be resolved with the patch in this bug. They are not uaf, the reference to HttpConnection is held by TLSFilterTransaction.
The uaf crashes: on 56 I have rewrote some part of HttpConnectionMgr. That code is very very old. I have added some ref counting that was not there. I believe that is the reason why we do not see the uaf crashes on 56 (There were more uaf crashes involving httpConnnection with different signatures that I believed should be fixed with the changes I made to HttpConnectionMgr).


56 has only crashes on address 0 involving TLSFilterTransaction and a timer and they should be fixed with this bug.

57 has 3 crashes: 1 on address 0 (should be fixed with this patch) and 2 uaf crashes. The uaf are EXEC crashes involving only one instance and happening 6seconds apart. I do not know what cause them. The build id is 20170912013600. We did some changes involving httpConnectionMgr scheduling and some follow up bug involving that code were fixed around that time. I will leave this bug open and observe if uaf crashes appear again.
Priority: -- → P2
Whiteboard: [necko-triaged]
Tracking 58+ for this sec-critical issue.
Comment on attachment 8914680 [details] [diff] [review]
bug_1400554_v1.patch

sec-approval+ for trunk.
Please nominate for beta and ESR52.
Attachment #8914680 - Flags: sec-approval? → sec-approval+
Comment on attachment 8914680 [details] [diff] [review]
bug_1400554_v1.patch

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: crash
Fix Landed on Version: 58
Risk to taking this patch (and alternatives if risky): no risk. the ptch cancels a timer.
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Approval Request Comment
[Feature/Bug causing the regression]: bug 378637 - Http proxy over https
[User impact if declined]: crash
[Is this code covered by automated tests?]: no, it is a timing issue and it is hard to reproduce it.
[Has the fix been verified in Nightly?]: The crashes are rare on Nightly it will be hard to verify it.
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: no risk. the ptch cancels a timer.
[String changes made/needed]: none
Attachment #8914680 - Flags: approval-mozilla-esr52?
Attachment #8914680 - Flags: approval-mozilla-beta?
Comment on attachment 8914680 [details] [diff] [review]
bug_1400554_v1.patch

Sec-crit, Beta57+, ESR52+
Attachment #8914680 - Flags: approval-mozilla-esr52?
Attachment #8914680 - Flags: approval-mozilla-esr52+
Attachment #8914680 - Flags: approval-mozilla-beta?
Attachment #8914680 - Flags: approval-mozilla-beta+
Is there more to do in this bug? The Fx58 development cycle is ending soon.
Flags: needinfo?(dd.mozilla)
From a tracking standpoint, it'd probably be easier to close this bug and move any follow-up work to a new one.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #23)
> Is there more to do in this bug? The Fx58 development cycle is ending soon.

There were 2 different crashes with this signature. One was fixed with this patch and the other was fix by the rework that I did in Nightly 56. But there are still 3 crashes of the second version: 2 on Nightly57 on the same instance and on on Nightly 58. The are so rare I am not sure if we are able to track them down.

I left this bug open to follow the second time of crashes. Anyway I can close this bug and open another one.
Flags: needinfo?(dd.mozilla)
The follow up bug is 1412887.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Keywords: leave-open
Target Milestone: --- → mozilla58
Whiteboard: [necko-triaged] → [necko-triaged][adv-main57+][adv-esr52.5+]
Group: network-core-security → core-security-release
Flags: qe-verify-
Whiteboard: [necko-triaged][adv-main57+][adv-esr52.5+] → [necko-triaged][adv-main57+][adv-esr52.5+][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.