Closed Bug 1580288 Opened 5 years ago Closed 5 years ago

Crash [@ Length] through [@ mozilla::net::nsHttpTransaction::SetHttpTrailer] due to pointer race

Categories

(Core :: Networking: HTTP, defect, P1)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox-esr68 71+ fixed
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 + fixed
firefox72 + fixed

People

(Reporter: decoder, Assigned: CuveeHsu)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [adv-main71+r][necko-triaged][post-critsmash-triage][adv-esr68.3+r])

Crash Data

Attachments

(4 files)

The attached testcase crashes on mozilla-central revision 20190907-7f0c7ef868ff .

For detailed crash information, see attachment.

To reproduce the issue, perform the following steps:

  1. Download the attached testcase, save as "test.bin".
    2a. Build with --enable-fuzzing (requires Clang and ASan, also build gtests using ./mach gtest dontruntests).
    2b. Alternatively you can download builds from TC using python -mfuzzfetch -a --fuzzing --tests gtest (see https://github.com/MozillaSecurity/fuzzfetch).
  2. Run MOZ_RUN_GTEST=1 LIBFUZZER=1 FUZZER=NetworkHttp2 objdir/dist/bin/firefox -runs=10000 test.bin

This is not a regular null-deref but actually a thread race on an nsAutoPtr (this also explains why the -runs=10000 is required). Like other pointer races that we have seen before we should assume that this can cause a use-after-free with the right scheduling, unless we can explicitly rule that out. I will post a TSan trace after filing.

Attached file Detailed Crash Information —
Attached file Testcase —
Attached file tsan-log-f321dd9af076.txt —

TSan Log for the data race.

Flags: needinfo?(honzab.moz)

nsHttpTransaction::SetHttpTrailers must use the same locking as nsHttpTransaction::TakeResponseTrailers() uses. Need to add MutexAutoLock lock(*nsHttp::GetLock()); Great would be to keep the lock only when manipulating the member, best by performing the loop in a local array and then swap to the member under the lock.

P1 - easy fix.

Flags: needinfo?(honzab.moz)
Priority: -- → P1
Whiteboard: [necko-triaged]
Group: core-security → network-core-security

:decoder, since this bug is a regression, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.

Flags: needinfo?(choller)

This is probably a long-standing issue.

Flags: needinfo?(choller)
Assignee: nobody → juhsu

Removing the regression keyword as this is probably a long standing issues (see comment #36).

Keywords: regression
Attached file Bug 1580288, r=mayhemer —
Attachment #9097801 - Attachment description: Bug 1580288 - lock protected → Bug 1580288, r=mayhemer

Comment on attachment 9097801 [details]
Bug 1580288, r=mayhemer

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Not easy. It's a race condition which the timing is not easy to control. It's not revealed for a year until fuzzy.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: Yes
  • If not, how different, hard to create, and risky will they be?:
  • How likely is this patch to cause regressions; how much testing does it need?: should be low risk
Attachment #9097801 - Flags: sec-approval?

We're in release week so we're not going to land this for a couple of weeks - keeping flag.

Comment on attachment 9097801 [details]
Bug 1580288, r=mayhemer

Not a significant concern for exploitation - let's land now in central and beta whenever relman approves.

Attachment #9097801 - Flags: sec-approval? → sec-approval+
Keywords: checkin-needed

Backed out for xpcshell failure in netwerk/test/unit/test_header_Server_Timing.js:

https://hg.mozilla.org/integration/autoland/rev/c865edc95632478be74dad0b49f0e509915c606e

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&selectedJob=272911975&resultStatus=superseded%2Cusercancel%2Cretry%2Ctestfailed%2Cbusted%2Cexception&revision=f6e6598e402b1ea24740bf717b67f4d611dfce0f
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=272916906&repo=autoland

[task 2019-10-25T08:39:24.725Z] 08:39:24 INFO - TEST-START | netwerk/test/unit/test_header_Server_Timing.js
[task 2019-10-25T08:39:26.058Z] 08:39:26 WARNING - TEST-UNEXPECTED-FAIL | netwerk/test/unit/test_header_Server_Timing.js | xpcshell return code: -11
[task 2019-10-25T08:39:26.058Z] 08:39:26 INFO - TEST-INFO took 1335ms
[task 2019-10-25T08:39:26.058Z] 08:39:26 INFO - >>>>>>>
[task 2019-10-25T08:39:26.060Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.060Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.060Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.062Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.062Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.063Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.063Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file /builds/worker/workspace/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 2632
[task 2019-10-25T08:39:26.063Z] 08:39:26 INFO - PID 14845 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2019-10-25T08:39:26.063Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.064Z] 08:39:26 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2019-10-25T08:39:26.064Z] 08:39:26 INFO - (xpcshell/head.js) | test pending (2)
[task 2019-10-25T08:39:26.064Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, kKnownEsrVersion) failed with result 0x80004002: file /builds/worker/workspace/build/src/toolkit/components/resistfingerprinting/nsRFPService.cpp, line 661
[task 2019-10-25T08:39:26.064Z] 08:39:26 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
[task 2019-10-25T08:39:26.064Z] 08:39:26 INFO - running event loop
[task 2019-10-25T08:39:26.065Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: Could not get the program name for a cubeb stream.: 'NS_SUCCEEDED(rv)', file /builds/worker/workspace/build/src/dom/media/CubebUtils.cpp, line 381
[task 2019-10-25T08:39:26.065Z] 08:39:26 INFO - "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
[task 2019-10-25T08:39:26.065Z] 08:39:26 INFO - PID 14845 | ###!!! ERROR: Potential deadlock detected:
[task 2019-10-25T08:39:26.066Z] 08:39:26 INFO - PID 14845 | === Cyclical dependency starts at
[task 2019-10-25T08:39:26.066Z] 08:39:26 INFO - PID 14845 | --- Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:26.066Z] 08:39:26 INFO - PID 14845 | calling context
[task 2019-10-25T08:39:26.067Z] 08:39:26 INFO - PID 14845 | [stack trace unavailable]
[task 2019-10-25T08:39:26.068Z] 08:39:26 INFO - PID 14845 | === Cycle completed at
[task 2019-10-25T08:39:26.068Z] 08:39:26 INFO - PID 14845 | --- Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:26.068Z] 08:39:26 INFO - PID 14845 | calling context
[task 2019-10-25T08:39:26.069Z] 08:39:26 INFO - PID 14845 | [stack trace unavailable]
[task 2019-10-25T08:39:26.069Z] 08:39:26 INFO - PID 14845 | ###!!! Deadlock may happen NOW!
[task 2019-10-25T08:39:26.069Z] 08:39:26 INFO - PID 14845 | \x07[14845, Socket Thread] ###!!! ASSERTION: Potential deadlock detected:
[task 2019-10-25T08:39:26.070Z] 08:39:26 INFO - PID 14845 | Cyclical dependency starts at
[task 2019-10-25T08:39:26.070Z] 08:39:26 INFO - PID 14845 | Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:26.070Z] 08:39:26 INFO - PID 14845 | Cycle completed at
[task 2019-10-25T08:39:26.070Z] 08:39:26 INFO - PID 14845 | Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:26.071Z] 08:39:26 INFO - PID 14845 | ###!!! Deadlock may happen NOW!
[task 2019-10-25T08:39:26.072Z] 08:39:26 INFO - PID 14845 | : 'Error', file /builds/worker/workspace/build/src/xpcom/threads/BlockingResourceBase.cpp, line 249
[task 2019-10-25T08:39:42.409Z] 08:39:42 INFO - PID 14845 | #01: mozilla::OffTheBooksMutex::Lock() [xpcom/threads/BlockingResourceBase.cpp:319]
[task 2019-10-25T08:39:42.410Z] 08:39:42 INFO - PID 14845 | #02: mozilla::net::nsHttp::ResolveAtom(char const*) [netwerk/protocol/http/nsHttp.cpp:146]
[task 2019-10-25T08:39:42.410Z] 08:39:42 INFO - PID 14845 | #03: mozilla::net::nsHttpHeaderArray::ParseHeaderLine(nsTSubstring<char> const&, mozilla::net::nsHttpAtom*, nsTSubstring<char>, nsTSubstring<char>) [netwerk/protocol/http/nsHttpHeaderArray.cpp:356]
[task 2019-10-25T08:39:42.411Z] 08:39:42 INFO - PID 14845 | #04: mozilla::net::nsHttpTransaction::SetHttpTrailers(nsTString<char>&) [netwerk/protocol/http/nsHttpTransaction.cpp:2383]
[task 2019-10-25T08:39:42.411Z] 08:39:42 INFO - PID 14845 | #05: mozilla::net::Http2Stream::ConvertResponseTrailers(mozilla::net::Http2Decompressor*, nsTSubstring<char>&) [netwerk/protocol/http/Http2Stream.cpp:1154]
[task 2019-10-25T08:39:42.412Z] 08:39:42 INFO - PID 14845 | #06: mozilla::net::Http2Session::ResponseHeadersComplete() [netwerk/protocol/http/Http2Session.cpp:1497]
[task 2019-10-25T08:39:42.412Z] 08:39:42 INFO - PID 14845 | #07: mozilla::net::Http2Session::RecvHeaders(mozilla::net::Http2Session*) [netwerk/protocol/http/Http2Session.cpp:1469]
[task 2019-10-25T08:39:42.413Z] 08:39:42 INFO - PID 14845 | #08: mozilla::net::Http2Session::WriteSegmentsAgain(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*, bool*) [netwerk/protocol/http/Http2Session.cpp:3571]
[task 2019-10-25T08:39:42.413Z] 08:39:42 INFO - PID 14845 | #09: mozilla::net::nsHttpConnection::OnSocketReadable() [netwerk/protocol/http/nsHttpConnection.cpp:2134]
[task 2019-10-25T08:39:42.414Z] 08:39:42 INFO - PID 14845 | #10: mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) [netwerk/protocol/http/nsHttpConnection.cpp:2487]
[task 2019-10-25T08:39:42.414Z] 08:39:42 INFO - PID 14845 | #11: non-virtual thunk to mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) [netwerk/protocol/http/nsHttpConnection.cpp:0]
[task 2019-10-25T08:39:42.415Z] 08:39:42 INFO - PID 14845 | #12: mozilla::net::nsSocketInputStream::OnSocketReady(nsresult) [netwerk/base/nsSocketTransport2.cpp:0]
[task 2019-10-25T08:39:42.415Z] 08:39:42 INFO - PID 14845 | #13: mozilla::net::nsSocketTransport::OnSocketReady(PRFileDesc*, short) [netwerk/base/nsSocketTransport2.cpp:2255]
[task 2019-10-25T08:39:42.416Z] 08:39:42 INFO - PID 14845 | #14: mozilla::net::nsSocketTransportService::DoPollIteration(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) [netwerk/base/nsSocketTransportService2.cpp:0]
[task 2019-10-25T08:39:42.416Z] 08:39:42 INFO - PID 14845 | #15: mozilla::net::nsSocketTransportService::Run() [netwerk/base/nsSocketTransportService2.cpp:973]
[task 2019-10-25T08:39:42.416Z] 08:39:42 INFO - PID 14845 | #16: non-virtual thunk to mozilla::net::nsSocketTransportService::Run() [netwerk/base/nsSocketTransportService2.cpp:0]
[task 2019-10-25T08:39:42.417Z] 08:39:42 INFO - PID 14845 | #17: nsThread::ProcessNextEvent(bool, bool
) [xpcom/threads/nsThread.cpp:1225]
[task 2019-10-25T08:39:42.417Z] 08:39:42 INFO - PID 14845 | #18: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:486]
[task 2019-10-25T08:39:42.418Z] 08:39:42 INFO - PID 14845 | #19: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:333]
[task 2019-10-25T08:39:42.418Z] 08:39:42 INFO - PID 14845 | #20: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:291]
[task 2019-10-25T08:39:42.419Z] 08:39:42 INFO - PID 14845 | #21: nsThread::ThreadFunc(void*) [xpcom/threads/nsThread.cpp:460]
[task 2019-10-25T08:39:42.535Z] 08:39:42 INFO - PID 14845 | #22: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2019-10-25T08:39:42.536Z] 08:39:42 INFO - PID 14845 | #23: libpthread.so.0 + 0x76ba
[task 2019-10-25T08:39:42.537Z] 08:39:42 INFO - PID 14845 | #24: libc.so.6 + 0x10741d
[task 2019-10-25T08:39:42.537Z] 08:39:42 INFO - PID 14845 | #25: ??? (???:???)
[task 2019-10-25T08:39:42.538Z] 08:39:42 INFO - PID 14845 | [14845, Socket Thread] ###!!! ASSERTION: Potential deadlock detected:
[task 2019-10-25T08:39:42.539Z] 08:39:42 INFO - PID 14845 | Cyclical dependency starts at
[task 2019-10-25T08:39:42.539Z] 08:39:42 INFO - PID 14845 | Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:42.540Z] 08:39:42 INFO - PID 14845 | Cycle completed at
[task 2019-10-25T08:39:42.540Z] 08:39:42 INFO - PID 14845 | Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:42.541Z] 08:39:42 INFO - PID 14845 | ###!!! Deadlock may happen NOW!
[task 2019-10-25T08:39:42.541Z] 08:39:42 INFO - PID 14845 | : 'Error', file /builds/worker/workspace/build/src/xpcom/threads/BlockingResourceBase.cpp, line 249
[task 2019-10-25T08:39:42.542Z] 08:39:42 INFO - PID 14845 | Hit MOZ_CRASH() at /builds/worker/workspace/build/src/memory/mozalloc/mozalloc_abort.cpp:33
[task 2019-10-25T08:39:42.542Z] 08:39:42 INFO - PID 14845 | ExceptionHandler::GenerateDump cloned child 14865
[task 2019-10-25T08:39:42.542Z] 08:39:42 INFO - PID 14845 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2019-10-25T08:39:42.543Z] 08:39:42 INFO - PID 14845 | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2019-10-25T08:39:42.543Z] 08:39:42 INFO - <<<<<<<

Flags: needinfo?(juhsu)

double lock with ResolveAtom which protects sAtomTable.

This bug is to protect nsHttpTransaction::mResponseTrailersTaken and mForTakeResponseTrailers

My previous solution dodged the double lock.
Alternatively, we might introduce another lock instance like static Mutex* sTransactionLock

Flags: needinfo?(juhsu)

I'm now considering to use My previous solution since we already acquire locks so often in ResolveAtom . I did an experiment for BBC and we acquire lock on parent process main thread for > 500 times in ResolveAtom!

The number makes me feel it's a little over optimization for save one lock (and the critical section is so tiny, just copy an trailer array) for the second trailer header comes, given we only allow Server-Timing: at the moment. I don't have data, but I believe we might hit this around the possibility one out of a million requests.

Introduce another lock might be error prone, since the deadlock condition will be more complicated when we dance with two locks.

What do you think, Honza?

Flags: needinfo?(honzab.moz)

(I understand you didn't want to push to try for a sec bug, but running xpc tests in netwerk/test/unit locally is a good thing to do ;))

(In reply to Junior [:junior] from comment #15)

I'm now considering to use My previous solution since we already acquire locks so often in ResolveAtom . I did an experiment for BBC and we acquire lock on parent process main thread for > 500 times in ResolveAtom!

Like in one loop or for all request/response processing done during the page load? It may be just expected to behave like this, I believe we are entering a very large number of locks elsewhere too.

If you see a space to improve, though, can you please file a bug?

The number makes me feel it's a little over optimization for save one lock (and the critical section is so tiny, just copy an trailer array) for the second trailer header comes, given we only allow Server-Timing: at the moment. I don't have data, but I believe we might hit this around the possibility one out of a million requests.

Introduce another lock might be error prone, since the deadlock condition will be more complicated when we dance with two locks.

What do you think, Honza?

Go with your original solution! It's simple enough and it would be too complicated to come up with something else for such a small benefit. I'm always trying to save perf where it seems possible, here it's not. Thanks for exploring this!

Flags: needinfo?(honzab.moz)
Group: network-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72

Please nominate this for Beta and ESR68 approval when you get a chance.

Flags: needinfo?(juhsu)

Comment on attachment 9097801 [details]
Bug 1580288, r=mayhemer

Beta/Release Uplift Approval Request

  • User impact if declined: potential crash
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): small change
  • String changes made/needed:
Flags: needinfo?(juhsu)
Attachment #9097801 - Flags: approval-mozilla-beta?

Comment on attachment 9097801 [details]
Bug 1580288, r=mayhemer

Fixes a Necko sec issue, approved for 71.0b7 and 68.3esr.

Attachment #9097801 - Flags: approval-mozilla-esr68+
Attachment #9097801 - Flags: approval-mozilla-beta?
Attachment #9097801 - Flags: approval-mozilla-beta+
Flags: qe-verify-
Whiteboard: [necko-triaged] → [necko-triaged][post-critsmash-triage]
Whiteboard: [necko-triaged][post-critsmash-triage] → [adv-main71+r][necko-triaged][post-critsmash-triage]
Whiteboard: [adv-main71+r][necko-triaged][post-critsmash-triage] → [adv-main71+r][necko-triaged][post-critsmash-triage][adv-esr68.3+r]
Blocks: tsan
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: