Crash [@ Length] through [@ mozilla::net::nsHttpTransaction::SetHttpTrailer] due to pointer race
Categories
(Core :: Networking: HTTP, defect, P1)
Tracking
()
People
(Reporter: decoder, Assigned: CuveeHsu)
References
(Blocks 1 open bug)
Details
(4 keywords, Whiteboard: [adv-main71+r][necko-triaged][post-critsmash-triage][adv-esr68.3+r])
Crash Data
Attachments
(4 files)
7.91 KB,
text/plain
|
Details | |
4.00 KB,
application/octet-stream
|
Details | |
20.01 KB,
text/plain
|
Details | |
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr68+
tjr
:
sec-approval+
|
Details | Review |
The attached testcase crashes on mozilla-central revision 20190907-7f0c7ef868ff .
For detailed crash information, see attachment.
To reproduce the issue, perform the following steps:
- Download the attached testcase, save as "test.bin".
2a. Build with --enable-fuzzing (requires Clang and ASan, also build gtests using./mach gtest dontruntests
).
2b. Alternatively you can download builds from TC usingpython -mfuzzfetch -a --fuzzing --tests gtest
(see https://github.com/MozillaSecurity/fuzzfetch). - Run MOZ_RUN_GTEST=1 LIBFUZZER=1 FUZZER=NetworkHttp2 objdir/dist/bin/firefox -runs=10000 test.bin
This is not a regular null-deref but actually a thread race on an nsAutoPtr (this also explains why the -runs=10000 is required). Like other pointer races that we have seen before we should assume that this can cause a use-after-free with the right scheduling, unless we can explicitly rule that out. I will post a TSan trace after filing.
Reporter | ||
Comment 1•5 years ago
|
||
Reporter | ||
Comment 2•5 years ago
|
||
Comment 4•5 years ago
|
||
nsHttpTransaction::SetHttpTrailers
must use the same locking as nsHttpTransaction::TakeResponseTrailers()
uses. Need to add MutexAutoLock lock(*nsHttp::GetLock());
Great would be to keep the lock only when manipulating the member, best by performing the loop in a local array and then swap to the member under the lock.
P1 - easy fix.
Updated•5 years ago
|
Comment 5•5 years ago
|
||
:decoder, since this bug is a regression, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.
Updated•5 years ago
|
Comment 7•5 years ago
|
||
Removing the regression keyword as this is probably a long standing issues (see comment #36).
Assignee | ||
Comment 8•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 9•5 years ago
|
||
Comment on attachment 9097801 [details]
Bug 1580288, r=mayhemer
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Not easy. It's a race condition which the timing is not easy to control. It's not revealed for a year until fuzzy.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: all
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: Yes
- If not, how different, hard to create, and risky will they be?:
- How likely is this patch to cause regressions; how much testing does it need?: should be low risk
Comment 10•5 years ago
|
||
We're in release week so we're not going to land this for a couple of weeks - keeping flag.
Updated•5 years ago
|
Comment 11•5 years ago
|
||
Comment on attachment 9097801 [details]
Bug 1580288, r=mayhemer
Not a significant concern for exploitation - let's land now in central and beta whenever relman approves.
Assignee | ||
Updated•5 years ago
|
Comment 12•5 years ago
|
||
Comment 13•5 years ago
|
||
Backed out for xpcshell failure in netwerk/test/unit/test_header_Server_Timing.js:
https://hg.mozilla.org/integration/autoland/rev/c865edc95632478be74dad0b49f0e509915c606e
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&selectedJob=272911975&resultStatus=superseded%2Cusercancel%2Cretry%2Ctestfailed%2Cbusted%2Cexception&revision=f6e6598e402b1ea24740bf717b67f4d611dfce0f
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=272916906&repo=autoland
[task 2019-10-25T08:39:24.725Z] 08:39:24 INFO - TEST-START | netwerk/test/unit/test_header_Server_Timing.js
[task 2019-10-25T08:39:26.058Z] 08:39:26 WARNING - TEST-UNEXPECTED-FAIL | netwerk/test/unit/test_header_Server_Timing.js | xpcshell return code: -11
[task 2019-10-25T08:39:26.058Z] 08:39:26 INFO - TEST-INFO took 1335ms
[task 2019-10-25T08:39:26.058Z] 08:39:26 INFO - >>>>>>>
[task 2019-10-25T08:39:26.060Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.060Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.060Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.061Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.062Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.062Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.063Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.063Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file /builds/worker/workspace/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 2632
[task 2019-10-25T08:39:26.063Z] 08:39:26 INFO - PID 14845 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2019-10-25T08:39:26.063Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: 'result.isErr()', file /builds/worker/workspace/build/src/startupcache/StartupCache.cpp, line 173
[task 2019-10-25T08:39:26.064Z] 08:39:26 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2019-10-25T08:39:26.064Z] 08:39:26 INFO - (xpcshell/head.js) | test pending (2)
[task 2019-10-25T08:39:26.064Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, kKnownEsrVersion) failed with result 0x80004002: file /builds/worker/workspace/build/src/toolkit/components/resistfingerprinting/nsRFPService.cpp, line 661
[task 2019-10-25T08:39:26.064Z] 08:39:26 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
[task 2019-10-25T08:39:26.064Z] 08:39:26 INFO - running event loop
[task 2019-10-25T08:39:26.065Z] 08:39:26 INFO - PID 14845 | [14845, Main Thread] WARNING: Could not get the program name for a cubeb stream.: 'NS_SUCCEEDED(rv)', file /builds/worker/workspace/build/src/dom/media/CubebUtils.cpp, line 381
[task 2019-10-25T08:39:26.065Z] 08:39:26 INFO - "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
[task 2019-10-25T08:39:26.065Z] 08:39:26 INFO - PID 14845 | ###!!! ERROR: Potential deadlock detected:
[task 2019-10-25T08:39:26.066Z] 08:39:26 INFO - PID 14845 | === Cyclical dependency starts at
[task 2019-10-25T08:39:26.066Z] 08:39:26 INFO - PID 14845 | --- Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:26.066Z] 08:39:26 INFO - PID 14845 | calling context
[task 2019-10-25T08:39:26.067Z] 08:39:26 INFO - PID 14845 | [stack trace unavailable]
[task 2019-10-25T08:39:26.068Z] 08:39:26 INFO - PID 14845 | === Cycle completed at
[task 2019-10-25T08:39:26.068Z] 08:39:26 INFO - PID 14845 | --- Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:26.068Z] 08:39:26 INFO - PID 14845 | calling context
[task 2019-10-25T08:39:26.069Z] 08:39:26 INFO - PID 14845 | [stack trace unavailable]
[task 2019-10-25T08:39:26.069Z] 08:39:26 INFO - PID 14845 | ###!!! Deadlock may happen NOW!
[task 2019-10-25T08:39:26.069Z] 08:39:26 INFO - PID 14845 | \x07[14845, Socket Thread] ###!!! ASSERTION: Potential deadlock detected:
[task 2019-10-25T08:39:26.070Z] 08:39:26 INFO - PID 14845 | Cyclical dependency starts at
[task 2019-10-25T08:39:26.070Z] 08:39:26 INFO - PID 14845 | Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:26.070Z] 08:39:26 INFO - PID 14845 | Cycle completed at
[task 2019-10-25T08:39:26.070Z] 08:39:26 INFO - PID 14845 | Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:26.071Z] 08:39:26 INFO - PID 14845 | ###!!! Deadlock may happen NOW!
[task 2019-10-25T08:39:26.072Z] 08:39:26 INFO - PID 14845 | : 'Error', file /builds/worker/workspace/build/src/xpcom/threads/BlockingResourceBase.cpp, line 249
[task 2019-10-25T08:39:42.409Z] 08:39:42 INFO - PID 14845 | #01: mozilla::OffTheBooksMutex::Lock() [xpcom/threads/BlockingResourceBase.cpp:319]
[task 2019-10-25T08:39:42.410Z] 08:39:42 INFO - PID 14845 | #02: mozilla::net::nsHttp::ResolveAtom(char const*) [netwerk/protocol/http/nsHttp.cpp:146]
[task 2019-10-25T08:39:42.410Z] 08:39:42 INFO - PID 14845 | #03: mozilla::net::nsHttpHeaderArray::ParseHeaderLine(nsTSubstring<char> const&, mozilla::net::nsHttpAtom*, nsTSubstring<char>, nsTSubstring<char>) [netwerk/protocol/http/nsHttpHeaderArray.cpp:356]
[task 2019-10-25T08:39:42.411Z] 08:39:42 INFO - PID 14845 | #04: mozilla::net::nsHttpTransaction::SetHttpTrailers(nsTString<char>&) [netwerk/protocol/http/nsHttpTransaction.cpp:2383]
[task 2019-10-25T08:39:42.411Z] 08:39:42 INFO - PID 14845 | #05: mozilla::net::Http2Stream::ConvertResponseTrailers(mozilla::net::Http2Decompressor*, nsTSubstring<char>&) [netwerk/protocol/http/Http2Stream.cpp:1154]
[task 2019-10-25T08:39:42.412Z] 08:39:42 INFO - PID 14845 | #06: mozilla::net::Http2Session::ResponseHeadersComplete() [netwerk/protocol/http/Http2Session.cpp:1497]
[task 2019-10-25T08:39:42.412Z] 08:39:42 INFO - PID 14845 | #07: mozilla::net::Http2Session::RecvHeaders(mozilla::net::Http2Session*) [netwerk/protocol/http/Http2Session.cpp:1469]
[task 2019-10-25T08:39:42.413Z] 08:39:42 INFO - PID 14845 | #08: mozilla::net::Http2Session::WriteSegmentsAgain(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*, bool*) [netwerk/protocol/http/Http2Session.cpp:3571]
[task 2019-10-25T08:39:42.413Z] 08:39:42 INFO - PID 14845 | #09: mozilla::net::nsHttpConnection::OnSocketReadable() [netwerk/protocol/http/nsHttpConnection.cpp:2134]
[task 2019-10-25T08:39:42.414Z] 08:39:42 INFO - PID 14845 | #10: mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) [netwerk/protocol/http/nsHttpConnection.cpp:2487]
[task 2019-10-25T08:39:42.414Z] 08:39:42 INFO - PID 14845 | #11: non-virtual thunk to mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) [netwerk/protocol/http/nsHttpConnection.cpp:0]
[task 2019-10-25T08:39:42.415Z] 08:39:42 INFO - PID 14845 | #12: mozilla::net::nsSocketInputStream::OnSocketReady(nsresult) [netwerk/base/nsSocketTransport2.cpp:0]
[task 2019-10-25T08:39:42.415Z] 08:39:42 INFO - PID 14845 | #13: mozilla::net::nsSocketTransport::OnSocketReady(PRFileDesc*, short) [netwerk/base/nsSocketTransport2.cpp:2255]
[task 2019-10-25T08:39:42.416Z] 08:39:42 INFO - PID 14845 | #14: mozilla::net::nsSocketTransportService::DoPollIteration(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) [netwerk/base/nsSocketTransportService2.cpp:0]
[task 2019-10-25T08:39:42.416Z] 08:39:42 INFO - PID 14845 | #15: mozilla::net::nsSocketTransportService::Run() [netwerk/base/nsSocketTransportService2.cpp:973]
[task 2019-10-25T08:39:42.416Z] 08:39:42 INFO - PID 14845 | #16: non-virtual thunk to mozilla::net::nsSocketTransportService::Run() [netwerk/base/nsSocketTransportService2.cpp:0]
[task 2019-10-25T08:39:42.417Z] 08:39:42 INFO - PID 14845 | #17: nsThread::ProcessNextEvent(bool, bool) [xpcom/threads/nsThread.cpp:1225]
[task 2019-10-25T08:39:42.417Z] 08:39:42 INFO - PID 14845 | #18: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:486]
[task 2019-10-25T08:39:42.418Z] 08:39:42 INFO - PID 14845 | #19: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:333]
[task 2019-10-25T08:39:42.418Z] 08:39:42 INFO - PID 14845 | #20: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:291]
[task 2019-10-25T08:39:42.419Z] 08:39:42 INFO - PID 14845 | #21: nsThread::ThreadFunc(void*) [xpcom/threads/nsThread.cpp:460]
[task 2019-10-25T08:39:42.535Z] 08:39:42 INFO - PID 14845 | #22: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2019-10-25T08:39:42.536Z] 08:39:42 INFO - PID 14845 | #23: libpthread.so.0 + 0x76ba
[task 2019-10-25T08:39:42.537Z] 08:39:42 INFO - PID 14845 | #24: libc.so.6 + 0x10741d
[task 2019-10-25T08:39:42.537Z] 08:39:42 INFO - PID 14845 | #25: ??? (???:???)
[task 2019-10-25T08:39:42.538Z] 08:39:42 INFO - PID 14845 | [14845, Socket Thread] ###!!! ASSERTION: Potential deadlock detected:
[task 2019-10-25T08:39:42.539Z] 08:39:42 INFO - PID 14845 | Cyclical dependency starts at
[task 2019-10-25T08:39:42.539Z] 08:39:42 INFO - PID 14845 | Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:42.540Z] 08:39:42 INFO - PID 14845 | Cycle completed at
[task 2019-10-25T08:39:42.540Z] 08:39:42 INFO - PID 14845 | Mutex : nsHttp.sLock (currently acquired)
[task 2019-10-25T08:39:42.541Z] 08:39:42 INFO - PID 14845 | ###!!! Deadlock may happen NOW!
[task 2019-10-25T08:39:42.541Z] 08:39:42 INFO - PID 14845 | : 'Error', file /builds/worker/workspace/build/src/xpcom/threads/BlockingResourceBase.cpp, line 249
[task 2019-10-25T08:39:42.542Z] 08:39:42 INFO - PID 14845 | Hit MOZ_CRASH() at /builds/worker/workspace/build/src/memory/mozalloc/mozalloc_abort.cpp:33
[task 2019-10-25T08:39:42.542Z] 08:39:42 INFO - PID 14845 | ExceptionHandler::GenerateDump cloned child 14865
[task 2019-10-25T08:39:42.542Z] 08:39:42 INFO - PID 14845 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2019-10-25T08:39:42.543Z] 08:39:42 INFO - PID 14845 | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2019-10-25T08:39:42.543Z] 08:39:42 INFO - <<<<<<<
Assignee | ||
Comment 14•5 years ago
|
||
double lock with ResolveAtom which protects sAtomTable
.
This bug is to protect nsHttpTransaction::mResponseTrailersTaken
and mForTakeResponseTrailers
My previous solution dodged the double lock.
Alternatively, we might introduce another lock instance like static Mutex* sTransactionLock
Assignee | ||
Comment 15•5 years ago
|
||
I'm now considering to use My previous solution since we already acquire locks so often in ResolveAtom . I did an experiment for BBC and we acquire lock on parent process main thread for > 500 times in ResolveAtom
!
The number makes me feel it's a little over optimization for save one lock (and the critical section is so tiny, just copy an trailer array) for the second trailer header comes, given we only allow Server-Timing:
at the moment. I don't have data, but I believe we might hit this around the possibility one out of a million requests.
Introduce another lock might be error prone, since the deadlock condition will be more complicated when we dance with two locks.
What do you think, Honza?
Comment 16•5 years ago
|
||
(I understand you didn't want to push to try for a sec bug, but running xpc tests in netwerk/test/unit
locally is a good thing to do ;))
(In reply to Junior [:junior] from comment #15)
I'm now considering to use My previous solution since we already acquire locks so often in ResolveAtom . I did an experiment for BBC and we acquire lock on parent process main thread for > 500 times in
ResolveAtom
!
Like in one loop or for all request/response processing done during the page load? It may be just expected to behave like this, I believe we are entering a very large number of locks elsewhere too.
If you see a space to improve, though, can you please file a bug?
The number makes me feel it's a little over optimization for save one lock (and the critical section is so tiny, just copy an trailer array) for the second trailer header comes, given we only allow
Server-Timing:
at the moment. I don't have data, but I believe we might hit this around the possibility one out of a million requests.Introduce another lock might be error prone, since the deadlock condition will be more complicated when we dance with two locks.
What do you think, Honza?
Go with your original solution! It's simple enough and it would be too complicated to come up with something else for such a small benefit. I'm always trying to save perf where it seems possible, here it's not. Thanks for exploring this!
Comment 17•5 years ago
|
||
Comment 18•5 years ago
|
||
Comment 19•5 years ago
|
||
Please nominate this for Beta and ESR68 approval when you get a chance.
Assignee | ||
Comment 20•5 years ago
|
||
Comment on attachment 9097801 [details]
Bug 1580288, r=mayhemer
Beta/Release Uplift Approval Request
- User impact if declined: potential crash
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): small change
- String changes made/needed:
Comment 21•5 years ago
|
||
Comment on attachment 9097801 [details]
Bug 1580288, r=mayhemer
Fixes a Necko sec issue, approved for 71.0b7 and 68.3esr.
Updated•5 years ago
|
Comment 22•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Description
•