Closed Bug 1575217 Opened 1 year ago Closed 1 year ago

AddressSanitizer: heap-use-after-free [@ operator bool] with READ of size 8 though [@ mozilla::net::SpdyConnectTransaction::CreateShimError] or Crash [@ mozilla::net::nsHttpTransaction::IsDone]


(Core :: Networking: WebSockets, defect, P1)




Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 70+ fixed
firefox69 --- wontfix
firefox70 --- fixed
firefox71 --- fixed


(Reporter: decoder, Assigned: mayhemer)


(Blocks 1 open bug)


(4 keywords, Whiteboard: [necko-triaged][post-critsmash-triage][adv-main70+][adv-main70+r][adv-esr68.2+][adv-esr68.2+r])

Crash Data


(5 files, 2 obsolete files)

The attached testcase crashed on mozilla-central revision 29e9dde37bd2 (build with --enable-tests --enable-address-sanitizer --disable-jemalloc --enable-optimize=-O2 --enable-fuzzing --disable-debug).

For detailed crash information, see attachment.

The issue is not reproducible anymore after I rebased all of my patches to tip (this needs the websockets target patch, which will soon be in bug 1528951, but it also needs the changes from bug 1566342 for HTTP2 and those only landed on m-c recently).

Note that I was not able to reproduce the use-after-free, but running the test 1000 times in a row crashed with a SEGV instead (I will post the second crash stack after filing). These are likely connected as each test that the fuzzer saved for the use-after-free issue shows this behavior. It is unlikely that this is a bug in the fuzzing target itself, especially now that the bug vanished after rebasing. My guess is that either something changed about timings that made it less likely to reproduce or we fixed this in another bug meanwhile (but I can't find any candiates for that).

Filing after discussing this with :kershaw on IRC, so we can investigate if this is still an issue.

Attached file Detailed Crash Information (obsolete) —
Attached file Testcase (obsolete) —

The SEGV crash I got instead of the use-after-free looks like this:

==19996==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000001b6 (pc 0x7f033eba7e95 bp 0x7f02b51f39f0 sp 0x7f02b51f39e0 T13)
==19996==The signal is caused by a READ memory access.
==19996==Hint: address points to the zero page.
    #0 0x7f033eba7e94 in mozilla::net::nsHttpTransaction::IsDone() netwerk/protocol/http/nsHttpTransaction.cpp:717:43
    #1 0x7f033ecbf5d5 in mozilla::net::SpdyConnectTransaction::WebsocketWriteSegments(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*) netwerk/protocol/http/TunnelUtils.cpp:1337:50
    #2 0x7f033ea78fc6 in mozilla::net::Http2Stream::WriteSegments(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*) netwerk/protocol/http/Http2Stream.cpp:307:31
    #3 0x7f033ea5e8f7 in mozilla::net::Http2Session::WriteSegmentsAgain(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*, bool*) netwerk/protocol/http/Http2Session.cpp:3417:33
    #4 0x7f033eb63d1e in mozilla::net::nsHttpConnection::OnSocketReadable() netwerk/protocol/http/nsHttpConnection.cpp:2101:24
    #5 0x7f033eb66b71 in mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) netwerk/protocol/http/nsHttpConnection.cpp:2444:17
    #6 0x7f033eb66f74 in non-virtual thunk to mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) netwerk/protocol/http/nsHttpConnection.cpp
    #7 0x7f033e026811 in mozilla::net::nsSocketInputStream::OnSocketReady(nsresult) netwerk/base/nsSocketTransport2.cpp:285:27
    #8 0x7f033e038122 in mozilla::net::nsSocketTransport::OnSocketReady(PRFileDesc*, short) netwerk/base/nsSocketTransport2.cpp:2235:14
    #9 0x7f033e048c85 in mozilla::net::nsSocketTransportService::DoPollIteration(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>*) netwerk/base/nsSocketTransportService2.cpp
    #10 0x7f033e047482 in mozilla::net::nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp:971:7
    #11 0x7f033e049aac in non-virtual thunk to mozilla::net::nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp
    #12 0x7f033dd116eb in nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1225:14
    #13 0x7f033dd17221 in NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:486:10
    #14 0x7f033f155838 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:303:20
    #15 0x7f033efef15f in RunInternal ipc/chromium/src/base/
    #16 0x7f033efef15f in RunHandler ipc/chromium/src/base/
    #17 0x7f033efef15f in MessageLoop::Run() ipc/chromium/src/base/
    #18 0x7f033dd0a96b in nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:459:11
    #19 0x7f0365b3fa28 in _pt_root nsprpub/pr/src/pthreads/ptthread.c:201:5
    #20 0x7f03657836da in start_thread (/lib/x86_64-linux-gnu/
    #21 0x7f036476188e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Regarding the UAF:
mTransaction is a refptr. Looks like SpdyConnectTransaction is Release()'d more times than AddRef()'ed.

The callstack between Http2Session::Shutdown and SpdyConnectTransaction::CreateShimError is likely going through:

I would be very interested in which recent change caused this crash to go away! It might help investigating.

Regarding the SEGV:
It's probably related. mDrivingTransaction is a refptr member of SpdyConnectTransaction. The crash has some similarities to the UAF, if the called-on object was freed, we should crash sooner already.

I was able to reproduce this issue once more by running the fuzzer on mozilla-central tip (+ websockets target patch). Attached is a new ASan log for mozilla-central rev e7e658ec1e98. I will also attach the new testcase, but it requires the patch for repoducing.

Attachment #9086657 - Attachment is obsolete: true
Attached file Updated Testcase

Here is an updated testcase that locally reproduced the use-after-free if I run it 10000 times in a row on the WebSocket fuzzing target.

Attachment #9086658 - Attachment is obsolete: true
The same testcase can also trigger a different variation with use-after-free [@ mozilla::net::DeleteHttpTransaction::Run], which looks like double-free'ing the HttpTransation.
Attachment #9086945 - Attachment description: Detailed Crash Information → Detailed Crash Information for Comment 7

I think we can act on this. It would be good to figure if this is websockets isolated or have a larger impact. P1 for baseline diagnoses only and attention.

Priority: -- → P1
Whiteboard: [necko-triaged]
Group: core-security → network-core-security

(In reply to Honza Bambas (:mayhemer) from comment #9)

I think we can act on this. It would be good to figure if this is websockets isolated or have a larger impact. P1 for baseline diagnoses only and attention.

From what I see, I think this is not only websocket releated. I think we can hit it with http2 proxies as well.

Honza, do you have time to take a look?

Flags: needinfo?(honzab.moz)
Assignee: nobody → nhnguyen

Taking this now.

Assignee: nhnguyen → honzab.moz
Flags: needinfo?(honzab.moz)

I was able to locally reproduce "a crash" with:

  • update to e7e658ec1e98
  • apply D42816 on top of it
  • build with comment 0 config (./mach build, ./mach gtest - just to build gtest libxul)
  • added TestWebsocketFuzzing.cpp to netwerk/test/fuzz/
  • export MOZ_LOG=sync,nsHttp:5 + file logging
  • `FUZZER=NetworkWebsocket LIBFUZZER=1 MOZ_RUN_GTEST=1 obj-opt/dist/bin/firefox -runs=10000 ../crash-72188709c9f1dc0a790c5b7a10da3372c1cbda56


==7831==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000001b6 (pc 0x7f80f1ea77c5 bp 0x7f80e1b679c0 sp 0x7f80e1b679b0 T13)
==7831==The signal is caused by a READ memory access.
==7831==Hint: address points to the zero page.
    #0 0x7f80f1ea77c4 in mozilla::net::nsHttpTransaction::IsDone() /home/cltbld/Mozilla/src/mozilla-unified/netwerk/protocol/http/nsHttpTransaction.cpp:698:43
    #1 0x7f80f1d3114e in mozilla::net::SpdyConnectTransaction::WebsocketWriteSegments(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*) /home/cltbld/Mozilla/src/mozilla-unified/netwerk/protocol/http/TunnelUtils.cpp:1393:50
    #2 0x7f80f1bcc796 in mozilla::net::Http2Stream::WriteSegments(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*) /home/cltbld/Mozilla/src/mozilla-unified/netwerk/protocol/http/Http2Stream.cpp:322:31

The log show we access SpdyConnectTransaction::mDrivingTransaction on multiple threads. This is 'illusion of atomic refcnt' problem.

Main thread does:

while the socket thread is between these two lines:

This concurrent access on a single RefPtr can cause various, hard to predict issues, specifically UAFs or "use-while-being-destroyed" (UWBD?) or other heap breakage. This is the worst possible race bug that I don't see the first time...

Access to this members needs a lock. I will fix it and try if there are not more issues here (more members to lock...)


Comment on attachment 9088716 [details]
Bug 1575217, r=michal!,dragana!

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: I think it's quite complicated. This is a tight race, hard to control, IMO.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: 65+
  • If not all supported branches, which bug introduced the flaw?: 1434137, 1451293
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: Easy, probably just about rebasing.
  • How likely is this patch to cause regressions; how much testing does it need?: Not likely. It's just moving reassignment operations to a single (socket) thread and adds (debug only) assertions for it, no new locks added, just dispatch.
Attachment #9088716 - Flags: sec-approval?

sec-approval+ for trunk. We'll want beta and ESR68 patches as well.

Attachment #9088716 - Flags: sec-approval? → sec-approval+

Comment on attachment 9088716 [details]
Bug 1575217, r=michal!,dragana!

Beta/Release Uplift Approval Request

  • User impact if declined: possible UAF
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): we are moving re-assignments/nullifications to a single thread (via dispatch). no new locks. logic single-thread racing (like nullified too soon/too late) should not occur as the nullification is made after the work has been done and (not) performing the branches specifically dependent on non-nullness are thus no longer necessary or have no effect.
  • String changes made/needed:

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  • User impact if declined: see above
  • Fix Landed on Version: 71
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): see above
  • String or UUID changes made by this patch:
Attachment #9088716 - Flags: approval-mozilla-esr68?
Attachment #9088716 - Flags: approval-mozilla-beta?
Group: network-core-security → core-security
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

Comment on attachment 9088716 [details]
Bug 1575217, r=michal!,dragana!

Fix for potential UAF, let's land for beta 4.

Attachment #9088716 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9088716 [details]
Bug 1575217, r=michal!,dragana!

Approved for 68.2esr also.

Attachment #9088716 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+
Group: core-security → core-security-release
Flags: qe-verify-
Whiteboard: [necko-triaged] → [necko-triaged][post-critsmash-triage]
Whiteboard: [necko-triaged][post-critsmash-triage] → [necko-triaged][post-critsmash-triage][adv-main70+][adv-main70-rollup]
Whiteboard: [necko-triaged][post-critsmash-triage][adv-main70+][adv-main70-rollup] → [necko-triaged][post-critsmash-triage][adv-main70+][adv-main70-rollup][adv-esr68.2+][adv-esr68.2-rollup]
Whiteboard: [necko-triaged][post-critsmash-triage][adv-main70+][adv-main70-rollup][adv-esr68.2+][adv-esr68.2-rollup] → [necko-triaged][post-critsmash-triage][adv-main70+][adv-main70+r][adv-esr68.2+][adv-esr68.2+r]

I remember originally reproducing this on TSan as well to confirm that the tool would have found this (and much easier) but apparently I never attached the log. So I reverted the fix locally and reproduced this again under TSan easily and it shows the data race on RefPtr as expected. It requires less than 10 runs to show up (compared to ASan having a 0.01% chance to find this). Adding the log for documentation purposes.

Group: core-security-release
You need to log in before you can comment on or make changes to this bug.