Crash in mozilla::net::Http2Session::ReadSegmentsAgain

RESOLVED FIXED in Firefox 48

Status

()

defect
--
critical
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: philipp, Assigned: mcmanus)

Tracking

(4 keywords)

45 Branch
mozilla50
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox47 wontfix, firefox48+ fixed, firefox49+ fixed, firefox-esr4548+ fixed, firefox50+ fixed)

Details

(Whiteboard: [spdy][necko-active][adv-main48+][adv-esr45.3+], crash signature)

Attachments

(1 attachment)

This bug was filed from the Socorro interface and is 
report bp-5679b845-1c0e-411e-b90a-5e96e2160701.
=============================================================
Crashing Thread (6)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::net::Http2Session::ReadSegmentsAgain(mozilla::net::nsAHttpSegmentReader*, unsigned int, unsigned int*, bool*) 	netwerk/protocol/http/Http2Session.cpp:2302
1 	xul.dll 	mozilla::net::nsHttpConnection::OnSocketWritable() 	netwerk/protocol/http/nsHttpConnection.cpp:1603
2 	xul.dll 	mozilla::net::nsHttpConnection::OnOutputStreamReady(nsIAsyncOutputStream*) 	netwerk/protocol/http/nsHttpConnection.cpp:2109
3 	xul.dll 	nsSocketOutputStream::OnSocketReady(nsresult) 	netwerk/base/nsSocketTransport2.cpp:552

this crash signature seems to be regressing since firefox 45 - primarily on fennec so far.

in the last few days it's also significantly rising on firefox desktop, in a manual spot-check of affected crash reports it seems that they are those users are having the addon firefox@zenmate.com https://addons.mozilla.org/firefox/addon/zenmate-security-privacy-vpn/ (300k users) present. the 5.5.3 update of the extension on june 28 is coinciding with the crash spike on firefox desktop.
Whiteboard: [spdy][necko-next]
The developer has been notified about this.
the [@ mozilla::net::TLSFilterTransaction::WriteSegments ] signature (bug 1170928) seems to be increasing at the same time.
[Tracking Requested - why for this release]:
on 48.0b5 this and bug 1170928 are now 4% of all crashes.
Tracking as this is currently the #3 topcrash in beta.
Assignee: nobody → mcmanus
Whiteboard: [spdy][necko-next] → [spdy][necko-active]
I've started looking at zenmate and it uses h2 proxying - so this is very likely a dup of 1170928. I can leave them both open for the moment.

actively investigating - don't have a reliable str yet but just getting going
the stream in question is a h2 tunnel.. after the stream is removed from the pending queues in session::closestream it is readded by the stack below.. shortly thereafter it is destroyed when removed from the transaction hash.. leaving the dangling raw pointer to cause this crash..

the obvious fix of not clearing the pending queues until after stream->close has been called seems to hang the tunnel rather than crashing.

#0  mozilla::net::Http2Session::TransactionHasDataToWrite (this=0x7f5a6aeea000, caller=0x7f5a76b4ea90) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/Http2Session.cpp:3584
#1  0x00007f5aa4973dc1 in mozilla::net::OutputStreamShim::AsyncWait (this=0x7f5a74cd3130, callback=0x0, target=0x0) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/TunnelUtils.cpp:1311
#2  0x00007f5aa49afa26 in mozilla::net::nsHttpConnection::Close (this=0x7f5a77582780, reason=nsresult::NS_ERROR_FAILURE, aIsShutdown=false) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/nsHttpConnection.cpp:619
#3  0x00007f5aa49ae29f in mozilla::net::nsHttpConnection::CloseTransaction (this=0x7f5a77582780, trans=0x7f5a773f7b80, reason=nsresult::NS_ERROR_FAILURE) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/nsHttpConnection.cpp:1521
#4  0x00007f5aa49b4239 in mozilla::net::nsHttpConnection::OnInputStreamReady (this=0x7f5a77582780, in=0x7f5a74cd30a0) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/nsHttpConnection.cpp:2100
#5  0x00007f5aa4972ec8 in mozilla::net::SpdyConnectTransaction::CreateShimError (this=0x7f5a76b4ea90, code=nsresult::NS_ERROR_NET_PARTIAL_TRANSFER) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/TunnelUtils.cpp:1211
#6  0x00007f5aa4973b66 in mozilla::net::SpdyConnectTransaction::Close (this=0x7f5a76b4ea90, code=nsresult::NS_ERROR_NET_PARTIAL_TRANSFER) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/TunnelUtils.cpp:1280
#7  0x00007f5aa48f1289 in mozilla::net::Http2Stream::Close (this=0x7f5a87c55e20, reason=nsresult::NS_ERROR_NET_PARTIAL_TRANSFER) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/Http2Stream.cpp:1082
#8  0x00007f5aa48ece38 in mozilla::net::Http2Session::CloseStream (this=0x7f5a6aeea000, aStream=0x7f5a87c55e20, aResult=nsresult::NS_ERROR_NET_PARTIAL_TRANSFER) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/Http2Session.cpp:1125
#9  0x00007f5aa48ecbe0 in mozilla::net::Http2Session::Shutdown (this=0x7f5a6aeea000) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/Http2Session.cpp:155
#10 0x00007f5aa48ed9b8 in mozilla::net::Http2Session::Close (this=0x7f5a6aeea000, aReason=nsresult::NS_ERROR_NET_TIMEOUT) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/Http2Session.cpp:3057
#11 0x00007f5aa48ed694 in mozilla::net::Http2Session::ReadTimeoutTick (this=0x7f5a6aeea000, now=24055185) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/Http2Session.cpp:284
#12 0x00007f5aa49b1e9e in mozilla::net::nsHttpConnection::ReadTimeoutTick (this=0x7f5a751bd800, now=24055185) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/nsHttpConnection.cpp:1159
#13 0x00007f5aa49b763b in mozilla::net::nsHttpConnectionMgr::TimeoutTick (this=0x7f5a90190b40) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/nsHttpConnectionMgr.cpp:2773
#14 0x00007f5aa49b71fb in mozilla::net::nsHttpConnectionMgr::Observe (this=0x7f5a90190b40, subject=0x7f5a9115f350, topic=0x7f5aab286636 "timer-callback", data=0x0) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/protocol/http/nsHttpConnectionMgr.cpp:313
#15 0x00007f5aa433ba1d in nsTimerImpl::Fire (this=0x7f5a9115f350) at /home/mcmanus/src/mozilla2/wd/gecko-dev/xpcom/threads/nsTimerImpl.cpp:530
#16 0x00007f5aa430c330 in nsTimerEvent::Run (this=0x7f5a93f70b10) at /home/mcmanus/src/mozilla2/wd/gecko-dev/xpcom/threads/TimerThread.cpp:286
#17 0x00007f5aa43135bb in nsThread::ProcessNextEvent (this=0x7f5ab78b9f00, aMayWait=true, aResult=0x7f5a9b2fc86e) at /home/mcmanus/src/mozilla2/wd/gecko-dev/xpcom/threads/nsThread.cpp:1073
#18 0x00007f5aa438cbbc in NS_ProcessNextEvent (aThread=0x7f5ab78b9f00, aMayWait=true) at /home/mcmanus/src/mozilla2/wd/gecko-dev/xpcom/glue/nsThreadUtils.cpp:290
#19 0x00007f5aa44d1f84 in mozilla::net::nsSocketTransportService::Run (this=0x7f5ab78b9e00) at /home/mcmanus/src/mozilla2/wd/gecko-dev/netwerk/base/nsSocketTransportService2.cpp:911
#20 0x00007f5aa43135bb in nsThread::ProcessNextEvent (this=0x7f5ab78b9f00, aMayWait=false, aResult=0x7f5a9b2fcc0e) at /home/mcmanus/src/mozilla2/wd/gecko-dev/xpcom/threads/nsThread.cpp:1073
#21 0x00007f5aa438cbbc in NS_ProcessNextEvent (aThread=0x7f5ab78b9f00, aMayWait=false) at /home/mcmanus/src/mozilla2/wd/gecko-dev/xpcom/glue/nsThreadUtils.cpp:290
#22 0x00007f5aa4beba46 in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0x7f5a9f2cfb80, aDelegate=0x7f5ab78a8410) at /home/mcmanus/src/mozilla2/wd/gecko-dev/ipc/glue/MessagePump.cpp:354
#23 0x00007f5aa4b4cd85 in MessageLoop::RunInternal (this=0x7f5ab78a8410) at /home/mcmanus/src/mozilla2/wd/gecko-dev/ipc/chromium/src/base/message_loop.cc:235
#24 0x00007f5aa4b4cd05 in MessageLoop::RunHandler (this=0x7f5ab78a8410) at /home/mcmanus/src/mozilla2/wd/gecko-dev/ipc/chromium/src/base/message_loop.cc:228
#25 0x00007f5aa4b4ccdd in MessageLoop::Run (this=0x7f5ab78a8410) at /home/mcmanus/src/mozilla2/wd/gecko-dev/ipc/chromium/src/base/message_loop.cc:208
#26 0x00007f5aa4310e38 in nsThread::ThreadFunc (aArg=0x7f5ab78b9f00) at /home/mcmanus/src/mozilla2/wd/gecko-dev/xpcom/threads/nsThread.cpp:468
#27 0x00007f5ab8dc2f65 in _pt_root (arg=0x7f5ab789e220) at ../../../../../gecko-dev/nsprpub/pr/src/pthreads/ptthread.c:216
#28 0x00007f5ab88946fa in start_thread (arg=0x7f5a9b2fd700) at pthread_create.c:333
#29 0x00007f5ab7b25b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
I think the best explanation of this patch is the stack trace from comment 6.. Shutdown() empties the various run queues and destroys the streams.. but in the case of a tunnel closing as part of that, the tunnel re-entrantly puts itself in the run queue when calling forceIO.
Comment on attachment 8768377 [details] [diff] [review]
ASpdySession::ForceIO() safety after h2 Shutdown()

Review of attachment 8768377 [details] [diff] [review]:
-----------------------------------------------------------------

Hooray! Thanks for finding this.
Attachment #8768377 - Flags: review?(hurley) → review+
Duplicate of this bug: 1170928
https://hg.mozilla.org/mozilla-central/rev/1df45496d045
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment on attachment 8768377 [details] [diff] [review]
ASpdySession::ForceIO() safety after h2 Shutdown()


Approval Request Comment
[Feature/regressing bug #]: https proxy - 378637 - gecko 38
[User impact if declined]: crashes with some https proxies - zenmate is gaining market traction which is why this crash rate has gone up
[Describe test coverage new/current, TreeHerder]: some treeherder, fix manually verified by reporter
[Risks and why]: medium low - very targeted patch with well understood stack trace
[String/UUID change made/needed]: none


[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: likely a high crash rate just like beta 48 (#3 top crasher) due to changes in the way firefox is used with vpns/https proxies
Fix Landed on Version: 50
Risk to taking this patch (and alternatives if risky): medium low.. could disable feature as alternative but that would regress some installs
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8768377 - Flags: approval-mozilla-esr45?
Attachment #8768377 - Flags: approval-mozilla-beta?
Attachment #8768377 - Flags: approval-mozilla-aurora?
Comment on attachment 8768377 [details] [diff] [review]
ASpdySession::ForceIO() safety after h2 Shutdown()

Review of attachment 8768377 [details] [diff] [review]:
-----------------------------------------------------------------

This patch fixes a regression. Take it in 48 beta 7 and aurora.
Attachment #8768377 - Flags: approval-mozilla-beta?
Attachment #8768377 - Flags: approval-mozilla-beta+
Attachment #8768377 - Flags: approval-mozilla-aurora?
Attachment #8768377 - Flags: approval-mozilla-aurora+
Hi Patrick, I was looking at the stability data for this crash signature on esr45.2.0 and I found only ~12 instances of it from crash-stats vs ~1k on 48.0b5. Is this still a significant crash for ESR45? 

As you may already know, we only take sec-crit and sec-high fixes on ESR45. We can take stability fixes but only for top crashers. I am unsure if this one meets that bar. What do you think?
Flags: needinfo?(mcmanus)
(In reply to Ritu Kothari (:ritu) from comment #16)
> Hi Patrick, I was looking at the stability data for this crash signature on
> esr45.2.0 and I found only ~12 instances of it from crash-stats vs ~1k on
> 48.0b5. Is this still a significant crash for ESR45? 
> 
> As you may already know, we only take sec-crit and sec-high fixes on ESR45.
> We can take stability fixes but only for top crashers. I am unsure if this
> one meets that bar. What do you think?

The code situation on esr-45 and 48 (and for that matter 47) is the same so the only explanation I have for the difference is different usage patterns.. which makes some sense as the major (but not sole) trigger is a newly updated addon (zenmate) and esr users might be slower to take that on.. otoh I don't see why they wouldn't do so eventually and end up with a similar crash rate. That's why I nom'd - your call though.
Flags: needinfo?(mcmanus)
I will add more information: this crash happened when pac-file is set and network connection is breaking. Just 5-60 seconds after I simply remove my network cable from PC(this is important - simply disabling network won't work). 
This may cause two reasons: first one is initial proxy tunel bug, when the connection is missing, and second one: the extension ping own servers for asking some data, so when the request is making and the network is in broken state it crash. 
I do not know wich one of these two is main reason, but this issue is reproducing 100% on my env. 
ff v 46.0.1
I update my ff to latest public version: 47.0.1 and it is also affected.
(In reply to h6.msangel from comment #21)
> I update my ff to latest public version: 47.0.1 and it is also affected.
this issue should be fixed starting with firefox 48 beta 7, which is going to be released in the course of the next days, in firefox nightly (50) & developer edition (49).
hi, in 48.0b7 there continue to be some [@ mozilla::net::TLSFilterTransaction::WriteSegments ] crashes with zenmate present, like https://crash-stats.mozilla.com/report/index/8db4f614-ea8f-48a8-b541-7b3f32160713
could you take a look at that again?
Crash Signature: [@ mozilla::net::Http2Session::ReadSegmentsAgain] → [@ mozilla::net::Http2Session::ReadSegmentsAgain] [@ mozilla::net::TLSFilterTransaction::WriteSegments]
Flags: needinfo?(mcmanus)
can't repro that - but its a null deref on an error path. easy to patch up (i'll do so in a different bug)

can you tell if the crash rate is improved? I really would expect so..
Flags: needinfo?(mcmanus)
early crash data from beta 7 suggests that the mozilla::net::Http2Session::ReadSegmentsAgain signature has totally gone (from 4% of browser crashes in b6), but mozilla::net::TLSFilterTransaction::WriteSegments continues at roughly the same level (1.69% on b6, 1.4% on b7)
Depends on: 1286664
[Tracking Requested - why for this release]:
From the description in comment 6 and the "e5e5e5" marker in the crashing address this is a use-after-free security vulnerability.
Comment on attachment 8768377 [details] [diff] [review]
ASpdySession::ForceIO() safety after h2 Shutdown()

Sec-high, taking in into esr
Attachment #8768377 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
Whiteboard: [spdy][necko-active] → [spdy][necko-active][adv-main48+][adv-esr45.3+]
Duplicate of this bug: 1271695
You need to log in before you can comment on or make changes to this bug.