Closed
Bug 1362821
Opened 7 years ago
Closed 7 years ago
Networking stops occasionally
Categories
(Core :: Networking, defect)
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox-esr45 | --- | unaffected |
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | --- | fixed |
People
(Reporter: euthanasia_waltz, Assigned: dragana)
References
Details
(Keywords: regression, Whiteboard: [necko-active])
Attachments
(2 files, 3 obsolete files)
708.05 KB,
application/x-zip-compressed
|
Details | |
7.68 KB,
patch
|
dragana
:
review+
|
Details | Diff | Splinter Review |
As of nightly build 20170506030204, networking stops occasionally while "Performing a TLS handshake to ...", "Transferring data from ...", "Waiting for ...", ... on various sites at random timing. It seems that disabling TCP fastopen (network.tcp.tcp_fastopen_enable=false) solves the problem. OS: Windows10 1703 (15063.138) x64 CPU: Intel Core i3 4010U Net: Realtek PCIe GBE Family Controller (driver ver. 9.1.406.2015)
"networking" means each operation(Performing a TLS handshake, Transferring data, Waiting, ..."), "stop" means it won't be completed/finished. Reload or any other actions can be done after this issue.
Assignee | ||
Comment 3•7 years ago
|
||
(In reply to atlanto from comment #0) > As of nightly build 20170506030204, networking stops occasionally while > "Performing a TLS handshake to ...", "Transferring data from ...", "Waiting > for ...", ... on various sites at random timing. > > It seems that disabling TCP fastopen (network.tcp.tcp_fastopen_enable=false) > solves the problem. > > OS: Windows10 1703 (15063.138) x64 > CPU: Intel Core i3 4010U > Net: Realtek PCIe GBE Family Controller (driver ver. 9.1.406.2015) Can you please try to make a http log when this happens. This feature depend a lot on a internet path, so I am not sure it could be easy to reproduce. It would be very helpful if you could capture it in a log: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging rotate:200 limits log to only 200MB, when that limit is reach the oldest entry in the log will be overwritten. maybe increase it a bit or at least stop Firefox as soon as it start happening so that the relevant part of the log is still in the log and is not overwritten. also change the logging to: set MOZ_LOG=timestamp,rotate:200,nsHttp:5,nsSocketTransport:5 Thanks a lot!
Flags: needinfo?(dd.mozilla) → needinfo?(euthanasia_waltz)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → dd.mozilla
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [necko-active]
On startup, opening https://www.yahoo.co.jp/ as home page. Staying "Performing a TLS handshake to s.yimg.jp...", then exit.
Flags: needinfo?(euthanasia_waltz)
Assignee | ||
Comment 5•7 years ago
|
||
Thanks a lot!
Assignee | ||
Comment 6•7 years ago
|
||
The problem here is that a TFO connection is in progress when another connection that uses H2 to the same host is established. Therefore the backup HalfOpenSocket for the TFO connection gets destroyed and also wrongly clears input/outputStream callbacks. In this case we can close the TFO connection and return transaction to the pending queue.
Assignee | ||
Comment 7•7 years ago
|
||
I will write this down to see if I do not mix up logic here: a TFO connection has transport and that transport has a reference to HalfOpenSocket as a backup. This HalfOpenSocket is in the HalfOpenSocket list but it is not count as active socket: 1) socketTransport can use TFO (mFastOpenCallback->FastOpenEnabled() == true). It calls mFastOpenCallback->StartFastOpen() to start a new connection but HalfOpenSocket is kept as well. The connection is counted as active but HalfOpenSocket is not counted as active (i.e. it calls gHttpHandler->ConnMgr()->RecvdConnect() to decrease counter) The halfOpenSocket has reference to SocketTransport but that socketTransport belongs to the connection. It also keeps reference to the connection(mConnectionNegotiatingFastOpen). If Fast open is not supported we continue in the same way (this is to keep a timer for a backup transport). 2) a) the TFO socketTransport succeeds to connect -> socketTransport::OnSocketConnected calls SetFastOpenConnected. The HalfOpenSocket backup timer is canceled and the HalfOpenSocket's reference to socketTransport is deleted. If backup transport has started this HalfOpenSocket will be counted as active for its backup stream (gHttpHandler->ConnMgr()->StartedConnect() is called in SetupStream) b) socketTransport receives an error and it cannot recover from that error -> nsSocketTransport::OnSocketDetached calls SetFastOpenConnected -> The HalfOpenSocket backup timer is canceled and the HalfOpenSocket's reference to socketTransport is deleted (the same as in case of success) c) socketTransport receives an error and it can recover. This errors are: NS_ERROR_CONNECTION_REFUSED and NS_ERROR_NET_TIMEOUT -> disable fastopen and try again, i.e. nsSocketTransport::RecoverFromError calls SetFastOpenConnected -> it closes connection (but it does not close socketTransport), transaction is return to pending queue and the halfOpenSocket takes the socketTransport and continues as if TFO has never been used. It also add its self to the active halfOpenSocket counter(i.e. it calls gHttpHandler->ConnMgr()->StartedConnect()). NS_ERROR_PROXY_CONNECTION_REFUSED -> I will add this one to the list above. NS_ERROR_UNKNOWN_HOST and NS_ERROR_UNKNOWN_PROXY_HOST are happening before TFO has started. 3) The backup transport for the connection is connected, i.e. nsHttpConnectionMgr::nsHalfOpenSocket::OnOutputStreamReady for backup transport is called. if the HalfOpenSocket still has mConnectionNegotiatingFastOpen, we will cancel the connection also closed its socketTransport and return transaction to the pending queue (this transaction is going to be picked up by this new backup socketTransport that is connected) 4) transaction gets canceled during TFO: https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpConnectionMgr.cpp#2240 The transaction has connection and transaction cannot be IsDone() so we call conn->CloseTransaction(trans, closeCode); This will set socketTransport condition to error code. And in the next poll iteration the socket will be detached -> OnSocketDetached is called and this calls SetFastOpenConnected which performs action described in 2b above. Here everything is fine, this code path does not cause this hang or crashes. 5) nsHttpConnectionMgr::TimeoutTick calls the tick handler for each active connection. The TFO connection is one of them. and then it cancels stalled halfOpenSockets. This will be called for the HalfOpenSocket belonging to TFO connection as well. I think this will never happen because of timing but if someone changes timeout durations it could happen. I will fix this: do not cancel HalfOpenSocket belonging to TFO connection here. 6) nsHttpConnectionMgr::OnMsgShutdown: conn->CloseTransaction for active connection is called first. The TFO connections are among them, then all HalfOpenSockets are Abandon(). The HalfOpenSocket belonging to a TFO connection as well. This can be a problem because socketTransport holds reference to the HalfOpenSocket that will be deleted. But is probably not causing bug 1362832. I will fix this one. 7) nsHttpConnectionMgr::UpdateCoalescingForNewConn this is this bug and I am almost 100% sure bug 1362832 as well. Patrick rewrote this code when TFO was almost done and I have not look into this properly, my fault. We could cancel the socketTransport and move transaction into pending queue. This probably causes bug 1362959 as well :)
Assignee | ||
Comment 8•7 years ago
|
||
Calling mFastOpenCallback->SetTransportConnected(reason) in nsSocketTransport::Close will solve problem under 6).
Assignee | ||
Comment 9•7 years ago
|
||
Attachment #8865422 -
Flags: review?(mcmanus)
Assignee | ||
Comment 10•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=34a9426f59caf6ceacf2e7235043ee3f0015d69a
Comment 11•7 years ago
|
||
comment 7 is helpful. sounds like 1362959 will be solved with this - I'm not sure about 1362832. dragana?
Updated•7 years ago
|
Attachment #8865422 -
Flags: review?(mcmanus) → review+
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #11) > comment 7 is helpful. sounds like 1362959 will be solved with this - I'm not > sure about 1362832. dragana? I have added a check where 1362832 fails: if (mFastOpenCallback) { mFastOpenCallback->SetFastOpenConnected(mCondition); } and also adding mFastOpenCallback->SetFastOpenConnected(mCondition); mFastOpenCallback = nullptr; to nsSocketTransport::Close will resolved bug 1362832. I think, but I will observe crash stats.
Updated•7 years ago
|
Blocks: 1188435
status-firefox53:
--- → unaffected
status-firefox54:
--- → unaffected
status-firefox55:
--- → affected
status-firefox-esr45:
--- → unaffected
status-firefox-esr52:
--- → unaffected
Keywords: regression
Assignee | ||
Comment 13•7 years ago
|
||
I have realized that in the same patch I have added mFastOpenCallback->SetFastOpenConnected(mCondition); to nsSocketTransport::Close which is going to cancel BackupTimer and probably destroy halfOpenSocket. So we need to hold a reference. Sorry.
Attachment #8865422 -
Attachment is obsolete: true
Attachment #8865433 -
Flags: review?(mcmanus)
Assignee | ||
Updated•7 years ago
|
Attachment #8865433 -
Flags: review?(mcmanus)
Assignee | ||
Comment 14•7 years ago
|
||
Attachment #8865433 -
Attachment is obsolete: true
Attachment #8865434 -
Flags: review?(mcmanus)
Assignee | ||
Comment 15•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4c0f63313da2829da4df5ccf235c94a15e147559
Comment 16•7 years ago
|
||
Comment on attachment 8865434 [details] [diff] [review] bug_1362821_v1.patch Review of attachment 8865434 [details] [diff] [review]: ----------------------------------------------------------------- the interdiff aspects of mozreview would really shine here :) ::: netwerk/protocol/http/nsHttpConnectionMgr.cpp @@ +793,4 @@ > LOG(("UpdateCoalescingForNewConn() forcing halfopen abandon %p\n", > + half)); > + > + RefPtr<nsHalfOpenSocket> halfPtr; maybe name it deleteProtector
Attachment #8865434 -
Flags: review?(mcmanus) → review+
Assignee | ||
Comment 17•7 years ago
|
||
Attachment #8865434 -
Attachment is obsolete: true
Attachment #8865485 -
Flags: review+
Comment 18•7 years ago
|
||
Pushed by dd.mozilla@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b31a06dd4936 Properly destroy a HalfOpeSocket that is used as a backup for a TFO connection if another H2 connection is established for the same host. r=mcmanus
Comment 19•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b31a06dd4936
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 20•7 years ago
|
||
I still see this on nightly with network.tcp.tcp_fastopen_enable=true. Certain script fetches (usually from google-analytics.com or amazon ad subdomains) frequently hang on "Performing TLS handshake" or "Waiting for...". uBlock Origin usually prevents this due to ad-blocking, but when uBlock is disabled I see hangs that can only be stopped once the slow-script warning appears. I can consistently reproduce this on https://servethehome.com/.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 21•7 years ago
|
||
Sorry for the bug-spam; turns out this is not related to tcp fastopen. I'll collect more info and open another bug report.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Comment 22•7 years ago
|
||
I am getting the error message on startup, opening https://www.google.com as home page. Staying "Performing a TLS handshake to ...", then exit. The issue appears might be resolved or has it been reopened? My problem started 5 days ago when a Windows 10 patch was done. OS: Windows10 Home 1703 (15063.483) x64. Can anyone confirm what the fix is?
Comment 23•7 years ago
|
||
Same issue as 1desechalhan. Started a few days ago. Will frequently, and somewhat randomly hang on "Performing a TLS handshake to...". Sometimes refreshing the page or restarting the browser fixes the issue, but only temporarily. Already tried creating a new Firefox profile, but it doesn't help. Same thing happens on Firefox Developer Edition. Google search is where this most commonly happens, but it also happened while browsing to various other sites such as Patreon.
Comment 24•7 years ago
|
||
Same issue here (randomly triggered). FF: 15.0.2 (64bit), windows 10 pro N (1703) os build 15063.54
Assignee | ||
Comment 25•7 years ago
|
||
I will open another bug for the last 3 comments. It is easier to track the issue, since this bug is closed.
Assignee | ||
Comment 26•7 years ago
|
||
The new bug is 1391543.
Comment 27•7 years ago
|
||
Needs "perf" key word?
You need to log in
before you can comment on or make changes to this bug.
Description
•