Closed
Bug 1397686
Opened 5 years ago
Closed 5 years ago
Sometimes network never finish loading
Categories
(Core :: Networking, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla58
People
(Reporter: kanru, Assigned: dragana)
References
Details
Attachments
(4 files, 1 obsolete file)
1.71 MB,
video/webm
|
Details | |
103.31 KB,
text/plain
|
Details | |
440.65 KB,
text/plain
|
Details | |
3.44 KB,
patch
|
u408661
:
review+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Recently in Nightly some websites never finish loading. When it happens all new connection to the same website hang forever. I've captured a recording and a network log.
Reporter | ||
Comment 1•5 years ago
|
||
Reporter | ||
Comment 2•5 years ago
|
||
Comment 3•5 years ago
|
||
Might be the same issue as bug 1396380.
Comment 4•5 years ago
|
||
I met the same problem. Here is the main process log. From devtool, it seems getting blocked on loading javascript file from "cdnjs.cloudflare.com" domain.
Comment 5•5 years ago
|
||
From the log provided by @evelyn, the http request is pending because cdnjs.cloudflare.com already has six connections. >nsHttpConnectionMgr::AtActiveConnectionLimit [ci=.S.....[tlsflags0x00000000]cdnjs.cloudflare.com:443 caps=129,totalCount=6, maxPersistConns=6] cdnjs.cloudflare.com support HTTP2, however in this log we hit the six connection limit for this domain. @kershaw and I guess that there were 6 pending halfopened connections and we didn't aggressively prune these "half dead" connections. We should do that before this line of code https://searchfox.org/mozilla-central/rev/05c4c3bc0cfb9b0fc66bdfc8c47cac674e45f151/netwerk/protocol/http/nsHttpConnectionMgr.cpp#1438.
Assignee | ||
Comment 6•5 years ago
|
||
I am going through my bug mail back from vacation. We should really look into this and not lose track of it. I will try to look at it tomorrow. If someone else have time please take it.
Assignee | ||
Updated•5 years ago
|
tracking-firefox57:
--- → ?
Priority: -- → P1
Assignee | ||
Comment 7•5 years ago
|
||
We have 6 http2 connections and non of them can accept new transaction. I do not know why is so because log starts to late. At the end of the log the 6 connections are closed. Kan-Ru, Evelyn, can one of you make a http log that starts before opening the affected page? To make http log: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
Flags: needinfo?(kchen)
Flags: needinfo?(ehung)
Comment 8•5 years ago
|
||
Unfortunately I don't have a reliable STR but I will try to log more next time if I meet the problem.
Flags: needinfo?(ehung)
Comment 9•5 years ago
|
||
FWIW I created a clean profile for my previous log. There were only three pages being loaded with this profile: 1. https://bugzilla.mozilla.org/show_bug.cgi?id=1397686 2. http://localhost:8000/example/index.html 3. https://kanru.info/blog/ and the problematic "cdnjs.cloudflare.com" domain was loaded by (2) I tried to do it again but couldn't reproduce it. It was relatively easy to reproduce two days ago. I applied Nightly updated everyday, btw.
Assignee | ||
Comment 10•5 years ago
|
||
Can you download the 2-days-old Nightly and try to reproduce. I do not remember seeing anything landing in last 2-3 days that could influence this behavior. You can find Nightlies at: https://download-installer.cdn.mozilla.net/pub/firefox/nightly/2017/09/
Comment 11•5 years ago
|
||
I downloaded a 9/16 build but still failed to reproduce. :( What I did last time was keep refreshing the localhost page because I was testing some css properties. The page I ran was very simple, it just loaded two JS files (hard-coded urls to cdnjs.cloudflare.com). These two files should be cached in this case (I hit ctrl+r for reload), so I didn't know why it's blocked by networking.
Assignee | ||
Comment 12•5 years ago
|
||
I missed to see the first time that http2 connections are not reading any data for a log time: 2017-09-16 18:40:14.825905 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x11ea13000 delta since last read 18s 2017-09-16 18:40:14.825923 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x12ea15800 delta since last read 18s 2017-09-16 18:40:14.825940 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x1354d4000 delta since last read 18s 2017-09-16 18:40:14.825958 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x12ab39000 delta since last read 36s 2017-09-16 18:40:14.825974 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x12ab47000 delta since last read 36s 2017-09-16 18:40:14.825989 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x135bed000 delta since last read 45s This sounds like bug 1395494, but TFO is turned off since 9th so it is off on the 9/16 Nightly build.
Comment 13•5 years ago
|
||
(In reply to Dragana Damjanovic [:dragana] from comment #12) > I missed to see the first time that http2 connections are not reading any > data for a log time: > > 2017-09-16 18:40:14.825905 UTC - [Socket Thread]: I/nsHttp > Http2Session::ReadTimeoutTick 0x11ea13000 delta since last read 18s > 2017-09-16 18:40:14.825923 UTC - [Socket Thread]: I/nsHttp > Http2Session::ReadTimeoutTick 0x12ea15800 delta since last read 18s > 2017-09-16 18:40:14.825940 UTC - [Socket Thread]: I/nsHttp > Http2Session::ReadTimeoutTick 0x1354d4000 delta since last read 18s > 2017-09-16 18:40:14.825958 UTC - [Socket Thread]: I/nsHttp > Http2Session::ReadTimeoutTick 0x12ab39000 delta since last read 36s > 2017-09-16 18:40:14.825974 UTC - [Socket Thread]: I/nsHttp > Http2Session::ReadTimeoutTick 0x12ab47000 delta since last read 36s > 2017-09-16 18:40:14.825989 UTC - [Socket Thread]: I/nsHttp > Http2Session::ReadTimeoutTick 0x135bed000 delta since last read 45s > > > This sounds like bug 1395494, but TFO is turned off since 9th so it is off > on the 9/16 Nightly build. It doesn't seem to be the case of bug 1395494. In that case you should see "Ping Timer Exhaustion" from the log.
Assignee | ||
Comment 14•5 years ago
|
||
(In reply to Shian-Yow Wu [:swu] from comment #13) > (In reply to Dragana Damjanovic [:dragana] from comment #12) > > I missed to see the first time that http2 connections are not reading any > > data for a log time: > > > > 2017-09-16 18:40:14.825905 UTC - [Socket Thread]: I/nsHttp > > Http2Session::ReadTimeoutTick 0x11ea13000 delta since last read 18s > > 2017-09-16 18:40:14.825923 UTC - [Socket Thread]: I/nsHttp > > Http2Session::ReadTimeoutTick 0x12ea15800 delta since last read 18s > > 2017-09-16 18:40:14.825940 UTC - [Socket Thread]: I/nsHttp > > Http2Session::ReadTimeoutTick 0x1354d4000 delta since last read 18s > > 2017-09-16 18:40:14.825958 UTC - [Socket Thread]: I/nsHttp > > Http2Session::ReadTimeoutTick 0x12ab39000 delta since last read 36s > > 2017-09-16 18:40:14.825974 UTC - [Socket Thread]: I/nsHttp > > Http2Session::ReadTimeoutTick 0x12ab47000 delta since last read 36s > > 2017-09-16 18:40:14.825989 UTC - [Socket Thread]: I/nsHttp > > Http2Session::ReadTimeoutTick 0x135bed000 delta since last read 45s > > > > > > This sounds like bug 1395494, but TFO is turned off since 9th so it is off > > on the 9/16 Nightly build. > > It doesn't seem to be the case of bug 1395494. In that case you should see > "Ping Timer Exhaustion" from the log. In this log I firefox or the tab had been closed before the timeout. That cause canceling connection before timeout (I think it must be closing of ff not the tab)
Reporter | ||
Comment 15•5 years ago
|
||
I haven't been able to reproduce this for a while. I'll update once I see this again.
Flags: needinfo?(kchen)
Reporter | ||
Comment 16•5 years ago
|
||
It happened today. From about:networking HTTP tab Hostname Port HTTP/2 SSL Active Idle cdnjs.cloudflare.com 443 true true 6 0 When this happens all sites using cdnjs will stuck on reload. It looks like so far this has only happened on cdnjs.cloudflare.com
Reporter | ||
Comment 17•5 years ago
|
||
I found a way to reproduce. Just open a url that uses cdnjs.cloudflare.com, for example https://kanru.github.io/hasal-dashboard/?app=xxx and repeatedly hit F5 to reload. The connection count in about:networking will increase slowly and finally reach 6.
Reporter | ||
Comment 18•5 years ago
|
||
The log files are huge so I put them in a dropbox folder. https://www.dropbox.com/sh/i70n0txr3hmsgdo/AAB0sCHdG2Zngy4Njg3GkteYa?dl=0
Assignee | ||
Comment 19•5 years ago
|
||
Thank you! I will take a look.
Assignee | ||
Comment 21•5 years ago
|
||
The problem is TLS early-data and transactions that are not safe to send early data, e.g. POST method. During early-data we find a transaction in mReadyForWrite that cannot do early data. We remove it from the queue but we never put it back to the queue after early-data finishes. It is not surprise that this is happening with cloudflare, since they were the first to support early-data on the server side.
Assignee | ||
Comment 22•5 years ago
|
||
Updated•5 years ago
|
status-firefox57:
--- → affected
Comment 23•5 years ago
|
||
Comment on attachment 8910747 [details] [diff] [review] bug_1397686.patch Review of attachment 8910747 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/protocol/http/Http2Session.cpp @@ +3388,5 @@ > mOutputQueueSent = 0; > } > } else { > // 0RTT succeeded > + for (size_t i = 0; i < mCannotDo0RTTStreams.Length(); ++i) { This looks like it's fine for when 0rtt succeeds, but what about the case just above here, where 0rtt fails, but we still are doing h2? I think we need to put these transactions back in the queue then too, right?
Attachment #8910747 -
Flags: review?(hurley)
Assignee | ||
Comment 24•5 years ago
|
||
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #23) > Comment on attachment 8910747 [details] [diff] [review] > bug_1397686.patch > > Review of attachment 8910747 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: netwerk/protocol/http/Http2Session.cpp > @@ +3388,5 @@ > > mOutputQueueSent = 0; > > } > > } else { > > // 0RTT succeeded > > + for (size_t i = 0; i < mCannotDo0RTTStreams.Length(); ++i) { > > This looks like it's fine for when 0rtt succeeds, but what about the case > just above here, where 0rtt fails, but we still are doing h2? I think we > need to put these transactions back in the queue then too, right? yes, :)
Assignee | ||
Comment 25•5 years ago
|
||
Attachment #8910747 -
Attachment is obsolete: true
Attachment #8910886 -
Flags: review?(hurley)
Attachment #8910886 -
Flags: review?(hurley) → review+
Comment 26•5 years ago
|
||
Pushed by dd.mozilla@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/5cf7391a9825 The streams that cannot do 0RTT data need to be put in list and added to mReadyForWrite when 0RTT is done. r=hurley
Comment 27•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/5cf7391a9825
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
status-firefox58:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Comment 28•5 years ago
|
||
Do we want this in ff57?
Comment 29•5 years ago
|
||
Oh, forget I asked. This won't ever reach release.
Assignee | ||
Comment 30•5 years ago
|
||
I will uplift it to 57, it is a rather small isolated change. I assume that TLS 1.3 will be disabled on 57?
Comment 31•5 years ago
|
||
Should be: bug 1400291.
Assignee | ||
Comment 32•5 years ago
|
||
Anyway we need this for beta, it is annoying behavior that this bug produces.
Assignee | ||
Comment 33•5 years ago
|
||
Comment on attachment 8910886 [details] [diff] [review] bug_1397686.patch Approval Request Comment [Feature/Bug causing the regression]: TLS early-data with Http2 (bug 1322373) [User impact if declined]: If TLS early-data is use it can happen that a request never finish loading. Depens which resource is affected it can produce no content on the page and the load will never finish. [Is this code covered by automated tests?]: no [Has the fix been verified in Nightly?]: yes [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: no [Why is the change risky/not risky?]: It a small isolated piece of code. This is a TLS 1.3 only feature and tls 1.3 will be disabled on the 57 release(bug 1400291), but it is still enabled on beta. [String changes made/needed]: none
Attachment #8910886 -
Flags: approval-mozilla-beta?
Comment 34•5 years ago
|
||
Comment on attachment 8910886 [details] [diff] [review] bug_1397686.patch Sure, let's take it in 57b3
Attachment #8910886 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 35•5 years ago
|
||
bugherderuplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/d285547c68af
You need to log in
before you can comment on or make changes to this bug.
Description
•