Closed
Bug 1397686
Opened 8 years ago
Closed 8 years ago
Sometimes network never finish loading
Categories
(Core :: Networking, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla58
People
(Reporter: kanru, Assigned: dragana)
References
Details
Attachments
(4 files, 1 obsolete file)
|
1.71 MB,
video/webm
|
Details | |
|
103.31 KB,
text/plain
|
Details | |
|
440.65 KB,
text/plain
|
Details | |
|
3.44 KB,
patch
|
u408661
:
review+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Recently in Nightly some websites never finish loading. When it happens all new connection to the same website hang forever.
I've captured a recording and a network log.
| Reporter | ||
Comment 1•8 years ago
|
||
| Reporter | ||
Comment 2•8 years ago
|
||
Comment 3•8 years ago
|
||
Might be the same issue as bug 1396380.
Comment 4•8 years ago
|
||
I met the same problem. Here is the main process log. From devtool, it seems getting blocked on loading javascript file from "cdnjs.cloudflare.com" domain.
Comment 5•8 years ago
|
||
From the log provided by @evelyn, the http request is pending because cdnjs.cloudflare.com already has six connections.
>nsHttpConnectionMgr::AtActiveConnectionLimit [ci=.S.....[tlsflags0x00000000]cdnjs.cloudflare.com:443 caps=129,totalCount=6, maxPersistConns=6]
cdnjs.cloudflare.com support HTTP2, however in this log we hit the six connection limit for this domain. @kershaw and I guess that there were 6 pending halfopened connections and we didn't aggressively prune these "half dead" connections. We should do that before this line of code https://searchfox.org/mozilla-central/rev/05c4c3bc0cfb9b0fc66bdfc8c47cac674e45f151/netwerk/protocol/http/nsHttpConnectionMgr.cpp#1438.
| Assignee | ||
Comment 6•8 years ago
|
||
I am going through my bug mail back from vacation. We should really look into this and not lose track of it.
I will try to look at it tomorrow. If someone else have time please take it.
| Assignee | ||
Updated•8 years ago
|
tracking-firefox57:
--- → ?
Priority: -- → P1
| Assignee | ||
Comment 7•8 years ago
|
||
We have 6 http2 connections and non of them can accept new transaction. I do not know why is so because log starts to late. At the end of the log the 6 connections are closed.
Kan-Ru, Evelyn, can one of you make a http log that starts before opening the affected page?
To make http log:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
Flags: needinfo?(kchen)
Flags: needinfo?(ehung)
Comment 8•8 years ago
|
||
Unfortunately I don't have a reliable STR but I will try to log more next time if I meet the problem.
Flags: needinfo?(ehung)
Comment 9•8 years ago
|
||
FWIW I created a clean profile for my previous log. There were only three pages being loaded with this profile:
1. https://bugzilla.mozilla.org/show_bug.cgi?id=1397686
2. http://localhost:8000/example/index.html
3. https://kanru.info/blog/
and the problematic "cdnjs.cloudflare.com" domain was loaded by (2)
I tried to do it again but couldn't reproduce it. It was relatively easy to reproduce two days ago. I applied Nightly updated everyday, btw.
| Assignee | ||
Comment 10•8 years ago
|
||
Can you download the 2-days-old Nightly and try to reproduce. I do not remember seeing anything landing in last 2-3 days that could influence this behavior.
You can find Nightlies at:
https://download-installer.cdn.mozilla.net/pub/firefox/nightly/2017/09/
Comment 11•8 years ago
|
||
I downloaded a 9/16 build but still failed to reproduce. :(
What I did last time was keep refreshing the localhost page because I was testing some css properties.
The page I ran was very simple, it just loaded two JS files (hard-coded urls to cdnjs.cloudflare.com). These two files should be cached in this case (I hit ctrl+r for reload), so I didn't know why it's blocked by networking.
| Assignee | ||
Comment 12•8 years ago
|
||
I missed to see the first time that http2 connections are not reading any data for a log time:
2017-09-16 18:40:14.825905 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x11ea13000 delta since last read 18s
2017-09-16 18:40:14.825923 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x12ea15800 delta since last read 18s
2017-09-16 18:40:14.825940 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x1354d4000 delta since last read 18s
2017-09-16 18:40:14.825958 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x12ab39000 delta since last read 36s
2017-09-16 18:40:14.825974 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x12ab47000 delta since last read 36s
2017-09-16 18:40:14.825989 UTC - [Socket Thread]: I/nsHttp Http2Session::ReadTimeoutTick 0x135bed000 delta since last read 45s
This sounds like bug 1395494, but TFO is turned off since 9th so it is off on the 9/16 Nightly build.
Comment 13•8 years ago
|
||
(In reply to Dragana Damjanovic [:dragana] from comment #12)
> I missed to see the first time that http2 connections are not reading any
> data for a log time:
>
> 2017-09-16 18:40:14.825905 UTC - [Socket Thread]: I/nsHttp
> Http2Session::ReadTimeoutTick 0x11ea13000 delta since last read 18s
> 2017-09-16 18:40:14.825923 UTC - [Socket Thread]: I/nsHttp
> Http2Session::ReadTimeoutTick 0x12ea15800 delta since last read 18s
> 2017-09-16 18:40:14.825940 UTC - [Socket Thread]: I/nsHttp
> Http2Session::ReadTimeoutTick 0x1354d4000 delta since last read 18s
> 2017-09-16 18:40:14.825958 UTC - [Socket Thread]: I/nsHttp
> Http2Session::ReadTimeoutTick 0x12ab39000 delta since last read 36s
> 2017-09-16 18:40:14.825974 UTC - [Socket Thread]: I/nsHttp
> Http2Session::ReadTimeoutTick 0x12ab47000 delta since last read 36s
> 2017-09-16 18:40:14.825989 UTC - [Socket Thread]: I/nsHttp
> Http2Session::ReadTimeoutTick 0x135bed000 delta since last read 45s
>
>
> This sounds like bug 1395494, but TFO is turned off since 9th so it is off
> on the 9/16 Nightly build.
It doesn't seem to be the case of bug 1395494. In that case you should see "Ping Timer Exhaustion" from the log.
| Assignee | ||
Comment 14•8 years ago
|
||
(In reply to Shian-Yow Wu [:swu] from comment #13)
> (In reply to Dragana Damjanovic [:dragana] from comment #12)
> > I missed to see the first time that http2 connections are not reading any
> > data for a log time:
> >
> > 2017-09-16 18:40:14.825905 UTC - [Socket Thread]: I/nsHttp
> > Http2Session::ReadTimeoutTick 0x11ea13000 delta since last read 18s
> > 2017-09-16 18:40:14.825923 UTC - [Socket Thread]: I/nsHttp
> > Http2Session::ReadTimeoutTick 0x12ea15800 delta since last read 18s
> > 2017-09-16 18:40:14.825940 UTC - [Socket Thread]: I/nsHttp
> > Http2Session::ReadTimeoutTick 0x1354d4000 delta since last read 18s
> > 2017-09-16 18:40:14.825958 UTC - [Socket Thread]: I/nsHttp
> > Http2Session::ReadTimeoutTick 0x12ab39000 delta since last read 36s
> > 2017-09-16 18:40:14.825974 UTC - [Socket Thread]: I/nsHttp
> > Http2Session::ReadTimeoutTick 0x12ab47000 delta since last read 36s
> > 2017-09-16 18:40:14.825989 UTC - [Socket Thread]: I/nsHttp
> > Http2Session::ReadTimeoutTick 0x135bed000 delta since last read 45s
> >
> >
> > This sounds like bug 1395494, but TFO is turned off since 9th so it is off
> > on the 9/16 Nightly build.
>
> It doesn't seem to be the case of bug 1395494. In that case you should see
> "Ping Timer Exhaustion" from the log.
In this log I firefox or the tab had been closed before the timeout. That cause canceling connection before timeout (I think it must be closing of ff not the tab)
| Reporter | ||
Comment 15•8 years ago
|
||
I haven't been able to reproduce this for a while. I'll update once I see this again.
Flags: needinfo?(kchen)
| Reporter | ||
Comment 16•8 years ago
|
||
It happened today. From about:networking HTTP tab
Hostname Port HTTP/2 SSL Active Idle
cdnjs.cloudflare.com 443 true true 6 0
When this happens all sites using cdnjs will stuck on reload.
It looks like so far this has only happened on cdnjs.cloudflare.com
| Reporter | ||
Comment 17•8 years ago
|
||
I found a way to reproduce.
Just open a url that uses cdnjs.cloudflare.com, for example https://kanru.github.io/hasal-dashboard/?app=xxx and repeatedly hit F5 to reload. The connection count in about:networking will increase slowly and finally reach 6.
| Reporter | ||
Comment 18•8 years ago
|
||
The log files are huge so I put them in a dropbox folder.
https://www.dropbox.com/sh/i70n0txr3hmsgdo/AAB0sCHdG2Zngy4Njg3GkteYa?dl=0
| Assignee | ||
Comment 19•8 years ago
|
||
Thank you! I will take a look.
| Assignee | ||
Comment 21•8 years ago
|
||
The problem is TLS early-data and transactions that are not safe to send early data, e.g. POST method. During early-data we find a transaction in mReadyForWrite that cannot do early data. We remove it from the queue but we never put it back to the queue after early-data finishes.
It is not surprise that this is happening with cloudflare, since they were the first to support early-data on the server side.
| Assignee | ||
Comment 22•8 years ago
|
||
Updated•8 years ago
|
status-firefox57:
--- → affected
Comment 23•8 years ago
|
||
Comment on attachment 8910747 [details] [diff] [review]
bug_1397686.patch
Review of attachment 8910747 [details] [diff] [review]:
-----------------------------------------------------------------
::: netwerk/protocol/http/Http2Session.cpp
@@ +3388,5 @@
> mOutputQueueSent = 0;
> }
> } else {
> // 0RTT succeeded
> + for (size_t i = 0; i < mCannotDo0RTTStreams.Length(); ++i) {
This looks like it's fine for when 0rtt succeeds, but what about the case just above here, where 0rtt fails, but we still are doing h2? I think we need to put these transactions back in the queue then too, right?
Attachment #8910747 -
Flags: review?(hurley)
| Assignee | ||
Comment 24•8 years ago
|
||
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #23)
> Comment on attachment 8910747 [details] [diff] [review]
> bug_1397686.patch
>
> Review of attachment 8910747 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: netwerk/protocol/http/Http2Session.cpp
> @@ +3388,5 @@
> > mOutputQueueSent = 0;
> > }
> > } else {
> > // 0RTT succeeded
> > + for (size_t i = 0; i < mCannotDo0RTTStreams.Length(); ++i) {
>
> This looks like it's fine for when 0rtt succeeds, but what about the case
> just above here, where 0rtt fails, but we still are doing h2? I think we
> need to put these transactions back in the queue then too, right?
yes, :)
| Assignee | ||
Comment 25•8 years ago
|
||
Attachment #8910747 -
Attachment is obsolete: true
Attachment #8910886 -
Flags: review?(hurley)
Attachment #8910886 -
Flags: review?(hurley) → review+
Comment 26•8 years ago
|
||
Pushed by dd.mozilla@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5cf7391a9825
The streams that cannot do 0RTT data need to be put in list and added to mReadyForWrite when 0RTT is done. r=hurley
Comment 27•8 years ago
|
||
| bugherder | ||
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox58:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Comment 28•8 years ago
|
||
Do we want this in ff57?
Comment 29•8 years ago
|
||
Oh, forget I asked. This won't ever reach release.
| Assignee | ||
Comment 30•8 years ago
|
||
I will uplift it to 57, it is a rather small isolated change.
I assume that TLS 1.3 will be disabled on 57?
Comment 31•8 years ago
|
||
Should be: bug 1400291.
| Assignee | ||
Comment 32•8 years ago
|
||
Anyway we need this for beta, it is annoying behavior that this bug produces.
| Assignee | ||
Comment 33•8 years ago
|
||
Comment on attachment 8910886 [details] [diff] [review]
bug_1397686.patch
Approval Request Comment
[Feature/Bug causing the regression]: TLS early-data with Http2 (bug 1322373)
[User impact if declined]: If TLS early-data is use it can happen that a request never finish loading. Depens which resource is affected it can produce no content on the page and the load will never finish.
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: It a small isolated piece of code. This is a TLS 1.3 only feature and tls 1.3 will be disabled on the 57 release(bug 1400291), but it is still enabled on beta.
[String changes made/needed]: none
Attachment #8910886 -
Flags: approval-mozilla-beta?
Comment 34•8 years ago
|
||
Comment on attachment 8910886 [details] [diff] [review]
bug_1397686.patch
Sure, let's take it in 57b3
Attachment #8910886 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 35•8 years ago
|
||
| bugherder uplift | ||
You need to log in
before you can comment on or make changes to this bug.
Description
•