Closed
Bug 1430659
(ipv4-faster-fallback)
Opened 6 years ago
Closed 6 years ago
Network requests in "blocked" state for long time (1+ minute) when ipv6 hosts are no longer available
Categories
(Core :: Networking: DNS, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla61
Tracking | Status | |
---|---|---|
firefox61 | --- | fixed |
People
(Reporter: kats, Assigned: mayhemer)
References
(Blocks 1 open bug)
Details
(Whiteboard: [necko-triaged])
Attachments
(3 files, 2 obsolete files)
Backstory: I've noticed a few times now that if I've left Firefox open for a long time (many days to a few weeks) I sometimes get really bad network behaviour, where page loads get really slow. Once it starts happening it seems easy enough to repro, and I usually have to restart Firefox to fix it. It happened again today and I opened the network console while loading the page. This was from reloading a google sheets page - it took many minutes to finish, and per the network console, a couple of requests were in the "blocked" state for a long time (see attached screenshot). Is this expected behaviour, or is there something I can do to debug this further when this happens?
Comment 1•6 years ago
|
||
I've seen this too. I have no idea how to reproduce it, but when it happens, sites that aren't fully functional before the load event, and sites that have UI elements that block on AJAX, become pretty much unusable until I restart.
Comment 2•6 years ago
|
||
@kats and @kmag, it sounds like the network blocking issue happens randomly but it's reproducible once it happened? Can you follow https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Using_aboutnetworking and provide corresponding MOZ_LOG when you hit this issue again? It will be also helpful if you can provide the URL that seems reproducing this issue, therefore we can try it on our own. BTW, what platform you use when encountering this issue?
Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(bugmail)
Comment 3•6 years ago
|
||
Not sure if it is the same as bug 1405231 based on current information.
See Also: → 1405231
Reporter | ||
Comment 4•6 years ago
|
||
Thanks, I'll update this bug with the requested info next time I see it. In this instance I saw it on https://docs.google.com/spreadsheets/d/1DcqaMvt3I38HIl3-RDS_6aiX67nQWItNcwmVt5utbDk/edit#gid=659543591 on OS X, Firefox 57.0.4.
Flags: needinfo?(bugmail)
Assignee | ||
Updated•6 years ago
|
Whiteboard: ni?2018-01-16
Reporter | ||
Comment 5•6 years ago
|
||
This happened again, same URL as comment 4. Firefox 58.0.1 release this time. I got the log as requested - i started the log, reloaded the google sheet, let it go for a few minutes, then stopped the log. At this point the page still hadn't finished loading. gzipped log is attached.
Assignee | ||
Comment 7•6 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5) > Created attachment 8949854 [details] > network-slow.txt-main.796.gz > > This happened again, same URL as comment 4. Firefox 58.0.1 release this > time. I got the log as requested - i started the log, reloaded the google > sheet, let it go for a few minutes, then stopped the log. At this point the > page still hadn't finished loading. gzipped log is attached. There is only one channel for that URL in this log and it's done in 740ms. Child logs are missing, so hard to say if the problem is on the child side. But I bed this is more a devtools bug.
Assignee | ||
Comment 8•6 years ago
|
||
Ah! There is a number of other channels in a connected state (unfinished). Will take a look!
Reporter | ||
Comment 9•6 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #7) > There is only one channel for that URL in this log and it's done in 740ms. > Child logs are missing, so hard to say if the problem is on the child side. > But I bed this is more a devtools bug. Note that there were no child logs produced, at least not in the same folder as the parent log. Also this time I didn't have the devtools open at all. The page itself does not complete loading.
Assignee | ||
Comment 10•6 years ago
|
||
Thanks for the log! So, the nsHalfOpenSocket @10f2f3ef0 created 2018-02-09 20:15:47.518, key = .S.....[tlsflags0x00000000]ssl.gstatic.com:443, fails to connect ssl.gstatic.com (2 sockets, prim/back, each trying the same ipv6 address 2607:f8b0:400b:80d::2003 twice [1], all fail to connect with 804b000e = NS_ERROR_NET_TIMEOUT, creates two http connections: 2018-02-09 20:17:17.954073 UTC - [Socket Thread]: V/nsHttp Creating nsHttpConnection @0x12279a400 - used to dispatch one of the 5 pending transactions for the host and propagate the connection error to it. 2018-02-09 20:17:17.956008 UTC - [Socket Thread]: V/nsHttp Creating nsHttpConnection @0x120f9c000 - immediately closed (no activation). This took nearly a minute and half. The problem is that the remaning 4 transactions nsHttpTransaction @1274bc400 nsHttpTransaction @12acb4800 nsHttpTransaction @128e0ec00 nsHttpTransaction @12acb4c00 are left in the pending queue. We then try to connect the server again to dispatch those remaining transactions, what again takes time. nsHalfOpenSocket @10f2f3ae0 created 2018-02-09 20:17:17.964 doesn't even finish the time you stopped the log. Its sockets still use the same, blacklisted address to connect to. I think this is a DNS problem. We definitely blacklist the address (see [1] ***) but sockets still are getting it and try to connect it. Pretty weird. [1] 2018-02-09 20:17:02.796306 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::OnSocketDetached [this=0x11e588800 cond=804b000e] 2018-02-09 20:17:02.796318 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::RecoverFromError [this=0x11e588800 state=3 cond=804b000e] 2018-02-09 20:17:02.796330 UTC - [Socket Thread]: D/nsHostResolver Adding address to blacklist for host [ssl.gstatic.com], host record [0x11eeec200]. (***) 2018-02-09 20:17:02.796342 UTC - [Socket Thread]: D/nsHostResolver Successfully adding address [2607:f8b0:400b:80d::2003] to blacklist for host [ssl.gstatic.com]. 2018-02-09 20:17:02.796360 UTC - [Socket Thread]: D/nsSocketTransport failed to connect all ipv4-only or ipv6-only hosts, trying lookup/connect again with both ipv4/ipv6 2018-02-09 20:17:02.796373 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::PostEvent [this=0x11e588800 type=0 status=0 param=0x0] 2018-02-09 20:17:02.796412 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport: calling PR_Close [this=0x11e588800] 2018-02-09 20:17:02.796942 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::OnSocketEvent [this=0x11e588800 type=0 status=0 param=0x0] 2018-02-09 20:17:02.796964 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::ResolveHost [this=0x11e588800 ssl.gstatic.com:443] 2018-02-09 20:17:02.796977 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::SendStatus [this=0x11e588800 status=804b0003] 804b0003 = STATUS_RESOLVING 2018-02-09 20:17:02.796990 UTC - [Socket Thread]: V/nsHttp nsHttpTransaction::OnSocketStatus [this=0x127b6fc00 status=804b0003 progress=0] 2018-02-09 20:17:02.797299 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::PostEvent [this=0x11e588800 type=1 status=0 param=0x1151f2910] 2018-02-09 20:17:02.797552 UTC - [Socket Thread]: D/nsSocketTransport active [0] { handler=0x120f9c000 condition=0 pollflags=6 } 2018-02-09 20:17:02.797840 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::OnSocketEvent [this=0x11e588800 type=1 status=0 param=0x1151f2910] 2018-02-09 20:17:02.797860 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::SendStatus [this=0x11e588800 status=804b000b] 804b000b = STATUS_RESOLVED 2018-02-09 20:17:02.797881 UTC - [Socket Thread]: V/nsHttp nsHttpTransaction::OnSocketStatus [this=0x127b6fc00 status=804b000b progress=0] 2018-02-09 20:17:02.798016 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::InitiateSocket [this=0x11e588800] 2018-02-09 20:17:02.798104 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::BuildSocket [this=0x11e588800] 2018-02-09 20:17:02.798397 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransportService::AttachSocket [handler=0x11e588800] 2018-02-09 20:17:02.798411 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransportService::AddToIdleList [handler=0x11e588800] 2018-02-09 20:17:02.798450 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::SendStatus [this=0x11e588800 status=804b0007] 804b0007 = STATUS_CONNECTING_TO 2018-02-09 20:17:02.798463 UTC - [Socket Thread]: V/nsHttp nsHttpTransaction::OnSocketStatus [this=0x127b6fc00 status=804b0007 progress=0] 2018-02-09 20:17:02.798499 UTC - [Socket Thread]: D/nsSocketTransport trying address: 2607:f8b0:400b:80d::2003
Component: Networking → Networking: DNS
Summary: Network requests in "blocked" state for long time (1+ minute) → Network requests in "blocked" state for long time (1+ minute) , DNS black-listing broken?)
Assignee | ||
Comment 11•6 years ago
|
||
Another snippet from the log, see the marked spots, the address is already blacklisted before: 2018-02-09 20:17:17.970027 UTC - [Socket Thread]: D/nsHostResolver Resolving host [ssl.gstatic.com]. *** 2018-02-09 20:17:17.970065 UTC - [Socket Thread]: D/nsHostResolver Using cached record for host [ssl.gstatic.com]. 2018-02-09 20:17:17.970141 UTC - [Socket Thread]: D/nsHostResolver Using positive cache entry for host [ssl.gstatic.com] but starting async renewal. 2018-02-09 20:17:17.970178 UTC - [Socket Thread]: D/nsHostResolver DNS thread counters: total=3 any-live=0 idle=3 pending=1 2018-02-09 20:17:17.970243 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::PostEvent [this=0x11e588800 type=1 status=0 param=0x111980af0] 2018-02-09 20:17:17.970261 UTC - [Socket Thread]: D/nsSocketTransport STS dispatch [0x111adcdf0] 2018-02-09 20:17:17.970333 UTC - [Socket Thread]: D/nsSocketTransport OnDispatchedEvent Same Thread Skip Signal 2018-02-09 20:17:17.970352 UTC - [Socket Thread]: D/nsSocketTransport advancing to STATE_RESOLVING 2018-02-09 20:17:17.970369 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::OnSocketEvent [this=0x11e588800 type=0 status=0 param=0x0] 2018-02-09 20:17:17.970382 UTC - [Socket Thread]: D/nsSocketTransport MSG_ENSURE_CONNECT 2018-02-09 20:17:17.970394 UTC - [Socket Thread]: D/nsSocketTransport ignoring redundant event 2018-02-09 20:17:17.970409 UTC - [Socket Thread]: D/nsSocketTransport STS poll iter 2018-02-09 20:17:18.222270 UTC - [Socket Thread]: D/nsHostResolver Resolving host [ssl.gstatic.com]. 2018-02-09 20:17:18.222281 UTC - [Socket Thread]: D/nsHostResolver Using cached record for host [ssl.gstatic.com]. 2018-02-09 20:17:18.222291 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::PostEvent [this=0x12279a400 type=1 status=0 param=0x115205040] 2018-02-09 20:17:18.222299 UTC - [Socket Thread]: D/nsSocketTransport STS dispatch [0x11521a940] 2018-02-09 20:17:18.222306 UTC - [Socket Thread]: D/nsSocketTransport OnDispatchedEvent Same Thread Skip Signal 2018-02-09 20:17:18.222313 UTC - [Socket Thread]: D/nsSocketTransport advancing to STATE_RESOLVING 2018-02-09 20:17:18.222322 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::OnSocketEvent [this=0x12279a400 type=0 status=0 param=0x0] 2018-02-09 20:17:18.222329 UTC - [Socket Thread]: D/nsSocketTransport MSG_ENSURE_CONNECT 2018-02-09 20:17:18.222335 UTC - [Socket Thread]: D/nsSocketTransport ignoring redundant event 2018-02-09 20:17:18.222342 UTC - [Socket Thread]: D/nsSocketTransport STS poll iter 2018-02-09 20:17:18.222349 UTC - [Socket Thread]: D/nsSocketTransport active [4] { handler=0x11e588800 condition=0 pollflags=6 } 2018-02-09 20:17:18.222358 UTC - [Socket Thread]: D/nsSocketTransport active [3] { handler=0x11f5a8c00 condition=0 pollflags=6 } 2018-02-09 20:17:18.222365 UTC - [Socket Thread]: D/nsSocketTransport active [2] { handler=0x12ae71400 condition=0 pollflags=5 } 2018-02-09 20:17:18.222372 UTC - [Socket Thread]: D/nsSocketTransport active [1] { handler=0x11f512400 condition=0 pollflags=5 } 2018-02-09 20:17:18.222379 UTC - [Socket Thread]: D/nsSocketTransport active [0] { handler=0x111c07400 condition=0 pollflags=6 } 2018-02-09 20:17:18.222385 UTC - [Socket Thread]: D/nsSocketTransport calling PR_Poll [active=5 idle=0] 2018-02-09 20:17:18.222392 UTC - [Socket Thread]: D/nsSocketTransport timeout = 0 milliseconds 2018-02-09 20:17:18.222463 UTC - [Socket Thread]: D/nsSocketTransport ...returned after 0 milliseconds 2018-02-09 20:17:18.222472 UTC - [Socket Thread]: D/nsSocketTransport STS dispatch [0x11193e8c0] 2018-02-09 20:17:18.222479 UTC - [Socket Thread]: D/nsSocketTransport OnDispatchedEvent Same Thread Skip Signal 2018-02-09 20:17:18.222486 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::OnSocketEvent [this=0x12279a400 type=1 status=0 param=0x115205040] 2018-02-09 20:17:18.222493 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::SendStatus [this=0x12279a400 status=804b000b] 804b000b = STATUS_RESOLVED 2018-02-09 20:17:18.222500 UTC - [Socket Thread]: D/nsSocketTransport MSG_DNS_LOOKUP_COMPLETE *** 2018-02-09 20:17:18.222508 UTC - [Socket Thread]: D/nsHostResolver Checking blacklist for host [ssl.gstatic.com], host record [0x11eeec200]. 2018-02-09 20:17:18.222515 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::InitiateSocket [this=0x12279a400] 2018-02-09 20:17:18.222522 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::BuildSocket [this=0x12279a400] 2018-02-09 20:17:18.222529 UTC - [Socket Thread]: D/nsSocketTransport pushing io layer [0:ssl] 2018-02-09 20:17:18.222595 UTC - [Socket Thread]: D/nsSocketTransport [secinfo=0x10f37e5d0 callbacks=0x10f37b190] 2018-02-09 20:17:18.222620 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransportService::AttachSocket [handler=0x12279a400] 2018-02-09 20:17:18.222630 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransportService::AddToIdleList [handler=0x12279a400] 2018-02-09 20:17:18.222642 UTC - [Socket Thread]: D/nsSocketTransport active=5 idle=1 2018-02-09 20:17:18.222683 UTC - [Socket Thread]: D/nsSocketTransport advancing to STATE_CONNECTING 2018-02-09 20:17:18.222699 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::SendStatus [this=0x12279a400 status=804b0007] 804b0007 = STATUS_CONNECTING_TO *** 2018-02-09 20:17:18.222712 UTC - [Socket Thread]: D/nsSocketTransport trying address: 2607:f8b0:400b:80d::2003
Assignee | ||
Updated•6 years ago
|
Whiteboard: ni?2018-01-16
Assignee | ||
Comment 12•6 years ago
|
||
Daniel, since you are slowly becoming our DNS expert, could you take a look please?
Flags: needinfo?(daniel)
Comment 13•6 years ago
|
||
The address blacklist logic resets the blacklist again if it all addresses for that single record are blacklisted (otherwise it would end up with no addresses to try) and that seems to happen a lot in this log, including for host "ssl.gstatic.com". Search for "Resetting blacklist". They all seem to get blacklisted when they've all failed connections, so I suspect the blacklisting a secondary outcome of connection problems and that the connection problems is the primary problem. Unfortunately, I can't explain why it would have connect problems.
Flags: needinfo?(daniel)
Assignee | ||
Comment 14•6 years ago
|
||
Thanks Daniel. Kartikaya, just for completeness, what is your setting for network.http.fast-fallback-to-IPv4 preference? I expect true (the default). We probably need to change these things here: - lower the connection timeout of the backup socket to only say 5 seconds (quite arbitrarily chosen const) when family already has a preference from earlier time, have a pref for the timeout - make sure the socket transport tries a different family on error recovery, mainly on the timeout error: when disable ipv4 was spec'd, switch to disable ipv6 and retry, (and vise versa) - drop the family preference flag on connentry when we fail connecting that family (in 5 seconds or in the regular timeout?) or rather reset the preference when we recover using a different family (needs a callback from st to hl or to expose the specific family failure somehow) to support that, I can see in the log: - we do (also) get an ipv4 address for the ssl.gstatic.com host from getaddrinfo - we never try to connect that ipv4 address through out the log, though - when nsSocketTransport @1253be400 (primary) recovers from its first connection error: 2018-02-09 20:14:01.728967 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::RecoverFromError [this=0x1253be400 state=3 cond=804b000e] ...it resolves the host again, because: 2018-02-09 20:14:01.729037 UTC - [Socket Thread]: D/nsSocketTransport failed to connect all ipv4-only or ipv6-only hosts, trying lookup/connect again with both ipv4/ipv6 2018-02-09 20:14:01.729335 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::ResolveHost [this=0x1253be400 ssl.gstatic.com:443] and again tries to connect the same ipv6 address because it gets it first as a result of that resolve: 2018-02-09 20:14:01.730397 UTC - [Socket Thread]: D/nsSocketTransport trying address: 2607:f8b0:400b:80d::2003 despite the fact we just blacklisted that ip address (according Daniel's comment, we restart the blacklisting when all ip addresses we have are blacklisted - here it was the only one - but we still have the ipv4 address, so I'm not fully clear on this bit) the logs are quite poor in the area what family we resolve for which socket transport, so the following is based on studying the code. I assume that in the past we were able to connect that ipv6 address and remembered that in mEnt->mPreferIPv6. That makes the primary transport use only ipv6 addresses. But, the code at [1] makes even the backup socket use ipv6 only, still, it's likely correct - bug 725587. I can see in the log we go via the same steps for the backup socket as well: 2018-02-09 20:14:02.720711 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::RecoverFromError [this=0x126e8e000 state=3 cond=804b000e] 2018-02-09 20:14:02.720739 UTC - [Socket Thread]: D/nsSocketTransport failed to connect all ipv4-only or ipv6-only hosts, trying lookup/connect again with both ipv4/ipv6 2018-02-09 20:14:02.720874 UTC - [Socket Thread]: D/nsSocketTransport nsSocketTransport::ResolveHost [this=0x126e8e000 ssl.gstatic.com:443] It then again tries an IPv6 address as first again (we should try an ipv4 address here..) and goes via the same dance again. We only do a try for the ipv6 address (and not the ipv4 address that's waiting there to be picked) since something called Close on the transport and thus set the mDoNotRetryToConnect flag. [1] https://searchfox.org/mozilla-central/rev/33cc9e0331da8d9ff3750f1e68d72d61201176cb/netwerk/protocol/http/nsHttpConnectionMgr.cpp#4079-4085
Flags: needinfo?(bugmail)
Assignee | ||
Updated•6 years ago
|
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Assignee | ||
Updated•6 years ago
|
Priority: -- → P2
Assignee | ||
Updated•6 years ago
|
Whiteboard: [necko-triaged]
Reporter | ||
Comment 15•6 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #14) > Thanks Daniel. > > Kartikaya, just for completeness, what is your setting for > network.http.fast-fallback-to-IPv4 preference? I expect true (the default). > Yes, it's the default (true).
Flags: needinfo?(bugmail)
Assignee | ||
Updated•6 years ago
|
Summary: Network requests in "blocked" state for long time (1+ minute) , DNS black-listing broken?) → Network requests in "blocked" state for long time (1+ minute) when ipv6 hosts are no longer available
Assignee | ||
Comment 16•6 years ago
|
||
the patch does what's described at the top of comment 14, also see the detailed commit message I've done a successful local manual testing using the hosts file and iis. https://treeherder.mozilla.org/#/jobs?repo=try&revision=4e73d7b5a5dcf7f6f5bb1148d0d290f0bb455bdd
Attachment #8957636 -
Flags: review?(mcmanus)
Comment 17•6 years ago
|
||
Comment on attachment 8957636 [details] [diff] [review] v1 Review of attachment 8957636 [details] [diff] [review]: ----------------------------------------------------------------- I'm ok with this in prinicple but I haven't done a real code review.
Attachment #8957636 -
Flags: review?(mcmanus) → review?(valentin.gosu)
Assignee | ||
Comment 19•6 years ago
|
||
I think we should make the "try other family" a specific connection flag. When connecting a host the first time, no preference for an ip fam, we do primary ipv6+ipv4 and backup ipv4+ipv6 attempts, as both sockets have a DISABLE_IPVx flag. This is not necessary IMO. It should be like this: - no preference (so far first connection attempt, or we never reached the server): - primary: disable ipv4 - backup: disable ipv6 - preference is already known: - primary: disable ipv-non-preferred - backup: timeout=5s, disable ipv-non-preferred + try-other-ip-fam
Assignee | ||
Comment 20•6 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #19) > I think we should make the "try other family" a specific connection flag. Filed bug 1444923 for that.
Comment 21•6 years ago
|
||
Comment on attachment 8957636 [details] [diff] [review] v1 Review of attachment 8957636 [details] [diff] [review]: ----------------------------------------------------------------- ::: modules/libpref/init/all.js @@ +1801,5 @@ > // Close a connection if tls handshake does not finish in given number of > // seconds. > pref("network.http.tls-handshake-timeout", 30); > > +// The number of seconds we time out a connection of a retry (fallback) socket nit: seconds <after which> we time out. ::: netwerk/protocol/http/nsHttpHandler.cpp @@ +206,5 @@ > , mNetworkChangedTimeout(5000) > , mMaxRequestAttempts(6) > , mMaxRequestDelay(10) > , mIdleSynTimeout(250) > + , mFallbackSynTimeout(5000) Since we clamp the pref to (0, 600) I guess we should default this to either 0 or 600.
Attachment #8957636 -
Flags: review?(valentin.gosu) → review+
Assignee | ||
Comment 22•6 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #21) > > + , mFallbackSynTimeout(5000) > > Since we clamp the pref to (0, 600) I guess we should default this to either > 0 or 600. Ups! This should actually be 5, it's in seconds (!), the default I added to all.js. thanks!
Updated•6 years ago
|
Flags: needinfo?(kmaglione+bmo)
Assignee | ||
Comment 23•6 years ago
|
||
One more problem found, uncovered by a completely unrelated test on android: netwerk/test/unit/test_fallback_request-error_canceled.js the test shuts the server down and then tries if some appcache stuff is loading as expected. the state of the conn-entry is to prefer ipv4. so, after the server has been shutdown, any connection attempt to localhost is made with the disable-ipv6 flag. we try 127.0.0.1 and then try to resolve ipv6. that fails with "unknown host" as there is no ipv6 on android test infra! this actually reveals a long standing bug, we never drop the mAttached flag and when handling the unknown-host error in nsSocketTransport::OnSocketEvent, we believe sts will signal with detach and we handle the error there. but at that moment the socket has already been detached from sts, and thus we bypass call to OnSocketDetached from inside OnSocketEvent. the socket never notifies upper layers about the connection failures, it's timed out by the http conn manager 90s timeout logic and never resets the preference. adding mAttached = false to OnSocketDetached helps. going another try round...
Assignee | ||
Comment 24•6 years ago
|
||
then, I start to believe we should have the TRY_OTHER_IP_FAMILY flag within this patch to prevent delays and unnecessary conn attempts.
Assignee | ||
Comment 25•6 years ago
|
||
After more research, I will rewrite the patch here. the retry-other-family logic will move to half-open.
Flags: needinfo?(polo.hellfire)
Assignee | ||
Updated•6 years ago
|
Attachment #8957636 -
Attachment is patch: false
Attachment #8957636 -
Flags: review+
Assignee | ||
Updated•6 years ago
|
Attachment #8957636 -
Attachment is obsolete: true
Attachment #8957636 -
Attachment is patch: true
Assignee | ||
Comment 26•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cd88732414c1bc84bd0be6d45765297062a95af8 - eventually I rather implemented RETRY_WITH_DIFFERENT_IP_FAMILY connection flag, as moving the logic to half-open turned out to be too complicated (the motivation was a cleanup + possibility to ignore errors from backup sockets under certain conditions, but that can be a followup, not needed until we decide what to do with bug 1444928) - I deliberately keep reference to mSynTimer (timer for the backup socket) to not fire it more than once (I've seen logs where HOs were scheduling another backup sockets prolonging timeout detection even more) ; no leak detected with that change otherwise the patch is identical to the v1 patch.
Attachment #8958928 -
Flags: review?(valentin.gosu)
Assignee | ||
Comment 28•6 years ago
|
||
Valentin, the interdiff by bugzilla [1] seems sane, so you can review that [1] https://bugzilla.mozilla.org/attachment.cgi?oldid=8957636&action=interdiff&newid=8958928&headers=1
Assignee | ||
Updated•6 years ago
|
Comment 29•6 years ago
|
||
Comment on attachment 8958928 [details] [diff] [review] v2 Review of attachment 8958928 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/base/nsISocketTransport.idl @@ +253,5 @@ > */ > const unsigned long REFRESH_CACHE = (1 << 9); > > /** > + * If connecting the preferred ip family has failed, retry with Let's rephrase this: "If this flag is set then it means that..." ::: netwerk/base/nsSocketTransport2.h @@ +300,5 @@ > bool mProxyTransparent; > bool mProxyTransparentResolvesHost; > bool mHttpsProxy; > uint32_t mConnectionFlags; > + // When we fail to connect a prefered family, we tell the consumer to reset nit: to connect _using a preferred_ family ::: netwerk/protocol/http/nsHttpConnectionMgr.cpp @@ +4084,5 @@ > + // From the same reason, let the backup socket fail faster to try the other family. > + uint16_t fallbackTimeout = isBackup ? gHttpHandler->GetFallbackSynTimeout() : 0; > + if (fallbackTimeout) { > + socketTransport->SetTimeout(nsISocketTransport::TIMEOUT_CONNECT, > + fallbackTimeout); nit: You don't need the temporary fallbackTimeout. Just do if (isBackup) ... unless you think this is clearer. @@ +4242,5 @@ > if (!mSynTimer) > return; > > LOG(("nsHalfOpenSocket::CancelBackupTimer()")); > mSynTimer->Cancel(); Add a comment about why we deliberately don't null mSynTimer, otherwise someone might add it back @@ +4329,5 @@ > MOZ_ASSERT(mEnt); > > DebugOnly<nsresult> rv = SetupBackupStreams(); > MOZ_ASSERT(NS_SUCCEEDED(rv)); > Same - needs comment. ::: netwerk/protocol/http/nsHttpConnectionMgr.h @@ +330,5 @@ > // Set the IP family preference flags according the connected family > void RecordIPFamilyPreference(uint16_t family); > // Resets all flags to their default values > void ResetIPFamilyPreference(); > + // True iff there is currently an established preference for the host nit: typo iff Also, the preference is for IP family, not host.
Attachment #8958928 -
Flags: review?(valentin.gosu) → review+
Assignee | ||
Comment 30•6 years ago
|
||
thanks for r! repushed to try https://treeherder.mozilla.org/#/jobs?repo=try&revision=81606b96bc337f9b80d05e3332703e472064c3de there were some tests failing, so again on top of bug 1446117
Assignee | ||
Comment 31•6 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #29) > > + if (fallbackTimeout) { > > + socketTransport->SetTimeout(nsISocketTransport::TIMEOUT_CONNECT, > > + fallbackTimeout); > > nit: You don't need the temporary fallbackTimeout. Just do if (isBackup) ... > unless you think this is clearer. 0 means to use default (so, don't change the TO). > > void ResetIPFamilyPreference(); > > + // True iff there is currently an established preference for the host > > nit: typo iff iff == if and only if abbrev (equivalence, not implication)
Assignee | ||
Comment 32•6 years ago
|
||
Attachment #8958928 -
Attachment is obsolete: true
Attachment #8959575 -
Flags: review+
Assignee | ||
Comment 33•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e0d10297f26289966f213dded341bac6c68310ac on top of bug 1446117 that I will likely add as a dep.
Assignee | ||
Comment 35•6 years ago
|
||
The latest green try for v2.1: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e0d10297f26289966f213dded341bac6c68310ac
Keywords: checkin-needed
Comment 36•6 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/2f4395f5ca24 Network requests in "blocked" state for long time (1+ minute) when ipv6 hosts are no longer available. r=valentin
Keywords: checkin-needed
Comment 37•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/2f4395f5ca24
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Assignee | ||
Updated•6 years ago
|
Alias: ipv4-faster-fallback
You need to log in
before you can comment on or make changes to this bug.
Description
•