Closed
Bug 1191253
Opened 9 years ago
Closed 9 years ago
"Link monitor" thread isn't started properly on Linux
Categories
(Core :: Networking, defect)
Tracking
()
RESOLVED
FIXED
mozilla43
People
(Reporter: bagder, Assigned: bagder)
References
Details
Attachments
(1 file, 2 obsolete files)
4.08 KB,
patch
|
bagder
:
review+
|
Details | Diff | Splinter Review |
When landing that work, the function call was changed from NS_NewThread() to NS_NewNamedThread() late in the game but without using the latter function with the correct set of arguments and thus the "Link Monitor" thread is never actually running! (Linux-specific code-path)
Assignee | ||
Updated•9 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 1•9 years ago
|
||
The patch only adds an argument to make the initial event gets dispatched
Attachment #8643597 -
Flags: review?(mcmanus)
Updated•9 years ago
|
Attachment #8643597 -
Flags: review?(mcmanus) → review+
Assignee | ||
Comment 2•9 years ago
|
||
Try-run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=56807702c4d9
Keywords: checkin-needed
Comment 4•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/fea6620f36b8
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Depends on: 1193541
Assignee | ||
Comment 5•9 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/3c8f6736b07c since it caused Bug 1193541. Backing this out, we're back to not running the thread so it will not properly detect network changes and online/offline situations.
Assignee | ||
Comment 6•9 years ago
|
||
Thanks to the guest account I got from :botond I could (finally) see the problem trigger in real-time and figure out what it was going on that triggered bug 1193541 and why we had to back this patch out. In these networks we apparently rather frequently get a whole bunch of IPv6 routing table updates involving a link-local address as gateway. I'm not sure that the significance of these updates are, but they're not really the sort of updates we're looking for here anyway since we use the routing table updates as a signal that the "network" changed and these updates on the link-local level are clearly not changing the network in a notable way that would need us to trigger network-changed events. I'll polish up my patch and take it through a try-run before I post it. Stay tuned.
Comment 7•9 years ago
|
||
so these are extraneous events that lead to too many network change events.. and that's going to clear and reset a bunch of things un-necessarily, so it makes sense to filter them out. but why did it break connectivity to google? It doesn't seem like we should get into a fail state?
Flags: needinfo?(daniel)
Assignee | ||
Comment 8•9 years ago
|
||
These events keep coming on this network, so it caused network-changed events get sent in bursts every few seconds, which we should avoid. But then as to exactly what those network-changed events did that caused google and spotify-webplayer to fail, I'm not sure. I've not seen a very accurate description of what exactly happened. It could indeed be valuable for me to make a local build that just sends network-change repeatedly at a fair interval to see what breaks and why since we I can't think of any reason why we shouldn't be able to should handle that pretty gracefully. I'll run some tests like this and see what I learn. (the actual patch for this issue is still pending mostly since try has been closed this evening)
Flags: needinfo?(daniel)
Assignee | ||
Comment 9•9 years ago
|
||
Here's a version of this patch that starts the helper thread properly _and_ ignores all routing table updates using a link-local IPv6 address as gateway. Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f6c1945a20dd
Attachment #8643597 -
Attachment is obsolete: true
Attachment #8653985 -
Flags: review?(mcmanus)
Assignee | ||
Comment 10•9 years ago
|
||
:botond, do you think you can give this a quick try in your end to see if anything weird pops up? I ran with this using my guest account in your machine but it isn't really possible to run the browser properly and for an extended period in such a running-remote setup. (Sweden<->Canada)
Flags: needinfo?(botond)
Comment 11•9 years ago
|
||
Comment on attachment 8653985 [details] [diff] [review] v2-0001-bug-1191253-start-the-link-monitor-ignore-link-lo.patch Review of attachment 8653985 [details] [diff] [review]: ----------------------------------------------------------------- I think its important to understand the failure mode before re-enabling this.. so let's review it then.
Attachment #8653985 -
Flags: review?(mcmanus)
Comment 12•9 years ago
|
||
(In reply to Daniel Stenberg [:bagder] from comment #10) > :botond, do you think you can give this a quick try in your end to see if > anything weird pops up? I ran with this using my guest account in your > machine but it isn't really possible to run the browser properly and for an > extended period in such a running-remote setup. (Sweden<->Canada) I've been using a build with this patch for a few hours, and I haven't experienced any issues accessing Google properties, or any other networking-related issues.
Flags: needinfo?(botond)
Assignee | ||
Comment 13•9 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #11) > I think its important to understand the failure mode before re-enabling > this.. so let's review it then. Okay, let's do that. I'm fairly confident though that this bug that caused too many network-change events to happen was one bug that we have to fix and the inability to deal with that large amount of events is a different one that should warrant its own bug and remedies. I'm on it.
Assignee | ||
Comment 14•9 years ago
|
||
Here's what I've learned so far: My test-Firefox sends a fake network change event every 100ms, non-stop, to really torture everything. I then pop a youtube video (meaning HTTP/2) and it really doesn't work properly. Turns out I can make it really usable again by simply not sending off the HTTP/2 pings nsHttpConnection::CheckForTraffic() will result in. And then it struck me that I've previously seen that Google's HTTP/2 server doesn't like a series of pings and sends a protocol error on the third ping within a short period of time (I don't know what heuristics it uses). I could easily verify this with h2i too.
Comment 15•9 years ago
|
||
(In reply to Daniel Stenberg [:bagder] from comment #14) > Turns out I can make it really usable again by simply not sending off the > HTTP/2 pings nsHttpConnection::CheckForTraffic() will result in. And then it > struck me that I've previously seen that Google's HTTP/2 server doesn't like > a series of pings and sends a protocol error on the third ping within a > short period of time (I don't know what heuristics it uses). I could easily > verify this with h2i too. that makes sense - and its a great find, because that was a latent issue that was going to impact more than Linux :) So it all works fine (if suboptimally, obviously) if just the ping is suppressed? what do you think we should do here?
Assignee | ||
Comment 16•9 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #15) > So it all works fine (if suboptimally, obviously) if just the ping is > suppressed? Yes. Removing the pings made Firefox usable again. > what do you think we should do here? 1) land this patch, as it makes the Linux version more in line with how Firefox works on Windows and Mac already 2) file a new bug and work on rate limitation logic for network-change events. I'll write down my thinking first thing in that bug. Primarily I think delaying the first event so that we can catch rapid subsequent ones and avoid generating many, and then I'll work on logic to avoid sending further pings over a h2/spdy connection that hasn't transferred any data since the previous ping. (and instead switch over to the more passive HTTP/1.1 method)
Comment 17•9 years ago
|
||
make it so!
Comment 18•9 years ago
|
||
Comment on attachment 8653985 [details] [diff] [review] v2-0001-bug-1191253-start-the-link-monitor-ignore-link-lo.patch Review of attachment 8653985 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/system/linux/nsNotifyAddrListener_Linux.cpp @@ +179,2 @@ > > // We are just intrested in main routing table this isn't part of this patch, so we can open a new bug if necessary, but this is a odd assumption right? @@ +196,5 @@ > + if (route_entry->rtm_family == AF_INET6) { > + unsigned char *g = (unsigned char *) > + RTA_DATA(attr); > + if((g[0] == 0xFE) && ((g[1] & 0xc0) == 0x80)) { > + link_local = true; break; @@ +206,5 @@ > + if (!link_local) { > + LOG(("OnNetlinkMessage: route update\n")); > + networkChange = true; > + } > + else { same line as }
Attachment #8653985 -
Flags: review+
Assignee | ||
Comment 19•9 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #18) > > // We are just intrested in main routing table > > this isn't part of this patch, so we can open a new bug if necessary, but > this is a odd assumption right? Probably. I've struggled to understand and found documentation on exactly how these tables are used and not used so I've tried to take the conservative path -- trying to avoid superfluous notifications. But yes, we can deal with this separately.
Assignee | ||
Comment 20•9 years ago
|
||
Fixed review remarks.
Attachment #8653985 -
Attachment is obsolete: true
Attachment #8656512 -
Flags: review+
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Comment 21•9 years ago
|
||
(In reply to Daniel Stenberg [:bagder] from comment #19) > (In reply to Patrick McManus [:mcmanus] from comment #18) > > > > // We are just intrested in main routing table > > > > this isn't part of this patch, so we can open a new bug if necessary, but > > this is a odd assumption right? > > Probably. I've struggled to understand and found documentation on exactly > how these tables are used and not used so I've tried to take the > conservative path -- trying to avoid superfluous notifications. But yes, we > can deal with this separately. I've used them to introduce src address based routing in the past. (i.e. you use the src address to find a table, and each table has a different gateway).
Comment 22•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/20ab8a8bdd20
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/20ab8a8bdd20
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
status-firefox43:
--- → fixed
Resolution: --- → FIXED
Target Milestone: mozilla42 → mozilla43
You need to log in
before you can comment on or make changes to this bug.
Description
•