Closed Bug 1234548 Opened 4 years ago Closed 4 years ago

firefox 43 ipv6 request timeout in long requests

Categories

(Core :: Networking, defect)

43 Branch
x86
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: kroustou, Assigned: dragana)

References

Details

(Keywords: regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0
Build ID: 20151210085006

Steps to reproduce:

Connect to an ipv6 network from linux, get a url, which takes a long time to respond (10-15sec)


Actual results:

in the developer tools, network tab the request was a square from the beginning and after it was dropped, the value of type went from blank to "plain", the response time to 0.8ms


Expected results:

Firefox should be waiting for the request to finish and get the results. This works properly in ipv4 (tested in a url wich sleeps for 40s).
OS: Unspecified → Linux
Hardware: Unspecified → x86
Component: Untriaged → Networking
Product: Firefox → Core
I'm also having this same problem with Firefox 43 Linux on multiple sites. Disabling IPv6 by setting network.dns.disableIPv6 to true in about:config in order to connect via IPv4 works.

I've seen this happen with GETs and POSTs, and the problem only happens in Firefox (I also tested Chrome on Linux and Edge on Win10).

I also had a site that uses socket.io for websockets continually disconnect and reconnect; this behavior only started happening in Firefox 43, while the same site worked fine in Chrome. I'm pretty certain that socket.io wasn't falling back to long polling in this instance, but I can check again tomorrow.
I've found a way where I can reproduce this about 1 out of 3 times. First of all, it doesn't seem like I can reproduce this using the loopback interface [::1]. So you will need to run an IPv6 compatible server on one computer and access it from another. Next, I was not able to reproduce the problem with GET, but I could reliably reproduce it with an XMLHttpRequest POST.

The error event listener logs an error event. Usually it takes a while before the error occurs, but on one occassion it happened within the first 15 or 20 seconds after initiating the request. Unfortunately I couldn't figure out how to get any more info out of the error event object, but from what I read in the whatwg xhr standard, the network error doesn't necessarily contain any useful information. https://xhr.spec.whatwg.org/#request-error-steps

To reproduce the problem I launched the following server (written in Go) on one computer and then accessed it from another computer. Note that you will have to replace the IPv6 address with your actual address in the req.open line in the template's javascript.

package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

var template string = `
<!DOCTYPE html>
<html>
	<head>
	</head>
	<body>
		<script>
			var req = new XMLHttpRequest();
			req.addEventListener("load", function() { console.log(this.responseText) });
			req.addEventListener("error", function(evt) { console.error("error", evt) });
			req.addEventListener("abort", function(evt) { console.error("abort", evt) });
			req.open("POST", "http://[f00f:c7c8]/post");  // REPLACE THIS ADDRESS (keep it inside the []s)
			req.send();
		</script>
	</body>
</html>
`

func index(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, template)
}

func sleepHandler(w http.ResponseWriter, r *http.Request) {
	log.Println("post received from", r.RemoteAddr)
	ch := make(chan bool, 1)
	go func() {
		time.Sleep(120 * time.Second)
		ch <- true
	}()
	<-ch
	w.Header().Set("Content-Type", "text/plain")
	fmt.Fprintf(w, "ok")
	log.Println("post complete from", r.RemoteAddr)
}

func main() {
	http.HandleFunc("/", index)
	http.HandleFunc("/post", sleepHandler)

	log.Fatal(http.ListenAndServe(":80", nil))
}
dragana, can you look at this?
Flags: needinfo?(dd.mozilla)
Ok, I've discovered something more worrisome. This is occurring even if a site doesn't support IPv6 but Firefox has IPv6 enabled.

When using a domain name where the AAAA record is an authority response (so no IPv6 exposed, IIUC, but I'm a DNS noob), 2 out of 5 attempts succeeded.

When using an explicit IPv4 address instead of a domain name, 3 out of 5 attempts succeeded.

Every time this error has occurred, the error event from the XMLHttpRequest is emitted while the backend executes the request successfully. In my test server the handler sleeps for 120 seconds before returning a response, but the error event in the client occurs well before that.

To reproduce this problem, change the IPv6 address in the above server code to a hostname without IPv6 DNS records or to an IPv4 address and then access the server in the same way.
Well, this is strange. With IPv6 _disabled_ in Firefox:

Using a domain name with no AAAA record worked 3 out of 5 times. The first time it failed it was almost immediate.
 
Using an IPv4 address worked 4 out of 5 times. 

Previously I thought that disabling IPv6 would resolve the problem, but apparently that was just a coincidence. I really hope I'm not doing something wrong myself here to cause these problems, but I think that the bare bones server I coded is unlikely to be the source of the problem. Also, I work at a web hosting company so I think we'd be aware of any problems on our network that would be causing the long running request to be terminated.
I will take a look
Flags: needinfo?(dd.mozilla)
Assignee: nobody → dd.mozilla
Hi,
 can you attach (or send to my mail) logging:

https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

Thank you!
Flags: needinfo?(kroustou)
Flags: needinfo?(clementraphael)
I have got log from kroustou. Thanks!

The connection are killed by nsHttpConnectionMgr::PruneNoTraffic
http://hg.mozilla.org/mozilla-central/annotate/tip/netwerk/protocol/http/nsHttpConnectionMgr.cpp#l956
log:
2015-12-29 15:06:14.738457 UTC - -672192768[7f8defb38fa0]:   pruning no traffic [ci=......localhost:8000]
2015-12-29 15:06:14.738462 UTC - -672192768[7f8defb38fa0]: nsHttpConnection::Close [this=7f8db17fe6d0 reason=80004004]

Which should happen only if network change event is fired. 
@kroustou does you network changes during this (network goes down and up)

I will needinfo Daniel who implemented this code and know it better.
Flags: needinfo?(daniel)
No the connection is always up (i am connected by cable and i work at a NOC)
Flags: needinfo?(kroustou)
For the record on firefox 42 there was no problem
(In reply to kroustou from comment #10)
> For the record on firefox 42 there was no problem

So there is probably a regression in FF43. Could you install the tool mozregression to find a regression range. See http://mozilla.github.io/mozregression/ for details.
Mozreg is a python-2.7 package, so you need to install it before running mozreg.

As FF42 worked, just run the command "mozregression --good=42" and copy here the final pushlog given in the console output.
This sounds an awful lot like Bug 1235509; network change events were turned on for Linux in Bug 1191253 (i.e. in Fx43). Try setting network.notify.changed=false and see if that helps.
(In reply to dagger.bugzilla from comment #12)
> This sounds an awful lot like Bug 1235509; network change events were turned
> on for Linux in Bug 1191253 (i.e. in Fx43). Try setting
> network.notify.changed=false and see if that helps.

It works by setting network.notify.changed to false indeed.
Yes, indeed it must be the same with Bug 1235509.

@Loic done! The results of mozregression are attached (Last good revision: c60a6aab34a0c09e5e8cd6294037114a83628f73 and First bad revision: 6ebbe5ee0e8101d71a6ac7c6b36fdcc6a8d41b9f)
Attached file mozregression.txt
mozregression
I will make a patch for bug 1235509 and kroustou may I ask you to test it if you have time.
@Dragana, of course, at your service.
Have also been seeing this on some slow responding sites since FF43. Setting network.notify.changed=false seems to help/fix for now.
Can you try this version:

http://archive.mozilla.org/pub/firefox/try-builds/dd.mozilla@gmail.com-38a73b6f5d7d91f23f2f62be305c787eea243f4f/try-linux64/firefox-46.0a1.en-US.linux-x86_64.tar.bz2

This is the build with patch that I have attached to bug 1235509

Thanks a lot!
Flags: needinfo?(kroustou)
I tried using the patched build with network.dns.disableIPv6 set to the default of false and network.notify.changed set to the default of true. I still consistently experience disconnects with long running GET and POST requests. In the case of POSTs done via XMLHttpRequest, I experienced the behavior I described above.

In the case of GETs, I experienced problems similar to another bug I filed, 1234720. If I open a new tab and perform the GET, which Firefox then times out, the URL changes to "about:newtab", clicking refresh does nothing, the inspector shows nothing, and document.URL returns "about:blank". After opening the console I am then able to refresh, after which the URL becomes blank and document.URL still returns "about:blank". 

If I open the URL in a tab that has already successfully loaded a page and the GET is aborted, the URL remains unchanged but the inspector becomes blank. document.URL returns the previously loaded succesful page instead of "about:blank" or the requested URL.

Note that all of the above was done over IPv4. 

If I set network.notify.changed to false, everything seems to work fine.
Flags: needinfo?(clementraphael)
(In reply to muraiki from comment #20)
> I tried using the patched build with network.dns.disableIPv6 set to the
> default of false and network.notify.changed set to the default of true. I
> still consistently experience disconnects with long running GET and POST
> requests. In the case of POSTs done via XMLHttpRequest, I experienced the
> behavior I described above.
> 
> In the case of GETs, I experienced problems similar to another bug I filed,
> 1234720. If I open a new tab and perform the GET, which Firefox then times
> out, the URL changes to "about:newtab", clicking refresh does nothing, the
> inspector shows nothing, and document.URL returns "about:blank". After
> opening the console I am then able to refresh, after which the URL becomes
> blank and document.URL still returns "about:blank". 
> 
> If I open the URL in a tab that has already successfully loaded a page and
> the GET is aborted, the URL remains unchanged but the inspector becomes
> blank. document.URL returns the previously loaded succesful page instead of
> "about:blank" or the requested URL.
> 
> Note that all of the above was done over IPv4. 
> 
> If I set network.notify.changed to false, everything seems to work fine.

Hi, 
can you run this command in a terminal:
ip monitor

It monitors address and route changes. I am interested to see what is changing when your problem occurs.
Thanks
Flags: needinfo?(clementraphael)
`ip monitor` showed various IPs going STALE and REACHABLE, but the IP of the server I was connecting to did not show up. I tried a few times.
Flags: needinfo?(clementraphael)
(In reply to muraiki from comment #22)
> `ip monitor` showed various IPs going STALE and REACHABLE, but the IP of the
> server I was connecting to did not show up. I tried a few times.

Was there anything else besides STALE and REACHABLE like 
Deleted 2: eth0    inet 192.168.42.140/24 brd 192.168.42.255 scope global eth0
       valid_lft forever preferred_lft forever
or
2: eth0    inet 192.168.42.140/24 brd 192.168.42.255 scope global eth0
       valid_lft forever preferred_lft forever
or
default via 192.168.42.2 dev eth0  proto static 

or something similar. Firefox monitors such message to see if network has changed. The message does not have to have ip address for the server.
I just tested it two more times and got two failures. All of the messages were only for STALE and REACHABLE. :(
Clearly the network-change detection has triggered in this case, and when that is triggered it will check all HTTP/1 connections and allow them N seconds to have traffic before they are deemed dead and they are "pruned".

Those N seconds can be tweaked with the 'network.http.network-changed.timeout' pref by the way.

Have anyone of you who see this in 43 tried a later Firefox version? We've fixed a few additional things in the network detection since and it would be very interesting to learn if they make any difference to this scenario.
Flags: needinfo?(daniel)
interesting - I didn't realize until comment 25 that this was restricted to resources that are idle. That's pretty much what the change detection code is trying to identify.

So in some sense that's a perfect storm. but why are so many frequent changes going on here? The whole assumption with that feature is that these are rare events tied to mobility or sleep or something..
According to bug 1235509, these events are fired every time the lifetime on an address changes, even if the address (or total set of addresses) doesn't change. The lifetimes are refreshed every time an RA is broadcast, which can be as often as once every 3 seconds according to the RFC.

(From staring at MXR, I believe we also empty the DNS cache and re-request the proxy PAC file on every network change event, but I guess those are less important than closing connections is.)
This change (http://hg.mozilla.org/mozilla-central/rev/4e3c40a187a1) is not in 43, which this makes it suffer from change "bursts" too. 

Since the patch from bug 1235509 doesn't seem to fix this (which is what I take from comment #20), which implements a hash to avoid repeated notifications from the same addresses then can really repeated RA advertisements be the culprit? If they are, the patch seems to not do what it attempts to. Maybe we could add some logging there and see if this is really what triggers these network change events?

Also, even if 3-second interval RAs can be sent, I don't think it is normal to actually set a new address that often.

The 4e3c40a change only makes the shortest possible time between two network events never be shorter than 1000 milliseconds so it might not change this particular problem.
(In reply to dagger.bugzilla from comment #27)

> (From staring at MXR, I believe we also empty the DNS cache and re-request
> the proxy PAC file on every network change event, but I guess those are less
> important than closing connections is.)

Yes we are, and they are probably just less notable as too many such flushes only makes things slightly slower.

We need to identify what causes these network changes to trigger and make sure we don't act on them (like this).
(In reply to Daniel Stenberg [:bagder] from comment #28)
> This change (http://hg.mozilla.org/mozilla-central/rev/4e3c40a187a1) is not
> in 43, which this makes it suffer from change "bursts" too. 
> 
> Since the patch from bug 1235509 doesn't seem to fix this (which is what I
> take from comment #20), which implements a hash to avoid repeated
> notifications from the same addresses then can really repeated RA
> advertisements be the culprit? If they are, the patch seems to not do what
> it attempts to. Maybe we could add some logging there and see if this is
> really what triggers these network change events?
> 
> Also, even if 3-second interval RAs can be sent, I don't think it is normal
> to actually set a new address that often.
> 
> The 4e3c40a change only makes the shortest possible time between two network
> events never be shorter than 1000 milliseconds so it might not change this
> particular problem.

muraiki tried build from bug 1235509 which is based on nightly, so the newest firefox still has this problem. Also with ip monitor he does not see any events except STALE REACHABLE and any network change events should be seen there (that is what we monitor)

I will add some logging and make new build
This is a try run (only build) with patch from bug 1235509 (it is a corrected version of that patch, I will submitted it shortly) and some additional logging

https://treeherder.mozilla.org/#/jobs?repo=try&revision=af383d5959c4
Hi muraiki,

May I ask you to try this build:

http://archive.mozilla.org/pub/firefox/try-builds/dd.mozilla@gmail.com-af383d5959c46218a9c4d8f6302f59a1aa5a7ff4/try-linux/firefox-46.0a1.en-US.linux-i686.tar.bz2

I have added some logging. Can you also make firefox log:

https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

but instead of NSPR_LOG_MODULES=timestamp,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,nsHostResolver:5
 write: NSPR_LOG_MODULES=timestamp,nsHttp:5,nsSocketTransport:5,nsNotifyAddr:5
Flags: needinfo?(clementraphael)
I'm not able to execute the firefox binary in that archive. This is possibly because it's a 32-bit binary and I'm running 64-bit Ubuntu, although I confess that this is not something I know much about :)

`file firefox` reports:

firefox: ELF 32-bit LSB  executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, BuildID[sha1]=aad417befe02610ca1e732502fb1a020da9ae9b2, not stripped

My currently installed Firefox 43 reports:

`file /usr/lib/firefox/firefox`

/usr/lib/firefox/firefox: ELF 64-bit LSB  shared object, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=b7fb53e39c4e51d87ec38f43d103757bb4eacb6f, stripped
Flags: needinfo?(clementraphael)
I emailed you the log file and some more info. Thanks!
Oh the tension! I take it the problem didn't completely go away then?
Depends on: 1237775
So there are route change mesages. They are all over the same gateway, but different rta_dst (I have not print out any other parameter, only rta_dsr and gateway address).

Some are deletes and some are new

2016-01-07 16:07:04.788020 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:07:04.788044 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:4000:808::1012
2016-01-07 16:07:04.788061 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:07:04.788076 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:07:08.229113 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:07:08.229127 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:400d:c07::8b
2016-01-07 16:07:08.229136 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:07:08.229143 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:07:08.245385 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:07:08.245396 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:400d:c07::71
2016-01-07 16:07:08.245403 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:07:08.245409 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:07:08.246398 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:07:08.246412 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:400d:c07::8a
2016-01-07 16:07:08.246420 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:07:08.246427 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:07:08.462795 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:07:08.462806 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:4004:809::100f
2016-01-07 16:07:08.462812 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:07:08.462818 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:07:08.468299 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:07:08.468308 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:400d:c07::84
2016-01-07 16:07:08.468314 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:07:08.468319 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:07:09.115138 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:07:09.115153 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:4006:808::1018
2016-01-07 16:07:09.115163 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:07:09.115172 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:07:09.156263 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:07:09.156295 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:4006:808::1000
2016-01-07 16:07:09.156308 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:07:09.156316 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:07:09.161765 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:07:09.161781 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:4006:808::100e
2016-01-07 16:07:09.161791 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:07:09.161890 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:07:09.210997 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:07:09.211016 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:4006:808::1003
2016-01-07 16:07:09.211038 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8

2016-01-07 16:07:34.844331 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage del route
2016-01-07 16:07:34.844344 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:07:34.844367 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:4000:808::1012
2016-01-07 16:07:34.844383 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:07:34.844395 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:08:04.732202 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage del route
2016-01-07 16:08:04.732215 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:08:04.732238 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2600:4:0:2::41ac:1f10
2016-01-07 16:08:04.732254 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:08:04.732265 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:08:04.732536 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage del route
2016-01-07 16:08:04.732547 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:08:04.732563 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2600:4:0:2::41ac:1f22
2016-01-07 16:08:04.732576 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:08:04.732588 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:08:04.732717 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage del route
2016-01-07 16:08:04.732727 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:08:04.732741 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:4004:809::100f
2016-01-07 16:08:04.732754 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:08:04.732765 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:08:04.732882 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage del route
2016-01-07 16:08:04.732892 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:08:04.732905 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:4006:808::1000
2016-01-07 16:08:04.732918 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:08:04.732953 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:08:04.733092 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage del route
2016-01-07 16:08:04.733102 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:08:04.733116 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:4006:808::1003
2016-01-07 16:08:04.733129 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2016-01-07 16:08:04.733141 UTC - -2056587520[7f2d9c760fe0]: OnNetlinkMessage: route update
2016-01-07 16:08:04.733255 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage del route
2016-01-07 16:08:04.733265 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage new or del route
2016-01-07 16:08:04.733279 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage rta_dst address 2607:f8b0:4006:808::100e
2016-01-07 16:08:04.733291 UTC - -2056587520[7f2d9c760fe0]: nsNotifyAddrListener::OnNetlinkMessage gateway address 2607:f440::d144:1e8
2607:f440::d144:1e8 is my workplace's ip allocation. I'm not sure why `ip monitor` didn't show the same results. I'm having our sysadmins look into why these route change messages are happening.

So is what I'm experiencing expected behavior in terms of what should happen when routes are added/removed, but turning into something not useful since the messages are happening very often?
We should probably consider dropping the routing table updates completely. This could be backed up by the fact that Chrome doesn't seem to need them, see line 132 at https://src.chromium.org/viewvc/chrome/trunk/src/net/base/address_tracker_linux.cc

I added them to the notifications because I figured such updates would be relevant network changes, but we also can't deal with busloads of spurious information! :-/
Hello,
I work with Kroustou who first reported this bug.

Dragana's patched build fixes the problem. The connection succeeds everytime.

I'll mail the http log to Dragana.
(In reply to muraiki from comment #38)
> 2607:f440::d144:1e8 is my workplace's ip allocation. I'm not sure why `ip
> monitor` didn't show the same results. I'm having our sysadmins look into
> why these route change messages are happening.
> 
> So is what I'm experiencing expected behavior in terms of what should happen
> when routes are added/removed, but turning into something not useful since
> the messages are happening very often?

The route change was added by Daniel (see comment above) and in my opinion route changes are a reason to thing that the network as it is has changed. We want to detect that we are on another network, or some route has change, so we do not leek some infos that went through a vpn to a non-vpn connection, for example. 

It would be interesting to know why your network is configured like that, why it is changing routes so often. Can you ask your admins for some input here, so we can understand the situation better?
Flags: needinfo?(clementraphael)
I wasn't saying that the route change behavior is not a good idea, but rather that it becomes a problem due to whatever is happening on my network. :) I did ask for more input, but so far it's been a mystery.

I also need to try this at home, because I was also experiencing the problem there.
Flags: needinfo?(clementraphael)
This patch changes the network change event monitor not to react on route change events. Until now we know of on network that this causes problems. Probably there are more.

Taking with Daniel on irc, we decided to react only on address change events and not to react on route changes any more.

Asking also Patrick?
Attachment #8706392 - Flags: feedback?(mcmanus)
Attachment #8706392 - Flags: feedback?(daniel)
are these real routing changes or just noise that a hash would filter out?
From the log, I seen route advertisement for many different destination address over a single gateway address. We can check if the gateway is the default gateway in that case the routes are not useful and we can ignore them. other than that I do not know how to filter them and the network admins do not know why is it happening (comment #42)
(In reply to alex kiousis from comment #40)
> Hello,
> I work with Kroustou who first reported this bug.
> 
> Dragana's patched build fixes the problem. The connection succeeds everytime.
> 
> I'll mail the http log to Dragana.

Thank you for checking the patch, really appreciated. We now know that the patch fixed your problem.
Flags: needinfo?(kroustou)
(In reply to muraiki from comment #38)
> 2607:f440::d144:1e8 is my workplace's ip allocation. I'm not sure why `ip
> monitor` didn't show the same results. I'm having our sysadmins look into
> why these route change messages are happening.
> 
> So is what I'm experiencing expected behavior in terms of what should happen
> when routes are added/removed, but turning into something not useful since
> the messages are happening very often?

Is this address the ip address of your computer or is it a gateway, maybe the default gateway? I am trying to figure out if we can filter this somehow.
Flags: needinfo?(clementraphael)
Comment on attachment 8706392 [details] [diff] [review]
bug_1234548_v1.patch

Review of attachment 8706392 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, I'm for doing this change to get us to a more stable and reliable position.
Attachment #8706392 - Flags: feedback?(daniel) → feedback+
(In reply to Dragana Damjanovic [:dragana] from comment #48)
> (In reply to muraiki from comment #38)
> > 2607:f440::d144:1e8 is my workplace's ip allocation. I'm not sure why `ip
> > monitor` didn't show the same results. I'm having our sysadmins look into
> > why these route change messages are happening.
> > 
> > So is what I'm experiencing expected behavior in terms of what should happen
> > when routes are added/removed, but turning into something not useful since
> > the messages are happening very often?
> 
> Is this address the ip address of your computer or is it a gateway, maybe
> the default gateway? I am trying to figure out if we can filter this somehow.

That's the ip address of my default v6 gateway.

Also, I tried the patched build at home and it looks like it resolved the problem there. I didn't see any route changes in `ip monitor`.
Flags: needinfo?(clementraphael)
Attachment #8706392 - Flags: review+
Attachment #8706392 - Flags: feedback?(mcmanus)
Attachment #8706392 - Flags: feedback+
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/c85c704cfa75
https://hg.mozilla.org/mozilla-central/rev/5bb7a0de940d
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Duplicate of this bug: 1239194
You need to log in before you can comment on or make changes to this bug.