Closed Bug 832056 Opened 11 years ago Closed 8 years ago

Lack of answer from safebrowsing.clients.google.com can lead to random connection failures

Categories

(Toolkit :: Safe Browsing, defect)

x86_64
Linux
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: glandium, Unassigned)

Attachments

(2 files)

Attached file Wireshark capture
There are possibly several bugs hidden in there, but here's my story: to perform some testing, I added an entry to /etc/hosts making safebrowsing.clients.google.com point to 127.0.0.1. While testing, I had a local web server, but after a reboot, I still had the /etc/hosts entry but not the web server running.

I had this setup for a little while.

Then I noticed I couldn't connect to lwn.net for some reason, while everything else seemed to work fine. The throbber would keep turning and the "statusbar" would show "Waiting for lwn.net" (or something like that) indefinitely, as it does when a server doesn't respond. Interestingly, running another instance of Firefox on the same machine, under a different profile, made lwn.net appear.

I initially thought it was a DNS cache problem, but Wireshark showed something interesting, a capture of which I'm attaching here: I was connecting to the right server, and I was getting answers from it. In fact, the full TCP initial hand-shake is happening, except Firefox is sending nothing, and ends up closing the connection.

Removing the /etc/hosts entry and restarting Firefox makes lwn.net available again. Interestingly, re-adding the entry and restarting breaks access to lwn.net again.

I think there are two problems here:
- That the unavailability of safebrowsing.clients.google.com can break connections to some servers.
- That a connection that was actively closed on the client side is still "Waiting" in the UI. But then, maybe it just has a weird status in Firefox due to the first problem.
this gets my vote for weirdest str of the day :)
I can't confirm this on linux-64 using a local build of m-c.

mike, maybe a http log would give some insight?

https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging?redirectlocale=en-US&redirectslug=HTTP_Logging
I did that yesterday with nsHttp only, the log was massive because of my 2000+ tabs (and there's another bug i need to file about that, btw, because there's no reason it should be doing so much http considering only about 20 tabs are loaded). I want to try to make my session smaller before going any further.
(In reply to Mike Hommey [:glandium] from comment #3)
> I did that yesterday with nsHttp only, the log was massive because of my
> 2000+ tabs (and there's another bug i need to file about that, btw, because
> there's no reason it should be doing so much http considering only about 20
> tabs are loaded). I want to try to make my session smaller before going any
> further.

oh dear. I bet at the bottom of this is a bug that hangs a connection once in a while and you get the lwn.net problem because 
a] with 2K tabs you probably trigger it a lot
and b] bug 421128
We're having a similar problem where Firefox will stop loading a page on seemingly random sites while trying to contact Google safebrowsing. http://www.irs.gov is one example. After the initial failure to load the page, the page will not load under the Firefox profile (even across restarts) until you "Reset Firefox" in the troubleshooting dialog. The site will also load just fine under a fresh Firefox profile.

I discovered today that deleting or renaming the safebrowsing folder in the profile and restarting Firefox clears up the issue for a time, similar to doing a reset.

We are going through a proxy that uses authentication and does filtering, if that helps. We noticed this started about 2 weeks ago. We've tested this on version ESR 17.0.3 and version 18.0.2. Both behave the same way.
The same problem here, in a closed network, without internet connection.

If Safe Browsing is active, the browser does not follow the redirect when a "303 See Other" is received.

I can provide traces, log files, or whatever else is required.
We recently began experiencing hangs in FF when accessing certain URLs within one of our internal web applications.  I observed the communication using Fiddler.  Fiddler showed that when a hang occurred, the last communication attempt was to host safebrowsing.clients.google.com, for which the http response code was 307.  Our temporary workaround is to disable Google Safe Browsing protection in FF.
Hi Dan - can you characterize the URLs that you couldn't access in any way?

e.g. http or https

were they the first visit to the origin in the url?

was the safebrowsing url 307 successfully resolved? and if not - do you know why not? (that might help build a STR)

Thanks!
Hi Patrick - I attached the Fiddler communication log for when a user clicks on a particular link that hangs in FF.  You can see that the requests to our internal application server are https and the request to safebrowsing is http.  I don't understand this stuff well enough to know if the 307 is resolved.  If I do an nslookup of safebrowsing.clients.google.com (on a PC that is experiencing this issue) it resolves.

I believe we first experienced this issue about a month ago when users accessed the application via wireless (which put them on a different subnet served from our corporate headquarters). Now it seems to be happening on our local wired subnet also.

Also, we only experience this for certain URLs in our production application.  We serve multiple non-production instances of this same application from separate physical servers.  Users can click on the same links in the non-production instances and FF does not hang.
The Beijing office got hundreds of user reports in the past few days, saying certain pages from Baidu (similar to Google), Taobao (similar to eBay) and CNTV (website of CCTV, China's national TV station) are broken. URLs like http://a.tbcdn.cn/??s/kissy/1.2.0/kissy-min.js,apps/sportalapps/global/1.0/seller-global-min.js?t=20131028.js will trigger a request to Google's safebrowsing server, and an unsuccessful connection[1] to browser.safebrowsing.gethashURL will make the page blank.

[1]: http://ars.to/1pCYsdE
This just happened to our website, currently any pages on "www.cdu.edu.au" and "cdu.edu.au" are blocked to users that don't have access to the internet.

You can reproduce this now with just about any version of Firefox (win/osx/linux) including the latest by doing the hosts file thing Mike suggested, although also add an entry for safebrowsing.google.com (some versions use this host name instead, you can check which one you need in about:config and search for safebrowsing).

Put the hosts entries in place and restart firefox to a clean session. Then visit our websites and watch as the page never loads.

I also used an intentionally broken proxy.pac file to demonstrate the issue on machines without the need for admin access. Here's my proxy config file...

function FindProxyForURL(url, host)
{
 // Route both Safe Browsing host names to a local port where nothing is listening.
 if (shExpMatch(host, "safebrowsing.google.com")) {
  return "PROXY 127.0.0.1:9999";
 } else if (shExpMatch(host, "safebrowsing.clients.google.com")) {
  return "PROXY 127.0.0.1:9999";
 } else {
  return "DIRECT";
 }
}

Then go into Advanced | Connection Settings....

Automatic proxy configuration URL:
file:///C:/Path/To/File/block-safebrowsing.pac
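
The routing logic of the PAC file above can be sanity-checked outside the browser before loading it into Firefox. The following is only a sketch for Node.js: the `shExpMatch` stub is an assumption written for this test (Firefox supplies its own implementation to PAC scripts), and the list of test hosts is illustrative.

```javascript
// Stand-in for the PAC helper shExpMatch (shell-style globbing with * and ?).
// This stub is an assumption for testing outside the browser.
function shExpMatch(str, pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const regex = new RegExp(
    "^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$"
  );
  return regex.test(str);
}

// Same decision as the PAC file: Safe Browsing hosts go to a dead local proxy,
// everything else connects directly.
function FindProxyForURL(url, host) {
  if (shExpMatch(host, "safebrowsing.google.com") ||
      shExpMatch(host, "safebrowsing.clients.google.com")) {
    return "PROXY 127.0.0.1:9999"; // nothing is expected to listen on this port
  }
  return "DIRECT";
}

// Print the routing decision for a few hosts.
const hosts = ["safebrowsing.google.com", "safebrowsing.clients.google.com", "lwn.net"];
for (const h of hosts) {
  console.log(h + " -> " + FindProxyForURL("http://" + h + "/", h));
}
```

Only the two Safe Browsing host names should be sent to the unreachable proxy; any other host should come back as DIRECT.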
Just tested Firefox 33 and this appears to have been fixed by #1023767.

However, the new behaviour where Safe Browsing is silently disabled if the Google service is unavailable raises a new issue. I'll log a separate bug report for that.
Depends on: 1063316
Based on comment 13, I'll assume this got fixed in bug 1023767.

Furthermore, there is now a timeout on gethash calls (added in bug 1024555).
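
For readers unfamiliar with the idea, the general pattern of putting a timeout around a lookup that may never be answered can be sketched as below. This is not the code from bug 1024555 and says nothing about what Firefox does when the timeout fires; `withTimeout`, `lookupThatNeverAnswers`, and `checkUrl` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("timed out after " + ms + " ms")),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical stand-in for a hash lookup against a server that never answers.
function lookupThatNeverAnswers() {
  return new Promise(() => {});
}

// Stop waiting once the timeout fires instead of blocking indefinitely.
async function checkUrl(lookup, ms) {
  try {
    await withTimeout(lookup(), ms);
    return "lookup completed";
  } catch (e) {
    return "lookup abandoned: " + e.message;
  }
}

checkUrl(lookupThatNeverAnswers, 50).then(console.log);
```

Without such a bound, a caller waiting on the lookup has no way to make progress when the remote service is silent, which matches the symptom described in this bug.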
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED