blocked gethash requests cause the browser to hang

RESOLVED FIXED in mozilla35

Status

()

defect
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: mmc, Assigned: mmc)

Tracking

Trunk
mozilla35
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 4 obsolete attachments)

This issue affects users in China, where the Great Firewall is blocking responses to gethash requests. This makes safebrowsing useless in China, but the collateral damage is that nsIUrlClassifierHashCompleter can't specify a timeout and doesn't get onStopRequest (http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js#400).

This means that if a popular website like baidu includes blocked content as it is currently, Firefox hangs waiting for the gethash response.
The problem seems to me that there is no good fail-safe option.

If the gethash request for an URL that is on the prefix-blocklist times out, what will you do?
I'm still catching up on email, but is the problem that a SuspendChannel on some subresource causes many other unrelated requests to block?

I would expect a random image/JS file on a big site hanging not to hold up the rest of the site load.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #2)
> I'm still catching up on email, but is the problem that a SuspendChannel on
> some subresource causes many other unrelated requests to block?

I couldn't reproduce, but please see bug 1023767.

(In reply to Gian-Carlo Pascutto [:gcp] from comment #1)
> The problem seems to me that there is no good fail-safe option.
> 
> If the gethash request for an URL that is on the prefix-blocklist times out,
> what will you do?

Fail open, same as if all safebrowsing services are down.
Is it possible to set a timeout, say 10s, for each gethash request?
Yes, if you implement it (last time I looked at it, we couldn't configure HTTP timeouts, certainly not depending on the connection origin). We might need to make our own timers + timeout handling inside SafeBrowsing.

But we should figure out why the site loads block entirely, because taking at least 10s to download random things is going to make Firefox performance look very bad.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #5)
> Yes, if you implement it (last time I looked at it, we couldn't configure
> HTTP timeouts, certainly not depending on the connection origin). We might
> need to make our own timers + timeout handling inside SafeBrowsing.
> 

> But we should figure out why the site loads block entirely, because taking
> at least 10s to download random things is going to make Firefox performance
> look very bad.

Yes, but it's better than endless page loading.

About why the site loads block entirely, take taobao.com for example, there are some taobao's pages that contain script[1], the script hit safebrowsing hash db, so Firefox hold the script[1] loading (which means the whole page loading is consequently hold too, unless the script[1] is |deferred|), then Firefox send a gethash request to Google to check if the script[1] is a malware url, but because of GFW, the gethash request never got response (HashCompleterRequest.onStopRequest never called), so the page is endless loading.

[1] http://a.tbcdn.cn/??s/kissy/1.2.0/kissy-min.js,apps/sportalapps/global/1.0/seller-global-min.js?t=20131028.js
Hey Patrick,

We need some necko advice. Is it possible for nsIChannel.asyncOpen to specify a timeout? Otherwise a safebrowsing outage (as happens regularly for China users because of the Great Firewall) can cause Firefox to hang waiting for safebrowsing responses. The relevant code is here: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js#251

Thanks,
Monica
Flags: needinfo?(mcmanus)
I think comment 5 sums up the expected usage - call asyncopen and set a timer in the calling code and cancel() the channel if your timer expires. iirc that's what the xhr code does.

Perhaps we should do something better.

There are a couple of necko internal timers, but they aren't going to be fine grained enough for you. Basically they are there to reap and reclaim network sockets for stuff that is truly and thoroughly hung - they won't give you responsiveness. (they are at 5 minutes and 90 seconds depending on the state of things and are global configs).

So do you want to implement the timer in the calling code or do you want it to be the property of nsIHttp*mumble* ? I don't mind implementing a necko interface, it should be quite easy, but it would ride the trains at the normal pace.

I would suggest that one problem with either approach is that it is subject to false positives with queueing delays.. i.e. your timer fires because the channel has just been waiting in line for a long time to get started rather than is actually taking a long time itself - which is a function of the length of the line. This can lead to "livelock" on slow networks. (real work never gets done even though the code appears to be making progress)

setting the timer only on admission is harder than it sounds because channels aren't bound tightly to particular tcp connections until those tcp connections are established. (this is related to the reason I mentioned two separate timers already exist).

fun - eh?
Flags: needinfo?(mcmanus)
Time to fix this one, I think.
Blocks: 1029886
Pin, is this still an issue? I tried to reproduce by pointing my hosts file at localhost but couldn't get Firefox to hang.
Flags: needinfo?(pzhang)
Assignee: nobody → mmc
Status: NEW → ASSIGNED
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #10)
> Pin, is this still an issue? I tried to reproduce by pointing my hosts file
> at localhost but couldn't get Firefox to hang.

Yes, it's still an issue. I don't think it could be reproduced by just changing hosts file.
Flags: needinfo?(pzhang)
Comment on attachment 8486105 [details] [diff] [review]
Cancel gethash requests after 5 seconds if they are still pending

Review of attachment 8486105 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know how to test this :( It does not interfere with normal operation, or when the hosts file hijacks safebrowsing.
Attachment #8486105 - Flags: review?(gpascutto)
Comment on attachment 8486105 [details] [diff] [review]
Cancel gethash requests after 5 seconds if they are still pending

Review of attachment 8486105 [details] [diff] [review]:
-----------------------------------------------------------------

When you say you don't know how to test, do you mean automated or, "at all"? If it's "at all" then I'm tempted to r- until you do. It indeed won't activate during normal operation, but what if my Wifi hiccups for 5s and Firefox get stuck for 2 minutes, or forever, because of a missing callback or the backoff logic breaking?

"Emulating" by just firing the timer callback instantly would work. As would replacing the completion URLs by the IP or your own machine, and running a simple TCP proxy on it (that delays). Not sure if you can find something off-the-shelf for it, but a TCP proxy in Unix that multiplexes select/read/send is a few hours of reasonably entertaining coding.

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js
@@ +207,5 @@
> +  notify: function HCR_notify() {
> +    // If we haven't gotten onStopRequest, just cancel.
> +    if (this._channel && this._channel.isPending()) {
> +      dump("hashcompleter: cancelling request to " + this.gethashUrl + "\n");
> +      this._channel.cancel(Cr.NS_BINDING_ABORTED);

It's totally non-obvious to me whether this will correctly end up calling notifySuccess / notifyFailure? Failure to open the channel would, but canceling it on the timeout?

What happens to the RequestBackoff logic?
Attachment #8486105 - Flags: review?(gpascutto) → review+
I reproduced the hang and manually verified this patch fixes it.

STR:
1) Write node server that accepts requests but never responds
2) Point gethash URL at this
3) Go to http://algremh.com/Check/ppl/ (still live as of right now) and Firefox hangs

With this patch, the hash completer gets an onStopRequest which falls through to notifyFailure and backs off as expected.
Attachment #8486105 - Attachment is obsolete: true
Comment on attachment 8486865 [details] [diff] [review]
Cancel gethash requests after 5 seconds if they are still pending

Review of attachment 8486865 [details] [diff] [review]:
-----------------------------------------------------------------

Improved comments, carry over gcp's review.
Attachment #8486865 - Flags: review+
Comment on attachment 8486865 [details] [diff] [review]
Cancel gethash requests after 5 seconds if they are still pending

Review of attachment 8486865 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js
@@ +231,5 @@
> +    // Set a timer that cancels the channel after 5 seconds in case we don't
> +    // get a gethash response.
> +    this.timer_ = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
> +    // Ask the timer to use nsITimerCallback (.notify()) when ready
> +    this.timer_.initWithCallback(this, 5000, this.timer_.TYPE_ONE_SHOT);

Is it possible to get the timeout from a pref, say browser.safebrowsing.gethash.timeout? Then we could set it with some proper value based on the local network condition.
Sure, I can do that. It would also be great to have confirmation if this fix works, if you can reproduce that from the TBPL link.
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #20)
> Sure, I can do that. It would also be great to have confirmation if this fix
> works, if you can reproduce that from the TBPL link.

I tested the patch, it works perfectly, thanks!
Attachment #8486865 - Attachment is obsolete: true
Comment on attachment 8487322 [details] [diff] [review]
Cancel gethash requests after 5 seconds if they are still pending

Review of attachment 8487322 [details] [diff] [review]:
-----------------------------------------------------------------

Update to use pref for setting timeout. gcp, are you ok with this naming?
Attachment #8487322 - Flags: review?(gpascutto)
Comment on attachment 8487322 [details] [diff] [review]
Cancel gethash requests after 5 seconds if they are still pending

Review of attachment 8487322 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/libpref/init/all.js
@@ +4290,5 @@
>  // UDPSocket API
>  pref("dom.udpsocket.enabled", false);
> +
> +// Gethash timeout for Safebrowsing.
> +pref("browser.safebrowsing.gethash.timeout_ms", 5000);

I would throw this in with the other browser.safebrowsing.* prefs, which are in the project specific configs. That even makes some sense as mobile may want a higher timeout.

Also, most of the prefs related to completions are like urlclassifier.* not browser.safebrowsing.*, so for consistency (haha) it should probably be urlclassifier.gethash.timeout_ms.
Attachment #8487322 - Flags: review?(gpascutto) → review+
Attachment #8487322 - Attachment is obsolete: true
Comment on attachment 8487348 [details] [diff] [review]
Cancel gethash requests after 5 seconds if they are still pending

Review of attachment 8487348 [details] [diff] [review]:
-----------------------------------------------------------------

Renamed as gcp suggested. I actually filed bug 1033450 to consolidate prefs in all.js a while back.
Attachment #8487348 - Flags: review+
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/c52e9871c9e1 - xpcshell/tests/toolkit/components/url-classifier/tests/unit/test_hashcompleter.js didn't care for that one bit.
Attachment #8487348 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/542e6d9497d4
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
See Also: → 1165816
Depends on: 1172688
No longer depends on: 1172688
See Also: → 1172688
You need to log in before you can comment on or make changes to this bug.