Closed Bug 1254323 Opened 8 years ago Closed 6 years ago

Reduce the number of identical gethash requests done by the URL Classifier

Categories

(Toolkit :: Safe Browsing, defect, P1)

Tracking

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: francois, Assigned: francois)

References

Details

Attachments

(2 files, 2 obsolete files)

Attached patch bug1164518-skip_caching.patch (obsolete) — Splinter Review
Bug 1164518 highlighted the fact that if a page contains a large number of sub-resources on a domain that conflicts with a partial hash in a Safe Browsing table, we don't merge all of these requests together, and we end up going to the server several times before the first response has been processed and cached.

It's not clear exactly how we could avoid this without cutting down some of the parallelism in the network requests.

The attached patch extends the current request merging code to remove unnecessary caching of merged gethash requests.
Priority: -- → P2
Assignee: nobody → hchang
Assignee: hchang → nobody
I took a look at the current code; here are a couple of things that may be helpful:
- If we are about to send a getHash request for prefixes that are already in the air, those prefixes should be skipped. Maybe an in-flight state/request in HashCompleter.js would help.
- The cache is updated every time we get a getHash result. At that point we presumably have to scan the getHash queue (the prefixes we are about to request) and filter out the prefixes that have just been cached (a rough sketch of this is below). If the queue is empty we don't need to worry, because the next lookup will check the cache before sending a gethash request.
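Roughly something like this (a sketch only; the function and field names here are made up, not the real HashCompleter.js members):

// Hedged sketch: after a gethash response has updated the cache, drop any
// queued prefixes that the response just resolved so we don't request them
// again.
function pruneQueuedRequests(queuedRequests, justCachedKeys) {
  // justCachedKeys: Set of "table:prefix" keys cached by the last response.
  return queuedRequests.filter(req => {
    const key = req.tableName + ":" + req.partialHash;
    return !justCachedKeys.has(key); // keep only prefixes still unresolved
  });
}

// Example: two requests queued, one already answered by the last response.
const queued = [
  { tableName: "googpub-phish-proto", partialHash: "y1uMiQ==" },
  { tableName: "googpub-phish-proto", partialHash: "K0sggA==" },
];
const justCached = new Set(["googpub-phish-proto:y1uMiQ=="]);
console.log(pruneQueuedRequests(queued, justCached).length); // 1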
FWIW,
https://developers.google.com/safe-browsing/v4/request-frequency
We have the minimum wait duration implemented, but currently we likely still allow the next gethash even if there's an identical request in flight.
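A hedged sketch of the wait-duration bookkeeping (this assumes the response hands us the minimum wait duration as a plain number of seconds; that and the variable names are assumptions, not the actual Firefox code):

// Hypothetical state: when the next gethash request is allowed (ms since epoch).
let nextAllowedRequestTime = 0;

function canMakeGethashRequest(now = Date.now()) {
  return now >= nextAllowedRequestTime;
}

function onGethashResponse(minimumWaitSeconds, now = Date.now()) {
  // Per the request-frequency rules, wait at least this long before the next
  // fullHashes:find request.
  nextAllowedRequestTime = now + minimumWaitSeconds * 1000;
}

The missing piece described above would be to also check for an identical in-flight request, not just the wait duration.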
(In reply to François Marier [:francois] from comment #0)
> Created attachment 8727629 [details] [diff] [review]
> bug1164518-skip_caching.patch
> 
> Bug 1164518 highlighted the fact that if a page contains a large number of
> sub-resources on a domain that conflicts with a partial hash in a Safe
> Browsing table, we don't merge all of these requests together, and we end up
> going to the server several times before the first response has been
> processed and cached.
> 
> It's not clear exactly how we could avoid this without cutting down some of
> the parallelism in the network requests.
> 
> The attached patch extends the current request merging code to remove
> unnecessary caching of merged gethash requests.

I did not quite follow the problem; is it something like this:

- We send a fullhash request to the network; the request reaches the server, but because of server or network delays we have not received the response yet.
- In the meantime, we may send several identical requests to the network.

If I understood correctly, how about creating a queue of all in-flight gethash requests in the urlclassifier? "In-flight" means from the time complete() is called:

http://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js#202

until we get the gethash result (OnStopRequest) and have completely updated our cache:
http://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#1305

Each queue entry would include the table name and prefix of the matching URL.
The code flow would be:
1. If a prefix matches and we are about to send fullhash request A, add request A (table and prefix) to the in-flight queue.
2. Send request A (we may need a state variable).
3. If another fullhash request B needs to be sent, first check the queue:
- If request B is not in the queue, add it to the queue and send it.
- If request B is already in the queue, skip it. At this point it is important to update the callback so that request B can asynchronously wait for the identical request A to complete and then take A's result.

4. When we get the response to request A, update the cache.
5. Remove request A from the queue.
(The same applies to B: if B was sent and we received its response, update the cache and remove B from the queue.)

We may only want to accept a maximum queue length, 100 for example (a rough sketch of the flow follows below).
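A minimal sketch of that flow (all names are hypothetical, not the actual urlclassifier code; only the table/prefix keying, the callback forwarding, and the 100-entry cap come from the steps above):

// Hypothetical sketch of the in-flight gethash queue described above.
class InFlightGethashQueue {
  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
    this.pending = new Map(); // "table:prefix" -> callbacks waiting for it
  }

  // Returns true if the caller should actually send a network request.
  enqueue(tableName, prefix, callback) {
    const key = tableName + ":" + prefix;
    const waiters = this.pending.get(key);
    if (waiters) {
      // Step 3, second bullet: an identical request is already in flight,
      // so just wait asynchronously for its result.
      waiters.push(callback);
      return false;
    }
    if (this.pending.size >= this.maxEntries) {
      // Queue is full: fall back to sending the request without tracking it.
      return true;
    }
    this.pending.set(key, [callback]); // step 1: remember the in-flight request
    return true;                       // step 2: the caller sends request A
  }

  // Steps 4 and 5: the response arrived and the cache has been updated;
  // notify every waiter and drop the entry.
  resolve(tableName, prefix, results) {
    const key = tableName + ":" + prefix;
    const waiters = this.pending.get(key) || [];
    this.pending.delete(key);
    for (const cb of waiters) {
      cb(results);
    }
  }
}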
Flags: needinfo?(francois)
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #5)

> - If request B is already in the queue, skip it. At this point it is
> important to update the callback so that request B can asynchronously wait
> for the identical request A to complete and then take A's result.
> 
And we would have to resolve any race conditions between the worker and the main thread.
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #5)
> (In reply to François Marier [:francois] from comment #0)
> > Created attachment 8727629 [details] [diff] [review]
> > bug1164518-skip_caching.patch
> > 
> 
> Each queue entry would include the table name and prefix of the matching URL.
> The code flow would be:
Eh, it should be a map or a list.
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #5)
> I did not quite follow the problem; is it something like this:
> 
> - We send a fullhash request to the network; the request reaches the server,
> but because of server or network delays we have not received the response
> yet.
> - In the meantime, we may send several identical requests to the network.

Yes, that's what I remember seeing when I investigated the original report.

> If I understood correctly, how about creating a queue of all in-flight
> gethash requests in the urlclassifier? "In-flight" means from the time
> complete() is called:

It might be tricky to get the details right (and to avoid perf regressions), but that's the general idea I had too.
Flags: needinfo?(francois)
This looks interesting; I would like to take it.
Assignee: nobody → tnguyen
Hi francois,
Based on our discussion last week, if this bug is fixed then we can remove |mLastResult| [1], right?
I just want to confirm this and make sure we don't forget to remove it once this is done.

[1] https://searchfox.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.h#294
(In reply to Dimi Lee[:dimi][:dlee] from comment #10)
> Based on our discussion last week, if this bug is fixed then we can remove
> |mLastResult| [1], right?

Do we need to?

I think it might be worth keeping it as an extra safety check. The problem it works around is quite serious, so the extra check seems worthwhile.
(In reply to François Marier [:francois] from comment #11)
> (In reply to Dimi Lee[:dimi][:dlee] from comment #10)
> > Based on our discussion last week, if this bug is fixed then we can remove
> > |mLastResult| [1], right?
> 
> Do we need to?
> 
> I think it might be worth keeping it as an extra safety check. The problem
> it works around is quite serious, so the extra check seems worthwhile.

Sure, we can keep this one as an extra safety check, I am OK with that :)

The reason I thought we could remove it is that it is cached in DBServiceWorker, while most of the data is cached in LookupCache, so it takes some extra effort to figure this out when a related bug shows up (bug 1325888).
Assignee: tnguyen → dlee
Status: NEW → ASSIGNED
Hi Thomas.
This is an urgent request. When you have time, could you help check if this patch is the same as you have in mind? thanks!
Attachment #8940957 - Flags: feedback?(tnguyen)
(In reply to Dimi Lee[:dimi][:dlee] from comment #13)
> This is an urgent request. When you have time, could you help check if this
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Should be: This is NOT an urgent request

sorry...
Comment on attachment 8940957 [details] [diff] [review]
Patch - Reduce the number of identical gethash requests

Review of attachment 8940957 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js
@@ +260,4 @@
>      // Clear everything on shutdown
>      if (this._shuttingDown) {
>        this._currentRequest = null;
> +      this._ongoingRequest = null;

It's not used anywhere.

@@ +425,4 @@
>      if (!this._completer.canMakeRequest(this.gethashUrl)) {
>        log("Can't make request to " + this.gethashUrl + "\n");
>        this.notifyFailure(Cr.NS_ERROR_ABORT);
> +      return false;

Should this be true? We could ignore the can't-make-request case and notify the error immediately. Alternatively, we may want to move the notifyFailure() call after the this._ongoingRequests.push(this._currentRequest); line you added above. It's just an idea.
Attachment #8940957 - Flags: feedback?(tnguyen) → feedback+
Thanks for your feedback, Thomas!

(In reply to Thomas Nguyen[:tnguyen] ni? plz from comment #15)
> > +      this._ongoingRequest = null;
> 
> It's not used anywhere.
> 

Thanks for finding this :)

> @@ +425,4 @@
> >      if (!this._completer.canMakeRequest(this.gethashUrl)) {
> >        log("Can't make request to " + this.gethashUrl + "\n");
> >        this.notifyFailure(Cr.NS_ERROR_ABORT);
> > +      return false;
> 
> Should this be true? We could ignore the can't-make-request case and notify
> the error immediately. Alternatively, we may want to move the notifyFailure()
> call after the this._ongoingRequests.push(this._currentRequest); line you
> added above. It's just an idea.

As discussed offline, we will keep the current implementation.
Should be ready to go once the changes from comment 15 are resolved.

Dimi tested it by creating a page with lots of iframes pointing to the phishing URL.
Assignee: dimidiana → francois
Priority: P2 → P1
Attached file Test page
Simple test page as described in comment 17.
While testing Dimi's patch, I found a bug in the way we add more hashes to a request that hasn't been sent yet:

hashcompleter: 16:37:57 GMT-0800 (PST): Merge gethash request in googpub-phish-proto for prefix : y1uMiQ==
hashcompleter: 16:37:57 GMT-0800 (PST): Setting googpub-phish-proto in this.tableNames (was: undefined)
hashcompleter: 16:37:57 GMT-0800 (PST): Finished calling fillTableStatesBase64().
hashcompleter: 16:37:57 GMT-0800 (PST): Build v4 gethash request with ["googpub-phish-proto"], ["Cg0IAhAGGAEiAzAwMTABEOXxAxoCGAWD7KiK"], ["771MOg==","K0sggA==","rLz3xw==","stcSog==","y1uMiQ=="]
hashcompleter: 16:37:57 GMT-0800 (PST): actualGethashUrl: https://safebrowsing.googleapis.com/v4/fullHashes:find?$ct=application/x-protobuf&key=[trimmed-google-api-key]&$httpMethod=POST&$req=CgkKB0ZpcmVmb3gSGwoNCAIQBhgBIgMwMDEwARDl8QMaAhgFg-yoihouCAIQAhoGCgTvvUw6GgYKBCtLIIAaBgoErLz3xxoGCgSy1xKiGgYKBMtbjIkgAQ==
hashcompleter: 16:37:57 GMT-0800 (PST): Merge gethash request in googpub-phish-proto for prefix : 771MOg==
hashcompleter: 16:37:57 GMT-0800 (PST): Setting googpub-phish-proto in this.tableNames (was: Cg0IAhAGGAEiAzAwMTABEOXxAxoCGAWD7KiK)
hashcompleter: 16:37:57 GMT-0800 (PST): Merge gethash request in googpub-phish-proto for prefix : K0sggA==
hashcompleter: 16:37:57 GMT-0800 (PST): Setting googpub-phish-proto in this.tableNames (was: undefined)

It looks like the this.tableNames map gets overwritten by later calls to
HashCompleterRequest::add(aTableName, undefined):

https://searchfox.org/mozilla-central/rev/cac28623a15ace458a8f4526e107a71db1519daf/toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js#354
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map/set

Checking for the existence of the key in the this.tableNames map first fixes the problem.
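To illustrate with a standalone snippet (a simplified model of the overwrite, not the actual HashCompleterRequest code; the table state string is copied from the log above):

// Buggy behaviour: an unconditional set() lets a later call with an undefined
// state clobber the state stored by an earlier call.
const tableNames = new Map();

function add(aTableName, aTableState) {
  tableNames.set(aTableName, aTableState);
}

add("googpub-phish-proto", "Cg0IAhAGGAEiAzAwMTABEOXxAxoCGAWD7KiK");
add("googpub-phish-proto", undefined);
console.log(tableNames.get("googpub-phish-proto")); // undefined, state lost

// Fix: only set the key if it is not already present in the map.
function addFixed(aTableName, aTableState) {
  if (!tableNames.has(aTableName)) {
    tableNames.set(aTableName, aTableState);
  }
}

tableNames.clear();
addFixed("googpub-phish-proto", "Cg0IAhAGGAEiAzAwMTABEOXxAxoCGAWD7KiK");
addFixed("googpub-phish-proto", undefined);
console.log(tableNames.get("googpub-phish-proto")); // table state preserved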
Attachment #8940957 - Attachment is obsolete: true
Attachment #8727629 - Attachment is obsolete: true
Comment on attachment 8952266 [details]
Bug 1254323 - Reduce identical gethash requests done by the URL Classifier.

https://reviewboard.mozilla.org/r/221510/#review227850
Attachment #8952266 - Flags: review?(gpascutto) → review+
Pushed by fmarier@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/106b66081b0d
Reduce identical gethash requests done by the URL Classifier. r=gcp
https://hg.mozilla.org/mozilla-central/rev/106b66081b0d
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60