Bug 1254323 (Closed): Reduce the number of identical gethash requests done by the URL Classifier
Opened 8 years ago · Closed 6 years ago
Categories: Toolkit :: Safe Browsing, defect, P1
Tracking
Status: RESOLVED FIXED
Target Milestone: mozilla60

Tracking | Status
---|---
firefox60 | fixed
People
(Reporter: francois, Assigned: francois)
References
Details
Attachments
(2 files, 2 obsolete files)
Bug 1164518 highlighted the fact that if a page contains a large number of sub-resources on a domain that conflicts with a partial hash in a Safe Browsing table, we don't merge all of these requests together and we end up going to the server several times because the first response is processed and cached. It's not clear exactly how we could avoid this without cutting down some of the parallelism in the network requests. The attached patch extends the current request merging code to remove unnecessary caching of merged gethash requests.
Updated•8 years ago
Priority: -- → P2
Updated•8 years ago
Assignee: nobody → hchang
Updated•7 years ago
Assignee: hchang → nobody
Comment 3•7 years ago
I took a look at the current code; a couple of things may be helpful:

- If we are about to send a getHash request for prefixes that are already in the air, those prefixes should be skipped. An in-flight state/request in HashCompleter.js might help here.
- The cache is updated every time we get a getHash result; presumably at that point we have to scan the getHash queue (the prefixes we are about to request) and filter out any prefixes that have just been cached. If the queue is empty we may not need to worry, because the next lookup should check the cache before sending a gethash request.
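The second point can be sketched roughly like this. All names here (cache, pendingPrefixes, onGetHashResult) are illustrative, not the actual HashCompleter.js code:

```javascript
// Illustrative sketch only: when a getHash result arrives and the cache is
// updated, drop any queued prefixes that the fresh result now covers, so
// we never re-request them.
const cache = new Set();                        // prefixes with cached results
let pendingPrefixes = ["aaaa", "bbbb", "cccc"]; // prefixes waiting to be sent

function onGetHashResult(resolvedPrefixes) {
  for (const p of resolvedPrefixes) {
    cache.add(p);
  }
  // Filter out prefixes that have just been cached.
  pendingPrefixes = pendingPrefixes.filter(p => !cache.has(p));
}

onGetHashResult(["bbbb"]);
// pendingPrefixes is now ["aaaa", "cccc"]
```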
Comment 4•7 years ago
FWIW: https://developers.google.com/safe-browsing/v4/request-frequency

We have the minimum wait duration implemented, but currently we likely still allow the next gethash request even when there's an identical request in flight.
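For reference, the minimum wait duration rule from that document works roughly as below. This is a minimal sketch under the assumption that a single next-allowed-request timestamp is tracked; the function names are illustrative, not the Firefox implementation:

```javascript
// Sketch of the Safe Browsing v4 "minimum wait duration" rule: after a
// fullHashes response that carries a minimumWaitDuration, the client must
// not send another gethash request until that duration has elapsed.
let nextRequestTimeMs = 0;

function canMakeRequest(nowMs) {
  return nowMs >= nextRequestTimeMs;
}

function onGethashResponse(minimumWaitDurationMs, nowMs) {
  if (minimumWaitDurationMs > 0) {
    nextRequestTimeMs = nowMs + minimumWaitDurationMs;
  }
}

onGethashResponse(10000, 0);        // server asked us to wait 10 seconds
console.log(canMakeRequest(5000));  // false: still inside the wait window
console.log(canMakeRequest(15000)); // true: wait duration has elapsed
```

Note that this only rate-limits successive requests; it does nothing about identical requests already in flight, which is the gap this bug is about.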
Comment 5•7 years ago
(In reply to François Marier [:francois] from comment #0)
> Created attachment 8727629 [details] [diff] [review]
> bug1164518-skip_caching.patch
>
> Bug 1164518 highlighted the fact that if a page contains a large number of
> sub-resources on a domain that conflicts with a partial hash in a Safe
> Browsing table, we don't merge all of these requests together and we end up
> going to the server several times because the first response is processed
> and cached.
>
> It's not clear exactly how we could avoid this without cutting down some of
> the parallelism in the network requests.
>
> The attached patch extends the current request merging code to remove
> unnecessary caching of merged gethash requests.

I did not quite follow the problem. Is it something like this:

- We sent a fullhash request to the network and it reached the server, but the server may take a long time (or there is network delay), so we have not received the response yet.
- In the meantime, we may send several identical requests to the network.

If I understood correctly, how about creating a queue of all in-processing gethash requests in the URL classifier? "In-processing" means from the moment complete() is called:
http://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js#202
until we get the gethash result (onStopRequest) and have completely updated our cache:
http://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#1305

Each queue entry could include the table name and the prefix of the matching URL. The code flow would then be:
1. If a prefix matched and we are about to send fullhash request A, add request A (table and prefix) to the in-processing queue.
2. Send request A (we may need a state variable).
3. If another fullhash request B needs to be sent, first check the queue:
   - If request B is not in the queue, add it to the queue and send it.
   - If request B is in the queue, skip it. At this point it is important to update the callback so that request B can asynchronously wait for the previously sent identical request A to complete, and then take A's result.
4. When we get the response to request A, update the cache.
5. Remove request A from the queue (and likewise for B: if B was sent and we received response B, update the cache and remove B from the queue).

We may want to cap the queue at a maximum length, 100 for example.
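The flow described above could be sketched like this. All names here (inFlight, requestFullHash, sendGethash) are illustrative, not the actual nsUrlClassifierHashCompleter.js API:

```javascript
// Illustrative sketch of deduplicating identical in-flight gethash
// requests. Entries are keyed on table name + prefix; duplicates just
// register a callback and wait for the first request's result.
const inFlight = new Map(); // "table:prefix" -> array of waiting callbacks
const MAX_IN_FLIGHT = 100;  // cap on queue length, as suggested above

function requestFullHash(table, prefix, callback, sendGethash) {
  const key = table + ":" + prefix;
  const waiters = inFlight.get(key);
  if (waiters) {
    // Step 3: an identical request is already in flight; wait for it.
    waiters.push(callback);
    return;
  }
  if (inFlight.size >= MAX_IN_FLIGHT) {
    callback(new Error("too many in-flight gethash requests"), null);
    return;
  }
  // Steps 1-2: record the request, then send it.
  inFlight.set(key, [callback]);
  sendGethash(table, prefix, (err, fullHashes) => {
    // Steps 4-5: the cache would be updated here, then every waiter is
    // notified and the entry removed so future lookups hit the cache.
    const all = inFlight.get(key);
    inFlight.delete(key);
    for (const cb of all) {
      cb(err, fullHashes);
    }
  });
}
```

With a fake sendGethash that records its calls, issuing the same (table, prefix) twice before the response arrives results in a single network request and two callback invocations.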
Flags: needinfo?(francois)
Comment 6•7 years ago
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #5)
> - If request B is in the queue, skip it. At this point it is important to
> update the callback so that request B can asynchronously wait for the
> previously sent identical request A to complete, and then take A's result.

And we would also have to resolve any race conditions between the worker and the main thread, if there are any.
Comment 7•7 years ago
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #5)
> Each queue entry could include the table name and the prefix of the
> matching URL. The code flow would then be:

Eh, it should be a map or a list rather than a queue.
Comment 8•7 years ago
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #5)
> I did not quite follow the problem. Is it something like this:
>
> - We sent a fullhash request to the network and it reached the server, but
> the server may take a long time (or there is network delay), so we have
> not received the response yet.
> - In the meantime, we may send several identical requests to the network.

Yes, that's what I remember seeing when I investigated the original report.

> If I understood correctly, how about creating a queue of all in-processing
> gethash requests in the URL classifier?

It might be tricky to get the details right (and to avoid perf regressions), but that's the general idea I had too.
Flags: needinfo?(francois)
Comment 10•6 years ago
Hi francois,
Based on our discussion last week: if this bug is solved, then we can remove |mLastResult|[1], right? I just want to confirm this and make sure we won't forget to remove it once this is done.

[1] https://searchfox.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierDBService.h#294
Comment 11•6 years ago
(In reply to Dimi Lee[:dimi][:dlee] from comment #10)
> Based on our discussion last week: if this bug is solved, then we can
> remove |mLastResult|[1], right?

Do we need to? I think it might be worth keeping it as an extra safety check. The problem it works around is quite serious and doing that extra check might be worth it.
Comment 12•6 years ago
(In reply to François Marier [:francois] from comment #11)
> (In reply to Dimi Lee[:dimi][:dlee] from comment #10)
> > Based on our discussion last week: if this bug is solved, then we can
> > remove |mLastResult|[1], right?
>
> Do we need to? I think it might be worth keeping it as an extra safety
> check. The problem it works around is quite serious and doing that extra
> check might be worth it.

Sure, we can keep this one as an extra safety check, I am ok with that :)

The reason I think we could remove it is that it is cached in DBServiceWorker while most of the data is cached in LookupCache, so it takes some extra effort to figure this out when there is a bug related to it (bug 1325888).
Updated•6 years ago
Assignee: tnguyen → dlee
Status: NEW → ASSIGNED
Comment 13•6 years ago
Hi Thomas. This is an urgent request. When you have time, could you help check whether this patch matches what you have in mind? Thanks!
Attachment #8940957 - Flags: feedback?(tnguyen)
Comment 14•6 years ago
(In reply to Dimi Lee[:dimi][:dlee] from comment #13)
> This is an urgent request.

Should be: This is NOT an urgent request. Sorry...
Comment 15•6 years ago
Comment on attachment 8940957 [details] [diff] [review]
Patch - Reduce the number of identical gethash requests

Review of attachment 8940957 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js
@@ +260,4 @@
>     // Clear everything on shutdown
>     if (this._shuttingDown) {
>       this._currentRequest = null;
> +     this._ongoingRequest = null;

It's not used anywhere.

@@ +425,4 @@
>     if (!this._completer.canMakeRequest(this.gethashUrl)) {
>       log("Can't make request to " + this.gethashUrl + "\n");
>       this.notifyFailure(Cr.NS_ERROR_ABORT);
> +     return false;

Should this be true? We could ignore the cannot-make-request case and notify the error immediately, but we may want to move the notifyFailure call to after the `this._ongoingRequests.push(this._currentRequest);` line you added above. It's just my idea.
Attachment #8940957 - Flags: feedback?(tnguyen) → feedback+
Comment 16•6 years ago
Thanks for your feedback, Thomas!

(In reply to Thomas Nguyen[:tnguyen] ni? plz from comment #15)
> > +     this._ongoingRequest = null;
>
> It's not used anywhere.

Thanks for finding this :)

> Should this be true? We could ignore the cannot-make-request case and
> notify the error immediately, but we may want to move the notifyFailure
> call to after the `this._ongoingRequests.push(this._currentRequest);` line
> you added above. It's just my idea.

As discussed offline, we will keep the current implementation.
Comment 17•6 years ago
Should be ready to go once the changes from comment 15 are resolved. Dimi tested it by creating a page with lots of iframes pointing to the phishing URL.
Assignee: dimidiana → francois
Priority: P2 → P1
Comment 18•6 years ago
Simple test page as described in comment 17.
Comment 19•6 years ago
While testing Dimi's patch, I found a bug in the way we add more hashes to a request that hasn't been sent yet:

hashcompleter: 16:37:57 GMT-0800 (PST): Merge gethash request in googpub-phish-proto for prefix : y1uMiQ==
hashcompleter: 16:37:57 GMT-0800 (PST): Setting googpub-phish-proto in this.tableNames (was: undefined)
hashcompleter: 16:37:57 GMT-0800 (PST): Finished calling fillTableStatesBase64().
hashcompleter: 16:37:57 GMT-0800 (PST): Build v4 gethash request with ["googpub-phish-proto"], ["Cg0IAhAGGAEiAzAwMTABEOXxAxoCGAWD7KiK"], ["771MOg==","K0sggA==","rLz3xw==","stcSog==","y1uMiQ=="]
hashcompleter: 16:37:57 GMT-0800 (PST): actualGethashUrl: https://safebrowsing.googleapis.com/v4/fullHashes:find?$ct=application/x-protobuf&key=[trimmed-google-api-key]&$httpMethod=POST&$req=CgkKB0ZpcmVmb3gSGwoNCAIQBhgBIgMwMDEwARDl8QMaAhgFg-yoihouCAIQAhoGCgTvvUw6GgYKBCtLIIAaBgoErLz3xxoGCgSy1xKiGgYKBMtbjIkgAQ==
hashcompleter: 16:37:57 GMT-0800 (PST): Merge gethash request in googpub-phish-proto for prefix : 771MOg==
hashcompleter: 16:37:57 GMT-0800 (PST): Setting googpub-phish-proto in this.tableNames (was: Cg0IAhAGGAEiAzAwMTABEOXxAxoCGAWD7KiK)
hashcompleter: 16:37:57 GMT-0800 (PST): Merge gethash request in googpub-phish-proto for prefix : K0sggA==
hashcompleter: 16:37:57 GMT-0800 (PST): Setting googpub-phish-proto in this.tableNames (was: undefined)

It looks like the this.tableNames map gets overwritten by later calls to HashCompleterRequest::add(aTableName, undefined):
https://searchfox.org/mozilla-central/rev/cac28623a15ace458a8f4526e107a71db1519daf/toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js#354
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map/set

Checking for the existence of the key in the this.tableNames map first fixes the problem.
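The overwrite can be reproduced with a plain Map. This is an illustrative reduction, not the actual HashCompleterRequest code, and the table-state string is a made-up placeholder:

```javascript
// Illustrative reduction of the bug: Map.prototype.set() overwrites
// unconditionally, so a later add(aTableName, undefined) from a merged
// request clobbers the table state recorded earlier.
const tableNames = new Map();

function addBuggy(tableName, tableState) {
  tableNames.set(tableName, tableState); // always overwrites
}

function addFixed(tableName, tableState) {
  // Only record the state the first time we see this table.
  if (!tableNames.has(tableName)) {
    tableNames.set(tableName, tableState);
  }
}

addBuggy("example-table", "some-table-state");
addBuggy("example-table", undefined); // merged request: state is lost

tableNames.clear();
addFixed("example-table", "some-table-state");
addFixed("example-table", undefined); // merged request: state is preserved
console.log(tableNames.get("example-table")); // "some-table-state"
```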
Comment hidden (mozreview-request)
Updated•6 years ago
Attachment #8940957 - Attachment is obsolete: true
Updated•6 years ago
Attachment #8727629 - Attachment is obsolete: true
Comment 21•6 years ago
mozreview-review
Comment on attachment 8952266 [details]
Bug 1254323 - Reduce identical gethash requests done by the URL Classifier.

https://reviewboard.mozilla.org/r/221510/#review227850
Attachment #8952266 - Flags: review?(gpascutto) → review+
Comment 22•6 years ago
Pushed by fmarier@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/106b66081b0d
Reduce identical gethash requests done by the URL Classifier. r=gcp
Comment 23•6 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/106b66081b0d
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox60: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla60