Closed Bug 1313629 Opened 8 years ago Closed 8 years ago

Version-aware (v2/v4) hash completer

Categories

(Toolkit :: Safe Browsing, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1276826

People

(Reporter: hchang, Assigned: hchang)

Details

(Whiteboard: #sbv4-m4)

Attachments

(1 file, 1 obsolete file)

We need to split some code for hashcompleter in order to support v2/v4 at the same time.
Assignee: nobody → hchang
Comment on attachment 8805514 [details] [diff] [review]
0001-Bug-1313629-Build-and-send-gethash-request-based-on-.patch

Not sure if this bug is needed or we'd better of doing everything in Bug 1276826 where we will implement the buildRequestV4 function. Francois, Dimi, what do you think?
Attachment #8805514 - Flags: feedback?(francois)
Attachment #8805514 - Flags: feedback?(dlee)
I just realized this bug is quite important and worth landing as early as possible to avoid a huge change in Bug 1276826. 

About this patch:

1) Since V4 gethash request needs threat types and the associated states [1] 
   (see "Example: fullHashes.find)", |nsIUrlClassifierHashCompleter.complete| 
   requires an extra parameter to pass in the table where the partial hash is found.

2) Merging multiple requests (for the same gethashURL) into one HTTP request 
   becomes a little more complicated when the table name is taken into account. 
   We may have duplicate/distinct partial hashes found in duplicate/distinct tables. 
   So, in the merged request, we use a map to maintain the associated tables to
   avoid duplicates.

3) Getting the table states requires a disk access. Not sure if this would be
   a performance hit. We might need to add a layer of "state cache" to reduce
   the performance hit if any.

4) There is a test case in the patch. In the final implementation, the request
   should be generated by protobuf and attached to the URL. For now we just attach
   the "table name => state" mapping to the URL so that the test case can test
   something with it. For v4 there is still a gap between "prefix match" and "hash completion",
   which means only the test case could trigger the gethashURL by directly using
   nsIUrlHashCompleter.complete().

[1] https://developers.google.com/safe-browsing/v4/update-api
(In reply to Henry Chang [:henry][:hchang] from comment #4)
> I just realized this bug is quite important and worth landing as early as
> possible to avoid a huge change in Bug 1276826. 
> 
> About this patch:
> 
> 1) Since V4 gethash request needs threat types and the associated states [1] 
>    (see "Example: fullHashes.find)",
> |nsIUrlClassifierHashCompleter.complete| 
>    requires an extra parameter to pass in the table where the partial hash
> is found.
> 

I saw you use table name ends with "-proto" to know its v4 or not in HashCompleter
As you suggested before we should merge them, i filed bug 1314842.

> 2) Merging multiple requests (for the same gethashURL) into one HTTP request 
>    becomes a little more complicated when the table name is taken into
> account. 
>    We may have duplicate/distinct partial hashes found in duplicate/distinct
> tables. 
>    So, in the merged request, we use a map to maintain the associated tables
> to
>    avoid duplicates.
> 

From lookupAPI in google's doc, i guess we will also need platform type set for gethash request ?

> 3) Getting the table states requires a disk access. Not sure if this would be
>    a performance hit. We might need to add a layer of "state cache" to reduce
>    the performance hit if any.
> 

Ya, i think we may need this in the future. When i implemented checksum bug i once wrote the checksum/state cache in LookupCacheV4:LoadMetadata, but for simplicity reason i remove those from patch. I guess it might worth doing it if everytime a gethash request is sent we will access disk.
Attachment #8805514 - Flags: feedback?(dlee)
Attachment #8805514 - Attachment is obsolete: true
Attachment #8805514 - Flags: feedback?(francois)
Whiteboard: #sbv4-m4
Comment on attachment 8806280 [details]
Bug 1313629 - Build and send gethash request based on the table name.

https://reviewboard.mozilla.org/r/89782/#review94404

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:287
(Diff revision 1)
>    this.gethashUrl = aGethashUrl;
> +
> +  // Multiple partial hashes can be associated with the same tables
> +  // so we use a set here.
> +  //
> +  // Question: is empty table name possible?

Full hashes need to be associated with at least one table.

I can't think of a way we wouldn't get one.

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:307
(Diff revision 1)
>        responses: []
>      });
> +
> +    if (aTableName) {
> +      this.tableNames.set(aTableName);
> +      let isTableNameV4 = aTableName.endsWith('-proto');

I wonder whether we should instead get the provider from the listname and then look at `browser.safebrowsing.provider.???.pver`.

However, we would also need a small in-memory cache for this because we could have 150 gethash calls triggered by a single page load.

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:310
(Diff revision 1)
> +    if (aTableName) {
> +      this.tableNames.set(aTableName);
> +      let isTableNameV4 = aTableName.endsWith('-proto');
> +      if (1 === this.tableNames.size) {
> +        // Decide if this request is v4 by the first added partial hash.
> +        this.isV4 = isTableNameV4;

Maybe we should use `this.pver = '4'` or `this.pver = '2.2'` instead?

It seems a little inflexible to have a flag just for V4 when we could end up with both V4 and V4.1 providers in the future.

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:313
(Diff revision 1)
> +      if (1 === this.tableNames.size) {
> +        // Decide if this request is v4 by the first added partial hash.
> +        this.isV4 = isTableNameV4;
> +      } else if (this.isV4 !== isTableNameV4) {
> +        log('ERROR: Cannot mix "proto" tables with other types within ' +
> +            'the smae gethash URL.');

"same"

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:356
(Diff revision 1)
>      }
>  
>      Services.obs.addObserver(this, "xpcom-shutdown", false);
>  
> +    // V4 requires table states to build the request so we need
> +    // a async call to retrieve the table states from disk.

We might need to cache the state in memory. gethash blocks network requests and doing a disk access on a slow HDD could delay these requests for a while.

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:358
(Diff revision 1)
>      Services.obs.addObserver(this, "xpcom-shutdown", false);
>  
> +    // V4 requires table states to build the request so we need
> +    // a async call to retrieve the table states from disk.
> +    // Note that |HCR_begin| is fine to be sync because
> +    // it is not appeared in a sync call chain.

"it doesn't appear in a sync call chain."

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:417
(Diff revision 1)
>      httpChannel.setRequestHeader("Connection", "close", false);
>  
>      this._channel = channel;
>  
> +    if (this.isV4) {
> +      // TODO (Bug 1276826): Deal with v4 specific HTTP headers.

If that's the only header we need to worry about for V4, we could also just do it here. It would remove one TODO.

::: toolkit/components/url-classifier/tests/unit/head_urlclassifier.js:438
(Diff revision 1)
> +
> +  dbService.getTables(metaData => {
> +    do_print("metadata: " + metaData);
> +    let didCallback = false;
> +    metaData.split("\n").some(line => {
> +      // Parse [tableName];[stateBase64]

You could also add `:[checksumBase64]` to that comment since you're extracting that piece too.

::: toolkit/components/url-classifier/tests/unit/test_hashcompleter_v4.js:1
(Diff revision 1)
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");

Missing public domain header.

::: toolkit/components/url-classifier/tests/unit/test_hashcompleter_v4.js:40
(Diff revision 1)
> +                           TEST_TABLE_DATA_V4.updateUrl,
> +                           TEST_TABLE_DATA_V4.gethashUrl);
> +
> +// This is unfortunately needed since v4 gethash request
> +// requires the threat type (table name) as well as the
> +// state it associated with. We have to run the update once

"state it's associated with."

::: toolkit/components/url-classifier/tests/unit/test_hashcompleter_v4.js:92
(Diff revision 1)
> +    //                    'prefix_size': 4,
> +    //                    'raw_hashes': "00000001000000020000000300000004"}
> +    //   }
> +    // ]
> +    //
> +    let content = "\x0A\x4A\x08\x02\x20\x02\x2A\x18\x08\x01\x12\x14\x08\x04\x12\x10\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x3A\x06\x73\x74\x61\x00\x74\x65\x42\x22\x0A\x20\x30\x67\xC7\x2C\x5E\x50\x1C\x31\xE3\xFE\xCA\x73\xF0\x47\xDC\x34\x1A\x95\x63\x99\xEC\x70\x5E\x0A\xEE\x9E\xFB\x17\xA1\x55\x35\x78\x12\x08\x08\x08\x10\x80\x94\xEB\xDC\x03";

Thanks for using real protobufs in the tests and including a human-readable version too!
Attachment #8806280 - Flags: review?(francois) → review-
Comment on attachment 8806280 [details]
Bug 1313629 - Build and send gethash request based on the table name.

https://reviewboard.mozilla.org/r/89782/#review94404

> I wonder whether we should instead get the provider from the listname and then look at `browser.safebrowsing.provider.???.pver`.
> 
> However, we would also need a small in-memory cache for this because we could have 150 gethash calls triggered by a single page load.

We could and we can do it just like how we look up the provider in Bug 1315097.
However, I'd like to address this issue further more in another mail and we can 
discuss tomorrow :)
Comment on attachment 8806280 [details]
Bug 1313629 - Build and send gethash request based on the table name.

https://reviewboard.mozilla.org/r/89782/#review94404

> We might need to cache the state in memory. gethash blocks network requests and doing a disk access on a slow HDD could delay these requests for a while.

Indeed!
Before we continue working on this bug, Francois, do you think it's fine to land this bug first or 
we should merge this to Bug 1276826? (where the gethash request/response will be fully implemented.)
Flags: needinfo?(francois)
(In reply to Henry Chang [:henry][:hchang] from comment #9)
> Before we continue working on this bug, Francois, do you think it's fine to
> land this bug first or 
> we should merge this to Bug 1276826? (where the gethash request/response
> will be fully implemented.)

Good point. Since this patch doesn't really do anything by itself, it might be better to put everything in bug 1276826.
Flags: needinfo?(francois)
Merging to Bug 1276826.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: