Closed Bug 1311935 Opened 8 years ago Closed 7 years ago

Implement Safe Browsing v4 caching

Categories

(Toolkit :: Safe Browsing, defect, P2)

Tracking

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: tnguyen, Assigned: dlee)

References

Details

(Whiteboard: #sbv4-m6)

Attachments

(5 files, 4 obsolete files)

Whiteboard: #sbv4-m5
Priority: -- → P2
Whiteboard: #sbv4-m5 → #sbv4-m6
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Depends on: 1328821
Attached patch (WIP) P2. patch (obsolete) — Splinter Review
No longer blocks: safebrowsingv4
Depends on: 1331881
Depends on: 1333257
Blocks: 1333328
Attached file Design document
Attachment #8825696 - Attachment is obsolete: true
Attachment #8824351 - Attachment is obsolete: true
Attachment #8823548 - Attachment is obsolete: true
Attachment #8834712 - Attachment is obsolete: true
Hi francois,
I am still working on Part 3, which covers the logic for checking whether a URL is safe using the cache. To speed up the review process, could you help review the first two parts when you are free? Thanks!
Comment on attachment 8846527 [details]
Bug 1311935 - P1. Make ActiveTables() work for safebrowsing v4.

https://reviewboard.mozilla.org/r/119578/#review125038

::: toolkit/components/url-classifier/Classifier.cpp:905
(Diff revision 1)
>      nsCOMPtr<nsIFile> file = do_QueryInterface(supports);
>  
> +    // If |file| is a directory, recurse to find its entries as well.
> +    bool isDirectory;
> +    if (NS_FAILED(file->IsDirectory(&isDirectory))) {
> +      continue;

Should we output a warning here? I imagine this only fails when there are permission problems reading/accessing the file/directory.
Attachment #8846527 - Flags: review?(francois) → review+
Comment on attachment 8846528 [details]
Bug 1311935 - P2. Process fullHashes.find response.

https://reviewboard.mozilla.org/r/119580/#review127404

The spec [seems to suggest](https://developers.google.com/safe-browsing/v4/caching) that the cache durations we get from the API are in seconds. We internally store everything in milliseconds. That's fine if it makes some checks easier, but we should make sure to label all of the parameters with their units so that we avoid conversion errors.
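
For example, something along these lines (a hypothetical sketch of the naming convention, not the actual patch):

    #include <cstdint>

    // Hypothetical names, just to illustrate suffixing durations/timestamps with their unit.
    constexpr uint32_t MAXIMUM_NEGATIVE_CACHE_DURATION_SEC = 30 * 60;

    // Convert an API-supplied cache duration (seconds) into an internal
    // expiry timestamp (milliseconds since epoch).
    inline int64_t ExpiryTimeMs(int64_t nowMs, uint32_t cacheDurationSec) {
      return nowMs + static_cast<int64_t>(cacheDurationSec) * 1000;
    }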

::: toolkit/components/url-classifier/Entries.h:321
(Diff revision 1)
>  
>  typedef nsClassHashtable<nsUint32HashKey, nsCString> PrefixStringMap;
>  
>  typedef nsDataHashtable<nsCStringHashKey, int64_t> TableFreshnessMap;
>  
> +// Cache expire time map keyed by 32-bytes fullhash.

If you name the variable `FullHashExpiryCache` then you can eliminate the comment.

::: toolkit/components/url-classifier/Entries.h:324
(Diff revision 1)
>  typedef nsDataHashtable<nsCStringHashKey, int64_t> TableFreshnessMap;
>  
> +// Cache expire time map keyed by 32-bytes fullhash.
> +typedef nsDataHashtable<nsCStringHashKey, int64_t> FullHashCache;
> +
> +// Cache expire time map keyed by 32-bytes fullhash.

Duplicate comment.

::: toolkit/components/url-classifier/Entries.h:327
(Diff revision 1)
> +typedef nsDataHashtable<nsCStringHashKey, int64_t> FullHashCache;
> +
> +// Cache expire time map keyed by 32-bytes fullhash.
> +struct CachedFullHashResponse {
> +  // Cache expire time in epoch time.
> +  int64_t negativeCacheExpireTime;

This comment needs the units, so it could simply be  `// ms since epoch` at the end of the line.

Or we could name it `negativeCacheExpiryMs` so that the units are clear.

::: toolkit/components/url-classifier/Entries.h:358
(Diff revision 1)
> +    return true;
> +  }
> +};
> +
> +// A Fullhash response map keyed by variable-length prefix.
> +typedef nsClassHashtable<nsCStringHashKey, CachedFullHashResponse> FullHashResponseMap;

I wonder if we should have two identical typedefs here:

- `typedef nsCStringHashKey VLHashPrefixString`
- `typedef nsCStringHashKey FullHashString`

So that we can remove the comment and make it obvious that these two maps are keyed using different kinds of string.

::: toolkit/components/url-classifier/HashStore.cpp:218
(Diff revision 1)
> +TableUpdateV4::NewFullHashResponse(const nsACString& aPrefix,
> +                                   CachedFullHashResponse& aResponse)
> +{
> +  CachedFullHashResponse* response =
> +    mFullHashResponseMap.LookupOrAdd(aPrefix);
> +  *response = aResponse;

We need to do something if response is NULL.

::: toolkit/components/url-classifier/nsIUrlClassifierHashCompleter.idl:36
(Diff revision 1)
>     * A complete hash has been found that matches the partial hash.
>     * This method may be called 0-n times for a given
>     * nsIUrlClassifierCompleter::complete() call.
>     *
>     * @param hash
>     *        The 128-bit hash that was discovered.

We should probably fix that erroneous comment while we're touching this file. This should be 256 bits.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:127
(Diff revision 1)
>  
>  #define CONFIRM_AGE_PREF        "urlclassifier.max-complete-age"
>  #define CONFIRM_AGE_DEFAULT_SEC (45 * 60)
>  
> +// 30 minutes as the maximum negative cache duration.
> +#define MAXIMUM_NEGATIVE_CACHE_DURATION (30 * 60 * 1000)

Let's add `_MS` to the end of that constant to be clear about the unit.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:943
(Diff revision 1)
> +    LOG(("CacheCompletion hash %X, Addchunk %d", result->completion.ToUint32(),
> +         result->addChunk));
> +
> +    nsresult rv = tuV2->NewAddComplete(result->addChunk, result->completion);
> +    if (NS_FAILED(rv)) {
> +      return rv;

We should probably `LOG()` something here.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1181
(Diff revision 1)
>  
>    return NS_OK;
>  }
>  
>  NS_IMETHODIMP
>  nsUrlClassifierLookupCallback::Completion(const nsACString& completeHash,

Would it be too hard to rename this to `CompletionV2`? It would be nice to be able to ensure that we're making a conscious choice everywhere.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1209
(Diff revision 1)
> +{
> +  LOG(("nsUrlClassifierLookupCallback::CompletionV4 [%p, %s, %d]",
> +       this, PromiseFlatCString(tableName).get(), negativeCacheDuration));
> +
> +  MOZ_ASSERT(StringEndsWith(tableName, NS_LITERAL_CSTRING("-proto")));
> +  MOZ_ASSERT(fullHashes);

This one should be a check, not an assert. We need to do something reasonable if it comes up in the wild.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1218
(Diff revision 1)
>      // Bug 1331534 - We temporarily ignore hash completion result
>      // for v4 tables.
>      return NS_OK;
>    }
>  
> -  mozilla::safebrowsing::Completion hash;
> +  // Negative cache duration should be less than 30 mins.

I would omit this comment since it's obvious from the condition as well as the LOG.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1220
(Diff revision 1)
>      return NS_OK;
>    }
>  
> -  mozilla::safebrowsing::Completion hash;
> -  hash.Assign(completeHash);
> +  // Negative cache duration should be less than 30 mins.
> +  if (negativeCacheDuration > MAXIMUM_NEGATIVE_CACHE_DURATION) {
> +    LOG(("Negative cache duration too large, reset it to 30 seconds"));

That's actually 30 **minutes** not **seconds**. We can avoid repeating the number (and possibly getting it out of sync later) by changing the LOG to:

    "Negative cache duration too large, clamping it down to a reasonable value."

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1226
(Diff revision 1)
> +    negativeCacheDuration = MAXIMUM_NEGATIVE_CACHE_DURATION;
> +  }
> +
> +  auto result = new CacheResultV4;
> +
> +  int64_t now = PR_Now() / PR_USEC_PER_MSEC;

I would suggest `nowMs`.

Also, you're adding it to `uint32_t` variables later, so maybe we should cast it to that here?

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1233
(Diff revision 1)
> +  result->table = tableName;
> +  result->prefix = partialHash;
> +  result->response.negativeCacheExpireTime = now + negativeCacheDuration;
> +
> +  // Fill in positive cache entries.
> +  uint32_t fullHashCount;

I would initialize this to zero, just in case.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1236
(Diff revision 1)
> +
> +  // Fill in positive cache entries.
> +  uint32_t fullHashCount;
> +  nsresult rv = fullHashes->GetLength(&fullHashCount);
> +  if (NS_FAILED(rv)) {
> +    return rv;

Maybe add a `LOG` line here.

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:593
(Diff revision 1)
>              "NegativeCacheDuration(" + aNegCacheDuration + ")");
>  
>          let minWaitDuration = aMinWaitDuration;
>  
>          if (aMinWaitDuration > MIN_WAIT_DURATION_MAX_VALUE) {
>            minWaitDuration = MIN_WAIT_DURATION_MAX_VALUE;

We should log a debug message here.

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:595
(Diff revision 1)
>          let minWaitDuration = aMinWaitDuration;
>  
>          if (aMinWaitDuration > MIN_WAIT_DURATION_MAX_VALUE) {
>            minWaitDuration = MIN_WAIT_DURATION_MAX_VALUE;
>          } else if (aMinWaitDuration < 0) {
>            minWaitDuration = 0;

ditto

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:641
(Diff revision 1)
>          dataLength > body.length - (newlineIndex + 1)) {
>        throw errorWithStack();
>      }
>  
>      let data = body.substr(newlineIndex + 1, dataLength);
>      for (let i = 0; i < (dataLength / COMPLETE_LENGTH); i++) {

Maybe do a check before that to ensure that `dataLength` is evenly divisible by `COMPLETE_LENGTH` in case not all of the data made it through.

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:679
(Diff revision 1)
> -      request.callback.completionFinished(Cr.NS_OK);
> -    }
> +      req.callback.completionFinished(Cr.NS_OK);
> +    };
> +
> +    // V4 completion handler
> +    let completionV4 = (req) => {
> +      let cacheDurationArray = [];

This looks like an unused variable.
Attachment #8846528 - Flags: review?(francois) → review-
No longer blocks: 1351472
Comment on attachment 8846527 [details]
Bug 1311935 - P1. Make ActiveTables() work for safebrowsing v4.

https://reviewboard.mozilla.org/r/119578/#review127426

::: toolkit/components/url-classifier/Classifier.cpp:905
(Diff revision 1)
>      nsCOMPtr<nsIFile> file = do_QueryInterface(supports);
>  
> +    // If |file| is a directory, recurse to find its entries as well.
> +    bool isDirectory;
> +    if (NS_FAILED(file->IsDirectory(&isDirectory))) {
> +      continue;

The folder we want to find here will be 'goog4', and I guess it won't have a problem.
If any other folder causes this API to return an error (as you said, permission errors for example), then we don't care about it.

So I would prefer not to add a warning here. What do you think?
Comment on attachment 8846528 [details]
Bug 1311935 - P2. Process fullHashes.find response.

https://reviewboard.mozilla.org/r/119580/#review127428

::: toolkit/components/url-classifier/Entries.h:358
(Diff revision 1)
> +    return true;
> +  }
> +};
> +
> +// A Fullhash response map keyed by variable-length prefix.
> +typedef nsClassHashtable<nsCStringHashKey, CachedFullHashResponse> FullHashResponseMap;

Good idea, Thanks!

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:943
(Diff revision 1)
> +    LOG(("CacheCompletion hash %X, Addchunk %d", result->completion.ToUint32(),
> +         result->addChunk));
> +
> +    nsresult rv = tuV2->NewAddComplete(result->addChunk, result->completion);
> +    if (NS_FAILED(rv)) {
> +      return rv;

Most of the TableUpdate->NewXXX methods return an error only when out of memory, and there are lots of error checks like this around the Safe Browsing code. I'd prefer to just return the error here.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1181
(Diff revision 1)
>  
>    return NS_OK;
>  }
>  
>  NS_IMETHODIMP
>  nsUrlClassifierLookupCallback::Completion(const nsACString& completeHash,

I'll rename it to CompletionV2

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1209
(Diff revision 1)
> +{
> +  LOG(("nsUrlClassifierLookupCallback::CompletionV4 [%p, %s, %d]",
> +       this, PromiseFlatCString(tableName).get(), negativeCacheDuration));
> +
> +  MOZ_ASSERT(StringEndsWith(tableName, NS_LITERAL_CSTRING("-proto")));
> +  MOZ_ASSERT(fullHashes);

This argument shouldn't be null regardless of whether there are fullhashes or not.
If it is null, it means the following line in HashCompleter.js failed:
let matches = Cc["@mozilla.org/array;1"].createInstance(Ci.nsIMutableArray)
So in this case, do you think it should be an assert or a check?

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1226
(Diff revision 1)
> +    negativeCacheDuration = MAXIMUM_NEGATIVE_CACHE_DURATION;
> +  }
> +
> +  auto result = new CacheResultV4;
> +
> +  int64_t now = PR_Now() / PR_USEC_PER_MSEC;

The result stored in the map is int64_t, and according to the C++ integer promotion rules, the uint32_t will be converted to int64_t implicitly. Or do you prefer casting explicitly here?

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1236
(Diff revision 1)
> +
> +  // Fill in positive cache entries.
> +  uint32_t fullHashCount;
> +  nsresult rv = fullHashes->GetLength(&fullHashCount);
> +  if (NS_FAILED(rv)) {
> +    return rv;

From the implementation [1], it seems |GetLength| won't return an error. I think we can keep the error check here because it returns nsresult, but not add a LOG here.

[1] https://dxr.mozilla.org/mozilla-central/source/xpcom/ds/nsArray.cpp#60

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js:641
(Diff revision 1)
>          dataLength > body.length - (newlineIndex + 1)) {
>        throw errorWithStack();
>      }
>  
>      let data = body.substr(newlineIndex + 1, dataLength);
>      for (let i = 0; i < (dataLength / COMPLETE_LENGTH); i++) {

The check above already does this :)
Comment on attachment 8846528 [details]
Bug 1311935 - P2. Process fullHashes.find response.

https://reviewboard.mozilla.org/r/119580/#review127404

Thanks for the review!

The precision we are using now is seconds [1], so I will change everything to seconds in the next patch.
I'll leave the minimum wait duration in milliseconds since it is not related to caching; if you think we should use seconds instead, I'll file another bug.
Part 3 should also be changed accordingly, but I'll wait until you finish the review.

[1] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierUtils.cpp#431
Comment on attachment 8847450 [details]
Bug 1311935 - P3. Implement safebrowsing v4 caching logic.

https://reviewboard.mozilla.org/r/120422/#review127824

::: toolkit/components/url-classifier/Classifier.cpp:505
(Diff revision 1)
>  
>          // For V2, there is no TTL for caching, so we use table freshness to
>          // decide if matching a completion should trigger a gethash request or not.
>          // For V4, this is done by Positive Caching & Negative Caching mechanism.
> -        bool confirmed = false;
>          if (fromCache) {

Couldn't we roll the table freshness check into the `Has()` function?

It would mean that we don't have to call a function that does nothing in V4 tables.

::: toolkit/components/url-classifier/Classifier.cpp:1271
(Diff revision 1)
>    if (!lookupCache) {
>      return NS_ERROR_UC_UPDATE_TABLE_NOT_FOUND;
>    }
>  
> +  // Remove cache entries whose negative cache time is expired when update.
> +  // Not check if positive cache time is expired here because we want to

"We don't check if"

::: toolkit/components/url-classifier/LookupCacheV4.h:81
(Diff revision 1)
> +  // Map from prefix to fullHashResponse
> +  // key is prefix
> +  // data is response object contain
> +  // - negative cache expire time
> +  // - array of full hashes and its cache expire time
> +  nsClassHashtable<nsCStringHashKey, CachedFullHashResponse> mCache;

Like I suggested in my previous review, if we added a `typedef nsCStringHashKey HashPrefixKey` then it would be self-documenting and we could remove that info from the comment above the line.

::: toolkit/components/url-classifier/LookupCacheV4.cpp:95
(Diff revision 1)
>    nsDependentCSubstring fullhash;
>    fullhash.Rebind((const char *)aCompletion.buf, COMPLETE_SIZE);
>  
>    nsresult rv = mVLPrefixSet->Matches(fullhash, &length);
>    NS_ENSURE_SUCCESS(rv, rv);
>  

I would suggest adding an assertion here:

    MOZ_ASSERT(length <= COMPLETE_SIZE)

::: toolkit/components/url-classifier/LookupCacheV4.cpp:97
(Diff revision 1)
>  
>    nsresult rv = mVLPrefixSet->Matches(fullhash, &length);
>    NS_ENSURE_SUCCESS(rv, rv);
>  
>    *aHas = length >= PREFIX_SIZE;
>    *aMatchLength = length;

I wonder if an assertion like this would be useful:

    MOZ_ASSERT(aHas || length == 0)

This is to ensure that we don't get a match with a length less than 4 bytes but greater than 0.

::: toolkit/components/url-classifier/LookupCacheV4.cpp:135
(Diff revision 1)
> +  // Check if we can find the fullhash in positive cache
> +  if (fullHashes.Get(completion, &expireTime)) {
> +    if (now <= expireTime) {
> +      // Url is NOT safe.
> +      *aConfirmed = true;
> +      LOG(("Found a valid result in positive cache"));

Suggestion:

    "Found a valid fullhash in the positive cache"

::: toolkit/components/url-classifier/LookupCacheV4.cpp:138
(Diff revision 1)
> +      // Url is NOT safe.
> +      *aConfirmed = true;
> +      LOG(("Found a valid result in positive cache"));
> +    } else {
> +      // From google
> +      //   Individual full hash results can be removed from the prefix's

I think we should change that comment because right now, it only describes the code. It doesn't say why that's true.

Correct me if I'm wrong, but the reason we can remove the fullhash in that case is because:

- the fullhash expiry has passed and
- the negative cache expiry has passed

Therefore, if we remove the fullhash entry, future lookups will come back empty, but because the negative cache has expired, that failed lookup will trigger a gethash request.
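
To make that concrete, here is a minimal standalone sketch of the eviction rule (std::map stand-ins for our hashtables, hypothetical names, times in seconds since epoch), not the actual implementation:

    #include <cstdint>
    #include <map>
    #include <string>

    struct CachedResponse {
      int64_t negativeCacheExpirySec = 0;
      std::map<std::string, int64_t> fullHashes;  // fullhash -> positive expiry (sec)
    };

    using PrefixCache = std::map<std::string, CachedResponse>;

    // Drop an expired fullhash once the negative-cache expiry has also passed:
    // the next lookup will then miss the cache and trigger a fresh gethash request.
    void EvictExpiredFullHash(PrefixCache& aCache, const std::string& aPrefix,
                              const std::string& aFullHash, int64_t aNowSec) {
      auto entry = aCache.find(aPrefix);
      if (entry == aCache.end()) {
        return;
      }
      CachedResponse& response = entry->second;
      auto hash = response.fullHashes.find(aFullHash);
      if (hash == response.fullHashes.end() || aNowSec <= hash->second) {
        return;  // not cached, or the positive entry is still valid
      }
      if (response.negativeCacheExpirySec < hash->second) {
        // Both the fullhash and the negative cache have expired.
        response.fullHashes.erase(hash);
        if (response.fullHashes.empty() &&
            response.negativeCacheExpirySec < aNowSec) {
          aCache.erase(entry);  // nothing left worth keeping for this prefix
        }
      }
    }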

::: toolkit/components/url-classifier/LookupCacheV4.cpp:141
(Diff revision 1)
> +    } else {
> +      // From google
> +      //   Individual full hash results can be removed from the prefix's
> +      //   cache entry if they expire AND their expire time is after the negative
> +      //   cache expire time.
> +      LOG(("Found a expired result in positive cache"));

"Found an expired fullhash in the positive cache"

::: toolkit/components/url-classifier/LookupCacheV4.cpp:147
(Diff revision 1)
> +
> +      if (fullHashResponse->negativeCacheExpireTime < expireTime) {
> +        fullHashes.Remove(completion);
> +        if (fullHashes.Count() == 0 &&
> +            fullHashResponse->negativeCacheExpireTime < now) {
> +          mCache.Remove(prefix);

At this point, we've removed both the fullhash and the prefix from the cache. Shouldn't we set `aHas` to false?

::: toolkit/components/url-classifier/LookupCacheV4.cpp:155
(Diff revision 1)
> +    }
> +    return NS_OK;
> +  }
> +
> +  // Check negative cache.
> +  if (now <= fullHashResponse->negativeCacheExpireTime) {

This is a little hard to read. I think that the reverse condition (`negativeCache >= now`) would be easier.

::: toolkit/components/url-classifier/LookupCacheV4.cpp:157
(Diff revision 1)
> +  }
> +
> +  // Check negative cache.
> +  if (now <= fullHashResponse->negativeCacheExpireTime) {
> +    // Url is safe.
> +    LOG(("Found a valid result in negative cache"));

I would suggest:

    "Found a valid prefix in the negative cache"

::: toolkit/components/url-classifier/LookupCacheV4.cpp:160
(Diff revision 1)
> +  if (now <= fullHashResponse->negativeCacheExpireTime) {
> +    // Url is safe.
> +    LOG(("Found a valid result in negative cache"));
> +    *aHas = false;
> +  } else {
> +    LOG(("Found a expired result in negative cache"));

"Found an expired prefix in the negative cache"

If I understand correctly, we're returning `aHas == true` and `aConfirmed == false` in this case.

Shouldn't we return `aHas == false` since we just removed the prefix from the cache and the fullhash is also not in the cache?

::: toolkit/components/url-classifier/LookupCacheV4.cpp:639
(Diff revision 1)
> +#if defined(DEBUG)
> +static
> +void CStringToHexString(const nsACString& aIn, nsACString& aOut)
> +{
> +  static const char* const lut = "0123456789ABCDEF";
> +  // 32 bytes is the longest hash

We could just use `COMPLETE_SIZE` here.

::: toolkit/components/url-classifier/LookupCacheV4.cpp:663
(Diff revision 1)
> +    nsAutoCString strPrefix;
> +    CStringToHexString(iter.Key(), strPrefix);
> +
> +    CachedFullHashResponse* response = iter.Data();
> +    LOG(("Caches prefix: %s, Expire time: %lld",
> +         strPrefix.get(), response->negativeCacheExpireTime));

It would be nice to format these with human-readable datetime.

::: toolkit/components/url-classifier/LookupCacheV4.cpp:669
(Diff revision 1)
> +
> +    FullHashCache& fullHashes = response->fullHashes;
> +    for (auto iter2 = fullHashes.ConstIter(); !iter2.Done(); iter2.Next()) {
> +      nsAutoCString strFullhash;
> +      CStringToHexString(iter2.Key(), strFullhash);
> +      LOG(("  - %s, Expire time: %lld", strFullhash.get(), iter2.Data()));

ditto
Attachment #8847450 - Flags: review?(francois) → review-
Comment on attachment 8846528 [details]
Bug 1311935 - P2. Process fullHashes.find response.

https://reviewboard.mozilla.org/r/119580/#review127404

> This one should be a check, not an assert. We need to do something reasonable if it comes up in the wild.

> This argument shouldn't be null regardless of whether there are fullhashes or not.
> If it is null, it means the following line in HashCompleter.js failed:
> let matches = Cc["@mozilla.org/array;1"].createInstance(Ci.nsIMutableArray)
> So in this case, do you think it should be an assert or a check?

While it's unlikely to be a problem in practice, it's a pretty cheap check, so I would personally err on the side of safety and return an error in this case.

> I would suggest `nowMs`.
> 
> Also, you're adding it to `uint32_t` variables later, so maybe we should cast it to that here?

> The result stored in the map is int64_t, and according to the C++ integer promotion rules, the uint32_t will be converted to int64_t implicitly. Or do you prefer casting explicitly here?

I'm sure it will work fine, but since we're only checking `now` against `uint32_t` variables, maybe we should convert it to a `uint32_t` early and avoid any potential surprises (i.e. `uint32_t now = static_cast<uint32_t>(PR_Now() / PR_USEC_PER_MSEC)`).
Comment on attachment 8846527 [details]
Bug 1311935 - P1. Make ActiveTables() work for safebrowsing v4.

https://reviewboard.mozilla.org/r/119578/#review125038

> Should we output a warning here? I imagine this only fails when there are permission problems reading/accessing the file/directory.

> The folder we want to find here will be 'goog4', and I guess it won't have a problem.
> If any other folder causes this API to return an error (as you said, permission errors for example), then we don't care about it.
>
> So I would prefer not to add a warning here. What do you think?

Fair enough. We don't really need to read the folders if it's not `google4`. I assume that permission problems on `google4` will already be handled properly.
(In reply to Dimi Lee[:dimi][:dlee] from comment #15)
> Part 3 should also be changed accordingly, but I'll wait until you finish
> the review.

Feel free to update the first 3 patches if you want to. I only have the tests left to review, but I haven't started yet.
Comment on attachment 8846528 [details]
Bug 1311935 - P2. Process fullHashes.find response.

https://reviewboard.mozilla.org/r/119580/#review127404

> > The result stored in the map is int64_t, and according to the C++ integer promotion rules, the uint32_t will be converted to int64_t implicitly. Or do you prefer casting explicitly here?
> 
> I'm sure it will work fine, but since we're only checking `now` against `uint32_t` variables, maybe we should convert it to a `uint32_t` early and avoid any potential surprises (i.e. `uint32_t now = static_cast<uint32_t>(PR_Now() / PR_USEC_PER_MSEC)`).

Sorry, I am a little bit confused, so I'll try to explain my thoughts here.
The expiry time is stored in int64_t variables (CachedFullHashResponse and FullHashExpiryCache in Entries.h) and is calculated by adding PR_Now and the duration (uint32_t).
Take the following line as an example:
result->response.aFullHashes.Put(fullHash, now + duration);

now is int64_t and duration is uint32_t, so adding these two variables implicitly converts duration to int64_t and stores the result in aFullHashes as an int64_t. Then, in Classifier::Check, we compare this value (int64_t) against PR_Now (int64_t) to see if it is expired.

Also, in the previous version the precision of now was milliseconds, so (PR_Now() / PR_USEC_PER_MSEC) will be around 1,490,947,000,000 while the maximum value of a uint32_t is 4,294,967,296, so int64_t is required to avoid overflow.
In the next patch I'll change the precision of now to seconds, so (PR_Now() / PR_USEC_PER_SEC) will be around 1,490,947,000 and a uint32_t will be just enough (until 2106).

Another solution is to change PR_Now to PR_IntervalNow and use uint32_t in every case; the only concern is that it may be harder to show this value as a human-readable time in about:url-classifier.

Does that make sense?
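
To put those numbers in code (a standalone, compilable sketch with illustrative constants, not part of the patch):

    #include <cstdint>
    #include <limits>

    int main() {
      constexpr int64_t kNowMs  = INT64_C(1490947000000);  // ~2017, in ms since epoch
      constexpr int64_t kNowSec = kNowMs / 1000;           // ~1490947000

      // Milliseconds since epoch no longer fit in 32 bits...
      static_assert(kNowMs > std::numeric_limits<uint32_t>::max(),
                    "ms since epoch overflows uint32_t");
      // ...but seconds since epoch still do (until 2106).
      static_assert(kNowSec < std::numeric_limits<uint32_t>::max(),
                    "seconds since epoch fit in uint32_t");
      return 0;
    }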
Comment on attachment 8847450 [details]
Bug 1311935 - P3. Implement safebrowsing v4 caching logic.

https://reviewboard.mozilla.org/r/120422/#review127972

::: toolkit/components/url-classifier/Classifier.cpp:505
(Diff revision 1)
>  
>          // For V2, there is no TTL for caching, so we use table freshness to
>          // decide if matching a completion should trigger a gethash request or not.
>          // For V4, this is done by Positive Caching & Negative Caching mechanism.
> -        bool confirmed = false;
>          if (fromCache) {

Sure, the reason I separated this is that I want to keep the Has API more general.

Both ways work for me, so if you prefer putting the freshness check into Has(), I'll change it in the next patch.

::: toolkit/components/url-classifier/LookupCacheV4.cpp:97
(Diff revision 1)
>  
>    nsresult rv = mVLPrefixSet->Matches(fullhash, &length);
>    NS_ENSURE_SUCCESS(rv, rv);
>  
>    *aHas = length >= PREFIX_SIZE;
>    *aMatchLength = length;

If we want to ensure the match length is less than 4 bytes but greater than 0, then I would prefer putting the MOZ_ASSERT in VariableLengthPrefixSet::Matches because it will be more general.
Do you think that is OK?

::: toolkit/components/url-classifier/LookupCacheV4.cpp:147
(Diff revision 1)
> +
> +      if (fullHashResponse->negativeCacheExpireTime < expireTime) {
> +        fullHashes.Remove(completion);
> +        if (fullHashes.Count() == 0 &&
> +            fullHashResponse->negativeCacheExpireTime < now) {
> +          mCache.Remove(prefix);

No, when we set aHas to false, it means the URL is safe.
In this case it just indicates that we hit an expired negative cache entry and we should still send the gethash request, so the result would be aHas (true) and aConfirmed (false).

This case is like matching a prefix in the database but finding nothing in the cache.

::: toolkit/components/url-classifier/LookupCacheV4.cpp:160
(Diff revision 1)
> +  if (now <= fullHashResponse->negativeCacheExpireTime) {
> +    // Url is safe.
> +    LOG(("Found a valid result in negative cache"));
> +    *aHas = false;
> +  } else {
> +    LOG(("Found a expired result in negative cache"));

The reason here is the same as for the question above.
Comment on attachment 8846528 [details]
Bug 1311935 - P2. Process fullHashes.find response.

https://reviewboard.mozilla.org/r/119580/#review127404

> Sorry, I am a little bit confused, so I'll try to explain my thoughts here.
> The expiry time is stored in int64_t variables (CachedFullHashResponse and FullHashExpiryCache in Entries.h) and is calculated by adding PR_Now and the duration (uint32_t).
> Take the following line as an example:
> result->response.aFullHashes.Put(fullHash, now + duration);
> 
> now is int64_t and duration is uint32_t, so adding these two variables implicitly converts duration to int64_t and stores the result in aFullHashes as an int64_t. Then, in Classifier::Check, we compare this value (int64_t) against PR_Now (int64_t) to see if it is expired.
> 
> Also, in the previous version the precision of now was milliseconds, so (PR_Now() / PR_USEC_PER_MSEC) will be around 1,490,947,000,000 while the maximum value of a uint32_t is 4,294,967,296, so int64_t is required to avoid overflow.
> In the next patch I'll change the precision of now to seconds, so (PR_Now() / PR_USEC_PER_SEC) will be around 1,490,947,000 and a uint32_t will be just enough (until 2106).
> 
> Another solution is to change PR_Now to PR_IntervalNow and use uint32_t in every case; the only concern is that it may be harder to show this value as a human-readable time in about:url-classifier.
> 
> Does that make sense?

Thanks for the extra details. I was only looking at this file and so I was missing the overall context of this.

It makes sense to keep it the way it is.
Comment on attachment 8847450 [details]
Bug 1311935 - P3. Implement safebrowsing v4 caching logic.

https://reviewboard.mozilla.org/r/120422/#review127824

> Couldn't we roll the table freshness check into the `Has()` function?
> 
> It would mean that we don't have to call a function that does nothing in V4 tables.

> Sure, the reason I separated this is that I want to keep the Has API more general.
> 
> Both ways work for me, so if you prefer putting the freshness check into Has(), I'll change it in the next patch.

Since freshness is an internal detail of V2 and it would force us to fake it in V4, I think I'd rather not see it in the V4 class.

> I wonder if an assertion like this would be useful:
> 
>     MOZ_ASSERT(aHas || length == 0)
> 
> This is to ensure that we don't get a match with a length less than 4 bytes but greater than 0.

> If we want to ensure the match length is less than 4 bytes but greater than 0, then I would prefer putting the MOZ_ASSERT in VariableLengthPrefixSet::Matches because it will be more general.
> Do you think that is OK?

Sounds good to me. However, you got the assertion backwards in your comment.

We need to check that the match length is either:

- `== 0`, or
- `>= PREFIX_SIZE`
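
Stated as a standalone sketch (assuming PREFIX_SIZE is 4 and COMPLETE_SIZE is 32 bytes, as in Entries.h; the upper bound comes from the earlier suggestion in this review):

    #include <cassert>
    #include <cstdint>

    void AssertMatchLengthInvariant(uint32_t aLength) {
      constexpr uint32_t kPrefixSize = 4;    // PREFIX_SIZE
      constexpr uint32_t kCompleteSize = 32; // COMPLETE_SIZE
      // Either no match at all, or a prefix between 4 and 32 bytes long.
      assert(aLength == 0 || (aLength >= kPrefixSize && aLength <= kCompleteSize));
    }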
Comment on attachment 8847451 [details]
Bug 1311935 - P4. GTest for safebrowsing v4 caching.

https://reviewboard.mozilla.org/r/120424/#review128248

::: toolkit/components/url-classifier/tests/gtest/TestCachingV4.cpp:7
(Diff revision 1)
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "Common.h"
> +
> +#define EXPIRE_TIME   (PR_Now() / 1000 - 3600000)

`EXPIRED_TIME_MS` would be clearer

::: toolkit/components/url-classifier/tests/gtest/TestCachingV4.cpp:8
(Diff revision 1)
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "Common.h"
> +
> +#define EXPIRE_TIME   (PR_Now() / 1000 - 3600000)
> +#define UNEXPIRE_TIME (PR_Now() / 1000 + 3600000)

`NOTEXPIRED_TIME_MS` would be clearer

::: toolkit/components/url-classifier/tests/gtest/TestCachingV4.cpp:11
(Diff revision 1)
> +
> +#define EXPIRE_TIME   (PR_Now() / 1000 - 3600000)
> +#define UNEXPIRE_TIME (PR_Now() / 1000 + 3600000)
> +
> +static void
> +SetupcacheEntry(LookupCacheV4* aLookupCache,

nit: `SetupCacheEntry`

::: toolkit/components/url-classifier/tests/gtest/TestCachingV4.cpp:48
(Diff revision 1)
> +                           GeneratePrefix(_Fragment("small.com/"), 4)
> +                         };
> +
> +    UniquePtr<LookupCacheV4> cache = SetupLookupCacheV4(array);
> +
> +    // Create an expire entry and a non-expired entry

"expired entry"

::: toolkit/components/url-classifier/tests/gtest/TestCachingV4.cpp:116
(Diff revision 1)
> +  prefix.FromPlaintext(_Fragment("cache.notexpired.com/"), cryptoHash);
> +
> +  Completion fullhash;
> +  fullhash.FromPlaintext(_Fragment("firefox.com/"), cryptoHash);
> +
> +  // Substitute prefix with one will hit the cache.

Clever!

To make what you are doing clearer, I would suggest this comment:

    // Overwrite the 4-byte prefix of `fullhash` so that it conflicts with `prefix`.

::: toolkit/components/url-classifier/tests/gtest/TestCachingV4.cpp:117
(Diff revision 1)
> +
> +  Completion fullhash;
> +  fullhash.FromPlaintext(_Fragment("firefox.com/"), cryptoHash);
> +
> +  // Substitute prefix with one will hit the cache.
> +  memcpy(fullhash.buf, prefix.buf, 16);

Is 16 in bytes?

You could just copy 4 bytes, right?

::: toolkit/components/url-classifier/tests/gtest/TestCachingV4.cpp:135
(Diff revision 1)
> +  prefix.FromPlaintext(_Fragment("cache.expired.com/"), cryptoHash);
> +
> +  Completion fullhash;
> +  fullhash.FromPlaintext(_Fragment("firefox.com/"), cryptoHash);
> +
> +  // Substitute prefix with one will hit the cache.

ditto

::: toolkit/components/url-classifier/tests/gtest/TestCachingV4.cpp:144
(Diff revision 1)
> +}
> +
> +#define CACHED_URL                    _Fragment("cache.com/")
> +#define NEG_CACHE_EXPIRED_URL         _Fragment("cache.negExpired.com/")
> +#define POS_CACHE_EXPIRED_URL         _Fragment("cache.posExpired.com/")
> +#define NEG_AND_POS_CACHE_EXPIRED_URL _Fragment("cache.negAndposExpired.com/")

nit: You could also simply use `BOTH_CACHE_EXPIRED_URL`.

::: toolkit/components/url-classifier/tests/gtest/TestCachingV4.cpp:148
(Diff revision 1)
> +#define POS_CACHE_EXPIRED_URL         _Fragment("cache.posExpired.com/")
> +#define NEG_AND_POS_CACHE_EXPIRED_URL _Fragment("cache.negAndposExpired.com/")
> +
> +// This testcase create 4 cache entries.
> +// 1. unexpired entry.
> +// 2. an entry whose negative cache time is expired

I would add "but whose positive cache is not expired" for clarity.
Attachment #8847451 - Flags: review?(francois)
Comment on attachment 8847451 [details]
Bug 1311935 - P4. GTest for safebrowsing v4 caching.

https://reviewboard.mozilla.org/r/120424/#review129152

::: toolkit/components/url-classifier/tests/gtest/TestCachingV4.cpp:117
(Diff revision 1)
> +
> +  Completion fullhash;
> +  fullhash.FromPlaintext(_Fragment("firefox.com/"), cryptoHash);
> +
> +  // Substitute prefix with one will hit the cache.
> +  memcpy(fullhash.buf, prefix.buf, 16);

Since I added the URL "cache.notexpired.com/" to the database as a 10-byte prefix (in the TestCache function), we need to copy at least 10 bytes to match the prefix in the database, and I picked "16" as a magic number here.
But I think this is really confusing, so I will change it to 10 and add a comment to explain it.
Comment on attachment 8847450 [details]
Bug 1311935 - P3. Implement safebrowsing v4 caching logic.

https://reviewboard.mozilla.org/r/120422/#review127824

> I think we should change that comment because right now, it only describes the code. It doesn't say why that's true.
> 
> Correct me if I'm wrong, but the reason we can remove the fullhash in that case is because:
> 
> - the fullhash expiry has passed and
> - the negative cache expiry has passed
> 
> Therefore, if we remove the fullhash entry, future lookups will come back empty, but because the negative cache has expired, that failed lookup will trigger a gethash request.

Yes, you are right. I will update the comment.
Comment on attachment 8846528 [details]
Bug 1311935 - P2. Process fullHashes.find response.

https://reviewboard.mozilla.org/r/119580/#review129716

::: toolkit/components/url-classifier/Entries.h:324
(Diff revision 2)
>  typedef nsDataHashtable<nsCStringHashKey, int64_t> TableFreshnessMap;
>  
> +typedef nsCStringHashKey VLHashPrefixString;
> +typedef nsCStringHashKey FullHashString;
> +
> +typedef nsDataHashtable<VLHashPrefixString, int64_t> FullHashExpiryCache;

This should be:
typedef nsDataHashtable<FullHashString, int64_t> FullHashExpiryCache;

I'll fix it in the next patch.
Comment on attachment 8847450 [details]
Bug 1311935 - P3. Implement safebrowsing v4 caching logic.

https://reviewboard.mozilla.org/r/120422/#review129724

::: toolkit/components/url-classifier/LookupCacheV4.cpp:620
(Diff revision 2)
> +// expired but still being removed after calling this API. Right now we call
> +// this on every update.
> +void
> +LookupCacheV4::InvalidateExpiredCacheEntry()
> +{
> +  int64_t now = PR_Now() / PR_USEC_PER_MSEC;

I will change this to:
int64_t nowSec = PR_Now() / PR_USEC_PER_SEC;
Comment on attachment 8846528 [details]
Bug 1311935 - P2. Process fullHashes.find response.

https://reviewboard.mozilla.org/r/119580/#review130060
Attachment #8846528 - Flags: review?(francois) → review+
Comment on attachment 8847450 [details]
Bug 1311935 - P3. Implement safebrowsing v4 caching logic.

https://reviewboard.mozilla.org/r/120422/#review130062

::: commit-message-981c6:5
(Diff revisions 1 - 2)
>  Bug 1311935 - P3. Implement safebrowsing v4 caching logic. r?francois
>  
> -MozReview-Commit-ID: LmaNVmS8v7v
> +LookupCacheV4::Has implements safebrowsing v4 caching logic.
> +1. Check if fullhash match any prefix in local database:
> +  - If no, the URL is safe.

"If not"

::: commit-message-981c6:8
(Diff revisions 1 - 2)
> -MozReview-Commit-ID: LmaNVmS8v7v
> +LookupCacheV4::Has implements safebrowsing v4 caching logic.
> +1. Check if fullhash match any prefix in local database:
> +  - If no, the URL is safe.
> +2. Check if prefix is in the cache(prefix is always the first 4-byte of
> +   the fullhash, Bug 1323953):
> +  - If no, send fullhash request

ditto

::: toolkit/components/url-classifier/LookupCache.cpp:420
(Diff revisions 1 - 2)
> -void
> -LookupCacheV2::CheckTableFreshness(const TableFreshnessMap& aTableFreshness,
> -                                   uint32_t aFreshnessGuarantee,
> -                                   bool* aConfirmed)
> -{
> -  int64_t age; // in seconds
> +    int64_t age; // in seconds

nit: ageSec

::: toolkit/components/url-classifier/LookupCache.cpp:422
(Diff revisions 1 - 2)
> -  int64_t age; // in seconds
> +    int64_t age; // in seconds
> -  bool found = aTableFreshness.Get(mTableName, &age);
> +    if (aTableFreshness.Get(mTableName, &age)) {
> -  if (!found) {
> -    *aConfirmed = false;
> -  } else {
> -    int64_t now = (PR_Now() / PR_USEC_PER_SEC);
> +      int64_t now = (PR_Now() / PR_USEC_PER_SEC);

nit: nowSec

::: toolkit/components/url-classifier/LookupCacheV4.cpp:98
(Diff revisions 1 - 2)
>    fullhash.Rebind((const char *)aCompletion.buf, COMPLETE_SIZE);
>  
>    nsresult rv = mVLPrefixSet->Matches(fullhash, &length);
>    NS_ENSURE_SUCCESS(rv, rv);
>  
> +  MOZ_ASSERT(length ==0 || (length >= PREFIX_SIZE && length <= COMPLETE_SIZE));

nit: missing space between `==` and `0`

::: toolkit/components/url-classifier/LookupCacheV4.cpp:144
(Diff revisions 1 - 2)
> -      //   Individual full hash results can be removed from the prefix's
> +      LOG(("Found an expired fullhash in the positive cache"));
> -      //   cache entry if they expire AND their expire time is after the negative
> -      //   cache expire time.
> -      LOG(("Found a expired result in positive cache"));
>  
> -      if (fullHashResponse->negativeCacheExpireTime < expireTime) {
> +      // Before return, we should also check cache eviction rule:

"Before returning"

::: toolkit/components/url-classifier/LookupCacheV4.cpp:145
(Diff revisions 1 - 2)
> -      //   cache entry if they expire AND their expire time is after the negative
> -      //   cache expire time.
> -      LOG(("Found a expired result in positive cache"));
>  
> -      if (fullHashResponse->negativeCacheExpireTime < expireTime) {
> +      // Before return, we should also check cache eviction rule:
> +      // Remove fullhash entry from cache when negative cache is also expired

I would suggest clarifying this comment in the following way:

"Remove fullhash entry from the cache when the negative cache is also expired because whether or not the fullhash is cached locally, we will need to consult the server next time we lookup this hash. We may as well remove it from our cache."
Attachment #8847450 - Flags: review?(francois) → review+
Comment on attachment 8847451 [details]
Bug 1311935 - P4. GTest for safebrowsing v4 caching.

https://reviewboard.mozilla.org/r/120424/#review130068
Attachment #8847451 - Flags: review?(francois) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/2f185f0d9e26
P1. Make ActiveTables() work for safebrowsing v4. r=francois
https://hg.mozilla.org/integration/autoland/rev/fc8099c8f98a
P2. Process fullHashes.find response. r=francois
https://hg.mozilla.org/integration/autoland/rev/a4e571cb610a
P3. Implement safebrowsing v4 caching logic. r=francois
https://hg.mozilla.org/integration/autoland/rev/18a286ccf1be
P4. GTest for safebrowsing v4 caching. r=francois
Keywords: checkin-needed
Backed out for failing browser_trackingUI_6.js:

https://hg.mozilla.org/integration/autoland/rev/cbce5d365fee136c060ea94aa7b1feab0f1d421e
https://hg.mozilla.org/integration/autoland/rev/0e7b6014b53dbb8512e4370e21d9666ddd8e3df7
https://hg.mozilla.org/integration/autoland/rev/f14b24e854d02b6555595f9fa60462028ca69bde
https://hg.mozilla.org/integration/autoland/rev/778615a3eb63ef1bb142f4b9a4b458ba55fe157e

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=18a286ccf1be043be8ce76bbc4ee1845baeacd4b&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=89538339&repo=autoland

[task 2017-04-07T14:15:14.836207Z] 14:15:14     INFO - GECKO(1293) | [Parent 1293] WARNING: Could not find prefix in PrefixSet during noise lookup: file /home/worker/workspace/build/src/toolkit/components/url-classifier/Classifier.cpp, line 1528
[task 2017-04-07T14:15:14.840070Z] 14:15:14     INFO - GECKO(1293) | [Parent 1293] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file /home/worker/workspace/build/src/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp, line 355
[task 2017-04-07T14:15:14.841863Z] 14:15:14     INFO - GECKO(1293) | [Parent 1293] WARNING: Partial match in a table without a valid completer, ignoring partial match.: file /home/worker/workspace/build/src/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp, line 1152
[task 2017-04-07T14:15:15.403001Z] 14:15:15     INFO - TEST-INFO | started process screentopng
[task 2017-04-07T14:15:16.963598Z] 14:15:16     INFO - TEST-INFO | screentopng: exit 0
[task 2017-04-07T14:15:16.970219Z] 14:15:16     INFO - Buffered messages logged at 14:15:11
[task 2017-04-07T14:15:16.972438Z] 14:15:16     INFO - Entering test bound test_fetch
[task 2017-04-07T14:15:16.974593Z] 14:15:16     INFO - Buffered messages logged at 14:15:13
[task 2017-04-07T14:15:16.977670Z] 14:15:16     INFO - Console message: [JavaScript Error: "TypeError: tpButton is null" {file: "chrome://browser/content/aboutPrivateBrowsing.js" line: 31}]
[task 2017-04-07T14:15:16.979558Z] 14:15:16     INFO - Buffered messages finished
[task 2017-04-07T14:15:16.983438Z] 14:15:16     INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_trackingUI_6.js | should have denied the request - false == true - 
[task 2017-04-07T14:15:16.985177Z] 14:15:16     INFO - Stack trace:
[task 2017-04-07T14:15:16.987064Z] 14:15:16     INFO - resource://testing-common/content-task.js line 52 > eval:null:5
Flags: needinfo?(dlee)
Updated the patch to fix the try error; the root cause is that I accidentally removed |*matchLength = COMPLETE_SIZE| in LookupCacheV2::Has.
Flags: needinfo?(dlee)
Keywords: checkin-needed
Keywords: checkin-needed
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/a5a6c0f79733
P1. Make ActiveTables() work for safebrowsing v4. r=francois
https://hg.mozilla.org/integration/autoland/rev/73587838ef16
P2. Process fullHashes.find response. r=francois
https://hg.mozilla.org/integration/autoland/rev/4c0381ab0990
P3. Implement safebrowsing v4 caching logic. r=francois
https://hg.mozilla.org/integration/autoland/rev/27e624cd9479
P4. GTest for safebrowsing v4 caching. r=francois
Keywords: checkin-needed
Backed out for probably causing an assertion crash:
https://hg.mozilla.org/integration/autoland/rev/992b8a6ae635
Flags: needinfo?(dlee)
(In reply to Iris Hsiao [:ihsiao] from comment #54)
> Backed out for probably causing an assertion crash:
> https://hg.mozilla.org/integration/autoland/rev/992b8a6ae635

Thanks for your help, Iris!
I found an issue when the cache contains both V2 and V4 data, so I asked Iris to back out this patch.
Flags: needinfo?(dlee)
Updated patch Part 2 to fix this issue (Diff 6 - Diff 8).
Keywords: checkin-needed
Pushed by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/905fdf381f58
P1. Make ActiveTables() work for safebrowsing v4. r=francois
https://hg.mozilla.org/integration/autoland/rev/7e77f122340b
P2. Process fullHashes.find response. r=francois
https://hg.mozilla.org/integration/autoland/rev/a55227734d51
P3. Implement safebrowsing v4 caching logic. r=francois
https://hg.mozilla.org/integration/autoland/rev/efe676688e29
P4. GTest for safebrowsing v4 caching. r=francois
Keywords: checkin-needed
Depends on: 1357207
Depends on: 1359299
Blocks: 1359299
No longer depends on: 1359299