Closed Bug 1298257 Opened 3 years ago Closed 3 years ago

Implement url matching for variable-length prefix set

Categories

(Toolkit :: Safe Browsing, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: dimi, Assigned: tnguyen)

References

Details

(Whiteboard: #sbv4-m3)

Attachments

(1 file, 1 obsolete file)

Url matching for varable-length prefix set is required for V4.
Currently we only support check fix-sized prefix set. (See LookupCache::Has)
Whiteboard: #sbv4-m2
Priority: -- → P2
It would be nice to land this first when we get to Milestone 2. That way we can start matching on the prefixes even before we have support for gethash.
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Whiteboard: #sbv4-m2 → #sbv4-m3
Attached patch WIP Patch (obsolete) — Splinter Review
Hi Francois,
Just discussed with Henry and Thomas, we are wondering if record telemetry every time when matching prefix cause overhead.

Or we should only record it when v2 and v4 is not sync , but by doing so we will lose the fail rate data.
Do you have any suggestion ? Thanks
Flags: needinfo?(francois)
(In reply to Dimi Lee[:dimi][:dlee] from comment #3)
> Just discussed with Henry and Thomas, we are wondering if record telemetry
> every time when matching prefix cause overhead.

I don't believe that's a problem. Telemetry pings are not sent synchronously and might even be sampled.

> Or we should only record it when v2 and v4 is not sync , but by doing so we
> will lose the fail rate data.
> Do you have any suggestion ? Thanks

You're right, it wouldn't be as useful if we only kept track of the failures. I think we can record both.
Flags: needinfo?(francois)
(In reply to François Marier [:francois] from comment #4)
> (In reply to Dimi Lee[:dimi][:dlee] from comment #3)
> > Just discussed with Henry and Thomas, we are wondering if record telemetry
> > every time when matching prefix cause overhead.
> 
> I don't believe that's a problem. Telemetry pings are not sent synchronously
> and might even be sampled.
> 

Thanks for the information!
Assignee: dlee → tnguyen
I've added a patch that implements prefix matching in V4. The patch included matching prefix telemetry accumulating and added a testing telemetry xpcom component to count the telemetry's histogram. The xpcom component will be registered only during running gtest (XRE_AddManifestLocation). 
 We will add more telemetries in this milestone and may need a way to observe them. I don't know if that's a good idea to test telemetry in gtest. Currently nsIObserver is used in the telemetry xpcom to observe the result temporarily.
Attachment #8801956 - Attachment is obsolete: true
Attachment #8803273 - Flags: review?(dlee)
Comment on attachment 8803273 [details]
Bug 1298257 - Implement url matching for variable-length prefix set.

https://reviewboard.mozilla.org/r/87582/#review86850
Comment on attachment 8803273 [details]
Bug 1298257 - Implement url matching for variable-length prefix set.

https://reviewboard.mozilla.org/r/87582/#review86852

Thomas thanks for working on this, the testcases have good coverage, great job!
I would like to review again after addressing the comments
Comment on attachment 8803273 [details]
Bug 1298257 - Implement url matching for variable-length prefix set.

https://reviewboard.mozilla.org/r/87584/#review86834

::: toolkit/components/url-classifier/Classifier.h:149
(Diff revision 2)
> +  // This is used to record the matching statistics for v2 and v4.
> +  enum PREFIX_MATCH {
> +    NO_MATCH = 0x00,
> +    MATCH_V2_PREFIX = 0x01,
> +    MATCH_V4_PREFIX = 0x02,
> +    MATCH_BOTH = 0x03,

We should add a comment about this value is the result from "MATCH_V2_PREFIX  | MATCH_V4_PREFIX"

::: toolkit/components/url-classifier/Classifier.cpp:487
(Diff revision 2)
>  
>      for (uint32_t i = 0; i < cacheArray.Length(); i++) {
>        LookupCache *cache = cacheArray[i];
>        bool has, complete;
> +
> +      if (LookupCache::Cast<LookupCacheV4>(cache)) {

Could you help fire a bug and add TODO about in the future we should use |Match| API here because |Has| API doesn't return the length we match the fullhash, which will be require if we want to support gethash in M4.

::: toolkit/components/url-classifier/Classifier.cpp:489
(Diff revision 2)
>        LookupCache *cache = cacheArray[i];
>        bool has, complete;
> +
> +      if (LookupCache::Cast<LookupCacheV4>(cache)) {
> +        rv = cache->Has(lookupHash, &has, &complete);
> +        NS_ENSURE_SUCCESS(rv, rv);

We don't want to V4 code affect V2 code, so i think we should not return and just print an error message here

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:30
(Diff revision 2)
> +typedef nsTArray<nsCString> _FragmentArray;
> +typedef nsTArray<nsCString> _PrefixArray;
> +
> +// Generate a hash prefix from string
> +static const nsCString
> +GeneratePrefix(_Fragment aFragment, uint8_t aLength = PREFIX_SIZE)

aFragment should be reference ?

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:37
(Diff revision 2)
> +  Completion complete;
> +  nsCOMPtr<nsICryptoHash> cryptoHash = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID);
> +  complete.FromPlaintext(aFragment, cryptoHash);
> +
> +  nsDependentCString hash;
> +  hash.Assign((const char *)complete.buf, aLength);

If you use Assign then you don't need to use nsDependentCString.

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:46
(Diff revision 2)
> +// Register telemetry XPCOM components
> +static void RegisterTelemetry()
> +{
> +  nsCOMPtr<nsIFile> manifest;
> +  nsresult rv = NS_GetSpecialDirectory(NS_GRE_DIR,
> +                                getter_AddRefs(manifest));

put in single line or align the argument

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:60
(Diff revision 2)
> +static UniquePtr<Classifier>
> +GetClassifier()
> +{
> +  nsCOMPtr<nsIFile> file;
> +  NS_GetSpecialDirectory(NS_APP_USER_PROFILE_50_DIR,
> +      getter_AddRefs(file));

ditto

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:77
(Diff revision 2)
> +  nsCOMPtr<nsIRunnable> r = NS_NewRunnableFunction(mozilla::Forward<Function>(aFunction));
> +  nsCOMPtr<nsIThread> testingThread;
> +  nsresult rv = NS_NewThread(getter_AddRefs(testingThread), r);
> +  ASSERT_TRUE(rv == NS_OK);
> +  testingThread->Shutdown();
> +}

after Bug 1305780 land, you can remove this because this will be included in Common.h

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:104
(Diff revision 2)
> +  for (uint32_t i = 0; i < prefixArray.Length(); i++) {
> +    const nsCString& prefix = prefixArray[i];
> +    nsCString* prefixString = out.LookupOrAdd(prefix.Length());
> +    prefixString->Append(prefix.BeginReading(), prefix.Length());
> +  }
> +}

It will be good if you can rebase on Bug 1305780 (sorry it is not yet landed :( ) and put this function to common file.

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:114
(Diff revision 2)
> +{
> +  out.Clear();
> +
> +  nsresult rv;
> +  nsCOMPtr<nsICryptoHash> cryptoHash = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &rv);
> +  if (NS_FAILED(rv)) return rv;

add brace after if(all cases in this patch)

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:150
(Diff revision 2)
> +
> +  EntrySort(prefixes);
> +  rv = lookupCache->Build(prefixes, completions);
> +  if (NS_FAILED(rv)) return rv;
> +
> +  prefixes.Clear();

if |prefixes| is local variable do we need to clear it here ?

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:171
(Diff revision 2)
> +  PrefixArrayToPrefixStringMap(prefixArray, map);
> +
> +  nsresult rv = lookupCache->Build(map);
> +  if (NS_FAILED(rv)) return rv;
> +
> +  map.Clear();

same question as prefixes above

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:222
(Diff revision 2)
> +  _PrefixArray arrayV4 = { GeneratePrefix(_Fragment("bravo.com/"), 6),
> +                           GeneratePrefix(_Fragment("gound.com/"), 5)
> +                         };
> +  _FragmentArray arrayV2 = { _Fragment("foo.com/"),
> +                             _Fragment("helloworld.com/")
> +                         };

align

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:236
(Diff revision 2)
> +
> +    LookupResultArray results;
> +    rv = classifier->Check(_Fragment("browsing.com/"),
> +                                NS_LITERAL_CSTRING("gtest-malware-simple,gtest-malware-proto"),
> +                                0,
> +                                results);

align argument

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:244
(Diff revision 2)
> +    CheckTelemetry(NS_LITERAL_CSTRING("URLCLASSIFIER_PREFIX_MATCH"),
> +                   u"NO_MATCH");
> +
> +    results.Clear();
> +    classifier->Reset();
> +  });

Those tests are almost doing the same thing with only check parameter and expect result are different, maybe we can merge them together and make the testcases easier to read.

::: toolkit/components/url-classifier/tests/gtest/TestUrlClassifierTelemetry.js:25
(Diff revision 2)
> +    matchTable["NO_MATCH"] = 0;
> +    matchTable["MATCH_V2_PREFIX"] = 1;
> +    matchTable["MATCH_V4_PREFIX"] = 2;
> +    matchTable["MATCH_BOTH"] = 3;
> +
> +    if (snapshot.counts[matchTable[data]] != 1) {

We should also check all other cases are not accumulated.
Attachment #8803273 - Flags: review?(dlee) → review-
Comment on attachment 8803273 [details]
Bug 1298257 - Implement url matching for variable-length prefix set.

https://reviewboard.mozilla.org/r/87584/#review86942

Looks good to me, thanks!

::: toolkit/components/url-classifier/LookupCacheV4.cpp:99
(Diff revision 3)
> +  *aComplete = length == COMPLETE_SIZE;
> +
> +  if (LOG_ENABLED()) {
> +    uint32_t prefix = aCompletion.ToUint32();
> +    LOG(("Probe in %s: %X, found %d, complete %d", mTableName.get(),
> +          prefix, *aHas, *aComplete));

Maybe add something to specify this is probe in v4,because this looks similar as log in LookupCacheV2:Has

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:82
(Diff revision 3)
> +  }));
> +}
> +
> +static nsresult
> +FragmentArrayToAddPrefixArray(const _FragmentArray& fragmentArray,
> +                             AddPrefixArray& out)

Nit. Align

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:131
(Diff revision 3)
> +
> +  EntrySort(prefixes);
> +  rv = lookupCache->Build(prefixes, completions);
> +  if (NS_FAILED(rv)) {
> +    return rv;
> +  }

We can ignore this and just return rv in the end

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:153
(Diff revision 3)
> +  PrefixArrayToPrefixStringMap(prefixArray, map);
> +
> +  nsresult rv = lookupCache->Build(map);
> +  if (NS_FAILED(rv)) {
> +    return rv;
> +  }

Ditto.

::: toolkit/components/url-classifier/tests/gtest/TestUrlClassifierTelemetry.js:30
(Diff revision 3)
> +    if (snapshot.counts[matchTable[data]] != 1) {
> +      throw Error("TEST-UNEXPECTED-FAIL: telemetry " + data + " is not accumulated");
> +    }
> +
> +    for (let idx in matchTable) {
> +      if (idx != data && snapshot.counts[matchTable[idx]] == 1) {

> 0 would be better ?
Attachment #8803273 - Flags: review?(dlee) → review+
Attachment #8803273 - Flags: review?(gpascutto)
Comment on attachment 8803273 [details]
Bug 1298257 - Implement url matching for variable-length prefix set.

https://reviewboard.mozilla.org/r/87584/#review87088

A Telemetry peer should probably look at the telemetry parts of this.

It's normally OK for a unit test to be aware of the internals of the thing it's testing, but I do wonder if the way this is currently written isn't going too far. It wants to add a few known entries to the V2/V4 table and then probe the table to see if it gets hits (and where). I think this is achievable by using the current public Classifier() interface much more than this code, which is replacing it and messing with LookupCache directly. It's not really "TestClassifier" then, IMHO. If there's some changes to how Classifier works, that aren't necessarily publicly visible, you'll still have to go in and adapt all these tests to match. That reduces their value a bit, IMHO.

::: toolkit/components/url-classifier/Classifier.h:152
(Diff revision 4)
> +  // MATCH_BOTH value is the result from MATCH_V2_PREFIX | MATCH_V4_PREFIX
> +  enum PREFIX_MATCH {
> +    NO_MATCH = 0x00,
> +    MATCH_V2_PREFIX = 0x01,
> +    MATCH_V4_PREFIX = 0x02,
> +    MATCH_BOTH = 0x03,

MATCH_BOTH = MATCH_V2_PREFIX | MATCH_V4_PREFIX

::: toolkit/components/url-classifier/Classifier.cpp:508
(Diff revision 4)
>      } else {
>        return NS_ERROR_FAILURE;
>      }
>    }
>  
> +  uint8_t matchingStatistics = NO_MATCH;

Make it of type enum PREFIX_MATCH? (maybe make the type name not all caps like regular types)

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:110
(Diff revision 4)
> +
> +  return NS_OK;
> +}
> +
> +static nsresult
> +SetupLookupCacheV2(Classifier* classifier,

Is it hard to not poke into LookupCache but construct an update with the data you want to add instead, and call classifier->ApplyTableUpdates()?

GetLookupCache(), FragmentArrayToAddPrefixArray(), EntrySort() all end up being needed because this is poking into internals.

::: toolkit/components/url-classifier/tests/gtest/TestClassifier.cpp:205
(Diff revision 4)
> +TEST(Classifier, CheckNoMatch)
> +{
> +  TestCheckMatchPrefix(_Fragment("nomatchboth.com/"), u"NO_MATCH");
> +}
> +
> +TEST(Classifier, CheckV4MatchComplete)

This is supposed to test Classifier, but it's poking so far into the internals that it's really testing LookupCache.

It would be IMHO much more preferable if this could be made to use Classifier::Check.
Attachment #8803273 - Flags: review?(gpascutto)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #15)
> A Telemetry peer should probably look at the telemetry parts of this.

The telemetry looks good to me, but I think we can remove the tests for it.

It's a lot of code to test this and it's easier to review the telemetry bits and make sure they're correct than to review the test and make sure it's testing the right thing. I don't think Telemetry probes is something we usually have unit tests for.

Of course we should keep the rest of the tests contained in this patch, just not the JS file that adds itself as an observer on the telemetry service.
Depends on: 1312339
Removing telemetry test, I think only LookupCache need to be tested, for V4 (V2 matching is tested in xpcshell and mochitest)
Comment on attachment 8803273 [details]
Bug 1298257 - Implement url matching for variable-length prefix set.

https://reviewboard.mozilla.org/r/87584/#review89746

::: toolkit/components/url-classifier/LookupCacheV4.cpp:252
(Diff revisions 4 - 5)
>            numOldPrefixPicked == removalArray[removalIndex]) {
>          removalIndex++;
>        } else {
>          AppendPrefixToMap(aOutputMap, smallestOldPrefix);
> +
> +        crypto->Update(reinterpret_cast<uint8_t*>(const_cast<char*>(

I'd pull this into a function whose name makes it clear we're keeping track of the checksum.

::: toolkit/components/url-classifier/LookupCacheV4.cpp:252
(Diff revisions 4 - 5)
>            numOldPrefixPicked == removalArray[removalIndex]) {
>          removalIndex++;
>        } else {
>          AppendPrefixToMap(aOutputMap, smallestOldPrefix);
> +
> +        crypto->Update(reinterpret_cast<uint8_t*>(const_cast<char*>(

I'd pull this into a function whose name makes it clear we're keeping track of the checksum. I've seen this code repeated at least 4 times.

::: toolkit/components/url-classifier/LookupCacheV4.cpp:316
(Diff revisions 4 - 5)
> +  if (NS_WARN_IF(NS_FAILED(rv))) {
> +    return rv;
> +  }
> +
> +  rv = aCrypto->Init(nsICryptoHash::SHA256);
> +  Unused << NS_WARN_IF(NS_FAILED(rv));

NS_WARN_IF is meant for cases where "rv" would have side effects - it obviously doesn't here. I think this could just be NS_WARNING_ASSERTION().

On the other hand I can see that this matches up with the check a few lines above so maybe you can leave it.
Attachment #8803273 - Flags: review?(gpascutto) → review+
Try looks good in comment 22
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0797f7f58f22
Implement url matching for variable-length prefix set. r=dimi,gcp
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/0797f7f58f22
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Good job!  Thanks, Thomas, gcp, Francois and Dimi.
You need to log in before you can comment on or make changes to this bug.