Closed Bug 1311931 Opened 8 years ago Closed 7 years ago

Add telemetry to measure full match rate for v2 and v4

Categories

(Toolkit :: Safe Browsing, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: dimi, Assigned: dimi)

References

Details

(Whiteboard: #sbv4-m5)

Attachments

(1 file, 1 obsolete file)

We need to telemetry completion match rate for v2 and v4 to check following:

Rate of complete hash matches on V2 should be similar to V4
- If a prefix matches in V2 and the full hash matches, then it should match in V4 also.
- If a prefix matches in V2 but the full hash doesn't, then it may not match in V4.
- If a prefix matches only in V4 that could mean that:
*our V2 code is broken and misses some legitimate URLs
*our V4 code is broken and returns matches when it shouldn't (if the full URL matches, then it's not this case)
* the V4 list is better (more complete) than the V2 list
Priority: -- → P2
Whiteboard: #sbv4-m4 → #sbv4-m5
Assignee: nobody → hchang
Status: NEW → ASSIGNED
Assignee: hchang → dlee
I suggest the name URLCLASSIFIER_FULL_MATCH for this probe (because it's similar to URLCLASSIFIER_PREFIX_MATCH from bug 1298257).
Attachment #8835240 - Flags: review?(francois)
Attachment #8835240 - Flags: review?(francois)
See Also: → 1338033
Summary: Add telemetry to measure completion match rate for v2 and v4 → Add telemetry to measure full match rate for v2 and v4
Comment on attachment 8835240 [details]
Bug 1311931 - Add telemetry to measure full match rate for v2 and v4.

https://reviewboard.mozilla.org/r/110934/#review112228

I think we want to merge in the results from `URLCLASSIFIER_PREFIX_MATCH` too so that we can check all of the possible cases in a single probe.

::: toolkit/components/telemetry/Histograms.json:4120
(Diff revision 1)
>      "kind": "enumerated",
>      "n_values": 4,
>      "bug_numbers": [1298257],
>      "description": "Classifier prefix matching result (0 = no match, 1 = match only V2, 2 = match only V4, 3 = match both V2 and V4)"
>    },
> +  "URLCLASSIFIER_COMPLETION_MATCH": {

I suggest `URLCLASSIFIER_FULL_MATCH` instead.

::: toolkit/components/telemetry/Histograms.json:4124
(Diff revision 1)
>    },
> +  "URLCLASSIFIER_COMPLETION_MATCH": {
> +    "alert_emails": ["safebrowsing-telemetry@mozilla.org"],
> +    "expires_in_version": "60",
> +    "kind": "enumerated",
> +    "n_values": 4,

Let's make this 10 to allow for future expansion.

::: toolkit/components/telemetry/Histograms.json:4126
(Diff revision 1)
> +    "alert_emails": ["safebrowsing-telemetry@mozilla.org"],
> +    "expires_in_version": "60",
> +    "kind": "enumerated",
> +    "n_values": 4,
> +    "bug_numbers": [1311931],
> +    "description": "Classifier completion matching result (0 = no match, 1 = match only V2, 2 = match only V4, 3 = match both V2 and V4)"

"The result of each URL lookup in the V2 and V4 versions of a given Safe Browsing table (0 = ...)."

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1168
(Diff revision 1)
>      if (tables.IndexOf(result.mTableName) == nsTArray<nsCString>::NoIndex) {
>        tables.AppendElement(result.mTableName);
>      }
>    }
>  
> +  if (isLookupV2 && isLookupV4) {

Good. We want to ensure that we only send the probe when both V2 and V4 results are available, just like we did in bug 1336909.
Attachment #8835240 - Flags: review?(francois) → review-
Attached patch WIP Patch (obsolete) — Splinter Review
Hi francois,
While working on this bug i also found an issue when we implemented prefix match telemetry. That is we didn't ignore match which was found in tracking protection tables, in that case we can only find match in v2 table but it is correct.
So in this patch I fixed it by ignoring match coming from "-track-" tables.
Attachment #8835818 - Attachment is obsolete: true
Comment on attachment 8835240 [details]
Bug 1311931 - Add telemetry to measure full match rate for v2 and v4.

https://reviewboard.mozilla.org/r/110934/#review115180

Let's restrict this to Google lists before we land it.

::: toolkit/components/telemetry/Histograms.json:4126
(Diff revision 3)
> +    "alert_emails": ["safebrowsing-telemetry@mozilla.org"],
> +    "expires_in_version": "60",
> +    "kind": "enumerated",
> +    "n_values": 16,
> +    "bug_numbers": [1311931],
> +    "description": "The result of each URL lookup (0 = no match, 1 = match only V2 prefix, 2 = match only V4 prefix, 3 = match both V2 and V4 prefixes, 4 = match both v2 and v4 completions, 5 = match only v2 completion, 6 = match only v4 completion, 7 = match v2 completion and v4 prefix, 8 = match v2 prefix and v4 completion, 9 = unexpected match result)"

We should specify that we're only looking at google tables:

    The result of each URL lookup against both google and google4 lists

and (nit) we could make this shorter (and use consistent lowercase/uppercase):

    (0 = no match, 1 = V2 prefix only, 2 = V4 prefix only, 3 = V2 and V4 prefixes, 4 = V2 and V4 completions, 5 = V2 completion only, 6 = V4 completion only, 7 = V2 completion and V4 prefix, 8 =  V2 prefix and V4 completion, 9 = unexpected result)

::: toolkit/components/url-classifier/Classifier.cpp:516
(Diff revision 3)
>  
> -        if (LookupCache::Cast<LookupCacheV4>(cache)) {
> -          matchingStatistics |= PrefixMatch::eMatchV4Prefix;
> -          result->mProtocolV2 = false;
> -        } else {
> +        // There are two cases we are going to ignore the result for telemetry:
> +        // 1. Not both v2 & v4 tables have data(shouldDoTelemetry will be false)
> +        // 2. When match was found in tracking protection table(in this case url
> +        //    can only be found in v2 table).
> +        if (!shouldDoTelemetry ||

We should actually ignore more lists than that. We have flashblocking lists too and Mozilla China has some Baidu lists too.

I would suggest instead that we ensure that the table name starts with `"goog"` (which will also match `"googpub"`).

Of course, you'll want to update the comment above too.

::: toolkit/components/url-classifier/Classifier.cpp:540
(Diff revision 3)
>        Telemetry::Accumulate(Telemetry::URLCLASSIFIER_PREFIX_MATCH,
>                              static_cast<uint8_t>(matchingStatistics));
>      }
>    }
>  
> +  // If we cannot find prefix in both v2 and v4 database, record the

"If we cannot find the prefix in neither the V2 nor the V4 database, ..."

::: toolkit/components/url-classifier/Classifier.cpp:541
(Diff revision 3)
>                              static_cast<uint8_t>(matchingStatistics));
>      }
>    }
>  
> +  // If we cannot find prefix in both v2 and v4 database, record the
> +  // telemetry here because we won't reach nsUrlClassifierLookupCallback:::HadnelResult.

typo: `HandleResult`
Attachment #8835240 - Flags: review?(francois) → review-
(In reply to François Marier [:francois] from comment #8)
> 
> ::: toolkit/components/url-classifier/Classifier.cpp:516
> (Diff revision 3)
> >  
> > -        if (LookupCache::Cast<LookupCacheV4>(cache)) {
> > -          matchingStatistics |= PrefixMatch::eMatchV4Prefix;
> > -          result->mProtocolV2 = false;
> > -        } else {
> > +        // There are two cases we are going to ignore the result for telemetry:
> > +        // 1. Not both v2 & v4 tables have data(shouldDoTelemetry will be false)
> > +        // 2. When match was found in tracking protection table(in this case url
> > +        //    can only be found in v2 table).
> > +        if (!shouldDoTelemetry ||
> 
> We should actually ignore more lists than that. We have flashblocking lists
> too and Mozilla China has some Baidu lists too.
> 
> I would suggest instead that we ensure that the table name starts with
> `"goog"` (which will also match `"googpub"`).
> 

great idea, thanks!
Comment on attachment 8835240 [details]
Bug 1311931 - Add telemetry to measure full match rate for v2 and v4.

https://reviewboard.mozilla.org/r/110934/#review115450

r+ once you address these last two comments

::: toolkit/components/url-classifier/Classifier.cpp:519
(Diff revisions 3 - 4)
> -        // 1. Not both v2 & v4 tables have data(shouldDoTelemetry will be false)
> -        // 2. When match was found in tracking protection table(in this case url
> +        // 1. shouldDoTelemetry == false(when either v2 or v4 table is empty)
> +        // 2. When match was found in the table which is not provided by google.
> -        //    can only be found in v2 table).
>          if (!shouldDoTelemetry ||
> -            result->mTableName.Find("-track-") != -1) {
> +            !StringBeginsWith(result->mTableName, NS_LITERAL_CSTRING("goog"))) {
>            continue;

Should we set `result->mMatchResult = eTelemetryDisabled` here?

::: toolkit/components/url-classifier/Classifier.cpp:539
(Diff revisions 3 - 4)
>        Telemetry::Accumulate(Telemetry::URLCLASSIFIER_PREFIX_MATCH,
>                              static_cast<uint8_t>(matchingStatistics));
>      }
>    }
>  
> -  // If we cannot find prefix in both v2 and v4 database, record the
> +  // If we cannot find the prefix in nither the v2 nor the v4 database, record the

typo: neither
Attachment #8835240 - Flags: review?(francois) → review+
(In reply to François Marier [:francois] from comment #11)
> ::: toolkit/components/url-classifier/Classifier.cpp:519
> >          if (!shouldDoTelemetry ||
> > -            result->mTableName.Find("-track-") != -1) {
> > +            !StringBeginsWith(result->mTableName, NS_LITERAL_CSTRING("goog"))) {
> >            continue;
> 
> Should we set `result->mMatchResult = eTelemetryDisabled` here?
> 

Since default value of |mMatchResult| is eTelemetryDisabled, so I didn't set it here.
Do you prefer adding it here to make it more clear ?
(In reply to Dimi Lee[:dimi][:dlee] from comment #12)
> Since default value of |mMatchResult| is eTelemetryDisabled, so I didn't set
> it here.
> Do you prefer adding it here to make it more clear ?

As long as it's already set (via the default value) that's fine.
Oh I found I added telemetry inside the for loop which is wrong.
Commit again to fix this issue and also include code refactor which I think it will be easier to understand than previous patch.

Hi Francois,
Do you mind taking a look at it again ? Thanks!
Flags: needinfo?(francois)
(In reply to Dimi Lee[:dimi][:dlee] from comment #16)
> Oh I found I added telemetry inside the for loop which is wrong.
> Commit again to fix this issue and also include code refactor which I think
> it will be easier to understand than previous patch.

Looks good!
Flags: needinfo?(francois)
Keywords: checkin-needed
Please rebase.
 
hg error in cmd: hg rebase -s 99581ca8ae3a -d e4f1979be0d7: rebasing 377474:99581ca8ae3a "Bug 1311931 - Add telemetry to measure full match rate for v2 and v4. r=francois" (tip) merging toolkit/components/telemetry/Histograms.json merging toolkit/components/url-classifier/Classifier.cpp merging toolkit/components/url-classifier/nsUrlClassifierDBService.cpp warning: conflicts while merging toolkit/components/telemetry/Histograms.json! (edit, then use 'hg resolve --mark') warning: conflicts while merging toolkit/components/url-classifier/nsUrlClassifierDBService.cpp! (edit, then use 'hg resolve --mark') unresolved conflicts (see hg resolve, then hg rebase --continue)
Flags: needinfo?(dlee)
Keywords: checkin-needed
Hi Iris,
I have rebased the patch, could you help check again? thanks!
Flags: needinfo?(dlee)
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/2aa2e92d394f
Add telemetry to measure full match rate for v2 and v4. r=francois
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/2aa2e92d394f
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
You need to log in before you can comment on or make changes to this bug.