Add telemetry probe to see the distribution of prefix lengths in matches

RESOLVED FIXED in Firefox 55

Status

()

P3
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: hchang, Assigned: tnguyen)

Tracking

unspecified
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 fixed)

Details

(Whiteboard: #sbv4-m7)

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
In order to have an idea that if the long prefix will be a privacy concern, we'd like to add telemetries to know the frequency that the long prefix being matched.
(Reporter)

Updated

2 years ago
Flags: needinfo?(dveditz)
Not sure what the ni? is asking of me.
Flags: needinfo?(dveditz)
(Reporter)

Comment 2

2 years ago
(In reply to Daniel Veditz [:dveditz] from comment #1)
> Not sure what the ni? is asking of me.

It's because I was asked by you in Hawaii if sending 32-byte prefix to google server
leaks privacy info. I brought this up in our safebrowsing meeting and we decided to
have a telemetry to know how often the 32-byte prefix occurs. It it only happens once
in a while, it might not be an serious privacy issue. The ni? is to loop you in the
discussion if you are interested in :)
Blocks: 1167038
Priority: -- → P2
Summary: Add telemetries to know how often the long prefix is matched → Add telemetry probe to see the distribution of prefix lengths in matches
No longer blocks: 1167038
(Assignee)

Updated

2 years ago
Assignee: nobody → tnguyen
Blocks: 1312939
No longer blocks: 1323953
Status: NEW → ASSIGNED
Priority: P2 → P3
Whiteboard: #sbv4-m7
(Assignee)

Comment 4

2 years ago
I think in almost the case we get 4 bytes prefixes, the long prefixes are corner cases and the high value of histogram is 200 should be enough.
Hi Francois, could you please take a look?
(Assignee)

Updated

2 years ago
Attachment #8865364 - Flags: review?(francois)

Comment 5

2 years ago
mozreview-review
Comment on attachment 8865364 [details]
Bug 1322523 - Add telemetry to capture the length of long Safe Browsing V4 prefixes

https://reviewboard.mozilla.org/r/137020/#review140872

The main thing we should collect here is the actual length of the long prefixes so that we can determine how often Google uses 5-byte prefixes, 6-byte, etc.

With respect to the table names, it would be annoying to whitelist the tablenames we expect (which we would have to do in order to avoid revealing custom tables that people may have added). So let's just drop the table names and have a single counter for all V4 tables. So no key needed.

::: commit-message-1fda5:1
(Diff revision 1)
> + Bug 1322523 - Add telemetry probe to see how many long prefixes are stored in variable length prefix set

"Add telemetry to capture the length of long Safe Browsing V4 prefixes."

::: toolkit/components/telemetry/Histograms.json:4244
(Diff revision 1)
>      "description": "Whether or not a variable-length prefix set loaded from disk is corrupted (true = file corrupted)."
>    },
> +  "URLCLASSIFIER_VLPS_LONG_PREFIXES": {
> +    "alert_emails": ["safebrowsing-telemetry@mozilla.org"],
> +    "expires_in_version": "61",
> +    "keyed": true,

not keyed

::: toolkit/components/telemetry/Histograms.json:4245
(Diff revision 1)
>    },
> +  "URLCLASSIFIER_VLPS_LONG_PREFIXES": {
> +    "alert_emails": ["safebrowsing-telemetry@mozilla.org"],
> +    "expires_in_version": "61",
> +    "keyed": true,
> +    "kind": "linear",

enumerated

::: toolkit/components/telemetry/Histograms.json:4247
(Diff revision 1)
> +    "alert_emails": ["safebrowsing-telemetry@mozilla.org"],
> +    "expires_in_version": "61",
> +    "keyed": true,
> +    "kind": "linear",
> +    "high": 200,
> +    "n_buckets": 20,

Let's use 32.

We won't use 0 to 4, but it will make the results a lot more readable since it will be the actual number of bytes in the prefix.

::: toolkit/components/telemetry/Histograms.json:4249
(Diff revision 1)
> +    "keyed": true,
> +    "kind": "linear",
> +    "high": 200,
> +    "n_buckets": 20,
> +    "bug_numbers": [1322523],
> +    "description": "How many long prefixes (> 4 bytes) are stored in variable length prefix set. Keyed by table name."

"Length of each long prefix (> 4 bytes) received in a Safe Browsing V4 table during an update."
Attachment #8865364 - Flags: review?(francois) → review-
(Assignee)

Comment 6

2 years ago
(In reply to François Marier [:francois] from comment #5)
> Comment on attachment 8865364 [details]
> Bug 1322523 - Add telemetry probe to see how many long prefixes are stored
> in variable length prefix set
> 
> https://reviewboard.mozilla.org/r/137020/#review140872
> 
> The main thing we should collect here is the actual length of the long
> prefixes so that we can determine how often Google uses 5-byte prefixes,
> 6-byte, etc.
> 
> With respect to the table names, it would be annoying to whitelist the
> tablenames we expect (which we would have to do in order to avoid revealing
> custom tables that people may have added). So let's just drop the table
> names and have a single counter for all V4 tables. So no key needed.
Opps, you are right, I updated the patch based on your review.
Thanks Francois.
Comment hidden (mozreview-request)

Comment 8

2 years ago
mozreview-review
Comment on attachment 8865364 [details]
Bug 1322523 - Add telemetry to capture the length of long Safe Browsing V4 prefixes

https://reviewboard.mozilla.org/r/137020/#review141360

::: toolkit/components/url-classifier/HashStore.cpp:178
(Diff revision 2)
>        uint8_t* c = (uint8_t*)&p[i];
>        LOG(("%.2X%.2X%.2X%.2X", c[0], c[1], c[2], c[3]));
>      }
>  
>      LOG(("---- %" PRIuSIZE " fixed-length prefixes in total.", aPrefixes.size() / aSize));
> +  } else if (aSize > 4 && aSize <= COMPLETE_SIZE) {

The check that `aSize <= COMPLETE_SIZE` may as well be a `NS_ENSURE_TRUE_VOID(aSize >= 4 && aSize <= COMPLETE_SIZE)` at the top of the function.

So this would become:

```
if (aSize > 4) {
  telemetry stuff
} else if (LOG_ENABLED())
  dump stuff
}
```

::: toolkit/components/url-classifier/HashStore.cpp:179
(Diff revision 2)
>        LOG(("%.2X%.2X%.2X%.2X", c[0], c[1], c[2], c[3]));
>      }
>  
>      LOG(("---- %" PRIuSIZE " fixed-length prefixes in total.", aPrefixes.size() / aSize));
> +  } else if (aSize > 4 && aSize <= COMPLETE_SIZE) {
> +    for (int i = 0; i < numOfPrefixes; i++) {

Can you check with folks on `#telemetry` to see if calling `Accumulate()` in a loop is fine?

I can't think of an alternative, but maybe they know of a better way to do this (or they will say it's fine).
Attachment #8865364 - Flags: review?(francois) → review-
(Assignee)

Comment 9

2 years ago
Thanks you francois,
I had a discussion with telemetry guy and the only thing he is concerned is performance impact, because one Accumulate() could call takes a lock (in the case numOfPrefixes is too high). Personally I think numOfPrefixes is not expected to be high (as it is corner case from server side), but we are not quiet sure about that (one day server could send an update contains huge number of > 4 bytes prefix).
So, I would like to change back to linear or exponential type (keyed by size)
(Assignee)

Comment 10

2 years ago
hmm, changing to keyed is even more expensive. 
If we don't care about performance impact here (we are adding a probe in update process which is running in background), we could use the same loop, otherwise we should add a limitation min(numOfPrefixes, 10). That is what gfritzsche suggested
See Also: → bug 1364043
Thanks for investigating this Thomas!

(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #10)
> If we don't care about performance impact here (we are adding a probe in
> update process which is running in background), we could use the same loop,
> otherwise we should add a limitation min(numOfPrefixes, 10). That is what
> gfritzsche suggested

I like this limit idea a lot.

I'd suggest two things:

- put a limit of 20 prefixes for the loop
- add an #ifdef NIGHTLY_BUILD around the loop

That way we'll capture the data but we're guaranteed not to have a perf impact on release.

Comment 12

2 years ago
mozreview-review
Comment on attachment 8865364 [details]
Bug 1322523 - Add telemetry to capture the length of long Safe Browsing V4 prefixes

https://reviewboard.mozilla.org/r/137020/#review141844

::: toolkit/components/telemetry/Histograms.json:4247
(Diff revision 2)
> +    "alert_emails": ["safebrowsing-telemetry@mozilla.org"],
> +    "expires_in_version": "61",
> +    "kind": "enumerated",
> +    "n_values": 32,
> +    "bug_numbers": [1322523],
> +    "description": "Length of each long prefix (> 4 bytes) received in a Safe Browsing V4 table during an update."

"Length of the first 20 long prefixes (> 4 bytes) received in a Safe Browsing V4 table during an update."
Comment hidden (mozreview-request)

Comment 14

2 years ago
mozreview-review
Comment on attachment 8865364 [details]
Bug 1322523 - Add telemetry to capture the length of long Safe Browsing V4 prefixes

https://reviewboard.mozilla.org/r/137020/#review142228

Thanks Thomas!

datareview+ and r+
Attachment #8865364 - Flags: review?(francois) → review+
(Assignee)

Comment 16

2 years ago
Thanks Francois and Georg

Comment 17

2 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/02fddd4b2fe1
Add telemetry to capture the length of long Safe Browsing V4 prefixes r=francois
Keywords: checkin-needed
Backed out for bustage in telemtry:

https://hg.mozilla.org/integration/autoland/rev/eb52bae91633e62a94968238f9f4d41d8ed4be06

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=02fddd4b2fe173c06d68802365a1299c4f6cc412&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=98903349&repo=autoland

10:19:12     INFO -  /builds/slave/autoland-m64-00000000000000000/build/src/obj-firefox/_virtualenv/bin/python -m mozbuild.action.file_generate /builds/slave/autoland-m64-00000000000000000/build/src/toolkit/components/telemetry/gen-scalar-data.py main TelemetryScalarData.h .deps/TelemetryScalarData.h.pp /builds/slave/autoland-m64-00000000000000000/build/src/toolkit/components/telemetry/Scalars.yaml
10:19:12     INFO -  TelemetryScalarEnums.h
10:19:12     INFO -  /builds/slave/autoland-m64-00000000000000000/build/src/obj-firefox/_virtualenv/bin/python -m mozbuild.action.file_generate /builds/slave/autoland-m64-00000000000000000/build/src/toolkit/components/telemetry/gen-scalar-enum.py main TelemetryScalarEnums.h .deps/TelemetryScalarEnums.h.pp /builds/slave/autoland-m64-00000000000000000/build/src/toolkit/components/telemetry/Scalars.yaml
10:19:12     INFO -  ../../../config/nsinstall -L /builds/slave/autoland-m64-00000000000000000/build/src/obj-firefox/toolkit/components/telemetry -m 644 'TelemetryEventEnums.h' '../../../dist/include/mozilla'
10:19:12     INFO -  Error processing histograms:
10:19:12     INFO -  Histogram "URLCLASSIFIER_VLPS_LONG_PREFIXES" must have a "record_in_processes" field.
10:19:12     INFO -  make[5]: *** [TelemetryHistogramEnums.h] Error 1
10:19:12     INFO -  make[5]: *** Deleting file `TelemetryHistogramEnums.h'
10:19:12     INFO -  make[5]: *** Waiting for unfinished jobs....
10:19:12     INFO -  Error processing histograms:
10:19:12     INFO -  Histogram "URLCLASSIFIER_VLPS_LONG_PREFIXES" must have a "record_in_processes" field.
10:19:13     INFO -  make[5]: *** [TelemetryHistogramData.inc] Error 1
10:19:13     INFO -  make[5]: *** Deleting file `TelemetryHistogramData.inc'
10:19:13     INFO -  ../../config/nsinstall -L /builds/slave/autoland-m64-00000000000000000/build/src/obj-firefox/accessible/xpcom -m 644 'xpcAccEvents.h' '../../dist/include'
10:19:13     INFO -  make[4]: *** [toolkit/components/telemetry/export] Error 2
Flags: needinfo?(tnguyen)
Comment hidden (mozreview-request)
(Assignee)

Comment 20

2 years ago
Thanks you for backing it out. There's a new record_in_processes required field has been added 2 days ago, I've just rebased and it should work now
Flags: needinfo?(tnguyen)
(Assignee)

Updated

2 years ago
Keywords: checkin-needed

Comment 22

2 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/c9060b0b93ce
Add telemetry to capture the length of long Safe Browsing V4 prefixes r=francois
Keywords: checkin-needed

Comment 23

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/c9060b0b93ce
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Converting bug 1364043 into a dependency so that this bug is notified when the status changes. We can then file a follow-up bug to remove the loop and the limit.
Depends on: 1364043
See Also: bug 1364043
You need to log in before you can comment on or make changes to this bug.