Closed Bug 1322523 Opened 8 years ago Closed 7 years ago

Add telemetry probe to see the distribution of prefix lengths in matches

Categories

(Toolkit :: Safe Browsing, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: hchang, Assigned: tnguyen)

References

Details

(Whiteboard: #sbv4-m7)

Attachments

(1 file)

In order to have an idea that if the long prefix will be a privacy concern, we'd like to add telemetries to know the frequency that the long prefix being matched.
Flags: needinfo?(dveditz)
Not sure what the ni? is asking of me.
Flags: needinfo?(dveditz)
(In reply to Daniel Veditz [:dveditz] from comment #1) > Not sure what the ni? is asking of me. It's because I was asked by you in Hawaii if sending 32-byte prefix to google server leaks privacy info. I brought this up in our safebrowsing meeting and we decided to have a telemetry to know how often the 32-byte prefix occurs. It it only happens once in a while, it might not be an serious privacy issue. The ni? is to loop you in the discussion if you are interested in :)
Priority: -- → P2
Summary: Add telemetries to know how often the long prefix is matched → Add telemetry probe to see the distribution of prefix lengths in matches
No longer blocks: safebrowsingv4
Assignee: nobody → tnguyen
Blocks: SafeBrowsingv4Telemetry
No longer blocks: 1323953
Status: NEW → ASSIGNED
Priority: P2 → P3
Whiteboard: #sbv4-m7
I think in almost the case we get 4 bytes prefixes, the long prefixes are corner cases and the high value of histogram is 200 should be enough. Hi Francois, could you please take a look?
Attachment #8865364 - Flags: review?(francois)
Comment on attachment 8865364 [details] Bug 1322523 - Add telemetry to capture the length of long Safe Browsing V4 prefixes https://reviewboard.mozilla.org/r/137020/#review140872 The main thing we should collect here is the actual length of the long prefixes so that we can determine how often Google uses 5-byte prefixes, 6-byte, etc. With respect to the table names, it would be annoying to whitelist the tablenames we expect (which we would have to do in order to avoid revealing custom tables that people may have added). So let's just drop the table names and have a single counter for all V4 tables. So no key needed. ::: commit-message-1fda5:1 (Diff revision 1) > + Bug 1322523 - Add telemetry probe to see how many long prefixes are stored in variable length prefix set "Add telemetry to capture the length of long Safe Browsing V4 prefixes." ::: toolkit/components/telemetry/Histograms.json:4244 (Diff revision 1) > "description": "Whether or not a variable-length prefix set loaded from disk is corrupted (true = file corrupted)." > }, > + "URLCLASSIFIER_VLPS_LONG_PREFIXES": { > + "alert_emails": ["safebrowsing-telemetry@mozilla.org"], > + "expires_in_version": "61", > + "keyed": true, not keyed ::: toolkit/components/telemetry/Histograms.json:4245 (Diff revision 1) > }, > + "URLCLASSIFIER_VLPS_LONG_PREFIXES": { > + "alert_emails": ["safebrowsing-telemetry@mozilla.org"], > + "expires_in_version": "61", > + "keyed": true, > + "kind": "linear", enumerated ::: toolkit/components/telemetry/Histograms.json:4247 (Diff revision 1) > + "alert_emails": ["safebrowsing-telemetry@mozilla.org"], > + "expires_in_version": "61", > + "keyed": true, > + "kind": "linear", > + "high": 200, > + "n_buckets": 20, Let's use 32. We won't use 0 to 4, but it will make the results a lot more readable since it will be the actual number of bytes in the prefix. ::: toolkit/components/telemetry/Histograms.json:4249 (Diff revision 1) > + "keyed": true, > + "kind": "linear", > + "high": 200, > + "n_buckets": 20, > + "bug_numbers": [1322523], > + "description": "How many long prefixes (> 4 bytes) are stored in variable length prefix set. Keyed by table name." "Length of each long prefix (> 4 bytes) received in a Safe Browsing V4 table during an update."
Attachment #8865364 - Flags: review?(francois) → review-
(In reply to François Marier [:francois] from comment #5) > Comment on attachment 8865364 [details] > Bug 1322523 - Add telemetry probe to see how many long prefixes are stored > in variable length prefix set > > https://reviewboard.mozilla.org/r/137020/#review140872 > > The main thing we should collect here is the actual length of the long > prefixes so that we can determine how often Google uses 5-byte prefixes, > 6-byte, etc. > > With respect to the table names, it would be annoying to whitelist the > tablenames we expect (which we would have to do in order to avoid revealing > custom tables that people may have added). So let's just drop the table > names and have a single counter for all V4 tables. So no key needed. Opps, you are right, I updated the patch based on your review. Thanks Francois.
Comment on attachment 8865364 [details] Bug 1322523 - Add telemetry to capture the length of long Safe Browsing V4 prefixes https://reviewboard.mozilla.org/r/137020/#review141360 ::: toolkit/components/url-classifier/HashStore.cpp:178 (Diff revision 2) > uint8_t* c = (uint8_t*)&p[i]; > LOG(("%.2X%.2X%.2X%.2X", c[0], c[1], c[2], c[3])); > } > > LOG(("---- %" PRIuSIZE " fixed-length prefixes in total.", aPrefixes.size() / aSize)); > + } else if (aSize > 4 && aSize <= COMPLETE_SIZE) { The check that `aSize <= COMPLETE_SIZE` may as well be a `NS_ENSURE_TRUE_VOID(aSize >= 4 && aSize <= COMPLETE_SIZE)` at the top of the function. So this would become: ``` if (aSize > 4) { telemetry stuff } else if (LOG_ENABLED()) dump stuff } ``` ::: toolkit/components/url-classifier/HashStore.cpp:179 (Diff revision 2) > LOG(("%.2X%.2X%.2X%.2X", c[0], c[1], c[2], c[3])); > } > > LOG(("---- %" PRIuSIZE " fixed-length prefixes in total.", aPrefixes.size() / aSize)); > + } else if (aSize > 4 && aSize <= COMPLETE_SIZE) { > + for (int i = 0; i < numOfPrefixes; i++) { Can you check with folks on `#telemetry` to see if calling `Accumulate()` in a loop is fine? I can't think of an alternative, but maybe they know of a better way to do this (or they will say it's fine).
Attachment #8865364 - Flags: review?(francois) → review-
Thanks you francois, I had a discussion with telemetry guy and the only thing he is concerned is performance impact, because one Accumulate() could call takes a lock (in the case numOfPrefixes is too high). Personally I think numOfPrefixes is not expected to be high (as it is corner case from server side), but we are not quiet sure about that (one day server could send an update contains huge number of > 4 bytes prefix). So, I would like to change back to linear or exponential type (keyed by size)
hmm, changing to keyed is even more expensive. If we don't care about performance impact here (we are adding a probe in update process which is running in background), we could use the same loop, otherwise we should add a limitation min(numOfPrefixes, 10). That is what gfritzsche suggested
See Also: → 1364043
Thanks for investigating this Thomas! (In reply to Thomas Nguyen[:tnguyen] ni plz from comment #10) > If we don't care about performance impact here (we are adding a probe in > update process which is running in background), we could use the same loop, > otherwise we should add a limitation min(numOfPrefixes, 10). That is what > gfritzsche suggested I like this limit idea a lot. I'd suggest two things: - put a limit of 20 prefixes for the loop - add an #ifdef NIGHTLY_BUILD around the loop That way we'll capture the data but we're guaranteed not to have a perf impact on release.
Comment on attachment 8865364 [details] Bug 1322523 - Add telemetry to capture the length of long Safe Browsing V4 prefixes https://reviewboard.mozilla.org/r/137020/#review141844 ::: toolkit/components/telemetry/Histograms.json:4247 (Diff revision 2) > + "alert_emails": ["safebrowsing-telemetry@mozilla.org"], > + "expires_in_version": "61", > + "kind": "enumerated", > + "n_values": 32, > + "bug_numbers": [1322523], > + "description": "Length of each long prefix (> 4 bytes) received in a Safe Browsing V4 table during an update." "Length of the first 20 long prefixes (> 4 bytes) received in a Safe Browsing V4 table during an update."
Comment on attachment 8865364 [details] Bug 1322523 - Add telemetry to capture the length of long Safe Browsing V4 prefixes https://reviewboard.mozilla.org/r/137020/#review142228 Thanks Thomas! datareview+ and r+
Attachment #8865364 - Flags: review?(francois) → review+
Thanks Francois and Georg
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/autoland/rev/02fddd4b2fe1 Add telemetry to capture the length of long Safe Browsing V4 prefixes r=francois
Keywords: checkin-needed
Backed out for bustage in telemtry: https://hg.mozilla.org/integration/autoland/rev/eb52bae91633e62a94968238f9f4d41d8ed4be06 Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=02fddd4b2fe173c06d68802365a1299c4f6cc412&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=98903349&repo=autoland 10:19:12 INFO - /builds/slave/autoland-m64-00000000000000000/build/src/obj-firefox/_virtualenv/bin/python -m mozbuild.action.file_generate /builds/slave/autoland-m64-00000000000000000/build/src/toolkit/components/telemetry/gen-scalar-data.py main TelemetryScalarData.h .deps/TelemetryScalarData.h.pp /builds/slave/autoland-m64-00000000000000000/build/src/toolkit/components/telemetry/Scalars.yaml 10:19:12 INFO - TelemetryScalarEnums.h 10:19:12 INFO - /builds/slave/autoland-m64-00000000000000000/build/src/obj-firefox/_virtualenv/bin/python -m mozbuild.action.file_generate /builds/slave/autoland-m64-00000000000000000/build/src/toolkit/components/telemetry/gen-scalar-enum.py main TelemetryScalarEnums.h .deps/TelemetryScalarEnums.h.pp /builds/slave/autoland-m64-00000000000000000/build/src/toolkit/components/telemetry/Scalars.yaml 10:19:12 INFO - ../../../config/nsinstall -L /builds/slave/autoland-m64-00000000000000000/build/src/obj-firefox/toolkit/components/telemetry -m 644 'TelemetryEventEnums.h' '../../../dist/include/mozilla' 10:19:12 INFO - Error processing histograms: 10:19:12 INFO - Histogram "URLCLASSIFIER_VLPS_LONG_PREFIXES" must have a "record_in_processes" field. 10:19:12 INFO - make[5]: *** [TelemetryHistogramEnums.h] Error 1 10:19:12 INFO - make[5]: *** Deleting file `TelemetryHistogramEnums.h' 10:19:12 INFO - make[5]: *** Waiting for unfinished jobs.... 10:19:12 INFO - Error processing histograms: 10:19:12 INFO - Histogram "URLCLASSIFIER_VLPS_LONG_PREFIXES" must have a "record_in_processes" field. 10:19:13 INFO - make[5]: *** [TelemetryHistogramData.inc] Error 1 10:19:13 INFO - make[5]: *** Deleting file `TelemetryHistogramData.inc' 10:19:13 INFO - ../../config/nsinstall -L /builds/slave/autoland-m64-00000000000000000/build/src/obj-firefox/accessible/xpcom -m 644 'xpcAccEvents.h' '../../dist/include' 10:19:13 INFO - make[4]: *** [toolkit/components/telemetry/export] Error 2
Flags: needinfo?(tnguyen)
Thanks you for backing it out. There's a new record_in_processes required field has been added 2 days ago, I've just rebased and it should work now
Flags: needinfo?(tnguyen)
Keywords: checkin-needed
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/autoland/rev/c9060b0b93ce Add telemetry to capture the length of long Safe Browsing V4 prefixes r=francois
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Converting bug 1364043 into a dependency so that this bug is notified when the status changes. We can then file a follow-up bug to remove the loop and the limit.
Depends on: 1364043
See Also: 1364043
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: