Bug 1364611 (Open) · Opened 7 years ago · Updated 2 years ago

Add telemetry to track complete matches per page load

Categories

(Toolkit :: Safe Browsing, enhancement, P3)



People

(Reporter: francois, Unassigned)

Details

(Whiteboard: #sbv4-m10)

Attachments

(1 file, 2 obsolete files)

To make it easy to compare how our implementation performs relative to Chrome's (and potentially discover bugs/regressions), we should add telemetry probes that track:

- percentage of page loads that match a prefix
- percentage of page loads that match a full hash

Both of these would be keyed on the V4 lists:

- goog-phish-proto
- goog-malware-proto
- goog-unwanted-proto
Assignee: nobody → tnguyen
Thomas, I'm going to hold on to this bug while I confirm the exact nature of the numbers that Chrome collects, to make sure I got the description right.
Assignee: tnguyen → francois
I did not look deeply into the Chrome code, but I doubt that Chrome blocks a page when an unsafe content resource loads.
In current Firefox, we simply ignore content loads (img, css, ...) and even nested iframes (we show the warning UI in the iframe but still allow the top-level page to load).
That would be a pretty big difference, which may have an impact on the number of "page load matches". If Chrome collects all URLs (content + top-level), I think we already have that telemetry probe.
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #2)
> I did not look deeply into the Chrome code, but I doubt that Chrome blocks a
> page when an unsafe content resource loads.

I'm not sure what they do for images, but they block the whole page in the case of bad iframes: https://bugzilla.mozilla.org/show_bug.cgi?id=1195242#c21
I just ran a small test, and yes, they block img and stylesheet links to unsafe URLs, so I think any unsafe load will cause the top-level page to be blocked.
Whiteboard: #sbv4-m7 → #sbv4-m8
Summary: Add telemetry to track prefix and complete matches per page load → Add telemetry to track complete matches per page load
I discussed options with the Chrome team and the following would match fairly closely the data they are able to track.

A single enumerated histogram keyed by threat type (phish, malware, unwanted):

0 = no match on the page
1 = warning page shown due to a match on the top-level URL
2 = warning page shown due to a match on a sub-resource (bug 1195242)

We should record this after every page load. Note that determining when a page has finished loading is not trivial.
Assignee: francois → nobody
(In reply to François Marier [:francois] from comment #5)
> I discussed options with the Chrome team and the following would match
> fairly closely the data they are able to track.
> 
> A single enumerated histogram keyed by threat type (phish, malware,
> unwanted):
> 
> 0 = no match on the page
> 1 = warning page shown due to a match on the top-level URL
> 2 = warning page shown due to a match on a sub-resource (bug 1195242)
> 
> We should record this after every page load. Note that determining when a
> page has finished loading is not trivial.

That may not cover the case where multiple lists are hit or multiple sub-resource loads match a list (for example, a page load that ends up with 3 blocked sub-resource loads of different types). So we may have to use bit flags to record the telemetry, something like:
    enum PageLoadMatchFlag {
      NO_MATCH = 0x00,
      TOP_LEVEL_MATCH = 0x01,
      SUB_RESOURCE_MATCH = 0x02,
      MALWARE_LIST_MATCH = 0x04,
      PHISHING_LIST_MATCH = 0x08,
      UNWANTED_LIST_MATCH = 0x10,
    };
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #6)
> That may not cover the case where multiple lists are hit or multiple
> sub-resource loads match a list (for example, a page load that ends up with
> 3 blocked sub-resource loads of different types).

I will clarify these points with Google. Hopefully we can get to something that's simple to measure and a useful comparison.
Whiteboard: #sbv4-m8 → #sbv4-m9
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #6)
> That may not cover the case where multiple lists are hit or multiple
> sub-resource loads match a list (for example, a page load that ends up with
> 3 blocked sub-resource loads of different types).

We will be counting the percentage of top-level page views that trigger at least one Safe Browsing warning.

If a URL matches more than one list type, we can simply take the "most severe" one. The order is defined here: https://searchfox.org/mozilla-central/rev/48ea452803907f2575d81021e8678634e8067fc2/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#67-87

e.g. if a single top-level page has 3 subresources:

- 1 that matches the malware list
- 1 unwanted, and
- 1 phishing

then this will count only as 1 malware page view.
Assignee: nobody → ettseng
Status: NEW → ASSIGNED
Attached patch bug-1364611.patch (obsolete) — Splinter Review
A WIP patch.
It only adds a new telemetry entry to Histograms.json.

Francois,
According to comment 8, we don't need to key on the V4 lists as you said in comment 0, do we?
Could you take a look to see if the telemetry format is sane?
Flags: needinfo?(francois)
Whiteboard: #sbv4-m9 → #sbv4-m10
Comment on attachment 8907710 [details] [diff] [review]
bug-1364611.patch

Review of attachment 8907710 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Ethan Tseng [:ethan] from comment #9)
> According to comment 8, we don't need to key on the V4 lists as you said in
> comment 0, do we?
> Could you take a look to see if the telemetry format is sane?

Looks good.

We should check with the telemetry folks to see whether this is the right approach, or whether a ping on each page load would cost too much in storage/processing.

::: toolkit/components/telemetry/Histograms.json
@@ +5019,5 @@
> +    "alert_emails": ["francois@mozilla.com", "safebrowsing-telemetry@mozilla.org"],
> +    "expires_in_version": "never",
> +    "releaseChannelCollection": "opt-out",
> +    "kind": "enumerated",
> +    "n_values": 16,

We probably only need 8 here.

@@ +5021,5 @@
> +    "releaseChannelCollection": "opt-out",
> +    "kind": "enumerated",
> +    "n_values": 16,
> +    "bug_numbers": [1364611],
> +    "description": "Whether Safe Browsing detected bad URLs in a page load (0 = none, 1 = malware, 2 = phishing, 3 = unwanted, 4 = others)."

I would rephrase as "Whether or not a Safe Browsing warning page was shown, either because of the top-level URL or the URL of a sub-resource, on a page load (0=none, 1=malware, 2=phishing, 3=unwanted, 4=other)"
Refreshed the patch based on Francois' comments.
Attachment #8907710 - Attachment is obsolete: true
Fixed typos in the last patch.
Attachment #8913968 - Attachment is obsolete: true
Sorry, this bug has been pending on me for a long time.
It turned out I am the best person to work on this.  Hand it over to Dimi.
Assignee: ettseng → dlee
(In reply to Ethan Tseng [:ethan] - 57 Regression Engineering Support from comment #13)
> It turned out I am the best person to work on this.  Hand it over to Dimi.
               I am "not"
No longer blocks: SafeBrowsingv4Telemetry
Assignee: dimidiana → nobody
Status: ASSIGNED → NEW
Priority: P2 → P3
Severity: normal → S3
