Open
Bug 1364611
Opened 7 years ago
Updated 2 years ago
Add telemetry to track complete matches per page load
Categories
(Toolkit :: Safe Browsing, enhancement, P3)
Toolkit
Safe Browsing
Tracking
NEW
People
(Reporter: francois, Unassigned)
Details
(Whiteboard: #sbv4-m10)
Attachments
(1 file, 2 obsolete files)
In order to easily compare how our implementation performs against Chrome's (and potentially discover bugs/regressions), we should add telemetry probes that keep track of:

- percentage of page loads that match a prefix
- percentage of page loads that match a full hash

Both of these would be keyed on the V4 lists:

- goog-phish-proto
- goog-malware-proto
- goog-unwanted-proto
Updated•7 years ago
Assignee: nobody → tnguyen
Reporter
Comment 1•7 years ago
Thomas, I'm going to hold on to this bug while I confirm the exact nature of the numbers that Chrome collects, to make sure I got the description right.
Assignee: tnguyen → francois
Comment 2•7 years ago
I haven't looked deeply into the Chrome code, but I doubt that Chrome blocks a page when any unsafe content resource loads. In current Firefox, we simply ignore content loads (img, css, ...) and even nested iframes (we show the warning UI in the iframe but still allow the top-level page to load). That would be a pretty big difference, which may have an impact on the number of "page load matches". If Chrome collects all URLs (content + top-level), I think we already have that telemetry probe.
Reporter
Comment 3•7 years ago
(In reply to Thomas Nguyen [:tnguyen] ni plz from comment #2)
> I did not check deeply Chrome code but I doubt that Chrome will block a page
> if there's any unsafe content resource loading.

I'm not sure what they do for images, but they block the whole page in the case of bad iframes: https://bugzilla.mozilla.org/show_bug.cgi?id=1195242#c21
Comment 4•7 years ago
I just made a small test. Yep, they block img and stylesheet links to unsafe URLs, so I think any unsafe load will cause the top-level page to be blocked.
Reporter
Updated•7 years ago
Whiteboard: #sbv4-m7 → #sbv4-m8
Reporter
Updated•7 years ago
Summary: Add telemetry to track prefix and complete matches per page load → Add telemetry to track complete matches per page load
Reporter
Comment 5•7 years ago
I discussed options with the Chrome team and the following would match fairly closely the data they are able to track.

A single enumerated histogram keyed by threat type (phish, malware, unwanted):

0 = no match on the page
1 = warning page shown due to a match on the top-level URL
2 = warning page shown due to a match on a sub-resource (bug 1195242)

We should record this after every page load. Note that determining when a page has finished loading is not trivial.
Assignee: francois → nobody
Comment 6•7 years ago
(In reply to François Marier [:francois] from comment #5)
> I discussed options with the Chrome team and the following would match
> fairly closely the data they are able to track.
>
> A single enumerated histogram keyed by threat type (phish, malware,
> unwanted):
>
> 0 = no match on the page
> 1 = warning page shown due to a match on the top-level URL
> 2 = warning page shown due to a match on a sub-resource (bug 1195242)
>
> We should record this after every page load. Note that determining when a
> page has finished loading is not trivial.

That may not cover the case where multiple lists are hit, or where multiple sub-resource loads match a list (for example, a page load that ends up with 3 blocked sub-resource loads of different types). So we may have to use bit flags to record telemetry, something like:

enum PageLoadMatchFlag {
  NO_MATCH            = 0x00,
  TOP_LEVEL_MATCH     = 0x01,
  SUB_RESOURCE_MATCH  = 0x02,
  MALWARE_LIST_MATCH  = 0x04,
  PHISHING_LIST_MATCH = 0x08,
  UNWANTED_LIST_MATCH = 0x10,
};
Reporter | ||
Comment 7•7 years ago
(In reply to Thomas Nguyen [:tnguyen] ni plz from comment #6)
> That may not cover the case multiple lists hit and multiple sub-resource
> loadings encounter the list (for example a page loading ends up with 3
> blocked sub-resource loadings with different types).

I will clarify these points with Google. Hopefully we can get to something that's simple to measure and a useful comparison.
Updated•7 years ago
Whiteboard: #sbv4-m8 → #sbv4-m9
Reporter
Comment 8•7 years ago
(In reply to Thomas Nguyen [:tnguyen] ni plz from comment #6)
> That may not cover the case multiple lists hit and multiple sub-resource
> loadings encounter the list (for example a page loading ends up with 3
> blocked sub-resource loadings with different types).

We will be counting the percentage of top-level page views that trigger at least one Safe Browsing warning.

If a URL matches more than one list type, we can simply take the "most severe" one. The order is defined here:

https://searchfox.org/mozilla-central/rev/48ea452803907f2575d81021e8678634e8067fc2/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#67-87

e.g. if a single top-level page has 3 subresources:
- 1 that matches the malware list
- 1 unwanted, and
- 1 phishing

then this will count only as 1 malware page view.
Reporter
Updated•7 years ago
Assignee: nobody → ettseng
Status: NEW → ASSIGNED
Comment 9•7 years ago
A WIP patch. It only adds a new telemetry entry in Histograms.json. Francois, according to comment 8, we don't need to key on the V4 lists as you said in comment 0, do we? Could you take a look to see if the telemetry format is sane?
Flags: needinfo?(francois)
Reporter
Updated•7 years ago
Flags: needinfo?(francois)
Whiteboard: #sbv4-m9 → #sbv4-m10
Reporter
Comment 10•7 years ago
Comment on attachment 8907710 [details] [diff] [review]
bug-1364611.patch

Review of attachment 8907710 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Ethan Tseng [:ethan] from comment #9)
> According to comment 8, we don't need to key on the V4 lists as you said in
> comment 0, do we?
> Could you take a look to see if the telemetry format is sane?

Looks good. We should check with some telemetry folks to see if this is the correct approach, or whether a ping on each page load will cost too much in storage/processing.

::: toolkit/components/telemetry/Histograms.json
@@ +5019,5 @@
> +    "alert_emails": ["francois@mozilla.com", "safebrowsing-telemetry@mozilla.org"],
> +    "expires_in_version": "never",
> +    "releaseChannelCollection": "opt-out",
> +    "kind": "enumerated",
> +    "n_values": 16,

We probably only need 8 here.

@@ +5021,5 @@
> +    "releaseChannelCollection": "opt-out",
> +    "kind": "enumerated",
> +    "n_values": 16,
> +    "bug_numbers": [1364611],
> +    "description": "Whether Safe Browsing detected bad URLs in a page load (0 = none, 1 = malware, 2 = phishing, 3 = unwanted, 4 = others)."

I would rephrase as "Whether or not a Safe Browsing warning page was shown, either because of the top-level URL or the URL of a sub-resource, on a page load (0=none, 1=malware, 2=phishing, 3=unwanted, 4=other)".
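With both review comments applied, the revised entry might look like this. This is only a sketch: the histogram name is hypothetical, since the attached patch is not visible here, and the field values simply restate what the review quotes and suggests:

```json
"URLCLASSIFIER_BAD_URL_PAGE_LOAD": {
  "alert_emails": ["francois@mozilla.com", "safebrowsing-telemetry@mozilla.org"],
  "expires_in_version": "never",
  "releaseChannelCollection": "opt-out",
  "kind": "enumerated",
  "n_values": 8,
  "bug_numbers": [1364611],
  "description": "Whether or not a Safe Browsing warning page was shown, either because of the top-level URL or the URL of a sub-resource, on a page load (0=none, 1=malware, 2=phishing, 3=unwanted, 4=other)"
}
```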
Comment 11•7 years ago
Refresh the patch based on Francois' comment.
Attachment #8907710 - Attachment is obsolete: true
Comment 12•7 years ago
Fixed typos in the last patch.
Attachment #8913968 - Attachment is obsolete: true
Comment 13•7 years ago
Sorry, this bug has been pending on me for a long time. It turned out I am the best person to work on this. Hand it over to Dimi.
Assignee: ettseng → dlee
Comment 14•7 years ago
(In reply to Ethan Tseng [:ethan] - 57 Regression Engineering Support from comment #13)
> It turned out I am the best person to work on this. Hand it over to Dimi.

I am "not"
Reporter
Updated•6 years ago
No longer blocks: SafeBrowsingv4Telemetry
Reporter
Updated•6 years ago
Assignee: dimidiana → nobody
Status: ASSIGNED → NEW
Updated•5 years ago
Priority: P2 → P3
Updated•2 years ago
Severity: normal → S3