Open Bug 1331138 Opened 7 years ago Updated 2 years ago

[meta] Add opt-in reporting of matched URLs to Google (ThreatHit API)

Categories

(Toolkit :: Safe Browsing, task, P3)

People

(Reporter: francois, Unassigned)

References

(Depends on 5 open bugs)

Details

(Keywords: meta)

Chrome allows users to opt into reporting the malware/phishing URLs (known to Safe Browsing) they encounter in the wild. This allows Google to make better decisions on which entries to send to clients first and to measure how effective their protection is. They can also make use of the referrer information to "go up the chain".

Google tells us that Firefox users have a different usage pattern when it comes to downloads and they believe it might be the case for browsing as well. If we let our users report this data, we could help them make the list better for Firefox users as well.

We should add this feature (disabled by default) and let users opt in by ticking a checkbox on the Safe Browsing interstitial pages. The UI would therefore be very similar to how we let users report information about the TLS errors they receive, but it would be a separate setting since it sends the data to Google instead of Mozilla.

List of information to send:

- unique browser identifier (randomized, stable for a week only)
- country/region information (granularity to be confirmed)
- full URL (without query string)
- hash that matched
- referrer chain

Also, we should stand up a proxy to hide our users' IP addresses.
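
Two items in the list above lend themselves to a short sketch: sending the URL without its query string, and an identifier that stays stable for a week and is then regenerated. The following is only an illustration in standalone C++, not code from the tree. `StripQueryString`, `ReportIdState`, `GetReportId` and `kOneWeekSec` are hypothetical names, dropping the fragment along with the query string is an assumption, and `std::random_device` stands in for whatever random source the real implementation would use.

```cpp
#include <cstdint>
#include <random>
#include <string>

// Hypothetical helper: keep the URL up to (not including) the first '?' or '#'.
// Dropping the fragment too is an assumption; the bug only mentions the query string.
std::string StripQueryString(const std::string& url) {
  // find_first_of returns npos when nothing matches, and substr(0, npos)
  // returns the whole string, so URLs without a query are left unchanged.
  return url.substr(0, url.find_first_of("?#"));
}

// Hypothetical state for the temporary browser identifier.
struct ReportIdState {
  std::uint64_t value;
  std::int64_t createdSec;  // creation time in seconds; negative means "never generated"
};

const std::int64_t kOneWeekSec = 7 * 24 * 60 * 60;

// Returns the current identifier, generating a fresh random one when none
// exists yet or when the existing one is at least a week old.
std::uint64_t GetReportId(ReportIdState& state, std::int64_t nowSec) {
  if (state.createdSec < 0 || nowSec - state.createdSec >= kOneWeekSec) {
    std::random_device rd;
    state.value = (static_cast<std::uint64_t>(rd()) << 32) | rd();
    state.createdSec = nowSec;
  }
  return state.value;
}
```

Passing the current time in as a parameter keeps the rotation logic easy to test; a real client would presumably persist the state in a pref or similar storage.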
I put the private documentation we have on the Intranet Safe Browsing page (see URL field above).

For the region information, we'll have to work with Google to change the API, but perhaps we could be sending a UN M.49 code (https://en.wikipedia.org/wiki/UN_M.49) since that could be as granular as we want.
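
To illustrate why M.49 could be "as granular as we want": its codes nest (country, sub-region, region, world), so a client could report a coarser area simply by walking up the hierarchy. The sketch below is an assumption-laden example in standalone C++. `kM49Parent` and `CoarsenM49` are hypothetical names, and the table is a tiny hand-picked excerpt (France and Canada with their enclosing areas) that should be checked against the official M.49 listing before any real use.

```cpp
#include <map>
#include <string>

// Hand-picked excerpt of the M.49 hierarchy (child code -> parent code).
// 250 France -> 155 Western Europe -> 150 Europe -> 001 World
// 124 Canada -> 021 Northern America -> 019 Americas -> 001 World
const std::map<std::string, std::string> kM49Parent = {
    {"250", "155"}, {"155", "150"}, {"150", "001"},
    {"124", "021"}, {"021", "019"}, {"019", "001"},
};

// Hypothetical helper: move `levels` steps up the hierarchy, stopping early
// when the code has no known parent (for example once "001" is reached).
std::string CoarsenM49(std::string code, int levels) {
  for (int i = 0; i < levels; ++i) {
    std::map<std::string, std::string>::const_iterator it = kM49Parent.find(code);
    if (it == kM49Parent.end()) {
      break;
    }
    code = it->second;
  }
  return code;
}
```

With this shape, the granularity agreed with Google would reduce to a single number: how many levels to walk up before reporting.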
Assignee: nobody → tnguyen
(In reply to François Marier [:francois] from comment #0)

> List of information to send:
> 
> - unique browser identifier (randomized, stable for a week only)
> - country/region information (granularity to be confirmed)
> - full URL (without query string)
> - hash that matched
> - referrer chain
> 
> Also, we should stand up a proxy to hide our users' IP addresses.

The protobuf used in Safe Browsing does not seem to cover all the information we have to send to the server.

Only the following information is included in the report request:

  // The platform type reported.
  optional PlatformType platform_type = 2;

  // The threat entry responsible for the hit. Full hash should be reported for
  // hash-based hits.
  optional ThreatEntry entry = 3;

    // The URL of the resource.
    optional string url = 1;

    // The type of source reported.
    optional ThreatSourceType type = 2;

    // The remote IP of the resource in ASCII format. Either IPv4 or IPv6.
    optional string remote_ip = 3;

    // Referrer of the resource. Only set if the referrer is available.
    optional string referrer = 4;


There's no ASN or country/region field. Could you please double-check whether we have to follow [1]? If so, only the information relevant to Safe Browsing v4 (hash, URL, IP, referrer) can be sent to the server.
[1] http://searchfox.org/mozilla-central/rev/30fcf167af036aeddf322de44a2fadd370acfd2f/toolkit/components/url-classifier/chromium/safebrowsing.proto#201-246
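
As a rough picture of what the quoted fields allow, here is a standalone C++ mirror of the resource entry (url, remote_ip, referrer) and of a report that carries the matched full hash. This is a sketch, not the generated protobuf classes: `ThreatSourceSketch`, `ThreatHitSketch` and `MakeSource` are hypothetical names, and the `hasReferrer` flag only imitates the "only set if the referrer is available" comment in the excerpt.

```cpp
#include <string>
#include <vector>

// Mirrors the per-resource fields quoted above.
struct ThreatSourceSketch {
  std::string url;
  std::string remoteIp;      // ASCII form, IPv4 or IPv6
  std::string referrer;
  bool hasReferrer = false;  // stays false when no referrer is available
};

// Mirrors the top-level report: the matched entry plus the resources.
struct ThreatHitSketch {
  std::string fullHash;  // full hash for hash-based hits
  std::vector<ThreatSourceSketch> resources;
};

// Hypothetical helper: build one resource entry, leaving the referrer unset
// when the caller passes an empty string.
ThreatSourceSketch MakeSource(const std::string& url,
                              const std::string& remoteIp,
                              const std::string& referrer) {
  ThreatSourceSketch source;
  source.url = url;
  source.remoteIp = remoteIp;
  if (!referrer.empty()) {
    source.referrer = referrer;
    source.hasReferrer = true;
  }
  return source;
}
```

Nothing in this shape has room for an ASN, a country/region, or a browser identifier, which is the gap raised in this comment.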
Flags: needinfo?(francois)
I see [1]: the protobuf report structure used in Chromium's client-side phishing detection is very similar to our requirements and contains ASN + country.
Is this something we can use?
[1] https://cs.chromium.org/chromium/src/components/safe_browsing/csd.proto?q=ClientSafeBrowsingReportRequest&dr=CSs&l=11
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #2)
> There's no ASN or country/region field. Could you please double-check whether
> we have to follow [1]? If so, only the information relevant to Safe Browsing
> v4 (hash, URL, IP, referrer) can be sent to the server.

Google will be extending the protobuf to allow us to report country/region as well as the temporary unique identifier.

Given that this is not yet available, we can start with just the ip, hash, url, referrer chain, etc.

(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #3)
> I see [1]: the protobuf report structure used in Chromium's client-side
> phishing detection is very similar to our requirements and contains ASN +
> country.
> Is this something we can use?

No, we won't be reporting the same data as Chrome.
Flags: needinfo?(francois)
We may have to trace the redirect history: each time a channel redirects, we should collect the remote address, the referrer (if any), and the URL of the channel.
We may then have to put them into an array of structs.
I am writing a struct and adding an array of that struct to store the redirect information.
It looks like this:
  struct ThreatHitTraceResource {
    // The URL of the resource.
    nsCString url;

    // The remote IP of the resource in ASCII format. Either IPv4 or IPv6.
    nsCString remoteIp;

    // Referrer of the resource. Only set if the referrer is available.
    nsCString referrer;
  };

  typedef nsTArray<ThreatHitTraceResource> ThreatHitResources;
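
The collection step described above could look roughly like the following. To keep it compilable on its own, the sketch uses `std::string` and `std::vector` in place of the Gecko string and array types, and `RecordRedirect` is a hypothetical hook assumed to be called once per redirect; none of these names come from the tree.

```cpp
#include <string>
#include <vector>

// Standalone stand-in for the struct proposed above.
struct TraceResource {
  std::string url;       // URL of the channel at this hop
  std::string remoteIp;  // remote address in ASCII form, IPv4 or IPv6
  std::string referrer;  // left empty when no referrer is available
};

typedef std::vector<TraceResource> TraceResources;

// Hypothetical hook: append one hop to the chain each time a channel redirects.
void RecordRedirect(TraceResources& chain,
                    const std::string& url,
                    const std::string& remoteIp,
                    const std::string& referrer) {
  TraceResource hop;
  hop.url = url;
  hop.remoteIp = remoteIp;
  hop.referrer = referrer;
  chain.push_back(hop);
}
```

Appending in redirect order means the first element is the original request and the last is the resource that finally matched, which is the order a referrer chain would be read in.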

Hi Patrick,
Do you have any concerns if I extend LoadInfo and store this information there, as we did with the principal array for the redirect chain [1]?
[1] https://searchfox.org/mozilla-central/rev/7419b368156a6efa24777b21b0e5706be89a9c2f/netwerk/base/LoadInfo.h#147
Flags: needinfo?(mcmanus)
sgtm
Flags: needinfo?(mcmanus)
Depends on: 1351146
Depends on: 1351147
Status: NEW → ASSIGNED
Google has added a new field to specify the region that the user is in:

https://intranet.mozilla.org/index.php?title=SafeBrowsing&action=historysubmit&diff=182519&oldid=182360
Depends on: 1358536
Summary: Add opt-in reporting of matched URLs to Google → [meta] Add opt-in reporting of matched URLs to Google
Depends on: 1372456
Summary: [meta] Add opt-in reporting of matched URLs to Google → [meta] Add opt-in reporting of matched URLs to Google (ThreatHIt API)
Unassigning Thomas from the meta bug since the work is happening in the dependent bugs.
Assignee: tnguyen → nobody
Status: ASSIGNED → UNCONFIRMED
Ever confirmed: false
Depends on: 1387364
Summary: [meta] Add opt-in reporting of matched URLs to Google (ThreatHIt API) → [meta] Add opt-in reporting of matched URLs to Google (ThreatHit API)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Depends on: 1385156
Depends on: 1414051
Depends on: 1414056
> since it sends the data to Google instead of Mozilla.

Make sure we update our privacy policy if necessary. The current text is pretty broad so maybe it covers it.
Depends on: 1442780
Keywords: meta
Priority: P2 → P3
Type: defect → task
Severity: normal → S3