Open Bug 1542162 Opened 5 years ago Updated 1 month ago

Send SafeBrowsing ping for unknown file extension

Categories

(Toolkit :: Safe Browsing, enhancement, P3)

enhancement

Tracking

()

People

(Reporter: dimi, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(3 files, 2 obsolete files)

Google has changed its policy for unknown file types. Now, a remote ping will be triggered for unknown file extensions.
We should consider doing the same thing.

This patch is the preparation work for changing how we determin if a
file is eligible for remote lookup, it includes the following changes:

  1. Replace |IsBinary| with |ShouldSendRemoteLookup| to determin if a file is eligible for
    remote lookup.
  2. Delay certificate parsing until local block list has been checked and the file is not in
    the list, this avoid unnecessary |ParseCertificate| for files will be found in blocklist.
  3. Use |ShouldSendRemoteLookup| to check if we need to ParseCertificate.
    If a file is not eligible for remote lookup, we don't need to retrieve allow list from
    its certificate because it won't change the result.

This patch merges the lists in ApplicationReputation and nsFileCommon
into two lists in nsFileCommon - kExecutableList & kNonExecutableList.

kExecutableList is used on Windows to determin whether files are
executable. kNonExecutableList are extensions are known to donwload protection but
not in kExecutableList.

Extensions in the two lists include a "ping" property, which is used by
download protection to determin if the file is eligible for remote
lookup. The possible value of the 'ping' includes:

  1. SENT
  2. NOT_SEND
  3. UNKNOWN

UNKNOWN value is used for extensions are included in our executable list
but are not known to download protection. For this case, wether a
extension is eligble for remote lookup will depend on the unknown extension
policy(implement in next patch)

After this patch, unknown extensions are eligible for remote lookup.

Add a preference "browser.safebrowsing.downloads.remote.send_unknown" to
turn ON/OFF this new policy.

We should consider doing the same thing.

What have been the considerations here, specifically has there been any discussion regarding the privacy implications that I can read up on?

We have long standing bugs like https://bugzilla.mozilla.org/show_bug.cgi?id=1167040 which mean that currently every archive gets scanned, meaning that effectively the privacy story of Firefox is worse than Chrome here (!).

We're now going to send unknown files too. That brings 2 questions: a) does this mean that effectively everything you download with Firefox gets sent to Google, with the exception of executable that have a known signature (on Windows - not sure about the Mac/Linux situation) b) What's the status of bug 1167040 in this context? If unknown files get sent, that means there's not really a point in fixing that any more, as the likelihood of finding only files we won't scan is near zero. I basically only see .jpg and .mp4 etc.

Hi gcp,
Thank you for the comment!
To be honest, I don't really aware of the privacy concern while working on this, thank you for the reminding.

To answer your question:
Yes, after this patch, only extensions in the below list won't trigger a remote ping
"jpg, jpeg, mp3, mp4, png, csv, ica, gif, txt, package, tif, webp, mkv, wav, mov, paf, vbscript, ad, inx, isu, job, rgs, u3p, out, run"
Right now, according to the telemetry, 40% of files downloaded are non- binary files, I am not sure the percentage the above list can cover.

You made a good point that we already expose more and more information to Google, and the patch makes it much worse.
I think in general, we should have criteria about if an extension can be added to the list

It will be good if we can have some data to help us evaluate the trade-off between security and privacy. Maybe we can add sample pings for new extensions to see what is the security benefit we can get, and only enable it if it is worth. Also, we already have telemetry[1] to see the existing ping result, maybe we can check this regularly and remove extensions that are considered safe.
But I am not sure if this is the right thing to do because it means we are less secure than chrome. Maybe we should do this aggressively while having some way to protect user's privacy (for example, ping via proxy)?

I'll cancel the review first and find someone to talk about the privacy concern. I also want to hear from you to know if you have any suggestion on the comments above, thanks!

[1] https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2019-03-14&include_spill=0&keys=.pdf!.zip!.exe!.zip&max_channel_version=beta%252F66&measure=APPLICATION_REPUTATION_SERVER_VERDICT_2&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_by_value=0&sort_keys=submissions&start_date=2019-01-28&table=0&trim=1&use_submission_date=0

If I saw correctly, the functionality is behind a pref right now. So I don't see a problem with reviewing and landing the patches, as we can just enable or disable the pref. I just want to think about what we can do with the defaults going forward.

I understand that there's some efforts underway wrt Windows Defender integration, so even scenarios are possible where we would disable this feature depending on whether Defender is available, for example.

In any case, I'll review if you let me :-)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #7)

In any case, I'll review if you let me :-)

Sure :)

Blocks: 1549296

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:dimi, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(dlee)

I did a comparison about the extensions Google is using and the extensions we are using, I found the difference is bigger than I thought before.
Here is the table:
https://docs.google.com/spreadsheets/d/1_4CB84TxpzZaEiTsLRlKeAUu-sZ8MxBUv6RFt2ULFOo/edit?usp=sharing
There are 382 extensions in their list while we only have 135.
I feel even the patch here only enables this feature in Nightly build, we should still discuss this first in Whistler to decide the next step. So right now, I intend to defer landing this.

Flags: needinfo?(dlee)
Priority: P1 → P2
Blocks: 1564041

Add a preference "browser.safebrowsing.downloads.remote.send_unknown" to
turn ON/OFF this new policy.
Default is OFF, so this patch doesn't affect the amount of remote pings
we may trigger.

Depends on D29156

Attachment #9061291 - Attachment is obsolete: true
Attachment #9061291 - Attachment is obsolete: false
Attachment #9076514 - Attachment is obsolete: true

This patch merges the lists in ApplicationReputation and nsFileCommon
into two lists in nsFileCommon - kExecutableList & kNonExecutableList.

kExecutableList is used on Windows to determin whether files are
executable. kNonExecutableList are extensions are known to donwload protection but
not in kExecutableList.

Extensions in the two lists include a "ping" property, which is used by
download protection to determin if the file is eligible for remote
lookup. The possible value of the 'ping' includes:

  1. SENT
  2. NOT_SEND
  3. UNKNOWN

UNKNOWN value is used for extensions are included in our executable list
but are not known to download protection. For this case, wether a
extension is eligble for remote lookup will depend on the unknown extension
policy(implement in next patch)

Depends on D29155

Attachment #9076515 - Attachment is obsolete: true

I want to take an extra day to re-review this to convince myself we don't risk accidentally sending more stuff even if preffed off.

Dimi, I hinted at this in email, but I'll ask it again here: it is worthwhile to decide in advance, before we see the telemetry what kind of results from the telemetry would allow what kind of conclusion wrt. this feature. It we can't formulate ahead of time what decision we'll make on the data this brings, there's no need to collect the data.

Flags: needinfo?(dlee)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #14)

Dimi, I hinted at this in email, but I'll ask it again here: it is worthwhile to decide in advance, before we see the telemetry what kind of results from the telemetry would allow what kind of conclusion wrt. this feature. It we can't formulate ahead of time what decision we'll make on the data this brings, there's no need to collect the data.

Hi gcp, sorry for the late reply, I was on a vacation :)

I think adding this telemetry may help because I checked the extensions excluded by us, most of them are uncommon.

So if the extensions excluded by us plus unknown extensions are less than a certain ratio of user downloads(10% for example), we can consider adding this policy because this helps us sync the download protection coverage with Chrome, and with the trade-off that increase an acceptable amount of information leak.

I understand that if the result is 20% or 30%, it doesn't mean we SHOULD or SHOULD NOT do this, but this still gives us a hint about how critical the problem is, right now, we don't have the statistics. We have a similar telemetry but only records binary and non-binary. I replace it by this new telemetry with more fine-grained data.

Flags: needinfo?(dlee) → needinfo?(gpascutto)

Ok, so that's a clear metric: if the increase of things we'd send remotely falls below some threshold (i.e. 10%), we can probably just ship this for parity reasons because the privacy impact isn't that large and if not we have a problem to fix.

If I interpret our stats correctly, on beta: ~19% locally whitelisted, 39% safe (I assume with remote lookup), 32% non-binary (so no lookup?), the rest negligible. This is already a significant change from beta 66 which was mentioned earlier (32% remote -> 39% remote, 9% local whitelist -> 19% local whitelist, the latter surprises me!).

So currently we do remote lookups for ~39% of the downloads. Let's say the decision threshold is that we want to keep it under 50% for beta users? Sounds good?

I notice 2% of the beta users have download protection disabled. Would be interesting to see release, but the dashboard has been broken for months for that, sigh. But it indicates that a sizable part of our userbase is aware of the privacy trade-off of this feature.

Flags: needinfo?(gpascutto)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #16)

If I interpret our stats correctly, on beta: ~19% locally whitelisted, 39% safe (I assume with remote lookup), 32% non-binary (so no lookup?), the

Yes, safe is the result of remote lookup, non-binary means no lookup is involved

So currently we do remote lookups for ~39% of the downloads. Let's say the decision threshold is that we want to keep it under 50% for beta users? Sounds good?

Yes, 50% sounds good and reasonable :)

Assignee: dlee → nobody
Status: ASSIGNED → NEW
Severity: normal → S3
Priority: P2 → P3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: