Closed Bug 1433179 Opened 6 years ago Closed 5 years ago

ReferrerUrl and HostUrl are never added to the Zone.Information stream

Categories

(Toolkit :: Downloads API, defect, P5)

57 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: julih, Assigned: emk)

References

Details

Attachments

(2 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299

Steps to reproduce:

open firefox.  navigate to any .exe file.  download it.  check the zone/EMOTW using powershell script:

PS C:\Users> get-content -path C:\users\Desktop\freevideo.exe -stream zone.identifier


Actual results:

[ZoneTransfer]
ZoneId=3

Only returns zone, does not return download url and referrer url


Expected results:

[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://demo.smartscreen.msft.net/
HostUrl=https://demo.smartscreen.msft.net/download/unknown/freevideo.exe
Component: Untriaged → Downloads API
Product: Firefox → Toolkit
I can't find any Microsoft documentation about this feature at the moment. If newer versions of Windows add this extended information, we first have to determine in what circumstances it is done before taking action. Given that this doesn't affect protection from downloaded files in general, this bug is low priority.

Note that we don't invoke the Attachment Manager directly because it affects performance by doing main-thread I/O. We also have to make sure this extended information is not added when downloading from a Private Browsing window.
Priority: -- → P5
Summary: Windows Defender SmartScreen: Firefox is not attaching Extended Mark of the Web (EMOTW) to files that are downloaded → ReferrerUrl and HostUrl are never added to the Zone.Information stream
This feature was added in Windows 10 release 1703 (build 15063).
The HostUrl and ReferrerUrl are set by Microsoft Edge and Google Chrome.
Edge also sets a HostIpAddress field.

It is used for protection purposes.
Specifically, Microsoft’s Windows Defender Advanced Threat Protection exposes this info to the SOC, who can then identify where attacks came from, which other downloads might be related, and respond/block accordingly.
I don't know which other products/tools use this feature.

Thanks,
Tomer
(from the Windows Defender Advanced Threat Protection team)
Attachment #8998857 - Flags: review?(paolo.mozmail)
Assignee: nobody → VYV03354
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Sorry, wrong files were contained in the patch.
Attachment #8998858 - Flags: review?(paolo.mozmail)
Attachment #8998857 - Attachment is obsolete: true
Attachment #8998857 - Flags: review?(paolo.mozmail)
Comment on attachment 8998858 [details] [diff] [review]
Add ReferrerUrl and HostUrl to the Zone.Information stream

Per comment 1, this probably shouldn't be added for private downloads.

Also, it's likely that certain schemes should be excluded, like "data:", but also others. Can you check if certain schemes are excluded by Chrome and Edge?
Attachment #8998858 - Flags: review?(paolo.mozmail)
(In reply to talpert from comment #2)
> This feature was added in Windows 10 release 1703 (build 15063).
> The HostUrl and ReferrerUrl are set by Microsoft Edge and Google Chrome.
> Edge also sets a HostIpAddress field.
> 
> It is used for protection purposes.
> Specifically, Microsoft’s Windows Defender Advanced Threat Protection
> exposes this info to the SOC, who can then identify where attacks came from,
> which other downloads might be related, and respond/block accordingly.
> I don't know which other products/tools use this feature.
> 
> Thanks,
> Tomer
> (from the Windows Defender Advanced Threat Protection team)

Thanks for chiming in. Do you have a link to the documentation, and/or can you answer about special URI schemes?
Flags: needinfo?(talpert)
* Forgot to change the test path for legacy downloads.
* Added a check for special URI schemes.

(In reply to :Paolo Amadini from comment #5)
> Per comment 1, this probably shouldn't be added for private downloads.

My first patch already included a check for private downloads. I even added a test to make sure that private downloads do not add HostUrl/ReferrerUrl. Please tell me in detail if it is insufficient.

> Also, it's likely that certain schemes should be excluded, like "data:", but
> also others. Can you check if certain schemes are excluded by Chrome and
> Edge?

Good catch. Chrome added "HostUrl=about:internet" for special URI schemes. Edge could not download files from data:/blob: URLs.🤔 So I followed the Chrome behavior.
Attachment #8999044 - Flags: review?(paolo.mozmail)
Comment on attachment 8999044 [details] [diff] [review]
Add ReferrerUrl and HostUrl to the Zone.Information stream

Review of attachment 8999044 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Masatoshi Kimura [:emk] from comment #7)
> My first patch already included a check for private downloads. I even added
> a test to make sure that private downloads do not add HostUrl/ReferrerUrl.
> Please tell me in detail if it is insufficient.

It's probably my eyesight to be insufficient :-) I skimmed over the patch and I genuinely didn't see that! I didn't have much time then, and obviously couldn't do a detailed review, but still wanted to give some initial general feedback to avoid you waiting for these few days.

I recommend stricter data validation, and I'd like to get another look after you've updated the patch, but with the changes below it should be good to land. I don't think we need to wait for the next cycle, for this kind of changes with operating system integration we get better testing on Beta than Nightly. If this causes any compatibility issues, we can still back it out from Beta, but given that Chrome has the same behavior we're probably good. Also, I expect this format to be compatible with earlier Windows version.

Thanks for volunteering on taking on this improvement!

::: toolkit/components/downloads/DownloadIntegration.jsm
@@ +500,5 @@
> +            let zoneId = "[ZoneTransfer]\r\nZoneId=" + zone + "\r\n";
> +            function isHttp(url) {
> +              const scheme = NetUtil.newURI(url).scheme;
> +              return scheme == "http" || scheme == "https";
> +            }

See the comment below, but this may also be sufficient, also less likely to throw:

url.startsWith("http:") || url.startsWith("https:")

We use this in a few other places in the codebase.

@@ +503,5 @@
> +              return scheme == "http" || scheme == "https";
> +            }
> +            if (!aDownload.source.isPrivate) {
> +              if (aDownload.source.referrer && isHttp(aDownload.source.referrer)) {
> +                zoneId += "ReferrerUrl=" + aDownload.source.referrer + "\r\n";

The URLs can be specified programmatically with the Downloads API, at least for the non-gUseLegacySaver case. To be on the safe side, we should probably sanitize at least newlines so that a programming error or missing validation on the side of the API consumers cannot result in writing arbitrary keys to the zone information stream.

We may actually call NetUtil.newURI and use it to check both isHttp and do validation, but we should still write the stream even if part of it is invalid. We may want a helper function that never fails and may return an empty string, like:

  let zoneId = "[ZoneTransfer]\r\nZoneId=" + zone + "\r\n" +
               this._zoneIdKey("ReferrerUrl", aDownload.source.referrer, false) +
               this._zoneIdKey("HostUrl", aDownload.source.url, true);

I think with the test functions you created we may be able to add a test for a referrerUrl with "\n" and/or "\r" in it. (We still want it even in case it throws upstream for other reasons.)

nit: you may already have thought of this, but keep the same relative order and capitalization of ReferrerUrl and HostUrl that Chrome uses, just to be stricter in writing data as a compatibility guarantee...

::: toolkit/components/downloads/test/unit/common_test_Download.js
@@ +360,5 @@
> +    {},
> +    { targetFile: longTargetFile },
> +    { sourceUrl: dataSourceUrl,
> +      zoneId: "[ZoneTransfer]\r\nZoneId=3\r\n" +
> +              "HostUrl=about:internet\r\n" },

Call this "expectedZoneId", and add it explicitly for the other cases, it makes the expected result more obvious even if it may seem a bit redundant. (You can alternatively go the other route of generating the expected result programmatically from the source options.)
Attachment #8999044 - Flags: review?(paolo.mozmail)
Also, it would be great if you could post a link to a try build that Tomer can test.
Attachment #8998858 - Attachment is obsolete: true
Awesome - thanks for doing this change!
I'd be happy to test a build with this change.

One more change:
It'd be great if you'd also write the HostIpAddress field (currently not included in the change).

Documentation:
I am not aware of any documentation for this... but I'll ask arround and update this thread if I get something.

Schemes:
All I can say is what schemes I see reported in our data (based on 1 month of data from millions of machines).
we currently see http, https, ftp, file, blob and about.
For about - we see only about:internet.
For blob - it's written only by Edge (and some other apps), but not by Chrome. Sometimes the blob URL is meaningless (e.g. some random guid), and in other cases it does carry some meaning - it depends on the implementaiton in the specific javascript. Still, you could go either way and either write such URLs or not, especially because they are not a common infection vector.
Flags: needinfo?(talpert)
(In reply to talpert from comment #10)
> we currently see http, https, ftp, file, blob and about.

Thanks! Sounds like we may want "ftp:" on the whitelist as well.

We have some additional logic for Mac which checks nested URIs, but I think it's not really necessary and can even be counterproductive here:

https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/toolkit/components/downloads/DownloadPlatform.cpp#228

We can file a separate bug for the HostIpAddress, but I don't know if we have the information.

> For blob - it's written only by Edge (and some other apps), but not by
> Chrome. Sometimes the blob URL is meaningless (e.g. some random guid), and
> in other cases it does carry some meaning

This reminds me of something that was recently discussed in a different bug. While Mozilla is not collecting any data here, we may still want to strip any username and passwords embedded in the URLs before we write them to the filesystem. We can't do much about query parameters, as they are opaque to us and may be necessary to reopen the original location of the download.

We may want to file a bug to have this done for Mac as well, possibly unifying the logic and providing sanitized URLs to the DownloadPlatform module.

Johann, do you have any opinion? It's probably worth looking at the rest of this bug as well. While no data is ever written for private downloads, I wonder if there are any privacy preferences that should control this behavior for non-private downloads as well?
Flags: needinfo?(jhofmann)
(In reply to :Paolo Amadini from comment #11)
> > For blob - it's written only by Edge (and some other apps), but not by
> > Chrome. Sometimes the blob URL is meaningless (e.g. some random guid), and
> > in other cases it does carry some meaning
> 
> This reminds me of something that was recently discussed in a different bug.
> While Mozilla is not collecting any data here, we may still want to strip
> any username and passwords embedded in the URLs before we write them to the
> filesystem. We can't do much about query parameters, as they are opaque to
> us and may be necessary to reopen the original location of the download.

I agree with this. If we're not doing this already we probably want to.

> Johann, do you have any opinion? It's probably worth looking at the rest of
> this bug as well. While no data is ever written for private downloads, I
> wonder if there are any privacy preferences that should control this
> behavior for non-private downloads as well?

Considering that after reading this bug I'm still not sure if I understand it, I wonder how we could surface preferences for this to the general user.

Something like

[x] Add additional metadata that helps Windows Defender identify and remove malicious downloads?

?

Looking at it like this I'd say it's enough to "expose" the pref in about:config to give real privacy nuts the chance to turn it off if they want to for some reason.

Tom, is this something Tor folks should be aware of?
Flags: needinfo?(jhofmann) → needinfo?(tom)
See also 1374027 which is the same thing but for MacOS

I added Arthur to the bug, but if we're not writing anything in PBM; then that is how Tor will behave (and how I think they want to behave.)
Flags: needinfo?(tom)
See Also: → 1374027
(In reply to Johann Hofmann [:johannh] from comment #12)
> Looking at it like this I'd say it's enough to "expose" the pref in
> about:config to give real privacy nuts the chance to turn it off if they
> want to for some reason.

Yes, I think we don't need a visible setting, but given the privacy implications an "about:config" setting would be acceptable in this case. I don't think it should impact the work on this bug, so it's fine but not required to add it here. When added, even just for Windows, the setting should have a simple name so it will eventually be reused for the other platforms, and a regression test.

Hi Masatoshi, are you planning on continuing to work on this? :-) Thanks!

Sorry for forgetting this, I'll update the patch and make a try run soon.

Comment on attachment 8999044 [details] [diff] [review]
Add ReferrerUrl and HostUrl to the Zone.Information stream

Review of attachment 8999044 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/downloads/DownloadIntegration.jsm
@@ +500,5 @@
> +            let zoneId = "[ZoneTransfer]\r\nZoneId=" + zone + "\r\n";
> +            function isHttp(url) {
> +              const scheme = NetUtil.newURI(url).scheme;
> +              return scheme == "http" || scheme == "https";
> +            }

I kept NetUtil.newURI for validation.

@@ +503,5 @@
> +              return scheme == "http" || scheme == "https";
> +            }
> +            if (!aDownload.source.isPrivate) {
> +              if (aDownload.source.referrer && isHttp(aDownload.source.referrer)) {
> +                zoneId += "ReferrerUrl=" + aDownload.source.referrer + "\r\n";

* NetUtil.newURI will automatically strip "\r", "\n", and "\t" (see 241788).
* Added _zoneIdKey() as suggested.
* Added an test case for "\r" and "\n".
* Yes, both Chrome and Edge will write ReferrerUrl first.

::: toolkit/components/downloads/test/unit/common_test_Download.js
@@ +360,5 @@
> +    {},
> +    { targetFile: longTargetFile },
> +    { sourceUrl: dataSourceUrl,
> +      zoneId: "[ZoneTransfer]\r\nZoneId=3\r\n" +
> +              "HostUrl=about:internet\r\n" },

Renamed to expectedZoneId and stopped using a default value for expectedZoneId.

(In reply to :Paolo Amadini from comment #11)

(In reply to talpert from comment #10)

we currently see http, https, ftp, file, blob and about.

Thanks! Sounds like we may want "ftp:" on the whitelist as well.

I added "ftp" to the whietlist.

Johann, do you have any opinion? It's probably worth looking at the rest of
this bug as well. While no data is ever written for private downloads, I
wonder if there are any privacy preferences that should control this
behavior for non-private downloads as well?

We used to have a pref, but I took it out in bug 952961. The OS-level setting will prevent other browsers from writing zone information, not just Firefox.

Pushed by VYV03354@nifty.ne.jp:
https://hg.mozilla.org/integration/autoland/rev/4d823462e5bd
Add ReferrerUrl and HostUrl to the Zone.Information stream. r=paolo
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
QA Whiteboard: [qa-67b-p2]
You need to log in before you can comment on or make changes to this bug.