Open Bug 665531 Opened 10 years ago Updated 1 year ago

[Linux] Store that file was downloaded from the Internet (Extended Attribute user.xdg.origin.url)

Categories

(Toolkit :: Downloads API, enhancement)

All
Linux
enhancement
Not set
normal

Tracking

()

People

(Reporter: BenB, Unassigned)

Details

(Keywords: privacy-review-needed)

On Windows, we let the OS know that this file was downloaded from the Internet. This causes Windows Explorer to warn users when they start a downloaded executable.

On Linux, we have Extended Attributes in the filesystem. If available, we can set the attribute "user.xdg.origin.url", which is defined by FreeDesktop as
"Set on a file downloaded from a url. Its value should equal the url it was downloaded from." <http://www.freedesktop.org/wiki/CommonExtendedAttributes>

This may or may not cause the file manager (whatever the user uses, e.g. GNOME Nautilus, KDE Dolphin etc.) to issue a warning; that's up to the file manager. It can also offer other things, e.g. "redownload" or whatever.
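For illustration, here is a minimal Python sketch of what setting and reading this attribute could look like. It is not Firefox code. The helper names `tag_origin` and `read_origin` are made up for this example, and it assumes a Linux system, since `os.setxattr` and `os.getxattr` only exist there. Filesystems without user.* xattr support are treated as "attribute not stored".

```python
import os
from typing import Optional

# Attribute name as defined on the FreeDesktop page cited above.
ORIGIN_ATTR = "user.xdg.origin.url"


def tag_origin(path: str, url: str) -> bool:
    """Try to record the download URL on the file (hypothetical helper).

    Returns False when extended attributes are unavailable, so the
    caller can carry on without the metadata.
    """
    if not hasattr(os, "setxattr"):  # os.setxattr exists only on Linux
        return False
    try:
        os.setxattr(path, ORIGIN_ATTR, url.encode("utf-8"))
    except OSError:
        # For example, the filesystem does not support user.* xattrs.
        return False
    return True


def read_origin(path: str) -> Optional[str]:
    """Return the stored URL, or None if it is absent or unsupported."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return os.getxattr(path, ORIGIN_ATTR).decode("utf-8")
    except OSError:
        return None


if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile(dir=".") as handle:
        stored = tag_origin(handle.name, "https://example.org/file.tar.gz")
        print("stored:", stored, "read back:", read_origin(handle.name))
```

A file manager or "Share" function could call something like `read_origin` and fall back to the plain file when it returns None.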

I personally would find it useful, because I often save files for personal archive and usage, but then later want to share them with friends via IRC or Web. This is true for particularly interesting articles, comics, and maybe programs. Instead of uploading my copy of the file, and potentially violating copyright with that, I'd prefer to give my friends the original URL. But to find it later, I need to both save the file and bookmark it, and then later reassociate them (given that they are in different stores: filesystem and browser bookmarks). The "user.xdg.origin.url" attribute, automatically saved by Firefox, would solve this problem for me. I could just check where I got the file from, and share that URL.

(In fact, if some software implements a "Share" function, it could recognize this and share the URL instead. "Share" options seem to pop up everywhere nowadays.)

Ben
Blocks: 665567
No longer blocks: 665567
I'm poking around the source to see where this can be added, but not going to assign myself this bug as I'm unfamiliar with the Firefox source and may or may not actually get around to implementing it. Therefore, if someone else wants to take this, go right ahead.
Any progress on this? I can see that curl, wget and Chromium all support this.
This issue has been raised on Debian mailing lists, and I consider it a security+privacy vulnerability in Chromium.  Thus, it should NOT be included in Firefox.

The main requested functionality (warning about files that come from the Internet) could be handled by a boolean; that has no issues I can think of.
(In reply to Adam Borowski from comment #3)
> I consider it a security+privacy vulnerability in Chromium.

Could you elaborate, please? I would love to have this functionality myself: when I download a file, I often forget where I got it from. What’s so wrong with having this record? Also, it very well can be optional/turned-off-per-default, but I really don’t see the need.
Adam, a boolean would fail the goals stated in the initial description.

Please note that the attribute is not part of the file itself, but stored in the filesystem. If you pass on the file to somebody else, the attribute would not be there. So, I cannot see the privacy issue, given that the information stays on your local computer.
Chromium on Windows doesn't suffer from this problem:

ꜰɪʟᴇ: user.Zone.Identifier: [ZoneTransfer]
ZoneId=3

Ie, it saves only whether the file came from your computer, local network, or the Interwebs at large.

> If you pass on the file to somebody else, the attribute would not be there.

Depends on the way you use to copy: cp and rsync need an option (but they don't copy any other metadata without being told to do so), mv always preserves xattrs even when moving between filesystems.

And if you save directly to a USB stick or a remote filesystem, the xattr will be there.

(Note: tmpfs supports xattrs but specifically denies the user. namespace to avoid this hole as /tmp is world-readable.)

> So, I cannot see the privacy issue, given that the information stays on your local computer.

Laptops and phones get seized/imaged/stolen quite a lot.  And this type of metadata is really well hidden from the user (even a bunch of operating system developers, some of which do security and/or filesystems, were surprised by this, which means an ordinary user has no way whatsoever to find out this is going on).

Even worse, Chromium does so even in that fake "incognito mode", and this metadata is not cleared when you clear history.

curl does this right (as an opt-in option), chromium and wget have a privacy hole.
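Since the concern above is that this metadata is hard to see, here is a rough Python sketch for listing the user.* attributes on a file and removing the origin-related ones. The function names are hypothetical. It assumes Linux (`os.listxattr`, `os.getxattr` and `os.removexattr` are Linux-only) and assumes that "user.xdg.referrer.url" is the companion attribute for the referrer; check the FreeDesktop page before relying on that name.

```python
import os
from typing import Dict, List, Sequence

# Attribute names assumed for this sketch; the second one is the
# referrer counterpart as I understand the FreeDesktop list.
ORIGIN_ATTRS = ("user.xdg.origin.url", "user.xdg.referrer.url")


def user_xattrs(path: str) -> Dict[str, bytes]:
    """Return all user.* extended attributes of a file (empty if unsupported)."""
    if not hasattr(os, "listxattr"):
        return {}
    try:
        names = os.listxattr(path)
    except OSError:
        return {}
    found = {}
    for name in names:
        if name.startswith("user."):
            try:
                found[name] = os.getxattr(path, name)
            except OSError:
                continue  # attribute vanished or is unreadable
    return found


def strip_origin_xattrs(path: str, names: Sequence[str] = ORIGIN_ATTRS) -> List[str]:
    """Remove the origin-related attributes; return the names actually removed."""
    removed: List[str] = []
    if not hasattr(os, "removexattr"):
        return removed
    for name in names:
        try:
            os.removexattr(path, name)
        except OSError:
            continue  # attribute absent, or xattrs unsupported here
        removed.append(name)
    return removed


if __name__ == "__main__":
    import sys

    for target in sys.argv[1:]:
        print(target, user_xattrs(target))
```

Running the script with file names as arguments prints whatever user.* attributes those files carry. Calling `strip_origin_xattrs` before copying a file to removable media would avoid carrying the URL along.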
(In reply to Adam Borowski from comment #6)
> > So, I cannot see the privacy issue, given that the information stays on your local computer.
> 
> Laptops and phones get seized/imaged/stolen quite a lot.  And this type of
> metadata is really well hidden from the user (even a bunch of operating
> system developers, some of which do security and/or filesystems, were
> surprised by this, which means an ordinary user has no way whatsoever to
> find out this is going on).

If you are this paranoid (or even if you actually really do need this level of privacy), then I guess you don’t rely on defaults, do you?

> curl does this right (as an opt-in option), chromium and wget have a privacy
> hole.

As I said, I have no problem with opt-in option.
> If you are this paranoid (or even if you actually really do need this level of privacy)

I'd put it more in the same boat as folks using the incognito mode, or using "Clear recent history".  In which case, indeed, file URLs are less important, but can be sensitive even for the same file content.

But, that's not the point: this metadata is exceptionally well hidden.  A regular person knows about browser history, might or might not know about EXIF, but has no way to even suspect there's an obscure filesystem feature that can be used to smuggle information as an unprivileged user.

Principle of least surprise.  Because when history _does_ turn out to be sensitive, you have no way to tell the user you saved it in a hidden place.

> As I said, I have no problem with opt-in option.

So, what about this: by default, save only that "ZoneId", for consistency with Chrome/Windows and MSIE, but no full URL without a default-off option.
If your own computer is no longer secure, you have bigger problems than this attribute. Your browser logs all web requests in the history file, and that is saved to disk. Even if you clean the history or delete the file, the remnants of the blocks of that file may well be still on your physical disk somewhere. If you use an encrypted file system, then you don't need to worry about this bug, either. Firefox presumes your own computer is safe - because all bets are off, if it isn't.

So, any argument about your own filesystem is moot.

As mentioned, sending a file by Internet, e.g. by email, xchat or HTTP upload, won't copy the attributes.

I don't see a case where you'd trust the recipient with the content of a file, but not where you got the file from. If you're this worried about privacy, then you better know your commandline tools and its flags and know how to see attributes and avoid copying them.

As mentioned above, this feature is useful for normal users, and can even increase security, because downstream applications can consider where the file comes from.

And I don't need to store all downloads twice, once as file and once as bookmark.
We should probably have a pref to turn this behavior off. Default should be on.
> Your browser logs all web requests in the history file, and that is saved to disk.

That's why there's "clear history", Incognito Mode, and so on.

> Even if you clean the history or delete the file, the remnants of the blocks of that file may well be still on your physical disk somewhere.

On a modern SSD, while it's near impossible to ensure a given block is indeed physically gone, even in normal usage with no extra steps, the chances of the old block being erased are so high that even forensic people for high-value targets usually don't bother trying.  And even if they do, it involves asking the manufacturer for special firmware and tools to access raw flash chips, with a minuscule chance of success.  Some extra churn (which happens naturally if you use the disk) makes that chance zero.

Certain filesystems might preserve tails of a block when a file shrinks (truncate), but that's not a real concern with the way Firefox accesses the database.

> As mentioned, sending a file by Internet, e.g. by email, xchat or HTTP upload, won't copy the attributes.

But saving to a USB stick, a phone or a laptop does preserve xattrs (depending on the filesystem used and, if moved after the download, the program used to copy -- mv vs cp sans options, etc).

> If you're this worried about privacy, then you better know your commandline tools and its flags

I'm a fricking Debian Developer who's a regular on filesystem related mailing lists and dabbles with filesystem-related kernel patches from time to time, and I didn't know about this user.-namespace smuggling.  This is not anything like common knowledge -- EXIF is bad enough, despite coming within the file.

> As mentioned above, this feature is useful for normal users

None of them have a chance of knowing this.

> downstream applications can consider where the file comes from.

That's ZoneId which has no privacy hole.

(In reply to Adam Borowski from comment #6)
> Chromium on Windows doesn't suffer from this problem:
> 
> ꜰɪʟᴇ: user.Zone.Identifier: [ZoneTransfer]
> ZoneId=3

That's not the behaviour in the current stable version of Chrome for Windows (73.0.3683.86).

If you download the following file in normal mode:

https://pypi.org/project/canadian-ham-exam/#files

the contents of the NTFS alternate data stream will be:

[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://files.pythonhosted.org/
HostUrl=https://files.pythonhosted.org/packages/d9/0a/d3d1290794660b7e49efe1339a3dc57e3c9b82ac7641595a485691f51ee2/canadian-ham-exam-0.2.0.tar.gz

as opposed to Incognito mode where it will be:

[ZoneTransfer]
ZoneId=3
HostUrl=about:internet

Also, note that Chromium has now removed this metadata from Linux builds: https://chromium.googlesource.com/chromium/src/+/a9b4fb70b4318b220deee0da7b1693d16b8ed071
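To make the difference between the two Zone.Identifier payloads above easier to check, here is a small Python sketch that parses such a stream. The function name `parse_zone_identifier` is made up for this example, and the zone names in `ZONE_NAMES` are my reading of the Windows URL security zone numbering, so treat them as an assumption. The stream is INI-like, so the standard `configparser` module is enough.

```python
import configparser
from typing import Dict, Optional

# Assumed mapping of Windows URL security zone numbers to names.
ZONE_NAMES = {
    0: "local machine",
    1: "local intranet",
    2: "trusted sites",
    3: "Internet",
    4: "restricted sites",
}


def parse_zone_identifier(text: str) -> Dict[str, Optional[object]]:
    """Parse the text of a Zone.Identifier stream into a plain dict."""
    parser = configparser.ConfigParser(
        interpolation=None,   # URLs may contain '%', so no interpolation
        delimiters=("=",),    # values such as about:internet contain ':'
    )
    parser.optionxform = str  # keep the original key capitalisation
    parser.read_string(text)
    section = parser["ZoneTransfer"]
    zone_id = section.getint("ZoneId")
    return {
        "zone_id": zone_id,
        "zone_name": ZONE_NAMES.get(zone_id),
        "referrer_url": section.get("ReferrerUrl"),  # None when not present
        "host_url": section.get("HostUrl"),
    }


if __name__ == "__main__":
    incognito = "[ZoneTransfer]\nZoneId=3\nHostUrl=about:internet\n"
    print(parse_zone_identifier(incognito))
```

With the normal-mode payload, `host_url` holds the full download URL. With the Incognito payload it is only the `about:internet` placeholder and `referrer_url` is None.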

Fixed in wget 1.20.1: it no longer saves this data by default, and even upon explicit request it stores only scheme/host/port for the Referer, and a cut-down URL. No CVE was issued but it probably should have been.

Good to hear it got fixed in Chromium as well.

To explain my comment about CVE, so it doesn't sound like chicken little: wget knew that a part of the URL was username and password because it added it itself -- thus the severity is obvious. This differs from wget or Chromium saving a URL entirely provided by the user, as the software has no way of knowing if there's indeed auth data inside (there still often is).
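As a sketch of the kind of trimming described above, the following Python snippet removes the username and password from a URL and reduces a referrer to scheme/host/port. This is not wget's actual code. The function names are invented for this example, and whether the query string should also be dropped is a policy choice that this sketch leaves alone.

```python
from urllib.parse import urlsplit, urlunsplit


def _netloc_without_userinfo(url: str) -> str:
    """Rebuild host[:port] from a URL, dropping any user:password@ part."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal needs its brackets back
        host = f"[{host}]"
    return host if parts.port is None else f"{host}:{parts.port}"


def strip_credentials(url: str) -> str:
    """Return the URL with embedded credentials removed, the rest unchanged."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, _netloc_without_userinfo(url), parts.path, parts.query, parts.fragment)
    )


def origin_only(url: str) -> str:
    """Reduce a URL to scheme, host and port, e.g. for a stored referrer."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, _netloc_without_userinfo(url), "", "", ""))


if __name__ == "__main__":
    sample = "https://user:secret@example.org:8443/private/file.tar.gz?token=abc"
    print(strip_credentials(sample))
    print(origin_only(sample))
```

Note that a token in the query string, as in the sample, survives `strip_credentials`, which is the "no way of knowing" case mentioned above.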
