Saving an image with a data: URL causes gvfs to loop and kill the disk

Status: NEW
Assignee: Unassigned
Component: Toolkit :: Downloads API
Priority: --
Severity: critical
Reported: 3 years ago
Modified: 2 years ago
Reporter: BenB
Keywords: hang, regression
Firefox Tracking Flags: Not tracked

Description (Reporter, 3 years ago)
Reproduction:
According to https://bugzilla.gnome.org/show_bug.cgi?id=637095#c47

Ross Lagerwall [gvfs developer] 2014-08-26 14:32:06 PDT

I have found a reproducer and the problem. Simply browse to
http://rossl.org/junk/meta.html in Firefox, right-click and save the image to
your Downloads folder. This should trigger gvfsd-metadata constant disk usage.

The problem is that when an application tries to save a large key-value pair,
larger than the size of the journal (32*1024), it triggers the journal to be
flushed to make space for the entry and the operation is then retried, but it
never fits, and the loop continues forever.
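The flush-and-retry loop described above can be sketched in a few lines. This is a hypothetical illustration of the bug's shape, not the real gvfs C code; the names and the flush model are assumptions:

```python
# Illustrative sketch (NOT real gvfs code): an entry larger than the whole
# journal can never fit, so flushing and retrying loops forever. The size
# guard at the top is the check that was missing.

JOURNAL_SIZE = 32 * 1024  # journal capacity in bytes, per the report


def try_append(journal, entry):
    """Append entry if it fits in the remaining journal space."""
    if sum(len(e) for e in journal) + len(entry) <= JOURNAL_SIZE:
        journal.append(entry)
        return True
    return False


def save_entry(journal, entry):
    # Without this guard, the while loop below never terminates for
    # oversized entries: the journal is flushed, the retry fails again.
    if len(entry) > JOURNAL_SIZE:
        raise ValueError("entry can never fit in the journal")
    while not try_append(journal, entry):
        journal.clear()  # stand-in for flushing the journal to disk
```

A >32 KB data: URL stored as `metadata::download-uri` is exactly such an oversized entry, which is why gvfsd-metadata spins forever.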

As for why this is triggered by Firefox, Firefox saves the download URI in
metadata::download-uri. If the download URI is a data URL (like in the above
link), it can easily be > 32kb and so trigger the issue. I presume a similar
thing happens with Thunderbird.

Severity: Critical, because hangs are by definition critical.
Furthermore, the result is that gvfs creates and deletes millions of files, which wears out the disk, especially SSDs.
Blocks: 797349
Keywords: regression
Comment 1 (Reporter, 3 years ago)
I'm trying to fix this with
1. a simple exception for data: URLs (because they never make sense to save) *and*
2. cutting off URLs longer than 2048 chars (2 KB) (because they are causing the overflow)
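The two-part fix above could be sketched as a small filter. The helper name and the policy of dropping (rather than truncating) over-long URLs are assumptions; the real change would live wherever Firefox writes `metadata::download-uri`:

```python
# Hypothetical sketch of the proposed filter: never store data: URLs, and
# refuse URLs over a conservative length cap rather than writing them into
# the gvfs metadata journal.

MAX_STORED_URL = 2048  # the 2 KB cap proposed above


def storable_download_uri(url):
    """Return the URL to record as download metadata, or None to store nothing."""
    if url.startswith("data:"):
        return None  # a data: URL embeds the content itself; saving it is useless
    if len(url) > MAX_STORED_URL:
        return None  # too long to be a real location; would overflow the journal
    return url
```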

I can't find the place in the code where the URL is saved in the download file's metadata.
I'm looking at toolkit/components/downloads/*.cpp and toolkit/system/gnome/*.cpp and extensions/gnomevfs/nsGnomeVFSProtocolHandler.cpp and extensions/gio/nsGIOProtocolHandler.cpp, but I can't find anything saving metadata to files.
Could somebody please point me to where this code lives, either in Mozilla's GVFS component or in the download manager code?

Comment 2 (3 years ago)
(In reply to Ben Bucksch (:BenB) from comment #1)
> 1. a simple exception for data: URLs (because they never make sense to save)

Sounds good, but maybe we should just limit this to "http", "https", and "ftp"? It doesn't seem the file manager would know how to open any other URI anyway. Also, for example, annotating that something was re-saved from a "file" URI is probably not very useful.

> 2. cutting URLs longer than a 2048 chars (2KB) (because they are causing the
> overflow)

Which parts should we truncate? I don't have a good answer, but I think one of these would work:
* Hash and query
* Hash, query, and path
* Everything (don't store)

I don't think we should do fixed-length truncation, both because of possible encoding issues and possibly sending unexpected query values to the server.
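The "hash and query" option above can be done safely at component boundaries with the standard library, which avoids the encoding hazard fixed-length truncation would have (cutting inside a percent-encoded sequence). A minimal sketch:

```python
# Sketch of the "strip hash and query" option: split the URL into components
# and reassemble without query or fragment, so we never cut mid-escape.

from urllib.parse import urlsplit, urlunsplit


def strip_query_and_fragment(url):
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```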

> Could somebody please help me out, where I find the code, either in
> Mozilla's GVFS component or in download manager code?

I believe the currently active version is in the "toolkit/components/jsdownloads" folder. There is legacy code in the "toolkit/components/downloads" folder, but it's not used in Firefox anymore.

Thanks for working on this!
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/jsdownloads/src/DownloadPlatform.cpp#106
Comment 3 (Reporter, 3 years ago)
> maybe we should just limit this to "http", "https", and "ftp"?

We could move toward this, but we definitely need more than just those. It's very common to save files from emails, and those have mailbox: or imap: or similar URLs, depending on the implementation. They are also important to store.

> doesn't seem the file manager would know how to open any other URI anyways

The purpose is not just re-downloads, but security treatment. Windows warns me, before launching an EXE that I downloaded from the Internet, that this is dangerous, and asks whether I'm sure. Lots of malware spreads via email, so we need these URLs. I'd rather err on the side of storing more types of URLs than fewer, so I'm still leaning towards a blacklist rather than a whitelist.

I agree that file:// URLs don't belong in there, either.

> Which parts should we truncate?

Everything. It doesn't matter where the characters come from; we just can't save more than a reasonable size, because the target doesn't support it. The hard limit, where things start to fail, is 32 KB, at least in GNOME; no idea about other platforms. But I'd rather be safe than sorry, and I think URLs longer than 2 KB are crazy anyway and/or are probably storing data instead of an actual location.

> legacy code in the "toolkit/components/downloads" folder, but it's not used in Firefox anymore.

meh! :(
A note in README.txt in the code would do good.

> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/jsdownloads/src/DownloadPlatform.cpp#106

Thanks! This looks like a hit.

Comment 4 (3 years ago)
(In reply to Ben Bucksch (:BenB) from comment #3) 
> The purpose is not just re-downloads, but security treatment. Windows warns
> me, before launching an EXE that I had downloaded from the Internet

Note that Windows uses a different mechanism, and I don't know what Linux distributions can do exactly.

I assume there are actually no special protections in place; in fact, no security warning is given on my Ubuntu-based system. But if this is relied upon for security on other systems, then we cannot just omit the information, and we should make a more informed decision.

> I'm still tending towards a blacklist rather than whitelist.

Using a whitelist rather than a blacklist is more about what we want to do with future, unknown types of URIs than about which known URIs we think are useful now. Since, as you mentioned, this may be security-relevant, and future URIs have unknown security properties, I'm more for a whitelist.
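The whitelist alternative can be sketched the same way. The scheme set below just combines the schemes named earlier in this thread ("http", "https", "ftp" plus the mail schemes from comment 3) and is illustrative, not a decided policy:

```python
# Hypothetical whitelist sketch: store metadata only for schemes with known,
# understood security properties; anything unknown is omitted by default.

ALLOWED_SCHEMES = {"http", "https", "ftp", "imap", "mailbox"}


def should_store(url):
    return url.split(":", 1)[0].lower() in ALLOWED_SCHEMES
```

The practical difference shows up on a new or exotic scheme: the blacklist stores it, the whitelist drops it.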

> > Which parts should we truncate?
> 
> Everything.

Sounds good, assuming the URI is not used for security. But if it is used to ask for an open confirmation on a mainstream Linux distribution, we should at least provide some indication that the file was downloaded from an unknown location on the Internet (maybe using a dummy URL like "http://domain.invalid/").

However, I'd like a realistic, manual test case involving at least one Linux file manager on a popular distribution before considering writing the dummy URL.

> > legacy code in the "toolkit/components/downloads" folder, but it's not used in Firefox anymore.
> 
> meh! :(
> A note in README.txt in the code would do good.

We're working towards removing this entirely (bug 851471). Thunderbird may continue to use the old code in comm-central or migrate to the new code (bug 907732).

Comment 5 (2 years ago)
(In reply to :Paolo Amadini from comment #4)
> (In reply to Ben Bucksch (:BenB) from comment #3) 
> > The purpose is not just re-downloads, but security treatment. Windows warns
> > me, before launching an EXE that I had downloaded from the Internet
> 
> Note that Windows uses a different mechanism, and I don't know what Linux
> distributions can do exactly.

Downloaded files shouldn't have the execute bit set by default, so they are not executable until the user explicitly makes them executable. If an application doesn't behave that way, the bug is in that application, not in the system itself.
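The point above is easy to check: a freshly created file carries no execute bits until someone chmods them on. A small standard-library sketch:

```python
# Check whether any execute bit (user, group, other) is set on a file.
# A freshly downloaded file should fail this check until the user opts in.

import os
import stat


def is_executable(path):
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
```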