Open Bug 1060067 Opened 11 years ago Updated 8 months ago

Saving an image with a data: URL causes gvfs to loop and kill the disk

Categories: (Toolkit :: Downloads API, defect, P5)
Platform: All Linux
People: (Reporter: BenB, Unassigned)
Details: (Keywords: hang, regression)
Reproduction: According to https://bugzilla.gnome.org/show_bug.cgi?id=637095#c47 (Ross Lagerwall [gvfs developer], 2014-08-26 14:32:06 PDT):

I have found a reproducer and the problem. Simply browse to http://rossl.org/junk/meta.html in Firefox, right-click and save the image to your Downloads folder. This should trigger constant disk usage by gvfsd-metadata.

The problem is that when an application tries to save a key-value pair larger than the size of the journal (32*1024 bytes), it triggers the journal to be flushed to make space for the entry, and the operation is then retried. But the entry never fits, so the loop continues forever.

As for why this is triggered by Firefox: Firefox saves the download URI in metadata::download-uri. If the download URI is a data: URL (like in the above link), it can easily be > 32 KB and so trigger the issue. I presume a similar thing happens with Thunderbird.

Severity: Critical, because hangs are by definition critical. Furthermore, the result is that gvfs creates and deletes millions of files, which is killing the disk, especially SSDs.
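To make the failure mode concrete, here is a minimal standalone reproducer sketch (my own illustration, not Mozilla or gvfs code; the path is made up and the file must already exist). It writes an oversized value to the same GIO attribute Firefox sets; compile with: g++ repro.cpp $(pkg-config --cflags --libs gio-2.0).

#include <gio/gio.h>
#include <string>

int main() {
  // A value well past gvfsd-metadata's 32 KB journal size.
  std::string uri = "data:image/png;base64," + std::string(64 * 1024, 'A');

  GFile* file = g_file_new_for_path("/tmp/downloaded.png");  // must exist
  GError* error = nullptr;
  // Same attribute Firefox writes for downloads; the call itself returns,
  // but gvfsd-metadata then flushes and retries its journal forever.
  g_file_set_attribute_string(file, "metadata::download-uri", uri.c_str(),
                              G_FILE_QUERY_INFO_NONE, nullptr, &error);
  if (error) {
    g_printerr("%s\n", error->message);
    g_error_free(error);
  }
  g_object_unref(file);
  return 0;
}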
Blocks: 797349
Keywords: regression
I'm trying to fix this with:
1. a simple exception for data: URLs (because they never make sense to save), *and*
2. truncating URLs longer than 2048 characters (2 KB) (because they are causing the overflow).

I can't find the place in the code where the URL is being saved in the download file's metadata. I'm looking at toolkit/components/downloads/*.cpp, toolkit/system/gnome/*.cpp, extensions/gnomevfs/nsGnomeVFSProtocolHandler.cpp and extensions/gio/nsGIOProtocolHandler.cpp, but I can't find anything saving metadata to files. Could somebody please help me out: where do I find the code, either in Mozilla's GVFS component or in the download manager code?
(In reply to Ben Bucksch (:BenB) from comment #1)

> 1. a simple exception for data: URLs (because they never make sense to save)

Sounds good, but maybe we should just limit this to "http", "https", and "ftp"? It doesn't seem the file manager would know how to open any other URI anyway. Also, for example, annotating that something was re-saved from a "file" URI is probably not very useful.

> 2. truncating URLs longer than 2048 characters (2 KB) (because they are causing the overflow)

Which parts should we truncate? I don't have a good answer, but I think one of these would work:

* Hash and query
* Hash, query, and path
* Everything (don't store)

I don't think we should do fixed-length truncation, both because of possible encoding issues and because it could send unexpected query values to the server.

> Could somebody please help me out: where do I find the code, either in Mozilla's GVFS component or in the download manager code?

I believe the currently active version is in the "toolkit/components/jsdownloads" folder. There is legacy code in the "toolkit/components/downloads" folder, but it's not used in Firefox anymore.

Thanks for working on this!

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/jsdownloads/src/DownloadPlatform.cpp#106
> maybe we should just limit this to "http", "https", and "ftp"?

We could think towards this, but we definitely need more than just these. It's very common to store files from emails, and these have mailbox: or imap: or similar URLs, depending on the implementation. They are also important to store.

> It doesn't seem the file manager would know how to open any other URI anyway

The purpose is not just re-downloads, but security treatment. Windows warns me, before launching an EXE that I had downloaded from the Internet, that this is dangerous, and asks whether I'm sure I want to do that. Lots of malware spreads via email, so we need these URLs. I'd rather err on the side of storing more types of URLs than fewer, so I'm still tending towards a blacklist rather than a whitelist. I agree that file:// URLs don't belong in there, either.

> Which parts should we truncate?

Everything. It doesn't matter where the characters come from; we just can't save more than a reasonable size, because the target doesn't support it. The hard limit, where things start to fail, is 32 KB - at least in GNOME, no idea about other platforms. But I'd rather play it safe, and I think URLs longer than 2 KB are crazy anyway and/or are probably just storing data instead of an actual location.

> legacy code in the "toolkit/components/downloads" folder, but it's not used in Firefox anymore.

meh! :(
A note in a README.txt in the code would do good.

> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/jsdownloads/src/DownloadPlatform.cpp#106

Thanks! This looks like a hit.
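For concreteness, a minimal sketch of the guard being discussed here (the helper name and exact scheme list are illustrative, not actual DownloadPlatform.cpp code):

#include <string>

// Decide whether a download URI is safe to persist as file metadata.
static bool ShouldSaveDownloadURI(const std::string& aSpec) {
  static const size_t kMaxURILength = 2048;  // well under gvfs's 32 KB limit
  if (aSpec.size() > kMaxURILength) {
    return false;  // don't truncate mid-URI; just omit the attribute
  }
  // Blacklist per this comment: data: is pure payload, file: is local anyway.
  if (aSpec.compare(0, 5, "data:") == 0 ||
      aSpec.compare(0, 5, "file:") == 0) {
    return false;
  }
  return true;
}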
(In reply to Ben Bucksch (:BenB) from comment #3)

> The purpose is not just re-downloads, but security treatment. Windows warns me, before launching an EXE that I had downloaded from the Internet

Note that Windows uses a different mechanism, and I don't know what Linux distributions can do exactly. I assume there are actually no special protections in place - in fact, no security warning is given on my Ubuntu-based system - but if this is relied upon for security on other systems, then we cannot just omit the information, and we should make a more informed decision.

> I'm still tending towards a blacklist rather than a whitelist.

Using a whitelist rather than a blacklist is more related to what we want to do with future, unknown types of URIs than to which known URIs we think are useful now. Since, as you mentioned, this may be related to security, and future URIs have unknown security properties, I'm more for a whitelist.

> > Which parts should we truncate?
>
> Everything.

Sounds good, assuming the URI is not used for security. But if this is used to ask for open confirmation on a mainstream Linux distribution, we should at least provide some indication that the file has been downloaded from an unknown location on the Internet (maybe using a dummy URL like "http://domain.invalid/"). However, I'd like a real-life, manual test case involving at least one Linux file manager on a popular distribution before considering writing the dummy URL.

> > legacy code in the "toolkit/components/downloads" folder, but it's not used in Firefox anymore.
>
> meh! :(
> A note in a README.txt in the code would do good.

We're working towards removing this entirely (bug 851471). Thunderbird may continue to use the old code in comm-central or migrate to the new code (bug 907732).
(In reply to :Paolo Amadini from comment #4)

> > The purpose is not just re-downloads, but security treatment. Windows warns me, before launching an EXE that I had downloaded from the Internet
>
> Note that Windows uses a different mechanism, and I don't know what Linux distributions can do exactly.

Downloaded files shouldn't have the execute bit set by default, so they are not executable until the user explicitly makes them executable. If some application doesn't behave like that, it is a bug in that application, not in the system itself.
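As a generic POSIX illustration of that point (not Firefox code; the path and contents are made up): a download target created with mode 0644 carries no execute bit, so attempting to run it fails until the user explicitly chmods +x.

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
  // 0644 = rw-r--r--: readable, writable by owner, but never executable.
  int fd = open("/tmp/downloaded.bin", O_CREAT | O_WRONLY | O_TRUNC, 0644);
  if (fd >= 0) {
    write(fd, "#!/bin/sh\necho hello\n", 21);  // script body, still not runnable
    close(fd);
  }
  return 0;
}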

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: critical → --

The severity field is not set for this bug.
:mak, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(mak)

This is apparently still relevant: https://searchfox.org/mozilla-central/rev/dc8348b3730c0d29dafd01c653d9151eaa9bc30f/toolkit/components/downloads/DownloadPlatform.cpp#173. Though, since it was inactive for 9 years, it looks like a low priority.

Severity: -- → S3
Flags: needinfo?(mak)
Priority: -- → P5

I've performed a review of similar mechanisms on other platforms, and they all save both the "referring page" and the "data uri" in order to avoid problems with data: and blob: URIs.

Windows stores:

WindowsQuarantineInfo:
  ZoneID      int    // 3 = Internet
  DataURL     string // HostUrl
  ReferrerURL string // ReferrerUrl

Apple stores:

AppleQuarantineInfo:
  QuarantineType        AppleQuarantineType // (enum as string) kLSQuarantineTypeKey
  DataURL               string // kLSQuarantineDataURLKey
  ReferrerURL           string // kLSQuarantineOriginURLKey
  AgentBundleIdentifier string // kLSQuarantineAgentBundleIdentifierKey
  AgentName             string // kLSQuarantineAgentNameKey
  TimeStamp             Time   // (TODO look up timestamp format) kLSQuarantineTimeStampKey

plus an OS-privileged flag indicating whether the user has cleared the warning.

Chrome on Linux used to store:

  kSourceURLXattrName[]   = "user.xdg.origin.url";
  kReferrerURLXattrName[] = "user.xdg.referrer.url";

but this was deleted because no security tools were using it: https://chromium-review.googlesource.com/c/chromium/src/+/1407441. And it had no check for excessive data: URI length either.

Capping excessive data: URIs to just "data:" and storing the page referrer instead would work, but there's no established metadata key to do that with.

I would like to advocate that both the download URL and the referrer URL always be stored if they are stored at all, given how often browser extensions need to use data: or blob: URIs in order to download anything useful.
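As a sketch of what that could look like on Linux (the helper name and the capping policy are my suggestion, not shipped code; the xattr names are the ones Chrome used; return values ignored for brevity):

#include <sys/xattr.h>
#include <string>

// Store both origin and referrer, capping a data:/blob: origin to its
// bare scheme so the referrer still identifies where the download came from.
static void SaveDownloadXattrs(const char* path,
                               std::string origin,
                               const std::string& referrer) {
  if (origin.compare(0, 5, "data:") == 0 ||
      origin.compare(0, 5, "blob:") == 0) {
    origin.resize(5);  // keep just "data:" / "blob:"
  }
  setxattr(path, "user.xdg.origin.url", origin.data(), origin.size(), 0);
  setxattr(path, "user.xdg.referrer.url", referrer.data(), referrer.size(), 0);
}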
