Open Bug 1254327 Opened 8 years ago Updated 9 months ago

browser.downloads.download() doesn't handle redirects

Categories

(WebExtensions :: Request Handling, defect, P3)

defect

Tracking

(Not tracked)

People

(Reporter: aswan, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [downloads] triaged)

Attachments

(2 files, 1 obsolete file)

When we start a download with browser.downloads.download() for a URL that generates an HTTP redirect, ideally we would use the redirect target to choose the default name for the download, but the API exposed by Downloads.jsm doesn't allow for this: we have to provide the URL and the target path before the download is started, but we don't get the redirect until trying the download

As a concrete example, the URL https://download.mozilla.org/?product=firefox-44.0.2-SSL&os=osx&lang=en-US redirects to https://download-installer.cdn.mozilla.net/pub/firefox/releases/44.0.2/mac/en-US/Firefox%2044.0.2.dmg.  The original URL has no useful path information for deriving mime type, file icon, etc., but the redirected one does.
Paolo, any thoughts on how to approach this one?
Flags: needinfo?(paolo.mozmail)
Talked about this with Andrew, we'll have to extend the Downloads.jsm API to opt in to having a dynamic DownloadTarget whose file name is not assigned prior to the download starting. If done right this will also help in unifying the logic for handling download file names, that is currently scattered in various places. We'll have to carefully evaluate how to avoid breaking the assumptions made by the front-end about the target file name being static.
Flags: needinfo?(paolo.mozmail)
Whiteboard: [downloads]
Assignee: nobody → aswan
Paolo, here's a quick and dirty attempt to do what we discussed a few weeks ago.
I had to break the model used widely within DownloadCore.jsm that all the
objects (Download, DownloadSaver, DownloadTarget, etc) are simple serializable
data structures, but I didn't rename methods like {to,from}Serializable.

This works for me locally in a very simple test case, I would obviously clean
this up and add thorough tests before landing anything, just wanted to get some
feedback before getting too deep into that work.

MozReview-Commit-ID: CH9quj5fDpf
Attachment #8734985 - Flags: feedback?(paolo.mozmail)
Comment on attachment 8734985 [details] [diff] [review]
prototype changes for bug 1254327

> I had to break the model used widely within DownloadCore.jsm that all the
> objects (Download, DownloadSaver, DownloadTarget, etc) are simple
> serializable
> data structures, but I didn't rename methods like {to,from}Serializable.

That's what I though as well - downloads for which the final target file name has not been determined would not be serialized across sessions. It's not incompatible with the Downloads.jsm model, that has a few other non-serializable download cases as well.

> This works for me locally in a very simple test case, I would obviously clean
> this up and add thorough tests before landing anything, just wanted to get
> some
> feedback before getting too deep into that work.

I won't be able to provide thorough feedback until at least next week. For now what I can say is that I think it's fine for this to be limited to DownloadCopySaver downloads initiated by Downloads.jsm, which will not be added to a DownloadList until the filename has been determined. We'll have to put checks in place on DownloadList.add and the other DownloadSaver.execute implementations to ensure that both conditions are true.

A different case would exist for WebExtensions that want to set up a file name callback for downloads not initiated by chrome.downloads. It will be difficult to do that for DownloadLegacy download anyways, so we can do that in a different bug if needed.

I'm not sure what's the best approach here - we'll need multiple callbacks, but anything that requires the callbacks to be registered when a download is added on a DownloadList would probably race with the addition itself, as the download may be already finished when that happens.

Maybe we'll just end up adding a global DownloadIntegration callback - see bug 899013 comment 2 for something the WebExtensions code might use to register a global integration override, once the bug lands.
Attachment #8734985 - Flags: feedback?(paolo.mozmail) → feedback+
Thanks for the feedback!  I'll clean everything up and hopefully have something on mozreview in the next day or two.

(In reply to :Paolo Amadini from comment #4)
> A different case would exist for WebExtensions that want to set up a file
> name callback for downloads not initiated by chrome.downloads. It will be
> difficult to do that for DownloadLegacy download anyways, so we can do that
> in a different bug if needed.

Yeah, we have bug 1245652 for that, I don't expect that to be done in 48.
We can revisit this part of the discussion when we get around to working on that bug...
Attachment #8734985 - Attachment is obsolete: true
Attachment #8736121 - Flags: feedback?(mwein)
Attachment #8736121 - Flags: feedback?(lgreco)
Comment on attachment 8736121 [details]
MozReview Request: Bug 1254327 (Part 2) Use filename callback from chrome.downloads.download() r?kmag

https://reviewboard.mozilla.org/r/43113/#review39895

::: toolkit/components/extensions/ext-downloads.js:440
(Diff revision 1)
> -          let target;
> +            let target;
> -          if (options.filename) {
> +            if (options.filename) {
> -            target = OS.Path.join(downloadsDir, options.filename);
> +              target = OS.Path.join(downloadsDir, options.filename);
> -          } else {
> +            } else {
> -            let uri = NetUtil.newURI(options.url).QueryInterface(Ci.nsIURL);
> -            target = OS.Path.join(downloadsDir, uri.fileName);
> +              let uri = request.URI;
> +              uri.QueryInterface(Ci.nsIURL);

`let uri = request.URI.QueryInterface(Ci.nsIURL)`

::: toolkit/components/extensions/ext-downloads.js:443
(Diff revision 1)
> -          } else {
> +            } else {
> -            let uri = NetUtil.newURI(options.url).QueryInterface(Ci.nsIURL);
> -            target = OS.Path.join(downloadsDir, uri.fileName);
> +              let uri = request.URI;
> +              uri.QueryInterface(Ci.nsIURL);
> +              let filename = uri.fileName;
> +              if (filename == "") {
> +                filename = "download";

How about `let filename = uri.fileName || "download"`?

::: toolkit/components/extensions/ext-downloads.js
(Diff revision 1)
> -          }
> +            }
>  
> -          // This has a race, something else could come along and create
> +            let promise = OS.File.exists(target).then(exists => {
> -          // the file between this test and them time the download code
> -          // creates the target file.  But we can't easily fix it without
> -          // modifying DownloadCore so we live with it for now.

I think this comment is still relavent.

::: toolkit/components/extensions/ext-downloads.js:482
(Diff revision 1)
> -            download.start();
> +              download.start();
> +              return download;
> +            });
>  
> +        return Promise.all([downloadPromise, targetPromise, DownloadMap.getDownloadList()])
> +          .then(([download, unused, list]) => {

I find `unused` a bit confusing. I'd rather omit that element, which is probably clearest if you move `targetPromise` to the end of the Promise.all array.
Attachment #8736121 - Flags: review?(kmaglione+bmo) → review+
https://reviewboard.mozilla.org/r/43113/#review39895

> I find `unused` a bit confusing. I'd rather omit that element, which is probably clearest if you move `targetPromise` to the end of the Promise.all array.

I found this a little confusing at first, too.  I also don't really understand what `list` refers to here.  Is it a list of already downloaded items? If so, could you consider renaming it to `downloadedList`?
https://reviewboard.mozilla.org/r/43113/#review39895

> I think this comment is still relavent.

Code in DownloadCore which is chained to the resolution of this promise moves the downloading file into the provided name, so the race is only with something outside of the browser and the window is very short.  Its still a race, but at this point it hardly seems worth mentioning (and I don't think it would be fixable without actually opening the file here and passing the stream or whatever objects represents the open file back to the download code, which seems like something we're unlikely to do if fixing this race is the only reason).  But if you think its important, I can put it back...

> I found this a little confusing at first, too.  I also don't really understand what `list` refers to here.  Is it a list of already downloaded items? If so, could you consider renaming it to `downloadedList`?

mattw: see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all, list is the third item in the array so it is the resolution of the third thing passed to Promise.all(), ie the result of DownloadMap.getDownloadList().
I'm surprised to hear that the name unused is confusing.  My instinct is that having different numbers of elements in the array passed to Promise.all() and the destructured array that receives its result is more confusing than explicitly labeling that one of the results is not used.
https://reviewboard.mozilla.org/r/43113/#review39895

> Code in DownloadCore which is chained to the resolution of this promise moves the downloading file into the provided name, so the race is only with something outside of the browser and the window is very short.  Its still a race, but at this point it hardly seems worth mentioning (and I don't think it would be fixable without actually opening the file here and passing the stream or whatever objects represents the open file back to the download code, which seems like something we're unlikely to do if fixing this race is the only reason).  But if you think its important, I can put it back...

What about cases like:

    for (let i = 0; i < 3; i++)
      browser.downloads.download({url: `http://example.com/archives-${i}/photos.zip`,
                                  conflictAction: "uniquify"});

Three downloads start, three calls to `createTarget` happen on the same
thread, at the same time, three requests go the the OS.File worker to check
whether photos.zip exists. The worker checks once, dispatches a message to the
main thread that it doesn't, immediately checks again for the second request,
and immediately replies that the file still doesn't exist, and then does the
same again.

That's probably not incredibly likely to happen, but it's not especially
unlikely either.
https://reviewboard.mozilla.org/r/43113/#review39895

> mattw: see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all, list is the third item in the array so it is the resolution of the third thing passed to Promise.all(), ie the result of DownloadMap.getDownloadList().
> I'm surprised to hear that the name unused is confusing.  My instinct is that having different numbers of elements in the array passed to Promise.all() and the destructured array that receives its result is more confusing than explicitly labeling that one of the results is not used.

I should've been more clear. I think that using the variable name `list` could be more informative.  It tells me that DownloadMap.getDownloadList() returns a list, but I think it could be better if it gave more information about what the list contains.

For the .then(...) call, you could also do something like .then(([download, /\* unused \*/, list]) => { ... }).
Attachment #8736121 - Flags: feedback?(mwein) → feedback+
Andrew, congratulations for opening your first can of worms branded Firefox Source Code! Hope they're tasty, but if not, at least they're said to prevent headaches later.

Seriously, as I mentioned before, there is more work to do than it may appear. I've looked at this from different angles and in all cases I don't see real shortcuts. Conversely, I see lots of pitfalls where taking the short path might easily backfire, and eventually be more work for both my team and yours.

One pretty obvious example (and we may have mentioned this) is that taking any raw filename from a remote source to use locally is a security hazard - you risk writing to Alternate Data Streams, using wrong extensions, or other nasty things, and in the best case you'll get weirdly encoded names. I've not seen a release blocker bug for this filed in WebExtensions yet but there should be. My point is that we may be able to factor this logic to some common place first, so it's trivial to use in WebExtensions once it's time.

There are other examples that are not obvious at all. The patches on this bug (a really good start) helped me remember some of them. A more detailed review will follow shortly. Future reviews should be faster!

To be clear, I'm not saying we should do all the work immediately, but we must have it planned and the dependencies made explicit. This will allow us to do most of the changes only once, in the most efficient way, and if we then choose to cut some parts, we'll understand exactly what will be affected.

The short version of the story is that, one way or the other, we have to rebuild the logic from nsExternalHelperAppService to make this work at all. The upside is that if we do this right, then we'll substantially lower the cost of taking out the legacy code from nsExternalHelperAppService and getting rid of DownloadLegacySaver.
Comment on attachment 8736120 [details]
MozReview Request: Bug 1254327 (Part 1) Add filename callback to DownloadTarget r?paolo

https://reviewboard.mozilla.org/r/43111/#review40579

When filing lots of bugs with the same component and dependencies, the "remember values as a bookmarkable template" button is very handy. I've used this feature a lot. Yes, I mentioned this for a reason ;-)

Thanks for your patience, I'm sure new things will come up but I hope I have identified most of the core issues that may block this bug.

Feel free to file more bugs than those I requested here if you think it's needed, and structure them in the way it makes more sense. In any case I suggest moving the code on this bug to new bugs, or at least the Downloads.jsm parts to "Toolkit :: Download Manager" for clarity.

Let me know here or on other bugs if you have any questions!

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:1352
(Diff revision 1)
>    /**
>     * String containing the path of the target file.
>     */
>    path: null,

We shoudln't overload "path" to mean different things over time (in this case for providing a download directory to use by the name logic). My first thought is that this should be null if we don't have a finalized target path.

Should this be able to transition from null to a different path? Probably so. It should also fire a change notification for those consumers who can handle it. Do we need to fire the notification now? Maybe not, but file a bug if we don't implement it now. The notification needs a specific test.

Should this be able to transition from one value to the other? The chrome.downloads documentation doesn't say anything, looks like the path can change if a download is in progress. But again, this might not be the case in practice. File a bug to support this case later - we'll WONTFIX it if we figure out we don't need this ability.

In any case, the new expectations of this property should be documented on the property itself (based on what we implement but also what we don't implement).

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:1357
(Diff revision 1)
>    /**
>     * String containing the path of the ".part" file containing the data
>     * downloaded so far, or null to disable the use of a ".part" file to keep
>     * partially downloaded data.
>     */
>    partFilePath: null,

Just a thought, maybe we should have a caller-provided temporaryFilePath property that is required if "path" is null.

What happens if the download is canceled or fails before the filename is determined? We need a test for this whatever the answer.

We need a test to ensure that when tryToKeepPartialData is true, we create a download that can be resumed, even if partFilePath is initially null when the download is created.

If the target path has not been determined, it's quite possible we don't want to keep the temporary file around for resuming, so the download should not be marked as resumable during that time frame. We need a test for this also.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:1364
(Diff revision 1)
>    /**
>     * Indicates whether the target file exists.
>     *
>     * This is a dynamic property updated when the download finishes or when the
>     * "refresh" method of the Download object is called. It can be used by the
>     * front-end to reduce I/O compared to checking the target file directly.
>     */
>    exists: false,

Document how "exists" works for non-determined filenames. Test.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:1373
(Diff revision 1)
>    /**
>     * Size in bytes of the target file, or zero if the download has not finished.
>     *

What about non-determined filenames? Maybe this should be zero as well. Check the code that updates this property. Test.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:1399
(Diff revision 1)
> +   * (and optional partFilePath).  If non-null, should be a callable
> +   * that takes an nsIRequest and returns a promise that resolves with
> +   * the target filename.  The callback will be invoked upon getting

I'm a bit undecided on how much we should isolate this from the core of the Downloads.jsm code.

Until now, DownloadCore.jsm hasn't dealt with any path modification, and this is good for separation of concerns. If we keep it this way, the return value should include the target path and the partFilePath separately - the ".part" file could be stored in a different directory or drive.

If we don't keep it this way, we should have the function return a suggested file name from which we can derive the relevant paths, which we may also valiadte. I'm thinking this might make the logic less flexible, though. We might consider having two different overridable callbacks, onDeterminePaths and an onDetermineFilename, where a default implementation for onDeterminePaths invokes onDetermineFilename.

The provided argument now is aRequest, but this might not be all that is needed to choose a file name. Some downloads may not even have a network request, for example those from DownloadPDFSaver, and some may have more than one, like complete web pages as HTML. We won't support deferred naming for these types of downloads so it's not a concern now, but new types of downloads might.

I'm not sure what is the right level of abstraction here. We should make the argument an object in any case, for flexibility. Whether "request" is one of its properties is up for discussion.

A possible failure mode is that this callback is called asynchronously, and the request might have failed by the time we read its properties. In all cases, we should have exception handling for the callback. And a test for it.

Separately, I wanted to note that a different take at the same problem is to support a method like setPaths() that is callable on DownloadTarget objects whose "path" is null, at any time. This could be called, for example, in response to an "onGotRequestHeaders()" or similar notification from the download. I think in the end it's pretty much equivalent - exactly the same race conditions to be aware of and test.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:1403
(Diff revision 1)
> +   * have already been handled and the callback can consult the actual
> +   * URI being downloaded, can check for HTTP errors, etc.  Note that even

What can the callback do if it detects an HTTP error and it does not want to continue? File a bug to figure this out. If the caller can do something about it, this needs a test.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:1410
(Diff revision 1)
> +  /**
> +   * Indicates if the current target path has been finalized.
> +   *
> +   * When a finalize callback is being used (see above), this property is
> +   * true until the callback has completed and the actual target has been
> +   * established, at which points it becomes false.
> +   */
> +  isTempFile: false,
> +

Setting "path" to "null" instead of using this property is consistent with how we treat similar cases in Downloads.jsm.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:1449
(Diff revision 1)
> +  isSerializable: function() {
> +    return (this._finalizeFilename == null && !this.isTempFile);
> +  },

This method is not needed and can be inlined, see below.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:1466
(Diff revision 1)
>      // Simplify the representation if we don't have other details.
>      if (!this.partFilePath && !this._unknownProperties) {
>        return this.path;
>      }

Document why we don't need to change the conditions linked to this simplification code.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:1507
(Diff revision 1)
> +    if ("filenameCallback" in aSerializable) {
> +      target._finalizeFilename = aSerializable.filenameCallback;
> +    }

We might as well use the same name for both properties. In any case, before this lands and ideally when we're set on a specific interface and have documented it, we should get another opinion on it (super-review) from Marco Bonardo.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:1941
(Diff revision 1)
> -      // Add the download to history the first time it is started in this
> +        // Add the download to history the first time it is started in this
> -      // session.  If the download is restarted in a different session, a new
> +        // session.  If the download is restarted in a different session, a new
> -      // history visit will be added.  We do this just to avoid the complexity
> +        // history visit will be added.  We do this just to avoid the complexity
> -      // of serializing this state between sessions, since adding a new visit
> +        // of serializing this state between sessions, since adding a new visit
> -      // does not have any noticeable side effect.
> +        // does not have any noticeable side effect.
> -      if (!this.alreadyAddedToHistory) {
> +        if (!this.alreadyAddedToHistory) {
> -        this.addToHistory();
> +          this.addToHistory();
> -        this.alreadyAddedToHistory = true;
> +          this.alreadyAddedToHistory = true;
> -      }
> +        }

Basically, the condition that triggers our ability to create a history entry is that we have a determined "DownloadTarget.path".

In the first implementation, this is also the condition for which downloads will appear on the public DownloadList. I think that architecture-wise it's simply much better to move this code to a listener, that is set up among others in DownloadIntegration.

Please file a separate bug on this, it's a self-contained change that we could start right away. While doing this we have to figure out whether this could race with changes to Places made in response to the download failing after being added. The good thing is that once we figure it out there, DownloadCore.jsm is free to ignore this issue.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:2131
(Diff revision 1)
> +                let tmpfile = new FileUtils.File(targetPath);
> +                if (!tmpfile.isDirectory()) {
> +                  tmpfile = tmpfile.parent;
> +                }
> +                uri.QueryInterface(Ci.nsIURL);
> +                tmpfile.append("inprogress." + uri.fileExtension);
> +                tmpfile.createUnique(Ci.nsIFile.NORMAL_FILE_TYPE, FileUtils.PERMS_FILE);

nsIFile.createUnique does main-thread I/O and this code runs in the parent process. We have a rule against adding new instances of main-thread I/O to the parent process anyways, but this is especially true for downloads where jank (the interface of Firefox becoming unresponsive) has been a serious problem in the past.

Also, the race condition that Kris mentioned is a very real problem. We should probably expect whoever provides the file name to also create a placeholder file. Since the name is now determined, if the placeholder is not empty, it will be overwritten. We need a test for this.

Note that once we have the final file name, we should set the new DownloadTarget properties as soon as possible so that the download can be serialized to "downloads.json" and potentially resumed or erased if Firefox crashes. However, the serialization of "downloads.json" happens asynchronously and we must have created the target file as a placeholder before we know which file name it will have, so we should document somewhere that during this short time frame, we may lose track of the download and leave an empty placeholder behind. There's no practical way around this.

Downloads that can actually be resumed must have a defined target path and partFilePath. These must be present before a new "download.start()" call is made, otherwise we wouldn't be able to use the partial data in the first place, but also any data already present would be overwritten because of the above.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:2147
(Diff revision 1)
> +                  if (keepPartialData) {
> +                    download.target.partFilePath = filename + ".part";
> +                    OS.File.move(tmpfile.path, download.target.partFilePath);
> +                  } else {
> +                    OS.File.move(tmpfile.path, download.target.path);
> +                  }

This will definitely not work on every file system where there is a lock on the name of "tmpfile" while it's open for writing (say Windows, both FAT and NTFS).

Also, as a general note, every time you start a Promise chain you must terminate it. Sometimes it would just be a ".catch(Cu.reportError)", but in this case this should be clearly linked to the download completion.

However, good news! BackgroundFileSaver.setTarget was designed with this in mind and you can call it at any time to change your mind on the name. The BackgroundFileSaver will do all the heavy duty work on a separate thread, and propagate all errors to the original listeners.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:2561
(Diff revision 1)
>    /**
>     * Implements "DownloadSaver.execute".
>     */
>    execute: function DLS_execute(aSetProgressBytesFn)
>    {

For clarity, we should throw an exception here if the target path is not determined. Document why.

::: toolkit/components/jsdownloads/src/DownloadCore.jsm:2794
(Diff revision 1)
>    /**
>     * Implements "DownloadSaver.execute".
>     */
>    execute: function (aSetProgressBytesFn, aSetPropertiesFn)
>    {

And more importantly, the same exception here.

::: toolkit/components/jsdownloads/src/DownloadList.jsm:93
(Diff revision 1)
> +    if (!aDownload.target.isSerializable()) {
> +      throw new Error("Cannot add download since its target is not yet serializable");
> +    }

The discrimination here is not that the target is not serializable, as non-serializable downloads can be added for the session. The point is that we don't have a determined "DownloadTarget.path", and all other consumers expect this property to be present and immutable.

We should have a bug on file to remove this constraint once other consumers are updated, and reference it from here.

One thing we could do is to implement a FilteredDownloadList that only shows downloads with a certain condition, for example "download.target.path !== null". This would be returned by default to current consumers, but an option to Downloads.getList() would allow new consumers to also see downloads that just started without a filename. WebExtensions might use the latter list if "chrome.downloads" requires extensions to see this type of downloads as well.

::: toolkit/components/jsdownloads/test/unit/common_test_Download.js:2327
(Diff revision 1)
> +function makeResolver(fn) {
> +  let resolve;
> +  let promise = new Promise(r => { resolve = r; });

General style note, use descriptive function names, especially for helpers in the global scope. Also all helpers need JSDoc-style documentation.

I think this is an instance where the helper name is generic because its boundaries are unclear - we resolve a returned promise with no data, then we call the given function but only pass an URI.

Anyways, we want to test much more than the URI, so this helper might not be needed after all.

::: toolkit/components/jsdownloads/test/unit/common_test_Download.js:2341
(Diff revision 1)
> +/**
> + * Test using a filename callback to set the filename
> + */
> +add_task(function* test_filenameCallback() {

There will be several specific tests in the end, for each one state which case the test is designed to verify. No need to be super detailed though.

::: toolkit/components/jsdownloads/test/unit/common_test_Download.js:2379
(Diff revision 1)
> +  const path = "/redirect";
> +  gHttpServer.registerPathHandler(path, (req, resp) => {
> +    resp.setStatusLine(req.httpVersion, 301, "Moved Permanently");
> +    resp.setHeader("Location", httpUrl("source.txt"), false);
> +    return;
> +  });
> +  do_register_cleanup(() => { gHttpServer.registerPathHandler(path, null); });

Great test, we should keep test cases clean like this one.
Attachment #8736120 - Flags: review?(paolo.mozmail)
https://reviewboard.mozilla.org/r/43113/#review40585

::: toolkit/components/extensions/ext-downloads.js:482
(Diff revision 1)
> -            download.start();
> +              download.start();
> +              return download;
> +            });
>  
> +        return Promise.all([downloadPromise, targetPromise, DownloadMap.getDownloadList()])
> +          .then(([download, unused, list]) => {

Rewriting the entire function with Task.jsm would make the code much easier to read, and Promise.all and its argument would disappear.
Priority: -- → P3
Whiteboard: [downloads] → [downloads] triaged
Reminder to at least file and triage a bug to evaluate the potential security issues mentioned in comment 14, if one isn't on file already. This should be a release blocker for "chrome.downloads".
Flags: needinfo?(aswan)
(In reply to :Paolo Amadini from comment #14)
> One pretty obvious example (and we may have mentioned this) is that taking
> any raw filename from a remote source to use locally is a security hazard -
> you risk writing to Alternate Data Streams, using wrong extensions, or other
> nasty things, and in the best case you'll get weirdly encoded names. I've
> not seen a release blocker bug for this filed in WebExtensions yet but there
> should be. My point is that we may be able to factor this logic to some
> common place first, so it's trivial to use in WebExtensions once it's time.

Sorry for going silent for a while but now that 48 is done I can give this some attention.
But I don't really understand the issue(s?) here.  I don't know what you mean by "Alternate Data Streams".  And by wrong extensions, you mean something like downloading a .zip file but saving it with an extension of .pdf?  I'm not seeing how that is a security hazard?  And can you be more specific about the "other nasty things"?
Flags: needinfo?(aswan) → needinfo?(paolo.mozmail)
(In reply to Andrew Swan [:aswan] from comment #18)
> I don't know what you mean by "Alternate Data Streams".

There is a thorough article here:

https://blogs.technet.microsoft.com/askcore/2013/03/24/alternate-data-streams-in-ntfs/

Notably, an ADS is used to store security zone information.

> And by wrong extensions, you mean
> something like downloading a .zip file but saving it with an extension of
> .pdf?  I'm not seeing how that is a security hazard?

On Windows, the file extension determines if a file is considered executable. We provide additional warnings for executable files. Various document types are considered executable:

http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileWin.cpp#2939

Unfortunately the logic to ensure we put the right extension on the file is split in several places, I don't have a clear specification for what the actual behavior is:

http://mxr.mozilla.org/mozilla-central/search?string=IsExecutable

I think we strictly enforce the extension for executable files and allow the user to change it manually in file name dialogs for non-executable files. Some of this might not apply to non-interactive use cases, but this is still something that should be evaluated.

> And can you be more specific about the "other nasty things"?

Backreferences may be a typical example. I think the nsIFile implementation already disallows them in a leaf name, I'm not sure about OS.File though.

There may well be other hazards I'm not aware of, maybe you could add control characters or leverage issues with different encodings for example. Sanitizing the file name avoids these unknown issues by only allowing what we know is safe and filtering out the rest.
Flags: needinfo?(paolo.mozmail)
(In reply to :Paolo Amadini from comment #19)
> (In reply to Andrew Swan [:aswan] from comment #18)
> > I don't know what you mean by "Alternate Data Streams".
> 
> There is a thorough article here:
> 
> https://blogs.technet.microsoft.com/askcore/2013/03/24/alternate-data-
> streams-in-ntfs/
> 
> Notably, an ADS is used to store security zone information.

Okay.  I'm having trouble finding the existing code that guards against inadvertently downloading into an alternate data stream, can you point me at it?

> > And by wrong extensions, you mean
> > something like downloading a .zip file but saving it with an extension of
> > .pdf?  I'm not seeing how that is a security hazard?
> 
> On Windows, the file extension determines if a file is considered
> executable. We provide additional warnings for executable files.

I assume these warnings live outside of the DownloadCore module which means we're not showing them at all on downloads that are started programmatically from an extension.  I don't really feel equipped to evaluate whether that's something we need to change (to state the obvious: existing addons that use DownloadCore directly don't display these warnings).  Who is the right person to make a decision on this?

> > And can you be more specific about the "other nasty things"?
> 
> Backreferences may be a typical example.

This we cover here: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/extensions/ext-downloads.js#401-411

> There may well be other hazards I'm not aware of, maybe you could add
> control characters or leverage issues with different encodings for example.
> Sanitizing the file name avoids these unknown issues by only allowing what
> we know is safe and filtering out the rest.

Can you point me to the existing code that handles this so I can adapt or re-use it?  I'm sorry I'm unable to find it myself but I've never managed to grasp the layout of the current downloads implementation.
(In reply to Andrew Swan [:aswan] from comment #20)
> Okay.  I'm having trouble finding the existing code that guards against
> inadvertently downloading into an alternate data stream, can you point me at
> it?

It may be a while before I can provide more detailed answers, in the meantime here are some starting points for different download code paths, you should probably follow the code outside of these functions since other validations may occur at different stages.

http://mxr.mozilla.org/mozilla-central/source/toolkit/content/contentAreaUtils.js#1105
http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/downloads/nsHelperAppDlg.js#396
http://mxr.mozilla.org/mozilla-central/source/mobile/android/components/HelperAppDialog.js#304

> > On Windows, the file extension determines if a file is considered
> > executable. We provide additional warnings for executable files.
> 
> I assume these warnings live outside of the DownloadCore module which means
> we're not showing them at all on downloads that are started programmatically
> from an extension.  I don't really feel equipped to evaluate whether that's
> something we need to change

Add-ons can start downloads that appear in the Downloads Panel, this means that the logic in the Firefox front-end determines if a warning is shown or not, even if the download is started from a WebExtension.

> (to state the obvious: existing addons that use
> DownloadCore directly don't display these warnings).

If an existing add-on uses the .launch() method, the warning is still displayed if the file is executable:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/jsdownloads/src/DownloadIntegration.jsm#715

> Who is the right person to make a decision on this?

WebExtensions is part of the Firefox product, but I feel you should evaluate this within the team first. You may conclude this is not an issue. Since I don't have all the context, I cannot make a decision here. You may also file a bug to schedule a review at later stages, when the API is more mature and can be tested with actual WebExtensions.

> > Backreferences may be a typical example.
> 
> This we cover here:
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/extensions/
> ext-downloads.js#401-411

This code only runs for addon-provided paths, but doesn't run if the addon specifies "null" and you use the name from the URI. In that case, however, backreferences are probably not an issue for now because I've noticed you don't unescape characters yet, so backslashes appear as percent-encoded in the final filename.

Adding validation of the file name would probably mitigate most of these issues.
Depends on: 1282165
Component: WebExtensions: Untriaged → WebExtensions: Request Handling
Attachment #8736121 - Flags: feedback?(lgreco)
Assignee: aswan → nobody
As described in the bug I just DUPed to this one, we should also handle Content-Disposition headers if/when we have the ability to choose the target file name after beginning the download.
Product: Toolkit → WebExtensions
See Also: → 1245652
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: