Open Bug 1245652 Opened 8 years ago Updated 9 months ago

Implement chrome.downloads.onDeterminingFilename

Categories

(WebExtensions :: Request Handling, enhancement, P3)

enhancement

Tracking

(Not tracked)

People

(Reporter: aswan, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [downloads]triaged)

Attachments

(2 files)

Part of supporting chrome.downloads in WebExtensions

https://developer.chrome.com/extensions/downloads#event-onDeterminingFilename
Blocks: 1213445
Whiteboard: [downloads]
See also bug 1123040 comment 18 and the rest of the discussion there.
See Also: → 1123040
Whiteboard: [downloads] → [downloads]triaged
Component: WebExtensions: Untriaged → WebExtensions: Request Handling
Priority: -- → P3
There's only 95 extensions in Chrome that use this API, so its very sparse, but I do think it will help with some of the concerns around existing Firefox extensions. It would allow extension developers to save files into a directory based on rules determined by the extension. Some of the extensions on bug 1246236 would benefit from this.
webextensions: --- → ?
Andy thank you for answer!
In fact, I do not know any other way to capture the downloading FTP file. Only onDeterminingFilename and it works well in chrome.
Our extension captures downloads and helps to download files.
Now there is a problem with FTP downloading.
If you could add this method to your API in the future it would be very usefull.
webextensions: ? → +
Assignee: nobody → mstriemer
onDeterminingFilename is a must have feature to catch all browser's downloads in a right way (e.g. for a download manager).
After looking at a few factors: how complicated this is to implement in Firefox, how different our internals are from Chrome and the amount of usage it gets in Chrome Extensions (very little) I'm going to take this off the webextensions+ list for 57.

I doubt we'll be able to get this done in time and if we do it, we'd need to do it with some real underlying changes to Firefox.
webextensions: + → ---

(In reply to Andy McKay [:andym] from comment #6)
> After looking at a few factors: how complicated this is to implement in
> Firefox, how different our internals are from Chrome and the amount of usage
> it gets in Chrome Extensions (very little) I'm going to take this off the
> webextensions+ list for 57.

Hi Andy, I am working on a prototype (bug 1357160) that need some way to update target.path after download is created and since your team explored, it will be great if you can share issues found in getting this implemented. Thanks


> I doubt we'll be able to get this done in time and if we do it, we'd need to
> do it with some real underlying changes to Firefox.
Flags: needinfo?(amckay)
(In reply to Punam Dahiya [:pdahiya] from comment #7)
> Hi Andy, I am working on a prototype (bug 1357160) that need some way to
> update target.path after download is created and since your team explored,
> it will be great if you can share issues found in getting this implemented.
> Thanks

In short, there are at least three different paths for starting downloads in Firefox, and not all of them are written in a way that facilitates asynchronously determining the target filename.  The three paths I know of are:
1. Downloads.createDownload()   http://searchfox.org/mozilla-central/rev/20963d7269b1b14d455f47bc0260d0653015bf84/toolkit/components/jsdownloads/src/Downloads.jsm#70
2. ContentAreaUtils.DownloadURL()  http://searchfox.org/mozilla-central/rev/20963d7269b1b14d455f47bc0260d0653015bf84/toolkit/content/contentAreaUtils.js#793
3. From uriloader/exthandler/nsExternalHelperAppService.cpp (difficult to point to a single line of code there...)

It is the third path that is most problematic for onDeterminingFilename.  The ideal solution would be significant refactoring of that code path to move a bunch of the logic into the Download objects in toolkit/components/jsdownloads/ but that is a pretty substantial undertaking.
Flags: needinfo?(amckay)
This seems to be a big issue for Downloader add-ons. (i.e https://github.com/marklieberman/downloadstar/issues/9) How hard would it be to just support this for downloads created with chrome.downloads API?
(In reply to Tom Schuster [:evilpie] from comment #9)
> How hard would it be
> to just support this for downloads created with chrome.downloads API?

I'm not sure, but why couldn't an extension that wants to use this on its own downloads just supply the filename parameter when it first creates the download?
(In reply to Andrew Swan [:aswan] from comment #10)
> (In reply to Tom Schuster [:evilpie] from comment #9)
> > How hard would it be
> > to just support this for downloads created with chrome.downloads API?
> 
> I'm not sure, but why couldn't an extension that wants to use this on its
> own downloads just supply the filename parameter when it first creates the
> download?

That was already explained in the link given.

It requires each and every download be requested via a lower-level API like XMLHttpRequest to determine the filename, then cancelled, then requested again through chrome.downloads, which has significant overhead and breaks sites which only allow a single request per generated URL.
(In reply to Stephan Sokolow from comment #11)
> (In reply to Andrew Swan [:aswan] from comment #10)
> > (In reply to Tom Schuster [:evilpie] from comment #9)
> > > How hard would it be
> > > to just support this for downloads created with chrome.downloads API?
> > 
> > I'm not sure, but why couldn't an extension that wants to use this on its
> > own downloads just supply the filename parameter when it first creates the
> > download?
> 
> That was already explained in the link given.
> 
> It requires each and every download be requested via a lower-level API like
> XMLHttpRequest to determine the filename, then cancelled, then requested
> again through chrome.downloads, which has significant overhead and breaks
> sites which only allow a single request per generated URL.

For that scenario, this bug alone is not enough, bug 1254327 would also need to be fixed.
I looked into this a bit, but it seems even fixing this just for downloads started with browser.downloads.download is hard. From what I can tell we always create the target file path/name even before starting the download. We don't even currently seem to look at the Content-Disposition header.
Product: Toolkit → WebExtensions
See Also: → 1579456

I had tried this at some point in the past and also concluded that it is hard. The API didn't seem compatible with how downloads are handled in Firefox.

Assignee: mstriemer → nobody
Type: defect → enhancement
  1. Move the "destination stuff" (conflict action, making directories) into DownloadCore.
  2. Implement optional Content-Disposition handling in DownloadCore (which is a nice to have regardless), falling back to deriving the name from either the original URL or the final URL (not sure which would be best). With some sanity checks, of course. Optional in that it should be possible to force a name (ext-downloads needs this). Not sure if legacy/pdf need to actually handle this, as it's probably handled elsewhere already, but copysaver certainly does.
  3. Have DownloadCore notify the web extensions backend (and/or any other interested parties) and wait for the responses before moving the file to the final target path. A simple registerSuggestionProvider() type API where a provider is essentially a function (download): Promise<string?> would suffice to track what to notify.
  4. Have ext-downloads observe the notifications, and in turn fire onDeterminingFilename on extension listeners, validate the results according to API rules and report the result back to DownloadsCore.
  5. DownloadsCore collects the suggestions, validates according to it's rules (things like reserved names in Windows and/or reserved characters should be avoided) and if any is acceptable changes the final target path accordingly, considering the conflict action.
  6. DownloadsCore creates the directories as necessary and moves the file to the final target path.

One hurdle is that the notification stuff probably needs to be async because IPC is involved to call into extensions. But that is solvable by delaying the final move until the results are in.

Of course, once the final target location changes, the part file will be in the wrong place too. Not sure if it should be moved (which requires flushing the file, closing it, moving it, reopening it) or if it is OK to keep it in the original location. I seem to remember the .part location is derived from the target location, which would mean it has to be moved so it can be found if the download is stopped and resumed.

mak, I am not entirely sure when what implementation of the Trinity that is DownloadsCopySaver, DownloadsLegacySaver, DownloadsPDFSaver kicks in. DownloadsCopySaver seems to be for creating new downloads ad-hoc, while DownloadsLegacySaver will adopt an existing channel (e.g. when a user clicked a link that resulted in a download)? And DownloadsPDFSaver is special for pdfjs integration?
Could you shed some light?
Also, does this sound like a plan, in general?

Flags: needinfo?(mak77)

(In reply to Nils Maier [:nmaier] from comment #17)

mak, I am not entirely sure when what implementation of the Trinity that is DownloadsCopySaver, DownloadsLegacySaver, DownloadsPDFSaver kicks in. DownloadsCopySaver seems to be for creating new downloads ad-hoc, while DownloadsLegacySaver will adopt an existing channel (e.g. when a user clicked a link that resulted in a download)? And DownloadsPDFSaver is special for pdfjs integration?

DownloadLegacySaver is for downloads initiated by the network stack, like clicking on a link, as you said. The network starts the download in background and we adopt it. In the past I suggested renaming it to DownloadAdoptingSaver, because Legacy doesn't really say anything (there was a plan to get full control of the download but it never went through), never filed a bug for that renaming though.
DownloadCopySaver is for downloads initiated by the downloads code, we retain full control of the process.
DownloadPDFSaver is special because it creates a download from a live PDF document, it's pretty limited in functionality.

I still have to evaluate the plan, will do asap, so I'm leaving the needinfo.

(In reply to Nils Maier [:nmaier] from comment #17)

  1. Move the "destination stuff" (conflict action, making directories) into DownloadCore.

This is a bit generic, it would benefit from making a more detailed plan. Maybe you should try to do a whole breakdown of the needed changes and keep it in a gdoc or similar, then dependencies can be filed from it.

  1. Implement optional Content-Disposition handling in DownloadCore (which is a nice to have regardless), falling back to deriving the name from either the original URL or the final URL (not sure which would be best).

This sounds like related, but not strictly necessary for this specific bug. Is it?

With some sanity checks, of course. Optional in that it should be possible to force a name (ext-downloads needs this). Not sure if legacy/pdf need to actually handle this, as it's probably handled elsewhere already, but copysaver certainly does.

The legacy saver will also have to handle onDeterminingFilename. It looks like there may be a general path to go through onDeterminingFilename, if every saver needs it.

One hurdle is that the notification stuff probably needs to be async because IPC is involved to call into extensions. But that is solvable by delaying the final move until the results are in.

Yes, using await it won't be crazy, I'd still probably race the promises with a timeout, we don't want to wait forever for a broken extensions.

Of course, once the final target location changes, the part file will be in the wrong place too. Not sure if it should be moved (which requires flushing the file, closing it, moving it, reopening it) or if it is OK to keep it in the original location. I seem to remember the .part location is derived from the target location, which would mean it has to be moved so it can be found if the download is stopped and resumed.

Both the placeholder and .part must be moved. If it's too complex to do while ongoing, it should be stored and persisted (eventually) in downloads.json, then done when the download is finished, though this may confuse the user. BackgroundFileSaver.cpp is doing that move internally I think, unfortunately I don't know that part very well, but it can be studied.

Flags: needinfo?(mak77)

(In reply to Marco Bonardo [::mak] from comment #19)

Yes, using await it won't be crazy, I'd still probably race the promises with a timeout, we don't want to wait forever for a broken extensions.

I have mixed feelings on this because I have a history of running into tab/extension loadouts which cause the browser to bog down for long periods of time and I wouldn't want brokenness elsewhere to randomly cause apparent breakages here.

(eg. At the moment, my chrome process will seize up at 100% of one core for up to a minute once or twice a day and, since most of my extensions are privacy or security extensions and I don't know how to reliably reproduce the problem on demand, I've been unwilling to risk doing an enable/disable binary search among them.)

(In reply to Stephan Sokolow from comment #20)

I have mixed feelings on this because I have a history of running into tab/extension loadouts which cause the browser to bog down for long periods of time and I wouldn't want brokenness elsewhere to randomly cause apparent breakages here.

I'm not sure I get your concern, my suggestion is just to unblock the download after some seconds if an add-on never replies to the callback. That has no relation with hangs or heavy load.

(In reply to Marco Bonardo [::mak] from comment #21)

(In reply to Stephan Sokolow from comment #20)

I have mixed feelings on this because I have a history of running into tab/extension loadouts which cause the browser to bog down for long periods of time and I wouldn't want brokenness elsewhere to randomly cause apparent breakages here.

I'm not sure I get your concern, my suggestion is just to unblock the download after some seconds if an add-on never replies to the callback. That has no relation with hangs or heavy load.

My concern is that heavy load elsewhere in the browser may slow the extension down to the point that the timeout fires before the extension manages to respond through no fault of its own.

I understand it's an unpredictable situation, but If the timeout is large enough it should not be a big deal, and still better than having downloads get stuck forever, imo.

The onDeterminingFilename API is such that it receives a suggest function which it must call synchronously before the listener function returns. If that function is not called, then it's just regarded as no suggestion provided. An extension therefore cannot forget to call and stall the download indefinitely.

What makes the entire thing async is the need for the WebExtension implementation to relay the event from the main process into the add-on process and relay back the result. It is my understanding this cannot stall forever either (if implemented correctly), and will throw an Error in the main process in the worst case if the add-on process dies prematurely

Thus, dealing with timeout should not be necessary.

Attached patch WIP patch v0Splinter Review

Here is a work in progress patch.
What's missing still:

  • tests for the new functionality
  • onDeterminingFilename for DownloadsLegacy (and pdf)
  • hooking up call sites other than the WebExtensions API with regards to allowNameDetermination.

What's working:

  • Determining the file name from channel meta data (Content-Disposition or redirected URL)
  • Existing tests (well, I tested the components/downloads xpcshell tests and the ext-download ones for now)
  • onDeterminingFilename for WebExtension-initiated downloads (which is my primary use case to begin with)

Marco, you wanted a more detailed plan, so here it is kinda ;)
I'd like some feedback to the overall approach (not a full code review with nitpicking etc). Are there any major showstopper issues you see there?

Also, as for hooking up other call sites, IMO downloads should be onDeterminingFilename-able unless the user specifically picked a file already. So e.g.

  • legacy downloads would generally be onDeterminingFilename-able unless the file picker was shown,
  • "Save Link As" asks to pick a save location so it shouldn't be onDeterminingFilename-able
  • The PDF download button normally does not show a file picker, so it normally should be onDeterminingFilename-able
  • extension-initiated downloads should be onDeterminingFilename-able unless they specify a filename in the create options (this is implemented already, and to match Chrome behavior).

I think this would pretty much match Chrome's behavior.
Does this sound reasonable?

Then there is the issue of moving the part file around or not. The issue here is that the part file location is user-defined and does not necessarily have any relationship with the target file, tho often it is just target + ".part". So I see three options now

  • Leave it be. Easiest, implementation-wise. The only minor drawback might be that the user gets confused when the .part file and the target file seem unrelated. Then again, they might be already "unrelated", and we do not have to move it, meaning less I/O, and one piece less that can break.
  • Always move it around (as long as name determination is allowed).
  • Only move it around when partFile === targetFile + ".part".

Thoughts?

Flags: needinfo?(mak77)

(In reply to Nils Maier [:nmaier] from comment #25)

  • legacy downloads would generally be onDeterminingFilename-able unless the file picker was shown,
  • "Save Link As" asks to pick a save location so it shouldn't be onDeterminingFilename-able

I seem to remember Chrome's behaviour being to fire onDeterminingFilename before showing the file picker so the extension could change the default that was presented to the user.

In fact, that's one of my target uses. To do things like "If the download is coming from domain X and it's a Zip or RAR file, override the extension in the Save As picker to .cbz or .cbr, respectively".

I seem to remember Chrome's behaviour being to fire onDeterminingFilename before showing the file picker so the extension could change the default that was presented to the user.

Interesting, kinda makes sense. I wasn't aware of this.
This might be somewhat hard to implement in Firefox I suppose as - if I remember correctly - downloads are not even created until after such a picker is shown.

In fact, that's one of my target uses. To do things like "If the download is coming from domain X and it's a Zip or RAR file, override the extension in the Save As picker to .cbz or .cbr, respectively".

Is that public or even open source? And some instructions?
I'd like to play with it.

Flags: needinfo?(from_bugzilla2)

(In reply to Nils Maier [:nmaier] from comment #27)

Is that public or even open source? And some instructions?
I'd like to play with it.

It was just a hard-coded one-off thing that I was side-loading back before Chrome introduced an "opening the Save As dialog has a small but non-zero probability of causing the UI to stop responding to X11 input events" bug that wound up driving me back to Firefox exclusively.

The .cbz example was for a site that you can't just create a free account on and get testable downloads, but I think I also had it correcting Twitter images to change ".jpeg:large.jpg" to ".jpg".

I'll see if I've still got a copy kicking around after I've slept.

As I suspected and kinda remembered the legacy downloads (that is initiated from clicking links or using save as) go through HelperAppDlg.jsm which handles the prompts (both, the generic save/launch one, and the file picker for save-as) before they hit nsITransfer (DownloadLegacyTransfer). So implementing onDeterminingFilename before those prompts are shown would be a HUGE task involving essentially rewriting the entire click-to-transfer pipeline.

Sorry for the delay.

I'm not sure whether I should share the domain where I was doing the .zip to .cbz correction, but the extension's hard-coded ruleset did also contain an entry for fixing .jpeg_large to .jpg when downloading images off Twitter which I left in.

The attached Zip contains it in unpacked form for easy exploration and I just verified that it does modify the Save As dialog's contents under Chromium 77.0.3865.90.

Flags: needinfo?(from_bugzilla2)

Also, I just realized another use I'd want to put such a thing to... Rewriting the default filenames provided to Ctrl+S so I don't have to keep manually swapping in underscores for characters like colons that are valid on Linux+ext2/3/4 and often show up in page titles, but are not valid under FAT32, exFAT, or NTFS's Win32 personality.

(I prefer not to have to load my "Web Page, complete" saves on a Linux machine and re-save them to make them compatible with flash drives and Windows machines without breaking the references to the files in ..._files.)

Thanks for the infos.
And, I agree, there are use cases for applying file name changes before showing any pickers.

I think I was tired and not entirely thinking straight when I said it would be a massive undertaking. to implement this... Just patching in the determineFileName step into HelperAppDlg.jsm, right before it does determine the name itself, should be fine.

Something regarding Chrome's behavior that might be related to this, is that if I click to 'save link as' for a large file and wait a while with the file picker up, and then choose a path to save the file to, the download will usually already be partly completed as though it had been downloading in the background while the file picker was open.

I know it's not directly related to this issue per se, but I think it reveals a bit about how Chrome manages downloads - that is, they do everything to start the download up front, before even giving the user a chance to override anything.

It might be worth debating whether Firefox should use a similar approach or not. On one hand, it means that by the time a user chooses a directory for the file to reside in, the file is already that much closer to being downloaded. On the other hand, if a user doesn't realize how large the file is and is on a metered connection, that portion of the file already downloaded might go against their data cap even if they cancel the transfer. There might be other pros and cons, but those are two things that I could come up with off the top of my head.

If Firefox chooses to mimic Chrome in this regard, then it sounds like much of the work that would be required to implement onDeterminingFilename would be done anyway. Otherwise, perhaps it's too much work to do that much refactoring and reorganization just for this one API feature.

Firefox stopped doing that at some point?

I'm certain I remember it behaving that way back during the height of the single-process, XUL UI era and assumed Chrome was copying Firefox on that front.

I admit I haven't tested that, and I based my post on the wording in other comments, saying things like the download isn't 'created' until after the filepicker is shown, and discussing what to do if the user or script changes what directory to download the file to (if it has to move the .part file, or just finish downloading and then move it, and so on).

I might be misunderstanding the wording, too (perhaps 'after the filepicker is shown' still refers to before the user picks a location).

I haven't looked into it but my impression based on prior experience was that the download starts streaming immediately but the higher-level API object that would need to be fed to onDeterminingFilename doesn't get created until after a filename is chosen and something about Firefox's internal architecture makes that difficult to change.

As a developer of an extension that renames downloads, this is one API I'm sorely missing from Firefox. I've had lots of user issues with trying to get filenames via a separate fetch request (cookies, Referer, Content-Disposition). The Chrome equivalent generally works great with far fewer lines of custom code. Having this API or an equivalent in Firefox would make my users much happier.

Firefox does start the download before the final filename is assigned but that's not really relevant for this bug. The problem is that at some point a DownloadTarget object gets created, and all the rest of the downloads code assumes that once this object is created, the target filename never changes. Discussion related to this and an attempt at addressing it happened some time ago over on bug 1254327. If anybody is interested in working on this bug, reading up on that bug and potentially picking up the abandoned patch there could be a good place to start.

See Also: → 1254327
Flags: needinfo?(mak)
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: