Closed Bug 946239 Opened 11 years ago Closed 10 years ago

[Download API] System app stops getting download progress notifications when the app which started the download is killed

Categories

(Core :: Networking: HTTP, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
1.4 S3 (14mar)
blocking-b2g 1.4+
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
b2g-v1.4 --- verified

People

(Reporter: borjasalguero, Assigned: reuben)

References

Details

(Keywords: verifyme, Whiteboard: [systemsfe][p=5])

Attachments

(3 files, 3 obsolete files)

STR:
- Open browser
- Search a big file (i.e. a mp3, for having enough time for canceling :) )
- Tap to download
- When the download is shown in the notification bar, close the browser

EXPECTED:
Download is downloading as expected.

CURRENTLY:
Download is stopped without launching any 'statechange' event, and the result is an inconsistency in the list of downloads.
Depends on: 938023
Fabrice, I've found this issue while testing the API... It's weird because I dont know why the download is canceled when closing the Browser, could you take a look? Thanks!
Flags: needinfo?(fabrice)
Whiteboard: [systemsfe]
Component: Gaia → General
Hm, interesting... Does all network activity stop, or only event dispatching from the API?
Flags: needinfo?(fabrice) → needinfo?(aus)
Hi mates, I think all of them are stopped because the only one event received is "ondownloadstart". And I cannot see the icon when there is network activity on the status bar. For this reason I think that network activity is stopped. It is only a suspicion in order to help you. Thanks Fabrice and Aus
Hi Fabrice! I was testing more and it seems more related with the app which launched the download more than a network issue. For example, the following STR:
- Open browser and download a file
- Switch to 'airplane mode'
EXPECTED:
Download is canceled and the event of 'state change' from download API is working.
CURRENTLY:
Working as expected!

However, if instead of activating the airplane mode I killed the browser, the download is broken...I hope this new STR helps. Currently download notifications code is in master, so you can test directly with latest build. Thanks!
Flags: needinfo?(fabrice)
Ok Borja, thanks for confirming the str.
Flags: needinfo?(fabrice)
Checking it out... looks like we'll want to add an integration test for this scenario.
Assignee: nobody → aus
Flags: needinfo?(aus)
Looks like when the window for the application goes away we purge the cache which in turn kills the download. This was done on purpose. 

We have a few options to solve this. We do know when the window goes away at which point we could do:

1. Move the download to the system application window so that it can finish (if it's not finished already)
2. We simply do not purge downloads that are still active. We instead purge them from the cache when they are completed.
3. And another would be to specify that the download can complete without the application that requested it being required to be running. A sort of 'background' download if you will. This would be a flag that is passed when initiating the download.

I'm somewhat partial to #2. 

Gregor, how do you feel about the options I listed?
Flags: needinfo?(anygregor)
(In reply to Ghislain 'Aus' Lacroix from comment #7)
> Looks like when the window for the application goes away we purge the cache
> which in turn kills the download. This was done on purpose. 

The cache is in the child right? How does purging the cache in the child trigger a cancel-download?
Do we send a message to the parent?
Flags: needinfo?(anygregor)
Status: NEW → ASSIGNED
Whiteboard: [systemsfe] → [systemsfe][p=5]
blocking-b2g: --- → 1.4?
Gregor,

Is this bug in scope for the SFE work in 1.4?

Please help understand user experience and risk profile involved.
Flags: needinfo?(anygregor)
blocking-b2g: 1.4? → 1.4+
Flags: needinfo?(anygregor)
Target Milestone: --- → 1.4 S1 (14feb)
Reuben, can you help out here?
Flags: needinfo?(reuben.bmo)
Yes, when the cache is purged we send a message that cancels the download. The simplest solution is to simply let the download continue. However, we need to make sure that the events still flow to the Download Manager UI as expected.

The other options are listed here: https://bugzilla.mozilla.org/show_bug.cgi?id=946239#c7
(In reply to Ghislain Aus Lacroix [:aus] from comment #11)
> Yes, when the cache is purged we send a message that cancels the download.
> The simplest solution is to simply let the download continue. However, we
> need to make sure that the events still flow to the Download Manager UI as
> expected.

As we discussed on IRC, that's not actually what causes the download to be stopped. In fact, the download isn't stopped at all, it's just the UI that goes out of sync, since it uses the Download object it uses to get status updates is destroyed when the app is killed. I'll investigate comment 7 option 2 since it appears to be the simplest solution, but it looks like The Right Thing to do here is to stop linking Download objects to the windows that started them.
Flags: needinfo?(reuben.bmo)
Assignee: aus → reuben.bmo
Summary: [Download API] Download is not working when the app which started the download is killed → [Download API] System app stops getting download progress notifications when the app which started the download is killed
Component: General → Download Manager
Product: Firefox OS → Toolkit
Target Milestone: 1.4 S1 (14feb) → ---
Version: unspecified → Trunk
So we're not getting progress notifications because they're coming from the child. When we kill the app, we kill the download, the progress notifications stop coming, and we end up with an inconsistent downloads list. This is related to bug 562444 and bug 564315 (it also reproduces on Firefox, if you open a new E10S window, start a download then kill the window we end up in the same state).
Component: Download Manager → Networking: HTTP
Product: Toolkit → Core
IRC chat with reuben:

reuben: yup, you pretty much nailed it. It's the URILoader being in content that currently makes us funnel the download to the child.  I don't know the URILoader code well, but I got the impression from bz and biesi that it's not trivial to move it to the parent.  But maybe it's time. Alternatively it might be possible to make the decision to download in the child, but then create some new downloadParentManager thing that can be told to complete the download in the parent. I don't know the details of URILoader or the download manager well enough to say.  Perhaps bz does.
Flags: needinfo?(bzbarsky)
The biggest issue with moving URILoader from child to parent, I suspect (apart from actually doing it) is that some of the things URILoader decides right now (like whether to decompress the data) have to get decided before returning from OnStartRequest.  So we'd need to make sure that URILoader can do all that without needing to talk to the child.
Flags: needinfo?(bzbarsky)
If we need to try to fix this for 1.4, here's a possible, hack-y solution that might work without moving the URILoader to the parent (though that seems like the best long-term fix).

1) Proceed as usual, up to where OnStartRequest in the child calls URILoader and we determine that we're going to do a download.

2) Call a new method on the child channel that is kind of like suspend(), except that it 1) sends an IPDL msg to suspend() the parent channel, while 2) still delivering all pending OnDataAvailable messages to the child listener, followed by some new, magic "your queue is flushed" method.  Let's call it "ChannelFlushed()"

3) The child currently sends all the OnDataAvailable messages it gets back to the parent.  Continue doing that, but once ChannelFlushed() is called, we somehow arrange to change the listener from the usual HttpChannelParent (which would send the ODA to the child) to the parent DownloadManager.  We could do that either by 1) teaching HttpChannelParent to accept a new "DivertTo(listener)" method, and have it forward OnData/OnStop to the new listener as it gets those calls, or by 2) using nsITraceableChannel.  I want to deprecate that class, so I prefer #1.

4) Once we have this all set up, we resume() the parent nsHttpChannel (or FTP channel) and all further data will go to the parent download manager even if the child dies. 

5) Clean up on the child side: shut down the IPDL for the PHttpChannel, make sure that whatever we currently hand the channel to in the child is OK with no longer getting OnStopRequest, etc.  Note that the HttpChannelParent is refcounted and will stay alive until OnStopRequest is done, so even if we tear down the IPDL connection we can still use it to dispatch OnData/OnStop to the download manager listener).

This solution would keep downloads going even if the child dies, unless the child dies almost immediately after starting the download.  It's a little kludgy--ideally we could just restart the channel on the parent and point it at the download manager, but restarting POSTs (and even GETs sometimes) is not something we want to do.

The reason the bits about the download manager listener are hand-wavey is that I don't know that code very well.  Brian Crowder wrote that code, but he's gone now, and I think the code is unowned.  So we'd need someone to learn the bits he changed in bug 562444.
I definitely think that passing the consuming of a channel from the child to the parent is the way to go. Trying to run the uriloader logic in the parent seems complex. In general we want to move towards running more logic in child processes, not less.

So adding the ability to pass of stream consumption from the child to the parent once the child has determined that we're dealing with a download seems like the way to go.

However having to drain all queued OnDataAvailable calls seems annoying. Is there some way we could enable telling necko "please pass this channel endpoint from the child to the parent" and then in the parent tell necko "here's the new nsIStreamListener I want to use".

My idea would be something like a "PrepareForTransferToParent()" function which would return an ipdl object that represents the paused channel. Internally this would suspend the channel and stick all queued OnDataAvailable data in the ipdl object.

The caller would then be responsible for transferring the ipdl object to the parent.

Once in the parent the caller would call some "nsIChannel* ResumeTransferredChannel(nsIStreamListener*)" function. I.e. the function takes the new streamlistener and returns the nsIChannel. This unsuspends the channel and calls into the streamlistener with all the queued OnDataAvailable data.

Does this sound crazy?
Depends on: 975338
Ah, Jonas, always thinking of the more user-friendly, well-encapsulated interfaces...

I agree that it would be prettier for necko to automagically provide an API where a child listener during OnStartRequest() can request to divert the channel back to the parent, where some new listener can get all the OnStart/Data/Stop calls, with necko handling returning any OnData/OnStop calls that have already arrived on the child.

To make this happen will take a little more work (we'll need to rip out all the PExternalHelperApp.ipdl stuff that's currently doing its own child->parent OnData/OnStop msg returning) but it will have a cleaner architecture and will be more reusable if we ever need to deal with this sort of thing again.

I've created bug 975338 for the necko bits. Once we've got an API in place we can have Reuben change the ExternalHelper code to use it in this bug.
Target Milestone: --- → 1.4 S3 (14mar)
I have a patch that works, just need to remove all the hacks and wait for Steve's patches to get ready. And do more testing, of course :)
Attached patch Minimum working patch (obsolete) — Splinter Review
Need to figure out if DidDivertRequest belongs in nsExternalAppHandler, if we're not leaking the handler or the dialog after the channel is diverted, if this works on desktop + e10s.
Oh, and also see if we can rearrange things in the parent so that the child actor doesn't have to stay alive during the download.
Attached patch Divert requests to the parent (obsolete) — Splinter Review
Attachment #8385549 - Attachment is obsolete: true
Attachment #8385731 - Flags: review?(jduell.mcbugs)
Comment on attachment 8385731 [details] [diff] [review]
Divert requests to the parent

Boris, this builds on top of the stuff in bug 975338.
Attachment #8385731 - Flags: review?(bzbarsky)
Comment on attachment 8385731 [details] [diff] [review]
Divert requests to the parent

>+++ b/uriloader/exthandler/ExternalHelperAppChild.cpp
> ExternalHelperAppChild::OnStartRequest(nsIRequest *request, nsISupports *ctx)
>+    return DivertToParent(divertable, request);
>+  } else {

No else after return, please.

>+  if (mHandler) {

How would mHandler be null here?  Are part channels divertible or something?  If it can be null, please document why.

> ExternalHelperAppChild::OnStopRequest(nsIRequest *request,
>+  if (mHandler) {

Pleaes document that mHandler can be null if we diverted.

>+ExternalHelperAppChild::DivertToParent(nsIDivertableChannel *divertable, nsIRequest *request)
>+  mozilla::net::ChannelDiverterChild *diverter = nullptr;
>+  nsresult rv = divertable->DivertToParent(&diverter);

The documentation says that the DivertToParent() call will drop the ref to us, no?  Does that mean we need to hold a strong ref to ourselves on the stack here?  Or do divertible channels guarantee that they hold stack refs while sending stream listener notifications?

>+    delete mHandler;
>+    mHandler = nullptr;

mHandler is refcounted, no?  Why are we manually calling delete on it?  I'd expect this code to crash, when we then try to Release() a deleted object!

>+++ b/uriloader/exthandler/ExternalHelperAppParent.cpp
>+ExternalHelperAppParent::OnStopRequest(nsIRequest *request,
>+  if (status != NS_ERROR_ABORT) {
>+    unused << Send__delete__(this);

I don't follow this.  NS_ERROR_ABORT could happen for lots of reasons, no?  Why is it ok to skip sending delete just because that's the status code?

>+++ b/uriloader/exthandler/ExternalHelperAppParent.h
>+  bool mDiverted;

This is only used in asserts, right?  Should it be debug-only?

Also, put it next to the other bool members we have, to pack better?

>+++ b/uriloader/exthandler/nsExternalHelperAppService.h
>-#ifdef MOZ_LOGGING
>-#define FORCE_PR_LOG
>-#endif

Why this change?

r=me with the above addressed or explained as needed, but I'd like to see the interdiff addressing those things.
Attachment #8385731 - Flags: review?(bzbarsky) → review+
Comment on attachment 8385731 [details] [diff] [review]
Divert requests to the parent

Review of attachment 8385731 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not a peer for this code, but overall looks close to ready.  I agree with everything bz says (my default policy :) plus a couple other things.

> The documentation says that the DivertToParent() call will drop the ref to us, no?
> Does that mean we need to hold a strong ref to ourselves on the stack here?  
> Or do divertible channels guarantee that they hold stack refs while sending stream
> listener notifications?

bz: good point.  We'll change the doc to specify that the listener ref is dropped after OnStartRequest returns.  I think it's silly to require kungFuDeathGrip in the listener.

::: uriloader/exthandler/ExternalHelperAppChild.cpp
@@ +62,5 @@
>  {
> +  nsCOMPtr<nsIDivertableChannel> divertable = do_QueryInterface(request);
> +  if (divertable) {
> +    return DivertToParent(divertable, request);
> +  } else {

Time-honored practice at Mozilla is to use

  if (...)
     return 

  blah()

instead of 

  if (...) 
     return
  else
    blah.

better indenting for one thing...

@@ +66,5 @@
> +  } else {
> +#ifdef DEBUG
> +    nsCOMPtr<nsIHttpChannel> httpChannel = do_QueryInterface(request);
> +    nsCOMPtr<nsIFTPChannel> ftpChannel = do_QueryInterface(request);
> +    MOZ_ASSERT(!httpChannel && !ftpChannel, "HTTP/FTP channel is not divertable");

You should probably remove the if DEBUG code here.  If at some point (when e10s desktop and/or b2g addons are supported) we actually run into a addon-created channel that doesn't support diversion, there's a pretty good chance it will implement nsIHttpChannel (but still not be a HttpChannelChild).  So you'd assert for something that's "normal".

That said, right now I'd be fine just putting in 

   NS_RUNTIMEABORT("We don't support addon-wrapped download channels yet")

It's a lie :) (we're keeping the old code that ought to support them), but I'd love to get a hard reminder if/when this actually starts happening.

@@ +71,5 @@
> +#endif
> +    NS_WARNING("Channel is not divertable");
> +  }
> +
> +  if (mHandler) {

I have same question as bz about how mHander could be null here.

::: uriloader/exthandler/nsExternalHelperAppService.cpp
@@ +1274,5 @@
>  
> +void
> +nsExternalAppHandler::DidDivertRequest(nsIRequest *request)
> +{
> +  // Remove our request from the child loadGroup

For fun you could put in an assert that we're on a child process...

::: uriloader/exthandler/nsExternalHelperAppService.h
@@ -7,5 @@
>  #define nsExternalHelperAppService_h__
>  
> -#ifdef MOZ_LOGGING
> -#define FORCE_PR_LOG
> -#endif

Any reason why you decided to get rid of PR_FORCE_LOG?  Without it we won't get any logging in release builds.  Some modules turn it on, some don't.  Just wondering why we're getting rid of it.

@@ +235,5 @@
>    ~nsExternalAppHandler();
>  
> +  /**
> +   * Clean up after the request was diverted to the parent process.
> +   * XXX should this be on nsIRequestObserver?

No I wouldn't put it in nsIRequestObserver.
Attachment #8385731 - Flags: review?(jduell.mcbugs) → feedback+
(In reply to Boris Zbarsky [:bz] from comment #24)
> >+++ b/uriloader/exthandler/nsExternalHelperAppService.h
> >-#ifdef MOZ_LOGGING
> >-#define FORCE_PR_LOG
> >-#endif
> 
> Why this change?

Because I'm including nsExternalHelperAppService.h in a .cpp file that gets built in unified mode. FWIW, nsExternalHelperAppService.cpp enables FORCE_PR_LOG too. Why do we want every file that includes nsExternalHelperAppService.h to have logging force enabled?
Flags: needinfo?(bzbarsky)
Attachment #8385731 - Attachment is obsolete: true
The NS_ERROR_ABORT was just me not knowing how to handle the shutdown scenario. I implemented ActorDestroy so we only send the message if the child actor is still alive.
> Why do we want every file that includes nsExternalHelperAppService.h to have logging
> force enabled?

We want logging force-enabled for all subclasses of nsExternalHelperAppService.h, not necessarily every file that includes nsExternalHelperAppService.h.

But more to the point, that macro needs to be defined before prlog.h is included, since prlog.h is what cares about that macro.  Defining it in the .cpp _after_ you've already included prlog.h from the header is totally pointless.

I'm fine with rejiggering the logging setup here, but would appreciate it being done as a separate patch, with actual testing in the affected (that is, opt) builds.

Or we could move the relevant .cpp out of unified mode, of course.
Flags: needinfo?(bzbarsky)
Ok, I'm just gonna move the affected files out of unified mode for now, since this bug needs to land quickly.
Attachment #8387844 - Attachment is obsolete: true
Attachment #8387876 - Flags: review?(bzbarsky)
Comment on attachment 8387876 [details] [diff] [review]
Divert requests to the parent, v3

>+  NS_RUNTIMEABORT("We don't support addon-wrapped download channels yet");

Why is this ok?  What will happen with:

  <a href="data:application/msword,">Click me</a>

and the like?  That doesn't seem to be a divertible channel...

r=me with that fixed to do something sane, whatever it is that this code is trying to do...
Attachment #8387876 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky [:bz] from comment #33)
> Comment on attachment 8387876 [details] [diff] [review]
> Divert requests to the parent, v3
> 
> >+  NS_RUNTIMEABORT("We don't support addon-wrapped download channels yet");
> 
> Why is this ok?  What will happen with:
> 
>   <a href="data:application/msword,">Click me</a>
> 
> and the like?  That doesn't seem to be a divertible channel...
> 
> r=me with that fixed to do something sane, whatever it is that this code is
> trying to do...

You're right, I misunderstood when we'd get channels that aren't divertable. I think we should just revert to the original NS_WARNING. Jason, is that OK with you?
Flags: needinfo?(jduell.mcbugs)
> I think we should just revert to the original NS_WARNING. Jason, is that OK with you?

Yes
Flags: needinfo?(jduell.mcbugs)
Why NS_WARNING for a situation that can happen easily in normal operation?  What problem are we trying to solve with this NS_WARNING?
bz: true.  Let's scrap the warning too.

It hadn't occurred to me that data:// might be used for downloads.  I expect it's very rare, but it's legal and there's no reason to warn about it.
> It hadn't occurred to me that data:// might be used for downloads.

It's used for dynamically-created downloads.  So is blob:.
Depends on: 984194
Bug 984194 might be an uncaught regression introduced by this change, still investigating there.
Can someone verify if this bug regressed single-click downloading of ordinary and/or gzip-compressed files on B2G?

B2G works with e10s enabled, but some download-related e10s tests are disabled.
Flags: needinfo?(reuben.bmo)
Keywords: verifyme
(In reply to :Paolo Amadini from comment #42)
> Can someone verify if this bug regressed single-click downloading of
> ordinary

This bug /fixed/ that :)

> and/or gzip-compressed files on B2G?

I assume you're talking about HTTP compression here, right? Does anyone here have a link that I can use to test this? I couldn't find any gzip compressed files that aren't handled by the browser, probably because enabling gzip compression on already compressed files is not useful…
Flags: needinfo?(reuben.bmo) → needinfo?
1.4: Killing the Browser process after initiating download does not cancel download

1.4 Environmental Variables:
Device: Buri 1.4 MOZ
BuildID: 20140527000202
Gaia: 0542778892a294d224e75af4a76be5d42938bc90
Gecko: d583ae109f54
Version: 30.0
Firmware Version: v1.2-device.cfg
Flags: needinfo?
Depends on: 1084997
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: