Closed Bug 436382 Opened 12 years ago Closed 11 years ago

Resume all resumable downloads before retrying, even for failed/canceled downloads.

Categories

(Toolkit :: Downloads API, defect, major)

defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla1.9.2a1

People

(Reporter: whimboo, Assigned: om.brahmana)

References

(Blocks 1 open bug, )

Details

(Keywords: verified1.9.1)

Attachments

(1 file, 2 obsolete files)

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9pre) Gecko/2008052304 Minefield/3.0pre ID:2008052304

I saw this bug today while downloading a large file. The download was nearly finished when I was affected by the 24 hour reconnect. After some seconds the Download manager reports that it cannot be read the source file:


C:\Dokumente und Einstellungen\henrik\Desktop\ubuntu-8.04-desktop-i386.iso could not be saved, because the source file could not be read.

Try again later, or contact the server administrator.


After that I had to restart the download from scratch. Resuming isn't possible yet.

Steps to reproduce:
1. Save following file to disk: ftp://ftp.fu-berlin.de/linux/ubuntu/releases/hardy/ubuntu-8.04-desktop-i386.iso
2. Disconnect your network by removing the network cable
3. After some seconds the above warning appears

Instead of canceling the download it should be paused, if the server supports resuming. We still know how many bytes we have saved to disk so it shouldn't be a big deal to resume it after the network connection has established again. 

With the current behavior it's really frustrating to loose all the downloaded data and even more if it's happen for big files like ISO images.

Seems to be too late in the cycle of Firefox 3. So asking for blocking1.9.0.1
Flags: blocking1.9.0.1?
Henrik: did you confirm that the server supports resume?  We'll probably have to look at the HTTP 1.1 headers they send us, via WireShark -- I don't have much time to do that right now, sadly.  I believe we have to see if the server gives us the entity ID for the file, and, if supported, we send the chunk/range we need.  Shawn, is that an accurate summation?
Sure. Just start the download of Ubuntu and restart Firefox after a while the file is downloading. It will be resumed afterwards. So the server definitely supports resuming.
(In reply to comment #1)
Yeah - I'm just not sure how well we handle a dropped network connection.

Attached patch patch v0.1 (obsolete) — Splinter Review
Made changes as per discussion with sdwilsh and biesi on IRC. Tested the fix with the URL mentioned in the bug report and it worked fine. When the n/w cable was plucked, the download was paused and later resume happened as expected.

Regards,
Brahmana.
Assignee: nobody → om.brahmana
Status: NEW → ASSIGNED
Attachment #323138 - Flags: review?(sdwilsh)
Attachment #323138 - Flags: review?(cbiesinger)
We should probably have an automated test case for this somehow too.  I think we can shut down the http server, which might simulate this.
There will be steps that have to be captured in the automated test:

Test-1:
1. Start the resume with some data. (preferably large amounts of data)
2. Start a resumable-download
3. Stop the server
4. Check that the state of the download is paused
5. Start the server again
6. Resume the download and make sure it resumes successfully.

Test-2:
1. Start a non-resumable download
2. Stop the server
3. Make sure the state of the download is failed.

Test-3:
1. Start a resumable download
2. Stop the server or kill the request somehow with NS_ERROR_NOT_RESUMABLE status
3. Make sure the download state is failed.

I am not aware of a way to carry out Test-3. Some suggestion on that is needed.

Regards,
Brahmana.
I'm not sure test 3 would even happen in the wild.  With that said, I strongly suspect you can make it appear to be non-resumable when you set the server back up.  Edward may be of some help there.
Comment on attachment 323138 [details] [diff] [review]
patch v0.1

>+  if (NS_FAILED(aStatus)) {
>+
nit: lose the newline

>+    // We pause the download in case of errors, only if the download is
>+    // resumable and the error itself does not say that download cannot be
>+    // resumed. If either of these conditions fail, we fail the download.
>+
ditto

>+    if (IsResumable() && aStatus != NS_ERROR_NOT_RESUMABLE) {
>+      nsresult rv = Pause();
>+      NS_ENSURE_SUCCESS(rv, rv);
>+      return rv;
just return Pause

>+    } else {
>+      return FailDownload(aStatus, aMessage);
>+    }
and lose the braces

With that said, I don't think the UI experience here is correct.  We suddenly pause a download (seemingly randomly to the user), and then what happens to it?
Attachment #323138 - Flags: review?(sdwilsh)
(In reply to comment #8)
> >+    } else {
> >+      return FailDownload(aStatus, aMessage);
> >+    }
> and lose the braces
(Don't even need the else statement because the if statement ends with a return.)
(In reply to comment #9)
> (Don't even need the else statement because the if statement ends with a
> return.)
This is why you are a peer :)
> With that said, I don't think the UI experience here is correct.  We suddenly
> pause a download (seemingly randomly to the user), and then what happens to it?
> 
I asked you about this at the time of discussion itself. I suggested a notification being given  to the user and now I re-suggest that. We can tell the user that we encountered an error while downloading the file and hence we have paused it so that the user can resume it when he thinks the cause of the error is rectified. Of course the message can't be like this, but something that will convey this idea. If need be we can put the download file name to tell which download was paused by us.

A not so pleasant behavior would be seeing several dialogs saying the similar things when the user is downloading several files and a error like network going off happens.

This is a UI decision and hence some one can talk to Madhava, probably, and also put him on the CC list.

And about the code for this sort of control flow: Instead of doing "return Pause();" we can write a PauseWithNotification() which will Pause the download and then show the notification -- Pretty similar to the FailDownload control flow.

I will implement this and submit a WIP patch which you can look at and make suggestions for the strings and other details.

Regards,
Brahmana.
Keywords: uiwanted
This is one of the slogans we are using to promote Firefox 3:

"A pause and resume feature means there’s no need to wait for a download to finish before you disconnect."

So it's a broken feature which should be listed within the release notes. 
Keywords: relnote
Summary: Broken network connection cancels resumable downloads → Disconnecting network cancels resumable downloads
(In reply to comment #12)
> This is one of the slogans we are using to promote Firefox 3:
> 
> "A pause and resume feature means there’s no need to wait for a download to
> finish before you disconnect."
> 
> So it's a broken feature which should be listed within the release notes. 
Hardly!  We make no claim to automatically pausing when you disconnect (but we do this if you go into offline mode first or get told by the OS that we are going to sleep).

I agree with sdwilsh - this is not relnote worthy.

At the same time, if the network connection goes down, shouldn't we be detecting an offline event? I would assume that going from connected -> offline would pause all existing downloads. What's actually generating the cancel event here? Some sort of network timeout? What happens if the user clicks "Work Offline"?
Keywords: relnote, uiwanted
Flags: in-testsuite?
Flags: in-litmus?
(In reply to comment #14)
> At the same time, if the network connection goes down, shouldn't we be
> detecting an offline event? I would assume that going from connected -> offline
> would pause all existing downloads. What's actually generating the cancel event
> here? Some sort of network timeout? What happens if the user clicks "Work
> Offline"?
I was talking with biesi and Brahmana on irc about this, and I guess what happens is that we an error code from the progress listener before we'll get the offline event, which is why this happens.  If we got the offline event first, we would properly pause the download (we have an automated test for this as far as I know).
Comment on attachment 323138 [details] [diff] [review]
patch v0.1

OnStateChange needs this too
Attachment #323138 - Flags: review?(cbiesinger) → review-
Hm, selecting "Work Offline" in the middle of a download didn't automatically pause that download - should that work?
Oh that's lovely.  I thought that was fixed, but apparently not:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/downloads/src/nsDownloadManager.cpp#1914

The only unit test I could find is the one that tests when the computer goes to sleep.  Shawn is now sad :(
(In reply to comment #17)
> Hm, selecting "Work Offline" in the middle of a download didn't automatically
> pause that download - should that work?
Filed bug 437415.
Flags: blocking-firefox3.1?
Depends on: 437415
Keywords: uiwanted
Summary: Disconnecting network cancels resumable downloads → Pause resumable downloads on all server errors except NS_ERROR_NOT_RESUMABLE
Flags: wanted1.9.0.x?
Flags: blocking1.9.0.1?
Flags: blocking1.9.0.1-
With the fix on bug 437415 the download doesn't get canceled anymore. After plugging off the network cable it stays in active state and continues immediately when the network connection is back.
If you are waiting a bit longer while there is no network connection you have to manually pause and resume the download to continue.
(In reply to comment #20)
> With the fix on bug 437415 the download doesn't get canceled anymore. After
> plugging off the network cable it stays in active state and continues
> immediately when the network connection is back.
Are you saying that that bug fixed this?

(In reply to comment #21)
> If you are waiting a bit longer while there is no network connection you have
> to manually pause and resume the download to continue.
Can you elaborate please?
(In reply to comment #22)
> (In reply to comment #20)
> > With the fix on bug 437415 the download doesn't get canceled anymore. After
> > plugging off the network cable it stays in active state and continues
> > immediately when the network connection is back.
> Are you saying that that bug fixed this?

As we talked on IRC: No, it's only half-fixed. We still have to wait for the Necko patch so we can detect the network loss and switch automatically to pause mode.

> (In reply to comment #21)
> > If you are waiting a bit longer while there is no network connection you have
> > to manually pause and resume the download to continue.
> Can you elaborate please?

As what I can see by doing a quick reconnect, the download resumes immediately. But waiting for around 30 seconds before plugging-in the cable again, it doesn't resume automatically.
Product: Firefox → Toolkit
Flags: wanted1.9.0.x? → wanted1.9.0.x-
Depends on: 438040
Even with both the blocking bugs fixed, I think this bug still remains valid. The blockers take care of two things:
1. Necko should go offline when network goes off (and notify DM about this)
2. When we get the offline notification we should pause the download.

Even with this done we might get other server errors (apart from NS_ERROR_NOT_RESUMABLE), may be because of the network being busy and we not getting a timely response, or the server going down or some such reason. In such a case we should pause the download -- either with or without notification as I have mentioned in #c11. 

Shawn, Edward, what do you think?
Also, do we need to talk to UI people?

Regards,
Brahmana.
This patch has the rough implementation of the PauseWithNotification function which I had mentioned (long ago) in a comment.

Kindly ignore the commented part in the function PauseWithNotification and also the formatting.

With this code, when we get an error apart from NS_ERROR_NOT_RESUMABLE and the download is actually resumable, we pause the download with the sliding notification (right bottom of the screen). Else we fail the download. Fail download also notifies in the same way.

Is this fine?

Regards,
Brahmana
Attachment #323138 - Attachment is obsolete: true
Please don't use any hard-coded string. That makes it impossible to localize the notification messages. 
(In reply to comment #26)
> Please don't use any hard-coded string. That makes it impossible to localize
> the notification messages. 
> 

Kindly ignore these things. Thats what I had mentioned in the comment at the starting of the new function. I just want to know if this is the right approach. Also the user experience has to be approved. Once that happens I will submit a proper patch.

Regards,
Brahmana.
Attachment #334705 - Flags: review?(sdwilsh)
So, I had an idea here that I think will be a better user experience.  We should still mark the download as failed (that's what happened), but when clicking retry, we should try to resume it instead.  That means we should make nsDownloadManager::RetryDownload attempt to resume, instead of start from scratch, and start from scratch if resuming fails.  I think this covers the issues listed in this bug - correct me if I'm wrong.  Does this seem like a more sensible approach?
Shawn, I'm not familiar in detail with the download manager back-end. So please let me ask one question. What normally happens with downloads when they fail, e.g. server stopped responding? The already downloaded content which resists in the .part file will be deleted, right? If it's so should we really use the failed state here? How do we differentiate between these two possible situations?
I don't see the download manager removing the part file, but at the same time, I can't find the code in nsExternalHelperAppService that manages it either.  I think you are right though, and we should address that here as well.
exthandler intentionally doesn't remove the file on download errors, specifically so that dl manager can resume the download.
Attachment #334705 - Flags: review?(sdwilsh)
As biesi has stated it, we do not delete the .part file. The code to move the contents from .part file to the one without .part extension happens in ExecuteDesiredAction(). So in case of server stops responding in the middle, ext app handler will not move the contents from .part file to the actual file and hence there is nothing to be fixed here.

That apart, I am with Henrik regarding the need to have a state apart from FAILED. I think the PAUSED is good enough. It is better for DM to hide as many network level issues and make downloading as seamless as possible. So as the bug title states we just pause for all server errors, but notify the user that there was a problem and he can try resuming the download later.

Since this is more of a UI decision may be one of the UI folks need to be notified.

Regards,
Brahmana.
(In reply to comment #28)
> So, I had an idea here that I think will be a better user experience.  We
> should still mark the download as failed (that's what happened), but when
> clicking retry, we should try to resume it instead.  That means we should make
> nsDownloadManager::RetryDownload attempt to resume, instead of start from
> scratch, and start from scratch if resuming fails.  I think this covers the
> issues listed in this bug - correct me if I'm wrong.  Does this seem like a
> more sensible approach?

I think that's definitely reasonable for cases where there was an error other than network disconnect. I actually like it a lot more than notification.

In the case of network disconnects, we should just pause, as the user will understand that they're disconnected from the network pretty quickly (when the rest of their web browser doesn't work) and it seems like a smart action to take in that case.

We wouldn't hold the release for this, but a string-free patch with tests would be considered for approval.
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Target Milestone: --- → mozilla1.9.1
Attached patch patch v0.1Splinter Review
As per the user behavior described in earlier comments, now for every download retry we just try to resume the download if it had failed earlier and it is resumable. If resume works, well and good and if it fails we retry the download from beginning.

sdwilsh : I had talked to you about moving ResumeRetry() from DM to DL, but that does not solve the problem here. It will require us to move CancelDownload() and RetryDownload() or else make calls from DM to DL and back to DM again and neither sound like a good approach. The calls are illustrated below.

Current flow:

nsDM->ResumeRetry()
	nsDL->Resume() --- (possible end)
	nsDM->Cancel()
	nsDM->Retry()

If ResumeRetry() was moved to nsDL

nsDM->RetryDownload()
	nsDL->ResumeRetry()
		nsDL->Resume() --- (possible end)
		nsDM->Cancel()
		nsDM->Retry()

Regards,
Srirang
Attachment #334705 - Attachment is obsolete: true
Attachment #357725 - Flags: review?(sdwilsh)
Comment on attachment 357725 [details] [diff] [review]
patch v0.1

r=sdwilsh

not so sure we can get an automated test for this though :(
Attachment #357725 - Flags: review?(sdwilsh) → review+
(In reply to comment #35)
> (From update of attachment 357725 [details] [diff] [review])
> r=sdwilsh

Thank you. This is my first v0.1 patch getting a r+.. :-)
> 
> not so sure we can get an automated test for this though :(

Nope. I talked to bz about this and unless we catch of the channel's stream listener we are not assured that we will be notified at the right time. 

Even if we get that listener it is not possible , not until the http server has sleep support. So this will take some time.

--Srirang.
Duplicate of this bug: 474065
Summary: Pause resumable downloads on all server errors except NS_ERROR_NOT_RESUMABLE → Resume all resumable downloads before retrying, even for failed/canceled downloads.
http://hg.mozilla.org/mozilla-central/rev/4f898e0f5ef6
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: mozilla1.9.1 → mozilla1.9.2a1
I have to reopen the bug because the fix doesn't seem to work under following conditions:

1. I still can see a failed download when I remove the network cable for about a minute. The download is paused correctly, but when we want to resume I get the mentioned failure.

2. You can also see this failure while having multiple network connecions activated. Enable your Wifi adapter and connect to a network while the network cable is connected too. When you now start a download and remove the cable we don't switch into the pause state and when plugging the cable in again, we fail.

While item 1 can be happen more often, I think item 2 is more an edge case, but should also be fixed. Probably in another bug?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Can you file a new bug for that? Even if there was a problem with this fix, trying to keep track of the followup in the same bug is just confusing.

(I think there are very few cases when REOPENing a FIXED bug is the right thing to do, and they all involve a backout of the committed patch.)
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Ok, this is bug 478793 now.
Blocks: 478793
Status: RESOLVED → VERIFIED
Attachment #357725 - Flags: approval1.9.1?
Attachment #357725 - Flags: approval1.9.1? → approval1.9.1+
Verified fixed with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre) Gecko/20090416 Shiretoko/3.5b4pre ID:20090416030924

Srirang, would you have time to create an automated test for your fix? That should be possible.
Last time I talked to bz about a test for this we figured out that delaying HTTP response was needed. Waldo is working on that. Its probably landed on trunk where as this had gone on branch. Waldo also says in bug 396226 that the fix will land on 191. I will put up a test here once that lands.

Somebody, Shawn or Edward, correct me if I am wrong about the dependency on Waldo's bug.
Yeah, Waldo's bug should help you out with the test AFAIK.
Ok, so adding to the dependency list for now.
Depends on: 396226
You may also want bug 484027, which would allow you to "steal" control of a response stream from the server and arbitrarily write data to it or even close it at any time.  I skimmed this bug, and it sounds like you might need that level of control, but I can't quite tell whether you do actually need to be able to, by appearances, shut down the server mid-request -- it's not possible to shut down the server except in a clean manner once all responses have been processed as it stands now.
You need to log in before you can comment on or make changes to this bug.