Bug 306170 (Closed) - Opened 19 years ago, Closed 19 years ago

background download fails if server returns 200 response [was: hide button on software-update stops download]

Categories

(Toolkit :: Application Update, defect)

Type: defect
Priority: Not set
Severity: major

Tracking

Status: RESOLVED FIXED
Milestone: mozilla1.8final

People

(Reporter: mailbox, Assigned: darin.moz)

Details

(Keywords: fixed1.8)

Attachments

(3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b4) Gecko/20050826 Firefox/1.0+
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b4) Gecko/20050826 Firefox/1.0+

Clicking the Hide button in the Software Update window closes the window
(that's OK!), but it also stops the download.

I think a Hide button should just hide the window, not stop the download
process :)

Reproducible: Always

I can confirm this. Under the Help menu the text says it is downloading the
update and the throbber is spinning, but after leaving it for half an hour
and reopening, the update starts downloading again from the same point it was
at when I clicked Hide.

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20050827
Firefox/1.6a1 ID:2005082706
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to comment #1)
> I can confirm this. Under the Help menu the text says it is downloading the
> update and the throbber is spinning, but after leaving it for half an hour
> and reopening, the update starts downloading again from the same point it
> was at when I clicked Hide.
> 
> Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20050827
> Firefox/1.6a1 ID:2005082706

Just leave it for a few seconds... it stops in all cases...


I forgot: The same happens with software-update in Thunderbird.
Flags: blocking1.8b4?
Are you sure it's actually stopping?

Hiding the window makes software update go into a sort-of background/silent
download mode, where it retrieves the update at a much more leisurely pace.
Every sixty seconds, it will attempt to retrieve a 64K chunk.

See bug 304381.
I've retested this with an HTTP monitor running on my system. It does in
general get more data every 60 seconds. However, a couple of times there
seemed to be a glitch. For some reason the server it was downloading the mar
from (mozilla.osuosl.org) didn't return a 206 Partial Content response in
some cases, but tried to return the full file. When that happened, the
throbber on the Help menu went away and Firefox never sent another request.

There was also this in the js console.

Error: [Exception... "'Failure' when calling method:
[nsIWritablePropertyBag::getProperty]"  nsresult: "0x80004005
(NS_ERROR_FAILURE)"  location: "JS frame ::
chrome://mozapps/content/update/updates.js :: anonymous :: line 612"  data: no]
Source File: chrome://mozapps/content/update/updates.js
Line: 612
Attached the request and response headers from the final request.
Pretty sure this comes from this code:

http://lxr.mozilla.org/mozilla/source/netwerk/base/src/nsIncrementalDownload.cpp#499

It seems that if the response is not a 206 and the file is not completely
downloaded, the response is treated as an unexpected error. The timer that
schedules the next chunk is only started if there was no error.
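
To make the control flow concrete, here is a tiny self-contained model of
that logic (illustrative only; these are not the real names and not the
actual nsIncrementalDownload.cpp code):

  #include <cstdio>

  // Model of the failure path: a non-206 response on an incomplete
  // download is treated as fatal, so the timer that fetches the next
  // chunk is never re-armed and the background download silently stops.
  enum Result { CONTINUE, FATAL_ERROR };

  Result onStartRequest(int httpStatus, bool downloadComplete) {
      if (httpStatus != 206 && !downloadComplete)
          return FATAL_ERROR;   // no timer restart, no further requests
      return CONTINUE;          // schedule the next 64K chunk
  }

  int main() {
      // A server that answers 200 to a range request kills the download.
      std::printf("%s\n", onStartRequest(200, false) == CONTINUE
                              ? "continue" : "abort");
      return 0;
  }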
(In reply to comment #3)
> Are you sure it's actually stopping?
> 
> Hiding the window makes software update go into a sort-of background/silent
> download mode, where it retrieves the update at a much more leisurely pace.
> Every sixty seconds, it will attempt to retrieve a 64K chunk.
> 
> See bug 304381.

Important:
The user should be informed that the download will be much slower after
clicking the Hide button. As it is, the user is left wondering why the
download is so slow...

(In reply to comment #4)
> However, a couple of times there seemed to be a glitch. For some reason the
> server it was downloading the mar from (mozilla.osuosl.org) didn't return a
> 206 Partial Content response in some cases, but tried to return the full
> file.

That sounds like a bug on the server side of the update system. According to
the comments in bug 291910 and elsewhere, the client code of the update system
only works with HTTP/1.1 servers that support byte-range requests.
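
For reference, this is roughly what a compliant exchange looks like (the URL
path and sizes here are made up for illustration):

  GET /update.mar HTTP/1.1
  Host: mozilla.osuosl.org
  Range: bytes=65536-131071

  HTTP/1.1 206 Partial Content
  Content-Range: bytes 65536-131071/5242880
  Content-Length: 65536

The misbehaving server instead answers "HTTP/1.1 200 OK" with the whole file
as the body, which is what trips up the client.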
The strange bit is that the server returned a number of partial-content
responses before the one that wasn't partial. Maybe mozilla.osuosl.org is
actually in a load-balancing scenario and some of their servers are
misconfigured.

In any case, should the background download give up just because of one
failed request, especially when it has been downloading successfully from
that server previously? Perhaps the server was overloaded at the time.
Maybe on an error it should back off and try again in 5 minutes or so?
We should handle the non-206 response gracefully at the very least.  And, we
should try to fix the server as well.  -> me
Assignee: nobody → darin
Severity: normal → major
Status: NEW → ASSIGNED
OS: Windows XP → All
Hardware: PC → All
Target Milestone: --- → Firefox1.5
Flags: blocking1.8b4? → blocking1.8b4+
Hmm.. it is actually valid for an HTTP server to respond with 200 to a range
request.  We don't want to download the entire document though, so we should
probably just try again until we get the desired 206 response.  Hmm..
Summary: hide button on software-update stops download → background download fails if server returns 200 response [was: hide button on software-update stops download]
Darin - this looks like something we should get in for 1.5b2/1.8b5.  Any updates
on status for this?
After talking with Darin, we're going to push this out to after the beta2 (first
RC).
Flags: blocking1.8b5+ → blocking1.8b5-
Flags: blocking1.8rc1?
Flags: blocking1.8rc1? → blocking1.8rc1+
Attached patch v1 patch (obsolete)
Here's a simple patch that just causes us to keep trying until we get the 206
response that we're looking for.  The assumption here is that we won't end up
in an infinite loop because we expect that our mirror network will only give
out 200 responses intermittently.
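
In other words (a sketch of the retry behavior only, with made-up numbers;
not the actual patch):

  #include <cstdio>

  // Sketch of the v1 behavior: a non-206 response no longer aborts; the
  // same chunk is simply requested again on the next timer fire.
  int main() {
      int fetched = 0, total = 5;   // chunks needed (made-up numbers)
      int responses[] = {206, 206, 200, 200, 206, 206, 206};
      for (int i = 0; fetched < total; ++i) {
          if (responses[i] == 206)
              ++fetched;   // progress on a partial-content response
          // on 200: drop the body; the timer re-arms and we retry
      }
      std::printf("downloaded all %d chunks despite 200s\n", total);
      return 0;
  }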
Attachment #199481 - Flags: review?(cbiesinger)
Notice that this patch also fixes a bug where we do not update mChannel when we
redirect.  That was preventing us from being able to cancel the download in some
cases.  Redirects are such a pain :-/
A couple more changes I want to make to this:

  1) Make the retry logic become gradually less frequent, and eventually give up.
  2) If requesting the entire file, then support a 200 response.
Comment on attachment 199481
v1 patch

ok, r=me with the 200 handling added and the minimum timeout in OnStopRequest
removed (it's not needed there), as discussed on IRC

I also want to look at the changes for your point (1) but can only do that
tomorrow.
Attachment #199481 - Flags: review?(cbiesinger) → review+
Comment on attachment 199481
v1 patch

Do we need additional reviews here?
Attachment #199481 - Flags: approval1.8rc1?
We should also get this landed and verified on the trunk before we take it on
the branch.
We also need to set up a server to test this against and to make sure we don't
put ourselves in a worse state where the client is hanging or the server is
being DOSsed.
Comment on attachment 199481
v1 patch

>Index: nsIncrementalDownload.cpp
...
>+// We don't want to ping the server anymore frequently than this interval.
>+#define MIN_INTERVAL 2  // seconds

We likely don't want to ping the server any more frequently than once a
minute. :)

Darin, any reason the client couldn't handle having this interval set to 1-5
minutes?
Chase: the issue is that the code needs to distinguish between an explicit
download from the user and a true background download.  In the background
download case, waiting minutes is fine.  In the explicit case, retrying more
quickly is better.  However, that is all moot if I support a 200 response when
the consumer asks to have all of the file downloaded.  That case corresponds to
the explicit download from the user.  New patch coming up.
(In reply to comment #5)
> Created an attachment (id=194190)
> Headers from the failing request
> 
> Attached the request and response headers from the final request.

Mossop, we're trying to reproduce this locally against mozilla.osuosl.org and
getting a 206 response code, not a 200.  Can you reproduce the 200 response code
against mozilla.osuosl.org?
Darin, we need to pin down a server installation that we can use to test that
the fixes we take on the trunk and branch put us in a better position than
where we started.  Michael (from OSUOSL) states that the headers captured in
attachment 194190 are from around the time that the servers at OSUOSL were
being moved.  Perhaps that old server no longer exists.

I presume that mozilla.osuosl.org has been fixed.  Aravind is testing
ftp.mozilla.org directly now under that presumption, attempting to find a
site that reproduces this.  Can you make recommendations for how we can
reproduce this locally?  (Affected server versions we could install, or a
hostname/IP you know of that reliably reproduces this bug.)
I don't know of a server where this problem exists, but you could surely
reproduce it manually using an Apache server.  Just disable byte range
support in the Apache configuration.  For my own testing, I'm doing something
like that.

I've got a patch now, which I'll be posting shortly, that addresses the
changes described in comment #16.  I've tested it pretty carefully, but I
think having someone else verify the patch is a good idea ;-)
Darin - any pointers on how to disable byte-range requests?  Nothing obvious in
the docs.
Attached patch v2 patch
Here's the more full-featured patch.  I decided to set a limit of no more
than 20 retries, spaced out at an ever-increasing multiple of the fetch
interval.  Given a fetch interval of 1 minute (which is what we currently
have), it would take about 3.5 hours of retrying before FF would give up.
This feels like a pretty decent configuration.
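
For the record, the 3.5-hour figure assumes retry n waits n times the fetch
interval (my reading of "ever-increasing multiple"):

  60 s * (1 + 2 + ... + 20) = 60 s * 210 = 12,600 s = 210 min = 3.5 hours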
Attachment #199481 - Attachment is obsolete: true
Attachment #200162 - Flags: superreview?(bzbarsky)
Attachment #200162 - Flags: review?(cbiesinger)
Whiteboard: [needs review cbiesinger, SR bzbarsky]
> Just disable byte range support in the Apache configuration.

It sounded pretty simple to me, but now that I look into it, I also don't see an
easy way to disable range support in Apache without hacking the code.  I don't
suggest you go down that route.  You might want to play around with the Apache
"as-is" module.  It lets you send an arbitrary response including headers and
all.  Similarly, you could use netcat (i.e., the "nc" command under Unix).

  $ nc -p 80 -l < RESPONSE

Create a file called RESPONSE that contains the headers and data that you want,
and point FF's update service to http://host:80/ to have it fetch RESPONSE.
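
For example, to simulate the misbehaving server, RESPONSE could contain a 200
answer where a 206 is expected (hypothetical header values; the body would be
the raw bytes of the file):

  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 5242880
  Connection: close

  <raw .mar file bytes>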
Attachment #200162 - Flags: superreview?(bzbarsky) → superreview+
Attachment #199481 - Flags: approval1.8rc1?
Comment on attachment 200162
v2 patch

-  if (chunkSize != -1)
+  if (chunkSize > 0)
     mChunkSize = chunkSize;

Should it be documented that values <= 0 will mean "reasonable default value"?

hm... maybe those vars should even be unsigned, with 0 meaning default size?

+      PRInt32 len;
+      rv = http->GetContentLength(&len);

the other part of the code can handle 64-bit lengths :-/

+  if (status == NS_ERROR_DOWNLOAD_NOT_PARTIAL)

maybe NS_ERROR_NOT_RESUMABLE, to avoid having to make up a new error code?

r=biesi
Attachment #200162 - Flags: review?(cbiesinger) → review+
> -  if (chunkSize != -1)
> +  if (chunkSize > 0)
>      mChunkSize = chunkSize;
> 
> Should it be documented that values <= 0 will mean "reasonable default value"?

Yeah, that's a good idea.


> hm... maybe those vars should even be unsigned, with 0 meaning default size?

0 for the interval means no-wait, but -1 means use the default interval.  So,
I went with -1 to mean the default value in both cases.


> +      PRInt32 len;
> +      rv = http->GetContentLength(&len);
> 
> the other part of the code can handle 64-bit lengths :-/

True, ok.. I guess that means I should use nsIPropertyBag2::getPropertyAsInt64.


> +  if (status == NS_ERROR_DOWNLOAD_NOT_PARTIAL)
> 
> maybe NS_ERROR_NOT_RESUMABLE, to avoid having to make up a new error code?

The error code is an internal one used to communicate with the listener, so
that it knows this is not a reason to notify the downloader's observer.  I
didn't want to reuse an error code that could be generated by other modules
in Necko.
Attached patch v2.1 patch
patch with revisions suggested by biesi:

  * switched to using nsIPropertyBag2 to get a 64-bit valued content length
  * updated comments in nsIIncrementalDownload.idl
Attachment #200278 - Flags: approval1.8rc1?
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Attachment #200278 - Flags: approval1.8rc1? → approval1.8rc1+
fixed1.8
Keywords: fixed1.8
Whiteboard: [needs review cbiesinger, SR bzbarsky]
Product: Firefox → Toolkit