Closed Bug 90020 Opened 21 years ago Closed 21 years ago

Installer Downloader stops frequently

Categories

(SeaMonkey :: Installer, defect)

x86
Windows 98
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9.3

People

(Reporter: jlarsen, Assigned: ssu0262)

Details

(Whiteboard: PDT+ verified on branch)

Attachments

(5 files)

My network connection is sometimes buggy, and downloads will freeze for a few 
seconds while I'm downloading. Mozilla's network installer recently responded to 
this by making me press the resume button, even though it only looses the stream 
for a second or two. I've never seen another downloader do this, and intaller 
used to not do this either. I don't remember when this showed up unfortunatly. 
Is anyone else seeing this problem?
I have seen this from home via a 56k line.  I redownloaded the stub installer 
and also hung up the line and reconnected.  This had fixed my problems with the 
installer throwing me into pause mode frequently.

One possibility is that the installer could have been corrupted during the 
download.
Using the network installer for commercial builds from 7/17 and 7/18 I receive
network error messages before each different component is downloaded. This
occurs on a 28.8 dialup connection through sera. 

The installer is forced into pause mode and can be resumed.

I have not yet tested with the mozilla installer (it takes me hours to
completely perform an install). I will test using the mozilla installer later today.
Status: NEW → ASSIGNED
Keywords: nsbeta1+, nsBranch
Target Milestone: --- → mozilla0.9.3
Attached patch patch #1Splinter Review
This is essentialy a race condition problem.  Normally at the end of the first 
file downloaded, we close the data connection (but leave the control connection 
open).  By closing the data connection, the server will respond (at least ours 
will) with:

  226 Transfer completed.

We do have a flush function to try to catch this response.  The problem occurs 
when the user's connection to the server is either really slow or the machine is 
performing some network operations that causes a bottle neck for incoming 
traffic.

When this happens, the flush function has nothing to flush in the receive buffer 
because the 226 response has not yet arrived due to the bottle neck.  The code 
simply continues and attempts to make another data connection to the server for 
the next file.

On every new data connection, we wait until a response is received because we 
are expecting one.  At this point, we pick up the 226 response from when the 
data connection was closed.  We are instead expecting a 227 response, and so we 
fail and put the user in Pause mode.

The fix is to have the flush function wait for the known 226 response to arrive 
before continuing with the next file.  Not all servers might send a 226 response 
when the data connection is closed, so a maximun # of loops is performed before 
it puts the user in the Pause state.

In my tests (56k while performing a 700k ftp upload via SERA to a nfs file 
system), I have not seen more than 15 iterations while waiting for the 226 
response.  I have preset the max # of loops to be 50.  Hopefully this should be 
long enough even on 14.4k connections.

I really can't imagine why anyone would be using anything slower than a 14.4 
modem, let alone a 14.4 modem (syd doesn't count :)
Whiteboard: seeking r=, sr=
I forgot to elaborate on what happens if a server does not send the expected 226 
response.  In this case, the installer will simply time out and continue with 
the next file.  It is only when there's an expected 226 response which 
interferes with the next file's data connection that this problem occurs.
Is there any real reason to wait for the 226?  Do we take any action in response
to it?  (I'm wondering if we can just ignore the ones we get rather than waiting
just to throw it away.)
we have to wait for the 226 response.  We can't try to ignore it later when 
we're expecting the 227.  The reason being is that it might not be a 226 that's 
in the buffer (when waiting for the 227 response).  It could be something else 
altogether.

It is safer to wait for a 226 when we know there might be one, than to try to 
guess later when we're not sure if there might be one.

The whole process of keeping the connection open is not really correct right 
now.  The patch is a good way to fix the current problem for now.
tested my patch on the mac, downloads all xpi files.
patch #2 coming up.  fixed the logic of checking for E_TIMEOUT within 
FlushCntl().

This one also builds on the mac For Me(TM) without Syd's changes from his patch.  
I'm not sure why it didn't build for Syd.

Syd, can you try patch #2 to see if it builds?
nearly r= for patch #2 with two concerns:

1. not sure what syd's changes were about and you skip them. I want his OK on
this, or otherwise see a working Mac installer with these changes (not just a
compile)

2. the inner if() you added has a redundant term making it harder to read.
instead of checking for the conditions on which you bail ( x || ( y && z )) it's
clearer if you reverse the sense to the condition on which you *don't* bail:

  if (err==nsSocket::E_TIMEOUT && bailOnTimeOutCount < bailOnTimeOut)
    err == nsSocket::E_READ_MORE; // wait a little longer for response
  else
    goto BAIL;

Address those and r=dveditz.
PDT+, please check in if you get fully comfortable and fully reviewed.
Whiteboard: seeking r=, sr= → PDT+ seeking r=, sr=
r=dveditz for patch #4
sr=mscott
patch checked into the trunk.  working on the branch now.
Whiteboard: PDT+ seeking r=, sr= → PDT+
branch works fine for me with patch 4 applied.
patch checked into the branch.  marking this bug fixed.

how to verify:
on a slow machine via a slow connection (56k should suffice, LAN might even 
work), do the following:

1) copy about a 2mb file over the net (if on LAN try a 20mb file or larger).  
Either to or from somewhere is okay.  At the same time this copy is happening, 
do 2) and 3) below.

2) using the build _without_ the fix, run the win32 stub installer (any setup 
type).  It should toss you into Pause mode automatically after warning you.  
This should happen at the beginning of each file (after the first file).

3) using the build _with_ the fix, run the win32 stub installer the same way as 
2.  The installer should not put you in Pause mode.

If you can't reproduce 2), you will proabably have to keep trying the installer.  
Either that or bog down the network on the test machine some more :).
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
QA Contact: gemal → gbush
Resolution: --- → FIXED
Verified now works on a RedHat 6.2 JP system both using en LANG and LC_ALL and 
ja_JP LANG and LC_ALL environments. 
This works great on my slow machine(win95) at home dl rate  <56k. Not one 
interruption compared to many(!) on dl 7/17 ( and previous downloads) 
need to find 20mb file to download to go through test cases.
verified on branch build 2001072406 Win 95, steps 1(dl 2mb file),2,3 as noted in 
comments above.  Network error rec'd using build without fix, build with fix did 
not put me in Pause mode
Bravo!

adding vtrunk reminder
Keywords: vtrunk
Whiteboard: PDT+ → PDT+ verified on branch
verified on trunk build 2001081305
over 56k modem- no problems/errors
Status: RESOLVED → VERIFIED
Keywords: vtrunk
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.