Closed Bug 764377 Opened 12 years ago Closed 7 years ago

The complete MAR files take a long time to download in the background via the application update process

Categories

(Toolkit :: Application Update, defect)
Hardware: x86
OS: All
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED WORKSFORME

People

(Reporter: wgianopoulos, Unassigned)

Details

This was split off from bug 764269.

The issue is that it takes an incredibly long time to download the complete MAR file for Linux in the case where the partial update does not apply correctly.

It seems to download in ~293KB chunks with a 1-minute pause between each chunk.  This results in 75 chunks, so the download takes about 75 minutes to complete, even though I can download the same file directly in the browser, via either an http or https URL, in around 30 seconds.

Although I filed this as an Application Update bug, based on the time it started occurring and the similarity to the symptoms I was seeing in bug 758899, this may be a regression from bug 729133.
This started happening after bug 759065.  I think this is somewhat intentional, because the code deliberately throttles the download speed; however, I don't know what the reasoning behind doing that is.  Robert, do you know why we do this?

I agree that this is sort of stupid, at least for the case where the wizard is shown.  Previously, I believe we wouldn't show the wizard while a complete update was downloading after a partial one failed to apply.

The fix is easy; I just need Robert to tell me exactly what should happen here.
Blocks: 759065
Blocks: bgupdates
Well, perhaps downloading more slowly is desired, but stretching it from 30 seconds to 75 minutes seems rather lame.  It makes it highly likely that I will close the browser before the update applies, especially since I am always switching between Nightly, Aurora, release, and my own builds during my testing.
So to provide a bit more detail, when we do the fallback we call downloadUpdate with the background parameter set to true <http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#1027>, which later on <http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#2975> causes us to use a 600 second interval when fetching chunks during the download.
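In other words, the selection amounts to roughly the following (a simplified sketch with made-up names, not the actual nsUpdateService.js code; the 0-second foreground value is an assumption based on the UI download being unthrottled):

  // Illustrative sketch of the interval selection described above.
  // BACKGROUND_INTERVAL_SEC is the 600-second value cited in this comment;
  // FOREGROUND_INTERVAL_SEC = 0 is an assumption (unthrottled UI download).
  const FOREGROUND_INTERVAL_SEC = 0;
  const BACKGROUND_INTERVAL_SEC = 600;

  function chunkIntervalSeconds(isBackground) {
    // The fallback path calls downloadUpdate with background set to true,
    // so it ends up pausing 600 seconds between chunks.
    return isBackground ? BACKGROUND_INTERVAL_SEC : FOREGROUND_INTERVAL_SEC;
  }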
I noticed this too when testing bug 759065 and mentioned it to Ehsan at that point. I was testing on Windows then.  I don't think it used to be that slow, although I'm not sure why it would be slower now.
OS: Linux → All
Summary: The complete Linux MAR files take a long time to download via the application update process → The complete MAR files take a long time to download via the application update process
(In reply to Ehsan Akhgari [:ehsan] from comment #3)
> So to provide a bit more detail, when we do the fallback we call
> downloadUpdate with the background parameter set to true
> <http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/
> nsUpdateService.js#1027>, which later on
> <http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/
> nsUpdateService.js#2975> causes us to use a 600 second interval when
> fetching chunks during the download.

It would seem to me to make more sense, rather than using a constant delay, to calculate the delay based on the time it took to download the previous chunk.  For example, if you take that time and then multiply it by 9, the background download would average only 10% of the available bandwidth while still permitting me to download the entire update in only 10 times the normal time (5 minutes vs. 30 seconds) rather than it taking almost an hour and a half.  This would also permit the rate to adjust based on other network traffic.  I am not saying 10% is the correct utilization goal either, just citing this as an example of how to do this in a smarter manner.
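Roughly, I am picturing something like this (a sketch only; the function name and the 10% target are just placeholders for illustration):

  // Sketch of the adaptive back-off proposed above; not existing code.
  // Pausing for (1/u - 1) times the last chunk's transfer time gives an
  // average utilization of u; with u = 0.10 that is 9x the chunk time.
  function nextPauseSeconds(lastChunkSeconds, targetUtilization = 0.10) {
    return lastChunkSeconds * (1 / targetUtilization - 1);
  }

  // Example: if the previous chunk took 0.4 seconds to fetch, we would
  // pause about 3.6 seconds before requesting the next one.
  console.log(nextPauseSeconds(0.4)); // 3.6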
It should only throttle the download when downloading in the background and not when UI is displayed. The reason for this is that the mirror network gets overloaded on the release channel when we release a new update.
(In reply to Robert Strong [:rstrong] (do not email) from comment #6)
> It should only throttle the download when downloading in the background and
> not when UI is displayed. The reason for this is that the mirror network
> gets overloaded on the release channel when we release a new update.

How can we tell if the UI is being shown in that case?
(In reply to Robert Strong [:rstrong] (do not email) from comment #8)
> The UI code can restart the download unthrottled.
> http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/content/
> updates.js#1257

But this happens in the service: <http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#1027>  Restarting the download seems rather dirty...  Can we somehow check to see if the UI is displayed from within the service?
(In reply to Ehsan Akhgari [:ehsan] from comment #9)
> (In reply to Robert Strong [:rstrong] (do not email) from comment #8)
> > The UI code can restart the download unthrottled.
> > http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/content/
> > updates.js#1257
> 
> But this happens in the service:
> <http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/
> nsUpdateService.js#1027>  Restarting the download seems rather dirty...  Can
> we somehow check to see if the UI is displayed from within the service?
Not reliably. Also, the download is paused and then started again and continues from where it left off.
(In reply to Robert Strong [:rstrong] (do not email) from comment #10)
> (In reply to Ehsan Akhgari [:ehsan] from comment #9)
> > (In reply to Robert Strong [:rstrong] (do not email) from comment #8)
> > > The UI code can restart the download unthrottled.
> > > http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/content/
> > > updates.js#1257
> > 
> > But this happens in the service:
> > <http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/
> nsUpdateService.js#1027>  Restarting the download seems rather dirty...  Can
> > we somehow check to see if the UI is displayed from within the service?
> Not reliably. Also, the download is paused and then started again and
> continues from where it left off.

Not if the wizard is open when the partial update is being downloaded.  The partial update might fail to apply in that case, which causes the complete update to be downloaded again while the UI is still visible...

Oh well, I guess I'll resort to restarting the download in any case. :/
(In reply to Robert Strong [:rstrong] (do not email) from comment #6)
> It should only throttle the download when downloading in the background and
> not when UI is displayed. The reason for this is that the mirror network
> gets overloaded on the release channel when we release a new update.

But throttling by making it 150 times slower seems really excessive.  Besides, it keeps more TCP sockets active on the mirror network, which is in some ways even worse.
(In reply to Bill Gianopoulos [:WG9s] from comment #12)
> (In reply to Robert Strong [:rstrong] (do not email) from comment #6)
> > It should only throttle the download when downloading in the background and
> > not when UI is displayed. The reason for this is that the mirror network
> > gets overloaded on the release channel when we release a new update.
> 
> But throttling by making it 150 times slower seems really excessive. 
> Besides, it keeps more TCP sockets active on the mirror network, which
> is in some ways even worse.
It downloads in chunks, doesn't keep the TCP sockets alive aiui, and has already been optimized for the mirror network. Let's keep the discussion in this bug about this bug. Thanks
Well, you know, that is worse: doing a mess of extra connection set-ups and tear-downs is not the most efficient way, from the server perspective, to support simultaneous downloads either.
And this is the kind of thing I deal with on my real job, so I am not just talking out of my ___.
Which is great! Please feel free to file a mirror network bug so this can be discussed with those that recommended the current optimization, new optimizations can be decided upon, and hopefully this can be optimized further than it already is. Thanks
In case it isn't clear, I am all for optimizing this further, but it needs to be done in conjunction with the people that manage the mirror network, who will also work with those that manage the mirror servers. We can't and shouldn't try to do that in this bug.
OK, I understand that.  How do I file a mirror network bug (though I think the fix might really need to be in the client)?  I just want to make sure the mirror network is not configuring servers to accept more connections than they can really support, because I want to use connections refused from a server to trigger client back-off.
And I apologize for my **** typing there. ;-)
The whole point here is that, as far as mirrors are concerned, perhaps the current way of doing things makes sense for release builds, but it makes zero sense for nightlies (which is where my issue lies).  I really don't want to have separate release and nightly code paths, though, because that would mean that when it comes time to release, that code path has not been tested with all the updates since the last release; but if you asked the mirror sites, they would be all over that solution.
So, I kind of think I should come up with a better thought out way to do this as part of this bug resolution.
Under mozilla.org -> FTP: Mirrors

For the bug you file, keep in mind that the mirror is selected using bouncer. I wholeheartedly agree that chances are there would need to be client changes but we first need to make sure that any changes are acceptable to those that manage the mirrors. After that an app update bug can be filed to implement those changes.

Also, from a code testing perspective we definitely want to use the same code path for testing and that includes the nightly channel so we don't inadvertently lose Aurora users due to a bug hidden by using different code paths.
(In reply to Bill Gianopoulos [:WG9s] from comment #21)
> So, I kind of think I should come up with a better thought out way to do
> this as part of this bug resolution.
Please, let's keep this bug about the throttling that occurs and leave the rest to the mirror bug. Thanks
OK.  Well, if that is your take on what belongs in this bug and what does not, then ANY such throttling is unacceptable.  Taking a 30 second download and making it take almost an hour and a half is just completely brain-damaged.
I am saying that for this bug when there is UI displayed we should not be throttling!
I filed this bug.  I know what it is about.  Even in the case where the UI is not displayed, it makes no sense to make a 30-second download take almost an hour and a half; most likely it will never complete, because tomorrow you will be trying to download a different nightly.
This bug is about the current behavior probably being wrong for Nightly and Aurora, regardless of whether the UI is displayed.
And I manage this component. For the case where we are doing things in the background we download throttled due to the reason I stated in comment #6. We do this on the nightly channel for the reason I stated in comment #22. It does complete the download the next day because we don't try to download a different nightly because of bug 353804.

Once again, I am all for improving this BUT it MUST first be evaluated by the people that manage the mirrors as I stated in comment #17 and you agreed to in comment #18. I am done with discussing this further with you and I am changing this bug to be what you want it to be about though once again, this first has to go through the people that manage the mirror networks.
No longer blocks: bgupdates, 759065
Filed bug 764587 for the throttling when the UI is displayed.
Drivers, regarding tracking-firefox15? - this issue has been around since Firefox 2 and I don't think it needs to be tracked. Bug 764587 is a regression and should be tracked.
(In reply to Robert Strong [:rstrong] (do not email) from comment #30)
> Drivers, regarding tracking-firefox15? - this issue has been around since
> Firefox 2 and I don't think it needs to be tracked. Bug 764587 is a
> regression and should be tracked.

Well, in actuality this could really be correct, because this only became a huge issue under Linux because of bug 764269.
But then again, would this issue not impact people running nightlies who do not necessarily update daily?
And I am fairly certain that includes me, and this HAS DEFINITELY NOT been an issue since Firefox 2.
This has only been an issue in the last week or so, and saying it has been going on since Firefox 2 and that I have no idea what I am talking about is just completely disrespectful of my input.
The regression that was caused by bgupdates is covered by Bug 764587 and is likely what you saw. There has been lots of discussion in this bug about other cases that have existed since Firefox 2, and it has always been the case that we download throttled when the UI isn't displayed (background download), specifically because the mirror network was overloaded... for example, comment #26. Since you want this bug to be about the cases (see comment #26 and others) other than the one that is a regression from bgupdates, I filed a new bug to handle the regression and am leaving this bug to be about improving things, as you have stated in many of the comments. I don't think that is disrespectful whatsoever, and I didn't intend to offend you. I do understand why you might see all of the things that you pointed out as related to the regression, but I assure you that they are separate and hope you can respect my judgement in this regard.
(In reply to Robert Strong [:rstrong] (do not email) from comment #35)
> The regression that was caused by bgupdates is covered by Bug 764587 and is
> likely what you saw. There has been lots of discussion in this bug about
> other cases that have been the case since Firefox 2 and it has been the case
> that we download throttled when the UI isn't displayed (background download)
> specifically because the mirror network was overloaded... for example,
> comment #26. Since you want this bug to be about the cases (see comment #26
> and others) other than the one that is a regression from bgupdates I filed a
> new bug to handle the regression and am leaving this bug to be about
> improving things as you have stated in many of the comments. I don't think
> that is disrespectful whatsoever and I didn't intend to offend you. I do
> understand why you might see all of the things that you pointed out as
> related to the regression but I assure you that they are separate and hope
> you can respect my judgement in this regard.

I really don't think you understand what I want to see.  I would like to find a forum in which I could propose something that would work better both for the Firefox client application and the mirror network.  But people seem to be pointing me in one direction or another.  How do I get to engage everyone to solve the issue?
I have only pointed you in one direction to get this started... specifically (comment #22)
1. File a mozilla.org -> FTP: Mirrors bug (see comment #22) so the people that manage the mirrors can evaluate your recommendations and finalize them.
2. After they are finalized bug(s) should be filed in toolkit -> app update for client side changes (if any), mozilla.org -> FTP: Mirrors for changes (if any), etc.
OK so then I filed this bug so what is your non-useful answer here?
The issue is that this bug is not really a mirrors issue but a design issue of how downloads should work.  We need to have a valid design of how this should work, which I understand should probably include the mirror sites in the design, but you have not provided me with a way to initiate that kind of design discussion.
It is already in comment #37. From your comments I gather that you want me and the other developers of app update to change the behavior. I have stated several times that this would first need to be evaluated by the people that manage the mirror network before we could implement any of these changes and how you can get that ball rolling. The design needs to be evaluated and finalized by the people that manage the mirror network. To initiate that design discussion file the bug like I have asked several times.
(In reply to Robert Strong [:rstrong] (do not email) from comment #40)
> It is already in comment #37. From your comments I gather that you want me
> and the other developers of app update to change the behavior. I have stated
> several times that this would first need to be evaluated by the people that
> manage the mirror network before we could implement any of these changes and
> how you can get that ball rolling. The design needs to be evaluated and
> finalized by the people that manage the mirror network. To initiate that
> design discussion file the bug like I have asked several times.

OK, fine.  Take no ownership and tell me that the only way to fix this is to interface with a new group of people I do not have time for because I have a real job.  So evidently Mozilla is really not at all interested in fixing this issue?
Like I said in bug 764587 comment #11, "after I have some breathing room I'll even file the bug if you don't". I have also stated that I am all for improving this in at least one comment in this bug. You are making claims that are entirely not true. I and others are busy with many other things, and it is a fact that this bug, which is about changing behavior that has been around since Firefox 2, is not a higher priority than the things we are working on currently. I can understand your frustration, but it is getting tiresome having you attribute things that are just not true.
Well, like I said, give me input as to how to engage the mirror people about this.  What I want to propose should be a win-win.  But filing a bug seems lame because I don't think the issue is on their side.
The ONLY reason the app update code has the throttle that makes it download slowly in the background is because of the mirrors. The people that manage the mirrors requested that the app update code add the handling that throttles the download, and the existing behavior has been tuned per their requirements. So yes, it is a problem with the mirrors, and if I (and other app update devs previously as well) had it my way we wouldn't do any throttling at all. The reason the people that manage the mirrors have to be engaged is that they are the ones that created the requirements for the current implementation since, as I just stated, we would prefer not to throttle at all and it is the people that manage the mirrors that want app update to throttle. So yes, they need to be involved since they need to evaluate your recommendations (I can't do it myself since I am not as familiar with our mirror network, and I hope you can understand why, in the same way that you wouldn't expect them to be as familiar with app update) and finalize their requirements, and the easiest way to engage them is to file a bug in their component. Like I said, I will do so when I have some breathing room. Thanks!
As a matter of fact I'm going to move this bug over to mozilla.org -> FTP: Mirrors so you can discuss your proposal with the people that manage the mirror network so they can verify that your proposal will work with our mirror network, etc.
Assignee: nobody → server-ops
Component: Application Update → FTP: Mirrors
Product: Toolkit → mozilla.org
QA Contact: application.update → mrz
Version: Trunk → other
Moving this to Infrastructure.
Assignee: server-ops → bhourigan
Component: FTP: Mirrors → Server Operations: Infrastructure
QA Contact: mrz → jdow
I wish to apologize for my previous ranting in this bug.  I am diabetic and had missed a meal.  When my blood sugar gets too low I get easily irritated about stupid things and become quite stupidly argumentative.  I should know better than to skip a meal.
Bill, apology accepted and btw: I personally know you have been a valuable community member for quite a long time (both in bugs and the forums) and I really do appreciate your contributions.
Assignee: bhourigan → server-ops-infra
I'm reassigning this back to FTP->Mirrors. I'm dropping this to normal because major and higher bugs page the oncall. I read through the bug and don't see an immediate actionable from IT. So please don't bump the severity back up without assigning to an individual. Thanks!
Assignee: server-ops-infra → server-ops
Severity: major → normal
Component: Server Operations: Infrastructure → FTP: Mirrors
QA Contact: jdow → mrz
Well, although the throttling of background updates was originally to help the mirrors, it kind of helps the user also.  I know if I were watching a streaming video I would not want to have it pause because a download of an update was hogging all the bandwidth.

So I am not sure what to do here.  My thought was to dynamically determine the chunk size and back-off interval between chunks based on the performance of the download.

I would start with the current chunk size but also have a target connection lifetime, which should take into account how long you want the connection to live on the server and the maximum amount of time we want to be chewing up bandwidth on the client for something the user is not aware of.  Then adjust the chunk size based on the time it took to get the first chunk so as to hit the target connection lifetime.

From that point on I would like to have both a short and a long back-off time between connection attempts to get the next chunk.  The short time would be a multiple of the target connection lifetime and would be used as the initial back-off time.  So, for example, if we decided the connection should last 5 seconds, then maybe wait 3 times that (or any number you want to choose) before the next connection.

But then if we get a connection refused at any point, we switch to the long interval (I believe it is currently 60 seconds) between chunks until we get some number (perhaps 3) of chunks in a row with no connections refused, the idea being that if the server can handle the load, we try to get people finished so they are not part of the problem if it gets busier later on.

The whole idea here is that if the client is not busy and the server is not busy, the current throttling is helping no one and could actually make things worse on the server side by not getting rid of this person who could have been gone in 30 seconds and instead keeping him around until an hour and a half later.

That all said, if the current method is working well enough for backgrounded updates, perhaps I am trying to solve a non-existent issue.  I really run with automatic updates disabled because I do nightly testing and am often trying to find regression windows; I want to be 100% sure what version I am running, and the browser automatically moving me to a different version is NOT what is required to do this.

Therefore I ONLY ran into this being so slow because recent changes resulted in the fallback from a partial to a complete update being backgrounded even when started from the UI, when previously it was not, and because having symlinks in the plugins folder under Linux causes updates to fail.

As mentioned previously, I guess this behavior has been unchanged since the Firefox 2 timeframe, but the image was much smaller then, so even throttled I bet the download was only about 20 minutes on my connection.
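To make the scheme above a bit more concrete, here is a rough sketch (all names and the specific numbers are placeholders, not a finished design):

  // Illustrative sketch of the adaptive chunk/back-off scheme described above.
  const TARGET_CHUNK_SECONDS = 5;   // how long we want each connection to live
  const SHORT_BACKOFF_FACTOR = 3;   // short pause = 3x the target lifetime
  const LONG_BACKOFF_SECONDS = 60;  // long pause used after a refused connection
  const GOOD_CHUNKS_TO_RECOVER = 3; // refusal-free chunks before speeding up again

  function makeThrottleState(initialChunkBytes) {
    return { chunkBytes: initialChunkBytes, goodChunksInARow: 0, slowMode: false };
  }

  // After the first chunk, size later chunks to hit the target lifetime.
  function adjustChunkSize(state, firstChunkBytes, firstChunkSeconds) {
    const bytesPerSecond = firstChunkBytes / firstChunkSeconds;
    state.chunkBytes = Math.round(bytesPerSecond * TARGET_CHUNK_SECONDS);
  }

  // After each chunk attempt, decide how long to wait before the next one.
  function nextBackoffSeconds(state, connectionRefused) {
    if (connectionRefused) {
      state.slowMode = true;
      state.goodChunksInARow = 0;
    } else if (state.slowMode && ++state.goodChunksInARow >= GOOD_CHUNKS_TO_RECOVER) {
      state.slowMode = false; // the server is coping again, so stop backing off hard
    }
    return state.slowMode ? LONG_BACKOFF_SECONDS
                          : TARGET_CHUNK_SECONDS * SHORT_BACKOFF_FACTOR;
  }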
(In reply to Ashish Vijayaram [:ashish] from comment #49)
> I'm reassigning this back to FTP->Mirrors. I'm dropping this to normal
> because major and higher bugs page the oncall. I read through the bug and
> don't see an immediate actionable from IT. So please don't bump the severity
> back up without assigning to an individual. Thanks!

I am not even sure this needs to be at the current level, except for getting a consensus from the people running the mirror servers that this approach is worth pursuing; then perhaps an enhancement bug could be filed on the client?
Assignee: server-ops → nobody
Component: FTP: Mirrors → General
Product: mozilla.org → Core
QA Contact: mrz → general
Version: other → unspecified
Assignee: nobody → server-ops
Component: General → FTP: Mirrors
Product: Core → mozilla.org
QA Contact: general → mrz
Version: unspecified → other
What is the proposed change on the mirrors side of this?  I see no difference between throttled updates and unthrottled updates from the mirror POV.  They should handle either with no problem.  It is my understanding that the throttling is done to help clients with slower internet connections (who are still a large userbase globally).

At any rate, +1 from IT for making client changes here if it is deemed necessary.
The original bug where IT requested client-side changes so that it throttles is bug 352853, in which the interval between chunks downloaded was increased from 60 seconds (likely a reasonable value to limit the effect on the client) to 600 seconds and the chunk size was changed from 65536 to 300000 bytes.

I don't think I am grokking comment #52 properly, because from my reading it seems to state that the mirror network can handle unthrottled updates. Perhaps the mirrors can handle the load and it is bouncer where the breakdown occurs? IIRC there have been times when bouncer doesn't provide a mirror to download an update (IIRC Dave Miller stated that it sent back a web page in this case), which is most noticeable when we have just advertised an update unthrottled on the server side.

So, can we revert the interval change made in bug 352853 (from 60 seconds to 600 seconds) without overloading the mirrors?

Is there a more appropriate chunk size we should use?

Bill suggested several changes in this bug... would those changes work without overloading the mirror network?
Firefox hits bouncer before this client side throttling comes into effect, so the load on bouncer would be unaffected by this change.
Agreed. I recall being told that when bouncer does fail it sends back an HTML page even when an update MAR was requested. I'm not sure what to do about that, though a couple of things come to mind. First, we should just throttle AUS unless bouncer can always provide a URL; and/or AUS needs to have knowledge as to whether bouncer will successfully hand out a URL.
Let me see if I can lay this basket of snakes out straight:

Firefox hits AUS to see if there's an update - this could be throttled (it could say there isn't, even when there really is), and in fact *is* throttled shortly after a release, to give us a chance to see if we have any major regressions not caught in nightly/aurora/beta channels.

If AUS reports there is an update, Firefox hits download.mozilla.org with the right query string to get directed to a mirror. If AUS said there was a partial update, then it should ask for that... this is usually 2-4MB. Otherwise, a full update.

Firefox begins background-downloading the update. This is throttled in 300KB chunks. The throttle *interval* differs based on what channel you're tracking. Release is 300KB/10 min, Nightly is 300KB/1 min... not sure about Aurora or Beta.

These throttle values were decided on long ago, and benefit 2 main things:
1) Client bandwidth (don't want to hog and kill the user's other stuff)
2) Mirror bandwidth usage (don't want to flood/kill them)

I can't speak to the former, but the latter is still considered a problem. For example North America is believed to be under-sourced, and thus only 50% of North American users are actually directed to a North American mirror... the other 50% can get any mirror worldwide. This is something Mozilla could perhaps commit to fixing by supplementing with paid-for bandwidth... I would not be the person to make that call.

The bug referenced above (bug 352853) is specifically referencing issues that mirror admins had with the *number* of connections required at a chunk size of 64KB. 300KB was chosen as a compromise between number of requests needed and dial-up bandwidth. 100KB was deemed "still too many connections".

This was 6 years ago... broadband penetration is higher now, so client bandwidth should be less of a concern (I hope). Mirrors even then would have been better served by a larger chunk size, if perhaps not a higher overall transfer rate (which that bug actually doesn't address at all).


I would like to find a way to make updates faster, but I think that would really come down to asking some mirror admins if they could handle an increase in traffic. It's obviously the same overall volume, but in a smaller time window... their average Mbps would be roughly the same, but their 95th percentile Mbps would likely go up, as would server load during peak times. 


My thoughts:

1) 600KB in 20 min is better for mirrors than the current situation (300KB/10min) because it's probably not going to affect their peak traffic or server load much, but it is 1/2 as many connections overall (rough numbers are sketched after this list). It's hopefully a smaller problem for client bandwidth today too, given 6 years of broadband adoption. It might shave off up to 10 minutes in cases where we "just miss" finishing the update under the current 300KB throttle (ex: 301KB left to fetch, have to wait 10min to get the last 1KB). This might be significant for partial updates, but obviously not for full updates.

2) Likewise for 1200KB in 40 min, but probably not much beyond that. Once you hit about 2MB or so you're getting into the full size of a partial update, meaning there's really no throttle at all for those. This circles back to oversaturating mirrors.

3) Mirrors don't handle Nightly builds, and the userbase there is much smaller; throttling there should only depend on what we think of Nightly users' bandwidth pipes. Similar for Aurora, and possibly even Beta. I would be in favor of completely disabling throttling on Nightly, and perhaps a minimal throttle on Aurora/Beta.
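To put back-of-the-envelope numbers on the connection-count trade-off in (1) and (2) (the 20MB file size is just an example, and transfer time itself is ignored, which is why these differ slightly from figures quoted elsewhere in this bug):

  // Rough arithmetic only: chunk requests and wall-clock time for a complete
  // download under a fixed chunk-size/interval throttle.
  function throttleCost(fileKB, chunkKB, intervalMinutes) {
    const connections = Math.ceil(fileKB / chunkKB);
    // No pause is needed after the final chunk.
    return { connections, minutes: (connections - 1) * intervalMinutes };
  }

  console.log(throttleCost(20480, 300, 10)); // { connections: 69, minutes: 680 }
  console.log(throttleCost(20480, 600, 20)); // { connections: 35, minutes: 680 }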
(In reply to Jake Maul [:jakem] from comment #56)
> If AUS reports there is an update, Firefox hits download.mozilla.org with
> the right query string to get directed to a mirror. If AUS said there was a
> partial update, then it should ask for that... this is usually 2-4MB.

That's not accurate for scheduled updates.  12.0->13.0 partial updates were 9.8MB on Windows and 11MB on Mac.  13.0.1->14.0.1 partial updates were 7.4MB on Windows and 11MB on Mac.

Only partial updates for chemspills (like 13.0->13.0.1) are in the 2-4MB range.
Hey all,

I can only speak based on my experiences with mozilla.ftp.halifax.rwth-aachen.de (10 GBit/sec, Europe, weight of 10k, up to 9.8 GBit/sec of traffic during a FF release).

I don't know if this was mentioned already, but throttling also introduces a strange problem. If a mirror is temporarily deactivated (using sentry) and reactivated later on (in our case: 5 minutes later), there is a sudden surge of new clients. This is very demanding on the system. So far this has nothing to do with throttling.

However, with throttling, all of these clients have the same pause and unpause behaviour. Because of that, there is another spike 10 minutes after the first one. This effect can be seen for about 30 minutes, based on my experiences. Sometimes, the second spike is so challenging that our system again is deactivated by sentry which sometimes leads to a vicious circle.

Furthermore, I think the positive aspects of pausing the downloads are not visible to the mirrors. By increasing the length of the downloads, the number of connected clients is increased. However, with an unthrottled download, clients would finish earlier and make room for new clients.

With (in our case) more than 15,000 connected clients and close to 10 GBit/sec of traffic I don't see how things could go worse with unthrottled downloads.

If the mirror becomes saturated, as it is mentioned here, this is not a problem:

 - TCP just distributes the available bandwidth to all clients (which still should be faster than throttling?)

 - sentry lowers the weight of mirrors that do not respond in time (this happens earlier than expected because sentry checks a lot of files and delays sum up quickly)

Maybe it also helps to compare with other projects and their downloads. For large files (OpenSuse DVDs etc.) we also serve a lot of traffic, but the impact on the system is not that high. However, most of the FF releases (at least when our server was fresh and not as optimized as it is now) caused us to tweak quite a bit. We had to replace Apache with Nginx, we had to allow far more concurrent connections. Then we had to tweak the number of open file descriptors, add a second Nginx just for Mozilla (with different settings), had to play with "keep alive", ... I don't think that this much magic is really needed with unthrottled downloads.

Best regards,
Carsten
In speaking with the mirror network mailing list, there is strong support for removing the throttle altogether, and/or making it much smarter (that is: instead of 300KB/10min for everyone, implement an algorithm that sets throttles based on available client bandwidth... or some other relevant metric). The extra connections caused by our current throttling are more damaging than the sheer bandwidth of no-throttle would be.

To me this indicates that the only criterion we should strongly consider for how a throttle should look is the client experience... and that we need not consider the mirror servers.
That is an awesome start and thanks for talking with them.
This should probably be WORKSFORME given the CDN work and bug 803181, right?
I'm ok with closing this, though Thunderbird and SeaMonkey still use the mirror network and could use this tuning.
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #61)
> This should probably be WORKSFORME given the CDN work and bug 803181, right?

I don't think so.  We still throttle those downloads in the client code, so it doesn't make any difference how fast the mirror network is.
Summary: The complete MAR files take a long time to download via the application update process → The complete MAR files take a long time to download in the background via the application update process
(In reply to Ehsan Akhgari [:ehsan] from comment #63)
> (In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment
> #61)
> > This should probably be WORKSFORME given the CDN work and bug 803181, right?
> 
> I don't think so.  We still throttle those downloads in the client code, so
> it doesn't make any difference how fast the mirror network is.

Bug 803181 is about changing that built-in throttle rate. Specifically, the chunk interval was reduced from 10 minutes to 1 minute. That change alone should reduce the download time from 10-12 hours (assuming 20MB mar file) to around 1 hour.

Assuming a 20MB mar file:
old throttle: 682 minutes (11.4 hours), 0.5KB/sec average
new throttle: 68 minutes (1.1 hours), 5KB/sec average.
For comparison, 1KB/sec takes 5.7 hours to download 20MB.


However, it's worth noting that the original comment 0 here is talking about the delay on *nightly* being too long... not Release. Nightly and Aurora already had a 60-second chunk interval, and we haven't changed that (AFAIK). It's not my call if we should do anything there. Offhand it seems feasible to cut that interval down to 30s, or maybe even lower. A 30s sleep equates to approximately 10KB/s average... not nice on dialup, but feasible on just about anything else. I suspect the average bandwidth of Nightly/Aurora users is higher than that of Release users.

I still hope we can ultimately do something smarter here. A static chunk size and interval timer is a relatively poor approach overall, IMO, because one size does not fit all internet connections. I would much rather see no chunking, with a KB/s throttle rate that varies based on available bandwidth. Something adaptive is the only way we have any hope of providing an optimal experience under a wide range of conditions. Of course, this is harder to do... which might be why it wasn't done in the first place. :)
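For what it's worth, the pacing I have in mind is closer to this (a sketch only; how targetKBps gets chosen, ideally from observed available bandwidth, is deliberately left open and is the real design question):

  // Sketch of a chunk-less KB/s pacer: after receiving `bytes` in `elapsedMs`,
  // sleep just long enough to keep the average rate at or below targetKBps.
  function paceDelayMs(bytes, elapsedMs, targetKBps) {
    const minMsForBytes = (bytes / 1024 / targetKBps) * 1000;
    return Math.max(0, minMsForBytes - elapsedMs);
  }

  // Example: 300KB arrived in 400ms; to average 50KB/s we wait 5600ms more.
  console.log(paceDelayMs(300 * 1024, 400, 50)); // 5600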
(In reply to Jake Maul [:jakem] from comment #64)
Agreed. When this bug was filed we were using the mirror network, and we needed assurance that we weren't going to take down the mirrors (as we have done previously when not throttling, way back when) before we moved forward with client changes. With a CDN we can do much more on the client side without the worry we had with the mirrors. Since there are applications that might not be able to take advantage of these changes without adversely affecting their servers, this will likely need to be optional. I'll file a bug for these additional enhancements as time permits.
Wrong component. Not really sure this has any further action, now that we have CDNs.
Assignee: server-ops → server-ops-webops
Component: FTP: Mirrors → Server Operations: Web Operations
QA Contact: mrz → nmaul
I'm going to move this bug back to Toolkit: App Update, as any future work here would be on the client side. Robert, feel free to take whatever action you deem appropriate.

I'd still like to see a better throttling algorithm someday, but maybe a discussion in dev.planning and then a new, clean bug is more appropriate than extending this one further. Between that and bug 596839 we have the potential to drastically enhance the update experience. :)
Assignee: server-ops-webops → nobody
Component: Server Operations: Web Operations → Application Update
Product: mozilla.org → Toolkit
QA Contact: nmaul
Version: other → Trunk
We no longer throttle the download and will be changing app update to use standard networking interfaces in bug 1348087. Any additional algorithms to try to "tune" the download should be in networking code so it can apply to all downloads, etc.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME