Closed Bug 921598 Opened 11 years ago Closed 10 years ago

Slow serving application updates (~19kb/s) for Firefox 25.0b3 via CDN

Categories: Infrastructure & Operations Graveyard :: WebOps: Product Delivery (task)
Type: task
Priority: Not set
Severity: normal

Tracking: Not tracked
Status: RESOLVED WORKSFORME

Reporter: whimboo
Assignee: Unassigned

Attachments: 1 file

Attached file http.log.zip
Right now we are seeing very slow downloads of updates from what appears to be the CDN while running the update tests for Firefox 25.0b3. In most cases our tests fail because downloading a full mar file takes longer than 6 minutes. The problem occurs across our machines and platforms.

If you want to have a look at an HTTP log, please see the archived attachment.
Some more information from the updater log:

00:03:50.779 *** AUS:SVC readStringFromFile - file doesn't exist: C:\Documents and Settings\mozilla\Local Settings\Application Data\Mozilla\Firefox\firefox\updates\0\update.status
00:03:50.779 *** AUS:SVC readStatusFile - status: null, path: C:\Documents and Settings\mozilla\Local Settings\Application Data\Mozilla\Firefox\firefox\updates\0\update.status
00:03:50.808 *** AUS:SVC Downloader:downloadUpdate - downloading from http://download.mozilla.org/?product=firefox-25.0b3-complete&os=win&lang=en-US&force=1 to C:\Documents and Settings\mozilla\Local Settings\Application Data\Mozilla\Firefox\firefox\updates\0\update.mar
00:03:51.431 *** AUS:SVC Downloader:onStartRequest - original URI spec: http://download.mozilla.org/?product=firefox-25.0b3-complete&os=win&lang=en-US&force=1, final URI spec: http://download.cdn.mozilla.net/pub/mozilla.org/firefox/releases/25.0b3/update/win32/en-US/firefox-25.0b3.complete.mar
00:03:51.465 *** AUS:SVC Downloader:onProgress - progress: 14240/28623574
[..]
00:10:02.159 *** AUS:SVC Downloader:onProgress - progress: 28500000/28623574

As seen above, it takes 6:12 min to download the full mar on a testing machine (mm-win-xp-32-2.qa.scl3.mozilla.com).
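The onProgress lines above are enough to estimate the effective download rate of a run. The following is a minimal sketch, assuming the log keeps the `HH:MM:SS.mmm *** AUS:SVC Downloader:onProgress - progress: DONE/TOTAL` shape shown above and that a download does not cross midnight; `avg_update_rate` is a hypothetical helper name, not part of any Mozilla tooling. It compares the first and last onProgress lines it sees and prints the average rate between them.

```shell
#!/usr/bin/env bash
# avg_update_rate [FILE...]: hypothetical helper, not part of Mozilla's tooling.
# Reads updater log lines (from files, or stdin) and prints the average rate
# between the first and last "Downloader:onProgress" lines.
# Assumes "HH:MM:SS.mmm *** AUS:SVC Downloader:onProgress - progress: DONE/TOTAL"
# lines as quoted in this bug, and a download that does not cross midnight.
avg_update_rate() {
  awk '
    /Downloader:onProgress - progress:/ {
      split($1, t, ":")                  # t[1]=HH, t[2]=MM, t[3]=SS.mmm
      secs = t[1] * 3600 + t[2] * 60 + t[3]
      split($NF, p, "/")                 # p[1]=bytes so far, p[2]=total size
      if (n == 0) { t0 = secs; b0 = p[1] }
      t1 = secs; b1 = p[1]; n++
    }
    END {
      if (n < 2 || t1 <= t0) {
        print "need at least two onProgress lines" > "/dev/stderr"
        exit 1
      }
      rate = (b1 - b0) / (t1 - t0)
      printf "%d bytes in %.1f s = %.0f B/s (%.1f KiB/s)\n", b1 - b0, t1 - t0, rate, rate / 1024
    }
  ' "$@"
}
```

Run against the two onProgress lines quoted above, it reports 28485760 bytes in 370.7 s, i.e. about 75.0 KiB/s for that run; `avg_update_rate updater.log` works the same way on a full log.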
FWIW, this seems to have affected ~2% of the 25.0b3 ondemand_update test runs on the releasetest channel.
Assignee: infra → server-ops-webops
Component: Infrastructure: Other → WebOps: Product Delivery
QA Contact: jdow → nmaul
Note that this also happened on the beta channel, but only about 0.7% of the time.
Severity: major → normal
I have been testing this with `curl` from the Paris office for the last 90 minutes (one download per minute). Of those 90 downloads, the slowest rate I observed was 2,049k/sec and the fastest 5,602k/sec. I will continue to run this test, but wanted to report my preliminary results.

For future reference, this is the command I have been running:

 $ while true; do curl -o /dev/null http://download.cdn.mozilla.net/pub/mozilla.org/firefox/releases/25.0b3/update/win32/en-US/firefox-25.0b3.complete.mar ; sleep 60; done
Out of curiosity, I tested cturra's command on 3 connections (French free.fr, US Comcast and Verizon FiOS) and got full download speed every time (between 1.5 and 6 Mbps).
As of today I no longer see the reported slowness in test runs triggered by our Mozmill CI. Could this have been a temporary issue on Friday?

When this happens again, what specific information would you need? The curl command mentioned above doesn't print any details about the server the file gets downloaded from. Was anything helpful in the attached HTTP log?
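One way to capture the server details the curl loop lacks is curl's `--write-out` (`-w`) option, which can print the address that served a transfer together with its average speed. A minimal sketch, assuming curl 7.29.0 or newer (needed for `%{remote_ip}`); `probe_once` and `summarize_by_ip` are hypothetical helper names, and the summary simply groups recorded speeds by serving IP so slow downloads can be tied to particular nodes.

```shell
#!/usr/bin/env bash
# probe_once URL: hypothetical helper. Downloads URL once (following any
# redirect, e.g. from download.mozilla.org) and prints one line:
#   <UTC timestamp> <HTTP status> <avg speed in bytes/s> <IP that served it>
# %{remote_ip} needs curl >= 7.29.0 and refers to the most recent connection.
probe_once() {
  printf '%s ' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  curl -sS -L -o /dev/null \
       -w '%{http_code} %{speed_download} %{remote_ip}\n' "$1"
}

# summarize_by_ip [FILE...]: hypothetical helper. Groups probe_once lines by
# serving IP and prints count, min, average and max speed (bytes/s) per IP.
summarize_by_ip() {
  awk '
    {
      ip = (NF >= 4 ? $4 : "unknown")    # remote_ip is empty if no connection was made
      s = $3 + 0
      n[ip]++; sum[ip] += s
      if (!(ip in mn) || s < mn[ip]) mn[ip] = s
      if (!(ip in mx) || s > mx[ip]) mx[ip] = s
    }
    END {
      for (ip in n)
        printf "%s n=%d min=%.0f avg=%.0f max=%.0f\n", ip, n[ip], mn[ip], sum[ip] / n[ip], mx[ip]
    }
  ' "$@" | sort
}
```

Used like the original loop, e.g. `while true; do probe_once "$URL"; sleep 60; done >> probes.log` with `URL` set to the complete.mar link above, followed later by `summarize_by_ip probes.log`.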
I had a brief look through the http.log and didn't see many details about the requests that timed out. We load balance across our CDNs, so one of my main questions is which CDN these requests were timing out against. I did some research on each of our CDNs and don't see any indication that any of them experienced issues last Friday that could be the root cause here.

That said, I do see requests in the log going through proxy.dmz.scl3.mozilla.com, which is one of the internal web proxies in our scl3 datacenter. I wonder if this is the culprit. Were you able to verify that downloads were slow at the same time from outside our network?
Flags: needinfo?(hskupin)
I haven't done outside checks: around the time the testing happened I was already off for the weekend and only quickly spot-checked some results. So it might indeed be a problem with the proxy. What could we do to get this investigated?
Flags: needinfo?(hskupin)
We'd have to touch base with opsec on that. I have copied in :tinfoil for their thoughts here.
Flags: needinfo?(mhenry)
A few thoughts:

One thing that would definitely help us is to know which IP address each chunk is downloaded from. That would tell us whether there's any correlation between slow chunks and the CDN serving them. Firefox usually downloads updates in 300KB Range requests in order to throttle itself, so different chunks can be served from different CDN nodes, or even from different CDN vendors altogether. Ideally, we'd like to know how fast each chunk was downloaded, and from which IP, so we can tell whether there's a pattern. If we can isolate a particular node or vendor that is problematic, we have a much better chance of fixing it.
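The per-chunk measurement described above can be approximated from the command line by fetching the file in fixed-size Range requests and recording, for each chunk, the HTTP status, speed and serving IP. A minimal sketch: the 307200-byte (300 KiB) default is an assumed stand-in for the "300KB" chunk size, `chunk_ranges` and `fetch_chunks` are hypothetical helper names, and unlike the Firefox downloader it does not sleep between chunks. Each curl call is a separate process with its own connection, so consecutive chunks may be served by different CDN nodes.

```shell
#!/usr/bin/env bash
# chunk_ranges TOTAL CHUNK: prints inclusive byte ranges "start-end" covering
# TOTAL bytes in CHUNK-sized pieces (the last piece may be shorter).
chunk_ranges() {
  local total=$1 chunk=$2 start end
  for (( start = 0; start < total; start += chunk )); do
    end=$(( start + chunk - 1 ))
    if (( end >= total )); then
      end=$(( total - 1 ))
    fi
    echo "${start}-${end}"
  done
}

# fetch_chunks URL TOTAL [CHUNK]: hypothetical helper. Requests each range
# separately (a server honouring Range answers 206) and prints per chunk:
#   <range> <HTTP status> <avg speed in bytes/s> <IP that served it>
# CHUNK defaults to 307200 bytes (300 KiB), an assumed stand-in for "300KB".
fetch_chunks() {
  local url=$1 total=$2 chunk=${3:-307200} range
  chunk_ranges "$total" "$chunk" | while read -r range; do
    printf '%s ' "$range"
    # </dev/null keeps curl from touching the loop's stdin
    curl -sS -o /dev/null -r "$range" \
         -w '%{http_code} %{speed_download} %{remote_ip}\n' "$url" </dev/null
  done
}
```

For the file in this bug, `fetch_chunks http://download.cdn.mozilla.net/pub/mozilla.org/firefox/releases/25.0b3/update/win32/en-US/firefox-25.0b3.complete.mar 28623574` would issue 94 requests; sorting the output by speed shows whether the slow chunks share an IP.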

(Side note: I really wish we could change Firefox so that it would download at a variable *rate* instead of the fixed-chunk-size-and-fixed-sleep-interval system we currently have, something that adapts to the user's available bandwidth rather than the current one-size-fits-all approach. I have some ideas on how this could be done, but that's out of scope here.)

The proxy is a reasonable thing to look into. It would not be the first time we've had trouble with them... though it might be the first time we've had *performance* trouble with them.
I'm looking at the proxies. As :jakem said, we haven't had performance issues with the proxies so far.

Will report back with what I see.
Flags: needinfo?(mhenry)
I've tested all the proxies and found that the average download speed is around 57Mb/s. Is that too slow?
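To put that figure in perspective, the time to fetch the full 25.0b3 mar (28,623,574 bytes, per the updater log above) at a given rate is bytes × 8 / bits per second. A small sketch, assuming "57Mb/s" means 57 megabits per second and ignoring protocol overhead; `transfer_secs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# transfer_secs BYTES BITS_PER_SECOND: hypothetical helper. Prints the ideal
# transfer time in seconds (one decimal place), ignoring protocol overhead.
transfer_secs() {
  awk -v bytes="$1" -v bps="$2" 'BEGIN { printf "%.1f\n", bytes * 8 / bps }'
}
```

`transfer_secs 28623574 57000000` prints 4.0, so at the measured proxy average the full mar would take about 4 seconds, far from the roughly 6 minutes seen in the failing runs.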
As I wrote in comment 6, we weren't seeing that slowness earlier this week. Also, yesterday's update tests were successful and the updates were downloaded at full speed. So it looks like there might be periods when the connection speed drops.

Michael, is there a way to check the logs from last Friday afternoon PDT? That was when we experienced the slowness.
Seems like this is resolved now?
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Looks like it. It might have been a single CDN server that was serving the data painfully slowly. We are keeping our update logging enabled in case something like that happens again.
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard