836044 - Aurora stub installer doesn't seem to be working

Reporter

Description

•

13 years ago

I've been trying to install Aurora on a Windows 7 VM today, and I haven't been able to successfully run the stub installer to completion. It stalls at "Installing." I tried at least 5 times from here: http://www.mozilla.org/en-US/firefox/channel/#aurora

Robert Strong (they/them - no direct email)

Assignee

Comment 1

•

13 years ago

The aurora installer that is downloaded by the stub appears to be corrupt.

Robert Strong (they/them - no direct email)

Assignee

Comment 2

•

13 years ago

I've also had difficulty getting the installer downloaded from (it stopped at about 18.2 MB) http://download.cdn.mozilla.net/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/firefox-20.0a2.en-US.win32.installer.exe

Robert Strong (they/them - no direct email)

Assignee

Comment 3

•

13 years ago

I highly suspect a change with something server side especially since we haven't changed anything in the stub since the time I verified it was working last.

Alex Keybl [:akeybl]

Updated

•

13 years ago

tracking-firefox20: --- → ?

Jake Maul [:jakem]

Comment 4

•

13 years ago

(In reply to juan becerra [:juanb] from comment #0)
> I've been trying to install Aurora on a Windows 7 VM today, and I haven't
> been able to successfully run the stub installer to completion. It stalls at
> "Installing."

If it's stalling after downloading, it doesn't seem likely to be a delivery problem. Could it be a bad build of the full installer or something?

Aurora worked for me just now, start to finish via stub installer.

(In reply to Robert Strong [:rstrong] (do not email) from comment #1)
> The aurora installer that is downloaded by the stub appears to be corrupt.

Is there any way to verify this apart from using the stub installer?

(In reply to Robert Strong [:rstrong] (do not email) from comment #2)
> I've also had difficulty getting the installer downloaded from (it stopped
> at about 18.2 MB)
> http://download.cdn.mozilla.net/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/firefox-20.0a2.en-US.win32.installer.exe

Works for me at the moment. If you can duplicate, please let me know what IP "download.cdn.mozilla.net" is resolving to from the machine that is having trouble, at the time it is having trouble.

(In reply to Robert Strong [:rstrong] (do not email) from comment #3)
> I highly suspect a change with something server side especially since we
> haven't changed anything in the stub since the time I verified it was
> working last.

No changes that I'm aware of. We don't generally mess with product delivery unilaterally, except in the case of major outages/emergencies.

Of course that doesn't mean there isn't an error... only that I've nothing to go on in that direction.

If we could get some sort of logging or verbose output from the stub installer, that would help tremendously in troubleshooting this sort of thing. Seems like we've gone through some sort of trouble several times now, and it's always a bit of a guessing game as to what the problem is.

Robert Strong (they/them - no direct email)

Assignee

Comment 5

•

13 years ago

(In reply to Jake Maul [:jakem] from comment #4)
> (In reply to juan becerra [:juanb] from comment #0)
> > I've been trying to install Aurora on a Windows 7 VM today, and I haven't
> > been able to successfully run the stub installer to completion. It stalls at
> > "Installing."
>
> If it's stalling after downloading, it doesn't seem likely to be a delivery
> problem. Could it be a bad build of the full installer or something?

I checked the download from the stub and it was corrupted.

> Aurora worked for me just now, start to finish via stub installer.

It had not been working for me (several tries)... just tried it again with the same stub and it is now working.

> (In reply to Robert Strong [:rstrong] (do not email) from comment #1)
> > The aurora installer that is downloaded by the stub appears to be corrupt.
>
> Is there any way to verify this apart from using the stub installer?

No idea if there are checks for the binaries distributed to the servers or if bouncer is redirecting correctly.

> (In reply to Robert Strong [:rstrong] (do not email) from comment #2)
> > I've also had difficulty getting the installer downloaded from (it stopped
> > at about 18.2 MB)
> > http://download.cdn.mozilla.net/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/firefox-20.0a2.en-US.win32.installer.exe
>
> Works for me at the moment. If you can duplicate, please let me know what IP
> "download.cdn.mozilla.net" is resolving to from the machine that is having
> trouble, at the time it is having trouble.

Will try if I run into it... Juan will more likely see it than I will.

> (In reply to Robert Strong [:rstrong] (do not email) from comment #3)
> > I highly suspect a change with something server side especially since we
> > haven't changed anything in the stub since the time I verified it was
> > working last.
>
> No changes that I'm aware of. We don't generally mess with product delivery
> unilaterally, except in the case of major outages/emergencies.
>
> Of course that doesn't mean there isn't an error... only that I've nothing
> to go on in that direction.
>
> If we could get some sort of logging or verbose output from the stub
> installer, that would help tremendously in troubleshooting this sort of
> thing. Seems like we've gone through some sort of trouble several times now,
> and it's always a bit of a guessing game as to what the problem is.

The last couple of times iirc it was bouncer. Perhaps there could be a process to verify that is working correctly? I am adding more logging to the stub but it seems like there should be something other than the stub for verifying the server side especially since we want the stub to remain small in size.

juan becerra [:juanb]

Reporter

Comment 6

•

13 years ago

I haven't been able to reproduce this again in the last half hour, however I've noticed that my machine cursor becomes a little spinner once it has crossed the "downloading" line and begun the installation process. That's not something I was seeing earlier, so perhaps the file never quite finished downloading despite the progress indicator being right on the line. I will keep trying for a little bit, but unless I can observe this again, I don't know how to proceed.

Robert Strong (they/them - no direct email)

Assignee

Comment 7

•

13 years ago

The OS changing the cursor to a spinner is fairly typical (see it often) during the start of the install since we launch an external process and is expected.

Jason Smith [:jsmith]

Updated

•

13 years ago

Blocks: StubInstaller

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 8

•

13 years ago

I tried this today with the stub installer for Aurora on Windows 7 64 and I don't see this problem. It works as expected.

Alex Keybl [:akeybl]

Comment 9

•

13 years ago

We'll track and wait for the builds from comment 5 that rs mentions as having extra logging. Understood that this doesn't appear to be an issue currently, but may be intermittent.

tracking-firefox20: ? → +

Lukas Blakk [:lsblakk] use ?needinfo

Comment 10

•

12 years ago

Can we confirm this is no longer occurring? Happy to keep this tracked for the rest of the week until FF 20 moves to Beta but after that we should either continue tracking on 21 once it lands on Aurora or resolve this WFM.

Keywords: qawanted

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

12 years ago

Flags: needinfo?(jbecerra)

juan becerra [:juanb]

Reporter

Comment 11

•

12 years ago

I tried this about 20 times on a couple of machines, and I was able to reproduce the problem once.

Flags: needinfo?(jbecerra)

Lukas Blakk [:lsblakk] use ?needinfo

Comment 12

•

12 years ago

Passing to Rob to see if we can get "some sort of logging or verbose output from the stub installer" to help Jake with troubleshooting here? Also, moving tracking to 21 as it will be moving to Aurora channel on Tuesday's Merge Day.

Assignee: nobody → robert.bugzilla

tracking-firefox20: + → -

tracking-firefox21: --- → +

Robert Strong (they/them - no direct email)

Assignee

Comment 13

•

12 years ago

The best way to get verbose logging would be to use wireshark and capture the download until it is reproduced. I'll try to do so when I have the time.

juan becerra [:juanb]

Reporter

Comment 14

•

12 years ago

I was able to reproduce this while running Wireshark. I'm uploading the log file to a Dropbox location, and once that's done I'll post the link here.

juan becerra [:juanb]

Reporter

Comment 15

•

12 years ago

Robert, the log is in the following link. Let me know if there's anything else you need: http://dl.dropbox.com/u/143596/20130219-aurora-stub-stuck.pcapng

Robert Strong (they/them - no direct email)

Assignee

Comment 16

•

12 years ago

Jake, while debugging this I noticed that some of the servers have nightly builds that are a couple of days old.

download.cdn.mozilla.net IPs that should have Build ID 20130218031106:

63.236.253.19 has Build ID 20130216031127
204.93.47.59 has Build ID 20130216031127

Possibly others as well.

Robert Strong (they/them - no direct email)

Assignee

Comment 17

•

12 years ago

I believe it should have actually been a 20130219 build (20130219031055?) across the board. Is the cdn usually a day behind?

Robert Strong (they/them - no direct email)

Assignee

Comment 18

•

12 years ago

Note: so far it is failing to download the majority of the time with 93.184.215.248 and has succeeded 5 out of 5 times with 63.236.253.19.

Robert Strong (they/them - no direct email)

Assignee

Comment 19

•

12 years ago

I just tested several of the IP addresses that are being returned with the following results (there were many more of the same before I actually started counting):

165.254.94.64 download.cdn.mozilla.net # Good 5 out of 5
165.254.94.16 download.cdn.mozilla.net # Good 5 out of 5
63.236.253.24 download.cdn.mozilla.net # Good 5 out of 5
205.234.218.40 download.cdn.mozilla.net # Good 5 out of 5
63.236.253.49 download.cdn.mozilla.net # Good 5 out of 5
63.236.253.19 download.cdn.mozilla.net # Good 5 out of 5
209.211.216.24 download.cdn.mozilla.net # Good 5 out of 5
93.184.215.248 download.cdn.mozilla.net # Bad 10 out of 10

I added a bunch of logging to the stub installer and it is the same when it succeeds as when it fails.

Alex Keybl [:akeybl]

Comment 20

•

12 years ago

(In reply to Robert Strong [:rstrong] (do not email) from comment #19)
> 93.184.215.248 download.cdn.mozilla.net # Bad 10 out of 10
>
> I added a bunch of logging to the stub installer and it is the same when
> it succeeds as when it fails.

Sending over to Jake. Thanks Rob!

Assignee: robert.bugzilla → nmaul

Jake Maul [:jakem]

Comment 21

•

12 years ago

Excellent, that is precisely the data I need, thank you.

All of those working IPs are Akamai. The failing one is Edgecast.

Just to confirm, we're still looking at this URL, right?
http://download.cdn.mozilla.net/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/firefox-20.0a2.en-US.win32.installer.exe

Comparing the md5sums from each of those IPs downloading that file (as well as the FTP cluster directly), here's what I get:

MD5 (165.254.94.64.exe) = a7df4df13300adfa59b9ea9c914f5740
MD5 (165.254.94.16.exe) = a7df4df13300adfa59b9ea9c914f5740
MD5 (63.236.253.24.exe) = a7df4df13300adfa59b9ea9c914f5740
MD5 (205.234.218.40.exe) = a7df4df13300adfa59b9ea9c914f5740
MD5 (63.236.253.49.exe) = a7df4df13300adfa59b9ea9c914f5740
MD5 (63.236.253.19.exe) = a7df4df13300adfa59b9ea9c914f5740
MD5 (209.211.216.24.exe) = a7df4df13300adfa59b9ea9c914f5740
MD5 (93.184.215.248.exe) = fd9ebd7ca34854e1fc7847114fdda892
MD5 (ftp.exe) = fd9ebd7ca34854e1fc7847114fdda892

File size on that one differs slightly, too:

-rw-r--r-- 1 jakemaul staff 21179216 Feb 20 10:36 165.254.94.64.exe
-rw-r--r-- 1 jakemaul staff 21179216 Feb 20 10:36 165.254.94.16.exe
-rw-r--r-- 1 jakemaul staff 21179216 Feb 20 10:37 63.236.253.24.exe
-rw-r--r-- 1 jakemaul staff 21179216 Feb 20 10:37 205.234.218.40.exe
-rw-r--r-- 1 jakemaul staff 21179216 Feb 20 10:37 63.236.253.49.exe
-rw-r--r-- 1 jakemaul staff 21179216 Feb 20 10:37 63.236.253.19.exe
-rw-r--r-- 1 jakemaul staff 21179216 Feb 20 10:41 209.211.216.24.exe
-rw-r--r-- 1 jakemaul staff 21179024 Feb 20 10:41 93.184.215.248.exe
-rw-r--r-- 1 jakemaul staff 21179024 Feb 20 10:59 ftp.exe

That explains why the stub installer fails *after* downloading, in the "Installing" phase. The delivery is fine... the contents being delivered are faulty somehow.

Response headers for one of the good Akamai nodes, the bad Edgecast node, and the FTP cluster:

Akamai:
< HTTP/1.1 200 OK
< Server: Apache
< X-Backend-Server: ftp2.dmz.scl3.mozilla.com
< Content-Type: application/octet-stream
< Accept-Ranges: bytes
< Access-Control-Allow-Origin: *
< ETag: "1e522bc-1432b50-4d5ee1ece642e"
< Last-Modified: Sun, 17 Feb 2013 16:30:02 GMT
< X-Cache-Info: caching
< Content-Length: 21179216
< Cache-Control: max-age=164507
< Expires: Fri, 22 Feb 2013 15:23:08 GMT
< Date: Wed, 20 Feb 2013 17:41:21 GMT
< Connection: keep-alive

Edgecast:
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Access-Control-Allow-Origin: *
< Cache-Control: max-age=345600
< Content-Type: application/octet-stream
< Date: Wed, 20 Feb 2013 17:41:37 GMT
< ETag: "834f22-1432a90-4d6169c610688"
< Expires: Sun, 24 Feb 2013 17:41:37 GMT
< Last-Modified: Tue, 19 Feb 2013 16:48:28 GMT
< Server: ECAcc (cpm/F8A3)
< X-Backend-Server: ftp3.dmz.scl3.mozilla.com
< X-Cache: HIT
< X-Cache-Info: cached
< Content-Length: 21179024

ftp.mozilla.org:
< HTTP/1.1 200 OK
< Server: Apache
< X-Backend-Server: ftp6.dmz.scl3.mozilla.com
< Cache-Control: max-age=345600
< Content-Type: application/octet-stream
< Date: Wed, 20 Feb 2013 17:43:16 GMT
< Expires: Sun, 24 Feb 2013 17:43:16 GMT
< Accept-Ranges: bytes
< Access-Control-Allow-Origin: *
< ETag: "834f22-1432a90-4d6169c610688"
< Last-Modified: Tue, 19 Feb 2013 16:48:28 GMT
< X-Cache-Info: caching
< Content-Length: 21179024

Looking carefully at the dates, it appears that Akamai is currently serving an installer from Feb 17... Edgecast and ftp.mozilla.org are serving one from Feb 19. It appears to me that the older full installer is working properly, but the newer one is not.

We are sending a far too long Expires header, at least, but that's not the problem here (it's just making things confusing because they don't all have the same contents). I'll work on this today. However, based on this data, I'd have to say that fixing this is likely to make the problem *worse*, because it will force the CDNs to stay more up-to-date, which means they'll both be serving the "bad" version instead.

In the meantime, you might want to try this... set this line in your /etc/hosts file:

63.245.215.46 download.cdn.mozilla.net

This will send you straight to the FTP cluster, cutting out both CDNs entirely. Be sure to flush your local DNS cache after setting this. If *that* fails (as I now suspect it will, given the above data), then we can definitively rule out either CDN as being a problem. That would tell me that there's a problem in the full installer itself, or at least something that is tripping up the stub installer.
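For anyone repeating the checksum comparison above, a minimal Python sketch of the grouping step follows. It assumes the installer bytes from each source have already been fetched; the function name `group_by_md5` and the stand-in payloads are hypothetical and only illustrate the idea.

```python
import hashlib
from collections import defaultdict


def group_by_md5(downloads):
    """Map each MD5 hex digest to the sorted list of sources that served it."""
    groups = defaultdict(list)
    for source, payload in downloads.items():
        groups[hashlib.md5(payload).hexdigest()].append(source)
    return {digest: sorted(sources) for digest, sources in groups.items()}


# Stand-in payloads (assumption): real use would pass the bytes fetched
# from each IP instead of these short placeholder strings.
sample = {
    "165.254.94.64": b"installer built Feb 17",
    "63.236.253.19": b"installer built Feb 17",
    "93.184.215.248": b"installer built Feb 19",
    "ftp": b"installer built Feb 19",
}

for digest, sources in sorted(group_by_md5(sample).items()):
    print(digest, sources)
```

More than one group in the result means the sources are not all serving the same file, which is the situation the listing above shows.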

Jake Maul [:jakem]

Comment 22

•

12 years ago

Just to confirm, I checked all of the IPs in comment 19... all of the working Akamai IPs are serving up:

< Last-Modified: Sun, 17 Feb 2013 16:30:02 GMT
< ETag: "1e522bc-1432b50-4d5ee1ece642e"
< Content-Length: 21179216

The broken Edgecast IP is serving up:

< Last-Modified: Tue, 19 Feb 2013 16:48:28 GMT
< ETag: "834f22-1432a90-4d6169c610688"
< Content-Length: 21179024

ftp.mozilla.org serves up (same as Edgecast, so presumably broken but needs to be tested):

< Last-Modified: Tue, 19 Feb 2013 16:48:28 GMT
< ETag: "834f22-1432a90-4d6169c610688"
< Content-Length: 21179024
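The header comparison above can be sketched in Python as a check on three fields. The header values are the ones quoted in this comment; the helper names `version_fingerprint` and `same_version` are hypothetical, and treating these three headers as a version fingerprint is an assumption that only holds when the origin sets them consistently.

```python
def version_fingerprint(headers):
    """Return the (ETag, Last-Modified, Content-Length) triple for a response."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    return (
        lowered.get("etag"),
        lowered.get("last-modified"),
        lowered.get("content-length"),
    )


def same_version(a, b):
    """True when two responses appear to describe the same file version."""
    return version_fingerprint(a) == version_fingerprint(b)


akamai = {
    "Last-Modified": "Sun, 17 Feb 2013 16:30:02 GMT",
    "ETag": '"1e522bc-1432b50-4d5ee1ece642e"',
    "Content-Length": "21179216",
}
edgecast = {
    "Last-Modified": "Tue, 19 Feb 2013 16:48:28 GMT",
    "ETag": '"834f22-1432a90-4d6169c610688"',
    "Content-Length": "21179024",
}
ftp = dict(edgecast)  # ftp.mozilla.org reported the same three values

print(same_version(akamai, edgecast))  # False
print(same_version(edgecast, ftp))     # True
```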

Robert Strong (they/them - no direct email)

Assignee

Comment 23

•

12 years ago

btw: I also tested using ftp for the download url without any failures (4 out of 4 good). http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-21.0a1.en-US.win32.installer.exe The IP's I hit for ftp are: 63.245.215.46 63.245.215.56

Robert Strong (they/them - no direct email)

Assignee

Comment 24

•

12 years ago

Since nightly fails as well I was testing using nightly.

Robert Strong (they/them - no direct email)

Assignee

Comment 25

•

12 years ago

(In reply to Jake Maul [:jakem] from comment #21)
>...
> In the meantime, you might want to try this... set this line in your
> /etc/hosts file:
> 63.245.215.46 download.cdn.mozilla.net
>
> This will send you straight to the FTP cluster, cutting out both CDNs
> entirely. Be sure to flush your local DNS cache after setting this. If
> *that* fails (as I now suspect it will, given the above data), then we can
> definitively rule out either CDN as being a problem. That would tell me that
> there's a problem in the full installer itself, or at least something that
> is tripping up the stub installer.

It appears to only happen when using Edgecast and I suspect that just as with Akamai setting the ftp IP in my hosts file will succeed. I'll get data using aurora later today.

Robert Strong (they/them - no direct email)

Assignee

Comment 26

•

12 years ago

Jake, this might also be related to bug 816472 where we periodically see mar files that are larger than expected. Also, bug 816472 happens in Firefox and not in the stub.

Robert Strong (they/them - no direct email)

Assignee

Comment 27

•

12 years ago

(In reply to Robert Strong [:rstrong] (do not email) from comment #26)
> Jake, this might also be related to bug 816472 where we periodically see mar
> files that are larger than expected. Also, bug 816472 happens in Firefox and
> not in the stub.

btw: I have seen this happen in the stub as well... though less often than the other error but greater than successful downloads and also from the Edgecast IP.

Jake Maul [:jakem]

Comment 28

•

12 years ago

I have fixed the expires headers, and issued a purge of these files on both Akamai and Edgecast. By my testing they both contain the same file now, and it's the same file as on the FTP cluster.

Robert Strong (they/them - no direct email)

Assignee

Comment 29

•

12 years ago

I just tried 5 times and they were all successful. juan, can you still reproduce?

juan becerra [:juanb]

Reporter

Comment 30

•

12 years ago

I remember having to do this maybe 20 times before I was able to reproduce this. I'll give it a try again today.

Robert Strong (they/them - no direct email)

Assignee

Comment 31

•

12 years ago

I went with adding 93.184.215.248 download.cdn.mozilla.net to my hosts file which was giving me consistent failures. Now it is giving me consistent success.

juan becerra [:juanb]

Reporter

Comment 32

•

12 years ago

I haven't been able to reproduce the problem in the past hour, after trying tens of times.

Robert Strong (they/them - no direct email)

Assignee

Comment 33

•

12 years ago

With 93.184.215.248 download.cdn.mozilla.net in my hosts file it successfully downloaded / installed over 35 times. Last night it consistently failed. Jake, whatever you did appears to have fixed it for me and Juan. Any idea what could have been the cause?

Robert Strong (they/them - no direct email)

Assignee

Comment 34

•

12 years ago

Is it possible to get all of the IP Addresses? With that I can probably create a test installer to verify all of the servers.

Robert Strong (they/them - no direct email)

Assignee

Comment 35

•

12 years ago

(In reply to Jake Maul [:jakem] from comment #22)
> Just to confirm, I checked all of the IPs in comment 19... all of the
> working Akamai IPs are serving up:
>
> < Last-Modified: Sun, 17 Feb 2013 16:30:02 GMT
> < ETag: "1e522bc-1432b50-4d5ee1ece642e"
> < Content-Length: 21179216
>
> The broken Edgecast IP is serving up:
>
> < Last-Modified: Tue, 19 Feb 2013 16:48:28 GMT
> < ETag: "834f22-1432a90-4d6169c610688"
> < Content-Length: 21179024

I was able to extract most of one of the corrupt 93.184.215.248 downloads from yesterday and the file times are 2/18/2013 6:19 AM. So, it appears that it thinks it is serving up 2/19 and the files I received were from 2/18.

Robert Strong (they/them - no direct email)

Assignee

Comment 36

•

12 years ago

Though it is possible that 93.184.215.248 started serving the 2/19 build after I experienced that. Still, what was served was either corrupted or corrupted on the client side, and it appears that whatever you did has fixed that.

Jake Maul [:jakem]

Comment 37

•

12 years ago

Is it feasible that ... I don't really know how to even ask this ... that stub installer is "expecting" a given version of a file in some way, and when getting some other version it chokes?

Another possibility: Edgecast uses Anycast IPs. That means worldwide, despite hundreds (thousands?) of cache servers, they only present ~3 actual IPs. It's possible that one or more Edgecast nodes responding behind that IP actually did have bad data of some kind, totally unrelated to the mere fact that they had different files from Akamai. I'm in Phoenix (and AFAIK you're all in the Bay area)... the nodes I hit and pulled files from weren't necessarily the same as the ones you accessed... so it's possible that the files *you* were getting are a different md5sum than the ones I was getting, even though we got them from the same IP.

This all hinges on "maybe some Edgecast nodes had corrupt data"... unlikely, but definitely not impossible. If this is the case, then the cache flush would have cured it. This feels unlikely because the previous Expires header was only 4 days... we've had this bug open much longer. If it was bad data on some node, it would have been flushed naturally a long time ago. If the problem has been continuous since this bug was opened, then it's really hard to point a finger at something as transient as "bad cache data".

The only way to debug this is to examine the HTTP headers of failed attempts vs successful ones: Edgecast includes a "Server: ECAcc (cpm/F8A3)" which uniquely identifies which precise node served your request. It might be possible to distinguish successful vs failed attempts looking just at that Edgecast IP.

One other possible scenario, though I can't see how it would matter: I noted that it's been over 24 hours since a full installer build for Aurora was generated. That means, given the new Expires settings in bug 829207, that the CDNs are not actually caching right now... they'll merely be forwarding data from the origin (ftp.mozilla.org), because the Expires header will not allow them to cache. It seems pretty unlikely that the stub installer would care one way or the other if the full installer was served from cache or not, but I wanted to mention it for the sake of full disclosure.

Jake Maul [:jakem]

Comment 38

•

12 years ago

(In reply to Robert Strong [:rstrong] (do not email) from comment #34)
> Is it possible to get all of the IP Addresses? With that I can probably
> create a test installer to verify all of the servers.

Yes, but it won't do you much good. Edgecast uses Anycast to route their CDN traffic, meaning they expose only a handful of IPs worldwide, for lots and lots of cache nodes. This means queries to the same IP from different locations will actually hit different servers. Makes troubleshooting this kind of thing harder. :(

I can get you a list of Edgecast IPs that will access our origin, but this is not the same list of IPs that end users will access to get the files. I don't know if you can query them directly and get an intelligible answer.

Akamai is more traditional... one IP per cache node/cluster, so (AFAIK) you can expect consistent behavior from a given IP. I don't think they publicize a list, but if it's of interest to you I can ask for a snapshot. I dunno if they'd be willing to divulge it.

Robert Strong (they/them - no direct email)

Assignee

Comment 39

•

12 years ago

(In reply to Jake Maul [:jakem] from comment #37)
> Is it feasible that ... I don't really know how to even ask this ... that
> stub installer is "expecting" a given version of a file in some way, and
> when getting some other version it chokes?

No. It is just doing a WinInet download and, at least yesterday, it only ended up with a corrupted file from the Edgecast IP.

> Another possibility: Edgecast uses Anycast IPs. That means worldwide,
> despite hundreds (thousands?) of cache servers, they only present ~3 actual
> IPs. It's possible that one or more Edgecast nodes responding behind that IP
> actually did have bad data of some kind, totally unrelated to the mere
> fact that they had different files from Akamai. I'm in Phoenix (and AFAIK
> you're all in the Bay area)... the nodes I hit and pulled files from weren't
> necessarily the same as the ones you accessed... so it's possible that the
> files *you* were getting are a different md5sum than the ones I was getting,
> even though we got them from the same IP.
>
> This all hinges on "maybe some Edgecast nodes had corrupt data"... unlikely,
> but definitely not impossible. If this is the case, then the cache flush
> would have cured it. This feels unlikely because the previous Expires header
> was only 4 days... we've had this bug open much longer. If it was bad data
> on some node, it would have been flushed naturally a long time ago. If the
> problem has been continuous since this bug was opened, then it's really hard
> to point a finger at something as transient as "bad cache data".

I don't think we know if it has been continuous. We do know that it was sporadic until I forced it to use the Edgecast IP.

> The only way to debug this is to examine the HTTP headers of failed attempts
> vs successful ones: Edgecast includes a "Server: ECAcc (cpm/F8A3)" which
> uniquely identifies which precise node served your request. It might be
> possible to distinguish successful vs failed attempts looking just at that
> Edgecast IP.

Juan posted a wireshark log from a failed attempt in comment #15.

btw: I have compared the WinInet logging for both a failed and a successful download and they were exactly the same so I don't think we will be able to get any additional insight there.

> One other possible scenario, though I can't see how it would matter: I noted
> that it's been over 24 hours since a full installer build for Aurora was
> generated. That means, given the new Expires settings in bug 829207, that
> the CDNs are not actually caching right now... they'll merely be forwarding
> data from the origin (ftp.mozilla.org), because the Expires header will not
> allow them to cache. It seems pretty unlikely that the stub installer would
> care one way or the other if the full installer was served from cache or
> not, but I wanted to mention it for the sake of full disclosure.

The stub does individual range requests and I wonder if the Edgecast server is serving up a different file for some of the requests.

Robert Strong (they/them - no direct email)

Assignee

Comment 40

•

12 years ago

s/Edgecast server/Edgecast IP/

Robert Strong (they/them - no direct email)

Assignee

Comment 41

•

12 years ago

(In reply to Robert Strong [:rstrong] (do not email) from comment #35)
> (In reply to Jake Maul [:jakem] from comment #22)
> > Just to confirm, I checked all of the IPs in comment 19... all of the
> > working Akamai IPs are serving up:
> >
> > < Last-Modified: Sun, 17 Feb 2013 16:30:02 GMT
> > < ETag: "1e522bc-1432b50-4d5ee1ece642e"
> > < Content-Length: 21179216
> >
> > The broken Edgecast IP is serving up:
> >
> > < Last-Modified: Tue, 19 Feb 2013 16:48:28 GMT
> > < ETag: "834f22-1432a90-4d6169c610688"
> > < Content-Length: 21179024
>
> I was able to extract most of one of the corrupt 93.184.215.248 downloads
> from yesterday and the file times are 2/18/2013 6:19 AM. So, it appears that
> it thinks it is serving up 2/19 and the files I received were from 2/18.

btw: all of the files had a 2/18/2013 6:XX AM timestamp

Jake Maul [:jakem]

Comment 42

•

12 years ago

(In reply to Robert Strong [:rstrong] (do not email) from comment #36)
> The stub does individual range requests and I wonder if the Edgecast server
> is serving up a different file for some of the requests.

Aha, I mid-aired with you to ask this very question, if it does anything interesting like Range requests.

Anycast IPs fronting independent caching nodes
File contents changing daily, but 4-day-long Expires headers
Range requests

... possibly each Range request is hitting a different Edgecast node, and they don't all happen to have the same version of the file? That would severely screw up the download if you got 300KB from one version, and then 300KB from another one.
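To make the suspected failure mode concrete, here is a small Python simulation. It is only a sketch under stated assumptions: the two "builds" are stand-in byte strings of equal length (the real installers differed in size), and rotating through nodes on every request is a simplification, since real Anycast routing is not round-robin. The names `fetch_range` and `download_in_ranges` are hypothetical.

```python
import itertools


def fetch_range(version, start, end):
    """Return bytes [start, end) of one cached copy, like a Range response."""
    return version[start:end]


def download_in_ranges(nodes, size, chunk):
    """Assemble a file from consecutive ranges, rotating through cache nodes."""
    rotation = itertools.cycle(nodes)
    parts = []
    for start in range(0, size, chunk):
        end = min(start + chunk, size)
        parts.append(fetch_range(next(rotation), start, end))
    return b"".join(parts)


old_build = b"A" * 1000  # stand-in for the Feb 17 installer
new_build = b"B" * 1000  # stand-in for the Feb 19 installer

consistent = download_in_ranges([new_build], 1000, 300)
mixed = download_in_ranges([old_build, new_build], 1000, 300)

print(consistent == new_build)            # True
print(mixed in (old_build, new_build))    # False
```

The mixed download has the right length but matches neither build, which would be consistent with a download that appears to complete and then fails in the "Installing" phase.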

Robert Strong (they/them - no direct email)

Assignee

Comment 43

•

12 years ago

(In reply to Jake Maul [:jakem] from comment #42)
> (In reply to Robert Strong [:rstrong] (do not email) from comment #36)
> > The stub does individual range requests and I wonder if the Edgecast server
> > is serving up a different file for some of the requests.
>
> Aha, I mid-aired with you to ask this very question, if it does anything
> interesting like Range requests.
>
> Anycast IPs fronting independent caching nodes
> File contents changing daily, but 4-day-long Expires headers
> Range requests
>
> ... possibly each Range request is hitting a different Edgecast node, and
> they don't all happen to have the same version of the file? That would
> severely screw up the download if you got 300KB from one version, and then
> 300KB from another one.

I highly suspect that is what is going on. I suppose there is no way to have consistency of files served by Edgecast?... at least not with our current setup where the url served by bouncer is always the same?

I'll see what I can do in the code for this scenario.
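One possible client-side guard for this scenario, sketched in Python purely for illustration: remember the ETag from the first range response and refuse to combine later ranges that carry a different one. This is an assumption about what a fix could look like, not the change made to the stub (which uses WinInet); `VersionChanged` and `assemble` are hypothetical names. HTTP also defines an `If-Range` request header intended for this kind of conditional range request.

```python
class VersionChanged(Exception):
    """Raised when a later range response carries a different ETag."""


def assemble(responses):
    """Join (etag, body) range responses, refusing to mix file versions."""
    expected = None
    parts = []
    for etag, body in responses:
        if expected is None:
            expected = etag  # first response fixes the version for the download
        elif etag != expected:
            raise VersionChanged(f"ETag changed from {expected} to {etag}")
        parts.append(body)
    return b"".join(parts)


good = [('"834f22"', b"first "), ('"834f22"', b"second")]
print(assemble(good))

bad = [('"834f22"', b"first "), ('"1e522bc"', b"second")]
try:
    assemble(bad)
except VersionChanged as exc:
    print("restart download:", exc)
```

On a mismatch the caller could discard the partial file and start over, rather than handing a mixed file to the installer.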

Robert Strong (they/them - no direct email)

Assignee

Comment 44

•

12 years ago

If we can't guarantee consistency on Edgecast it would be helpful if the problem was still present so I can verify that any changes I make fix the problem. Can you revert Edgecast to the previous config if that is the case?

Jake Maul [:jakem]

Comment 45

•

12 years ago

(In reply to Robert Strong [:rstrong] (do not email) from comment #43)
> (In reply to Jake Maul [:jakem] from comment #42)
> > ... possibly each Range request is hitting a different Edgecast node, and
> > they don't all happen to have the same version of the file? That would
> > severely screw up the download if you got 300KB from one version, and then
> > 300KB from another one.
> I highly suspect that is what is going on. I suppose there is no way to have
> consistency of files served by Edgecast?... at least not with our current
> setup where the url served by bouncer is always the same?
>
> I'll see what I can do in the code for this scenario.

Not really, no... our headers are explicitly telling them "it's okay to cache this file for X hours/days", and then we violate that by changing the contents of the file before then. I'll bring this situation up to them, but I suspect the response will be something along the lines of "yeah, don't do that". :)

The obvious fixes, as I see them, are:

1) Sidestep the problem by not using Range requests. No idea what this entails in terms of work, or why it uses them now. I know why Firefox does this for updating, but don't know the rationale for stub installer. If you get the whole file in one shot, it'll be consistent.

2) Don't use Edgecast. Since Akamai's IPs are internally "safe" in this respect, and you're not likely to do another DNS lookup between Range requests, this should generally work. However, on a very slow link, you might have to do another lookup before finishing, and thus you could run into the same problem with any CDN. For this reason I would consider this a stopgap, not a real fix.

3) Retool our product delivery more significantly. Essentially, change the filename every time we change the contents. It will take some thought, but I suspect it may be feasible to use a simple query string to bust the cache and create new objects as needed. The query string would be junk, just so long as it's *different* junk whenever the file changes. The CDNs can recognize that as "this is a new object" and treat it accordingly. This is the standard solution to "my file changed and the CDN has the old version"... I think it will work in this situation ("I'm using range requests and the file is sometimes different") just as well, although it would be good to have someone else double-check my thought process on this.

This basically means bouncer changes. It might be possible to have bouncer send you to a file like:

http://download.cdn.mozilla.net/path/to/exe/installer.exe?<md5sum-of-current-file>

How bouncer is going to get hold of that md5sum is up for debate... it can't be done on-the-fly though (performance), and needs to rotate whenever the file changes on the ftp cluster.
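To make option 3 concrete, here is a rough Python sketch of the query-string idea. It is not how bouncer works today: the `TOKENS` table, `publish`, and `redirect_target` are hypothetical names, and the sketch assumes some publish step recomputes the checksum whenever the file changes, so that nothing is hashed at redirect time.

```python
import hashlib

# Hypothetical lookup table: file path -> checksum of the current contents.
# Assumed to be refreshed by a publish step, never computed per request.
TOKENS = {}


def content_token(data: bytes) -> str:
    """Return the MD5 hex digest of the file contents."""
    return hashlib.md5(data).hexdigest()


def publish(path: str, data: bytes) -> None:
    """Record a fresh token whenever the file at `path` is replaced."""
    TOKENS[path] = content_token(data)


def redirect_target(cdn_base: str, path: str) -> str:
    """Build the URL a redirector could send clients to: path?<md5sum>."""
    return f"{cdn_base}{path}?{TOKENS[path]}"


# Two successive builds published under the same path get different URLs,
# so a CDN would treat them as separate cache objects.
CDN = "http://download.cdn.mozilla.net"
PATH = "/path/to/exe/installer.exe"

publish(PATH, b"contents of build 1")
url_build_1 = redirect_target(CDN, PATH)

publish(PATH, b"contents of build 2")
url_build_2 = redirect_target(CDN, PATH)
```

The token does not have to be a checksum; any value that changes whenever the file changes would do. A checksum just has the side benefit that an unchanged file keeps its existing cache entry.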

Jake Maul [:jakem]

Comment 46

•

12 years ago

(In reply to Robert Strong [:rstrong] (do not email) from comment #44)
> If we can't guarantee consistency on Edgecast it would be helpful if the
> problem was still present so I can verify that any changes I make fix the
> problem. Can you revert Edgecast to the previous config if that is the case?

I can't make it inconsistent again, but I can definitely set it up for failure by undoing my change to the Expires headers... in a few days they'll wander out of sync again and the problem will start to reappear. Want me to do that?

Robert Strong (they/them - no direct email)

Assignee

Comment 47

•

12 years ago

Yes. That will give me some confidence that any changes I make fix the problem for the stub.

Jake Maul [:jakem]

Updated

•

12 years ago

Blocks: 829207

u279076

Comment 48

•

12 years ago

Jake/Robert, is qawanted still needed on this bug? If so, what more can we do to assist you?

Jake Maul [:jakem]

Comment 49

•

12 years ago

At the moment there's nothing I need from QA. The next step for me is to implement a decision on comment 45 (one of the options, or a different option), and to re-implement the fix in bug 829207 when :rstrong gives the green light to do so.

I'm CC'ing Brandon and Laura on this bug, because one of the options in comment 45 is to alter Bouncer to include a query string when directing users to a mirror. This would need some thought and implementation, so I'd like to have them roped in on it. We can have a vidyo meeting to fill you two in on the background. :)

u279076

Comment 50

•

12 years ago

(In reply to Jake Maul [:jakem] from comment #49) > At the moment there's nothing I need from QA. Thanks Jake. Dropping QAWANTED.

Keywords: qawanted

Robert Strong (they/them - no direct email)

Assignee

Comment 51

•

12 years ago

I'm removing range requests in bug 811573, which will keep the stub from breaking when it hits Edgecast servers, so I'm adding the dependency.
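For anyone following along, the failure mode suspected in comment 42 can be illustrated with a toy simulation in Python. This is not stub installer code: the two "nodes", the 4-byte chunk size, and the strict alternation between nodes are assumptions made purely for illustration.

```python
CHUNK = 4  # bytes per simulated Range request (tiny, for illustration only)


def fetch_range(node_files, node_index, start, end):
    """Simulate a Range request for bytes [start, end) served by one node."""
    return node_files[node_index][start:end]


def download_with_ranges(node_files, size):
    """Fetch the file chunk by chunk, each chunk landing on the next node."""
    parts = []
    for i, start in enumerate(range(0, size, CHUNK)):
        node = i % len(node_files)  # assumed: requests rotate across nodes
        parts.append(fetch_range(node_files, node, start, min(start + CHUNK, size)))
    return b"".join(parts)


def download_whole(node_files, node_index):
    """Fetch the entire file from a single node in one request."""
    return node_files[node_index]


# Two edge nodes caching different versions of the same URL.
old_build = b"AAAAAAAAAAAA"
new_build = b"BBBBBBBBBBBB"
nodes = [old_build, new_build]

mixed = download_with_ranges(nodes, len(old_build))  # stitched from both builds
whole = download_whole(nodes, 1)                     # one consistent build
```

The stitched result matches neither build, which is consistent with an installer that downloads completely but turns out to be corrupt. A single-shot download may still be stale, but it is always one complete build.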

Depends on: 811573

Robert Strong (they/them - no direct email)

Assignee

Comment 52

•

12 years ago

(In reply to Jake Maul [:jakem] from comment #46)
> (In reply to Robert Strong [:rstrong] (do not email) from comment #44)
> > If we can't guarantee consistency on Edgecast it would be helpful if the
> > problem was still present so I can verify that any changes I make fix the
> > problem. Can you revert Edgecast to the previous config if that is the case?
>
> I can't make it inconsistent again, but I can definitely set it up for
> failure by undoing my change to the Expires headers... in a few days they'll
> wander out of sync again and the problem will start to reappear. Want me to
> do that?

Thanks for doing this. It let me find a bug in the new code while testing the fix in bug 811573.

Laura Thomson :laura

Comment 53

•

12 years ago

(In reply to Jake Maul [:jakem] from comment #49) > At the moment there's nothing I need from QA. The next step for me is to > implement a decision on comment 45 (one of the options, or a different > option), and to re-implement the fix in bug 829207 when :rstrong gives the > green light to do so. > > I'm CC'ing Brandon and Laura on this bug, because one of the options in > comment 45 is to alter Bouncer to include a query string when directing > users to a mirror. This would need some thought and implementation, so I'd > like to have them roped in on it. We can have a vidyo meeting to fill you > two in on the background. :) Is this still something that you want to do?

Jake Maul [:jakem]

Comment 54

•

12 years ago

@laura: no, I think you and Brandon are in the clear on this now. :) Just waiting on the green light to re-apply the fix here, and then we can close this bug out.

Robert Strong (they/them - no direct email)

Assignee

Comment 55

•

12 years ago

Jake, green light is given on re-applying the fix in bug 829207. Thanks!

Robert Strong (they/them - no direct email)

Assignee

Comment 57

•

12 years ago

Pushed to mozilla-central in bug 811573 https://hg.mozilla.org/mozilla-central/rev/216ec69cc531

Assignee: nmaul → robert.bugzilla

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Whiteboard: [stub+]

Target Milestone: --- → Firefox 22

Robert Strong (they/them - no direct email)

Assignee

Comment 58

•

12 years ago

Pushed to mozilla-aurora in bug 811573 https://hg.mozilla.org/releases/mozilla-aurora/rev/189f9d9f856c

status-b2g18: --- → wontfix

status-b2g18-v1.0.0: --- → wontfix

status-b2g18-v1.0.1: --- → wontfix

status-firefox21: --- → fixed

status-firefox22: --- → fixed

status-firefox-esr17: --- → wontfix

u279076

Updated

•

12 years ago

Keywords: verifyme

Robert Strong (they/them - no direct email)

Assignee

Comment 59

•

12 years ago

Please note that the workaround for Fiddler that was added to the stub installer has been removed, since it breaks downloading from Edgecast. Fiddler also breaks other parts of Firefox, and since very few people are likely to use Fiddler, removing the workaround is better than keeping it.

Robert Strong (they/them - no direct email)

Assignee

Comment 60

•

12 years ago

Pushed to mozilla-beta in bug 811573 https://hg.mozilla.org/releases/mozilla-beta/rev/a0449ebebe8f

status-firefox20: --- → fixed

Robert Strong (they/them - no direct email)

Assignee

Comment 61

•

12 years ago

I received a sample set of the data last night, and the new data points suggest that the download symptoms seen when this bug occurred have been fixed.

Paul Silaghi, QA [:pauly]

Comment 62

•

12 years ago

The FF 20b4, Aurora 21.0a2 (2013-03-07), and Nightly 22.0a1 (2013-03-07) stub installers work fine now. Is this enough to verify this bug? If not, what exactly should be tested here?

Jake Maul [:jakem]

Comment 63

•

12 years ago

Seems like enough to me. All evidence points to this being unreproducible now, after the fix(es).

Ioana (away)

Updated

•

12 years ago

Status: RESOLVED → VERIFIED

status-firefox20: fixed → verified

status-firefox21: fixed → verified

status-firefox22: fixed → verified

u279076

Updated

•

12 years ago

Keywords: verifyme