Closed Bug 848074 Opened 13 years ago Closed 12 years ago

Bouncer tests are saying firefox-latest-euballot builds are 404'ing from download.mozilla.org

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Type: task
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: stephend, Assigned: nmaul)

Details

Our bouncer.prod tests [1] are failing:

msg = "Failed on http://download.mozilla.org \nUsing {'lang': 'en-GB', 'product': 'firefox-latest-euballot', 'os': 'win'}"

Looks like the alias for firefox-latest-euballot isn't working.

[1] http://qa-selenium.mv.mozilla.com:8080/job/bouncer.prod/46691/testReport/junit/tests.test_redirects/TestRedirects/test_redirect_for_firefox_aliases__1_/
Interestingly, I can't reproduce this with curl:

curl -IL "http://download.mozilla.org/?product=firefox-latest-euballot&os=win&lang=en-GB"

HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: bouncer8.webapp.phx1.mozilla.com
Content-Type: text/html; charset=iso-8859-1
Date: Tue, 05 Mar 2013 20:00:08 GMT
Location: http://download-eu.mozilla.org/?product=firefox-19.0-euballot&os=win&lang=en-GB
Transfer-Encoding: chunked
Connection: Keep-Alive
X-Cache-Info: not cacheable; response is 302 without expiry time

HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: bouncer7.webapp.phx1.mozilla.com
Cache-Control: max-age=15
Content-Type: text/html; charset=UTF-8
Date: Tue, 05 Mar 2013 20:00:01 GMT
Location: http://wpc.1237.edgecastcdn.net/801237/download.cdn.mozilla.net/pub/mozilla.org/firefox/releases/19.0/win32-EUballot/en-GB/Firefox%20Setup%2019.0.exe
Transfer-Encoding: chunked
Connection: Keep-Alive
X-Cache-Info: cached

HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Cache-Control: max-age=345600
Content-Type: application/octet-stream
Date: Tue, 05 Mar 2013 19:59:43 GMT
ETag: "384360-1370e78-4d5d3a9e4c300"
Expires: Sat, 09 Mar 2013 19:59:43 GMT
Last-Modified: Sat, 16 Feb 2013 08:56:12 GMT
Server: Apache
X-Backend-Server: ftp3.dmz.scl3.mozilla.com
X-Cache-Info: cached

But when I try with Firefox I eventually get sent to:

https://ne1.wpc.edgecastcdn.net/801237/download.cdn.mozilla.net/pub/mozilla.org/firefox/releases/19.0/win32-EUballot/en-GB/Firefox%20Setup%2019.0.exe
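Since curl and Firefox appear to end up on different hosts, it may help to record every hop of the redirect chain in one place so runs can be compared. The following is only a rough sketch, not part of the bouncer tests: the names `head` and `trace_redirects` are made up for illustration, and it assumes HEAD requests are answered the same way as the requests the browser makes, which may not hold.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)

def head(url):
    """Send one HEAD request without following redirects.

    Returns (status, value of the Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def trace_redirects(url, fetch=head, max_hops=10):
    """Follow redirects one hop at a time and return [(url, status), ...].

    `fetch` is injectable so the chain logic can be exercised offline.
    Stops at the first non-redirect response or after `max_hops` requests.
    """
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops

# Example (makes real network requests):
# for hop_url, status in trace_redirects(
#         "http://download.mozilla.org/?product=firefox-latest-euballot&os=win&lang=en-GB"):
#     print(status, hop_url)
```

Running this a number of times and diffing the hop lists would show whether the ne1.wpc.edgecastcdn.net hop comes from bouncer or from Edgecast itself, assuming the intermittent redirect also shows up for non-browser clients.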
Assignee: server-ops → server-ops-webops
Component: Server Operations → Server Operations: Web Operations
QA Contact: shyam → nmaul
Assignee: server-ops-webops → nmaul
This is resolved, though not because of any action we (Mozilla) took. It appears there are 2 errors here:

1) Edgecast sometimes redirects from wpc.1237.edgecastcdn.net to ne1.wpc.edgecastcdn.net, which 404's. Other times it doesn't issue this redirect, and works fine. I am at a loss as to what causes this, and will have to talk to them about it.

2) Sentry believes Akamai was too slow to respond between 19:35 and 20:25, so it disabled Akamai, leaving only Edgecast. This is a logic problem that's very hard to work around, but it should not normally be an issue: the other CDN should pick up the slack. This time it was problematic, due to #1. We'll have to talk to Akamai about what may have happened here to cause Sentry to believe they were underperforming.

Through rose-colored glasses on a silver cloud, it's kinda nice that problem #2 happened, or we might never have uncovered problem #1. :)

The other thing to investigate is how we can do Sentry better. This is a tough problem, and I don't have a good answer. At the core of the issue is, how do you effectively monitor the health of a CDN? The best answer I know of is to use something like Cedexis, which we do. But then you still need redundancy in case Cedexis is down (or sending you to nodes that are slow, as was the case today), which we have... and that's what saved us here today.

One thing that will help our response time is alerting based on Sentry status. Bug 848101 has been opened to do precisely that.
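To make the logic problem in #2 concrete, here is a purely hypothetical sketch of a CDN selection rule. It is not Sentry's actual code, which is not shown in this bug; the `Probe` type, the `enabled_cdns` function, and the threshold value are all invented for illustration. The idea is that a probe should check the HTTP status of a real file as well as latency, and that the rule should never disable every CDN at once.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Probe:
    """Result of one hypothetical health probe against a CDN."""
    status: Optional[int]        # HTTP status of the probe, None if it failed outright
    latency_ms: Optional[float]  # response time in milliseconds, None if it failed

def enabled_cdns(probes: Dict[str, Probe], max_latency_ms: float) -> List[str]:
    """Return the CDNs that should stay enabled, sorted by name.

    A CDN passes only if its probe returned 200 within the latency budget.
    If no CDN passes, keep all of them rather than serve from none.
    """
    passing = [
        name for name, probe in probes.items()
        if probe.status == 200
        and probe.latency_ms is not None
        and probe.latency_ms <= max_latency_ms
    ]
    return sorted(passing) if passing else sorted(probes)

# Roughly the situation in this bug: one CDN slow, the other returning 404s.
probes = {
    "akamai": Probe(status=200, latency_ms=3000.0),
    "edgecast": Probe(status=404, latency_ms=80.0),
}
print(enabled_cdns(probes, max_latency_ms=1000.0))
```

A latency-only check would have left Edgecast as the sole CDN in this scenario. Whether falling back to "all CDNs" is the right choice when every probe fails is a judgment call, and a slow but working CDN may well be preferable to a fast one that 404's.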
I'm going to close this out, because there's nothing more I can do here.

Edgecast flatly denies that problem #1 happened or even can happen (on their end). I have no idea how :bhearsum landed at ne1.wpc.edgecastcdn.net... it is not (and AFAIK never has been) in bouncer. That web console snippet certainly seems to indicate that bouncer sent him directly to ne1.wpc, so I'm at a loss. It *looks* like bouncer is at fault, but I can't see how. I can't say for sure if this is the same 404 that WebQA was seeing, or a different one.

Akamai was... less than helpful. The case is now closed, although no definitive cause was ever found. I still believe they had a temporary / intermittent network or server problem, but they didn't find (or won't admit) one.

I do believe both issues were likely transient in nature, so I'm not too concerned about a repeat. The only distressing thing here is not getting the answers I wanted out of the CDN vendors.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard