Closed
Bug 804984
Opened 12 years ago
Closed 12 years ago
stage.m.o returns a lot of HTTP 500 errors
Categories
(Infrastructure & Operations :: Infrastructure: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rail, Assigned: bhourigan)
References
Details
(Whiteboard: [reit-ops])
I'm getting a lot of 500 errors while running the update verification automation, which downloads files from stage.m.o. It happened at least yesterday, around 19-23. I'm re-running the automation ATM.
Comment 1•12 years ago
:rail are you still seeing this this morning?
Reporter
Comment 2•12 years ago
All of the failed verification steps pass now.
Comment 3•12 years ago
Peter, any idea what happened here? Rail, your bug report isn't clear on what the exact issues were or what you saw. Can you provide some more information, please? Logs? Machines involved, etc.? I'm dropping this to normal so as not to page oncall, since there is no apparent outage/impact at the moment.
Severity: major → normal
Reporter
Comment 4•12 years ago
Sure, you can take a look at one of the failed logs: http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/17.0b3-candidates/build1/logs/release-mozilla-beta-win32_update_verify_1-bm12-build1-build1.txt.gz and grep for the "FAIL:" pattern.
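For anyone who prefers to scan a downloaded copy of such a log without unpacking it first, the "grep for FAIL:" step can be sketched in Python roughly as follows. This is only an illustration: the function names are hypothetical, and the only thing taken from this bug is that failing steps are marked with the string "FAIL:" in a gzip-compressed text log.

```python
import gzip
from typing import Iterable, List


def find_fail_lines(lines: Iterable[str], pattern: str = "FAIL:") -> List[str]:
    """Return every line that contains the pattern, without its trailing newline."""
    return [line.rstrip("\n") for line in lines if pattern in line]


def find_fail_lines_in_gz(path: str, pattern: str = "FAIL:") -> List[str]:
    """Scan a gzip-compressed text log (hypothetical local path) for the pattern."""
    # errors="replace" keeps the scan going if the log contains stray non-UTF-8 bytes.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return find_fail_lines(handle, pattern)
```

Calling `find_fail_lines_in_gz` on a locally saved `.txt.gz` log should give roughly the same lines that `zgrep "FAIL:"` would print.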
Comment 5•12 years ago
Rail, Are these stage URLs persistent? I guess not, because they all 404 now?
Comment 6•12 years ago
I've no ideas beyond bug 804119. First I've seen of this issue.
Comment 7•12 years ago
Saw a bunch of failures overnight that broke 10.0.10 update verification: http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/10.0.10esr-candidates/build1/logs/release-mozilla-esr10-macosx64_update_verify_2-bm12-build1-build0.txt.gz
Blocks: 800422
Reporter
Comment 8•12 years ago
(In reply to Shyam Mani [:fox2mike] from comment #5)
> Rail,
>
> Are these stage URLs persistent? I guess not, because they all 404 now?

That's really strange. The files should be on ftp for a long time before we remove them...
Comment 9•12 years ago
stage and ftp aren't the same, at least from what I can see. Tossing this over to infra.
Assignee: server-ops → server-ops-infra
Component: Server Operations → Server Operations: Infrastructure
QA Contact: shyam → jdow
Comment 10•12 years ago
(In reply to Chris AtLee [:catlee] from comment #7)
> Saw a bunch of failures over night that broke 10.0.10 update verification:
>
> http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/10.0.10esr-candidates/build1/logs/release-mozilla-esr10-macosx64_update_verify_2-bm12-build1-build0.txt.gz

Do you have a time for when that happened? This could all be related to the ongoing netapp issues in SCL3.
Comment 11•12 years ago
From the log:
Trying to get http://stage.mozilla.org/pub/mozilla.org/firefox/nightly/10.0.10esr-candidates/build1/update/mac/lv/firefox-10.0.10esr.complete.mar:
21:26:22 ERROR 500: Internal Server Error.
21:26:54 ERROR 500: Internal Server Error.
21:27:28 ERROR 500: Internal Server Error.
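To line these failures up against server-side logs, it can help to pull the timestamps and status codes out of an excerpt like the one above. The following is a minimal sketch, assuming every failure appears as `HH:MM:SS ERROR <code>:` exactly as in the three quoted lines; the rest of the log format is not shown in this bug, and the function names are hypothetical.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Matches e.g. "21:26:22 ERROR 500:" anywhere in the text.
ERROR_RE = re.compile(r"(\d{2}:\d{2}:\d{2}) ERROR (\d{3}):")


def extract_errors(text: str) -> List[Tuple[str, int]]:
    """Return (timestamp, HTTP status) pairs in the order they appear."""
    return [(m.group(1), int(m.group(2))) for m in ERROR_RE.finditer(text)]


def retry_gaps(errors: List[Tuple[str, int]]) -> List[int]:
    """Return the seconds between consecutive errors (assumes the same day)."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for stamp, _ in errors]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


excerpt = (
    "21:26:22 ERROR 500: Internal Server Error. "
    "21:26:54 ERROR 500: Internal Server Error. "
    "21:27:28 ERROR 500: Internal Server Error."
)
errors = extract_errors(excerpt)
print(errors)              # [('21:26:22', 500), ('21:26:54', 500), ('21:27:28', 500)]
print(retry_gaps(errors))  # [32, 34]
```

For this excerpt the three attempts are a little over 30 seconds apart, so all retries for that file fell within roughly one minute.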
Assignee
Comment 12•12 years ago
All connections to stage.mozilla.org on tcp/80 point to the ftp cluster on tcp/80. I combed through the apache logs on ftp* and all requests for firefox-10.0.10esr.complete.mar were satisfied with a response code of 200, so I don't think this was attributable to the netapp, unless the servers were so loaded that the health checks failed and shut down the pool.

Zeus logs do confirm numerous 500 errors. I found that 29 errors occurred between 24/Oct/2012:21:21:45 and 24/Oct/2012:21:32:43. Detailed logging isn't enabled for this VIP, and the general Zeus error logs don't provide any helpful information.

It is worth noting that the 500 errors were only served from zlb1.ops.scl3; the remainder of the zeus nodes did not log any errors.
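A count like the one above can be reproduced with a short script along these lines. This is a sketch under stated assumptions, not the method actually used here: it assumes Apache-style access log lines with a bracketed `24/Oct/2012:21:21:45`-style timestamp followed by the quoted request and the status code. The real Zeus log format is not shown in this bug, and the function name is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable

# Assumed layout: ... [24/Oct/2012:21:21:45 -0700] "GET /path HTTP/1.1" 500 ...
LINE_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})[^\]]*\] "[^"]*" (\d{3})'
)


def count_status_in_window(
    lines: Iterable[str], start: datetime, end: datetime, status: int = 500
) -> int:
    """Count lines with the given status whose timestamp is in [start, end]."""
    count = 0
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        # %b parses English month abbreviations under the default C locale.
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
        if int(match.group(2)) == status and start <= stamp <= end:
            count += 1
    return count
```

Running it once per load balancer node's log, with the same window, would show whether the errors are concentrated on a single node.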
Comment 13•12 years ago
(In reply to Brian Hourigan [:digi] from comment #12)
> It is worth noting that the 500 errors were only served from zlb1.ops.scl3,
> the remainder of the zeus nodes did not log any errors.

And zlb1 and 6 are the ones that host ftp...
Comment 15•12 years ago
philor, did you mention there is an ongoing problem with scattered 500 responses when test slaves make requests to http://ftp.m.o/, probably for files in /pub/mozilla.org/firefox/tinderbox-builds/ and /pub/mozilla.org/mobile/tinderbox-builds/?
Comment 16•12 years ago
There was an ongoing problem with them; it feels like it's been since around September. But there haven't been any for several days, so those may have been the netapp issues.
Comment 17•12 years ago
Closing this one on the grounds that the netapp issue was probably at fault and we haven't seen any further issues in a few weeks. Reopen if it happens again.
Assignee: server-ops-infra → bhourigan
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: Infrastructure → Infrastructure: Other
Product: mozilla.org → Infrastructure & Operations