Closed
Bug 924109
Opened 12 years ago
Closed 12 years ago
https://secure.pub.build.mozilla.org/builddata/buildjson/builds-pending.js , builds-running.js and self-serve down
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: philor, Unassigned)
References
Details
All trees closed, https://secure.pub.build.mozilla.org/builddata/buildjson/builds-pending.js and https://secure.pub.build.mozilla.org/builddata/buildjson/builds-running.js and https://secure.pub.build.mozilla.org/buildapi/self-serve/mozilla-central (or any other particular branch) are all returning 500 errors.
Comment 1•12 years ago
OK, current state and a brief investigation:
When I logged into buildapi01, ~buildapi/buildapi.log was a 0-byte file with nothing new being written to it.
Per https://wiki.mozilla.org/ReleaseEngineering/How_To/Restart_BuildAPI I restarted buildapi, after which the log started to populate and the js files recovered.
The self-serve page was still broken, however, so I restarted the self-serve agent on bm36, still with no luck.
philor claims a manual retrigger worked, but the global page is still busted. I'm going away since I'm still at the hotel in BRU from the summit, so I'm hoping either this is enough to reopen the trees or someone else can hop in and take over.
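The empty-log symptom above is the kind of thing a small watchdog could flag before the trees close. The following is only a minimal sketch under stated assumptions: the function name `log_looks_stalled` and the 300-second idle threshold are hypothetical and are not part of BuildAPI or its tooling.

```python
import os
import time


def log_looks_stalled(path, max_idle_seconds=300, now=None):
    """Return True if the log is missing, empty, or has not been written recently.

    max_idle_seconds is an assumed threshold, not a value from BuildAPI.
    """
    now = time.time() if now is None else now
    try:
        st = os.stat(path)
    except FileNotFoundError:
        # A missing log is treated the same as a stalled one.
        return True
    if st.st_size == 0:
        # A 0-byte file means nothing has been logged since it was created or rotated.
        return True
    # Otherwise, compare the last modification time against the idle threshold.
    return (now - st.st_mtime) > max_idle_seconds
```

Something like `log_looks_stalled(os.path.expanduser("~buildapi/buildapi.log"))` could be run from cron on the host, although whether a quiet log always indicates a problem depends on how chatty the service normally is.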
Comment 2•12 years ago
...and after typing comment 1 I reloaded the self-serve page and it loaded fine, so something unstuck itself. I suggest we leave this bug open for now until someone can look at what may have gone wrong here.
See-Also Bug 922275
Comment 3•12 years ago
Looks like things are working OK now, lowering severity. Not sure if there's more investigation we can do here or not.
Severity: blocker → major
Comment 4•12 years ago
We can't tell what's behind the 500s when bug 806777 results in empty logs on the server. Probably this was a DB connection issue.
Comment 5•12 years ago
(In reply to Nick Thomas [:nthomas] from comment #4)
> We can't tell what's behind the 500s when bug 806777 results in empty logs
> on the server. Probably this was a DB connection issue.
fox2mike, sheeri: trees reopened, but looking for root cause: do you see any alerting for this? Any insights?
Flags: needinfo?(shyam)
Flags: needinfo?(scabral)
Comment 6•12 years ago
https://mana.mozilla.org/wiki/display/IT/BuildAPI#BuildAPI-Database says the database is the buildbot cluster, and there haven't been any alerts on that database.
If you can see NewRelic, you can see that buildbot1 and buildbot2 have been consistent in number of queries and connections, with only a small dip in queries (it's hard to tell whether the dip happened because the trees were closed or whether it was part of the cause):
https://rpm.newrelic.com/accounts/263620/dashboard/3101981
https://rpm.newrelic.com/accounts/263620/dashboard/3101982
(I looked at the 3-hour window from 06:00 PDT until 09:00 PDT.)
Flags: needinfo?(scabral)
Comment 7•12 years ago
10:17:15.755 GET https://secure.pub.build.mozilla.org/builddata/buildjson/builds-pending.js [HTTP/1.1 503 Service Temporarily Unavailable 563ms]
10:17:15.756 GET https://secure.pub.build.mozilla.org/builddata/buildjson/builds-running.js [HTTP/1.1 503 Service Temporarily Unavailable 560ms]
Trees closed again.
Severity: major → blocker
Comment 8•12 years ago
***** Nagios *****
Notification Type: PROBLEM
Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL
Date/Time: 10-09-2013 02:18:06
Additional Info:
HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - Document modification date unknown - 548 bytes in 0.011 second response time
and then...
***** Nagios *****
Notification Type: RECOVERY
Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: OK
Date/Time: 10-09-2013 02:23:06
Additional Info:
HTTP OK: HTTP/1.1 200 OK - 673270 bytes in 1.056 second response time
---
-> seems to be ok again now.
Trees reopened.
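The "http file age" alerts above combine two signals: the HTTP status and how old the document is. As a rough illustration of that logic (not the actual Nagios plugin), here is a minimal sketch. The function name `check_file_age` and the 600-second threshold are hypothetical; the caller is assumed to pass in the response status and `Last-Modified` header from whatever HTTP client it uses.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def check_file_age(status, last_modified, now, max_age_seconds=600):
    """Classify a response as ("OK" | "CRITICAL", detail).

    status: HTTP status code of the response.
    last_modified: value of the Last-Modified header, or None if absent.
    now: timezone-aware datetime used as the reference time.
    max_age_seconds: assumed freshness threshold (hypothetical value).
    """
    if status >= 400:
        # Error responses (such as the 503s above) carry no usable file date.
        return ("CRITICAL", f"HTTP {status}, document modification date unknown")
    if not last_modified:
        return ("CRITICAL", "document modification date unknown")
    modified = parsedate_to_datetime(last_modified)
    age = (now - modified).total_seconds()
    if age > max_age_seconds:
        return ("CRITICAL", f"last modified {int(age)} seconds ago")
    return ("OK", f"HTTP {status}, last modified {int(age)} seconds ago")
```

A caller would pass `datetime.now(timezone.utc)` as `now`. Treating an error status as CRITICAL regardless of age matches the shape of the PROBLEM notification above, where the 503 response left the modification date unknown.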
Severity: blocker → critical
Reporter
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Updated•12 years ago
Flags: needinfo?(shyam)
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard