Closed Bug 924109 Opened 12 years ago Closed 12 years ago

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

Current state and a brief investigation: when I logged into buildapi01, ~buildapi/buildapi.log was a 0-byte file with nothing new being written to it. Per https://wiki.mozilla.org/ReleaseEngineering/How_To/Restart_BuildAPI I restarted buildapi, after which the log started to populate and the js files recovered. The self-serve page is still broken, however, so I restarted the self-serve agent on bm36, with no luck. philor reports that a manual retrigger worked, but the global page is still busted. I have to step away since I'm still at the hotel in BRU from the summit, so I'm hoping either that this is enough to reopen the trees or that someone else can hop in and take over.
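For anyone who wants to catch the empty-log symptom earlier, here is a minimal sketch of a check that flags a missing, 0-byte, or stale log file. The function name `check_log` and the 300-second staleness threshold are assumptions for illustration only; nothing like this is known to exist on buildapi01.

```python
import os
import time


def check_log(path, max_age_seconds=300, now=None):
    """Classify a log file as 'missing', 'empty', 'stale', or 'ok'.

    max_age_seconds is an assumed threshold, not a value taken from
    any real BuildAPI monitoring configuration.
    """
    now = time.time() if now is None else now
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    if st.st_size == 0:
        # The symptom described in this bug: a 0-byte buildapi.log.
        return "empty"
    if now - st.st_mtime > max_age_seconds:
        # The file has content, but nothing new is being written.
        return "stale"
    return "ok"
```

It could be called as `check_log(os.path.expanduser("~buildapi/buildapi.log"))` from a cron job or a monitoring wrapper, with any result other than "ok" treated as a reason to look at the service.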
...and after typing comment 1 I reloaded the self-serve page and it loaded fine, so something unstuck itself. I suggest we leave this bug open for now until someone can look into what went wrong here. See also bug 922275.
Looks like things are working OK now, lowering severity. Not sure if there's more investigation we can do here or not.
Severity: blocker → major
We can't tell what's behind the 500s when bug 806777 results in empty logs on the server. Probably this was a DB connection issue.
(In reply to Nick Thomas [:nthomas] from comment #4)
> We can't tell what's behind the 500s when bug 806777 results in empty logs
> on the server. Probably this was a DB connection issue.

fox2mike, sheeri: trees reopened, but looking for root cause: do you see any alerting for this? Any insights?
Flags: needinfo?(shyam)
Flags: needinfo?(scabral)
https://mana.mozilla.org/wiki/display/IT/BuildAPI#BuildAPI-Database says the database is the buildbot cluster, and there haven't been any alerts on that database. If you have access to NewRelic, you can see that buildbot1 and buildbot2 have been consistent in number of queries and connections, with only a small dip in queries (it's hard to tell whether the dip happened because the trees were closed or whether it caused the closure):
https://rpm.newrelic.com/accounts/263620/dashboard/3101981
https://rpm.newrelic.com/accounts/263620/dashboard/3101982
(I looked at a 3-hour window, from 06:00 PDT until 09:00 PDT.)
Flags: needinfo?(scabral)
10:17:15.755 GET https://secure.pub.build.mozilla.org/builddata/buildjson/builds-pending.js [HTTP/1.1 503 Service Temporarily Unavailable 563ms]
10:17:15.756 GET https://secure.pub.build.mozilla.org/builddata/buildjson/builds-running.js [HTTP/1.1 503 Service Temporarily Unavailable 560ms]

Trees closed again.
Severity: major → blocker
***** Nagios *****
Notification Type: PROBLEM
Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL
Date/Time: 10-09-2013 02:18:06
Additional Info: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - Document modification date unknown - 548 bytes in 0.011 second response time

and then...

***** Nagios *****
Notification Type: RECOVERY
Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: OK
Date/Time: 10-09-2013 02:23:06
Additional Info: HTTP OK: HTTP/1.1 200 OK - 673270 bytes in 1.056 second response time

--- -> seems to be ok again now. Trees reopened.
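To make the "http file age" alert easier to reason about, here is a rough sketch of the kind of decision such a check makes: a non-200 response or a missing Last-Modified header is treated as critical, and otherwise the state depends on how old the file is. This is not the Nagios plugin's actual logic; the function name `file_age_state`, the 600-second threshold, and the message strings are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def file_age_state(status, headers, now, max_age_seconds=600):
    """Return (state, info) for an HTTP file-age style check.

    status:  HTTP status code of the response
    headers: dict of response headers
    now:     timezone-aware datetime used as the reference time
    max_age_seconds is an assumed threshold, not the real alert setting.
    """
    if status != 200:
        # e.g. the 503 Service Temporarily Unavailable seen in this bug
        return "CRITICAL", f"HTTP {status}"
    last_modified = headers.get("Last-Modified")
    if last_modified is None:
        return "CRITICAL", "Document modification date unknown"
    age = (now - parsedate_to_datetime(last_modified)).total_seconds()
    info = f"Last modified {int(age)}s ago"
    return ("CRITICAL" if age > max_age_seconds else "OK"), info


# Hypothetical timestamps, chosen only to exercise the function.
now = datetime(2013, 10, 9, 9, 23, 6, tzinfo=timezone.utc)
print(file_age_state(503, {}, now))
print(file_age_state(200, {"Last-Modified": "Wed, 09 Oct 2013 09:20:00 GMT"}, now))
```

Keeping the classification separate from the HTTP request means the same function can be fed status codes and headers captured from logs, which is handy when reconstructing an outage after the fact.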
Severity: blocker → critical
Blocks: 926246
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Flags: needinfo?(shyam)
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard