Closed Bug 1008238 Opened 10 years ago Closed 10 years ago

Replication lag on buildbot2.db and self-serve timing out

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Unassigned)

Details

16:58 <Callek> coop|buildduty: ok, we may have a need to close trees pretty soon, buildbot2.db looks pretty unhealthy (replication lag is major --- and due to new queries not coming in due to something slow afaict from new relic)
16:58 <Callek> coop|buildduty: if you have access: https://rpm.newrelic.com/accounts/263620/dashboard/3101982/page/2
16:59 <Callek> coop|buildduty: #sysadmins is on it at least

==

16:55 <Gijs_away> is retriggering on try dead right now?
16:55 <Gijs> edmorley|sheriffduty: ^^ self-serve sadness? :(
16:58 <edmorley|sheriffduty> Gijs: it timed out on me a couple of times in the last 30 mins, but then recovered - I'll take a look

===

All trees closed.
***** Nagios  *****

Notification Type: PROBLEM

Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL

Date/Time: 05-09-2014 08:32:23

Additional Info:
HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:10:46 ago - 1424 bytes in 0.006 second response time


***** Nagios  *****

Notification Type: PROBLEM

Service: http - /buildapi/self-serve/jobs
Host: buildapi.pvt.build.mozilla.org
Address: 10.22.74.160
State: CRITICAL

Date/Time: 05-09-2014 08:53:53

Additional Info:
CRITICAL - Socket timeout after 10 seconds


***** Nagios  *****

Notification Type: PROBLEM

Service: http file age - /buildjson/builds-pending.js
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL

Date/Time: 05-09-2014 08:58:13

Additional Info:
HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:13:09 ago - 8187 bytes in 0.006 second response time


***** Nagios  *****

Notification Type: PROBLEM

Service: http file age - /buildjson/builds-running.js
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL

Date/Time: 05-09-2014 09:03:23

Additional Info:
HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:13:10 ago - 934 bytes in 0.006 second response time
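For readers unfamiliar with the "http file age" alerts above: they fire when a served file has not been regenerated recently enough. A minimal sketch of that comparison, parsing the "Last modified H:MM:SS ago" text from the alert, might look like the following. The 10-minute limit and the function names are assumptions for illustration; the real check's threshold is not stated in this bug.

```python
import re
from datetime import timedelta

# Hypothetical limit: the actual Nagios threshold is not given in this bug.
CRITICAL_AGE = timedelta(minutes=10)

# Matches the age as it appears in the alert's "Additional Info" line.
AGE_RE = re.compile(r"Last modified (\d+):(\d{2}):(\d{2}) ago")


def parse_age(info: str) -> timedelta:
    """Extract the file age from a Nagios 'Additional Info' line."""
    match = AGE_RE.search(info)
    if match is None:
        raise ValueError("no 'Last modified H:MM:SS ago' field found")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def is_stale(info: str, limit: timedelta = CRITICAL_AGE) -> bool:
    """Return True when the file is older than the (assumed) limit."""
    return parse_age(info) > limit
```

Applied to the first alert (0:10:46 old), `is_stale` returns True under the assumed 10-minute limit, which matches the picture of builds-4hr.js.gz no longer being refreshed while the database was lagging.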
Sorry, my fault. I was running some queries to try and get total times (wait + build) out of the DB and into graphite.

I've trimmed the queries down now so they shouldn't add so much load.
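As a rough illustration of what "total times (wait + build) into graphite" could mean, here is a small sketch that computes the total from three timestamps and formats it as a Graphite plaintext-protocol line (`path value timestamp`). The argument names and the metric path are hypothetical and are not taken from the actual queries or schema; doing this arithmetic in a script on a small, bounded result set is one way to keep the load on the database low.

```python
def total_time_seconds(submitted_at: int, started_at: int, finished_at: int) -> int:
    """Total job time = wait time + build time (all values are epoch seconds)."""
    wait = started_at - submitted_at    # time spent pending
    build = finished_at - started_at    # time spent running
    return wait + build


def graphite_line(metric: str, value: int, timestamp: int) -> str:
    """Format one datapoint in Graphite's plaintext protocol."""
    return f"{metric} {value} {timestamp}\n"


# Hypothetical metric path, for illustration only.
METRIC = "releng.jobs.total_time"

# Example: a job submitted at t=100, started at t=160, finished at t=400.
line = graphite_line(METRIC, total_time_seconds(100, 160, 400), 400)
```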
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
We now appear to be caught up again, trees reopened :-)
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard