Closed Bug 1008238 Opened 11 years ago Closed 11 years ago

Replication lag on buildbot2.db and self-serve timing out

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Unassigned)

Details

16:58 <Callek> coop|buildduty: ok, we may have a need to close trees pretty soon, buildbot2.db looks pretty unhealthy (replication lag is major --- and due to new queries not coming in due to something slow afaict from new relic)
16:58 <Callek> coop|buildduty: if you have access: https://rpm.newrelic.com/accounts/263620/dashboard/3101982/page/2
16:59 <Callek> coop|buildduty: #sysadmins is on it at least
==
16:55 <Gijs_away> is retriggering on try dead right now?
16:55 <Gijs> edmorley|sheriffduty: ^^ self-serve sadness? :(
16:58 <edmorley|sheriffduty> Gijs: it timed out on me a couple of times in the last 30 mins, but then recovered - I'll take a look
===
All trees closed.
***** Nagios *****
Notification Type: PROBLEM
Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL
Date/Time: 05-09-2014 08:32:23
Additional Info: HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:10:46 ago - 1424 bytes in 0.006 second response time

***** Nagios *****
Notification Type: PROBLEM
Service: http - /buildapi/self-serve/jobs
Host: buildapi.pvt.build.mozilla.org
Address: 10.22.74.160
State: CRITICAL
Date/Time: 05-09-2014 08:53:53
Additional Info: CRITICAL - Socket timeout after 10 seconds

***** Nagios *****
Notification Type: PROBLEM
Service: http file age - /buildjson/builds-pending.js
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL
Date/Time: 05-09-2014 08:58:13
Additional Info: HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:13:09 ago - 8187 bytes in 0.006 second response time

***** Nagios *****
Notification Type: PROBLEM
Service: http file age - /buildjson/builds-running.js
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL
Date/Time: 05-09-2014 09:03:23
Additional Info: HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:13:10 ago - 8187 bytes in 0.006 second response time
Sorry, my fault. I was running some queries to try and get total times (wait + build) out of the DB and into graphite. I've trimmed the queries down now so they shouldn't add so much load.
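For illustration only, here is a minimal sketch of the "total time = wait + build" calculation, emitted as a Graphite plaintext-protocol line. This is not the query that was actually run. The Job fields, the metric path and the idea of aggregating a small, already-fetched batch in application code are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical fields (epoch seconds); not the real buildbot schema.
    submitted_at: int
    started_at: int
    finished_at: int


def total_seconds(job):
    """Total time for one job: wait (submit -> start) plus build (start -> finish)."""
    wait = job.started_at - job.submitted_at
    build = job.finished_at - job.started_at
    return wait + build


def graphite_line(path, value, timestamp):
    """Format one metric in Graphite's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    return f"{path} {value} {timestamp}\n"


def mean_total_time_line(jobs, path, timestamp):
    """Average the total time over a batch of jobs; return None for an empty batch."""
    if not jobs:
        return None
    mean = sum(total_seconds(j) for j in jobs) / len(jobs)
    return graphite_line(path, round(mean, 1), timestamp)
```

Fetching a bounded batch of rows and aggregating it outside the database is one way to keep this kind of reporting from adding heavy load to the DB. Whether it suits buildbot2.db depends on details not covered in this bug.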
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
We now appear to be caught up again, trees reopened :-)
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard