Closed Bug 1163439 Opened 9 years ago Closed 9 years ago

builds-4hr.js.gz not updating, all trees closed

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

References

Details

<nagios-releng> Sun 15:18:21 PDT [4137] builddata.pub.build.mozilla.org:http file age - /buildjson/builds-4hr.js.gz is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:11:07 ago - 1424 bytes in 0.003 second response time
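The nagios alert above is a file-age check on builds-4hr.js.gz. It compares the file's HTTP `Last-Modified` time against the current time. Below is a minimal sketch of that comparison in Python. The thresholds `WARN_AFTER` and `CRIT_AFTER` are hypothetical: the real check's limits aren't stated here, only that 11 minutes was already CRITICAL. The header value is illustrative too. It is back-computed from the alert time, taking the date to be Sunday 10 May 2015, which matches the `--starttime` value later in the bug.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical thresholds; the actual nagios limits are not given in this bug.
WARN_AFTER = timedelta(minutes=5)
CRIT_AFTER = timedelta(minutes=10)


def file_age_status(last_modified, now):
    """Classify an HTTP Last-Modified header by how stale it is at `now`."""
    modified = parsedate_to_datetime(last_modified)  # tz-aware for "GMT" headers
    age = now - modified
    if age >= CRIT_AFTER:
        return "CRITICAL", age
    if age >= WARN_AFTER:
        return "WARNING", age
    return "OK", age


if __name__ == "__main__":
    # 15:18:21 PDT on Sun 10 May 2015 is 22:18:21 UTC.
    now = datetime(2015, 5, 10, 22, 18, 21, tzinfo=timezone.utc)
    status, age = file_age_status("Sun, 10 May 2015 22:07:14 GMT", now)
    print(status, age)  # CRITICAL 0:11:07
```

The alert fires on staleness alone, so a reporter that hangs without crashing still trips it.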
And scheduling is hosed: the last two things I pushed to m-c, https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=7c00628cbfb1 and https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=8926c318d115, both have only TC builds running, and when https://secure.pub.build.mozilla.org/buildapi/self-serve/mozilla-central/rev/7c00628cbfb1 happens to respond, it claims never to have heard of that revision.

Somebody's eating the db?
New Relic shows a big spike in replication lag on buildbot1, starting around 15:08 and ending around 16:00, with lag peaking at around 5.7k seconds. (I'm not sure how you get more than 3600 seconds of lag in less than an hour...)
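One possible answer to the "more than 3600 seconds in less than an hour" puzzle assumes the metric New Relic graphs is MySQL's `Seconds_Behind_Master`. The bug doesn't say which metric it reads, so treat this as an assumption. MySQL documents that value as the replica's current time minus the original timestamp logged on the master for the event the replica is currently applying. It therefore measures the age of the event being applied, not how long the replica has been falling behind. An event that already carries an old timestamp when the replica reaches it makes the number jump immediately. A minimal sketch of that arithmetic, using hypothetical times:

```python
from datetime import datetime


def seconds_behind_master(replica_now, event_logged_at):
    """Lag as MySQL defines Seconds_Behind_Master: replica clock minus the
    master-side timestamp of the event currently being applied."""
    return int((replica_now - event_logged_at).total_seconds())


if __name__ == "__main__":
    # Hypothetical times, chosen only to show that ~5.7k seconds is reachable
    # 32 minutes after a 15:08 spike start: the replica is applying, at 15:40,
    # an event the master logged at 14:05.
    replica_now = datetime(2015, 5, 10, 15, 40)
    event_logged_at = datetime(2015, 5, 10, 14, 5)
    print(seconds_behind_master(replica_now, event_logged_at))  # 5700
```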

builds-4hr looks to be stuck:

[root@relengwebadm.private.scl3 buildapi]# ps -ef | grep builds-4hr
buildapi 17684 17678  0 15:08 ?        00:00:02 python2.7 /data/releng/www/buildapi/run-reporter --config /data/releng/www/buildapi/reporter.cfg -z -o /mnt/netapp/relengweb/builddata/buildjson/builds-4hr.js.gz --starttime 1431281281
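The `--starttime 1431281281` argument in that `ps` line is a Unix timestamp. The quick check below assumes the host reports local time as PDT (UTC-7). It converts the value to 11:08:01 PDT on Sunday 10 May 2015, exactly four hours before the 15:08 process start. That fits the four-hour window the file name implies, and it places this run at the start of the replication-lag spike.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local times in this bug are PDT (UTC-7) in May.
PDT = timezone(timedelta(hours=-7), "PDT")

start = datetime.fromtimestamp(1431281281, tz=PDT)
print(start.isoformat())  # 2015-05-10T11:08:01-07:00

# builds-4hr covers the four hours before the reporter runs (inferred from
# the file name), so the run itself should have begun four hours later.
print((start + timedelta(hours=4)).strftime("%H:%M"))  # 15:08
```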

strace shows it blocked in read(4, ...), and fd 4 is:
python2.7 17684 buildapi    4u  IPv4 31500270      0t0      TCP relengwebadm.private.scl3.mozilla.com:38460->buildbot-ro-vip.db.scl3.mozilla.com:mysql (ESTABLISHED)
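The "which descriptor is it blocked on, and what is that connected to?" step can also be answered directly from Linux's /proc filesystem. This is a minimal sketch, not what was run here. `/proc/<pid>/fd/<n>` is a symlink whose target for a socket looks like `socket:[<inode>]`. `/proc/net/tcp` lists IPv4 TCP sockets with hex-encoded local and remote addresses plus each socket's inode. The helper names are hypothetical. For the stuck reporter the call would have been `find_tcp_peer(17684, 4)`, and port 3306 is what lsof prints as `mysql`.

```python
import os
import re
import socket
import struct
from typing import Optional

SOCKET_LINK = re.compile(r"socket:\[(\d+)\]")


def socket_inode(link_target: str) -> Optional[int]:
    """Return the inode from a /proc/<pid>/fd symlink target, or None if not a socket."""
    match = SOCKET_LINK.fullmatch(link_target)
    return int(match.group(1)) if match else None


def decode_addr(hex_addr: str) -> tuple:
    """Decode an 'IIIIIIII:PPPP' address field from /proc/net/tcp."""
    ip_hex, port_hex = hex_addr.split(":")
    # The kernel prints the 32-bit address as a number in host byte order,
    # so pack it back with native ("=") order to recover the network bytes.
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_tcp_line(line: str) -> dict:
    """Parse one data row of /proc/net/tcp (IPv4 sockets only)."""
    fields = line.split()
    return {
        "local": decode_addr(fields[1]),
        "remote": decode_addr(fields[2]),
        "inode": int(fields[9]),
    }


def find_tcp_peer(pid: int, fd: int) -> Optional[dict]:
    """Map a process's file descriptor to its IPv4 TCP connection, if any."""
    inode = socket_inode(os.readlink(f"/proc/{pid}/fd/{fd}"))
    if inode is None:
        return None
    with open("/proc/net/tcp") as table:
        next(table)  # skip the header row
        for line in table:
            entry = parse_tcp_line(line)
            if entry["inode"] == inode:
                return entry
    return None
```

Reading /proc avoids attaching strace to the process. It also shows the peer even when the socket is idle, which is exactly the state of a read that never returns.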

I've killed the stuck process, and it's running properly now.
also, from IRC earlier today:
sheeri | [07:10:41] hwine-ooo: hal buildbot1 db master crashed, can't figure out why, failed over to buildbot2
trees re-opened; dropping to major for now.
Severity: blocker → major
Depends on: 1163610
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard