Closed Bug 1381269 Opened 7 years ago Closed 7 years ago

Buildbot pending & running jobs graphs stop at July 14 0830 UTC

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Unassigned)

URLs are:
https://www.hostedgraphite.com/da5c920d/86a8384e-d9cf-4208-989b-9538a1a53e4b/grafana/dashboard/db/pending
https://www.hostedgraphite.com/da5c920d/86a8384e-d9cf-4208-989b-9538a1a53e4b/grafana/dashboard/db/running

aws_watch_pending.py on aws-manager2.srv.releng.scl3.mozilla.com sends this data out via graphite-relay.private.scl3.mozilla.com (see https://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/watch_pending.cfg#L304)
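
For reference, data sent to a Graphite relay on port 2003 normally uses Carbon's plaintext protocol, one "path value timestamp" line per datapoint. The sketch below only illustrates that line format. It is not taken from aws_watch_pending.py, and the metric name and helper function are hypothetical.

```python
import time


def format_graphite_line(path, value, timestamp=None):
    """Build one Carbon plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        # Default to the current time in whole seconds since the epoch.
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


# Hypothetical metric name under the prefix mentioned in this bug.
line = format_graphite_line("releng.aws.aws_watch_pending.example", 3, 1500000000)
```

A sender would write such lines to a TCP connection to the relay, so a script that stops emitting them (rather than a network problem) would also produce graphs that simply end.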

Network flow still seems OK:
[nthomas@aws-manager2.srv.releng.scl3.mozilla.com ~]$ nc -vz graphite-relay.private.scl3.mozilla.com 2003
Connection to graphite-relay.private.scl3.mozilla.com 2003 port [tcp/cfinger] succeeded!
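
The same reachability check can be scripted, for example to run it repeatedly from a monitoring job. This is a rough Python equivalent of `nc -vz`, assuming a plain TCP connect is enough to prove the flow is open; the function name is made up for illustration.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

Calling port_is_open("graphite-relay.private.scl3.mozilla.com", 2003) from aws-manager2 would mirror the check above. Like nc, it only shows that the port accepts connections, not that the relay forwards data.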

No errors in the aws_watch_pending log in papertrail.
So https://www.hostedgraphite.com/da5c920d/86a8384e-d9cf-4208-989b-9538a1a53e4b/grafana/dashboard/db/ec2-dashboard still works (mostly, there are a few graphs which have been broken > 1 week), and that's using the releng.aws.aws_watch_pending prefix.

The broken graphs are using reportor, with prefix reportor.releng.pending. Looking at logs like https://secure.pub.build.mozilla.org/builddata/reports/reportor/2017-07-17:01:15/pending/logs/output.log we have:

2017-07-16 18:15:01,852 - Fetching pending info from http://buildapi.pvt.build.mozilla.org/buildapi/pending?format=json
2017-07-16 18:15:01,862 - Starting new HTTP connection (1): buildapi.pvt.build.mozilla.org
2017-07-16 18:15:02,111 - Fetching running info from http://buildapi.pvt.build.mozilla.org/buildapi/running?format=json
2017-07-16 18:15:02,112 - Starting new HTTP connection (1): buildapi.pvt.build.mozilla.org
2017-07-16 18:15:02,388 - Fetching allthethings from https://secure.pub.build.mozilla.org/builddata/reports/allthethings.json
2017-07-16 18:15:02,390 - Starting new HTTPS connection (1): secure.pub.build.mozilla.org
Traceback (most recent call last):
  File "pending.py", line 40, in <module>
    pending_by_pool[pool[0].rsplit('-', 1)[0]] = 0
IndexError: list index out of range

which is here: https://github.com/catlee/reportor/blob/master/reports/pending/pending.py#L39. It turns out we have an empty slavepool, da39a3ee5e6b4b0d3255bfef95601890afd80709, in allthethings, which is set for these jobs:
Ubuntu HW 12.04 x64 devedition mozilla-beta talos g3-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos g2-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos dromaeojs-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos other-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos tp5o-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos g1-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos g4-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos chromez-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos g5-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos perf-reftest-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos svgr-e10s
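
To illustrate the failure: pool[0] raises IndexError as soon as a pool has no slaves. A minimal sketch of a defensive guard follows, assuming the pools are available as a mapping from pool id to a list of slave names (the data layout, function name and sample slave names are assumptions, not the real pending.py). Incidentally, da39a3ee5e6b4b0d3255bfef95601890afd80709 is the SHA-1 digest of empty input, which would be consistent with a pool id hashed from an empty slave list, though how the id is derived is also an assumption here.

```python
import hashlib

# SHA-1 of empty input; equal to the empty pool id reported in this bug.
EMPTY_POOL_ID = hashlib.sha1(b"").hexdigest()


def init_pending_by_pool(slavepools):
    """Return ({pool_name: 0}, [skipped_pool_ids]) while tolerating empty pools."""
    pending_by_pool = {}
    skipped = []
    for pool_id, pool in slavepools.items():
        if not pool:
            # An empty pool has no first slave to derive a name from.
            skipped.append(pool_id)
            continue
        # Same naming rule as the failing line: strip the trailing "-NNN".
        pending_by_pool[pool[0].rsplit('-', 1)[0]] = 0
    return pending_by_pool, skipped


# Hypothetical sample data, not real allthethings content.
sample = {
    "abc123": ["talos-linux64-ix-001", "talos-linux64-ix-002"],
    EMPTY_POOL_ID: [],
}
counts, skipped = init_pending_by_pool(sample)
```

A guard like this would only keep the report running. The jobs listed above would still have no slaves assigned, so the underlying configuration problem needs its own fix.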

So this is from bug 1359595.
Depends on: 1359595
Actually I was overly optimistic and I backed out my change. Needs at least one other fix first.
Fixed by https://bugzilla.mozilla.org/show_bug.cgi?id=1359595#c138.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard