Closed
Bug 1381269
Opened 7 years ago
Closed 7 years ago
Buildbot pending & running jobs graphs stop at July 14 0830 UTC
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Unassigned)
References
Details
URLs are:
https://www.hostedgraphite.com/da5c920d/86a8384e-d9cf-4208-989b-9538a1a53e4b/grafana/dashboard/db/pending
https://www.hostedgraphite.com/da5c920d/86a8384e-d9cf-4208-989b-9538a1a53e4b/grafana/dashboard/db/running

aws_watch_pending.py on aws-manager2.srv.releng.scl3.mozilla.com sends this data out via graphite-relay.private.scl3.mozilla.com (see https://github.com/mozilla-releng/build-cloud-tools/blob/master/configs/watch_pending.cfg#L304).

Network flow still seems OK:

[nthomas@aws-manager2.srv.releng.scl3.mozilla.com ~]$ nc -vz graphite-relay.private.scl3.mozilla.com 2003
Connection to graphite-relay.private.scl3.mozilla.com 2003 port [tcp/cfinger] succeeded!

No errors in the aws_watch_pending log in papertrail.
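For reference, the same reachability check can be scripted. This is only a minimal sketch and not what aws_watch_pending.py does: `can_connect` and `format_metric` are hypothetical helper names. It assumes the relay accepts Graphite's plaintext protocol on port 2003 (one `path value timestamp` line per metric), which is the usual default for that port but is not confirmed in this bug.

```python
import socket
import time


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds, like `nc -vz`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "{} {} {}\n".format(path, value, int(timestamp))
```

Calling `can_connect("graphite-relay.private.scl3.mozilla.com", 2003)` from aws-manager2 would mirror the nc command above. A successful connection only shows that the port is open, not that any particular metric is arriving.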
Comment 1 • 7 years ago (Reporter)
So https://www.hostedgraphite.com/da5c920d/86a8384e-d9cf-4208-989b-9538a1a53e4b/grafana/dashboard/db/ec2-dashboard still works (mostly, there are a few graphs which have been broken > 1 week), and that's using the releng.aws.aws_watch_pending prefix. The broken graphs are using reportor, with prefix reportor.releng.pending.

Looking at logs like https://secure.pub.build.mozilla.org/builddata/reports/reportor/2017-07-17:01:15/pending/logs/output.log we have:

2017-07-16 18:15:01,852 - Fetching pending info from http://buildapi.pvt.build.mozilla.org/buildapi/pending?format=json
2017-07-16 18:15:01,862 - Starting new HTTP connection (1): buildapi.pvt.build.mozilla.org
2017-07-16 18:15:02,111 - Fetching running info from http://buildapi.pvt.build.mozilla.org/buildapi/running?format=json
2017-07-16 18:15:02,112 - Starting new HTTP connection (1): buildapi.pvt.build.mozilla.org
2017-07-16 18:15:02,388 - Fetching allthethings from https://secure.pub.build.mozilla.org/builddata/reports/allthethings.json
2017-07-16 18:15:02,390 - Starting new HTTPS connection (1): secure.pub.build.mozilla.org
Traceback (most recent call last):
  File "pending.py", line 40, in <module>
    pending_by_pool[pool[0].rsplit('-', 1)[0]] = 0
IndexError: list index out of range

which is here https://github.com/catlee/reportor/blob/master/reports/pending/pending.py#L39.
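The traceback comes from indexing `pool[0]` on a list with no elements. A minimal sketch of a defensive guard is shown here. It is not the fix that actually landed (that is tracked in the dependent bug), and `pool_name` and `init_pending_by_pool` are hypothetical names. It assumes that `slavepools` maps a pool ID to a list of slave names, and that a pool's name is its first slave's name with everything after the last `-` removed, as in the failing line.

```python
def pool_name(pool):
    """Return the pool's name, or None for an empty pool.

    Mirrors pool[0].rsplit('-', 1)[0] from the traceback, but does not
    raise IndexError when the pool has no slaves.
    """
    if not pool:
        return None
    return pool[0].rsplit("-", 1)[0]


def init_pending_by_pool(slavepools):
    """Map each non-empty pool's name to a zero pending count.

    Assumption: slavepools is a dict of pool ID -> list of slave names.
    """
    pending_by_pool = {}
    for pool in slavepools.values():
        name = pool_name(pool)
        if name is None:
            continue  # skip empty pools instead of crashing
        pending_by_pool[name] = 0
    return pending_by_pool
```

Skipping the empty pool keeps the report running, but it also hides the underlying data problem, so logging a warning for each skipped pool may be preferable in practice.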
Turns out we have an empty slavepool da39a3ee5e6b4b0d3255bfef95601890afd80709 in allthethings, which is set for these jobs:

Ubuntu HW 12.04 x64 devedition mozilla-beta talos g3-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos g2-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos dromaeojs-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos other-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos tp5o-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos g1-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos g4-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos chromez-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos g5-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos perf-reftest-e10s
Ubuntu HW 12.04 x64 devedition mozilla-beta talos svgr-e10s

So this is from bug 1359595.
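One way to spot this kind of entry is to scan allthethings.json for pools with no slaves and list the builders that point at them. The sketch below assumes the file has a top-level `slavepools` mapping of pool ID to slave list, and a `builders` mapping whose entries carry a `slavepool` key. Those key names and the function name are assumptions, so check them against the real file. As an aside, da39a3ee5e6b4b0d3255bfef95601890afd80709 is the SHA-1 digest of empty input. That would be consistent with the ID being derived by hashing an empty slave list, though nothing in this bug confirms how the IDs are computed.

```python
import hashlib


def builders_with_empty_pools(allthethings):
    """Return {pool_id: sorted builder names} for pools that have no slaves.

    Assumed layout (not confirmed here): allthethings["slavepools"] maps
    pool ID -> list of slaves, and each entry of allthethings["builders"]
    has a "slavepool" key holding a pool ID.
    """
    slavepools = allthethings.get("slavepools", {})
    empty_ids = {pool_id for pool_id, slaves in slavepools.items() if not slaves}
    result = {pool_id: [] for pool_id in empty_ids}
    for name, builder in allthethings.get("builders", {}).items():
        pool_id = builder.get("slavepool")
        if pool_id in empty_ids:
            result[pool_id].append(name)
    return {pool_id: sorted(names) for pool_id, names in result.items()}


# SHA-1 of zero bytes; matches the empty pool's ID reported above.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()
```

Run against the parsed JSON (for example the result of `json.load` on a local copy), this would return the empty pool ID together with the builders that reference it, if the assumed layout holds.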
Depends on: 1359595
Comment 2 • 7 years ago (Reporter)
Should be fixed by https://bugzilla.mozilla.org/show_bug.cgi?id=1359595#c127.
Comment 3 • 7 years ago (Reporter)
Actually I was overly optimistic and I backed out my change. Needs at least one other fix first.
Comment 4 • 7 years ago
Fixed by https://bugzilla.mozilla.org/show_bug.cgi?id=1359595#c138.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated • 6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated • 4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard