1332337 - migrate machine health dashboard to releng services

Reporter

Description

•

9 years ago

The machine health tool should be migrated to releng services. We could move https://secure.pub.build.mozilla.org/builddata/reports/slave_health/ there.

The releng services source is at https://github.com/mozilla-releng/services and the documentation is at https://docs.mozilla-releng.net/

As a first step, you could migrate the backend and leave the front end as JS. (The releng services default for the front end is Elm.) Rok has mentioned that he is willing to work with you on a pair-programming basis if you get stuck.

When I talked to coop about this yesterday, he mentioned that we should incorporate Taskcluster auth into this page. He also mentioned that if there are links on the machine health page that aren't useful or needed, it would be a good exercise to clean up the page as part of this work.

Andrei Obreja [:aobreja NOT AVAILABLE][:buildduty]

Assignee

Updated

•

9 years ago

Assignee: nobody → aobreja

Kim Moir [:kmoir] ET

Reporter

Comment 1

•

9 years ago

Adding bug 1329255 as a dependency because we need to have additional security when we deploy.

Andrei Obreja [:aobreja NOT AVAILABLE][:buildduty]

Assignee

Comment 2

•

8 years ago

The master plan is to:

- create the backend structure of api.py and api.yml in (1). For now, in the test phase, api.py gets its information from (2), and api.yml is created based on the data from api.py
- redirect the frontend (3) to use the information from api.py and api.yml
- change api.py to use the script instead of using the data directly from (2)
- rewrite the frontend from scratch in Elm, and check for improvements where applicable

(1) https://github.com/mozilla-releng/services/tree/releng_slavehealth/src/releng_slavehealth/releng_slavehealth
(2) https://dxr.mozilla.org/build-central/source/slave_health/json/test_json
(3) https://github.com/mozilla-releng/services/tree/releng_slavehealth/src/releng_frontend/src/static/slavehealth
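For illustration only, the test-phase step of this plan (an api.py that serves data read from checked-in JSON files) could be sketched as below. This is not the code from the branch in (1): the function names `load_report` and `get_report`, the `JSON_DIR` path, and the response shapes are all assumptions, and how the handler is wired to api.yml is left out.

```python
import json
from pathlib import Path

# Hypothetical location of the checked-in test JSON files (assumption).
JSON_DIR = Path("test_json")


def load_report(name, json_dir=JSON_DIR):
    """Load <json_dir>/<name>.json and return the parsed data.

    Raises FileNotFoundError if no such report file exists.
    """
    # Reject names that could escape the report directory.
    if not name or "/" in name or "\\" in name or name.startswith("."):
        raise ValueError(f"invalid report name: {name!r}")
    path = Path(json_dir) / f"{name}.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def get_report(name, json_dir=JSON_DIR):
    """Handler-style wrapper returning (body, HTTP status code)."""
    try:
        return load_report(name, json_dir), 200
    except FileNotFoundError:
        return {"error": f"unknown report: {name}"}, 404
    except ValueError as exc:
        return {"error": str(exc)}, 400
```

Keeping the file access in `load_report` means the later step of the plan (switching from static files to a script) would only need to replace that one function, while the handler and the frontend stay unchanged.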

Kim Moir [:kmoir] ET

Reporter

Comment 3

•

8 years ago

Looks good. Just a reminder that we won't be calling it "slave health"; in Taskcluster the terminology for machines is workers. So something like "worker dashboard" or whatever you want to call it. Also, slavery is very very terrible and we shouldn't be naming our systems after it.

Kim Moir [:kmoir] ET

Reporter

Updated

•

8 years ago

Depends on: 1351705

Andrei Obreja [:aobreja NOT AVAILABLE][:buildduty]

Assignee

Updated

•

8 years ago

Depends on: 1354184

Andrei Obreja [:aobreja NOT AVAILABLE][:buildduty]

Assignee

Comment 4

•

8 years ago

The first step is done: api.py and api.yml are created, and the frontend uses the information from api.py. Tests on the frontend worked well. To access some of the URLs where we added data [1], we needed login credentials, so the proxy nodes were created and used in api.py [2].

Now we need to see how we can extract the information from the "Buildbot" and "Slavealloc" databases and use it to generate the reports. For the moment, this information is generated in [3] by a cronjob on the cruncher-aws host. I filed Bug 1354184 for this.

[1] https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=build&type=bld-lion-r5&name=bld-lion-r5-090
[2] https://github.com/mozilla-releng/services/blob/releng_slavehealth/src/releng_slavehealth/releng_slavehealth/__init__.py#L34
[3] https://hg/build/slave_health/file/tip/json/
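As a rough sketch of the remaining step (generating a report from a database query instead of reading cron-generated JSON), something like the following could work. It uses an in-memory SQLite database purely as a stand-in: the `machines` table, its columns, and the function names are invented for illustration and are not the real Buildbot or Slavealloc schema.

```python
import json
import sqlite3


def build_report(conn):
    """Summarize machines per pool from a DB-API connection.

    Assumes a hypothetical table machines(name, pool, enabled).
    """
    rows = conn.execute(
        "SELECT pool, COUNT(*), SUM(enabled) "
        "FROM machines GROUP BY pool ORDER BY pool"
    ).fetchall()
    return {
        pool: {"total": total, "enabled": enabled}
        for pool, total, enabled in rows
    }


def demo_connection():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE machines (name TEXT, pool TEXT, enabled INTEGER)"
    )
    conn.executemany(
        "INSERT INTO machines VALUES (?, ?, ?)",
        [("m-001", "build", 1), ("m-002", "build", 0), ("m-003", "test", 1)],
    )
    return conn


if __name__ == "__main__":
    report = build_report(demo_connection())
    print(json.dumps(report, indent=2, sort_keys=True))
```

A function shaped like `build_report` could be called from api.py on request, which would remove the dependency on the cronjob, at the cost of querying the databases on each request unless the result is cached.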

Chris AtLee [:catlee]

Comment 5

•

8 years ago

Are we still doing this?

Priority: -- → P2

Andrei Obreja [:aobreja NOT AVAILABLE][:buildduty]

Assignee

Comment 6

•

8 years ago

Yes, but it is currently on hold; some other, more urgent bugs came up.

Kim Moir [:kmoir] ET

Reporter

Comment 7

•

8 years ago

Andrei, did you have a chance to talk to Rok, as I mentioned last week, regarding the remaining work to migrate the backend to releng services? If we are close to the end, it makes sense to migrate. If not, we don't want to invest much time in it because we will be changing our tools to support Taskcluster queues, pending counts, etc.

Flags: needinfo?(aobreja)

Andrei Obreja [:aobreja NOT AVAILABLE][:buildduty]

Assignee

Comment 8

•

8 years ago

I didn't have much time to speak with Rok; more urgent tasks came up. I think there are still some things to do here (testing and rewriting all of slave_health.py), so it could take a few weeks.

Flags: needinfo?(aobreja)

Kim Moir [:kmoir] ET

Reporter

Comment 9

•

8 years ago

If we limit the scope to migrating the backend, while minimally fixing the front end so that it can connect to the db etc. (i.e. not rewriting everything in the front end in Elm), what is your time estimate for completion? I'm just trying to get an estimate for converting the back end to releng services, given that we seem to be close. At the same time, we will probably be modifying the tooling to support Taskcluster.

Flags: needinfo?(aobreja)

Andrei Obreja [:aobreja NOT AVAILABLE][:buildduty]

Assignee

Comment 10

•

8 years ago

It will still require some time as the new script that will take the place of slave_health.py is part of the backend.

Flags: needinfo?(aobreja)

Mihai Tabara [:mtabara]⌚️GMT

Comment 11

•

8 years ago

22:18:27 <mtabara> catlee: is this something we still want to do 1332337 ?
22:18:57 <@catlee> mtabara: I think so.
22:19:04 <@catlee> updated for TC workers

Mihai Tabara [:mtabara]⌚️GMT

Comment 12

•

8 years ago

Andrei asked about this in the weekly releng meeting. We're holding off on a decision here until :garndt comes back from PTO, so that we can make sure we're not building the same piece of software that Hassan from TC is building. This work is on hold until a further decision is taken.

Mihai Tabara [:mtabara]⌚️GMT

Comment 13

•

8 years ago

Note explaining the priority level: P5 doesn't mean we've lowered the priority; quite the contrary. However, we're aligning these levels with the buildduty quarterly deliverables, where P1-P3 are taken by our daily waterline KTLO operational tasks.

Priority: P2 → P5

Andrei Obreja [:aobreja NOT AVAILABLE][:buildduty]

Assignee

Comment 14

•

8 years ago

I think the work here was moved to Bug 1394809. If I'm wrong and the Worker Dashboard will not cover what is in the machine health dashboard, please re-open.

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → DUPLICATE

BMO Automation

Updated

•

7 years ago

Product: Release Engineering → Infrastructure & Operations

BMO Automation

Updated

•

6 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P5)

Tracking

(Not tracked)

People

(Reporter: kmoir, Assigned: aobreja)

Security

(public)
