Closed Bug 1332337 Opened 9 years ago Closed 8 years ago

migrate machine health dashboard to releng services

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P5)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1394809

People

(Reporter: kmoir, Assigned: aobreja)

References

Details

The machine health tool should be migrated to releng services. We could move https://secure.pub.build.mozilla.org/builddata/reports/slave_health/

Here is the source: https://github.com/mozilla-releng/services
Documentation is here: https://docs.mozilla-releng.net/

As a first step, you could migrate the backend and leave the front end as JS. (The releng services default for the front end is Elm.) Rok has mentioned that he is willing to work with you on a pair-programming basis if you get stuck.

When I talked to coop about this yesterday, he mentioned that we should incorporate taskcluster auth into this page. He also mentioned that if there are links on the machine health page that aren't useful or needed, it would be a good idea to clean up the page as part of this work.
Assignee: nobody → aobreja
Adding bug 1329255 as a dependency because we need to have additional security when we deploy.
The master plan is to:
- create the backend structure of api.py and api.yml in (1); for now, in the test phase, api.py gets its information from (2), and api.yml is created based on the data from api.py
- redirect the frontend (3) to use the information from api.py and api.yml
- change api.py to use the script instead of using the data directly from (2)
- rewrite the frontend from scratch in Elm, and also look for improvements where possible

(1) https://github.com/mozilla-releng/services/tree/releng_slavehealth/src/releng_slavehealth/releng_slavehealth
(2) https://dxr.mozilla.org/build-central/source/slave_health/json/test_json
(3) https://github.com/mozilla-releng/services/tree/releng_slavehealth/src/releng_frontend/src/static/slavehealth
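For illustration only, the first step of the plan (api.py serving the static test JSON) might look roughly like the sketch below. The directory name `test_json`, the function names `load_report` and `get_report`, and the `(body, status)` return shape are all assumptions made for this example; they are not taken from the releng services code.

```python
import json
from pathlib import Path

# Hypothetical location of the JSON fixtures copied from slave_health/json/test_json.
DATA_DIR = Path("test_json")


def load_report(name, data_dir=DATA_DIR):
    """Return the parsed contents of <data_dir>/<name>.json, or None if unavailable."""
    # Reject names that could escape the data directory.
    if not name or "/" in name or "\\" in name or name.startswith("."):
        return None
    path = Path(data_dir) / f"{name}.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def get_report(name, data_dir=DATA_DIR):
    """Handler-style function (hypothetical): returns a (body, HTTP status) pair."""
    report = load_report(name, data_dir)
    if report is None:
        return {"error": f"unknown report: {name}"}, 404
    return report, 200
```

Keeping the file access in `load_report` would mean that the later step of the plan (switching from static test data to the script's output) only has to replace that one function, leaving the handler and api.yml untouched.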
Looks good, just a reminder that we won't be calling it "slave health", in taskcluster the terminology for machines is workers. So something like "worker dashboard" or whatever you want to call it. Also, slavery is very very terrible and we shouldn't be naming our systems after it
Depends on: 1351705
The first step is done: we have api.py and api.yml created, and the frontend uses the information from api.py. Tests on the frontend worked well. To access some URLs [1] where we added data, we needed login credentials, so proxy nodes were created and used in api.py [2].

Now we need to see how we can extract the information from the "Buildbot" and "Slavealloc" databases and use it to generate the reports. For the moment this information is generated in [3] by a cron job on the cruncher-aws host. For this I filed Bug 1354184.

[1] https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=build&type=bld-lion-r5&name=bld-lion-r5-090
[2] https://github.com/mozilla-releng/services/blob/releng_slavehealth/src/releng_slavehealth/releng_slavehealth/__init__.py#L34
[3] https://hg/build/slave_health/file/tip/json/
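As a rough sketch of what "using login credentials" from the backend could involve, the snippet below prepares a request that carries credentials. It assumes HTTP Basic authentication, which this bug does not confirm, and the helper name `build_authenticated_request` is hypothetical; the actual proxy code is in the `__init__.py` linked above.

```python
import base64
import urllib.request


def build_authenticated_request(url, username, password):
    """Build (but do not send) a request carrying HTTP Basic credentials.

    Assumption: the protected pages accept Basic auth; the bug does not say so.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request
```

The returned object could then be passed to `urllib.request.urlopen`, with the credentials read from configuration rather than hard-coded.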
Are we still doing this?
Priority: -- → P2
Yes, but it is currently on hold; some other more urgent bugs came up.
Andrei, did you have a chance to talk to Rok, as I mentioned last week, regarding the remaining work to migrate the backend to releng services? If we are close to the end, it makes sense to migrate. If not, we don't want to invest much time in it because we will be changing our tools to support taskcluster queues, pending counts, etc.
Flags: needinfo?(aobreja)
I didn't have much time to speak with Rok because more urgent tasks came up. I think there are still some things to do here (testing and rewriting the entire slave_health.py); it could take a few weeks.
Flags: needinfo?(aobreja)
If we limit the scope to migrating the backend, while minimally fixing the front end so that it can connect to the db etc. (i.e. not rewriting everything in the front end in Elm), what is your time estimate for completion? I'm just trying to get an estimate for converting the back end to releng services, given that we seem to be close. At the same time, we will probably be modifying the tooling to support taskcluster.
Flags: needinfo?(aobreja)
It will still require some time as the new script that will take the place of slave_health.py is part of the backend.
Flags: needinfo?(aobreja)
22:18:27 <mtabara> catlee: is this something we still want to do 1332337 ?
22:18:57 <@catlee> mtabara: I think so.
22:19:04 <@catlee> updated for TC workers
Andrei asked about this in the weekly releng meeting. We're holding off on a decision here until :garndt comes back from PTO, so that we can make sure we're not building the same piece of software as Hassan from TC. This work is therefore pending until a further decision is taken.
Note explaining the priority level: P5 doesn't mean we've lowered the priority, but the contrary. However, we're aligning these levels to the buildduty quarterly deliverables, where P1-P3 are taken by our daily waterline KTLO operational tasks.
Priority: P2 → P5
I think the work here was moved to Bug 1394809. If I'm wrong and Worker Dashboard will not hold what is in the machine health dashboard, please re-open.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard