Closed Bug 631316 Opened 13 years ago Closed 10 years ago

create a buildapi page that shows what each slave is running or last run

Categories

(Release Engineering :: General, defect, P5)

x86
All

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: armenzg, Unassigned)

Details

(Keywords: buildapi)

I want to be able to filter by machine type to see what each machine is currently running and what its last build was.

This is very useful for knowing whether we have fewer machines working than we should.

Currently I can go to the running page and filter by "Rev3 MacOSX 10.6.2" to see how many snow machines are running a job, but that doesn't tell me when a machine last did a job (the machine might be rebooting or not connected to any master).

If instead I use the buildapi/recent/talos-r3-snow-036 page, I can see the last job done on that machine, but I can't tell whether it is currently running a job.

Therefore what I want to merge is:
* what each slave is currently running
* if no job is running, what the last completed job was
* the page can be filtered

The second item on the list can help us spot slaves that have not done a job in quite some time.
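The merged view described above could be sketched roughly as follows. This is only an illustration: the `Job` record and the `slave_overview` function are hypothetical names, and the sketch assumes job records that already carry the slave name, which is data buildapi may not have.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Job:
    """Hypothetical job record; not an actual buildapi type."""
    slave: str
    builder: str
    start: int            # epoch seconds
    end: Optional[int]    # None while the job is still running


def slave_overview(jobs: Iterable[Job], name_filter: str = "") -> Dict[str, Tuple[str, Job]]:
    """Map each slave to ("running", job) or ("last", most recently completed job)."""
    overview: Dict[str, Tuple[str, Job]] = {}
    for job in jobs:
        # Simple substring filter on the slave name, e.g. "snow".
        if name_filter not in job.slave:
            continue
        current = overview.get(job.slave)
        if job.end is None:
            # A running job always takes precedence over completed ones.
            overview[job.slave] = ("running", job)
        elif current is None or (current[0] == "last" and job.end > current[1].end):
            # Otherwise keep the completed job with the latest end time.
            overview[job.slave] = ("last", job)
    return overview
```

Slaves whose entry is "last" with an old end time would be the ones that have not done a job in quite some time.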

A bonus would be for the slave name to link to the buildbot slave page where it is currently running a job or where it last did a job.

Alternatively, instead of creating a new page, we could:
1) modify the running page to add a column for the slave and a hyperlink to the slave page
2) change the recent/slavenames page to jobs/slavename, which would show the recent and currently running jobs
3) if 1 & 2 can be done, point the hyperlink on the running page at the "jobs/slavename" page, which could have the hyperlinks to the jobs on buildbot

Please let me know if it doesn't make sense.
I agree that this would be very useful, but it is currently not possible. The scheduler db has no record of which slave is doing a job (see the builds and buildrequests tables), and tracking that would be a buildbot enhancement. I remember we talked about that recently, so it may be filed upstream already.

See also the idle slave reports Anamaria is working on.
OS: Mac OS X → All
Priority: -- → P5
Hi Armen,

I have more or less addressed some of these issues in the Slave Reports.

In the "Status at endtime" column (the last one) of the Slaves Report, 1 means busy and 0 means idle. I should change these to the actual words to allow filtering/search:

http://cruncher.build.mozilla.org/~anamarias/wsgi/reports/slaves

** If busy, I could add the name of the build the slave is doing, and how long it has been working on it
** If idle, I could display how long it has been since the completion of the last build
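A minimal sketch of what such a status cell could look like, assuming the 1/0 encoding from the "Status at endtime" column. The function name `describe_status` and its arguments are hypothetical and not part of the Slave Reports code.

```python
from typing import Optional


def describe_status(status: int, build_name: Optional[str], since: int, now: int) -> str:
    """Render a status cell as words.

    status: 1 = busy, 0 = idle.
    since:  epoch seconds when the current build started (busy)
            or when the last build completed (idle).
    now:    epoch seconds of the report's endtime.
    """
    elapsed = now - since
    hours, remainder = divmod(elapsed, 3600)
    minutes = remainder // 60
    if status == 1:
        return f"busy: {build_name} for {hours}h{minutes:02d}m"
    return f"idle: {hours}h{minutes:02d}m since last build"
```

Using the words "busy" and "idle" rather than 1 and 0 would also make the column usable for filtering and search.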

All this could easily work for any timeframe in the past (starttime and endtime some time ago). However, I'm not sure how well it would work as a monitoring tool, since I remember that the info about a build is logged a lot later, maybe even after it's done. I'll have to recheck with catlee on this.

Slave Reports gets its data from statusdb, and as Nick said, there's no info about the machines/slaves in the schedulerdb.

For a single machine, you can check its activity (all builds done, their length and order) on the Slave Report page:

e.g. http://cruncher.build.mozilla.org/~anamarias/wsgi/reports/slaves/99?starttime=1296511200&endtime=1296856800
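A report URL of this shape can be built for any timeframe. The sketch below assumes that `starttime` and `endtime` are Unix timestamps in seconds and that the path component (99 here) identifies the slave; both are inferred from the example URL rather than confirmed, and `slave_report_url` is a hypothetical helper.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE = "http://cruncher.build.mozilla.org/~anamarias/wsgi/reports/slaves"


def slave_report_url(slave_id: int, start: datetime, end: datetime) -> str:
    """Build a per-slave report URL for the given (timezone-aware) time window."""
    params = urlencode({
        "starttime": int(start.timestamp()),  # assumed: epoch seconds
        "endtime": int(end.timestamp()),
    })
    return f"{BASE}/{slave_id}?{params}"


# A four-day window starting 2011-01-31 22:00 UTC.
print(slave_report_url(
    99,
    datetime(2011, 1, 31, 22, tzinfo=timezone.utc),
    datetime(2011, 2, 4, 22, tzinfo=timezone.utc),
))
```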

Again, I'm not sure it's helpful for seeing what a slave is doing RIGHT now. I haven't looked at the reports as a 'right now' monitoring tool.

All the ideas are great and helpful, but they all rely on the ability to track what a slave is currently doing.
I rechecked with catlee, and statusdb is updated with the info about builds only after they are done.

Also schedulerdb, which holds data about currently running builds, has no information on the slaves executing them.

It would be cool to have this in the future though.
Keywords: buildapi
Whiteboard: [buildapi]
Product: mozilla.org → Release Engineering
This is slave health.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME