Closed Bug 751994 Opened 13 years ago Closed 12 years ago

Many tegras in the last build per slave report

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Unassigned)

Details

There are many tegras reported that have not been taking jobs. If these tegras are not coming back to production we should remove them from the list. http://build.mozilla.org/builds/last-job-per-slave.html#test
I note that we have a lot of queue collapsing going on for tegra jobs, because the machines in production are not keeping up. Instead of removing these from the list, I think it might be more important to figure out why these tegras are not accepting jobs and to reduce queue collapsing.
That is what Callek and I are doing.
1 - We balanced the tegras across foopies to see whether overloaded foopies were causing timeouts and oranges. This has been running for 2 days now, so by early next week we will have an idea whether it worked or whether we need to do more.
2 - Callek has a number of pending improvements to the clientproxy and buildbot code that will increase the uptime of tegras; those just need some attention so we can land them.
Yep, 7k jobs for tegras on average makes for a very busy pool.
There are some tegras that have exceptionally high values for their last-jobs-per-slave view in the report. I do recommend removing at least some of these, if we can sanely do so. http://build.mozilla.org/builds/last-job-per-slave.html#test
Going through all the listed tegras with last-job older than 12 days... There are various reasons for this; I'll list the ones that are safe to remove, and why.
* tegra-031 -- Aki using for Bug 650890; also staging.
* tegra-033 -- Pulled in Bug 707584
* tegra-034 -- Pulled in Bug 707584
* tegra-043 -- Pulled in Bug 707584
* tegra-044 -- Pulled in Bug 707584
* tegra-069 -- Pulled in Bug 707584
* tegra-077 -- Pulled in Bug 736630
* tegra-110 -- Assigned to staging, on foopy13! (todo: swap out for a non-staging tegra on this foopy)
* tegra-153 -- Pulled in Bug 707584
* tegra-156 -- Pulled in Bug 707584
* tegra-175 -- Pulled in Bug 707584
* tegra-176 -- Pulled in Bug 707584
* tegra-224 -- Moving to staging in Bug 747641
* tegra-230 -- Pulled in Bug 715762
* tegra-268 -- Pulled in Bug 738422
The following are broken for other reasons, and should NOT be pulled from the report:
* tegra-039 -- 13 days -- Need to remote format SDCard --
* tegra-059 -- 13 days -- Bug 740456
* tegra-105 -- 14 days -- Bug 750780
* tegra-192 -- 50 days -- Online but needs SUTAgent fixed.
* tegra-223 -- 50 days -- Brought back in Bug 740438, SUTAgent issues too.
* tegra-251 -- 56 days -- Online but needs SUTAgent fixed.
* tegra-276 -- 51 days -- Online but needs SUTAgent fixed.
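The triage above (last job older than 12 days, split into "already pulled" and "still broken") could be sketched roughly as follows. This is only an illustration: the function name, the dictionary input, and the idle value given for tegra-033 are assumptions, not the format of the real last-job-per-slave report or any existing tool.

```python
from typing import Dict, List, Set, Tuple

STALE_AFTER_DAYS = 12  # threshold mentioned in the comment above


def triage_stale_slaves(
    days_since_last_job: Dict[str, int],
    pulled: Set[str],
    threshold: int = STALE_AFTER_DAYS,
) -> Tuple[List[str], List[str]]:
    """Split stale slaves into (remove_from_report, needs_fixing).

    A slave is stale when its last job is older than `threshold` days.
    Stale slaves known to be pulled or moved to staging can be removed
    from the report; the rest are broken and should stay listed.
    """
    stale = [name for name, days in days_since_last_job.items() if days > threshold]
    remove = sorted(name for name in stale if name in pulled)
    fix = sorted(name for name in stale if name not in pulled)
    return remove, fix


# Hypothetical input: idle days for tegra-039/105/192 come from the list
# above; the value for tegra-033 is made up, since the bug gives none.
report = {"tegra-033": 40, "tegra-039": 13, "tegra-105": 14, "tegra-192": 50}
pulled = {"tegra-033"}

remove, fix = triage_stale_slaves(report, pulled)
print("remove from report:", remove)
print("still needs fixing:", fix)
```

Keeping the "pulled" set separate from the report data mirrors the manual process here, where the reason a device is idle comes from other bugs rather than from the report itself.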
In an effort to make stuff saner, I dove into the TODOs there...
(In reply to Justin Wood (:Callek) from comment #3)
> The following are broken for other reasons, and should NOT be pulled from
> the report:
>
> * tegra-039 -- 13 days -- Need to remote format SDCard --
> * tegra-059 -- 13 days -- Bug 740456
These two were fixed earlier and are taking jobs as we speak.
> * tegra-105 -- 14 days -- Bug 750780
Still broken.
> * tegra-192 -- 50 days -- Online but needs SUTAgent fixed.
> * tegra-223 -- 50 days -- Brought back in Bug 740438, SUTAgent issues too.
> * tegra-251 -- 56 days -- Online but needs SUTAgent fixed.
> * tegra-276 -- 51 days -- Online but needs SUTAgent fixed.
These had SUTAgent manually updated (since my updateSUT fix hasn't officially deployed yet) and are now taking jobs.
This bug has outlived its time-specific usefulness
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard