Closed Bug 1409439 Opened 7 years ago Closed 7 years ago

Migrate 10 OS X machines back from Taskcluster to Buildbot

Product: Infrastructure & Operations Graveyard
Component: CIDuty
Type: task
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: aobreja
Assignee: Unassigned
Attachments: 1 file

Right now we have 9 machines left in the t-yosemite pool, which have to run about 200 jobs/day, so we get a backlog every day (check [1]). I can move 10 machines back from Taskcluster tomorrow to better handle the load for BB OS X tests, if everyone agrees. [1] https://www.hostedgraphite.com/da5c920d/86a8384e-d9cf-4208-989b-9538a1a53e4b/grafana/dashboard/db/pending?from=now-7d&to=now
Flags: needinfo?(jlund)
Is that backlog the long pole for pushes where they run, or is that backlog regularly cleared before the pool of 7 WinXP machines clears its backlog? And how bad is the backlog for the pool where those 10 would be coming back from, that we will be increasing in order to speed up e2e time for the esr52 pushes that nobody actually waits for results from? Anecdotally, I know that people land pushes which are broken on OS X on the trunk because they didn't run OS X tests on Try because the pool is so backlogged, and it looks like right now Try is doing 6 hour old OS X tests, but I don't know whether that's the good part of our regular backlog or the bad part.
(In reply to Phil Ringnalda (:philor) from comment #1)
> Is that backlog the long pole for pushes where they run, or is that backlog
> regularly cleared before the pool of 7 WinXP machines clears its backlog?
>
> And how bad is the backlog for the pool where those 10 would be coming back
> from, that we will be increasing in order to speed up e2e time for the esr52
> pushes that nobody actually waits for results from? Anecdotally, I know that
> people land pushes which are broken on OS X on the trunk because they didn't
> run OS X tests on Try because the pool is so backlogged, and it looks like
> right now Try is doing 6 hour old OS X tests, but I don't know whether
> that's the good part of our regular backlog or the bad part.

This bug was opened only for the t-yosemite-r7 pool, where we have only 9 machines in BB and the rest are in Taskcluster. This bug will only solve the problem with the BB backlog. The general OS X backlog is caused by the fact that we have too few machines there and too many jobs, but once the mdc1 machines are in production that problem will be solved.
I think this is a good idea. Saying that, we do have option 2. I raised this and the xp pool with others this week as per your feedback Andrei. The net result is we can absolutely take machines we disabled recently in buildbot[1] and re-purpose them as yosemite and xp. This may take longer as we will need some help from ops but could be a good way to keep taskcluster unaffected. garndt, thoughts on borrowing machines from taskcluster infra? fubar, thoughts on re-imaging/purposing disabled machines? Where should we file that work and who should we assign? [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1393774
Flags: needinfo?(klibby)
Flags: needinfo?(jlund)
Flags: needinfo?(garndt)
(In reply to Jordan Lund (:jlund) from comment #3)
> I think this is a good idea. Saying that, we do have option 2. I raised this
> and the xp pool with others this week as per your feedback Andrei. The net
> result is we can absolutely take machines we disabled recently in
> buildbot[1] and re-purpose them as yosemite and xp. This may take longer as
> we will need some help from ops but could be a good way to keep taskcluster
> unaffected.
>
> garndt, thoughts on borrowing machines from taskcluster infra?
>
> fubar, thoughts on re-imaging/purposing disabled machines? Where should we
> file that work and who should we assign?
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1393774

I discussed this more with Andrei. He can do 99% of the work for repurposing some xp machines (Bug 1410024), so we will proceed with that. For yosemite, he raised good points that it would be easiest to move existing yosemite machines from tc back to bb temporarily (this bug). Pending looks fine for tc, so moving 10 machines over shouldn't be too concerning, as long as garndt signs off.
Flags: needinfo?(klibby)
See Also: → tcmigration_cleanup
I think this would be fine. When we first migrated, I believe 30 machines were dedicated to bb and things still moved along. Reallocating 10 machines might mean 30-50 tasks per hour that will need to be balanced among the remaining machines. We already usually schedule more than we can complete per hour, so this would just make that slightly worse.
Flags: needinfo?(garndt)
Puppet patch to move t-yosemite-r7-001[0-9] to Buildbot.
Attachment #8920521 - Flags: review?(aselagea)
Attachment #8920521 - Flags: review?(aselagea) → review+
t-yosemite-r7-001[0-9] machines were moved back and re-imaged.
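For anyone reading the host pattern above: t-yosemite-r7-001[0-9] is shorthand for the ten hosts t-yosemite-r7-0010 through t-yosemite-r7-0019. A minimal sketch of the expansion follows. The helper name expand_host_pattern is hypothetical and is not part of the Puppet patch, and it assumes the pattern contains at most one single-digit [a-b] range.

```python
import re
from typing import List


def expand_host_pattern(pattern: str) -> List[str]:
    """Expand one single-digit [a-b] range into explicit hostnames.

    Hypothetical helper for illustration only; it handles just the
    simple form used in this bug, not general glob or regex syntax.
    """
    match = re.search(r"\[(\d)-(\d)\]", pattern)
    if match is None:
        # No range present: the pattern is already a single hostname.
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{digit}{suffix}" for digit in range(start, end + 1)]


hosts = expand_host_pattern("t-yosemite-r7-001[0-9]")
print(len(hosts))           # 10
print(hosts[0], hosts[-1])  # t-yosemite-r7-0010 t-yosemite-r7-0019
```

That matches the 10 machines requested in the bug summary.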
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard