Closed
Bug 1298437
Opened 8 years ago
Closed 7 years ago
create a pool of 10 machines for OSX tests to be run as tier-2 in taskcluster
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jmaher, Assigned: jmaher)
Attachments
(2 files)
151.93 KB, image/png
804 bytes, patch (coop: review+)
Our plan is to run a subset of tests as tier-2 for OS X on trunk. This would allow us to get by with a smaller pool of machines. When we feel all worker, build, and deployment issues are resolved, we will make a larger pool and do a quick transition of all debug tests to tier-1.
Comment 1•8 years ago
Specifically this is to use a pool of machines to run the in-progress taskcluster-worker for os x tests.
Updated•8 years ago
Summary: create a pool of 20 machines for OSX tests to be run as tier-2 → create a pool of 20 machines for OSX tests to be run as tier-2 in taskcluster
Comment 2•8 years ago
Is this supposed to be subtracting from the existing pool of 400 10.10 machines? If so, I'm not sure what this looks like for implementation (hostname, OS version, what gets installed, etc). Could someone provide some guidance?
Comment 3•8 years ago (Assignee)
It would subtract from the existing pool. Right now we have one machine already out as a 'loaner'; should we just add 19 more loaners? These should be identical to the existing OS X 10.10 machines, except we won't need buildbot running on them. Possibly :wcosta knows more about what is needed?
Comment 4•8 years ago
Okay, in that case, I'll move this over to the buildduty queue, since they handle loaners.
Assignee: relops → nobody
Component: RelOps → Buildduty
Product: Infrastructure & Operations → Release Engineering
QA Contact: arich → bugspam.Callek
Comment 5•8 years ago
We can help with this one, but would like to know what exactly is needed. @wcosta Any ideas? :-)
Flags: needinfo?(wcosta)
Comment 6•8 years ago
(In reply to Alin Selagea [:aselagea][:buildduty] from comment #5)
> We can help with this one, but would like to know what exactly is needed.
>
> @wcosta Any ideas? :-)

As taskcluster-worker is still WIP, I would like to keep one loaner for myself for development. Note that things are not ready yet; it will take a couple of weeks before we have a version of taskcluster-worker ready for tier-2.
Flags: needinfo?(wcosta)
Comment 7•8 years ago
There are two issues with the loaner machines:
1) We need a public IP for livelogging.
2) We need to redirect syslog to Papertrail.
Should I file bugs for these?
Flags: needinfo?(arich)
Comment 8•8 years ago
The machines already send logs to Papertrail, so either it's a matter of changing the config to send them to a different account, or you just look for the logs in the releng Papertrail account. For security purposes, though, I can't imagine that we will ever allow machines in the datacenter (especially ones running desktop OSes) to have a public IP. At best you're going to need to connect to the VPN and authenticate with MFA.
Flags: needinfo?(arich)
Comment 9•8 years ago
That's good to know about those requirements. I was not aware that we would not be able to open up a port to those machines, but it makes sense. We might need to get creative with how to allow live logging from those instances. We originally had an Azure logging backend that we wrote to, which could be made accessible without touching the machine; maybe that's what we do here too.
Comment 10•8 years ago
Hrm, we have a Treeherder bug that prevents it from parsing taskcluster-worker logs. Not sure if this is a blocker to deploying OS X tier-2 or not.
Flags: needinfo?(garndt)
Comment 11•8 years ago
I talked to garndt in irc, I think this is ready to go!
Comment 12•8 years ago
Part of the reasoning for this being OK is that the breakage is outside of TaskCluster logging and is affecting other systems. If a job was already live, its tier level (or whether it is hidden) would not be touched (I hope) if log parsing was an issue with another system.
Flags: needinfo?(garndt)
Comment 13•8 years ago
Okay, I have two questions here:
- Should we create a separate pool for these 20 Yosemite machines which are going to be used for tests, or simply disable them in slavealloc and add a corresponding note?
- Are there any other requirements for these machines/loaners?
Flags: needinfo?(garndt)
Comment 14•8 years ago
I'm not sure how the pools of hardware have been managed in the past, but these 20 machines are still considered a trial of sorts. If it proves that we're on the right track, the next steps would be to work with someone familiar with provisioning these machines to provision them in a more permanent fashion, and to slowly move machines from the buildbot pool to a taskcluster pool. I hope I didn't confuse things more.
Flags: needinfo?(garndt)
Comment 15•8 years ago (Assignee)
I believe the intention here would be to get these as loaners, and we would do the manual work to get the machines set up for taskcluster. The medium-term goal is to support both buildbot- and taskcluster-based images/machines, and the long-term goal is taskcluster only. I do not think we are ready yet for these to be in a special pool; if we feel ready after getting the loaners, we could create a pool with the 20 machines to start with and grow it as we formalize it.
Comment 16•8 years ago
All right then! I disabled t-yosemite-r7 machines in range [0040 - 0069] and added a note in slavealloc for each of them (excluding t-yosemite-r7-0050 which is already loaned to :wcosta). It will take some time for these machines to finish the current jobs and will then reboot. I didn't do any sort of cleaning for now. Let me know if anything else is needed.
Comment 17•8 years ago
I'll assign this to Joel while the work here is in progress.
Assignee: nobody → jmaher
Comment 18•8 years ago (Assignee)
:aselagea, will those machines in 0040-0069 have ssh/vnc credentials to access them? I assume they will be accessible by :wcosta (and maybe me, :jmaher) ?
Flags: needinfo?(aselagea)
Comment 19•8 years ago
(In reply to Joel Maher ( :jmaher) from comment #18)
> :aselagea, will those machines in 0040-0069 have ssh/vnc credentials to
> access them? I assume they will be accessible by :wcosta (and maybe me,
> :jmaher) ?

The range actually is [0040-0059], sorry for mistyping that.

I noticed that :wcosta has been added to the releng and vpn_releng LDAP groups in bug 1309408, so he should have access to those machines via ssh (using his ssh keypair). VNC access is not set up at the moment, though.

Per IRC:
<aselagea|buildduty> jmaher: hello!
16:15:44 jmaher: I was wondering if you also need access to the yosemite machines disabled in bug 1298437 :)
16:16:25 <jmaher> aselagea|buildduty: technically I don't, but it would be nice if others like myself could help out wcosta as needed

So I have two questions here:
1. @wcosta: is VNC access also needed? If yes, we can go ahead and set up that access, then e-mail the password for it.
2. @coop: any suggestions on how we could ensure access for Joel to those machines?
Flags: needinfo?(wcosta)
Flags: needinfo?(coop)
Flags: needinfo?(aselagea)
Comment 20•8 years ago
(In reply to Alin Selagea [:aselagea][:buildduty] from comment #19)
[snip]
> 1. @wcosta: is VNC access also needed? If yes, we can go on and ensure that
> access, then e-mail the password for that

Yes, please :)
Flags: needinfo?(wcosta)
Comment 21•8 years ago
(In reply to Alin Selagea [:aselagea][:buildduty] from comment #19)
> 2. @coop: any suggestions on how we could ensure access for Joel to those
> machines?

Joel's key should be in authorized_keys for cltbld if he has access to those machines.
Flags: needinfo?(coop)
Comment 22•8 years ago
Trying to summarize the current situation and see what's needed here:
- Joel does not have access to those machines (I checked authorized_keys for both the root and cltbld users).
- Per comment 3: "These should be identical to the existing osx 10.10 machines except we won't need buildbot running on them", so I haven't done any cleaning here ==> puppet will continue to run.
- Setting up a VNC password does not help much, as it will also require an SSH password to connect (see attachment). The SSH passwords will be reset when running puppet.
Comment 23•8 years ago
Comment 24•7 years ago
Hi Alin,

For the 20 machines, can my public key also be added to an admin user's authorized_keys? Also, can you confirm the hostnames? I'll raise a separate bug about getting VPN access granted to me once I can confirm the hostnames / subnet they reside on.

Many thanks!
Pete
Comment 25•7 years ago
buildduty will grant you the VPN access and you can get the existing password from wcosta. Keys don't get added to loaners, but you can do that yourself.
Comment 26•7 years ago
(In reply to Amy Rich [:arr] [:arich] from comment #25)
> buildduty will grant you the VPN access and you can get the existing
> password from wcosta. Keys don't get added to loaners, but you can do that
> yourself.

Ah, dang, read this too late! Thanks Amy.
Comment 27•7 years ago
I'm reclaiming t-yosemite-r7-005[0-9] to help with the current test backlog.
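For reference, the hostname ranges in this bug can be expanded to check the pool arithmetic: the trial range t-yosemite-r7-0040 through 0059 (comment 19), minus the ten reclaimed hosts t-yosemite-r7-005[0-9], leaves the ten machines reflected in the updated summary. This is a minimal Python sketch; the `hosts` helper is hypothetical and assumes the zero-padded four-digit suffix used in these hostnames.

```python
def hosts(start, end, prefix="t-yosemite-r7-"):
    """Expand an inclusive numeric range into zero-padded hostnames (hypothetical helper)."""
    return [f"{prefix}{n:04d}" for n in range(start, end + 1)]

# Trial range as corrected in comment 19: [0040-0059]
trial_pool = hosts(40, 59)
# t-yosemite-r7-005[0-9], reclaimed for the test backlog
reclaimed = hosts(50, 59)
# Machines left in the trial pool after reclaiming
remaining = [h for h in trial_pool if h not in reclaimed]

print(len(trial_pool), len(reclaimed), len(remaining))
print(remaining[0], remaining[-1])
```

Run as-is, this prints the counts 20, 10 and 10, followed by the first and last remaining hosts, t-yosemite-r7-0040 and t-yosemite-r7-0049.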
Comment 28•7 years ago
Attachment #8836173 - Flags: review?(kmoir)
Comment 29•7 years ago
https://hg.mozilla.org/build/puppet/rev/68d21ba8ceaae00cc4d4a35d120e49236398add1 Bug 1298437 - reclaim 10 yosemite machines - r=kmoir
Comment 30•7 years ago
Comment on attachment 8836173 [details] [diff] [review]
[puppet] Reclaim 10 yosemite machines

I just grabbed a quick review here from mtabara so I can start reimaging the machines.
Attachment #8836173 - Flags: review?(kmoir) → review+
Comment 31•7 years ago
(In reply to Chris Cooper [:coop] from comment #30)
> Comment on attachment 8836173 [details] [diff] [review]
> [puppet] Reclaim 10 yosemite machines
>
> I just grabbed a quick review here from mtabara so I can start reimaging the
> machines.

These hosts are starting to pick up test jobs now.
Updated•7 years ago
Blocks: t-yosemite-r7-0045
Updated•7 years ago
Summary: create a pool of 20 machines for OSX tests to be run as tier-2 in taskcluster → create a pool of 10 machines for OSX tests to be run as tier-2 in taskcluster
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard