Closed Bug 902694 · Opened 11 years ago · Closed 11 years ago

Need 4 AWS VMs for B2G emulator memory tests

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jgriffin, Assigned: glob)

Details

Per jlebar's request, Rob Wood is going to set up some B2G memory tests that will be run on B2G emulators and report to areweslimyet.com (AWSY).  To run these, we'd like a set of 4 AWS VMs with the current Ubuntu64 test slave configuration, but not inside the build VPN.

One of these VMs, which will host a Jenkins master, should be open to the internet on port 8080; the others don't need any public open ports, but should be able to accept ssh connections from the master.
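In EC2 terms that port layout maps onto two security groups, something like the following boto sketch (the group names are made up for illustration; nothing below comes from an actual setup in this bug):

import boto.ec2

# Connect to the region the instances will live in (us-west-2 per the
# hostnames later in this bug; purely illustrative here).
conn = boto.ec2.connect_to_region("us-west-2")

# Hypothetical group for the Jenkins master: port 8080 open to the world.
master_sg = conn.create_security_group(
    "awsy-jenkins-master", "AWSY Jenkins master, public on 8080")
master_sg.authorize(ip_protocol="tcp", from_port=8080, to_port=8080,
                    cidr_ip="0.0.0.0/0")

# Hypothetical group for the workers: no public ports, ssh allowed only
# from instances that are members of the master's security group.
worker_sg = conn.create_security_group(
    "awsy-workers", "AWSY test workers, ssh from master only")
worker_sg.authorize(ip_protocol="tcp", from_port=22, to_port=22,
                    src_group=master_sg)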

Is this something you guys in releng can set up for us?  It's something like the VMs you used to provide us for B2G unit testing, before all such testing was moved to buildbot/TBPL.
Please contact dougt or jst if you need approvals here.
Component: Release Engineering: Automation (General) → Release Engineering: Loan Requests
QA Contact: catlee → coop
Ok, I may have moved this into the Loaners component prematurely.

I read this initially as "We want to get 4 AWS Ubuntu VMs like you use for current tests".

On re-reading this it sounds like you want 4 near-permanent machines to run a set of tests over jenkins that Mozilla Releng will not maintain and report elsewhere than tbpl/etc.

If you only want loaners, I can get them to you pretty fast (though I'll have an outstanding question or two about the ports open to the internet). If my new understanding is right and these are permanent, I'll need to have a chat within releng to find out what we can recommend as a proper path forward here.

sending needinfo to all names on this bug so far, to expedite an answer (in an attempt to not block you guys for longer than necessary)
Flags: needinfo?(justin.lebar+bug)
Flags: needinfo?(jst)
Flags: needinfo?(jgriffin)
Flags: needinfo?(doug.turner)
> On re-reading this it sounds like you want 4 near-permanent machines to run a set of tests over 
> jenkins that Mozilla Releng will not maintain and report elsewhere than tbpl/etc.

That's what I understand jgriffin is asking for.
Flags: needinfo?(justin.lebar+bug)
Flags: needinfo?(jst)
Flags: needinfo?(doug.turner)
(In reply to Justin Lebar [:jlebar] (limited availability 8/9 – 8/12) from comment #3)
> > On re-reading this it sounds like you want 4 near-permanent machines to run a set of tests over 
> > jenkins that Mozilla Releng will not maintain and report elsewhere than tbpl/etc.
> 
> That's what I understand jgriffin is asking for.

Yes, that's what we'd like; specifically, we are not asking for temporary loaners.
Flags: needinfo?(jgriffin)
Product: mozilla.org → Release Engineering
Component: Loan Requests → General Automation
QA Contact: coop → catlee
The reason this is in the releng request queue is because we are going to run these memory tests per-checkin on b2g emulators. Because of that load, because we want to run on exactly the same VM configuration that the emulator tests currently run on (and for these machines to remain current as those machines are updated), and because of the various restrictions we always hit moving b2g builds around, we would ideally want these to live in the releng VPC since it is already configured to handle these issues.

And that's why this request is here.
Sorry for the spam; to clarify, per-checkin means they would run constantly, but individual runs would likely take a long time. So the load would be high, but they would *not* be able to keep up with every single checkin coming through the scheduler, hence the reason for their own scheduler.
Catlee, is this something you can help out with?  I know you've managed some of our AWS VMs in the past.
Flags: needinfo?(catlee)
So it sounds like the ask is for 4 VMs inside RelEng's VPC, with one of these instances publicly accessible so it can act as a Jenkins server?

I don't see a problem setting up the test workers inside the VPC, but setting up a publicly accessible jenkins server worries me. Is there another place we can put this?
Flags: needinfo?(catlee)
We could hook it up to the existing Jenkins master at http://ec2-107-20-108-245.compute-1.amazonaws.com/jenkins/.
per discussion with jgriffin in b2g meeting just now:

1) It would be helpful if RelEng were able to create these 4 instances, like typical loaner test slaves, allow the new instances to connect to puppet for the initial setup, then scrub production keys, power them off, move them outside of the RelEng VPC, and bring them back up. Let's try this and see if it works.
1a) These machines would no longer be able to connect to the RelEng puppetmaster, so over time they would drift out of sync with current production test machines. However, jgriffin is ok with this for now. If they get *so* out of sync that it becomes a problem, we can delete and recreate them.
1b) If that isn't possible, worst case, jgriffin could create these 4 instances as just "plain" instances, in which case jgriffin will move this bug to Testing:General and then create the 4 instances in the ateam AWS account. However, RelEng's initial setup in (1) would be preferred.

2) For running, these 4 VMs do not need to run in the RelEng VPC. They will be downloading builds from public ftp.m.o, cloning from public hg.m.o, and need to communicate with http://ec2-107-20-108-245.compute-1.amazonaws.com/jenkins/, which means they need to be accessible via ssh *from* that machine.
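If the workers end up outside the RelEng VPC (or in a different account entirely), the ssh rule can't point at a shared security group; it would have to allow the master's public address directly. A hedged boto sketch, with the IP read off the Jenkins master hostname above and the group name invented for illustration:

import boto.ec2

conn = boto.ec2.connect_to_region("us-west-2")

# Hypothetical group for out-of-VPC workers: no public ports, ssh allowed
# only from the Jenkins master's public IP (107.20.108.245, per the
# ec2-107-20-108-245 hostname above); the /32 limits the rule to exactly
# that host.
worker_sg = conn.create_security_group(
    "awsy-workers-ext", "AWSY workers outside the RelEng VPC")
worker_sg.authorize(ip_protocol="tcp", from_port=22, to_port=22,
                    cidr_ip="107.20.108.245/32")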

Because of all this, I'm moving this bug back into the buildduty component.
Component: General Automation → Buildduty
QA Contact: catlee → armenzg
It's not possible to move instances in/out of VPCs.
All right, then it looks like we'll need to create 4 generic Ubuntu VMs unrelated to the test slaves.  I can set up a VM locally on my account and then have it cloned in either the rel-eng or a-team AWS accounts.
joduinn and I spoke about this yesterday, and I think I understand the requirements better.

If we can get you an AMI for one of our test machine instances, would that be useful? Then you can create whatever instances you need in your own account.
Sure, that would be fine.  Thanks!
Ok, here's what I've done:

- Stopped an existing instance (tst-linux64-ec2-302) to create an AMI from it
- Created a new instance from that AMI and booted it up
- Cleanup:
  - remove /builds/slave
  - remove /var/lib/puppet/
  - remove ~cltbld/.vnc/passwd
  - set up ssh key management (/etc/rc.local and /usr/local/bin/get-credentials.sh from https://forums.aws.amazon.com/message.jspa?messageID=76866#; see the sketch below)
  - truncate /root/.ssh/authorized_keys

- Created an AMI from this scrubbed instance. The AMI id is ami-24188414. I need your account # to share it with you.
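For reference, the ssh key management step above boils down to pulling the key pair supplied at launch out of the EC2 instance metadata service on first boot. A rough Python 2 equivalent of what the linked get-credentials.sh does (a sketch, not the actual script):

import urllib2

# The instance metadata service exposes the public half of the key pair
# chosen at launch; install it so the instance owner can ssh in as root.
key = urllib2.urlopen(
    "http://169.254.169.254/latest/meta-data/"
    "public-keys/0/openssh-key").read()
with open("/root/.ssh/authorized_keys", "w") as f:
    f.write(key)

The create-and-share steps can likewise be sketched with boto (the usual tool at the time; the instance id, region, and target account id below are placeholders, and only ami-24188414 comes from this bug):

import boto.ec2

conn = boto.ec2.connect_to_region("us-west-2")

# Create an AMI from the scrubbed, stopped instance; "i-XXXXXXXX" stands
# in for the instance id of the cleaned-up clone of tst-linux64-ec2-302.
ami_id = conn.create_image(
    "i-XXXXXXXX", "tst-linux64-scrubbed",
    description="Ubuntu64 test slave AMI with production keys removed")

# Share the AMI (ami-24188414 in this bug) with the other account by
# granting it launch permission; "123456789012" is a placeholder id.
image = conn.get_image(ami_id)
image.set_launch_permissions(user_ids=["123456789012"])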
Assignee: nobody → catlee
Flags: needinfo?(jgriffin)
glob, can you e-mail catlee our AWS account id, so we can clone the AMI he's set up?
Flags: needinfo?(jgriffin) → needinfo?(glob)
emailed.
Flags: needinfo?(glob)
Found in triage. catlee, now that you're back from PTO, what's the status here?
Flags: needinfo?(catlee)
my bad -- catlee provided me with the details but i haven't been able to see the ami, and i failed to ask catlee for assistance.  i'll work with him over email to avoid bugspam.
Flags: needinfo?(catlee)
(In reply to Byron Jones ‹:glob› from comment #19)
> my bad -- catlee provided me with the details but i haven't been able to see
> the ami, and i failed to ask catlee for assistance.  i'll work with him over
> email to avoid bugspam.

Thanks Glob, let rwood know as soon as you have the machines up so he can start testing with them.
glob confirmed on irc last week he is able to see the AMI. If you have any problems with it, let me know!
Assignee: catlee → glob
:glob ping, just wondering if you have any updates on the VMs? Thanks :)
(In reply to Rob Wood [:rwood] from comment #22)
> :glob ping, just wondering if you have any updates on the VMs? Thanks :)

worked a bit on it today, hit an issue with ssh keys.  i'll work through this with catlee via email/irc.
created:

ec2-54-200-198-57.us-west-2.compute.amazonaws.com
ec2-54-200-198-58.us-west-2.compute.amazonaws.com
ec2-54-200-193-154.us-west-2.compute.amazonaws.com
ec2-54-200-194-98.us-west-2.compute.amazonaws.com

rwood has the connection details and is able to connect.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard