Closed Bug 1026800 Opened 10 years ago Closed 10 years ago

loan jgriffin an AWS linux64 test box, and then bump up the instance cpu/ram til tests pass

Categories: Release Engineering :: General (defect)
Hardware: x86
OS: macOS
Priority: Not set
Severity: normal
Tracking: not tracked
Status: RESOLVED FIXED
People: Reporter: mozilla; Assignee: jgriffin


This is partially a loan request and partially a request to bump up the AWS instance type once the machine is loaned.

We don't have the capacity to run all of our tests on ix hardware, so let's find out what test VM resources our tests need in order to pass.

Once we do that, let's create a new pool of that type of node.
Buildduty: feel free to move this bug out of the loan requests queue once the loan's done... maybe to General Automation? Dunno.
I'll resolve the loan request by tomorrow morning PT.
Bug 1007211 - loan linux64 ec2 slave to jmaher <- this loan was only stopped and never terminated.

Joel, can you try using this again? You are still on the VPN access list for it, and I just started it back up:

Full FQDN: tst-linux64-ec2-jmaher.test.releng.use1.mozilla.com

Passwords have not changed.
Assignee: nobody → jmaher
I am on buildduty. I must be losing it.

jgriffin != jmaher.

/me terminates jmaher's old instance that was never fully reclaimed and creates one for jgriffin.
Email sent to jgriffin for further instructions. 

Loaning slaves: 
    - tst-linux64-ec2-jgriffin.test.releng.use1.mozilla.com

Hi j<ateam mozillian>, I am going to assign this to you to keep track of the loan(s). 

When you are finished with the loan(s) forever, please comment stating so and mark this bug as resolved.

By the way, now that this aws instance has been created, starting and stopping it can happen in a flash!
If you are not going to be using this machine for multiple hours, let us know in this bug and we can stop it.
Comment again when you want it started back up.
*For really fast turnaround, ping #releng (look for nick with 'buildduty')
Assignee: jmaher → jgriffin
Blocks: 1027473
Component: Loan Requests → Tools
QA Contact: coop → hwine
Rail, Catlee, if you can either expedite changing the type of AWS node this loaner runs on at jgriffin's request, or document it well enough for others to do so, that would be much appreciated.
(In reply to Aki Sasaki [:aki] from comment #7)
> Rail, Catlee, if you can either expedite changing the type of AWS node this
> loaner runs on at jgriffin's request, or document it well enough for others
> to do so, that would be much appreciated.

Specifically, can I get this loaner rebooted as an m1.large instance?
It's very simple. Go to the AWS console:

https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Instances:instancesFilter=all-instances;instanceTypeFilter=all-instance-types;search=tst-linux64-ec2-jgriffin

* Select the instance
* Select Actions -> Stop
* Wait for it to stop
* Select Actions -> Change Instance Type
* Select the type you want.
* Select Actions -> Start
* ???
* Profit!
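The console steps above can also be scripted. This is a minimal sketch, assuming the boto3 EC2 client (boto3 is not mentioned in this bug) and that the loaner can be found by its Name tag; `find_instance_id` and `change_instance_type` are hypothetical helper names. The client is passed in as a parameter so the helpers hold no credentials themselves.

```python
from typing import Any

# Hypothetical helpers; `ec2` is assumed to be a boto3 EC2 client, e.g.
# boto3.client("ec2", region_name="us-east-1").

LIVE_STATES = ["pending", "running", "stopping", "stopped"]


def find_instance_id(ec2: Any, name: str) -> str:
    """Return the ID of the one non-terminated instance tagged Name=<name>."""
    resp = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Name", "Values": [name]},
            # Skip terminated instances that may still carry the same name.
            {"Name": "instance-state-name", "Values": LIVE_STATES},
        ]
    )
    ids = [
        inst["InstanceId"]
        for reservation in resp["Reservations"]
        for inst in reservation["Instances"]
    ]
    if len(ids) != 1:
        raise LookupError(f"expected one instance named {name!r}, found {len(ids)}")
    return ids[0]


def change_instance_type(ec2: Any, instance_id: str, new_type: str) -> None:
    """Stop the instance, change its type, then start it again."""
    # Actions -> Stop
    ec2.stop_instances(InstanceIds=[instance_id])
    # Wait for it to stop; the type can only be changed while stopped.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    # Actions -> Change Instance Type
    ec2.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": new_type}
    )
    # Actions -> Start
    ec2.start_instances(InstanceIds=[instance_id])


# Example (needs AWS credentials, so left commented out):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# iid = find_instance_id(ec2, "tst-linux64-ec2-jgriffin")
# change_instance_type(ec2, iid, "c3.large")
```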

I've done this to change it to an m1.large instance. I've also enabled termination protection on this instance to prevent accidental deletion since we're going to be doing lots of manual work in the AWS console for this instance.
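Termination protection can be toggled in the same way. A minimal sketch, again assuming a boto3 EC2 client is passed in; `set_termination_protection` is a hypothetical helper name. Protection has to be switched off again before the instance can be terminated at the end of the loan.

```python
from typing import Any


def set_termination_protection(ec2: Any, instance_id: str, enabled: bool = True) -> None:
    """Turn EC2 termination protection on or off for one instance."""
    # While DisableApiTermination is True, attempts to terminate this
    # instance (console, CLI, or API) are refused.
    ec2.modify_instance_attribute(
        InstanceId=instance_id, DisableApiTermination={"Value": enabled}
    )


# Example (needs AWS credentials, so left commented out):
# set_termination_protection(ec2, iid)          # at loan start
# set_termination_protection(ec2, iid, False)   # before terminating
```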
jgriffin, please keep bug 966070 in mind when testing new instance types!
Update: the m1.large instance lets the tests get just a little further into startup than on the m1.medium, but they still die. If I launch the emulator manually, I can see gaia come up eventually, but it's excruciatingly slow.

Taras suggested switching to a c3.xlarge instance.  Catlee, can you do this or redirect to someone else to do it?  Thanks!
Flags: needinfo?(catlee)
I will do this now.
Flags: needinfo?(catlee)
done. tst-linux64-ec2-jgriffin is now running and is a c3.xlarge instance
Good news.  The tests run as well on the c3.xlarge instance as they do for me locally.  Based on my experiment with m1.large, I doubt c3.large would be adequate, but I'm willing to try it if you'd like me to.
Flags: needinfo?(catlee)
Worth a shot. The c3 family has a slightly more powerful processor than m3, I believe. I've converted your instance to c3.large.
Flags: needinfo?(catlee)
gaia-ui-tests do run on the c3.large, about 20% slower on average.

For the media mochitests (another chunk we want to run here), I haven't seen the tests time out in a few runs, but the timeouts are intermittent, so it's hard to be sure whether c3.large is enough to avoid them.

Probably the best thing to do is stand up a new platform based on c3.large so we can play with them on cedar and see how stable they are there, and consider switching to c3.xlarge if intermittent CPU-related problems are frequent.
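The search this bug set out to do ("bump up the instance cpu/ram til tests pass") amounts to trying instance types in increasing order and keeping the first one that passes. A sketch under stated assumptions: `smallest_passing` is a hypothetical helper, the candidate ordering is an assumption, and the `True`/`False` values only summarize the gaia-ui-tests outcomes reported in this bug (the intermittent media mochitest timeouts are not captured).

```python
from typing import Callable, Optional, Sequence

# Outcomes reported in this bug for gaia-ui-tests (True = tests ran to completion).
OBSERVED = {
    "m1.medium": False,
    "m1.large": False,
    "c3.large": True,
    "c3.xlarge": True,
}

# Assumed ordering, smallest instance type first.
CANDIDATES = ["m1.medium", "m1.large", "c3.large", "c3.xlarge"]


def smallest_passing(
    candidates: Sequence[str], passes: Callable[[str], bool]
) -> Optional[str]:
    """Return the first candidate for which `passes` is True, or None."""
    for instance_type in candidates:
        if passes(instance_type):
            return instance_type
    return None


print(smallest_passing(CANDIDATES, OBSERVED.__getitem__))  # prints "c3.large"
```

In practice `passes` would be a full test run on the resized loaner rather than a dictionary lookup, and an intermittent failure means a single passing run is weak evidence, which is why the comment above proposes watching stability on cedar.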
If you have time, can you look at some of our desktop unit test suites? I'm particularly interested to see how much faster they run.
I ran mochitest-plain chunk 1 on the c3.large, which seemed to go about 20% faster than on the current m1.medium.  The average time for the test harness part (excluding setup and teardown in mozharness) is about 30 minutes on TBPL; it took 24 minutes on the c3.large.
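For reference, the arithmetic behind "about 20% faster": going from about 30 minutes to 24 minutes saves 20% of the wall-clock time, which is the same as a 1.25x speedup. A small sketch; the helper names are hypothetical.

```python
def time_reduction(baseline_min: float, new_min: float) -> float:
    """Fraction of wall-clock time saved relative to the baseline."""
    return (baseline_min - new_min) / baseline_min


def speedup(baseline_min: float, new_min: float) -> float:
    """How many times faster the new run is than the baseline."""
    return baseline_min / new_min


# mochitest-plain chunk 1: ~30 min average on TBPL vs 24 min on c3.large
print(f"{time_reduction(30, 24):.0%} less time, {speedup(30, 24):.2f}x speedup")
# prints "20% less time, 1.25x speedup"
```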
I don't need this slave any longer, so it can be terminated; see bug 1031083 for follow-up.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: Tools → General