+++ This bug was initially created as a clone of Bug #1140459 +++
In https://treeherder.mozilla.org/#/jobs?repo=try&revision=865a03c2014e, "Android 2.3 Opt" tests are actually running on the Android 4.3 emulator against the 4.0 Debug build. There are some crashes, but most jsreftest, crashtest, and reftest jobs show that tests are running and passing before running out of time:
18:30:07 WARNING - TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=ecma/String/220.127.116.11-1.js | application ran for longer than allowed maximum time
18:30:07 INFO - INFO | automation.py | Application ran for: 1:05:35.454386
I suppose this is not surprising: reftests are slow on Android, slower on the emulator, and slower still on debug builds.
At a glance, it looks like we would need to run jsreftests in at least 14 chunks and plain reftests in at least 35 chunks.
Bug 1125998 may provide some useful lessons / strategies.
This is shaping up similar to our mochitest and xpcshell test experience for Android 4.3 Debug: we need to run faster!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cec8e0ee2fe1&exclusion_profile=false has increased timeouts and shows most reftests running green. The remaining failures are timing related: J2 has a test that times out after 930 seconds (!) and the webgl-color-test failures in R6 are also caused by timeouts.
Most jobs run in between 2 and 3 hours.
Without increased timeouts, nearly all reftest jobs fail on 4.3 Debug.
I think we are already running on c3.xlarge. Can we try running on a faster instance type?
Kim -- The end of 4.3 Debug greening is in our sights! All 4.3 Debug reftests (including js-reftests and crashtests) run green if we can eliminate both job-level and individual test timeouts; that is, if we can get them to run faster. But these jobs are already running on c3.xlarge. Is a faster instance type an option? Let me know how you want to proceed.
Have you done any testing on another instance type to determine which would be suitable? If not, it would make sense for me to loan you an instance so you can determine whether the tests run successfully before we deploy it to production. How about a c3.2xlarge or m3.2xlarge?
Model        vCPU   Mem (GiB)   SSD Storage (GB)
c3.xlarge    4      7.5         2 x 40
c3.2xlarge   8      15          2 x 80
m3.xlarge    4      15          2 x 40
m3.2xlarge   8      30          2 x 80
I have not tested on any other instance type.
A loan sounds good; maybe start with c3.2xlarge. I'm a bit busy for the next couple of days - no rush to activate that.
I ran a set of reftests (plain reftests in 16 chunks, 2 crashtest chunks, 6 jsreftest chunks) on the loaner from bug 1172749. All timed out after 75 minutes; jobs were between 55% and 75% complete.
According to 'top', the emulator was using about 10% of memory and between 300% and 350% cpu; it looks like 1 cpu is usually at 100% and the remainder at 20-30%.
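As a rough back-of-the-envelope sketch (not a measurement), the completion figures above can be turned into a chunk estimate. This assumes run time scales linearly with the number of tests in a chunk and that chunks are evenly balanced, neither of which is confirmed here. The function name `chunks_needed` is hypothetical, and the 55% to 75% range is applied to every suite because per-suite completion figures were not reported.

```python
import math


def chunks_needed(current_chunks, fraction_complete):
    """Estimate how many chunks would let each job finish in the same time limit.

    Assumption: tests are spread evenly and run time is proportional to
    the number of tests per chunk.
    """
    if not 0 < fraction_complete <= 1:
        raise ValueError("fraction_complete must be in (0, 1]")
    return math.ceil(current_chunks / fraction_complete)


# Chunk counts used on the loaner run; completion range was 55% to 75%.
suites = {"reftest": 16, "crashtest": 2, "jsreftest": 6}

for name, chunks in suites.items():
    low = chunks_needed(chunks, 0.75)   # jobs that got furthest
    high = chunks_needed(chunks, 0.55)  # jobs that got least far
    print(f"{name}: roughly {low} to {high} chunks at the same 75 minute limit")
```

Since real chunks are not evenly balanced, the upper end of each range is probably the safer starting point.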
gbrown: Do you think it's worth running tests on an even more powerful instance type in terms of cpu?
snorp -- Debug reftests, in the emulator, run at about the same super-slow rate on c3.2xlarge as on c3.xlarge. Got any thoughts on what's holding us back or how to proceed?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=30ec0a144382&exclusion_profile=false skips the Debug reftest assertion count checks -- no significant difference to 4.3 Debug run-time.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8bb28a35f963&exclusion_profile=false suppresses output in httpd.js (output volume is greater in Debug builds, and we print and buffer all of it before throwing it away, unseen) -- no significant difference to 4.3 Debug run-time.
gbrown: re our discussion this morning, do you want to try running the reftests on linux ix hardware, since the AWS hardware is not working out? If so, we can arrange a loaner.
Yes, let's try an ix loaner - thanks.
(In reply to Kim Moir [:kmoir] from comment #7)
> gbrown: Do you think it's worth running tests on an even more powerful instance type in terms of cpu?
Kim and I talked about this earlier, but for the record, I am pessimistic about other aws instance types because in going from c3.xlarge to c3.2xlarge, we doubled both cpu and memory and hardly improved reftest run-times at all. Also, observation of 'top' on the c3.2xlarge loaner suggested there was lots of unused memory and cpu.
I made some gross harness-level timing measurements and found that 4.3 Debug page load, draw, and test times were all significantly larger than corresponding Opt times. For example, for R1:
                   Opt     Debug
Total real time:   3085    8605
Draw time:         1445    2077
Page load time:    768     3342
I am a little surprised by the debug page load time (time from LoadURI to onDocumentLoad); why would that be so much more on Debug? Optimizations? Assertions?
It may also be worth noting that desktop reftests run 2x to 3x slower on debug vs. opt -- just like we are seeing on Android.
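For reference, here is a small sketch that turns the R1 numbers above into slowdown ratios. It assumes the first number on each row is the Opt time and the second is the Debug time (consistent with Debug being reported as larger); the units were not stated, which does not matter for a ratio. The helper name `slowdown` is hypothetical.

```python
# R1 harness-level timings; assumed order is (Opt, Debug), units not stated.
timings = {
    "Total real time": (3085, 8605),
    "Draw time": (1445, 2077),
    "Page load time": (768, 3342),
}


def slowdown(opt, debug):
    """Return how many times slower Debug is than Opt."""
    return debug / opt


for name, (opt, debug) in timings.items():
    print(f"{name}: Debug is {slowdown(opt, debug):.2f}x Opt")
```

By this measure, page load is the clear outlier: its ratio is well above the overall 2x to 3x slowdown, while draw time is below it.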
I tried running tests on the ix hardware, but results were disappointing: comparable to c3.xlarge.
'top' indicated the emulator was using about 15% memory and 130% cpu (as usual, 1 core at 100%, light usage on all others).
We discussed this in today's mobile testing meeting. We think there is value in running Debug reftests on Android and we think the run-times are not surprising in context (we expect debug to run 2x to 3x slower, Android tests are slow, the emulator is slow and limits our ability to use multi-core cpu effectively).
Let's try running these with as many chunks as necessary.
AWS has a new offering where you can run tests on real devices.
Not sure if this would be useful for us; they seem to use different test harnesses to invoke tests.
We (mostly kmoir) are still working on getting these scheduled, using as many chunks as necessary -- bug 1183877.
Here is a recent try run showing that all reftests are passing; note that job chunks are spoofed in the second and third push to run the extra chunks:
All Android 4.3 Debug reftests, js-reftests, and crashtests are now running (in lots of chunks). Thanks kmoir!