Bug 1140471 - Many Android 4.3 Debug reftests jobs run for too long
Product: Testing
Classification: Components
Component: General
: Trunk
: x86_64 Linux
-- normal
Assigned To: Geoff Brown [:gbrown]
Depends on: 1172749 1177532 1183877
Blocks: 1140454
Reported: 2015-03-06 09:21 PST by Geoff Brown [:gbrown]
Modified: 2015-10-20 07:02 PDT


Description Geoff Brown [:gbrown] 2015-03-06 09:21:35 PST
+++ This bug was initially created as a clone of Bug #1140459 +++

In, "Android 2.3 Opt" tests are actually running on the Android 4.3 emulator against the 4.0 Debug build. There are some crashes, but most jsreftest, crashtest, and reftest jobs show that tests are running and passing before running out of time:

18:30:07  WARNING -  TEST-UNEXPECTED-FAIL | | application ran for longer than allowed maximum time
18:30:07     INFO -  INFO | | Application ran for: 1:05:35.454386

I suppose this is not surprising: reftests are slow on Android, slower on the emulator, and slower still on debug builds.

At a glance, it looks like we would need to run jsreftests in at least 14 chunks and plain reftests in at least 35 chunks.
Comment 1 Geoff Brown [:gbrown] 2015-03-17 20:01:44 PDT
Bug 1125998 may provide some useful lessons / strategies.
Comment 2 Geoff Brown [:gbrown] 2015-05-22 10:07:53 PDT
This is shaping up similar to our mochitest and xpcshell test experience for Android 4.3 Debug: we need to run faster! has increased timeouts and shows most reftests running green. The remaining failures are timing related: J2 has a test that times out after 930 seconds (!) and the webgl-color-test failures in R6 are also caused by timeouts.

Most jobs run in between 2 and 3 hours.

Without increased timeouts, nearly all reftests jobs fail on 4.3 Debug.

I think we are already running on c3.xlarge. Can we try running on a faster instance type?
Comment 3 Geoff Brown [:gbrown] 2015-06-08 10:00:40 PDT
Kim -- The end of 4.3 Debug greening is in our sights! All 4.3 Debug reftests (including js-reftests and crashtests) run green if we can eliminate both job timeouts and individual test timeouts; that is, if we can get them to run faster. But these jobs are already running on c3.xlarge. Is a faster instance type an option? Let me know how you want to proceed.
Comment 4 Kim Moir [:kmoir] 2015-06-08 10:31:17 PDT
Have you done any testing on an instance type to determine which would be suitable? If not, it would make sense for me to loan you an instance so you can determine whether the tests run successfully before we deploy that type to production. How about a c3.2xlarge or m3.2xlarge?

Model        vCPU  Mem (GiB)  SSD Storage (GB)
c3.xlarge    4     7.5        2 x 40
c3.2xlarge   8     15         2 x 80

m3.xlarge    4     15         2 x 40
m3.2xlarge   8     30         2 x 80
Comment 5 Geoff Brown [:gbrown] 2015-06-08 12:32:32 PDT
I have not tested on any other instance type.

A loan sounds good; maybe start with c3.2xlarge. I'm a bit busy for the next couple of days - no rush to activate that.
Comment 6 Geoff Brown [:gbrown] 2015-06-21 05:26:39 PDT
I ran a set of reftests (plain reftests in 16 chunks, 2 crashtest chunks, 6 jsreftest chunks) on the loaner from bug 1172749. All timed out after 75 minutes; jobs were between 55% and 75% complete.

According to 'top', the emulator was using about 10% of memory and between 300% and 350% cpu; it looks like 1 cpu is usually 100% and the remainder 20-30%.
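
The timeout and completion figures from this run allow a rough estimate of how many chunks would be needed for every job to finish in time. The following is only a back-of-the-envelope sketch: it assumes work divides evenly across chunks and that per-job startup cost is negligible, and the 60-minute per-job budget is a hypothetical target, not a number from this bug.

```python
import math

def chunks_needed(current_chunks, fraction_complete, timeout_min, budget_min):
    """Estimate how many chunks are needed so each job fits in budget_min.

    Assumes work splits evenly across chunks and ignores per-job startup cost.
    """
    # Estimated minutes for one existing chunk to run to completion.
    full_chunk_min = timeout_min / fraction_complete
    # Estimated minutes for the whole suite, summed over all chunks.
    total_min = full_chunk_min * current_chunks
    return math.ceil(total_min / budget_min)

# Plain reftests: 16 chunks, timed out at 75 minutes, 55% to 75% complete.
# budget_min=60 is a hypothetical target chosen for illustration.
worst = chunks_needed(16, 0.55, 75, 60)
best = chunks_needed(16, 0.75, 75, 60)
print(f"estimated plain reftest chunks: {best} to {worst}")
```

Under these assumptions the estimate lands somewhere between 27 and 37 chunks for plain reftests; a shorter budget or uneven chunk durations would push the number higher.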
Comment 7 Kim Moir [:kmoir] 2015-06-23 14:20:53 PDT
gbrown: Do you think it's worth running tests on an even more powerful instance type in terms of cpu?
Comment 8 Geoff Brown [:gbrown] 2015-06-23 23:27:29 PDT
snorp -- Debug reftests, in the emulator, run at about the same super-slow rate on c3.2xlarge as on c3.xlarge. Got any thoughts on what's holding us back or how to proceed?
Comment 9 Geoff Brown [:gbrown] 2015-06-25 12:33:58 PDT
skips the Debug reftest assertion count checks -- no significant difference to 4.3 Debug run-time.

suppresses output in httpd.js (output volume is greater in Debug builds, and we print and buffer all of it before throwing it away, unseen) -- no significant difference to 4.3 Debug run-time.
Comment 10 Kim Moir [:kmoir] 2015-06-25 13:18:04 PDT
gbrown: re our discussion this morning, do you want to try running the reftests on Linux ix hardware, since the AWS hardware is not working out? If so, we can arrange a loaner.
Comment 11 Geoff Brown [:gbrown] 2015-06-25 13:34:39 PDT
Yes, let's try an ix loaner - thanks.
Comment 12 Geoff Brown [:gbrown] 2015-06-25 13:58:01 PDT
(In reply to Kim Moir [:kmoir] from comment #7)
> gbrown: Do you think it's worth running tests on an even more powerful instance type in terms of cpu?

Kim and I talked about this earlier, but for the record, I am pessimistic about other AWS instance types because in going from c3.xlarge to c3.2xlarge, we doubled both cpu and memory and hardly improved reftest run-times at all. Also, observation of 'top' on the c3.2xlarge loaner suggested there was lots of unused memory and cpu.

I made some gross harness-level timing measurements and found that 4.3 Debug page load, draw, and test times were all significantly larger than corresponding Opt times. For example, for R1:
                     Opt         Debug
Total real time:     3085        8605  
Draw time:           1445        2077
Page load time:       768        3342

I am a little surprised by the debug page load time (time from LoadURI to onDocumentLoad); why would that be so much more on Debug? Optimizations? Assertions?

It may also be worth noting that desktop reftests run 2x to 3x slower on debug vs. opt -- just like we are seeing on Android.
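
To put the R1 timings above on the same footing as the "2x to 3x" desktop figure, the Debug/Opt ratio can be computed for each row. This is a small sketch using only the numbers reported in this comment; the units are not stated in the report and cancel out of the ratio anyway.

```python
# Harness-level timings for R1 as (Opt, Debug), copied from the table above.
timings = {
    "Total real time": (3085, 8605),
    "Draw time": (1445, 2077),
    "Page load time": (768, 3342),
}

def debug_to_opt_ratios(table):
    """Return the Debug/Opt slowdown factor for each measurement."""
    return {name: debug / opt for name, (opt, debug) in table.items()}

ratios = debug_to_opt_ratios(timings)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The total comes out at about 2.8x, in line with the desktop range, while page load is about 4.4x and draw only about 1.4x, which is why the page load number stands out.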
Comment 13 Geoff Brown [:gbrown] 2015-06-30 07:03:38 PDT
I tried running tests on the ix hardware, but results were disappointing: run-times were comparable to c3.xlarge.

'top' indicated the emulator was using about 15% memory and 130% cpu (as usual, 1 core at 100%, light usage on all others).
Comment 14 Geoff Brown [:gbrown] 2015-07-08 12:28:25 PDT
We discussed this in today's mobile testing meeting. We think there is value in running Debug reftests on Android and we think the run-times are not surprising in context (we expect debug to run 2x to 3x slower, Android tests are slow, the emulator is slow and limits our ability to use multi-core cpu effectively).

Let's try running these with as many chunks as necessary.
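
One way to see why more cores helped so little, and why more chunks is the more promising route, is Amdahl's law: if most of the emulator's work is stuck on one core, extra cores barely shorten a single job. The sketch below is illustrative only; the 30% parallel fraction is a hypothetical value, not something measured in this bug.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Hypothetical: assume 30% of the emulator's work can use additional cores.
PARALLEL_FRACTION = 0.3

# Compare a 4-vCPU instance (c3.xlarge) with an 8-vCPU one (c3.2xlarge).
gain = amdahl_speedup(PARALLEL_FRACTION, 8) / amdahl_speedup(PARALLEL_FRACTION, 4)
print(f"expected gain from 4 to 8 cores: {gain:.2f}x")
```

With that assumed fraction, doubling the core count yields only about a 5% improvement per job, whereas splitting the suite into more chunks divides the wall-clock time per job directly.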
Comment 15 Kim Moir [:kmoir] 2015-07-14 07:41:05 PDT
AWS has a new offering where you can run tests on real devices.

Not sure if this would be useful for us; they seem to use different test harnesses to invoke tests.
Comment 16 Geoff Brown [:gbrown] 2015-09-04 15:11:05 PDT
We (mostly kmoir) are still working on getting these scheduled, using as many chunks as necessary -- bug 1183877.

Here is a recent try run showing that all reftests are passing; note that job chunks are spoofed in the second and third push to run the extra chunks:
Comment 17 Geoff Brown [:gbrown] 2015-10-20 07:01:57 PDT
All Android 4.3 Debug reftests, js-reftests, and crashtests are now running (in lots of chunks). Thanks kmoir!
