1140471 - Many Android 4.3 Debug reftests jobs run for too long

Assignee

Description

•

10 years ago

+++ This bug was initially created as a clone of Bug #1140459 +++

In https://treeherder.mozilla.org/#/jobs?repo=try&revision=865a03c2014e, "Android 2.3 Opt" tests are actually running on the Android 4.3 emulator against the 4.0 Debug build. There are some crashes, but most jsreftest, crashtest, and reftest jobs show that tests are running and passing before running out of time:

http://ftp.mozilla.org/pub/mozilla.org/mobile/try-builds/gbrown@mozilla.com-865a03c2014e/try-android-api-9/try_ubuntu64_vm_large_test-jsreftest-1-bm120-tests1-linux64-build51.txt.gz

18:30:07 WARNING - TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=ecma/String/15.5.3.1-1.js | application ran for longer than allowed maximum time
18:30:07 INFO - INFO | automation.py | Application ran for: 1:05:35.454386

I suppose this is not surprising: reftests are slow on android, slower on the emulator, and slower on debug builds. At a glance, it looks like we would need to run jsreftests in at least 14 chunks and plain reftests in at least 35 chunks.

Geoff Brown [:gbrown]

Assignee

Comment 1

•

10 years ago

Bug 1125998 may provide some useful lessons / strategies.

Geoff Brown [:gbrown]

Assignee

Updated

•

10 years ago

Assignee: gbrown → nobody

Geoff Brown [:gbrown]

Assignee

Comment 2

•

10 years ago

This is shaping up similar to our mochitest and xpcshell test experience for Android 4.3 Debug: we need to run faster! https://treeherder.mozilla.org/#/jobs?repo=try&revision=cec8e0ee2fe1&exclusion_profile=false has increased timeouts and shows most reftests running green. The remaining failures are timing related: J2 has a test that times out after 930 seconds (!) and the webgl-color-test failures in R6 are also caused by timeouts. Most jobs run in between 2 and 3 hours. Without increased timeouts, nearly all reftests jobs fail on 4.3 Debug. I think we are already running on c3.xlarge. Can we try running on a faster instance type?

Geoff Brown [:gbrown]

Assignee

Comment 3

•

10 years ago

Kim -- The end of 4.3 Debug greening is in our sights! All 4.3 Debug reftests (including js-reftests and crashtests) run green if we can eliminate both job timeouts and individual test timeouts; that is, if we can get them to run faster. But these jobs are already running on c3.xlarge. Is a faster instance type an option? Let me know how you want to proceed.

Flags: needinfo?(kmoir)

Kim Moir [:kmoir] ET

Comment 4

•

10 years ago

Have you done any testing on an instance type to determine which would be suitable? If not, it would make sense for me to loan you an instance so you can determine whether the tests run successfully before we deploy that type to production. How about a c3.2xlarge or m3.2xlarge?

Model       vCPU  Mem (GiB)  SSD Storage (GB)
c3.xlarge   4     7.5        2 x 40
c3.2xlarge  8     15         2 x 80
m3.xlarge   4     15         2 x 40
m3.2xlarge  8     30         2 x 80

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Assignee

Comment 5

•

10 years ago

I have not tested on any other instance type. A loan sounds good; maybe start with c3.2xlarge. I'm a bit busy for the next couple of days - no rush to activate that.

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Assignee

Updated

•

10 years ago

Assignee: nobody → gbrown

Kim Moir [:kmoir] ET

Updated

•

10 years ago

Depends on: 1172749

Kim Moir [:kmoir] ET

Updated

•

10 years ago

Flags: needinfo?(kmoir)

Geoff Brown [:gbrown]

Assignee

Comment 6

•

10 years ago

I ran a set of reftests (plain reftests in 16 chunks, 2 crashtest chunks, 6 jsreftest chunks) on the loaner from bug 1172749. All timed out after 75 minutes; jobs were between 55% and 75% complete. According to 'top', the emulator was using about 10% of memory and between 300% and 350% cpu; it looks like 1 cpu is usually 100% and the remainder 20-30%.
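As a rough illustration of what these completion figures imply for chunking, here is a minimal Python sketch. It assumes that run time scales linearly with the number of tests in a chunk and that the slowest job (55% complete at the timeout) sets the requirement for every suite; `chunks_needed` is a hypothetical helper, not part of any test harness.

```python
import math


def chunks_needed(current_chunks, fraction_complete):
    """Estimate how many chunks let each job finish within the same time limit.

    Assumption (not from the harness): run time is proportional to the
    number of tests in a chunk, so a job that was only `fraction_complete`
    done at the timeout needs 1 / fraction_complete times as many chunks.
    """
    if not 0 < fraction_complete <= 1:
        raise ValueError("fraction_complete must be in (0, 1]")
    return math.ceil(current_chunks / fraction_complete)


# Worst case reported above: jobs were only 55% complete at the timeout.
print("plain reftests:", chunks_needed(16, 0.55))  # 30
print("jsreftests:", chunks_needed(6, 0.55))       # 11
print("crashtests:", chunks_needed(2, 0.55))       # 4
```

This is only a lower bound under the stated assumptions; uneven test durations and per-job setup time would push the real numbers higher.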

Kim Moir [:kmoir] ET

Comment 7

•

10 years ago

gbrown: Do you think it's worth running tests on an even more powerful instance type in terms of cpu?

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Assignee

Comment 8

•

10 years ago

snorp -- Debug reftests, in the emulator, run at about the same super-slow rate on c3.2xlarge as on c3.xlarge. Got any thoughts on what's holding us back or how to proceed?

Flags: needinfo?(snorp)

Geoff Brown [:gbrown]

Assignee

Comment 9

•

10 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=30ec0a144382&exclusion_profile=false skips the Debug reftest assertion count checks -- no significant difference to 4.3 Debug run-time.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8bb28a35f963&exclusion_profile=false suppresses output in httpd.js (output volume is greater in Debug builds, and we print and buffer all of it before throwing it away, unseen) -- no significant difference to 4.3 Debug run-time.

Flags: needinfo?(gbrown)

Kim Moir [:kmoir] ET

Comment 10

•

10 years ago

gbrown: Re our discussion this morning, do you want to try running the reftests on linux ix hardware, since the AWS hardware is not working out? If so, we can arrange a loaner.

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Assignee

Comment 11

•

10 years ago

Yes, let's try an ix loaner - thanks.

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Assignee

Comment 12

•

10 years ago

(In reply to Kim Moir [:kmoir] from comment #7)
> gbrown: Do you think it's worth running tests on an even more powerful instance type in terms of cpu?

Kim and I talked about this earlier, but for the record, I am pessimistic about other aws instance types because in going from c3.xlarge to c3.2xlarge, we doubled both cpu and memory and hardly improved reftest run-times at all. Also, observation of 'top' on the c3.2xlarge loaner suggested there was lots of unused memory and cpu.

I made some gross harness-level timing measurements and found that 4.3 Debug page load, draw, and test times were all significantly larger than corresponding Opt times. For example, for R1:

                   Opt    Debug
Total real time:   3085   8605
Draw time:         1445   2077
Page load time:    768    3342

I am a little surprised by the debug page load time (time from LoadURI to onDocumentLoad); why would that be so much more on Debug? Optimizations? Assertions?

It may also be worth noting that desktop reftests run 2x to 3x slower on debug vs. opt -- just like we are seeing on Android.
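To make the Debug vs. Opt gap in the R1 figures above easier to compare, here is a small Python sketch that computes the slowdown ratio for each measurement. The numbers are copied from this comment; the `slowdown` helper and the dictionary layout are illustrative assumptions, and the units of the measurements are not stated in the comment.

```python
# R1 timings as (Opt, Debug), copied from the comment above.
timings = {
    "Total real time": (3085, 8605),
    "Draw time": (1445, 2077),
    "Page load time": (768, 3342),
}


def slowdown(opt, debug):
    """Return how many times slower the Debug measurement is than Opt."""
    return debug / opt


for name, (opt, debug) in timings.items():
    print(f"{name}: {slowdown(opt, debug):.2f}x")
```

Under these figures the total slowdown is about 2.8x, in line with the 2x to 3x seen on desktop, while page load alone is roughly 4.4x slower and draw time only about 1.4x.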

Kim Moir [:kmoir] ET

Updated

•

10 years ago

Depends on: 1177532

Geoff Brown [:gbrown]

Assignee

Comment 13

•

10 years ago

I tried running tests on the ix hardware, but results were disappointing: run-times were comparable to c3.xlarge. 'top' indicated the emulator was using about 15% memory and 130% cpu (as usual, 1 core at 100%, light usage on all others).

Geoff Brown [:gbrown]

Assignee

Comment 14

•

10 years ago

We discussed this in today's mobile testing meeting. We think there is value in running Debug reftests on Android and we think the run-times are not surprising in context (we expect debug to run 2x to 3x slower, Android tests are slow, the emulator is slow and limits our ability to use multi-core cpu effectively). Let's try running these with as many chunks as necessary.

Flags: needinfo?(snorp)

Kim Moir [:kmoir] ET

Comment 15

•

10 years ago

AWS has a new offering where you can run tests on real devices: https://aws.amazon.com/blogs/aws/aws-device-farm-test-mobile-apps-on-real-devices/ Not sure if this would be useful for us; they seem to use different test harnesses to invoke tests.

Geoff Brown [:gbrown]

Assignee

Updated

•

10 years ago

Depends on: 1183877

Geoff Brown [:gbrown]

Assignee

Comment 16

•

9 years ago

We (mostly kmoir) are still working on getting these scheduled, using as many chunks as necessary -- bug 1183877. Here is a recent try run showing that all reftests are passing; note that job chunks are spoofed in the second and third push to run the extra chunks:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=57072b9e02ab&exclusion_profile=false
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5487828f5665&exclusion_profile=false
https://treeherder.mozilla.org/#/jobs?repo=try&revision=de34c2ad6447&exclusion_profile=false

Geoff Brown [:gbrown]

Assignee

Updated

•

9 years ago

Comment 17

•

9 years ago

All Android 4.3 Debug reftests, js-reftests, and crashtests are now running (in lots of chunks). Thanks kmoir!

Geoff Brown [:gbrown]

Assignee

Updated

•

9 years ago

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Categories

(Testing :: General, defect)

Tracking

(Not tracked)

People

(Reporter: gbrown, Assigned: gbrown)

Security

(public)
