Bug 967913 (Closed). Opened 10 years ago, closed 10 years ago.

Android 2.3 reftests are too slow

Categories: Testing :: General, defect (x86, Android; priority: not set; severity: normal)
Tracking: not tracked
Status: RESOLVED FIXED
People: Reporter: gbrown; Assigned: gbrown
Attachments: 2 files

The first results for running reftests on Android 2.3 emulators show reftests running very slowly.

https://tbpl.mozilla.org/php/getParsedLog.php?id=34039984&tree=Ash#error0

19:53:29     INFO -  REFTEST TEST-LOAD | http://10.0.2.2:8854/tests/image/test/reftest/pngsuite-ancillary/ch2n3p08.png | 149 / 3451 (4%)
19:53:29     INFO -  REFTEST TEST-LOAD | http://10.0.2.2:8854/tests/image/test/reftest/pngsuite-ancillary/ch2n3p08.html | 149 / 3451 (4%)
19:53:29  WARNING -  TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/tests/image/test/reftest/pngsuite-ancillary/ch2n3p08.png | application ran for longer than allowed maximum time
19:53:29     INFO -  INFO | automation.py | Application ran for: 1:02:20.784677

These are already running in 3 chunks, so 4% of one chunk is only about 1% of the total after 1 hour of running. That suggests we would need nearly 100 hours to run all of the (plain) reftests.
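The estimate above is a linear extrapolation. Here is a minimal sketch of that arithmetic. It assumes the following:

- every test takes about the same time;
- the `done / total` counter in the TEST-LOAD line counts tests within the current chunk;
- all 3 chunks hold the same number of tests.

`projected_hours` is a hypothetical helper, not part of the test harness.

```python
from datetime import timedelta


def projected_hours(done: int, total: int, elapsed: timedelta, chunks: int = 1) -> float:
    """Linearly extrapolate the wall-clock hours needed to run every chunk.

    Assumes every test takes about the same time and that each of the
    `chunks` chunks holds `total` tests, as the `done / total` counter
    in a TEST-LOAD line reports for one chunk.
    """
    if done <= 0 or total < done or chunks <= 0:
        raise ValueError("need 0 < done <= total and a positive chunk count")
    hours_so_far = elapsed.total_seconds() / 3600
    hours_per_chunk = hours_so_far * total / done
    return hours_per_chunk * chunks


# Plain reftests, chunk 1 of 3: 149 / 3451 tests loaded after 1:02:20.784677.
elapsed = timedelta(hours=1, minutes=2, seconds=20.784677)
plain = projected_hours(149, 3451, elapsed, chunks=3)
print(f"about {plain:.0f} hours")  # about 72 hours
```

With the exact counts (149 / 3451 is about 4.3%, not 4%), this projects roughly 72 hours. Rounding to "about 1% of the total" instead gives closer to 100. Either way, the suite needs tens of hours in this configuration.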

jsreftests are similar:

19:47:30     INFO -  REFTEST TEST-END | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=ecma/LexicalConventions/7.5-3-n.js
19:47:30     INFO -  REFTEST TEST-START | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=ecma/LexicalConventions/7.5-4-n.js
19:47:31     INFO -  REFTEST TEST-LOAD | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=ecma/LexicalConventions/7.5-4-n.js | 398 / 6714 (5%)
19:47:31  WARNING -  TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=ecma/LexicalConventions/7.5-4-n.js | application ran for longer than allowed maximum time
19:47:31     INFO -  INFO | automation.py | Application ran for: 1:03:37.009765

Crashtests are slow but could be run in perhaps 4 chunks:

19:46:13     INFO -  REFTEST INFO | Loading a blank page
19:46:13     INFO -  REFTEST TEST-END | http://10.0.2.2:8854/tests/js/xpconnect/crashtests/117307-1.html
19:46:13     INFO -  REFTEST TEST-START | http://10.0.2.2:8854/tests/js/xpconnect/crashtests/193710.html
19:46:13     INFO -  REFTEST TEST-LOAD | http://10.0.2.2:8854/tests/js/xpconnect/crashtests/193710.html | 719 / 2608 (27%)
19:46:13  WARNING -  TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/tests/js/xpconnect/crashtests/193710.html | application ran for longer than allowed maximum time
19:46:13     INFO -  INFO | automation.py | Application ran for: 1:03:40.779075
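The "perhaps 4 chunks" estimate for crashtests follows from the same kind of projection. Here is a minimal sketch. It assumes the following:

- the 719 / 2608 counter covers the whole crashtest run;
- tests take roughly equal time;
- each chunk has a budget of about one hour, which is roughly where the runs above were stopped.

`chunks_needed` is a hypothetical helper.

```python
import math
from datetime import timedelta


def chunks_needed(done: int, total: int, elapsed: timedelta, budget: timedelta) -> int:
    """Smallest number of equal chunks whose projected run time fits in `budget`."""
    if done <= 0 or budget <= timedelta(0):
        raise ValueError("need some progress and a positive budget")
    # Linear extrapolation from the progress so far to the full run.
    projected = elapsed * (total / done)
    # timedelta / timedelta yields a float ratio.
    return math.ceil(projected / budget)


crash_elapsed = timedelta(hours=1, minutes=3, seconds=40.779075)
crash = chunks_needed(719, 2608, crash_elapsed, budget=timedelta(hours=1))
print(crash)  # 4
```

The projected total here is about 3.85 hours, which is why 4 one-hour chunks is a tight fit. A real configuration would want some headroom for setup time and uneven chunks.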


We had a similar issue with Android x86 tests on the emulator, which we resolved by using KVM. I assume our acceleration options are more limited for Android 2.3 on EC2.
Assignee: nobody → dminor
I've spent some time investigating this.

Using the Android profiler (am profile), it seems that we are getting bogged down in the layout code, e.g.:
    ViewRoot.handleMessage, ViewRoot.measureTraversals (27.3%)
    View.measure, View.layout
    ViewGroup.measureChildWithMargins
    FrameLayout.onMeasure
    RelativeLayout.onMeasure, RelativeLayout.measureChildHorizontal, RelativeLayout.measureChild
    FrameLayout.onLayout
    and a bunch of other layout stuff

I then spent some time attempting to get an SPS profile without much success.

When running the emulators on EC2, I had difficulty connecting to the remote debugger with either the GeckoProfiler plugin or plain telnet. Even once connected, I was unable to retrieve a profile.

I tried running the emulator locally. There I was able to connect and retrieve a profile, but Fennec consistently crashed 15 to 20 seconds after the profiler connected.

The profile I got was not consistent with what I saw on EC2 using the Android profiler. That is probably because I profiled for only a short period, or because the emulator runs much better locally than on EC2.

Running the emulator locally is around 8x faster than running it on EC2, so locally it is likely not CPU-bound in the same way that it is on EC2.
My test instance was restarted as a c3.large instead of an m1.medium, and I reran the reftests.

Sadly, the tests were only 50% faster in this configuration (6% of chunk 1 completed before the timeout, instead of 4%), so I don't think different AWS instance types are going to gain us much.
I recompiled the emulator with gprof support (-pg). Unfortunately it crashes shortly after startup and well before I can get a useful reftest profile from it.

I've been asked to work on other things, so I'm unassigning myself.
Assignee: dminor → nobody
Depends on: 980519
Depends on: 985542
Depends on: 985650
Depends on: 992969
Bug 991279 has some observations on Android 2.3 performance that might be of interest here. Note, though, that bug 991279 concerns an xpcshell test, where the httpd server runs on the emulator itself.
On April 20, we began running Android 2.3 tests on Ash using ix slaves. Run times are much improved, as anticipated.

js-reftests run to completion in 6 chunks, each taking 40-45 minutes; this seems about right. 

crashtests run to completion in 5 chunks, each taking 14-22 minutes; we can probably reduce this to 3 chunks.

Initially (April 20), plain reftests ran to completion in 10 chunks, each taking 45-52 minutes. In a new merge today, most chunks take close to 60 minutes, with R5 timing out near the end of the job; I'm hoping that an increase to 12 chunks will make this more reliable.
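Changing the chunk count redistributes the same total work. Here is a rough sketch of the expected per-chunk time after re-chunking. It assumes total run time stays fixed and splits evenly across chunks. That ignores per-chunk setup overhead, which does not shrink, and uneven chunk contents, which is how a single chunk such as R5 can run long. `per_chunk_minutes` is a hypothetical helper.

```python
def per_chunk_minutes(old_minutes: float, old_chunks: int, new_chunks: int) -> float:
    """Expected minutes per chunk after re-chunking, if total work stays constant."""
    if old_chunks <= 0 or new_chunks <= 0:
        raise ValueError("chunk counts must be positive")
    return old_minutes * old_chunks / new_chunks


# Plain reftests: close to 60 minutes per chunk across 10 chunks, moved to 12.
print(per_chunk_minutes(60, 10, 12))  # 50.0
# Crashtests: slowest chunk about 22 minutes across 5 chunks, moved to 3.
print(round(per_chunk_minutes(22, 5, 3), 1))  # 36.7
```

Because the crashtest input is the slowest of the 5 chunks, 36.7 minutes is a pessimistic figure. It still fits comfortably within an hour.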
Assignee: nobody → gbrown
See Comment 6. 

This should reduce the crashtest chunks to 3 and increase the plain reftest chunks to 12.
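Chunking splits a suite's ordered test list into consecutive slices. Here is a minimal sketch of that idea, assuming an even split by test count. The real harness's splitting rules may differ. `chunk_bounds` and `chunk_sizes` are hypothetical helpers, and the 1000-test list is illustrative only.

```python
from typing import List, Tuple


def chunk_bounds(num_tests: int, total_chunks: int, this_chunk: int) -> Tuple[int, int]:
    """Return the half-open [start, end) range of test indices in 1-based chunk `this_chunk`."""
    if total_chunks <= 0 or not 1 <= this_chunk <= total_chunks:
        raise ValueError("this_chunk must be in 1..total_chunks")
    # Integer arithmetic keeps the slices contiguous, non-overlapping and
    # within one test of each other in size.
    start = num_tests * (this_chunk - 1) // total_chunks
    end = num_tests * this_chunk // total_chunks
    return start, end


def chunk_sizes(num_tests: int, total_chunks: int) -> List[int]:
    """Number of tests that land in each chunk."""
    sizes = []
    for chunk in range(1, total_chunks + 1):
        start, end = chunk_bounds(num_tests, total_chunks, chunk)
        sizes.append(end - start)
    return sizes


# Hypothetical 1000-test list split into 12 chunks, as for the plain reftests.
print(chunk_sizes(1000, 12))
```

Equal test counts do not mean equal run times. A chunk that happens to contain slower tests can still approach the time limit even after the suite is split further.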
Attachment #8414842 - Flags: review?(kmoir)
Attachment #8414845 - Flags: review?(kmoir)
Attachment #8414842 - Flags: review?(kmoir) → review+
Attachment #8414845 - Flags: review?(kmoir) → review+
Verified on ash: 
 - crashtests in 3 chunks on ix run fine still: max run time is about 25 minutes
 - plain reftests in 12 chunks on ix now run without any timeouts: max run time is about 55 minutes
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED