Closed Bug 1549776 Opened 4 months ago Closed 4 months ago

Get wrench reftests running on Android in CI

Categories

(Core :: Graphics: WebRender, task, P2)

Tracking

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: kats, Assigned: kats)

References

(Depends on 2 open bugs)

Details

Attachments

(7 files)

In order to get better testing coverage of WebRender on Android I'd like to run the wrench reftests in CI. This is separate from the gecko reftests (which is tracked by bug 1525314). Ideally we want to run both on an emulator and real devices, although the emulator might be easier to start with.

I was looking at the taskcluster transforms and mozharness scripts that are used to run other test suites on Android, and I have some idea of how things are structured, but it's not totally clear to me what the best approach is for adding wrench reftests.

What we need to do is basically take an APK (which is produced as an artifact from a different task), install it on the device/emulator, push the reftests (which are in-tree, and can easily be exported as an artifact as well) to the SD card, and then launch the APK. It will load and run the reftests from the SD card and emit results to logcat.
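The sequence described above can be sketched as an ordered list of adb invocations. This is only an illustration under stated assumptions, not the script that eventually landed: the package name, activity name, and on-device directory are placeholders invented for the example. Only standard adb subcommands (`install`, `push`, `shell am start`, `logcat`) are used.

```python
from typing import List


def wrench_adb_commands(
    apk_path: str,
    reftest_dir: str,
    package: str = "org.example.wrench",  # hypothetical package name
    activity: str = ".MainActivity",      # hypothetical activity name
    device_dir: str = "/sdcard/wrench",   # hypothetical on-device location
) -> List[List[str]]:
    """Return the adb commands, in order, for one reftest run."""
    return [
        # Install the APK, replacing any previously installed copy.
        ["adb", "install", "-r", apk_path],
        # Copy the in-tree reftests onto the SD card.
        ["adb", "push", reftest_dir, device_dir],
        # Clear old log output so only this run's results are captured.
        ["adb", "logcat", "-c"],
        # Launch the app, which loads the reftests from the SD card.
        ["adb", "shell", "am", "start", "-n", f"{package}/{activity}"],
        # Dump logcat; a real script would first wait for the run to finish.
        ["adb", "logcat", "-d"],
    ]
```

Each entry could be handed to `subprocess.run(cmd, check=True)` on a machine with adb on the PATH and a device or emulator attached.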

I'd like to use some of the mozharness machinery (e.g. the code to download and set up the AVD, and control the device via adb). However using the test transform doesn't seem as appropriate to me because it seems to make some assumptions about build platforms and test platforms and such that don't really apply in this case, because all the standalone webrender/wrench stuff happens separate from the normal gecko builds.

Really what I think I want is to just use mozharness but without mozharness-test, and then teach it about the wrench APK, which it would treat mostly the same as the geckoview test APK. But I'm open to suggestions on the best way to go about doing this. gbrown/ahal, thoughts?

Flags: needinfo?(gbrown)
Flags: needinfo?(ahal)

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #0)

Really what I think I want is to just use mozharness but without mozharness-test

I should mention that I started doing this but ran into problems. Specifically: just using mozharness still checks out the gecko tree, which means that this line tries to use ./mach artifact toolchain instead of tooltool.py, and for some reason that fails with this error.

A few quick thoughts:

Flags: needinfo?(gbrown)

(In reply to Geoff Brown [:gbrown] from comment #2)

A few quick thoughts:

I could go with either option, as long as the rest of the steps work. Currently checking out gecko causes the tooltool problem, so maybe not checking it out would be better. But I think checkout: works for run-task type tasks, not mozharness tasks. Once I have using: mozharness the checkout happens unconditionally because it hits the codepath here. I could maybe modify that code to make it conditional, but I wanted to figure out the high-level approach first.

  • most of the mozharness support for android is in AndroidMixin now, which is used by android_emulator_unittest.py, raptor.py, web_platform_tests.py, etc -- you're not necessarily locked in to using android_emulator_unittest, if that helps

Yeah, one thing I was considering was writing a new android_emulator_wrench.py which would use this mixin and just do the things I need it to do. But a chunk of it would be copied from android_emulator_unittest.py so that's why I wanted to try and use that existing mechanism first.

Yeah that seems like it might get messy. So that seems to be an argument in favour of diverging a little bit and making an android_emulator_wrench.py which I can do more customized things with.

  • down the road, I think you are going to run into trouble with an architecture conflict: it looks like the android wrench build builds an arm apk, while the mozharness androidx86_7_0 runs an x86_64 avd

The APK actually contains both ARM and x86 libraries and I've verified locally that I can run it on an x86_64 AVD. So I'm hoping I have this problem solved already.

How do you get the tests on the worker? If you don't use a gecko checkout, you might still rely on the build task to package them for you.

From a higher level, maybe you could make a source-test task that runs the mozharness script in its command. That way you could avoid the mozharness-test kind.

Flags: needinfo?(ahal)

(In reply to Andrew Halberstadt [:ahal] from comment #4)

How do you get the tests on the worker? If you don't use a gecko checkout, you might still rely on the build task to package them for you.

I was initially planning to use the gecko checkout but since having the checkout causes the mozharness/tooltool problem I might go the other route and have the build task generate an artifact. But as long as I can get it to work I don't have any strong arguments in favour of one vs the other.

From a higher level, maybe you could make a source-test task that runs the mozharness script in its command. That way you could avoid the mozharness-test kind.

Hm, interesting. Will take a look at this. From reading the description of the source-test kind it doesn't seem super appropriate but maybe the implementation will be easier to work with for my purposes.

Note there are currently source-test tasks that have a build dependency (the python-mochitest/python-reftest ones). You can also use fetches to download an artifact if you don't want to use a dependency.

The easiest path is to just use mozharness-test and either try to hack around the build dependency or just run the builds even though you aren't using them.

So after many try pushes and much fiddling I finally have a working run on taskcluster: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=3b9fb5ae875309062587aa7c8486b417e193c067

The high-level structure I have is that I'm using run-task as my task configuration transform instead of mozharness or mozharness-test because the latter two assume a bunch of gecko-specific test flags during the task configuration. In my run-task command I set a few env vars and then invoke test-linux.sh directly, which does some needed setup and then invokes my mozharness script. The mozharness script is similar to android_emulator_unittest.py but stripped down muchly so it just has the things I need. In particular it doesn't use the "actions" machinery but just invokes the desired steps (e.g. setup_avds, install_apk) manually.

I still need to hook up the logging a bit better so that the TEST-* output in logcat gets mirrored to the main log.
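One way to do the mirroring is to filter the logcat stream for lines carrying a TEST-* status and re-emit just that part. The sketch below is an assumption about how this could look, not the code from the patches; the function name is made up, and the exact logcat prefix format is not relied upon.

```python
import re
from typing import Iterable, Iterator

# Matches from the first "TEST-" token to the end of the line, so whatever
# prefix logcat adds (timestamp, pid, tag) is dropped.
TEST_STATUS = re.compile(r"TEST-\S.*")


def mirror_test_lines(logcat_lines: Iterable[str]) -> Iterator[str]:
    """Yield only the TEST-* portion of each logcat line that has one."""
    for line in logcat_lines:
        match = TEST_STATUS.search(line)
        if match:
            yield match.group(0)
```

The caller would print or log each yielded string so that it shows up in the main task log.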

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=1c1868fe0b075b4c1168bac0e418c51ed2c75194 is the latest try push. It has all the stuff I want to land, there's just one problem: the jobs are showing as green when they shouldn't be, because there are reftest failures. If you open the reftest analyzer you can see the failures, it's just that whatever component is responsible for parsing job logs and determining pass/fail status and generating the error summary isn't recognizing the UNEXPECTED-FAIL lines in the log. TreeHerder itself seems to be doing it, because if you open the logviewer on the job it highlights the UNEXPECTED-FAIL lines properly.

I'm confused though as to why the output gets parsed fine with the desktop wrench jobs. Those aren't going through structured logging or the mozharness output parser either. They are just running a run-task job completely outside of mozharness and spewing to stdout, and TreeHerder correctly picks up the summary: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=246347549&revision=01feb26a1a4726cba1f54b1f7421e694e3e87ab5

I wonder if the return code from the process triggers parsing. I can experiment with that tomorrow.

Yeah it turned out to be the return code. If I exit the process with a nonzero return code it parses as I would expect.
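A minimal sketch of that idea: scan the collected log for unexpected failures and turn the result into the process exit status. The function name is hypothetical and the check is deliberately crude (a substring match on UNEXPECTED-FAIL).

```python
from typing import Iterable


def exit_code_for(log_lines: Iterable[str]) -> int:
    """Return 1 if any line reports an unexpected failure, otherwise 0."""
    return 1 if any("UNEXPECTED-FAIL" in line for line in log_lines) else 0
```

The script's entry point would then end with something like `sys.exit(exit_code_for(lines))`, so the job goes non-green and the log gets parsed for the error summary.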

That being said, I guess it would be good to actually use structured logging for all the wrench jobs but at least on the desktop jobs it seems nontrivial to use mozharness to do so. So I'd rather just use unstructured to get the test up and running for now, and then look into making the logs structured without necessarily pulling in all of mozharness.

Typo when I first landed this, but nothing relied on it so it didn't matter.

This makes it so that when running reftests, wrench actually terminates
after a panic rather than just hanging. Termination is detectable and so
we can clean up properly instead of waiting until some other layer hits
a timeout.

Depends on D32009

These tests cause panics in debug mode because of the extra GL error
checking. Tests that are disabled are annotated with the failing
GL call.

Depends on D32011

This adds an android_emulator_wrench.py script that uses mozharness to
control the Android emulator, and run the wrench reftests. It has an
associated wrench.py config script which is similar to existing android
config scripts.

The android_emulator_wrench script is structured a little differently
from other android mozharness scripts, mostly for two reasons:

  1. I tried hard to make it locally runnable by developers, using
    ./mach python. This allows developers to more easily reproduce the
    setup that runs in automation, and does so without duplicating a lot
    of code.

  2. I also tried to make the script use fewer of what I consider to be
    "opaque" mozharness features, like the actions list which can run
    hard-to-find preflight and postflight functions. Instead of treating
    mozharness like a framework and filling in some functions for it to
    invoke as part of its grand plan, I treat it more like a library and
    specifically call the functions I want in the order that I want, which
    makes it easier for novice developers to debug problems.
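The "library, not framework" structure from point 2 can be illustrated roughly as follows. This is a sketch, not the landed script: `EmulatorSteps` is a hypothetical stand-in for the helpers a mixin would provide, and of the method names only `setup_avds` and `install_apk` are mentioned in this bug; the others are assumed for the example.

```python
class EmulatorSteps:
    """Hypothetical stand-in that records which steps were called."""

    def __init__(self) -> None:
        self.calls = []

    def setup_avds(self) -> None:
        self.calls.append("setup_avds")

    def start_emulator(self) -> None:  # assumed name
        self.calls.append("start_emulator")

    def install_apk(self) -> None:
        self.calls.append("install_apk")

    def run_tests(self) -> int:  # assumed name; returns an exit status
        self.calls.append("run_tests")
        return 0

    def stop_emulator(self) -> None:  # assumed name
        self.calls.append("stop_emulator")


def main(steps: EmulatorSteps) -> int:
    """Call each step explicitly, in order, instead of registering actions."""
    steps.setup_avds()
    steps.start_emulator()
    try:
        steps.install_apk()
        return steps.run_tests()
    finally:
        # Cleanup runs even if installing or testing raises.
        steps.stop_emulator()
```

Reading `main` top to bottom shows the whole control flow, which is the debugging benefit described above.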

As part of writing this script I extracted a few helper functions and made
some minor changes to existing android/adb mozharness machinery, but these
are all simple refactorings and should introduce no functional change.

Depends on D32013

Also docs for running the same thing locally.

Depends on D32014

https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=a64644d7048c385046dd824f7f550d6dc353df98 is the green try push. I also added a couple of regular android jobs to make sure I didn't break anything with the mozharness modifications.

The priority flag is not set for this bug.
:jbonisteel, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jbonisteel)
Blocks: 1525314
Type: defect → task
Priority: -- → P2
See Also: 1525314
Flags: needinfo?(jbonisteel)
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/743a0b35f1cb
Fix path to artifact. r=gw
https://hg.mozilla.org/integration/autoland/rev/8f4f4cc896ab
Ensure debug wrench aborts on panic. r=gw
https://hg.mozilla.org/integration/autoland/rev/a08b3b243d9d
Disable some reftests on Android. r=gw
https://hg.mozilla.org/integration/autoland/rev/91aef90259cc
Disable some reftests on debug Android. r=gw
https://hg.mozilla.org/integration/autoland/rev/f1bf5f2b37a8
Disable more reftests due to failures on Android. r=gw
https://hg.mozilla.org/integration/autoland/rev/83eafb86df0f
Add a script to run wrench reftests on an Android emulator. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/9ab833800af2
Add taskcluster jobs for running wrench on Android. r=jrmuizel