Closed Bug 874291 Opened 7 years ago Closed 4 years ago

Fix the WebGL tests on Android test slaves

Categories

(Core :: Canvas: WebGL, defect)

ARM
Android
defect
Not set

RESOLVED FIXED

People

(Reporter: bjacob, Unassigned)

Details

(Whiteboard: [leave open] webgl-internal)

Spun off from bug 873725.

See bug 873725 comment 19:

(In reply to Joel Maher (:jmaher) from comment #19)
> We focus so much energy on conserving resources, but we have tests that have
> never passed in 6+ months of running on the pandas, and have been hidden on
> the tegras as well.  If they are hidden by default, I would like somebody
> besides the sheriffs to speak up and tell me they are watching those hidden
> tests and on what branches.  
> 
> If nobody has watched them, then I don't see why turning these off is a big
> deal.  We are wasting resources that could be better used to speed up
> turnaround time, run more reftests, xpcshell tests, robocop tests, or better
> yet debug builds.
> 
> We split webgl tests out of mochitest chunk 1 because of the high failure
> rate and with the hope that we would give 100% of available resources to
> just the webgl test suite.  Even without all the other tests loading before
> it, we still run into problems.  If we need to wait a week or so for person
> 'X' to finish a few bugs and take a look at it fine.  If nobody is going to
> look at this for a few months, then I don't see how it benefits us to run
> thousands of jobs nobody will look at.

Really it's just incredible that we've had so little effective WebGL test coverage on Android (no coverage on Pandas and hidden runs on Tegras).

We must fix this fast, otherwise:
 - in the very short term, there is concrete discussion of disabling WebGL tests on Android, in bug 873725
 - medium-term, having WebGL tests disabled _will_ mean regressions and will make it hard to keep WebGL enabled by default on Android.
Description of the work involved here:

An engineer with WebGL knowledge and Android debugging experience needs to get physical devices (a Panda board and a Tegra board), run mochitest-gl on them, and figure out what is actually going wrong there. This is probably going to be at least 2 weeks of work.
Forward duping bug 872468 here, since there's more info in this bug.

To clarify the current situation:

The WebGL suite passes around 70% of the time on Tegras (Android 2.2), but 0% of the time on the Pandaboards (Android 4.0). Both Tegra and Pandaboard jobs are hidden on TBPL for all trees; append &showall=1&jobname=gl to the TBPL URL to view them.
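A minimal sketch of adding those two parameters to a TBPL URL, in case it saves someone a typo. Only showall=1 and jobname=gl come from the comment above; the helper name and the example URL are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def show_hidden_gl_jobs(tbpl_url):
    """Return tbpl_url with showall=1 and jobname=gl added to its query string."""
    parts = urlsplit(tbpl_url)
    # Keep whatever parameters (e.g. the tree) are already on the URL.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({"showall": "1", "jobname": "gl"})
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical example URL; substitute the tree you are actually watching.
    print(show_hidden_gl_jobs("https://tbpl.mozilla.org/?tree=Mozilla-Inbound"))
```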
Blocks: 663657, 872477
Duplicate of this bug: 872468
Okay, made the 4.0/Panda tests green enough to be visible and give us some coverage in https://hg.mozilla.org/integration/mozilla-inbound/rev/6595cb04ce77, at the cost of permaoranging the hapless and hopeless 2.2/Tegra tests. I tried multiple times to disable enough subtests to make those stable, and only learned that it's not going to be possible.

Some notes for the hypothetical future person who will hypothetically actually fix things up correctly:

If in your future world we still run tests on Tegras, your first step will be to split skipped_tests_android.txt and failing_tests_android.txt in two, because you'll need skipped_tests_android_panda.txt and skipped_tests_android_tegra.txt. (Well, your zeroth step will probably be to get the tests turned back on on Try on Tegras, since they'll probably be turned off before you exist.)
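A rough sketch of what that split could look like: start both per-device lists from the current combined list, then let the Tegra list grow on its own. This assumes the list files hold one test path per line and that "#" starts a comment; both are assumptions about the file format, and the test paths below are made up.

```python
def parse_test_list(text):
    """Parse a skip/fail list: one test path per line (assumed format)."""
    entries = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumed comment syntax
        if line:
            entries.add(line)
    return entries


def skip_list_name(device):
    """File name for a device's skip list, following the naming suggested above."""
    if device not in ("panda", "tegra"):
        raise ValueError("unknown device: %r" % (device,))
    return "skipped_tests_android_%s.txt" % device


def split_skip_list(combined_text, tegra_only=()):
    """Return {file name: sorted test paths} for the two per-device lists."""
    combined = parse_test_list(combined_text)
    return {
        skip_list_name("panda"): sorted(combined),
        skip_list_name("tegra"): sorted(combined | set(tegra_only)),
    }


if __name__ == "__main__":
    # Hypothetical test paths, for illustration only.
    current = "conformance/a.html\n# flaky everywhere\nconformance/b.html\n"
    for name, tests in split_skip_list(current, ["conformance/c.html"]).items():
        print(name, tests)
```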

If you don't do the right thing and completely rewrite the harness so that memory gets freed up between subtests (by opening each test in a separate new window, or whatever it takes), and instead try to disable enough tests on Tegras to stabilize things, be very wary of things that look like random infra failures. It's possible to disable the right set of subtests and get 90 green runs and 10 instances of the "oh, that's just infra" bug 807230. But if you look, they've all hit on the exact same subtest: you've disabled the right things to make one remaining subtest OOM, and that shows up as a timeout in the command activity and a failure to pull the log. Disable that one, and you'll be back to timeouts in other ones. Do that long enough, and you'll start to see the wisdom of rewriting the whole harness instead.
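One way to check whether the "just infra" failures are really one subtest running out of memory is to tally which subtest each failed run was on when it died. A minimal sketch, assuming you have already extracted, for each failed run, the ordered list of subtests that started before the log cut off; that input shape and the 50% threshold are assumptions, not something the harness provides.

```python
from collections import Counter


def last_subtest_counts(failed_runs):
    """Count how often each subtest was the last one started in a failed run."""
    return Counter(run[-1] for run in failed_runs if run)


def suspicious_subtests(failed_runs, min_share=0.5):
    """Subtests that were last-started in at least min_share of the failed runs."""
    counts = last_subtest_counts(failed_runs)
    total = sum(counts.values())
    if total == 0:
        return []
    return [name for name, n in counts.most_common() if n / total >= min_share]


if __name__ == "__main__":
    # Hypothetical data: three "infra" timeouts, two ending on the same subtest.
    runs = [
        ["test_a.html", "test_b.html"],
        ["test_a.html", "test_b.html"],
        ["test_a.html", "test_c.html"],
    ]
    print(suspicious_subtests(runs))
```

If one name dominates the tally, the failures are probably not random infrastructure noise.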
Whiteboard: [leave open]
Blocks: 875633
Depends on: 877048
Whiteboard: [leave open] → [leave open] webgl-internal
In recent months, mochitest-gl has run fairly consistently green on Android 4.0 Opt (Pandas).

We are still not running mochitest-gl on Android 2.2 (Tegras). The Tegras are being retired, at least for functional tests, in 2014. In place of the current Tegra tests, we intend to run our functional tests in emulators running Android 2.3. At this time, mochitest-gl consistently fails on the Android 2.3 emulators -- bug 975487.

On Cedar only, we run mochitest-gl on Android 4.0 Debug (Pandas). Those currently fail -- bug 977679.
Depends on: 984229
No longer depends on: 1035379
Both referenced bugs (bug 975487 and bug 977679) and their dependencies are resolved now. Is there further work to do here?
Other than bug 777574 hitting mochitest-gl so often on 2.3 that I'm thinking of hiding the suite by default, things are fine, I guess.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED