Closed Bug 1531441 Opened 6 years ago Closed 5 years ago

Investigate and fix raptor-unity-webgl-geckoview intermittent failure

Categories

(Testing :: Raptor, defect, P2)

Version 3
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rwood, Assigned: rwood)

References

(Depends on 1 open bug)

Details

Attachments

(2 files)

Benchmark raptor-unity-webgl-geckoview was disabled everywhere (except on try) in Bug 1524495 because it went permafail. Investigate why this benchmark is now failing, fix it, and re-enable it.

This was never a permafail, only an intermittent. It only went permafail with the first attempt to disable it in Bug 1524495.

Trying to reproduce the intermittent locally on my GP2 and here on try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=74ffce415c8196d2c0fd7a201942532c74386466

Summary: Investigate and fix raptor-unity-webgl-geckoview permafail → Investigate and fix raptor-unity-webgl-geckoview intermittent failure

Aha, found the problem - the device is running out of memory while running the benchmark (from logcat):

02-28 12:14:03.040 6165 6181 I Web Content: Invoking error handler due to
02-28 12:14:03.040 6165 6181 I Web Content: uncaught exception: out of memory
02-28 12:14:03.040 6165 6181 I GeckoDump: Invoking error handler due to
02-28 12:14:03.040 6165 6181 I GeckoDump: uncaught exception: out of memory
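
For anyone trying to confirm the same failure in their own logs, here is a minimal sketch of scanning logcat output for this OOM signature. It assumes the threadtime-style layout shown in the excerpt above (date, time, pid, tid, level, tag, message). The names `find_oom_hits`, `OomHit`, and `OOM_MARKER` are hypothetical helpers made up for illustration and are not part of Raptor or mozdevice.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed layout, taken from the logcat excerpt in this bug:
#   MM-DD HH:MM:SS.mmm PID TID LEVEL TAG: MESSAGE
LOGCAT_LINE = re.compile(
    r"^(?P<date>\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<pid>\d+)\s+(?P<tid>\d+)\s+(?P<level>[VDIWEF])\s+"
    r"(?P<tag>.+?):\s(?P<message>.*)$"
)

# The message text seen in this bug; other OOM paths may log differently.
OOM_MARKER = "uncaught exception: out of memory"


class OomHit(NamedTuple):
    """One logcat line that reported the OOM marker (hypothetical type)."""
    timestamp: str
    pid: int
    tag: str


def find_oom_hits(lines: Iterable[str]) -> List[OomHit]:
    """Return one OomHit per logcat line whose message contains OOM_MARKER."""
    hits = []
    for line in lines:
        match = LOGCAT_LINE.match(line.strip())
        if match and OOM_MARKER in match.group("message"):
            timestamp = f"{match['date']} {match['time']}"
            hits.append(OomHit(timestamp, int(match["pid"]), match["tag"]))
    return hits
```

The input could be the lines of a saved logcat file, or the output of something like `adb logcat -d -v threadtime` captured after a failed run. Lines that do not match the assumed layout are skipped rather than treated as errors.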

:bc, :gbrown - any ideas what we can do here? Devices intermittently running out of memory. Guessing not much we can really do about this?

In the meantime, I suggest we re-enable this test but make it tier 3 so that it won't cause grief for the sheriffs.

Flags: needinfo?(gbrown)
Flags: needinfo?(bob)

I looked into this before. The mobile team could possibly make some changes to GC and other memory management, but I'm not sure how that would fly. The other possibility is to stop running this old asm.js-based test and migrate to the more modern wasm-based test, which might be better behaved with regard to memory usage. The mobile team might have an idea whether using the wasm-based test would help.

The g5s are the most memory constrained and have the higher failure rate, but the p2s see it as well, just not as often. Have you looked into the relative failure rate for g5 vs p2? g5 fails much, much more often.
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-02-01&endday=2019-02-28&tree=all&bug=1524495

If we are going to make it tier 3, then it will become broken before too long, and unless someone from the perf/perftest teams sheriffs it, no one will notice. In that case, I don't really see the point in running it at all.

If we can't fix the g5 failures via the ugl update, then I recommend just turning the test off on g5.

I don't believe the failure rate for p2 is sufficiently high to justify downgrading it to tier 3 or turning it off there.

Flags: needinfo?(bob)

(In reply to Bob Clary [:bc:] from comment #3)

Thanks :bc. Ok, let's go ahead with the unity webgl source upgrade (Bug 1506865).

> I don't believe the failure rate for p2 is sufficiently high to justify downgrading it to tier 3 or turning it off there.

I agree, but the sheriffs decided it should be disabled in Bug 1524495. I do think the test should be re-enabled while we do the benchmark source upgrade, so I'd suggest tier 3 until we get the upgrade done, then if that goes well promote it back to tier 2 (?)

Depends on: 1506865
Flags: needinfo?(dave.hunt)
Flags: needinfo?(bob)

I think they just got tired of all of the g5 failures and weren't aware of the distinction with the p2s nor the ability to restrict the disabling to just the g5s. But if everyone is happy with tier 3 for both until ugl can be upgraded, that is fine with me. You should sheriff ugl while you are waiting for the ugl update though. Otherwise you might run into the situation I did when I wasn't watching closely enough and it totally broke and no one noticed.

Flags: needinfo?(bob)
Flags: needinfo?(gbrown)

(In reply to Bob Clary [:bc:] from comment #5)

> I think they just got tired of all of the g5 failures and weren't aware of the distinction with the p2s nor the ability to restrict the disabling to just the g5s. But if everyone is happy with tier 3 for both until ugl can be upgraded, that is fine with me. You should sheriff ugl while you are waiting for the ugl update though. Otherwise you might run into the situation I did when I wasn't watching closely enough and it totally broke and no one noticed.

I'll submit a patch to re-enable the current raptor-unity-webgl-geckoview as tier 3 pending the upgrade; I'll try to keep my eye on it to make sure it's not broken.

Keywords: leave-open
Flags: needinfo?(dave.hunt)

Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dcfebc36f75c
Re-enable Raptor android ugl temporarily as tier 3, pending benchmark upgrade; r=davehunt

Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/17423f4f1520
Revert disabling of raptor-unity-webgl-geckoview; r=bc

raptor-unity-webgl-geckoview is now re-enabled with the existing benchmark source, but as tier 3, due to the intermittent device OOM error. Leaving this bug open for that intermittent, which will hopefully be resolved when we upgrade the unity-webgl benchmark source in Bug 1506865.

Priority: P1 → P2

Can this be closed?

Flags: needinfo?(rwood)

Yes, that's fine, as we have Bug 1506865 filed and in progress to do the benchmark upgrade, and the existing test is tier 3, so there won't be any intermittents flagged.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(rwood)
Resolution: --- → FIXED