Investigate and fix raptor-unity-webgl-geckoview intermittent failure
Categories
(Testing :: Raptor, defect, P2)
Tracking
(Not tracked)
People
(Reporter: rwood, Assigned: rwood)
References
(Depends on 1 open bug)
Details
Attachments
(2 files)
The raptor-unity-webgl-geckoview benchmark was disabled everywhere (except on try) in Bug 1524495 because it had become a permafail. Investigate why this benchmark is now failing, fix it, and re-enable it.
Comment 1•6 years ago (Assignee)
This was never a permafail, only an intermittent. It only became permafail after the first attempt to disable it in Bug 1524495.
I'm trying to reproduce the intermittent locally on my GP2 and on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=74ffce415c8196d2c0fd7a201942532c74386466
Comment 2•6 years ago (Assignee)
Aha, found the problem: the device is running out of memory while running the benchmark (from logcat):
02-28 12:14:03.040 6165 6181 I Web Content: Invoking error handler due to
02-28 12:14:03.040 6165 6181 I Web Content: uncaught exception: out of memory
02-28 12:14:03.040 6165 6181 I GeckoDump: Invoking error handler due to
02-28 12:14:03.040 6165 6181 I GeckoDump: uncaught exception: out of memory
:bc, :gbrown: any ideas what we can do here? The devices are intermittently running out of memory. I'm guessing there isn't much we can really do about this?
In the meantime, I suggest we re-enable this test but make it tier 3 so that it won't cause grief for the sheriffs.
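While trying to reproduce this locally, it can help to check a saved logcat capture for the OOM signature shown above. A minimal sketch, assuming the logcat output has already been saved as text; `find_oom_lines` and `has_oom` are hypothetical helpers, not part of Raptor or any Mozilla tooling.

```python
import re

# Hypothetical helper (not part of Raptor): matches the OOM signature from the
# logcat excerpt above, "uncaught exception: out of memory".
OOM_PATTERN = re.compile(r"uncaught exception: out of memory")


def find_oom_lines(logcat_text: str) -> list:
    """Return every logcat line that contains the OOM signature."""
    return [line for line in logcat_text.splitlines() if OOM_PATTERN.search(line)]


def has_oom(logcat_text: str) -> bool:
    """True if the captured logcat shows at least one OOM error."""
    return bool(find_oom_lines(logcat_text))


if __name__ == "__main__":
    # Sample built from the logcat lines quoted in this comment.
    sample = (
        "02-28 12:14:03.040 6165 6181 I Web Content: Invoking error handler due to\n"
        "02-28 12:14:03.040 6165 6181 I Web Content: uncaught exception: out of memory\n"
        "02-28 12:14:03.040 6165 6181 I GeckoDump: Invoking error handler due to\n"
        "02-28 12:14:03.040 6165 6181 I GeckoDump: uncaught exception: out of memory\n"
    )
    print(len(find_oom_lines(sample)))  # 2
```

The same error is reported once by the Web Content process and once through GeckoDump, so a single OOM event shows up as two matching lines.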
Comment 3•6 years ago
I looked into this before. There are some things the mobile team could possibly do about GC and other memory management, but I'm not sure how that would fly. The other option is to stop running this old asm.js-based test and migrate to the more modern wasm-based test, which might be better behaved with regard to memory usage. The mobile team might know whether using the wasm-based test would help.
The g5s are the most memory-constrained and have the higher failure rate, but the p2s see it as well, just not as often. Have you looked into the relative failure rate for g5 vs p2? The g5 fails much more often.
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-02-01&endday=2019-02-28&tree=all&bug=1524495
If we make it tier 3, it will break before too long, and unless someone from the perf/perftest teams sheriffs it, no one will notice. In that case, I don't really see the point in running it at all.
If the ugl (unity-webgl) update doesn't fix the g5 failures, then I recommend just turning the test off on g5.
I don't believe the failure rate for p2 is sufficiently high to justify downgrading it to tier 3 or turning it off there.
Comment 4•6 years ago (Assignee)
(In reply to Bob Clary [:bc:] from comment #3)
Thanks, :bc. OK, let's go ahead with the unity webgl source upgrade (Bug 1506865).
> I don't believe the failure rate for p2 is sufficiently high to justify downgrading it to tier 3 or turning it off there.
I agree, but the sheriffs decided it should be disabled in Bug 1524495. I do think the test should be re-enabled while we do the benchmark source upgrade, so I'd suggest tier 3 until the upgrade is done, then, if that goes well, promoting it back to tier 2 (?)
Comment 5•6 years ago
I think they just got tired of all of the g5 failures and weren't aware of the distinction from the p2s or of the ability to restrict the disabling to just the g5s. But if everyone is happy with tier 3 for both until ugl can be upgraded, that is fine with me. You should sheriff ugl while you are waiting for the ugl update, though. Otherwise you might run into the situation I did when I wasn't watching closely enough: it broke completely and no one noticed.
Comment 6•6 years ago (Assignee)
(In reply to Bob Clary [:bc:] from comment #5)
> I think they just got tired of all of the g5 failures and weren't aware of the distinction with the p2s nor the ability to restrict the disabling to just the g5s. But if everyone is happy with tier 3 for both until ugl can be upgraded, that is fine with me. You should sheriff ugl while you are waiting for the ugl update though. Otherwise you might run into the situation I did when I wasn't watching closely enough and it totally broke and no one noticed.
I'll submit a patch to re-enable the current raptor-unity-webgl-geckoview as tier 3 pending the upgrade, and I'll try to keep an eye on it to make sure it doesn't break.
Comment 7•6 years ago (Assignee)
Comment 8•6 years ago (Assignee)
Comment 9•6 years ago (Assignee)
Comment 10•6 years ago
Comment 11•6 years ago (Assignee)
Comment 12•6 years ago
Comment 13•6 years ago (Assignee)
raptor-unity-webgl-geckoview is now re-enabled with the existing benchmark source, but as tier 3 because of the intermittent device OOM error. I'm leaving this bug open for that intermittent, which will hopefully be resolved when we upgrade the unity-webgl benchmark source in Bug 1506865.
Comment 14•6 years ago
bugherder
Comment 16•5 years ago (Assignee)
Yes, that's fine: Bug 1506865 is filed and in progress for the benchmark upgrade, and the existing test is tier 3, so no intermittents will be flagged.