Intermittent Android reftest,mochitest TEST-UNEXPECTED-FAIL | Shutdown | application ran for longer than allowed maximum time

Status: RESOLVED WORKSFORME (defect)
Opened: 7 years ago
Last updated: 10 months ago
Reporter: emorley (Unassigned)
Keywords: intermittent-failure
Version: Trunk
Hardware: ARM / Android
Points: ---
Tracking flags: (Not tracked)

Android Tegra 250 mozilla-inbound opt test reftest-2 on 2012-12-10 11:52:29 PST for push 508d1f5c60b0

slave: tegra-196

https://tbpl.mozilla.org/php/getParsedLog.php?id=17791905&tree=Mozilla-Inbound

{
REFTEST FINISHED: Slowest test took 16109ms (http://10.250.48.219:30196/tests/layout/reftests/font-matching/font-stretch-1.html)
REFTEST INFO | Result summary:
REFTEST INFO | Successful: 1428 (1419 pass, 9 load only)
REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 75 (66 known fail, 0 known asserts, 9 random, 0 skipped, 0 slow)
REFTEST INFO | Total canvas count = 18
REFTEST TEST-START | Shutdown

TEST-UNEXPECTED-FAIL | Shutdown | application ran for longer than allowed maximum time
INFO | automation.py | Application ran for: 1:02:35.827856
INFO | automation.py | Reading PID log: /tmp/tmponmgYjpidlog
getting files in '/mnt/sdcard/tests/reftest/profile/minidumps/'
WARNING | automationutils.processLeakLog() | refcount logging is off, so leaks can't be detected!

REFTEST INFO | runreftest.py | Running tests: end.
}
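The failure mode in the log above — the reftest run completes, but the harness kills the application for outliving its wall-clock budget — can be sketched as a simple watchdog. This is a hypothetical illustration, not the real automation.py code; `run_with_max_time` and the one-hour budget are assumptions made for the sketch.

```python
import subprocess
import time

# Hypothetical sketch of a harness-side maximum-run-time check,
# in the spirit of (but not identical to) automation.py.
MAX_RUN_SECONDS = 3600  # roughly the budget implied by the 1:02:35 run above

def run_with_max_time(cmd, max_seconds=MAX_RUN_SECONDS):
    """Run cmd; report TEST-UNEXPECTED-FAIL if it outlives max_seconds."""
    start = time.time()
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        if time.time() - start > max_seconds:
            proc.kill()
            print("TEST-UNEXPECTED-FAIL | Shutdown | "
                  "application ran for longer than allowed maximum time")
            print("INFO | Application ran for: %.1fs" % (time.time() - start))
            return False
        time.sleep(0.2)  # poll the process rather than block on wait()
    return True
```

Note the budget covers the whole run, shutdown included, which is why a run that finishes its tests but hangs on shutdown still trips it.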
(Seemingly back again)
Duplicate of this bug: 822027
The frequency of this failure has picked up dramatically on mozilla-inbound Armv6 R2 over the last couple of days. Unfortunately, I haven't been able to conclusively pin down a specific changeset where it started.
armv6 (where, at least in the runs I saw, the app hadn't actually shut down) may well just be bug 840229 - not only does that save on reboots into a higher resolution, it also saves the pain of having to draw at that higher resolution.
The odd thing is that Armv6 R2 seems to have a roughly bimodal distribution, running for either ~60min or ~80min, with the latter obviously being the problem here.
This also seems to have clearly regressed in the last day and a half.
Yeah, not rebooting or running at the large resolution will help a lot. I still think something has gone wrong in the past week that's causing us to fail much more frequently.
I'm pretty sure that whatever is causing it is in the merge I pushed to m-c earlier today, FWIW. Unfortunately, that push had 100 changesets in it (mostly due to me trying to debug this yesterday already).
(In reply to Ryan VanderMeulen [:RyanVM] from comment #30)
> I'm pretty sure that whatever is causing it is in the merge I pushed to m-c
> earlier today, FWIW. Unfortunately, that push had 100 changesets in it
> (mostly due to me trying to debug this yesterday already).

https://tbpl.mozilla.org/?rev=702d2814efbf FWIW
As an update, retriggers have made it quite clear that bug 716589 is responsible for the spike in Armv6 R2 failures.
I've filed bug 809753 about increasing the timeout.
https://tbpl.mozilla.org/php/getParsedLog.php?id=20084214&tree=Mozilla-Inbound (an armv6 R3 which was running reftestsmall, so that wasn't a magic bullet)
Information from etherpad regarding Android releng crashes

* 80 failures in the last 2 months
* nearly all failures are completed reftest runs that time out waiting for shutdown
* reboot and connections to the device are great
* assume this is related to the shutdown crashes we see in other bugs
https://tbpl.mozilla.org/php/getParsedLog.php?id=21282814&tree=Mozilla-Inbound
Summary: Intermittent Android reftest TEST-UNEXPECTED-FAIL | Shutdown | application ran for longer than allowed maximum time → Intermittent Android reftest,mochitest TEST-UNEXPECTED-FAIL | Shutdown | application ran for longer than allowed maximum time
https://tbpl.mozilla.org/php/getParsedLog.php?id=22128306&tree=Mozilla-Inbound

(Must fix starring for this, I can rearrange the full line search fallback to catch this too)
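The "full line search fallback" mentioned here is about failure-suggestion matching when starring a log. A minimal sketch of the kind of full-line match the starring code needs, assuming a simple regex approach rather than TBPL's actual parser:

```python
import re

# Hypothetical full-line matcher for the failure signature; TBPL's real
# suggestion code is more involved than this.
FAILURE_RE = re.compile(
    r"TEST-UNEXPECTED-FAIL \| (?P<test>[^|]+) \| (?P<message>.+)")

def match_failure_line(line):
    """Return (test, message) for a TEST-UNEXPECTED-FAIL line, else None."""
    m = FAILURE_RE.search(line)
    if not m:
        return None
    return m.group("test").strip(), m.group("message").strip()
```

Matching on the full message rather than just the test name is what lets a generic test name like "Shutdown" still produce a useful bug suggestion.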
So what are these webgl failures telling us? The single webgl test runs in 4-5 minutes, and then... DM sits around trying and failing to pull some bit of log that would come after the todo count, after SimpleTest FINISHED, and after an hour of trying to pull air, we time out the run?
Ah, I guess

INFO | automation.py | Application ran for: 0:03:43.259977
INFO | zombiecheck | Reading PID log: /tmp/tmp1sJqzTpidlog
WARNING | leakcheck | refcount logging is off, so leaks can't be detected!

would be the remaining things it's trying to pull.
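The hang described here — the harness repeatedly pulling the device log, waiting for trailing lines the app never wrote — can be sketched as a polling loop that runs out its own clock. This is an illustrative sketch with a made-up `pull_log` callable, not the real devicemanager API:

```python
import time

def wait_for_lines(pull_log, expected_markers, timeout_seconds=3600,
                   poll_interval=1.0):
    """Poll pull_log() until every marker appears, or give up.

    pull_log: callable returning the current log contents as a string.
    expected_markers: substrings the harness still expects, e.g.
    'Application ran for:' and 'Reading PID log:'.
    """
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        log = pull_log()
        if all(marker in log for marker in expected_markers):
            return True
        time.sleep(poll_interval)
    # The failure in this bug: the app never writes the trailing lines,
    # so the loop runs out the clock and the whole job times out.
    return False
```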
Android Tegra 250 mozilla-central opt test mochitest-gl on 2013-05-12 21:05:43 PDT for push 7130e5134a6e
https://tbpl.mozilla.org/php/getParsedLog.php?id=22889247&tree=Mozilla-Central
Depends on: 882670
Duplicate of this bug: 900728
Closing inactive keywords:intermittent-failure bugs where the TBPLbot has previously commented and the test isn't marked as disabled; filter on orange-cleanup-201401.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME