Closed
Bug 967223
Opened 11 years ago
Closed 11 years ago
Intermittent Gaia unit test "timed out after 1760 seconds of no output" due to a hang while running
Categories
(Firefox OS Graveyard :: Gaia::TestAgent, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: RyanVM, Assigned: jgriffin)
References
Details
Attachments
(1 file)
(We really need a component for Gaia unit tests.) We've seen failures like these before and just retriggered them. It looks like the tests are intermittently hanging mid-test. Note that it's 13:28 by the time the job is finally killed, 20 minutes after the last test output at 13:08:27.
https://tbpl.mozilla.org/php/getParsedLog.php?id=34021800&tree=B2g-Inbound
b2g_ubuntu64_vm b2g-inbound opt test gaia-unit on 2014-02-03 12:57:52 PST for push c2da8d1505fe
slave: tst-linux64-spot-447
13:08:26 INFO - gaia-unit-tests TEST-START | calendar/test/unit/calc_test.js | #daysBetween
13:08:26 INFO - gaia-unit-tests TEST-PASS | calendar/test/unit/calc_test.js | calendar/calc #daysBetween same day
13:08:26 INFO - gaia-unit-tests TEST-PASS | calendar/test/unit/calc_test.js | calendar/calc #daysBetween include time
13:08:27 INFO - gaia-unit-tests TEST-PASS | calendar/test/unit/calc_test.js | calendar/calc #daysBetween exclude time
13:08:27 INFO - gaia-unit-tests TEST-END | calendar/test/unit/calc_test.js | #daysBetween
13:08:27 INFO - gaia-unit-tests TEST-START | calendar/test/unit/calc_test.js | #getWeekEndDate
13:08:27 INFO - gaia-unit-tests TEST-PASS | calendar/test/unit/calc_test.js | calendar/calc #getWeekEndDate when given middle
13:08:27 INFO - gaia-unit-tests TEST-PASS | calendar/test/unit/calc_test.js | calendar/calc #getWeekEndDate when given start
command timed out: 1200 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1827.358479
========= Finished '/tools/buildbot/bin/python scripts/scripts/gaia_unit.py ...' failed (results: 2, elapsed: 30 mins, 27 secs) (at 2014-02-03 13:28:27.475131) =========
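The timeline in the log matches the buildbot limit: the last test output is at 13:08:27 and the step finishes at 13:28:27, exactly 1200 seconds later. Below is a minimal Python sketch of that kind of "kill after N seconds of no output" wrapper, assuming a POSIX host. `run_with_output_timeout` is a hypothetical helper for illustration, not buildbot's actual implementation, and the exit status it returns is whatever `subprocess` reports (a negative signal number on POSIX), which is not necessarily the value buildbot prints.

```python
import subprocess
import sys
import threading


def run_with_output_timeout(cmd, output_timeout):
    """Run cmd, echoing its output; kill it after output_timeout seconds of silence.

    Hypothetical helper. Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    activity = threading.Event()  # set whenever a line (or EOF) is seen
    eof = threading.Event()

    def reader():
        for line in proc.stdout:
            sys.stdout.write(line)
            activity.set()
        eof.set()
        activity.set()  # wake the main loop so it notices EOF promptly

    reader_thread = threading.Thread(target=reader, daemon=True)
    reader_thread.start()

    timed_out = False
    while True:
        if not activity.wait(output_timeout):
            # No output for output_timeout seconds: treat the job as hung.
            print(f"no output for {output_timeout} seconds, killing process")
            proc.kill()  # SIGKILL on POSIX, like "killed by signal 9" above
            timed_out = True
            break
        activity.clear()
        if eof.is_set():
            break

    reader_thread.join()  # the pipe reaches EOF once the child is gone
    proc.stdout.close()
    return proc.wait(), timed_out


if __name__ == "__main__":
    rc, hung = run_with_output_timeout(
        [sys.executable, "-c", "import time; print('started', flush=True); time.sleep(30)"],
        output_timeout=2,
    )
    print(rc, hung)
```

Because only the gap since the last line matters, a job that keeps printing can run far longer than the limit, while a job that hangs mid-test, as in the log above, is killed exactly one timeout after its last output.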
Comment 23•11 years ago
We actually have a component.
I also have a semi-ready patch for bug 892048, and I wonder whether it could fix this as well. Even if the root cause is not the same, it might make the tests more reliable.
Component: Gaia → Gaia::TestAgent
Reporter
Comment 26•11 years ago
(In reply to Julien Wajsberg [:julienw] from comment #23)
> We actually have a component.
>
> I have also a semi-ready patch for bug 892048, and I wonder if that could
> fix this as well. Even if the trigger cause is not the same, maybe it would
> make it more reliable.
No activity in that bug for 3 months? Can we please bump the priority then? You can see how frequently this occurs on TBPL.
Comment 27•11 years ago
Actually, there's just no visible activity ;) There has been no real activity for about 3 weeks, and 3 weeks ago I landed the patch that made it possible to run them at all, so... :)
I definitely want to finish this patch once I'm done with my 1.3+ bugs.
Reporter
Comment 36•11 years ago
Summary: Intermittent Gaia unit test "command timed out: 1200 seconds without output, attempting to kill" due to a hang while running → Intermittent Gaia unit test "timed out after 1760 seconds of no output" due to a hang while running
Comment 37•11 years ago
It's currently expected that we don't send any output until the very last suite (bug 907621).
This is not easy to fix, although I'd like to find a solution.
Assignee
Comment 38•11 years ago
(In reply to Julien Wajsberg [:julienw] from comment #37)
> It's actually currently expected that we don't send any output until the
> very last suite: bug 907621.
>
> And this is not really easy to fix, although I'd like to find something.
We don't run tests this way in TBPL; here, we run tests one-at-a-time, and there is output for each test.
Comment 39•11 years ago
Oh right, forgot this.
Comment 41•11 years ago
Bug 892048 landed on February 17th. Also, I think you made some backend changes recently. Can you say when you made them, so that we can judge whether all of this fixed the issue?
Assignee
Comment 42•11 years ago
Judging from the dates of the occurrences of this bug, it's quite likely that most were related to a change in the AWS node type used to run these tests; that change was reverted in bug 969590.
Comment 51•11 years ago
This happened a lot last night. Do you know whether this is somewhat expected on your side? Do you see similar issues in other jobs?
Also, the failures from comments 48 and 49 look like the run is doing nothing (and in particular not waiting for the test-agent) just after a suite finished.
The failures from comments 46 and 47 stopped at about the same point too.
I suspect we may have an issue in Firefox here :/
Assignee
Comment 55•11 years ago
My guess is that this is a crash that our crash detection isn't catching correctly. I'll take a look at it later.
Assignee
Updated•11 years ago
Assignee: nobody → jgriffin
Assignee
Updated•11 years ago
Status: NEW → ASSIGNED
Assignee
Comment 59•11 years ago
Each time we get a message from the JS harness, we start a 120s timer; if we hit that timer, we assume the test is hung/crashed, perform crash detection, and abort the run. This should put an end to these mozharness timeouts.
Attachment #8387035 - Flags: review?(ahalberstadt)
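The approach in comment 59 can be sketched as a resettable one-shot timer: every harness message cancels the pending timer and arms a new one, so only a silent gap longer than the timeout triggers the abort path. This is a minimal Python sketch under stated assumptions; `HarnessWatchdog`, `message_received()`, `stop()` and the `on_timeout` callback are hypothetical names, not the code that landed in the Gaia pull request, and the actual crash detection is left to the callback.

```python
import threading
import time


class HarnessWatchdog:
    """Hypothetical sketch: call on_timeout once if messages stop arriving.

    Each message from the harness re-arms a one-shot timer. If no new
    message arrives within `timeout` seconds, on_timeout() runs; in the
    scheme from comment 59 that is where crash detection and aborting
    the run would happen.
    """

    def __init__(self, timeout, on_timeout):
        self.timeout = timeout          # comment 59 uses 120 seconds
        self.on_timeout = on_timeout
        self.fired = False
        self._lock = threading.Lock()
        self._timer = None
        self._generation = 0            # identifies the most recently armed timer

    def _fire(self, generation):
        with self._lock:
            # Ignore a stale timer that a newer message (or stop()) superseded.
            if self.fired or generation != self._generation:
                return
            self.fired = True
        self.on_timeout()

    def message_received(self):
        """Call for every message from the JS harness: restart the countdown."""
        with self._lock:
            if self.fired:
                return                  # the run is already being aborted
            if self._timer is not None:
                self._timer.cancel()
            self._generation += 1
            self._timer = threading.Timer(
                self.timeout, self._fire, args=(self._generation,)
            )
            self._timer.daemon = True
            self._timer.start()

    def stop(self):
        """Disarm the watchdog, e.g. when the test run ends normally."""
        with self._lock:
            self._generation += 1
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None


if __name__ == "__main__":
    aborted = []
    dog = HarnessWatchdog(0.2, lambda: aborted.append("hang: run crash detection, abort"))
    for _ in range(3):
        dog.message_received()          # messages arrive faster than the timeout
        time.sleep(0.05)
    time.sleep(0.5)                     # the harness goes silent
    print(aborted)
```

Re-arming on every message means a long run that keeps reporting never trips the timer. Because 120 seconds is well under the 1200-second buildbot limit quoted in the log, the harness gets to detect and report the hang itself before buildbot kills the job.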
Comment 60•11 years ago
Comment on attachment 8387035 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/16968
LGTM. As I mentioned in the pull request, feel free to ignore my comment if it doesn't make sense.
Attachment #8387035 - Flags: review?(ahalberstadt) → review+
Assignee
Comment 63•11 years ago
Landing without a killAndGetStack implementation for now; will handle in a follow-up if needed:
https://github.com/mozilla-b2g/gaia/commit/a737fd38a0e8b3c678fdeb623a6ddeb9d3190817
Assignee
Comment 64•11 years ago
No occurrences in 5 days; I'm optimistically calling this fixed.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 65•11 years ago
So this will catch crashes, report crashes, and restart the tests when this happens?
Assignee
Comment 66•11 years ago
Yes, yes, and no. After a crash, the crash is reported and the test run is aborted. This is consistent with how our other harnesses work; we could look at resuming the tests in a subsequent patch, if it seems that it would be useful.
Comment 67•11 years ago
OK, so if it's a crash and not a "real" timeout, I assume it shows up differently in TBPL.
Thanks!
Assignee
Comment 68•11 years ago
Yep, they'll look very different in TBPL.
Comment 78•10 years ago
Jonathan, do you have any idea how to fix this? It seems to be coming back these days...
Flags: needinfo?(jgriffin)
Reporter
Comment 79•10 years ago
The most recent ones were actually bug 1023001, which is now fixed.
Flags: needinfo?(jgriffin)
Comment 80•10 years ago
Okay.
Updated•9 years ago
Keywords: intermittent-failure