Closed Bug 1381839 Opened 7 years ago Closed 7 years ago

Intermittent Android crashtest [taskcluster:error] Task timeout after 3600 seconds. Force killing container.

Categories

(Firefox for Android Graveyard :: Testing, defect, P1)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1381933

People

(Reporter: aryx, Assigned: gbrown)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:other])

Runs of the Android crashtests suite randomly time out.

It seems this started last Friday ~3pm PDT: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=f5687797261d69daf1a788d15275909be03c4c2f&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=runnable&filter-resultStatus=success&filter-searchStr=android%20crashtest&tochange=922f398557affed695563755c1b92469075aea1b

There is usually nothing in the log between a TEST-END line and the message that the task was killed, e.g.:

[task 2017-07-14T22:40:43.262883Z] 22:40:43     INFO -  REFTEST TEST-PASS | http://10.0.2.2:8854/tests/layout/base/crashtests/373919.xhtml | (LOAD ONLY)
[task 2017-07-14T22:40:43.263310Z] 22:40:43     INFO -  REFTEST TEST-END | http://10.0.2.2:8854/tests/layout/base/crashtests/373919.xhtml

[taskcluster:error] Task timeout after 3600 seconds. Force killing container.
[taskcluster 2017-07-14 23:33:32.864Z] === Task Finished ===
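
To make the silent period in an excerpt like the one above concrete, here is a minimal Python sketch that measures the time between the last timestamped `[task ...]` line and the `=== Task Finished ===` line. The function name `silent_gap` and the two regular expressions are illustrative assumptions based only on the line formats quoted here; they are not part of the test harness or of Taskcluster.

```python
import re
from datetime import datetime

# Assumed line prefixes, inferred from the excerpt quoted in this bug.
TASK_TS = re.compile(r"^\[task (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z\]")
TC_TS = re.compile(r"^\[taskcluster (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)Z\]")


def silent_gap(lines):
    """Return the timedelta between the last [task ...] line and
    the '=== Task Finished ===' line, or None if either is missing."""
    last_task = None
    finished = None
    for line in lines:
        m = TASK_TS.match(line)
        if m:
            # Remember the most recent timestamped harness line.
            last_task = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            continue
        m = TC_TS.match(line)
        if m and "Task Finished" in line:
            finished = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
    if last_task is None or finished is None:
        return None
    return finished - last_task


if __name__ == "__main__":
    excerpt = [
        "[task 2017-07-14T22:40:43.263310Z] 22:40:43     INFO -  REFTEST TEST-END | "
        "http://10.0.2.2:8854/tests/layout/base/crashtests/373919.xhtml",
        "[taskcluster:error] Task timeout after 3600 seconds. Force killing container.",
        "[taskcluster 2017-07-14 23:33:32.864Z] === Task Finished ===",
    ]
    print(silent_gap(excerpt))
```

For the excerpt above, the gap is roughly 53 minutes, which fits a job that stopped producing output early in its 3600-second budget.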

gbrown, can you take a look at this, please?
Flags: needinfo?(gbrown)
Blocks: 1204281
See also bug 1381283: it seems unusual for crashtests to start timing out on 2 different platforms at around the same time, although the logs look quite different.
Assignee: nobody → gbrown
Flags: needinfo?(gbrown)
Priority: -- → P1
See Also: → 1381283
It looks like there are a few tests that hang in a similar way:

Debug crashtest-6:

22:31:02     INFO -  REFTEST NoneREFTEST TEST-START | http://10.0.2.2:8854/tests/layout/generic/crashtests/370174-3.html
22:31:02     INFO -  REFTEST TEST-LOAD | http://10.0.2.2:8854/tests/layout/generic/crashtests/370174-3.html | 187 / 319 (58%)
22:31:02     INFO -  REFTEST TEST-PASS | http://10.0.2.2:8854/tests/layout/generic/crashtests/370174-3.html | (LOAD ONLY)
22:31:02     INFO -  REFTEST TEST-END | http://10.0.2.2:8854/tests/layout/generic/crashtests/370174-3.html

[taskcluster:error] Task timeout after 3600 seconds. Force killing container.
[taskcluster 2017-07-15 23:17:34.140Z] === Task Finished ===

Debug crashtest-5:

22:40:43     INFO -  REFTEST NoneREFTEST TEST-START | http://10.0.2.2:8854/tests/layout/base/crashtests/373919.xhtml
22:40:43     INFO -  REFTEST TEST-LOAD | http://10.0.2.2:8854/tests/layout/base/crashtests/373919.xhtml | 10 / 323 (3%)
22:40:43     INFO -  REFTEST TEST-PASS | http://10.0.2.2:8854/tests/layout/base/crashtests/373919.xhtml | (LOAD ONLY)
22:40:43     INFO -  REFTEST TEST-END | http://10.0.2.2:8854/tests/layout/base/crashtests/373919.xhtml

[taskcluster:error] Task timeout after 3600 seconds. Force killing container.
[taskcluster 2017-07-14 23:33:32.864Z] === Task Finished ===

Debug crashtest-3:

09:31:28     INFO -  REFTEST NoneREFTEST TEST-START | http://10.0.2.2:8854/tests/editor/composer/crashtests/351236-1.html
09:31:28     INFO -  REFTEST TEST-LOAD | http://10.0.2.2:8854/tests/editor/composer/crashtests/351236-1.html | 185 / 322 (57%)
09:31:39     INFO -  REFTEST TEST-PASS | http://10.0.2.2:8854/tests/editor/composer/crashtests/351236-1.html | (LOAD ONLY)
09:31:39     INFO -  REFTEST TEST-END | http://10.0.2.2:8854/tests/editor/composer/crashtests/351236-1.html

[taskcluster:error] Task timeout after 3600 seconds. Force killing container.
[taskcluster 2017-07-18 10:17:11.019Z] === Task Finished ===
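
The three excerpts above share one pattern: the last `TEST-END` line is followed directly by the task timeout. A rough Python sketch for pulling that last test out of a log, assuming the line formats shown in this bug, might look like the following. `last_test_before_timeout` is a hypothetical helper name, not an existing harness or Treeherder function.

```python
import re

# Assumed formats, taken from the log lines quoted in this bug.
TEST_END = re.compile(r"REFTEST TEST-END \| (\S+)")
TIMEOUT_MARKER = "[taskcluster:error] Task timeout"


def last_test_before_timeout(lines):
    """Return the URL of the last test that logged TEST-END before the
    task timeout message, or None if the log contains no timeout
    (or no TEST-END precedes it)."""
    last_url = None
    for line in lines:
        m = TEST_END.search(line)
        if m:
            last_url = m.group(1)
        elif TIMEOUT_MARKER in line:
            return last_url
    return None


if __name__ == "__main__":
    crashtest_6 = [
        "22:31:02     INFO -  REFTEST TEST-PASS | "
        "http://10.0.2.2:8854/tests/layout/generic/crashtests/370174-3.html | (LOAD ONLY)",
        "22:31:02     INFO -  REFTEST TEST-END | "
        "http://10.0.2.2:8854/tests/layout/generic/crashtests/370174-3.html",
        "[taskcluster:error] Task timeout after 3600 seconds. Force killing container.",
    ]
    print(last_test_before_timeout(crashtest_6))
```

Running this over the logs of several failed jobs and counting the returned URLs is one way to check whether the hangs cluster on a few tests. Note that the test that actually hangs may be the one that would have started next, since the harness logs nothing after the final `TEST-END`.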
The range in comment 3 strongly suggests this problem started with https://hg.mozilla.org/integration/mozilla-inbound/rev/c038d1ebf74fa3b58d01937f553b050520b45ffa, bug 1362903.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=cb0ea9caa7e5c3b993090b89ac191ecf2a226d75 appears to confirm that bug 1362903 is responsible.

https://bugzilla.mozilla.org/show_bug.cgi?id=1362903#c20 even notes the issue but assumes it was a pre-existing condition. Job timeouts in general were pre-existing, but we did not have any significant Android crashtest job timeouts before this change.

:freesamael -- Do you understand what is going wrong here? Note that 2 of the 3 problematic tests (comment 2) rely on reload().
Flags: needinfo?(sawang)
Blocks: 1362903
Thanks Samael!
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Whiteboard: [stockwell fixed:other]
Blocks: 1411358
No longer blocks: 1411358
Product: Firefox for Android → Firefox for Android Graveyard