Closed Bug 1547952 Opened 6 years ago Closed 6 years ago

Intermittent partials [taskcluster:error] Task timeout after 600 seconds. Force killing container.

Categories

(Release Engineering :: Release Automation, defect)

defect
Not set
normal

Tracking

(firefox-esr60 unaffected, firefox-esr68 fixed, firefox68 wontfix, firefox69 fixed, firefox70 fixed)

RESOLVED FIXED
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- fixed
firefox68 --- wontfix
firefox69 --- fixed
firefox70 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: aryx)

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file)

Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=243581100&repo=mozilla-central
Full log: https://queue.taskcluster.net/v1/task/OVvxfNY6SsCqgwSSCuBFYQ/runs/0/artifacts/public/logs/live_backing.log


[task 2019-04-30T11:01:33.763Z] 11:01:33 INFO - ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2019-04-30T11:01:33.901Z] 11:01:33 INFO - [Parent 2189, StreamTrans #3] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/modules/libjar/nsJARChannel.cpp, line 371
[task 2019-04-30T11:01:33.902Z] 11:01:33 INFO - [Parent 2189, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /builds/worker/workspace/build/src/modules/libjar/nsJARChannel.cpp, line 994
[task 2019-04-30T11:01:33.910Z] 11:01:33 INFO - --DOMWINDOW == 2 (0x7f441cbb8d40) [pid = 2266] [serial = 1] [outer = (nil)] [url = about:blank]
[task 2019-04-30T11:01:33.910Z] 11:01:33 INFO - --DOMWINDOW == 1 (0x7f441df40400) [pid = 2266] [serial = 3] [outer = (nil)] [url = about:blank]
[task 2019-04-30T11:01:33.912Z] 11:01:33 INFO - --DOMWINDOW == 0 (0x7f441cb62c00) [pid = 2266] [serial = 2] [outer = (nil)] [url = about:blank]
[task 2019-04-30T11:01:33.913Z] 11:01:33 INFO - 1556622093906 Marionette TRACE [11] Received DOM event DOMContentLoaded for about:about
[task 2019-04-30T11:01:33.947Z] 11:01:33 INFO - nsStringStats
[task 2019-04-30T11:01:33.947Z] 11:01:33 INFO - => mAllocCount: 9117
[task 2019-04-30T11:01:33.947Z] 11:01:33 INFO - => mReallocCount: 0
[task 2019-04-30T11:01:33.947Z] 11:01:33 INFO - => mFreeCount: 9117
[task 2019-04-30T11:01:33.947Z] 11:01:33 INFO - => mShareCount: 5969
[task 2019-04-30T11:01:33.947Z] 11:01:33 INFO - => mAdoptCount: 449
[task 2019-04-30T11:01:33.948Z] 11:01:33 INFO - => mAdoptFreeCount: 449
[task 2019-04-30T11:01:33.948Z] 11:01:33 INFO - => Process ID: 2266, Thread ID: 139931217004352
[task 2019-04-30T11:01:34.030Z] 11:01:34 INFO - 1556622094028 Marionette TRACE [11] Received DOM event pageshow for about:about
[task 2019-04-30T11:01:34.054Z] 11:01:34 INFO - 1556622094050 Marionette DEBUG 1 <- [1,11,null,{"value":null}]
[task 2019-04-30T11:01:34.196Z] 11:01:34 INFO - 1556622094191 Marionette DEBUG 1 -> [0,12,"Marionette:SetContext",{"value":"content"}]
[task 2019-04-30T11:01:34.196Z] 11:01:34 INFO - 1556622094194 Marionette DEBUG 1 <- [1,12,null,{"value":null}]
[task 2019-04-30T11:01:34.200Z] 11:01:34 INFO - wait_for_pings running action 'send_ping_request'.
[task 2019-04-30T11:01:34.216Z] 11:01:34 INFO - pingserver pings_handler received 'server-test-ping' ping
[task 2019-04-30T11:01:34.224Z] 11:01:34 INFO - 1556622094219 Marionette DEBUG 1 -> [0,13,"WebDriver:GetChromeWindowHandles",{}]
[task 2019-04-30T11:01:34.224Z] 11:01:34 INFO - 1556622094221 Marionette DEBUG 1 <- [1,13,null,["1"]]
[task 2019-04-30T11:01:34.232Z] 11:01:34 INFO - 1556622094227 Marionette DEBUG 1 -> [0,14,"WebDriver:GetWindowHandles",{}]
[task 2019-04-30T11:01:34.232Z] 11:01:34 INFO - 1556622094229 Marionette DEBUG 1 <- [1,14,null,["11"]]
[task 2019-04-30T11:01:34.777Z] 11:01:34 INFO - TEST-PASS | telemetry/marionette/tests/unit/test_ping_server_received_ping.py TestPingServer.test_ping_server_received_ping | took 20076ms
[task 2019-04-30T11:01:34.777Z] 11:01:34 INFO -
[task 2019-04-30T11:01:34.777Z] 11:01:34 INFO - SUMMARY
[task 2019-04-30T11:01:34.777Z] 11:01:34 INFO - -------
[task 2019-04-30T11:01:34.777Z] 11:01:34 INFO - passed: 4
[task 2019-04-30T11:01:34.778Z] 11:01:34 INFO - failed: 0
[task 2019-04-30T11:01:34.778Z] 11:01:34 INFO - todo: 0
[task 2019-04-30T11:01:34.778Z] 11:01:34 INFO - SUITE-END | took 240s
[task 2019-04-30T11:01:35.207Z] 11:01:35 INFO - Return code: 0
[task 2019-04-30T11:01:35.208Z] 11:01:35 INFO - # TBPL SUCCESS #
[task 2019-04-30T11:01:35.208Z] 11:01:35 INFO - Running post-action listener: package_coverage_data
[task 2019-04-30T11:01:35.244Z] 11:01:35 INFO - Getting output from command: ['/builds/worker/fetches/grcov', '-t', 'lcov', '-p', '/builds/worker/workspace/build/src/', '--ignore-dir', 'gcc*', '--ignore-dir', 'vs2017
', '/builds/worker/fetches/target.code-coverage-gcno.zip', '/tmp/tmpNW_g1d']
[task 2019-04-30T11:01:35.244Z] 11:01:35 INFO - Copy/paste: /builds/worker/fetches/grcov -t lcov -p /builds/worker/workspace/build/src/ --ignore-dir gcc
--ignore-dir vs2017_* /builds/worker/fetches/target.code-coverage-gcno.zip /tmp/tmpNW_g1d
[taskcluster:error] Task timeout after 600 seconds. Force killing container.
[taskcluster 2019-04-30 11:01:47.528Z] === Task Finished ===
[taskcluster 2019-04-30 11:01:47.528Z] Unsuccessful task run with exit code: -1 completed in 601.689 seconds

Recent failures are about partial generation for localized builds getting stuck (also possible that it's not frequent enough to hit en-US), mostly on macOS.

Log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=257150997&repo=mozilla-central&lineNumber=1717
has

2019-07-18 13:34:17,962 - INFO - Checking /tmp/tmp4witvgfm/target.partial-1.mar for MAR_CHANNEL_ID firefox-mozilla-central
2019-07-18 13:34:17,964 - INFO - /tmp/tmp4witvgfm/target.partial-1.mar channel firefox-mozilla-central in {'firefox-mozilla-central'}
2019-07-18 13:34:18,287 - INFO - target.partial-2.mar: s3 cache hits now 1
2019-07-18 13:34:19,351 - INFO - target.partial-2.mar: Completed 2.9 KiB/2.9 KiB (9.0 KiB/s) with 1 file(s) remaining
download: s3://tc-gp-private-1d-us-east-1/releng/mbsdiff-cache/cff8e36a5b619e1747ee96c9926f4f2220b1fd2de57b8002146d6f57c6a1121915189b732a9095257872a68dc753428aecdb3136a7bb202328b8549c096d4a60/ee525ed192d848a9fc6c578464c84a2f7f6733f126e01a13b80f521d0431256349c300c095d6934d6f249eb8b5a30409897ae90a2f8661b90fb9c201d3c41529/manifest.mf.patch.xz to to.work/browser/META-INF/manifest.mf.patch.xz

[taskcluster:error] Task timeout after 600 seconds. Force killing container.

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=257803283&repo=mozilla-central&lineNumber=1984 has

2019-07-23 02:14:14,173 - WARNING - target.partial-2.mar: 
2019-07-23 02:14:14,174 - INFO - Checking /tmp/tmpi8nhfsmr/target.partial-2.mar for MAR_CHANNEL_ID firefox-mozilla-central
2019-07-23 02:14:14,177 - INFO - /tmp/tmpi8nhfsmr/target.partial-2.mar channel firefox-mozilla-central in {'firefox-mozilla-central'}

[taskcluster:error] Task timeout after 600 seconds. Force killing container.

Is this a download issue in general?

Component: Telemetry → Release Automation: Updates
Flags: needinfo?(mhentges)
Priority: P5 → --
Product: Toolkit → Release Engineering
QA Contact: mtabara
Summary: Intermittent [taskcluster:error] Task timeout after 600 seconds. Force killing container. → Intermittent partials [taskcluster:error] Task timeout after 600 seconds. Force killing container.

Looking at those logs, I'm not sure that they're getting stuck. The logs keep having updates (and progress) right up until they time out.
For example, the macOS log shows its last update at 13:34:19,351, but it times out at 13:34:29.243. There are bigger-than-10-second gaps earlier in the log (that didn't involve the task getting stuck), therefore I just think the task needs more time.
Can we increase the timeout limit?

Flags: needinfo?(mhentges)
Pushed by archaeopteryx@coole-files.de: https://hg.mozilla.org/integration/autoland/rev/a310e297040f Increase maximum task run time for partials generation on non-asan platforms to 15 minutes. r=mhentges
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Assignee: nobody → aryx.bugmail
Whiteboard: [checkin-needed-beta][checkin-needed-esr68]
Whiteboard: [checkin-needed-beta][checkin-needed-esr68] → [checkin-needed-esr68]
Component: Release Automation: Updates → Release Automation
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: