Open Bug 1446791 Opened 2 years ago Updated 3 days ago

Intermittent wpt [taskcluster:error] Task aborted - max run time exceeded

Categories

(Testing :: web-platform-tests, defect)

Version 3
defect
Not set

Tracking

(firefox64 fixed, firefox65 fixed)

REOPENED
mozilla64
Tracking Status
firefox64 --- fixed
firefox65 --- fixed

People

(Reporter: aryx, Assigned: marco)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(1 file)

https://taskcluster-artifacts.net/OK46ELoLQqqI0aqzlbtTCA/0/public/logs/live_backing.log

Because there are permafails in the log which get ignored in the decision if the job failed or not but shown as failures in Treeherder and the test hitting the run time limit, this gets frequently misclassified.
Summary: Intermittent wpt Aborting task - max run time exceeded! → Intermittent wpt [taskcluster:error] Task aborted - max run time exceeded
Update:
There have been 39 failures in the last week.
All the failures occur on the debug build type.

Failures per platform:

windows10-64-ccov: 34
osx-10-10: 3
linux64-ccov: 1
linux32: 1

Recent relevant log file
https://treeherder.mozilla.org/logviewer.html#?job_id=197721811&repo=mozilla-central&lineNumber=57986
	
Summary: Intermittent wpt [taskcluster:error] Task aborted - max run time exceeded
Flags: needinfo?(james)
Whiteboard: [stockwell needswork]
Probably need to split the windows debug jobs into more chunks like the Linux ones.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4592866fa0bbffdd89066c8d2b2d84eee1ef3be1
Flags: needinfo?(james)
These builds were frequently timing out.
Comment on attachment 9006969 [details]
Bug 1446791 - Add more windows 10 debug chunks for wpt

Sebastian Hengst [:aryx] (needinfo on intermittent or backout) has approved the revision.
Attachment #9006969 - Flags: review+
Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/autoland/rev/5bb92b2866b2
Add more windows 10 debug chunks for wpt r=aryx
Backout by rgurzau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b13ca0abc854
Backed out changeset 5bb92b2866b2 for failing at embedded-content/the-area-element/area-coords.html
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/55d804ef829a
Add more windows 10 debug chunks for wpt r=aryx
https://hg.mozilla.org/integration/mozilla-inbound/rev/eef095893d11
Backed out changeset 5bb92b2866b2 for failing at embedded-content/the-area-element/area-coords.html
https://hg.mozilla.org/mozilla-central/rev/55d804ef829a
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Assignee: nobody → james
This is still happening.

Recent failure: 
https://treeherder.mozilla.org/logviewer.html#?job_id=200238780&repo=mozilla-central&lineNumber=57888

Couldn't find source info for moz-extension://ea88da40-3a49-4108-8f0a-11c6c390edc9/background.js, removing record
Error: No objdir path for moz-extension://ccebc78d-0d6d-423f-90ed-ec8f0109f05d/background.js.
Couldn't find source info for moz-extension://ccebc78d-0d6d-423f-90ed-ec8f0109f05d/background.js, removing record
Error: No objdir path for blob:http://web-platform.test:8000/1459a386-bb1f-4765-bbd0-5c15a83be284.
Couldn't find source info for blob:http://web-platform.test:8000/1459a386-bb1f-4765-bbd0-5c15a83be284, removing record
21:07:25     INFO - Getting output from command: ['fetches\\grcov', '-t', 'lcov', '-p', 'z:/build/build/src/', '--ignore-dir', 'gcc*', '--ignore-dir', 'vs2017_*', 'fetches\\target.code-coverage-gcno.zip', 'c:\\users\\task_1537380193\\appdata\\local\\temp\\tmp2eknah', '--llvm']
21:07:25     INFO - Copy/paste: fetches\grcov -t lcov -p z:/build/build/src/ --ignore-dir gcc* --ignore-dir vs2017_* fetches\target.code-coverage-gcno.zip c:\users\task_1537380193\appdata\local\temp\tmp2eknah --llvm
21:08:00     INFO - Running post-action listener: _resource_record_post_action
21:08:00     INFO - Running post-action listener: process_java_coverage_data
21:08:00     INFO - [mozharness: 2018-09-19 21:08:00.703000Z] Finished run-tests step (success)
21:08:00     INFO - Running post-run listener: _resource_record_post_run
21:08:01     INFO - Total resource usage - Wall time: 10527s; CPU: 21.0%; Read bytes: 3267963392; Write bytes: 73901486592; Read time: 143; Write time: 2137
21:08:01     INFO - TinderboxPrint: CPU usage<br/>20.7%
21:08:01     INFO - TinderboxPrint: I/O read bytes / time<br/>3,267,963,392 / 143
21:08:01     INFO - TinderboxPrint: I/O write bytes / time<br/>73,901,486,592 / 2,137
21:08:01     INFO - TinderboxPrint: CPU idle<br/>66,813.0 (79.3%)
21:08:01     INFO - TinderboxPrint: CPU system<br/>1,848.0 (2.2%)
21:08:01     INFO - TinderboxPrint: CPU user<br/>15,550.6 (18.4%)
21:08:01     INFO - pull - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
21:08:01     INFO - install - Wall time: 5s; CPU: 14.0%; Read bytes: 0; Write bytes: 73572352; Read time: 0; Write time: 1
21:08:01     INFO - run-tests - Wall time: 10522s; CPU: 21.0%; Read bytes: 3267934720; Write bytes: 73751308800; Read time: 143; Write time: 2135
[taskcluster:error] Aborting task...
[taskcluster 2018-09-19T21:08:04.432Z] SUCCESS: The process with PID 8144 (child process of PID 6256) has been terminated.
[taskcluster 2018-09-19T21:08:04.432Z] SUCCESS: The process with PID 1112 (child process of PID 6256) has been terminated.
[taskcluster 2018-09-19T21:08:04.432Z] SUCCESS: The process with PID 6256 (child process of PID 8044) has been terminated.
[taskcluster 2018-09-19T21:08:04.432Z] 
[taskcluster 2018-09-19T21:08:04.432Z] === Task Finished ===
[taskcluster 2018-09-19T21:08:04.433Z] Task Duration: 2h59m51.0675961s
[taskcluster 2018-09-19T21:08:05.043Z] Uploading artifact public/logs/localconfig.json from file logs\localconfig.json with content encoding "gzip", mime type "application/octet-stream" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:06.039Z] Uploading artifact public/logs/log_critical.log from file logs\log_critical.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:06.441Z] Uploading artifact public/logs/log_error.log from file logs\log_error.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:06.927Z] Uploading artifact public/logs/log_fatal.log from file logs\log_fatal.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:07.344Z] Uploading artifact public/logs/log_info.log from file logs\log_info.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:08.722Z] Uploading artifact public/logs/log_raw.log from file logs\log_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:09.957Z] Uploading artifact public/logs/log_warning.log from file logs\log_warning.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:10.354Z] Uploading artifact public/test_info/code-coverage-grcov.zip from file build\blobber_upload_dir\code-coverage-grcov.zip with content encoding "", mime type "application/x-zip-compressed" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:12.108Z] Uploading artifact public/test_info/code-coverage-jsvm.zip from file build\blobber_upload_dir\code-coverage-jsvm.zip with content encoding "", mime type "application/x-zip-compressed" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:14.092Z] Uploading artifact public/test_info/resource-usage.json from file build\blobber_upload_dir\resource-usage.json with content encoding "gzip", mime type "application/octet-stream" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:14.880Z] Uploading artifact public/test_info/wpt_errorsummary.log from file build\blobber_upload_dir\wpt_errorsummary.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:15.347Z] Uploading artifact public/test_info/wpt_raw.log from file build\blobber_upload_dir\wpt_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:18.697Z] Uploading artifact public/test_info/wptreport.json from file build\blobber_upload_dir\wptreport.json with content encoding "gzip", mime type "application/octet-stream" and expiry 2019-09-19T16:45:24.794Z
[taskcluster 2018-09-19T21:08:20.674Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/XJnUnrF4S2q95poME2K8aQ/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2019-09-19T16:45:24.794Z
[taskcluster:error] Task aborted - max run time exceeded
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Flags: needinfo?(james)
Looks like the ccov builds sometimes run out of time. marco - want to adjust this somehow?
Flags: needinfo?(james) → needinfo?(mcastelluccio)
I see some chunks are really bad and some are acceptable. We have some optimizations in the pipeline that might help here. In the meantime, we could increase the chunking.

Calixte, does this failing chunk contain the tests that we disabled on Mac as they were really slow?
Flags: needinfo?(cdenizet)
According to the logs, the encoding/ tests are executed here (and we disabled them for mac). So maybe we should do the same here and investigate why are they so slow (it's not due to gcda dump since the pid is always the same).
Flags: needinfo?(cdenizet)
In the last 7 days, there have been 33 occurrences on Windows 10 64, debug.
Whiteboard: [stockwell unknown] → [stockwell unknown][stockwell needswork]
This bug has failed 43 times in the last 7 days. Occurs on windows10-64-ccov, linux64-ccov, windows10-64, windows7-32 on debug, pgo and opt build types.

Recent log:
https://treeherder.mozilla.org/logviewer.html#?job_id=206209167&repo=mozilla-central&lineNumber=122385
This has failed 37 times in the last 7 days on Windows,Linux and OSX affecting debug and opt build types.
Log to a recent failure https://treeherder.mozilla.org/logviewer.html#?job_id=207894200&repo=mozilla-central

Do you guys :calixte, :marco have any updates?
Flags: needinfo?(cdenizet)
What should we do here? The failure doesn't seem restricted to ccov (even though it is more frequent there).
Should we increase the max run time? Or increase the number of chunks?
Flags: needinfo?(mcastelluccio)
Flags: needinfo?(jmaher)
Flags: needinfo?(cdenizet)
I am seeing issues on ccov osx as well for wpt-4 timeouts (bug 1500177).  I always lean towards more chunks instead of more runtime.  For the osx case (bug 1500177) we have 1 job that is abnormally long.  So maybe 50% increase in chunks for now and if this doesn't work figure out if there are issues (like the encoding tests that take a LONG time) and if we can fix a few of them to make the runtime more manageable.
Flags: needinfo?(jmaher)
Duplicate of this bug: 1492098
Stealing from James.
Assignee: james → mcastelluccio
Status: REOPENED → ASSIGNED
So, it looks like this was wrongly backed out in https://hg.mozilla.org/mozilla-central/rev/b13ca0abc854.
Pushed by mcastelluccio@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/29421c0b3e6b
Add more windows 10 debug chunks for wpt. r=aryx
Depends on: 1503335
Flags: needinfo?(mcastelluccio)
Pushed by mcastelluccio@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e71e139ea3c6
Add more windows 10 debug chunks for wpt. r=aryx
https://hg.mozilla.org/mozilla-central/rev/e71e139ea3c6
Status: ASSIGNED → RESOLVED
Closed: Last yearLast year
Resolution: --- → FIXED
That's a different test suite, we should open a new bug for it.
Flags: needinfo?(james)
Let's see how frequent it is and then we can decide if we want to increase the chunking for ccov.
New failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=213666090&repo=autoland&lineNumber=43392

05:43:27     INFO - TEST-START | /referrer-policy/same-origin/http-rp/same-origin/http-http/img-tag/keep-origin-redirect/same-origin-insecure.http.html
05:43:27     INFO - Closing window 4294967301
05:43:27     INFO - PID 2508 | ++DOMWINDOW == 2 (0x1141a3400) [pid = 2513] [serial = 2] [outer = 0x1141a2000]
05:43:27     INFO - PID 2508 | [Child 2511, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, false) failed with result 0x80004005: file /builds/worker/workspace/build/src/docshell/shistory/nsSHistory.cpp, line 1292
05:43:27     INFO - PID 2508 | [Child 2511, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, false) failed with result 0x80004005: file /builds/worker/workspace/build/src/docshell/shistory/nsSHistory.cpp, line 1292
05:43:27     INFO - PID 2508 | ++DOCSHELL 0x11edec800 == 5 [pid = 2511] [id = {95ca7abf-3048-314c-9341-7ec90ba865f6}]
05:43:27     INFO - PID 2508 | ++DOMWINDOW == 11 (0x11e6cec00) [pid = 2511] [serial = 12] [outer = 0x0]
05:43:27     INFO - PID 2508 | ++DOMWINDOW == 12 (0x122521c00) [pid = 2511] [serial = 13] [outer = 0x11e6cec00]
05:43:28     INFO - PID 2508 | ++DOMWINDOW == 13 (0x11649d800) [pid = 2511] [serial = 14] [outer = 0x11e6cec00]
05:43:28     INFO - PID 2508 | --DOMWINDOW == 6 (0x11d30dc00) [pid = 2510] [serial = 12] [outer = 0x0] [url = moz-extension://53f69c04-5b48-8741-9a0b-cb92367325c0/_generated_background_page.html]
05:43:28     INFO - PID 2508 | ++DOMWINDOW == 3 (0x11c1b5c00) [pid = 2513] [serial = 3] [outer = 0x1141a2000]
[taskcluster:error] Aborting task...
[taskcluster 2018-11-24T13:43:29.318Z] === Task Finished ===
[taskcluster 2018-11-24T13:43:29.318Z] Task Duration: 1h59m58.165143104s

Marco, do we reopen this or file a new bug?
Status: RESOLVED → REOPENED
Flags: needinfo?(mcastelluccio)
Resolution: FIXED → ---
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
You need to log in before you can comment on or make changes to this bug.