Intermittent [taskcluster:error] Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 5400 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container.
Categories
(Testing :: General, defect, P3)
Tracking
(Not tracked)
People
(Reporter: gbrown, Unassigned)
References
(Depends on 5 open bugs)
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell infra] [see summary at comment 278])
+++ This bug was initially created as a clone of Bug #1204281 +++ There is considerable history in bug 1204281; I have cloned and closed that bug because it was simply too big -- too many comments, too much ancient history. I expect these failures to continue. Task timeouts are the last resort whenever a task is hung or otherwise runs for too long. The immediate goal is to address timeouts which recur in the same task frequently.
![]() |
Reporter | |
Updated•3 years ago
|
![]() |
Reporter | |
Comment 2•3 years ago
|
||
Bug 1339568 is the single biggest cause of these failures.
Comment hidden (Intermittent Failures Robot) |
Comment 4•3 years ago
|
||
Waiting for updates on bug 1339568, which depends on bug 1357466, which had no progress in the last months.
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 7•3 years ago
|
||
Bug 1339568 remains the biggest contributer of failures reported in this bug.
Comment hidden (Intermittent Failures Robot) |
Comment 9•3 years ago
|
||
Update: there are 65 failures for the last 7 days
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 12•3 years ago
|
||
Some tests have been disabled in bug 1339568, reducing mochitest-media-e10s-2 shutdown hangs considerably. I expect to still see some intermittent failures reported here from other causes.
Comment hidden (Intermittent Failures Robot) |
Comment 14•3 years ago
|
||
In the last week there have been 31 total failures. The failures occur on debug and opt build types and on the following platforms: - android-4-3-armv7-api16: 14 - android-api-16-gradle: 5 - Android 4.2 x86: 2 - Linux x64: 2 - linux64-stylo-disabled: 2 - android-4-0-armv7-api16: 1 - Linux: 1 - linux32-stylo-disabled: 1 - linux64-ccov: 1 - linux64-qr: 1 - windows-mingw32-32: 1 Here is a recent log file: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=150857059&lineNumber=9563 And a relevant snippet from it: [task 2017-12-08T21:22:31.523Z] 21:22:31 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/xpcom/build' 9560 [task 2017-12-08T21:22:31.524Z] 21:22:31 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/xpcom/build' 9561 [task 2017-12-08T21:22:31.557Z] 21:22:31 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/xpcom/build' 9562 9563 [taskcluster:error] Task timeout after 7200 seconds. Force killing container. 9564 [taskcluster 2017-12-08 21:22:56.375Z] === Task Finished === 9565 [taskcluster 2017-12-08 21:22:56.383Z] Unsuccessful task run with exit code: -1 completed in 7596.094 seconds :gbrown, any updates on this?
![]() |
Reporter | |
Comment 15•3 years ago
|
||
The latest increase in frequency is due to failing Android builds. I cannot tell if that is an on-going problem, or a temporary glitch but will continue to monitor.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 18•3 years ago
|
||
Most recent failures have been Android tests, for a variety of reasons. Some show intermittent download failures/retries -- hopefully a temporary condition. I don't see enough consistency to warrant further investigation right now.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 23•3 years ago
|
||
Last week there were a variety of tasks failing here. The biggest group is linux32/debug jsreftests -- bug 1430668.
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 25•3 years ago
|
||
Some recent android opt mochitest failures here - bug 1433560 should help. Also some test-verify failures - bug 1431125.
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Updated•3 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 30•3 years ago
|
||
Recent failures seem to be mostly bug 1321605 -- probably nothing more I can do there. There may also be some fallout from android mochitest run-by-manifest -- keeping an eye on that.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 35•3 years ago
|
||
There are 83 failures in the past 7 days. Platforms: most of them on android-4-3-armv7-api16 debug/ opt and linux64-jsdcov opt; and a few occurrences on Linux/Linux x64 debug, linux32-stylo-disabled debug, linux64-jsdcov opt, linux64-qr debug and windows10-64-ccov debug. Recent failure log example: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=165586428&lineNumber=5170877
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 38•3 years ago
|
||
(In reply to OrangeFactor Robot from comment #37) > Platform breakdown: > * linux64-jsdcov: 91 > * android-4-3-armv7-api16: 11 The linux64-jsdcov problem is apparent here -- working on that in bug 1442823.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Updated•3 years ago
|
Comment 42•3 years ago
|
||
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c291e7dfe010 Increase taskcluster max-run-time for Android mochitest-chrome and Android/opt mochitest; r=me,a=test-only
Comment hidden (Intermittent Failures Robot) |
Comment 44•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c291e7dfe010
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 48•3 years ago
|
||
There are 105 failures in the past week. Platforms: - Android 4.2 x86 opt - android-4-0-armv7-api16 opt and debug - android-4-3-armv7-api16 opt and debug - android-5-0-aarch64 opt - Linux opt and debug (many occurrences) - Linux x64 opt, debug and asan (many occurrences) - linux32-stylo-disabled debug - linux64-ccov opt - linux64-jsdcov opt (many occurrences) - linux64-noopt debug - linux64-qr opt and debug - linux64-stylo-disabled opt and debug - windows10-64-ccov debug Recent log failure: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=168753923&lineNumber=1537 Relevant part of the log: [task 2018-03-17T23:01:53.204Z] 23:01:53 INFO - Successfully installed browsermob-proxy-0.6.0 firefox-puppeteer-52.1.0 marionette-driver-2.5.0 marionette-harness-4.3.0 wptserve-1.4.0 [task 2018-03-17T23:01:53.253Z] 23:01:53 INFO - Return code: 0 [task 2018-03-17T23:01:53.255Z] 23:01:53 INFO - Installing None into virtualenv /builds/worker/workspace/build/venv [task 2018-03-17T23:01:53.256Z] 23:01:53 INFO - error resolving pypi.pvt.build.mozilla.org (ignoring): [taskcluster:error] Task timeout after 5400 seconds. Force killing container. [task 2018-03-17T23:01:53.260Z] 23:01:53 INFO - retry: Calling run_command with args: [['/builds/worker/workspace/build/venv/bin/pip', 'install', '--timeout', '120', '-r', '/builds/worker/workspace/build/tests/config/marionette_requirements.txt', '--no-index', '--find-links', 'http://pypi.pub.build.mozilla.org/pub', '--trusted-host', 'pypi.pub.build.mozilla.org']], kwargs: {'error_level': 'warning', 'error_list': [{'substr': 'not found or a compiler error:', 'level': 'warning'}, {'regex': <_sre.SRE_Pattern object at 0x7f7cc33d4490>, 'level': 'error'}, {'regex': <_sre.SRE_Pattern object at 0x7f7cc3490300>, 'level': 'warning'}, {'regex': <_sre.SRE_Pattern object at 0x158aa80>, 'level': 'debug'}, {'substr': 'command not found', 'level': 'error'}, {'regex': <_sre.SRE_Pattern object at 0x7f7cc488fbb8>, 'level': 'warning'}, {'substr': 'Traceback (most recent call last)', 'level': 'error'}, {'substr': 'SyntaxError: ', 'level': 'error'}, {'substr': 'TypeError: ', 'level': 'error'}, {'substr': 'NameError: ', 'level': 'error'}, {'substr': 'ZeroDivisionError: ', 'level': 'error'}, {'regex': <_sre.SRE_Pattern object at 0x7f7cc3493188>, 'level': 'critical'}, {'regex': <_sre.SRE_Pattern object at 0x7f7cc48a7b28>, 'level': 'critical'}], 'cwd': '/builds/worker/workspace/build/tests/config', 'env': {'TASKCLUSTER_PORT_80_TCP_ADDR': '172.17.0.2', 'TASKCLUSTER_INSTANCE_TYPE': 'm3.large', 'TASKCLUSTER_WORKER_TYPE': 'gecko-t-linux-large', 'MOZ_AUTOMATION': '1', 'MOZ_SOURCE_CHANGESET': 'efce78e62b6dac195e7cb4898684da54155d1661', 'MOCHITEST_FLAVOR': 'plain', 'LOGNAME': 'worker', 'USER': 'worker', 'HOME': '/builds/worker', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/builds/worker/bin', 'DISPLAY': ':0', 'LANG': 'en_US.UTF-8', 'TERM': 'xterm', 'SHELL': '/bin/bash', 'MOZ_NODE_PATH': '/usr/local/bin/node', 'RUN_ID': '0', 'HG_STORE_PATH': '/builds/worker/checkouts/hg-store', 'MOZILLA_BUILD_URL': 'https://queue.taskcluster.net/v1/task/cjvNZxdBSKKfmKHtz6MnGQ/artifacts/public/build/target.tar.bz2', 'TASKCLUSTER_PORT': 'tcp://172.17.0.2:80', 'MOZHARNESS_SCRIPT': 'desktop_unittest.py', 'TASKCLUSTER_NAME': '/laughing_khorana/taskcluster', 'GECKO_HEAD_REPOSITORY': 'https://hg.mozilla.org/mozilla-central', 'SCCACHE_DISABLE': '1', 'TASKCLUSTER_PORT_80_TCP_PORT': '80', 'MOZ_SOURCE_REPO': 'https://hg.mozilla.org/mozilla-central', 'GECKO_HEAD_REV': 'efce78e62b6dac195e7cb4898684da54155d1661', 'MOZHARNESS_URL': 'https://queue.taskcluster.net/v1/task/cjvNZxdBSKKfmKHtz6MnGQ/artifacts/public/build/mozharness.zip', 'GECKO_BASE_REPOSITORY': 'https://hg.mozilla.org/mozilla-unified', 'LC_ALL': 'en_US.UTF-8', 'TASKCLUSTER_PORT_80_TCP_PROTO': 'tcp', 'TASK_ID': 'OgTQKS-PS-udVgG1SWL7lA', 'TASKCLUSTER_PORT_80_TCP': 'tcp://172.17.0.2:80', 'OLDPWD': '/builds/worker/workspace', 'HOSTNAME': 'taskcluster-worker', 'SHLVL': '1', 'PWD': '/builds/worker/workspace', 'NEED_PULSEAUDIO': 'true', 'ENABLE_E10S': 'false', 'NEED_WINDOW_MANAGER': 'true', 'MOZHARNESS_CONFIG': 'unittests/linux_unittest.py remove_executables.py', 'TASKCLUSTER_WORKER_GROUP': 'us-west-1', 'TASKCLUSTER_PUBLIC_IP': '54.183.87.60'}}, attempt #1 [task 2018-03-17T23:01:53.261Z] 23:01:53 INFO - Running command: ['/builds/worker/workspace/build/venv/bin/pip', 'install', '--timeout', '120', '-r', '/builds/worker/workspace/build/tests/config/marionette_requirements.txt', '--no-index', '--find-links', 'http://pypi.pub.build.mozilla.org/pub', '--trusted-host', 'pypi.pub.build.mozilla.org'] in /builds/worker/workspace/build/tests/config [task 2018-03-17T23:01:53.261Z] 23:01:53 INFO - Copy/paste: /builds/worker/workspace/build/venv/bin/pip install --timeout 120 -r /builds/worker/workspace/build/tests/config/marionette_requirements.txt --no-index --find-links http://pypi.pub.build.mozilla.org/pub --trusted-host pypi.pub.build.mozilla.org [task 2018-03-17T23:01:53.261Z] 23:01:53 INFO - Using env: {'DISPLAY': ':0',
![]() |
Reporter | |
Comment 49•3 years ago
|
||
Some failures are from 'try -p all -u all' runs which have triggered jsdcov/dev-tools failures -- carry over from bug 1442823.
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 51•3 years ago
|
||
Many of the recent logs on linux and android show pypi failures, similar to bug 1445580.
Comment hidden (Intermittent Failures Robot) |
Comment 53•3 years ago
|
||
Over the last 7 days there are 81 failures on this bug. These happen on Android 4.2 x86, android-4-0-armv7-api16, android-4-3-armv7-api16, android-5-0-aarch64, Linux, Linux x64, linux64-qr, Here is a recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=170655917&lineNumber=12538 [taskcluster:error] Task timeout after 7200 seconds. Force killing container.
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 55•3 years ago
|
||
See comment 0 and bug dependencies. Investigation and mitigation is on-going.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 60•3 years ago
|
||
Recent failures on Android and Linux are frequent and irregular: Not associated with particular tests or test tasks. Some are related to network failures or delays during 'pip install': [task 2018-04-03T18:13:55.354Z] 18:13:55 INFO - Collecting jsonschema==2.5.1 [task 2018-04-03T18:15:55.610Z] 18:15:55 INFO - Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fba9e73e3d0>, 'Connection to pypi.pub.build.mozilla.org timed out. (connect timeout=120.0)')': /pub [task 2018-04-03T18:17:56.643Z] 18:17:56 INFO - Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fba9e73e550>, 'Connection to pypi.pub.build.mozilla.org timed out. (connect timeout=120.0)')': /pub [task 2018-04-03T18:19:58.106Z] 18:19:58 INFO - Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fba9e73e6d0>, 'Connection to pypi.pub.build.mozilla.org timed out. (connect timeout=120.0)')': /pub [task 2018-04-03T18:22:00.205Z] 18:22:00 INFO - Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fba9e73e850>, 'Connection to pypi.pub.build.mozilla.org timed out. (connect timeout=120.0)')': /pub [task 2018-04-03T18:24:04.445Z] 18:24:04 INFO - Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fba9e73e9d0>, 'Connection to pypi.pub.build.mozilla.org timed out. (connect timeout=120.0)')': /pub [task 2018-04-03T18:26:04.821Z] 18:26:04 INFO - Could not find a version that satisfies the requirement jsonschema==2.5.1 (from versions: ) [task 2018-04-03T18:26:04.822Z] 18:26:04 INFO - No matching distribution found for jsonschema==2.5.1
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 70•3 years ago
|
||
Updates: There have been 141 failures in the last 7 days. Summary: Intermittent [taskcluster:error] Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 5400 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container. The most affected build types are debug and opt, but there are also over 10 failures on asan & pgo. Failures per platforms: Linux: 32 Linux x64: 32 android-4-3-armv7-api16: 31 linux64-qr: 19 linux64-ccov: 18 android-4-0-armv7-api16: 5 android-5-0-aarch64: 2 linux32-nightly: 1 Android 4.2 x86: 1
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 77•3 years ago
|
||
Bug 1451432 remains my biggest current concern here.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 83•3 years ago
|
||
Over the last 7 days there are 99 failures present on this bug. These happen on Android 4.2 x86, android-4-3-armv7-api16, android-5-0-aarch64, Linux, Linux x64, linux64-ccov, linux64-qr, Windows 7. Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=176126037&lineNumber=25270 [task 2018-04-28T15:11:00.833Z] 15:11:00 INFO - TEST-START | /referrer-policy/unsafe-url/meta-referrer/same-origin/http-https/iframe-tag/generic.swap-origin-redirect.http.html [task 2018-04-28T15:11:00.844Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 28 (0x7fa601cc9800) [pid = 15232] [serial = 17] [outer = (nil)] [url = http://web-platform.test:8000/referrer-policy/generic/subresource/document.py?redirection=no-redirect&cache_destroyer=1524928253055] [task 2018-04-28T15:11:00.849Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 27 (0x7fa601508000) [pid = 15232] [serial = 14] [outer = (nil)] [url = about:blank] [task 2018-04-28T15:11:00.853Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 26 (0x7fa602492800) [pid = 15232] [serial = 3] [outer = (nil)] [url = about:blank] [task 2018-04-28T15:11:00.855Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 25 (0x7fa609723000) [pid = 15232] [serial = 2] [outer = (nil)] [url = about:blank] [task 2018-04-28T15:11:00.858Z] 15:11:00 INFO - PID 15175 | --DOCSHELL 0x7fa601552800 == 8 [pid = 15232] [id = {62097795-cefa-4626-92b1-e2b01134ebcf}] [task 2018-04-28T15:11:00.862Z] 15:11:00 INFO - PID 15175 | --DOCSHELL 0x7fa5ff39c800 == 7 [pid = 15232] [id = {907aed2f-82ba-484c-a1a7-209937e276ec}] [task 2018-04-28T15:11:00.864Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 24 (0x7fa5ff365000) [pid = 15232] [serial = 12] [outer = (nil)] [url = http://web-platform.test:8000/referrer-policy/generic/subresource/document.py] [taskcluster:error] Task timeout after 7200 seconds. Force killing container. [task 2018-04-28T15:11:00.874Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 23 (0x7fa609720c00) [pid = 15232] [serial = 7] [outer = (nil)] [url = about:blank] [task 2018-04-28T15:11:00.875Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 22 (0x7fa609721000) [pid = 15232] [serial = 6] [outer = (nil)] [url = about:blank] [task 2018-04-28T15:11:00.876Z] 15:11:00 INFO - PID 15175 | --DOMWINDOW == 21 (0x7fa6098bf000) [pid = 15232] [serial = 9] [outer = (nil)] [url = about:blank] [task 2018-04-28T15:11:00.878Z] 15:11:00 INFO - PID 15175 | --DOCSHELL 0x7fa601555000 == 6 [pid = 15232] [id = {f6b879e5-be92-4156-adc9-5489f3b17629}] [taskcluster 2018-04-28 15:11:01.679Z] === Task Finished === [taskcluster 2018-04-28 15:11:01.685Z] Unsuccessful task run with exit code: -1 completed in 7740.784 seconds
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 88•3 years ago
|
||
Update: There have been 62 failures in the last 7 days, 230 total failures in the last 21 days. Depends on: 1243080, 1246165, 1321605, 1404190, 1431125, 1448049, 1451432 Blocks: 1306635 Summary: Intermittent [taskcluster:error] Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 5400 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container. Failure per platform and build type: - Android 4.2 x86 / opt: 2 - android-4-0-armv7-api16 / debug & opt :3 - android-4-3-armv7-api16 / debug & opt: 17 - android-5-0-aarch64 / debug & op: 1 - Linux x64 / pgo, debug & asan: 8 - linux64-ccov / opt: 8 - linux64-qr / opt & debug: 9 - linux64-asan-reporter / opt: 1 - Linux / debug: 13 Recent log with the failure: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=177354933&lineNumber=1772
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 92•3 years ago
|
||
I expect these failures to continue and this bug to remain open indefinitely. Task timeouts are the last resort whenever a task is hung or otherwise runs for too long. My immediate goal is to address timeouts which recur in the same task frequently and identify other related issues in dependent bugs. I expect infrequent failures on Android especially due to bug 1321605. There are some infrequent failures on Android that can be addressed with more chunks - bug 1461393. Recent failures on Android and Linux are frequent and irregular: Not associated with particular tests or test tasks. I'm hoping those might be addressed in bug 1457694. (Recent network failures and delays during 'pip install' appear to have been resolved.)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 103•3 years ago
|
||
Over the last 7 days there are 124 failures present on this bug. These happen on android-4-0-armv7-api16, android-4-2-x86, android-4-3-armv7-api16, linux32, linux64, linux64-ccov, linux64-nightly, linux64-noopt, linux64-qr, windows7-32. Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#?job_id=180438377&repo=mozilla-inbound&lineNumber=1806 [task 2018-05-27T02:38:21.245Z] 02:38:21 INFO - TEST-OK | testSessionOOMRestore | took 335841ms [task 2018-05-27T02:38:21.246Z] 02:38:21 INFO - TEST-START | Shutdown [task 2018-05-27T02:38:21.247Z] 02:38:21 INFO - Passed: 97 [task 2018-05-27T02:38:21.249Z] 02:38:21 INFO - Failed: 0 [task 2018-05-27T02:38:21.250Z] 02:38:21 INFO - Todo: 0 [task 2018-05-27T02:38:21.250Z] 02:38:21 INFO - SimpleTest FINISHED [task 2018-05-27T02:38:42.819Z] 02:38:42 INFO - INFO | automation.py | Application ran for: 0:06:10.045962 [task 2018-05-27T02:38:42.821Z] 02:38:42 INFO - INFO | zombiecheck | Reading PID log: /tmp/tmp_o252qpidlog [task 2018-05-27T02:38:43.458Z] 02:38:43 INFO - /data/tombstones does not exist; tombstone check skipped [task 2018-05-27T02:38:50.701Z] 02:38:50 INFO - INFO | automation.py | Application pid: 5599 [task 2018-05-27T02:39:03.345Z] 02:39:03 INFO - SimpleTest START [task 2018-05-27T02:39:03.347Z] 02:39:03 INFO - TEST-START | testSessionPrivateBrowsing [taskcluster:error] Task timeout after 3600 seconds. Force killing container. [taskcluster 2018-05-27 02:39:11.369Z] === Task Finished === [taskcluster 2018-05-27 02:39:11.372Z] Unsuccessful task run with exit code: -1 completed in 3601.941 seconds
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 105•3 years ago
|
||
linux32-debug/jittest-4 and android-debug/xpcshell-11 are frequent and might be actionable. Otherwise, see comment 92.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 108•3 years ago
|
||
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/5989424834d2 Increase Android/debug xpcshell max-run-time; r=me,a=test-only
![]() |
Reporter | |
Updated•3 years ago
|
Comment 109•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/5989424834d2
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 114•3 years ago
|
||
As an update, there are 76 failures in the last 7 days as it follows: - 36 on linux64 (asan-reporter, ccov and qr) - 29 on android. Mostly on 4-3-armv7-api16 but there are also 3 on 4-0-armv7-api16 and 2 on 4-2-x86 - 11 on linux32
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 123•3 years ago
|
||
In the last 7 days, there are 111 failures on this bug. They occur mostly on linux 32, 64, 64-ccov, 64-qr, all builds type. Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=183726361&repo=mozilla-inbound&lineNumber=24638
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 134•3 years ago
|
||
In the last 7 days there are 127 failures: - 51 failures on linux64 / asan, debug, pgo & lto - 47 failures on android-em-4-3-armv7-api16 / opt & debug - 11 failures on linux 32 / debug, pgo & opt - 7 failures on linux64-ccov / debug - 6 failures on osx-cross / opt & debug - 2 failures on android-4-0-armv7-api16 / debug - 2 failures on linux64-qr / debug - 1 failure on linux64-noopt / debug As per comment 92, these will keep occuring. :gbrown is there no way to decrease their rate on the most common platforms they occur: linux64 and android?
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 137•3 years ago
|
||
(In reply to Andreea Pavel [:apavel] from comment #134) > In the last 7 days there are 127 failures: > > - 51 failures on linux64 / asan, debug, pgo & lto > - 47 failures on android-em-4-3-armv7-api16 / opt & debug > - 11 failures on linux 32 / debug, pgo & opt > - 7 failures on linux64-ccov / debug > - 6 failures on osx-cross / opt & debug > - 2 failures on android-4-0-armv7-api16 / debug > - 2 failures on linux64-qr / debug > - 1 failure on linux64-noopt / debug > > As per comment 92, these will keep occuring. > > :gbrown is there no way to decrease their rate on the most common platforms > they occur: linux64 and android? These are certainly too frequent, but I don't know what to do about it. There are multiple causes, even on one platform. Bug 1470177 eliminated a common cause of android timeouts, but the weekly stats hardly changed. Bug 1457694 was an attempt to address a common scenario I (may) see on linux and android, but that hasn't made any progress and I'm not sure how to make progress on it. :( I will keep monitoring failures and acting on those cases that I can.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 140•3 years ago
|
||
:gbrown thanks for looking into this!
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 150•3 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #137) > Bug 1457694 was an attempt to address a common scenario I (may) see on linux > and android, but that hasn't made any progress and I'm not sure how to make > progress on it. :( Bug 1457694 has been confirmed as a real issue and is progressing, I think!
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 155•3 years ago
|
||
Update: There has been a total of 119 failures in the last 7 days( and 339 failures in the last 21 days). - 2 failures on android-4-0-armv7-api16 debug - 1 failure on android-4-2-x86 opt - 1 failure on android-em-4-2-x86 opt - 19 failures on android-em-4-3-armv7-api16 debug/opt - 17 failures on linux32 debug/opt - 51 failures on linux64 asan/debug/lto/opt/pgo - 10 failures on linux64-ccov debug - 10 failures on linux64-qr debug/opt - 1 failure on linux64-noopt debug - 1 failure on osx-10-10 opt - 3 failures on osx-cross opt - 1 failure on windows10-64 asan - 2 failures on windows7-32 debug There are different type of tests running, but all of them have this error in common: [taskcluster:error] Task timeout after 3600 seconds. Force killing container. [taskcluster 2018-07-18 12:00:30.241Z] Unsuccessful task run with exit code: -1 completed in 3924.674 seconds
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 164•3 years ago
|
||
(In reply to OrangeFactor Robot from comment #163) > 135 failures in 138 pushes (0.978 failures/push) were associated with this > bug yesterday. > > Platform breakdown: > * android-em-4-3-armv7-api16: 105 robocop is perma-failing - https://bugzilla.mozilla.org/show_bug.cgi?id=1451513#c19.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 167•3 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #164) > robocop is perma-failing - > https://bugzilla.mozilla.org/show_bug.cgi?id=1451513#c19. That was corrected with a backout and subsequent re-landing - great!
![]() |
Reporter | |
Comment 168•3 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #150) > Bug 1457694 has been confirmed as a real issue and is progressing, I think! That bug hasn't been marked as resolved, but some changes have landed in related bugs and I am seeing far fewer of those issues. Failure rates in this bug have recently been significantly lower than we've seen in months.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 175•3 years ago
|
||
Thanks :aryx! Yes, it looks like bug 1480494 is the biggest contributor to current failures here.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 188•3 years ago
|
||
Update: There have been 98 failures in the last week. Failures per platform and build type: linux64 / debug, asan, opt, lto: 39 osx-10-10 / opt & debug: 12 android-em-4-3-armv7-api16 / debug & opt: 11 linux64-ccov / debug: 11 linux32 / debug & opt: 7 osx-cross / opt & asan: 7 windows2012-32 / debug & pgo: 2 windows7-32 / opt & pgo: 3 android-em-4-2-x86 / debug: 2 linux64-noopt / debug: 1 linux64-qr / debug: 1 android-4-0-armv7-api16 / opt: 1 android-5-0-aarch64 / opt: 1 Summary: Intermittent [taskcluster:error] Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 5400 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 194•3 years ago
|
||
There have been 118 failures in the last week. The most affected platform / build: linux64-ccov / debug. Recent relevat log file: https://treeherder.mozilla.org/logviewer.html#?job_id=197122988&repo=mozilla-central&lineNumber=180429 Summary: Intermittent [taskcluster:error] Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 5400 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container. Blocks: 1306635 Depends on: 1115253, 1243080, 1321605, 1404190, 1431125, 1457694, 1480348, 1487877, 1487886
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 199•3 years ago
|
||
linux64/ccov devtool and android/debug mochitest failure are being addressed with more test chunks. linux64 mochitest-media failures are reportedly 1480348. There are also some build task timeouts, which seem quite random -- I don't know what to do about those.
Comment hidden (Intermittent Failures Robot) |
Comment 201•3 years ago
|
||
Update: There have been 37 failures in the last week. Failures per platform and build type: - linux64 / debug, opt, asan, pgo, lto: 16 - linux32 / debug & opt: 10 - android-em-4-3-armv7-api16 / debug & opt: 5 - linux64-qr / debug: 2 - linux64-ccov / debug: 1 - osx-10-10-dmd / opt: 1 - osx-cross / opt: 1 - osx-cross-noopt / debug: 1 Recent relevant log file: https://treeherder.mozilla.org/logviewer.html#?job_id=199174505&repo=mozilla-inbound&lineNumber=40467 Summary: Intermittent [taskcluster:error] Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 5400 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container.
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 203•3 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #199) > linux64/ccov devtool and android/debug mochitest failure are being addressed > with more test chunks. > > linux64 mochitest-media failures are reportedly 1480348. > > There are also some build task timeouts, which seem quite random -- I don't > know what to do about those. Many of those (apparent) build task timeouts are in rusttests; I haven't investigated, but note bug 1489277 might be tangentially related.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 208•2 years ago
|
||
There was a recent spike in linux64-ccov/asan build timeouts after 60 minutes; these are "build-linux64-asan-fuzzing-ccov/opt (bocf)" builds, on mozilla-central only. The most recent builds have run close to 50 minutes, so I won't take action until/unless this issue returns.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 211•2 years ago
|
||
Some failures last week happened during robustcheckout - fallout from bug 1490703 possibly - should be resolved now.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 218•2 years ago
|
||
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=212394686&repo=autoland&lineNumber=45090 [task 2018-11-17T15:05:43.073Z] 15:05:43 INFO - TEST-START | /referrer-policy/same-origin/attr-referrer/cross-origin/http-https/script-tag/cross-origin.keep-origin-redirect.http.html [task 2018-11-17T15:05:43.296Z] 15:05:43 INFO - PID 22748 | ++DOCSHELL 0x7fdd2decb800 == 2 [pid = 22854] [id = {f599e7eb-3bd6-4194-91f2-ee9bbeb55f76}] [task 2018-11-17T15:05:43.298Z] 15:05:43 INFO - PID 22748 | ++DOMWINDOW == 5 (0x7fdd2ca0e800) [pid = 22854] [serial = 5] [outer = (nil)] [task 2018-11-17T15:05:43.397Z] 15:05:43 INFO - PID 22748 | ++DOMWINDOW == 6 (0x7fdd2dd62400) [pid = 22854] [serial = 6] [outer = 0x7fdd2ca0e800] [task 2018-11-17T15:05:43.679Z] 15:05:43 INFO - PID 22748 | ++DOMWINDOW == 7 (0x7fdd2dd63400) [pid = 22854] [serial = 7] [outer = 0x7fdd2ca0e800] [task 2018-11-17T15:05:44.143Z] 15:05:44 INFO - PID 22748 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties [taskcluster:error] Task timeout after 7200 seconds. Force killing container. [taskcluster 2018-11-17 15:05:45.094Z] === Task Finished === [taskcluster 2018-11-17 15:05:45.096Z] Unsuccessful task run with exit code: -1 completed in 7692.491 seconds Details in Comment 92.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 221•2 years ago
|
||
Tests are in good shape -- almost no test failures here recently. However, there are a bunch of build failures here. I don't see clear correlations to specific build types, but...needs more investigation.
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 223•2 years ago
|
||
Recent Android 7.0 failures are on machine-16 -- a ticket has been opened on packet.net to have this machine specific performance problem addressed.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 227•2 years ago
|
||
This bug failed 45 times in the last 7 days. These occur on linux32, linux64 on opt, asan and pgo. For the following platforms there's only 1 failures for each: linux64-ccov, linux64-qr, android-em-4-3-armv7-api16. Recent log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=217017974&repo=mozilla-inbound&lineNumber=87534 15:52:26 INFO - ============================= test session starts ============================== [task 2018-12-14T15:52:26.563Z] 15:52:26 INFO - platform linux2 -- Python 2.7.9, pytest-3.6.2, py-1.5.4, pluggy-0.6.0 -- /builds/worker/workspace/build/src/obj-firefox/_virtualenvs/obj-firefox-8yIyzR8r-2.7/bin/python [task 2018-12-14T15:52:26.563Z] 15:52:26 INFO - rootdir: /builds/worker/workspace/build/src, inifile: /builds/worker/workspace/build/src/config/mozunit/mozunit/pytest.ini [task 2018-12-14T15:52:26.563Z] 15:52:26 INFO - collecting ... collected 1 item [task 2018-12-14T15:52:26.563Z] 15:52:26 INFO - ../config/tests/unit-mozunit.py::TestMozUnit::test_mocked_open PASSED [task 2018-12-14T15:52:26.564Z] 15:52:26 INFO - =========================== 1 passed in 0.01 seconds =========================== [task 2018-12-14T15:52:26.564Z] 15:52:26 INFO - /builds/worker/workspace/build/src/config/tests/test_mozbuild_reading.py [task 2018-12-14T15:52:26.564Z] 15:52:26 INFO - ============================= test session starts ============================== [task 2018-12-14T15:52:26.564Z] 15:52:26 INFO - platform linux2 -- Python 2.7.9, pytest-3.6.2, py-1.5.4, pluggy-0.6.0 -- /builds/worker/workspace/build/src/obj-firefox/_virtualenvs/obj-firefox-8yIyzR8r-2.7/bin/python [task 2018-12-14T15:52:26.565Z] 15:52:26 INFO - rootdir: /builds/worker/workspace/build/src, inifile: /builds/worker/workspace/build/src/config/mozunit/mozunit/pytest.ini [task 2018-12-14T15:52:26.565Z] 15:52:26 INFO - collecting ... collected 3 items [task 2018-12-14T15:52:26.565Z] 15:52:26 INFO - ../config/tests/test_mozbuild_reading.py::TestMozbuildReading::test_filesystem_traversal_no_config PASSED [task 2018-12-14T15:52:26.565Z] 15:52:26 INFO - ../config/tests/test_mozbuild_reading.py::TestMozbuildReading::test_filesystem_traversal_reading <- ../../../../../usr/lib/python2.7/unittest/case.py SKIPPED [task 2018-12-14T15:52:26.566Z] 15:52:26 INFO - ../config/tests/test_mozbuild_reading.py::TestMozbuildReading::test_orphan_file_patterns PASSED [task 2018-12-14T15:52:26.566Z] 15:52:26 INFO - ===================== 2 passed, 1 skipped in 79.39 seconds ===================== [taskcluster:error] Task timeout after 7200 seconds. Force killing container. [taskcluster 2018-12-14 15:53:03.117Z] === Task Finished === [taskcluster 2018-12-14 15:53:03.118Z] Unsuccessful task run with exit code: -1 completed in 7252.08 seconds
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 233•2 years ago
|
||
There are 29 total failures in the last 7 days and 162 total failures in the last 30 days (some of them might be misclassifications), majority on linux. Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=219384257&repo=autoland&lineNumber=194352 [task 2018-12-31T13:12:56.848Z] 13:12:56 INFO - TEST-START | dom/base/test/test_bug715041.xul [task 2018-12-31T13:13:08.337Z] 13:13:08 INFO - GECKO(5728) | JS CALLED OBSERVE FUNCTION!!! [task 2018-12-31T13:13:08.338Z] 13:13:08 INFO - GECKO(5728) | msg 1 Count: 1 [task 2018-12-31T13:13:09.378Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 28 (0xdb22d000) [pid = 5728] [serial = 132] [outer = (nil)] [url = about:blank] [task 2018-12-31T13:13:09.382Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 27 (0xdb238800) [pid = 5728] [serial = 129] [outer = (nil)] [url = chrome://mochikit/content/tests/SimpleTest/iframe-between-tests.html] [task 2018-12-31T13:13:09.385Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 26 (0xdb22b800) [pid = 5728] [serial = 124] [outer = (nil)] [url = chrome://mochitests/content/chrome/dom/base/test/test_blocking_image.html] [task 2018-12-31T13:13:09.387Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 25 (0xdb231000) [pid = 5728] [serial = 127] [outer = (nil)] [url = chrome://mochikit/content/tests/SimpleTest/iframe-between-tests.html] [task 2018-12-31T13:13:09.389Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 24 (0xdb223000) [pid = 5728] [serial = 123] [outer = (nil)] [url = chrome://mochikit/content/tests/SimpleTest/iframe-between-tests.html] [task 2018-12-31T13:13:09.391Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 23 (0xdc944c00) [pid = 5728] [serial = 110] [outer = (nil)] [url = chrome://mochitests/content/chrome/dom/base/test/test_blockParsing.html] [task 2018-12-31T13:13:09.396Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 22 (0xdb237c00) [pid = 5728] [serial = 128] [outer = (nil)] [url = chrome://mochitests/content/chrome/dom/base/test/test_bug1008126.html] [task 2018-12-31T13:13:09.398Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 21 (0xdb225000) [pid = 5728] [serial = 130] [outer = (nil)] [url = chrome://mochitests/content/chrome/dom/base/test/test_bug1016960.html] [task 2018-12-31T13:13:09.400Z] 13:13:09 INFO - GECKO(5728) | --DOMWINDOW == 20 (0xdb2b9000) [pid = 5728] [serial = 133] [outer = (nil)] [url = chrome://mochikit/content/tests/SimpleTest/iframe-between-tests.html] [task 2018-12-31T13:13:10.312Z] 13:13:10 INFO - GECKO(5728) | msg 3 Count: 1 [task 2018-12-31T13:13:13.318Z] 13:13:13 INFO - GECKO(5728) | msg 4 Count: 1 [task 2018-12-31T13:13:13.319Z] 13:13:13 INFO - GECKO(5728) | msg 2 Count: 1 [task 2018-12-31T13:13:13.322Z] 13:13:13 INFO - GECKO(5728) | TESTING CASE AddShiftLocalCleanUp [task 2018-12-31T13:13:13.325Z] 13:13:13 INFO - GECKO(5728) | ============== [task 2018-12-31T13:13:13.327Z] 13:13:13 INFO - GECKO(5728) | JS REMOVE IDLE OBSERVER () time to be removed: 1 [task 2018-12-31T13:13:13.332Z] 13:13:13 INFO - GECKO(5728) | JS removeIdleObserver() observer time: 0 [task 2018-12-31T13:13:13.334Z] 13:13:13 INFO - GECKO(5728) | JS removeIdleObserver() observer time: 1 [task 2018-12-31T13:13:13.339Z] 13:13:13 INFO - GECKO(5728) | MOCK IDLE SERVICE REMOVING idle observer with time 1 [task 2018-12-31T13:13:13.341Z] 13:13:13 INFO - GECKO(5728) | MOCK IDLE SERVICE numIdleObserversRemoved: 1 numIdleObserversAdded: 0 [task 2018-12-31T13:13:13.342Z] 13:13:13 INFO - GECKO(5728) | JS FAKE IDLE SERVICE end of remove idle observer [task 2018-12-31T13:13:13.350Z] 13:13:13 INFO - GECKO(5728) | JS NUM OBSERVERS: 1 [task 2018-12-31T13:13:13.351Z] 13:13:13 INFO - GECKO(5728) | AddShiftLocalCleanUp() done clean up [task 2018-12-31T13:13:13.353Z] 13:13:13 INFO - GECKO(5728) | TESTING CASE AddNewLocalWhileAllIdle [task 2018-12-31T13:13:13.354Z] 13:13:13 INFO - GECKO(5728) | ============== [task 2018-12-31T13:13:13.356Z] 13:13:13 INFO - GECKO(5728) | JS FAKE IDLE SERVICE add idle observer before [task 2018-12-31T13:13:13.358Z] 13:13:13 INFO - GECKO(5728) | JS NUM OBSERVERS: 1 [task 2018-12-31T13:13:13.360Z] 13:13:13 INFO - GECKO(5728) | window is: [object Window] [task 2018-12-31T13:13:13.361Z] 13:13:13 INFO - GECKO(5728) | MOCK IDLE SERVICE ADDING idle observer with time: 1 [task 2018-12-31T13:13:13.363Z] 13:13:13 INFO - GECKO(5728) | MOCK IDLE SERVICE: num idle observers added: 1 [task 2018-12-31T13:13:13.364Z] 13:13:13 INFO - GECKO(5728) | JS FAKE IDLE SERVICE end of add idle observer [task 2018-12-31T13:13:13.366Z] 13:13:13 INFO - GECKO(5728) | JS NUM OBSERVERS: 2 [task 2018-12-31T13:13:13.368Z] 13:13:13 INFO - GECKO(5728) | JS FAKE IDLE SERVICE [task 2018-12-31T13:13:13.369Z] 13:13:13 INFO - GECKO(5728) | JS NUM OBSERVERS: 2 [task 2018-12-31T13:13:13.373Z] 13:13:13 INFO - GECKO(5728) | JS CALLED OBSERVE FUNCTION!!! [task 2018-12-31T13:13:13.375Z] 13:13:13 INFO - GECKO(5728) | msg 1 Count: 1 [task 2018-12-31T13:13:14.322Z] 13:13:14 INFO - GECKO(5728) | msg 2 Count: 1 [task 2018-12-31T13:13:15.321Z] 13:13:15 INFO - GECKO(5728) | msg 2 Count: 2 [task 2018-12-31T13:13:16.821Z] 13:13:16 INFO - GECKO(5728) | msg 5 Count: 1 [task 2018-12-31T13:13:16.825Z] 13:13:16 INFO - GECKO(5728) | function performNextTest() [task 2018-12-31T13:13:16.828Z] 13:13:16 INFO - GECKO(5728) | currTestCaseNum: 4 [task 2018-12-31T13:13:16.831Z] 13:13:16 INFO - GECKO(5728) | cleanUp: false [task 2018-12-31T13:13:16.834Z] 13:13:16 INFO - GECKO(5728) | passed: true [task 2018-12-31T13:13:16.838Z] 13:13:16 INFO - GECKO(5728) | numIdleObserversRemoved: 0 [task 2018-12-31T13:13:16.840Z] 13:13:16 INFO - GECKO(5728) | numIdleObservesAdded: 1 [task 2018-12-31T13:13:16.844Z] 13:13:16 INFO - GECKO(5728) | TESTING CASE AddNewLocalWhileAllIdleCleanUp [task 2018-12-31T13:13:16.846Z] 13:13:16 INFO - GECKO(5728) | ============== [task 2018-12-31T13:13:16.850Z] 13:13:16 INFO - GECKO(5728) | JS REMOVE IDLE OBSERVER () time to be removed: 1 [task 2018-12-31T13:13:16.853Z] 13:13:16 INFO - GECKO(5728) | JS removeIdleObserver() observer time: 0 [task 2018-12-31T13:13:16.854Z] 13:13:16 INFO - GECKO(5728) | JS removeIdleObserver() observer time: 1 [task 2018-12-31T13:13:16.856Z] 13:13:16 INFO - GECKO(5728) | MOCK IDLE SERVICE REMOVING idle observer with time 1 [task 2018-12-31T13:13:16.857Z] 13:13:16 INFO - GECKO(5728) | MOCK IDLE SERVICE numIdleObserversRemoved: 1 numIdleObserversAdded: 0 [task 2018-12-31T13:13:16.858Z] 13:13:16 INFO - GECKO(5728) | JS FAKE IDLE SERVICE end of remove idle observer [task 2018-12-31T13:13:16.859Z] 13:13:16 INFO - GECKO(5728) | JS NUM OBSERVERS: 1 [task 2018-12-31T13:13:16.861Z] 13:13:16 INFO - GECKO(5728) | TESTING CASE ShiftLocalTimerBack() [task 2018-12-31T13:13:16.863Z] 13:13:16 INFO - GECKO(5728) | ============== [task 2018-12-31T13:13:16.866Z] 13:13:16 INFO - GECKO(5728) | JS FAKE IDLE SERVICE add idle observer before [task 2018-12-31T13:13:16.867Z] 13:13:16 INFO - GECKO(5728) | JS NUM OBSERVERS: 1 [task 2018-12-31T13:13:16.869Z] 13:13:16 INFO - GECKO(5728) | window is: [object Window] [task 2018-12-31T13:13:16.871Z] 13:13:16 INFO - GECKO(5728) | MOCK IDLE SERVICE ADDING idle observer with time: 1 [task 2018-12-31T13:13:16.873Z] 13:13:16 INFO - GECKO(5728) | MOCK IDLE SERVICE: num idle observers added: 1 [task 2018-12-31T13:13:16.875Z] 13:13:16 INFO - GECKO(5728) | JS FAKE IDLE SERVICE end of add idle observer [task 2018-12-31T13:13:16.877Z] 13:13:16 INFO - GECKO(5728) | JS NUM OBSERVERS: 2 [task 2018-12-31T13:13:16.879Z] 13:13:16 INFO - GECKO(5728) | JS FAKE IDLE SERVICE [task 2018-12-31T13:13:16.881Z] 13:13:16 INFO - GECKO(5728) | JS NUM OBSERVERS: 2 [task 2018-12-31T13:13:16.883Z] 13:13:16 INFO - GECKO(5728) | JS CALLED OBSERVE FUNCTION!!! [task 2018-12-31T13:13:16.885Z] 13:13:16 INFO - GECKO(5728) | msg 2 Count: 1 [task 2018-12-31T13:13:18.751Z] 13:13:18 INFO - GECKO(5728) | msg 4 Count: 1 [taskcluster:error] Task timeout after 3600 seconds. Force killing container. [taskcluster 2018-12-31 13:13:22.095Z] === Task Finished === [taskcluster 2018-12-31 13:13:22.096Z] Unsuccessful task run with exit code: -1 completed in 3603.519 seconds
![]() |
Reporter | |
Comment 234•2 years ago
|
||
(In reply to Intermittent Failures Robot from comment #232) > 29 failures in 1008 pushes (0.029 failures/push) were associated with this > bug in the last 7 days. > > Platform breakdown: > * linux64: 11 > * linux32: 8 > * linux64-qr: 10 I reviewed these linux timeouts but couldn't find a common cause. Existing max-run-times seem appropriate. aws slow-down? Seems better recently -- monitoring.
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 236•2 years ago
|
||
Most recent failures are on linux/debug, but vary by test suite. Average task durations are << max-run-time. I wonder if these are related to logspam, like bug 1437991, bug 1515833, and bug 1515827.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 239•2 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #236)
Most recent failures are on linux/debug, but vary by test suite. Average task durations are << max-run-time. I wonder if these are related to logspam, like bug 1437991, bug 1515833, and bug 1515827.
Failure frequency did seem to decrease with resolution of bug 1437991, but linux/debug continues to dominate ongoing failures.
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 241•2 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #236)
Most recent failures are on linux/debug, but vary by test suite. Average task durations are << max-run-time. I wonder if these are related to logspam, like bug 1437991, bug 1515833, and bug 1515827.
Those logspam bugs are all resolved now - great! - and did seem to help the random linux/debug failures here.
Recently we are seeing more linux/debug jsreftest failures specifically -- investigating...
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 243•2 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #241)
Recently we are seeing more linux/debug jsreftest failures specifically -- investigating...
And linux/debug reftest also.
linux64/debug jsreftest might benefit from more chunks.
Other platforms are more puzzling: linux64/debug reftests run in about 30 minutes generally...then suddenly a chunk runs in 60+ minutes. Affected chunks appear "random"...could that be because tests are being shuffled?
![]() |
Reporter | |
Comment 244•2 years ago
|
||
Some failures on Feb 4/5 appear to be related to certificate expiry (temporary infrastructure condition caused mass failures, hangs).
Comment hidden (Intermittent Failures Robot) |
Comment 246•2 years ago
|
||
Over the last 7 days this bug has 30 failures. These happen on windows7-32, osx-cross-noopt, osx-cross, linux64-qr, linux64, linux32
Here is the latest log failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=228854900&repo=mozilla-inbound&lineNumber=30960
[task 2019-02-16T23:10:38.226Z] 23:10:38 INFO - REFTEST TEST-START | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-004.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html
[task 2019-02-16T23:10:38.231Z] 23:10:38 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-004.html | 1167 / 1195 (97%)
[task 2019-02-16T23:10:38.256Z] 23:10:38 INFO - ++DOMWINDOW == 334 (0x7f0529d71c00) [pid = 8577] [serial = 2751] [outer = 0x7f052d531800]
[task 2019-02-16T23:10:38.337Z] 23:10:38 INFO - [Child 8577, Main Thread] WARNING: HTMLEditRules::BeforeEdit() failed to handle something: 'NS_SUCCEEDED(rv)', file /builds/worker/workspace/build/src/editor/libeditor/HTMLEditor.cpp, line 3576
[task 2019-02-16T23:10:38.344Z] 23:10:38 INFO - [Child 8577, Main Thread] WARNING: '!aSelection.RangeCount()', file /builds/worker/workspace/build/src/editor/libeditor/EditorBase.cpp, line 3569
[task 2019-02-16T23:10:38.351Z] 23:10:38 INFO - [Child 8577, Main Thread] WARNING: '!selectionStartPoint.IsSet()', file /builds/worker/workspace/build/src/editor/libeditor/HTMLEditRules.cpp, line 9806
[task 2019-02-16T23:10:38.355Z] 23:10:38 INFO - [Child 8577, Main Thread] WARNING: Failed to normalize Selection: 'NS_SUCCEEDED(rv)', file /builds/worker/workspace/build/src/editor/libeditor/HTMLEditRules.cpp, line 453
[task 2019-02-16T23:10:41.103Z] 23:10:41 INFO - REFTEST TEST-PASS | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-004.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html | image comparison, max difference: 0, number of differing pixels: 0
[task 2019-02-16T23:10:41.106Z] 23:10:41 INFO - REFTEST TEST-END | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-004.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html
[task 2019-02-16T23:10:41.136Z] 23:10:41 INFO - ++DOMWINDOW == 335 (0x7f0528b65000) [pid = 8577] [serial = 2752] [outer = 0x7f052d531800]
[task 2019-02-16T23:10:41.196Z] 23:10:41 INFO - REFTEST TEST-START | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-005.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html
[task 2019-02-16T23:10:41.203Z] 23:10:41 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-005.html | 1168 / 1195 (97%)
[task 2019-02-16T23:10:41.243Z] 23:10:41 INFO - ++DOMWINDOW == 336 (0x7f0528d1ac00) [pid = 8577] [serial = 2753] [outer = 0x7f052d531800]
[task 2019-02-16T23:10:41.628Z] 23:10:41 INFO - REFTEST TEST-PASS | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-005.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html | image comparison, max difference: 0, number of differing pixels: 0
[task 2019-02-16T23:10:41.634Z] 23:10:41 INFO - REFTEST TEST-END | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-005.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-001-ref.html
[task 2019-02-16T23:10:41.659Z] 23:10:41 INFO - ++DOMWINDOW == 337 (0x7f0529084c00) [pid = 8577] [serial = 2754] [outer = 0x7f052d531800]
[taskcluster:error] Task timeout after 3600 seconds. Force killing container.
[task 2019-02-16T23:10:41.723Z] 23:10:41 INFO - REFTEST TEST-START | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-006.html == file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-006-ref.html
[task 2019-02-16T23:10:41.746Z] 23:10:41 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/received/selectors/focus-within-006.html | 1169 / 1195 (97%)
[taskcluster 2019-02-16 23:10:42.612Z] === Task Finished ===
[taskcluster 2019-02-16 23:10:42.613Z] Unsuccessful task run with exit code: -1 completed in 4378.344 seconds
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 250•2 years ago
|
||
Over the last 7 days there are 39 failures on this bug. These happen on osx-cross, linux64-qr, linux64, linux32, android-em-4-3-armv7-api16, android-5-0-x86_64.
Here is the latest failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=231041560&repo=autoland&lineNumber=35054
![]() |
Reporter | |
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 252•2 years ago
|
||
This started to permafail from this merge https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=testfailed%2Cbusted%2Cexception&revision=ecbfad744a66ee47428dff7992aa6604391844e6&selectedJob=232377959&searchStr=os%2Cx%2Ccross%2Ccompiled%2Cccov%2Cdebug%2Cbuild-macosx64-ccov%2Fdebug%2C%28b%29
gbrown can you please take a look?
![]() |
Reporter | |
Comment 253•2 years ago
|
||
Thanks Arthur. I noticed and hope my patch in bug 1533565 will resolve this.
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 255•2 years ago
|
||
There were 4 timeouts on android builds here:
(On retry, they were all fine.)
It looks like hg robustcheckout was taking a very long time.
[task 2019-03-07T10:38:21.362Z] 10:38:21 INFO - Running main action method: multi_l10n
...
[task 2019-03-07T12:05:06.758Z] 12:05:06 INFO - 12:05:06 INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', '--config', 'extensions.robustcheckout=/builds/worker/workspace/build/src/testing/mozharness/external_tools/robustcheckout.py', 'robustcheckout', u'https://hg.mozilla.org/l10n-central/mr', u'mr', '--sharebase', u'/builds/hg-shared', '--branch', u'default']
[task 2019-03-07T12:05:06.758Z] 12:05:06 INFO - 12:05:06 INFO - Copy/paste: hg --config ui.merge=internal:merge --config extensions.robustcheckout=/builds/worker/workspace/build/src/testing/mozharness/external_tools/robustcheckout.py robustcheckout https://hg.mozilla.org/l10n-central/mr mr --sharebase /builds/hg-shared --branch default
[task 2019-03-07T12:05:06.809Z] 12:05:06 INFO - 12:05:06 INFO - (using Mercurial 4.8.1)
[task 2019-03-07T12:05:06.810Z] 12:05:06 INFO - 12:05:06 INFO - ensuring https://hg.mozilla.org/l10n-central/mr@default is available at mr
[task 2019-03-07T12:05:07.025Z] 12:05:07 INFO - 12:05:07 INFO - (sharing from new pooled repository cf74f930fde306218d72c790d69ad9ba7c8b8b51)
[task 2019-03-07T12:05:07.216Z] 12:05:07 INFO - 12:05:07 INFO - requesting all changes
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
vs a successful retry:
[task 2019-03-07T12:43:25.400Z] 12:43:25 INFO - Running main action method: multi_l10n
...
[task 2019-03-07T12:51:28.535Z] 12:51:28 INFO - 12:51:28 INFO - [mozharness: 2019-03-07 12:51:28.535244Z] Finished pull-locale-source step (success)
...
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - [mozharness: 2019-03-07 12:56:32.376373Z] Finished multi-l10n step (success)
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - [mozharness: 2019-03-07 12:56:32.376448Z] Skipping package-source step.
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - Running post-run listener: _parse_build_tests_ccov
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - Running post-run listener: _shutdown_sccache
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - Running post-run listener: _summarize
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - # TBPL SUCCESS #
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - [mozharness: 2019-03-07 12:56:32.376769Z] FxDesktopBuild summary:
[task 2019-03-07T12:56:32.376Z] 12:56:32 INFO - # TBPL SUCCESS #
[taskcluster 2019-03-07 12:56:32.926Z] === Task Finished ===
[taskcluster 2019-03-07 12:58:24.583Z] Successful task run with exit code: 0 completed in 3083.283 seconds
I won't follow-up unless there are more problems like this.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 258•2 years ago
|
||
There are still some linux64/debug devtools timeouts after bug 1534822. Chunks are very unbalanced. Filed bug 1536253 as a first step to improvement.
![]() |
Reporter | |
Comment 259•2 years ago
|
||
There are some infrequent linux64-qr/debug reftest 3600 s timeouts, but chunks normally run in 20 to 40 minutes so I am reluctant to act; watching...
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 262•2 years ago
|
||
source-test-clang-tidy is perma-fail, causing a spike here. There is some existing activity on that test, like bug 1539779 -- waiting to see if that helps. (Otherwise, it might be worth noting that there was a sharp spike in run time around here: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=tidy&tochange=98452610cfcc81cba0d4478797fe1e83a51172e8&fromchange=cf6cfe33476622c202fcbc34ef12f8ca1c039b8a. )
Comment 263•2 years ago
|
||
Geoff, I've filed bug 1540325 for the clang tidy timeouts.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 267•2 years ago
|
||
Over the last 7 days there are 31 failures present on this bug. These happen on: android-em-7-0-x86_64, linux32-shippable, linux64, linux64-qr, osx-cross, osx-cross-ccov
Here is the most recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=239147679&repo=autoland&lineNumber=78131
![]() |
Reporter | |
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 271•2 years ago
|
||
Over the last 7 days there are 38 failures present on this bug. These happen on android-4-0-armv7-api16, android-5-0-aarch64, android-em-7-0-x86, android-em-7-0-x86_64, linux32-shippable, linux64, linux64-shippable, osx-cross, osx-cross-noopt, osx-shippable, windows-mingw32, windows2012-32-shippable.
Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=240827727&repo=mozilla-central&lineNumber=5178
[task 2019-04-17T00:24:53.121Z] 00:24:53 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-283.xht
[task 2019-04-17T00:24:55.017Z] 00:24:55 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-283.xht | took 1905ms
[task 2019-04-17T00:24:55.019Z] 00:24:55 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-284.xht
[task 2019-04-17T00:24:57.803Z] 00:24:57 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-284.xht | took 2782ms
[task 2019-04-17T00:24:57.805Z] 00:24:57 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-285.xht
[task 2019-04-17T00:24:59.685Z] 00:24:59 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-285.xht | took 1879ms
[task 2019-04-17T00:24:59.687Z] 00:24:59 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-286.xht
[task 2019-04-17T00:25:01.657Z] 00:25:01 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-286.xht | took 1971ms
[task 2019-04-17T00:25:01.657Z] 00:25:01 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-287.xht
[task 2019-04-17T00:25:03.591Z] 00:25:03 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-287.xht | took 1935ms
[task 2019-04-17T00:25:03.593Z] 00:25:03 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-288.xht
[task 2019-04-17T00:25:05.387Z] 00:25:05 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-288.xht | took 1796ms
[task 2019-04-17T00:25:05.388Z] 00:25:05 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-289.xht
[task 2019-04-17T00:25:07.057Z] 00:25:07 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-289.xht | took 1667ms
[task 2019-04-17T00:25:07.058Z] 00:25:07 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-290.xht
[task 2019-04-17T00:25:08.942Z] 00:25:08 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-290.xht | took 1880ms
[task 2019-04-17T00:25:08.944Z] 00:25:08 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-291.xht
[task 2019-04-17T00:25:10.795Z] 00:25:10 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-291.xht | took 1857ms
[task 2019-04-17T00:25:10.796Z] 00:25:10 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-292.xht
[task 2019-04-17T00:25:12.671Z] 00:25:12 INFO - TEST-PASS | /css/CSS2/selectors/first-letter-punctuation-292.xht | took 1873ms
[task 2019-04-17T00:25:12.671Z] 00:25:12 INFO - TEST-START | /css/CSS2/selectors/first-letter-punctuation-293.xht
[taskcluster:error] Task timeout after 5400 seconds. Force killing container.
[taskcluster 2019-04-17 00:25:15.346Z] === Task Finished ===
[taskcluster 2019-04-17 00:25:15.347Z] Unsuccessful task run with exit code: -1 completed in 5402.945 seconds
Comment 272•2 years ago
|
||
Looking at a couple of the linux builds it seems sometimes it takes 1/2 to 1 hour to clone and update the repo. I do see '(warning: large working directory being used without fsmonitor enabled; enable fsmonitor to improve performance; see "hg help -e fsmonitor")' in the log as well.
Perhaps we a) either adjust the max run time to account for the slow hg performance or b) improve the hg performance perhaps with fsmonitor or both or some other approach. Too bad we have to clone and update > 3G of data each time as well.
Perhaps we can just try to bump the max run time to see if we prevent or reduce these errors for the problematic jobs, not just builds, then for the ones which complete when given enough time, try to improve their run times. For the ones that won't complete regardless of the run time we can reduce it again and investigate that as a distinct issue.
![]() |
Reporter | |
Updated•2 years ago
|
![]() |
Reporter | |
Comment 273•2 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #272)
Thanks :bc.
Looking at a couple of the linux builds it seems sometimes it takes 1/2 to 1 hour to clone and update the repo. I do see '(warning: large working directory being used without fsmonitor enabled; enable fsmonitor to improve performance; see "hg help -e fsmonitor")' in the log as well.
I also see the fsmonitor warning in other logs, where the cloning completes quickly -- maybe it is normal / expected?
Comment hidden (Intermittent Failures Robot) |
Comment 275•2 years ago
|
||
There are 35 total failures in the last 7 days on multiple platforms: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-04-19&endday=2019-04-26&tree=trunk&bug=1411358
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=242811869&repo=mozilla-inbound&lineNumber=17137
[task 2019-04-26T09:12:42.250Z] 09:12:42 INFO - REFTEST TEST-START | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=test262/language/expressions/equals/S11.9.1_A3.3.js
[task 2019-04-26T09:12:42.251Z] 09:12:42 INFO - REFTEST TEST-LOAD | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=test262/language/expressions/equals/S11.9.1_A3.3.js | 4550 / 6609 (68%)
[task 2019-04-26T09:12:42.255Z] 09:12:42 INFO - wait for org.mozilla.geckoview.test complete; top activity=org.mozilla.geckoview.test
[task 2019-04-26T09:12:42.466Z] 09:12:42 INFO - org.mozilla.geckoview.test unexpectedly found running. Killing...
[task 2019-04-26T09:12:42.467Z] 09:12:42 INFO - REFTEST TEST-INFO | started process screentopng
[task 2019-04-26T09:12:44.529Z] 09:12:44 INFO - REFTEST TEST-INFO | screentopng: exit 0
[task 2019-04-26T09:13:00.704Z] 09:13:00 WARNING - TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/jsreftest/tests/jsreftest.html?test=test262/language/expressions/equals/S11.9.1_A3.3.js | application ran for longer than allowed maximum time
[task 2019-04-26T09:13:00.705Z] 09:13:00 INFO - remoteautomation.py | Application ran for: 1:50:49.448078
[task 2019-04-26T09:13:02.391Z] 09:13:02 INFO - REFTEST INFO | Copy/paste: /builds/worker/workspace/build/linux64-minidump_stackwalk /tmp/tmpPTQpi_/4c2ef0dc-18bd-c459-f2b2-5b20c3643e3e.dmp /builds/worker/workspace/build/symbols
[taskcluster:error] Task timeout after 7200 seconds. Force killing container.
[taskcluster 2019-04-26 09:13:20.899Z] === Task Finished ===
[taskcluster 2019-04-26 09:13:20.899Z] Unsuccessful task run with exit code: -1 completed in 7369.418 seconds
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 277•2 years ago
|
||
Most failures are android-em-7-0*. I have checked that these are not hung (recent entries in log at timeout) and that the same tasks normally runs in much less time on other pushes. Although android-performance.log artifacts are not available in this bug, I strongly suspect bug 1545308 and hope to see that bug re-open soon.
![]() |
Reporter | |
Comment 278•2 years ago
|
||
I expect these failures to continue and this bug to remain open indefinitely. Task timeouts are the last resort whenever a task is hung or otherwise runs for too long. My goal is to address timeouts which recur in the same task frequently and identify other related issues in dependent bugs.
I try to review failures here at least once each week. There is no point in pasting recent failure logs or failure counts in this bug. If a spike in frequency is noted, or you have insight into a failure, or I haven't commented on an on-going issue for more than a week, need-info is appreciated.
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 280•2 years ago
|
||
(In reply to Intermittent Failures Robot from comment #279)
19 failures in 933 pushes (0.02 failures/push) were associated with this bug yesterday.
Repository breakdown:
- try: 11
Mass mis-classification.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
Reporter | |
Comment 283•2 years ago
|
||
Most recent failures are on android-em-7-0-x86. These are mostly wpt-reftests which normally complete in 15 to 30 minutes but sometimes exceed 90 minutes. Evidence is not conclusive, but I hope these will be resolved by bug 1545308.
Description
•