Closed Bug 1512607 Opened 10 months ago Closed 4 months ago

Intermittent Windows raptor [taskcluster:error] Task aborted - max run time exceeded

Categories

(Testing :: Raptor, defect, P1)

Version 3
defect

Tracking

(firefox69 fixed)

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: ariakab)

References

Details

(Keywords: intermittent-failure, Whiteboard: [comment 30][stockwell disable-recommended])

Attachments

(1 file)

Filed by: rmaries [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=215904196&repo=mozilla-central

https://queue.taskcluster.net/v1/task/Iuz3PTvfSFaKqwloxY-wBg/runs/0/artifacts/public/logs/live_backing.log

03:29:33     INFO -  raptor-main raptor config: {'binary': '/home/cltbld/tasks/task_1544182066/mozharness/mozharness/mozilla/testing/chrome/chrome-linux/chrome', 'symbols_path': 'https://queue.taskcluster.net/v1/task/DkOUsvsmQimN4IUBvQlKmw/artifacts/public/build/target.crashreporter-symbols.zip', 'app': 'chrome', 'gecko_profile_entries': None, 'run_local': False, 'platform': 'linux', 'host': '127.0.0.1', 'is_release_build': False, 'gecko_profile_interval': None, 'processor': 'x86_64', 'gecko_profile': False, 'obj_path': None}
03:29:33     INFO -  raptor-benchmark bench_dir contains:
03:29:33     INFO -  raptor-benchmark ['unity-webgl', 'wasm-misc', 'SunSpider', 'assorted-dom', 'StyleBench', 'Speedometer', 'MotionMark', 'webaudio', 'six-speed', 'ARES-6', 'resources', 'wasm-godot']
03:29:33     INFO -  raptor-benchmark writing wptserve headers file
03:29:33     INFO -  raptor-benchmark wrote wpt headers file: /home/cltbld/tasks/task_1544182066/build/tests/raptor/raptor/tests/webkit/PerformanceTests/__dir__.headers
03:29:33     INFO -  raptor-benchmark starting webserver on '127.0.0.1:53084'
03:29:33     INFO -  raptor-benchmark serving benchmarks from here: /home/cltbld/tasks/task_1544182066/build/tests/raptor/raptor/tests/webkit/PerformanceTests
03:29:33     INFO -  raptor-gen-test-config writing test settings into background js, so webext can get it
03:29:33     INFO -  raptor-gen-test-config finished writing test config to /home/cltbld/tasks/task_1544182066/build/tests/raptor/webext/raptor/auto_gen_test_config.js
03:29:33     INFO -  raptor-main installing webext /home/cltbld/tasks/task_1544182066/build/tests/raptor/webext/raptor
03:29:33     INFO -  raptor-main starting chrome
03:29:33     INFO -  Application command: /home/cltbld/tasks/task_1544182066/mozharness/mozharness/mozilla/testing/chrome/chrome-linux/chrome --user-data-dir=/tmp/tmpaMKDkz.mozrunner --profile-directory=Default --no-first-run --load-extension=/home/cltbld/tasks/task_1544182066/build/tests/raptor/webext/raptor
03:29:34     INFO -  127.0.0.1 - - [07/Dec/2018 03:29:34] "POST / HTTP/1.1" 200 -
03:29:34     INFO -  raptor-control-server received webext_status: raptor runner.js is loaded!
03:29:34     INFO -  127.0.0.1 - - [07/Dec/2018 03:29:34] "GET /raptor-unity-webgl-chrome.json HTTP/1.1" 200 -
03:29:34     INFO -  raptor-control-server reading test settings from raptor-unity-webgl-chrome.json
03:29:34     INFO -  raptor-control-server sent test settings to web ext runner
03:29:34     INFO -  127.0.0.1 - - [07/Dec/2018 03:29:34] "POST / HTTP/1.1" 200 -
03:29:34     INFO -  raptor-control-server received webext_status: * pausing 30 seconds to let browser settle... *
03:30:04     INFO -  127.0.0.1 - - [07/Dec/2018 03:30:04] "POST / HTTP/1.1" 200 -
03:30:04     INFO -  raptor-control-server received webext_status: running 5 pagecycles of http://127.0.0.1:53084/unity-webgl/index.html?raptor
03:30:04     INFO -  127.0.0.1 - - [07/Dec/2018 03:30:04] "POST / HTTP/1.1" 200 -
03:30:04     INFO -  raptor-control-server received webext_status: opened new empty tab 1733427713
03:30:05     INFO -  127.0.0.1 - - [07/Dec/2018 03:30:05] "POST / HTTP/1.1" 200 -
03:30:05     INFO -  raptor-control-server received webext_status: begin pagecycle 1
03:30:06     INFO -  127.0.0.1 - - [07/Dec/2018 03:30:06] "POST / HTTP/1.1" 200 -
03:30:06     INFO -  raptor-control-server received webext_status: update tab 1733427713
03:30:06     INFO -  127.0.0.1 - - [07/Dec/2018 03:30:06] "POST / HTTP/1.1" 200 -
03:30:06     INFO -  raptor-control-server received webext_status: test tab updated 1733427713
03:33:26     INFO -  127.0.0.1 - - [07/Dec/2018 03:33:26] "POST / HTTP/1.1" 200 -
03:33:26     INFO -  raptor-control-server received webext_status: results received
03:33:27     INFO -  127.0.0.1 - - [07/Dec/2018 03:33:27] "POST / HTTP/1.1" 200 -
03:33:27     INFO -  raptor-control-server received webext_status: begin pagecycle 2
03:33:28     INFO -  127.0.0.1 - - [07/Dec/2018 03:33:28] "POST / HTTP/1.1" 200 -
03:33:28     INFO -  raptor-control-server received webext_status: update tab 1733427713
03:33:28     INFO -  127.0.0.1 - - [07/Dec/2018 03:33:28] "POST / HTTP/1.1" 200 -
03:33:28     INFO -  raptor-control-server received webext_status: test tab updated 1733427713
03:36:58     INFO -  127.0.0.1 - - [07/Dec/2018 03:36:58] "POST / HTTP/1.1" 200 -
03:36:58     INFO -  raptor-control-server received webext_status: results received
03:36:59     INFO -  127.0.0.1 - - [07/Dec/2018 03:36:59] "POST / HTTP/1.1" 200 -
03:36:59     INFO -  raptor-control-server received webext_status: begin pagecycle 3
03:37:00     INFO -  127.0.0.1 - - [07/Dec/2018 03:37:00] "POST / HTTP/1.1" 200 -
03:37:00     INFO -  raptor-control-server received webext_status: update tab 1733427713
03:37:00     INFO -  127.0.0.1 - - [07/Dec/2018 03:37:00] "POST / HTTP/1.1" 200 -
03:37:00     INFO -  raptor-control-server received webext_status: test tab updated 1733427713
03:40:17     INFO -  127.0.0.1 - - [07/Dec/2018 03:40:17] "POST / HTTP/1.1" 200 -
03:40:17     INFO -  raptor-control-server received webext_status: results received
03:40:18     INFO -  127.0.0.1 - - [07/Dec/2018 03:40:18] "POST / HTTP/1.1" 200 -
03:40:18     INFO -  raptor-control-server received webext_status: begin pagecycle 4
03:40:19     INFO -  127.0.0.1 - - [07/Dec/2018 03:40:19] "POST / HTTP/1.1" 200 -
03:40:19     INFO -  raptor-control-server received webext_status: update tab 1733427713
03:40:19     INFO -  127.0.0.1 - - [07/Dec/2018 03:40:19] "POST / HTTP/1.1" 200 -
03:40:19     INFO -  raptor-control-server received webext_status: test tab updated 1733427713
03:43:39     INFO -  127.0.0.1 - - [07/Dec/2018 03:43:39] "POST / HTTP/1.1" 200 -
03:43:39     INFO -  raptor-control-server received webext_status: results received
03:43:40     INFO -  127.0.0.1 - - [07/Dec/2018 03:43:40] "POST / HTTP/1.1" 200 -
03:43:40     INFO -  raptor-control-server received webext_status: begin pagecycle 5
03:43:41     INFO -  127.0.0.1 - - [07/Dec/2018 03:43:41] "POST / HTTP/1.1" 200 -
03:43:41     INFO -  raptor-control-server received webext_status: update tab 1733427713
03:43:41     INFO -  127.0.0.1 - - [07/Dec/2018 03:43:41] "POST / HTTP/1.1" 200 -
03:43:41     INFO -  raptor-control-server received webext_status: test tab updated 1733427713
[taskcluster:error] Aborting task...
[taskcluster 2018-12-07T11:58:03.685Z] === Task Finished ===
[taskcluster 2018-12-07T11:58:03.685Z] Task Duration: 29m58.704526561s
[taskcluster 2018-12-07T11:58:04.097Z] Uploading artifact public/logs/localconfig.json from file logs/localconfig.json with content encoding "gzip", mime type "application/json" and expiry 2019-12-07T10:02:51.692Z
[taskcluster 2018-12-07T11:58:04.787Z] Uploading artifact public/logs/raptor_critical.log from file logs/raptor_critical.log with content encoding "gzip", mime type "text/plain" and expiry 2019-12-07T10:02:51.692Z
[taskcluster 2018-12-07T11:58:05.180Z] Uploading artifact public/logs/raptor_error.log from file logs/raptor_error.log with content encoding "gzip", mime type "text/plain" and expiry 2019-12-07T10:02:51.692Z
[taskcluster 2018-12-07T11:58:05.633Z] Uploading artifact public/logs/raptor_fatal.log from file logs/raptor_fatal.log with content encoding "gzip", mime type "text/plain" and expiry 2019-12-07T10:02:51.692Z
[taskcluster 2018-12-07T11:58:06.086Z] Uploading artifact public/logs/raptor_info.log from file logs/raptor_info.log with content encoding "gzip", mime type "text/plain" and expiry 2019-12-07T10:02:51.692Z
[taskcluster 2018-12-07T11:58:06.501Z] Uploading artifact public/logs/raptor_raw.log from file logs/raptor_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2019-12-07T10:02:51.692Z
[taskcluster 2018-12-07T11:58:07.076Z] Uploading artifact public/logs/raptor_warning.log from file logs/raptor_warning.log with content encoding "gzip", mime type "text/plain" and expiry 2019-12-07T10:02:51.692Z
[taskcluster:error] Task aborted - max run time exceeded
Summary: Intermittent raptor tier2 [taskcluster:error] Task aborted - max run time exceeded → Intermittent raptor [taskcluster:error] Task aborted - max run time exceeded

Most of the recent failures here seem to be missclasifications. Removing the disable-recommended tag.

Whiteboard: [stockwell disable-recommended]
Summary: Intermittent raptor [taskcluster:error] Task aborted - max run time exceeded → Intermittent Windows raptor [taskcluster:error] Task aborted - max run time exceeded

Windows failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=247840095&repo=mozilla-inbound&lineNumber=609
22:18:09 INFO - 'USERPROFILE': 'C:\Users\task_1558558865',
22:18:09 INFO - 'VS140COMNTOOLS': 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools\',
22:18:09 INFO - 'WINDIR': 'C:\windows'}
22:18:09 INFO - Calling ['C:\Users\task_1558558865\build\venv\Scripts\python', u'C:\Users\task_1558558865\build\tests\raptor\raptor\raptor.py', u'--test', 'raptor-motionmark-htmlsuite', u'--binary', 'C:\Users\task_1558558865\build\application\firefox\firefox.exe', u'--app', u'firefox', u'--symbolsPath', u'https://queue.taskcluster.net/v1/task/e4IqFCFUTlacofUJEJ2oMg/artifacts/public/build/target.crashreporter-symbols.zip', u'--log-tbpl-level=debug'] with output_timeout 3600
22:18:09 INFO - raptor-main raptor-start
22:18:09 INFO - raptor-main received command line arguments: Namespace(activity=None, app='firefox', binary='C:\Users\task_1558558865\build\application\firefox\firefox.exe', browser_cycles=None, debug_mode=False, e10s=True, gecko_profile=False, gecko_profile_entries=None, gecko_profile_interval=None, gecko_profile_threads=None, host='127.0.0.1', intent=None, is_release_build=False, log_errorsummary=None, log_grouped=None, log_html=None, log_mach=None, log_mach_buffer=None, log_mach_level=None, log_mach_screenshot=None, log_mach_verbose=None, log_raw=None, log_raw_level=None, log_tbpl=None, log_tbpl_buffer=None, log_tbpl_compact=None, log_tbpl_level='debug', log_unittest=None, log_xunit=None, memory_test=False, obj_path=None, page_cycles=None, page_timeout=None, post_startup_delay=30000, power_test=False, run_local=False, symbols_path='https://queue.taskcluster.net/v1/task/e4IqFCFUTlacofUJEJ2oMg/artifacts/public/build/target.crashreporter-symbols.zip', test='raptor-motionmark-htmlsuite')
22:18:09 INFO - raptor-manifest C:\Users\task_1558558865\build\tests\raptor\raptor\raptor.ini
22:18:09 INFO - raptor-manifest configuring settings for test raptor-motionmark-htmlsuite-firefox
22:18:09 INFO - raptor-main raptor tests scheduled to run:
22:18:09 INFO - raptor-main raptor-motionmark-htmlsuite-firefox
22:18:09 INFO - raptor-main main raptor init, config is: {'binary': 'C:\Users\task_1558558865\build\application\firefox\firefox.exe', 'symbols_path': 'https://queue.taskcluster.net/v1/task/e4IqFCFUTlacofUJEJ2oMg/artifacts/public/build/target.crashreporter-symbols.zip', 'memory_test': False, 'enable_control_server_wait': False, 'e10s': True, 'app': 'firefox', 'gecko_profile_entries': None, 'power_test': False, 'run_local': False, 'platform': 'win', 'host': '127.0.0.1', 'is_release_build': False, 'gecko_profile_interval': None, 'processor': 'x86_64', 'gecko_profile': False, 'obj_path': None}
22:18:09 INFO - raptor-main Merging profile: C:\Users\task_1558558865\build\tests\raptor\raptor\profile_data\base
22:18:09 INFO - raptor-main Merging profile: C:\Users\task_1558558865\build\tests\raptor\raptor\profile_data\common
22:18:09 INFO - raptor-main Merging profile: C:\Users\task_1558558865\build\tests\raptor\raptor\profile_data\perf
22:18:09 INFO - raptor-main Merging profile: C:\Users\task_1558558865\build\tests\raptor\raptor\profile_data\raptor
22:18:09 INFO - raptor-control-server raptor control server running on port 49879...
22:18:09 INFO - raptor-main creating browser runner using mozrunner
22:18:09 INFO - raptor-main starting raptor test: raptor-motionmark-htmlsuite-firefox
22:18:09 INFO - raptor-main test settings: {'expected_browser_cycles': 1, 'alert_threshold': '2.0', 'here': 'C:\Users\task_1558558865\build\tests\raptor\raptor\tests', 'path': 'C:\Users\task_1558558865\build\tests\raptor\raptor\tests\raptor-motionmark-htmlsuite-firefox', 'cold': False, 'unit': 'score', 'ancestor-manifest': 'C:\Users\task_1558558865\build\tests\raptor\raptor\raptor.ini', 'name': 'raptor-motionmark-htmlsuite-firefox', 'type': 'benchmark', 'page_cycles': '5', 'apps': 'firefox', 'test_url': 'http://<host>:<port>/MotionMark/developer.html?test-interval=15&display=minimal&tiles=big&controller=fixed&frame-rate=30&kalman-process-error=1&kalman-measurement-error=4&time-measurement=performance&suite-name=HTMLsuite&raptor=true&oskey={platform}', 'manifest': 'C:\Users\task_1558558865\build\tests\raptor\raptor\tests\raptor-motionmark-htmlsuite.ini', 'page_timeout': '600000', 'browser_cycle': 1, 'expected': 'pass', 'lower_is_better': 'false', 'relpath': 'tests\raptor-motionmark-htmlsuite-firefox'}
22:18:09 INFO - raptor-main raptor config: {'binary': 'C:\Users\task_1558558865\build\application\firefox\firefox.exe', 'local_profile_dir': 'c:\users\task_1558558865\appdata\local\temp\tmpmxjdjg.mozrunner', 'symbols_path': 'https://queue.taskcluster.net/v1/task/e4IqFCFUTlacofUJEJ2oMg/artifacts/public/build/target.crashreporter-symbols.zip', 'memory_test': False, 'enable_control_server_wait': False, 'e10s': True, 'app': 'firefox', 'gecko_profile_entries': None, 'power_test': False, 'run_local': False, 'platform': 'win', 'host': '127.0.0.1', 'is_release_build': False, 'gecko_profile_interval': None, 'processor': 'x86_64', 'gecko_profile': False, 'obj_path': None}
22:18:09 INFO - raptor-benchmark bench_dir contains:
22:18:09 INFO - raptor-benchmark ['ARES-6', 'MotionMark', 'resources', 'six-speed', 'Speedometer', 'StyleBench', 'SunSpider', 'wasm-godot', 'webaudio']
22:18:09 INFO - raptor-benchmark writing wptserve headers file
22:18:09 INFO - raptor-benchmark wrote wpt headers file: C:\Users\task_1558558865\build\tests\raptor\raptor\tests\webkit\PerformanceTests_dir_.headers
22:18:09 INFO - raptor-benchmark starting webserver on '127.0.0.1:49880'
22:18:09 INFO - raptor-benchmark serving benchmarks from here: C:\Users\task_1558558865\build\tests\raptor\raptor\tests\webkit\PerformanceTests
22:18:09 INFO - raptor-gen-test-config writing test settings into background js, so webext can get it
22:18:09 INFO - raptor-gen-test-config finished writing test config to C:\Users\task_1558558865\build\tests\raptor\webext\raptor\auto_gen_test_config.js
22:18:09 INFO - raptor-main installing webext C:\Users\task_1558558865\build\tests\raptor\raptor..\webext\raptor
22:18:09 INFO - raptor-main starting firefox
22:18:09 INFO - Application command: C:\Users\task_1558558865\build\application\firefox\firefox.exe --wait-for-browser -profile c:\users\task_1558558865\appdata\local\temp\tmpmxjdjg.mozrunner
[taskcluster:error] Aborting task...
[taskcluster 2019-05-22T22:47:08.266Z] SUCCESS: The process with PID 5720 (child process of PID 808) has been terminated.
[taskcluster 2019-05-22T22:47:08.266Z] SUCCESS: The process with PID 1804 (child process of PID 808) has been terminated.
[taskcluster 2019-05-22T22:47:08.266Z] SUCCESS: The process with PID 936 (child process of PID 5536) has been terminated.
[taskcluster 2019-05-22T22:47:08.266Z] SUCCESS: The process with PID 808 (child process of PID 5536) has been terminated.
[taskcluster 2019-05-22T22:47:08.266Z] SUCCESS: The process with PID 5536 (child process of PID 3776) has been terminated.
[taskcluster 2019-05-22T22:47:08.266Z]
[taskcluster 2019-05-22T22:47:08.266Z] === Task Finished ===
[taskcluster 2019-05-22T22:47:08.266Z] Task Duration: 29m58.0771656s
[taskcluster 2019-05-22T22:47:08.938Z] Uploading artifact public/logs/localconfig.json from file logs\localconfig.json with content encoding "gzip", mime type "application/octet-stream" and expiry 2020-05-21T20:04:22.103Z
[taskcluster 2019-05-22T22:47:09.633Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/RY5mbIBqRcWC2xA7TfEw5w/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2020-05-21T20:04:22.103Z
[taskcluster:error] Task aborted - max run time exceeded

Whiteboard: [stockwell disable-recommended]

(In reply to Intermittent Failures Robot from comment #28)

49 failures in 801 pushes (0.061 failures/push) were associated with this bug yesterday.

For more details, see:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?bug=1512607&startday=2019-05-22&endday=2019-05-22&tree=all

The majority of these failures are chromium, due to the latest chromium update being broken/unstable. Once chromium is pinned to a stable version (Bug 1553137) the chromium failures will be resolved.

Whiteboard: [stockwell disable-recommended] → [comment 30]

Hi Arnold, would you mind taking a look at this one? Some of the intermittent failures associated with this bug won't be legit, but others will. Basically the idea here is to go through the logs that were flagged with this issue, and find legit instances of this failure.

For each test job that runs via taskcluster, in the job configs we set a maximum overall job runtime. If the test job exceeds this run time then this is the error message in the log:

[taskcluster:error] Task aborted - max run time exceeded

As an example, for Raptor suites the default taskcluster max job duration is set here [0]. Although that can also be set under each particular job in the test sections below that, if the default isn't sufficient, i.e. [1].

What we're looking for here is for test jobs that are running successfully but don't have enough time to finish so they timeout with the error noted above. Note that if a test job fails for some other reason and the job hangs then it might get the same error above; those particular ones we want to ignore in this case.

The goal is to find Raptor test jobs that legit don't have enough time to complete - and then you can create a patch to increase the max-run-time for that particular job(s) only, therefore reducing the number of instances of this intermittent failure.

To do that you'll have to read through the job logs noted throughout the comments above i.e. click on the 'For more details, see' link - that will give you a list of recent failures. Then click on 'view details' on the right and read through the log and see if it's legit; if so note down the test name. You'll see some test jobs are reaching the maximum run time on their last page / browser cycle etc. which means they are close but not quite enough time to finish.

Thanks and any questions feel free to ping me on IRC!

[0] https://searchfox.org/mozilla-central/rev/aba472751e24763d0c18bae8408e9d7106e9acea/taskcluster/ci/test/raptor.yml#10

[1] https://searchfox.org/mozilla-central/rev/aba472751e24763d0c18bae8408e9d7106e9acea/taskcluster/ci/test/raptor.yml#1724

Flags: needinfo?(arnold.iakab)
Whiteboard: [comment 30][stockwell disable-recommended] → [comment 30]

Hi Rob!

I've looked into the logs, even through the older ones though I think most of the old ones are not relevant anymore since issues that occurred there are not present in the recent ones.

I think I managed to extract some clear and valuable information regarding the following:

  • raptor-unity-webgl-firefox-profiling-e10s Rap-Prof(ugl):
    On the linux64-shippable platform all jobs seem to hang up on the second or third pagecycle, I'm not sure if it's supposed for one pagecycle to take too much time to finish but none of them succeeded in getting the job done so I guess the max_duration_time could be increased for this one.

  • raptor-tp6-2-firefox-cold-e10s Rap(tp6-c-2)
    All the jobs that are running on the windows10-aarch64 platform seem to not have enough time to finish so this is another candidate for increased max_duration_time

  • raptor-motionmark-htmlsuite-firefox-profiling-e10s Rap-Prof(mm-h)
    The jobs that run on the macosx1010-64-shippable platform seem to not have enough time to finish.

For the other test suites the failures are random, I couldn't see a pattern and suggest an increase for the max_duration_time.
We might consider increasing the max_duration_time for the ones mentioned above and look through future logs for other candidates.

Flags: needinfo?(ariakab) → needinfo?(rwood)

Excellent, thanks Arnold!

Please go ahead and create a patch to increase the 'max-run-time' values in the raptor taskculster configs [0] for those jobs that you noted above. Note that you'd add / modify a 'max-run-time' value under each of the particular jobs (not changing the default at the top). Thanks!

[0] https://searchfox.org/mozilla-central/source/taskcluster/ci/test/raptor.yml

Flags: needinfo?(rwood) → needinfo?(ariakab)
Assignee: nobody → ariakab
Status: NEW → ASSIGNED
Priority: P5 → P1
Flags: needinfo?(ariakab)
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a73077366144
Intermittent Windows raptor [taskcluster:error] Task aborted - max run time exceeded r=rwood,perftest-reviewers,Bebe
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
You need to log in before you can comment on or make changes to this bug.