Intermittent [taskcluster:error] Task aborted - max run time exceeded | check log for raptor-benchmark Info: Cloning the benchmarks to <path>/performance-tests/perf-automation
Categories
(Testing :: Performance, defect, P5)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure, Whiteboard: [retriggered])
Attachments
(2 obsolete files)
Filed by: imoraru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=401918676&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/EMUkgH7qQE2eAUwohKk7hw/runs/0/artifacts/public/logs/live_backing.log
[task 2023-01-10T11:34:58.716Z] 11:34:58 INFO - condprof INFO | Creating metadata...
[task 2023-01-10T11:34:58.719Z] 11:34:58 INFO - condprof INFO | Saving metadata file in /var/folders/wz/g2zb5lgn6737640bv4sq9x6h000014/T/tmpu024_k14/profile/condprofile.json
[task 2023-01-10T11:34:58.719Z] 11:34:58 INFO - condprof INFO | Profile at /var/folders/wz/g2zb5lgn6737640bv4sq9x6h000014/T/tmpu024_k14/profile.
[task 2023-01-10T11:34:58.719Z] 11:34:58 INFO - Done.
[task 2023-01-10T11:34:58.860Z] 11:34:58 INFO - raptor-perftest Info: Created a conditioned-profile copy: /var/folders/wz/g2zb5lgn6737640bv4sq9x6h000014/T/tmptchxz_6y/profile
[task 2023-01-10T11:34:58.860Z] 11:34:58 INFO - raptor-perftest Info: Merging profile: /opt/worker/tasks/task_167334929937991/build/tests/raptor/raptor/profile_data/base
[task 2023-01-10T11:34:58.865Z] 11:34:58 INFO - raptor-perftest Info: Merging profile: /opt/worker/tasks/task_167334929937991/build/tests/raptor/raptor/profile_data/common
[task 2023-01-10T11:34:58.869Z] 11:34:58 INFO - raptor-perftest Info: Merging profile: /opt/worker/tasks/task_167334929937991/build/tests/raptor/raptor/profile_data/perf
[task 2023-01-10T11:34:58.874Z] 11:34:58 INFO - raptor-perftest Info: Merging profile: /opt/worker/tasks/task_167334929937991/build/tests/raptor/raptor/profile_data/raptor
[task 2023-01-10T11:34:58.876Z] 11:34:58 INFO - raptor-perftest Info: Browser preferences: {'fission.autostart': True}
[task 2023-01-10T11:34:58.877Z] 11:34:58 INFO - raptor-perftest Info: Local browser profile: /var/folders/wz/g2zb5lgn6737640bv4sq9x6h000014/T/tmptchxz_6y/profile
[task 2023-01-10T11:34:58.877Z] 11:34:58 INFO - raptor-browsertime Info: Removing mozprofile delimiters from browser profile
[task 2023-01-10T11:34:58.877Z] 11:34:58 INFO - raptor-browsertime Info: cwd: '/opt/worker/tasks/task_167334929937991/build'
[task 2023-01-10T11:34:58.877Z] 11:34:58 INFO - raptor-browsertime Info: browsertime_node: /opt/worker/tasks/task_167334929937991/fetches/node/bin/node
[task 2023-01-10T11:34:58.878Z] 11:34:58 INFO - raptor-browsertime Info: browsertime_node: os.stat_result(st_mode=33261, st_ino=1660386008, st_dev=16777221, st_nlink=1, st_uid=36, st_gid=20, st_size=82563760, st_atime=1673350360, st_mtime=1670920349, st_ctime=1673350360)
[task 2023-01-10T11:34:58.878Z] 11:34:58 INFO - raptor-browsertime Info: browsertime_browsertimejs: /opt/worker/tasks/task_167334929937991/fetches/browsertime/node_modules/browsertime/bin/browsertime.js
[task 2023-01-10T11:34:58.878Z] 11:34:58 INFO - raptor-browsertime Info: browsertime_browsertimejs: os.stat_result(st_mode=33261, st_ino=1660383631, st_dev=16777221, st_nlink=1, st_uid=36, st_gid=20, st_size=6152, st_atime=1673350359, st_mtime=1671858889, st_ctime=1673350360)
[task 2023-01-10T11:34:58.879Z] 11:34:58 INFO - raptor-browsertime Info: browsertime_geckodriver: /opt/worker/tasks/task_167334929937991/fetches/geckodriver
[task 2023-01-10T11:34:58.879Z] 11:34:58 INFO - raptor-browsertime Info: browsertime_geckodriver: os.stat_result(st_mode=33261, st_ino=1660383337, st_dev=16777221, st_nlink=1, st_uid=36, st_gid=20, st_size=5366252, st_atime=1673350350, st_mtime=1672807224, st_ctime=1673350350)
[task 2023-01-10T11:34:58.879Z] 11:34:58 INFO - raptor-browsertime Info: browsertime_chromedriver: /opt/worker/tasks/task_167334929937991/fetches/{}chromedriver
[task 2023-01-10T11:34:58.880Z] 11:34:58 INFO - raptor-browsertime Info: browsertime_chromedriver: [Errno 2] No such file or directory: '/opt/worker/tasks/task_167334929937991/fetches/{}chromedriver'
[task 2023-01-10T11:34:58.880Z] 11:34:58 INFO - raptor-browsertime Info: browsertime_user_args: []
[task 2023-01-10T11:34:58.880Z] 11:34:58 INFO - raptor-browsertime Info: browsertime_user_args: stat: path should be string, bytes, os.PathLike or integer, not list
[task 2023-01-10T11:34:58.880Z] 11:34:58 INFO - raptor-perftest Info: starting test: wasm-misc-optimizing
[task 2023-01-10T11:34:58.881Z] 11:34:58 INFO - raptor-browsertime Info: setting test-specific Firefox preferences
[task 2023-01-10T11:34:58.881Z] 11:34:58 INFO - raptor-browsertime Info: Removing mozprofile delimiters from browser profile
[task 2023-01-10T11:34:58.889Z] 11:34:58 INFO - raptor-benchmark Info: Cloning the benchmarks to /var/folders/wz/g2zb5lgn6737640bv4sq9x6h000014/T/tmpi6msigff/performance-tests/perf-automation
[task 2023-01-10T11:34:58.893Z] 11:34:58 INFO - Cloning into '/private/var/folders/wz/g2zb5lgn6737640bv4sq9x6h000014/T/tmpi6msigff/performance-tests/perf-automation'...
[taskcluster:error] Aborting task...
[taskcluster:warn 2023-01-10T11:47:29.116Z] no such process
[taskcluster 2023-01-10T11:47:29.116Z] === Task Finished ===
[taskcluster 2023-01-10T11:47:29.116Z] Task Duration: 15m0.172843s
[taskcluster 2023-01-10T11:47:29.268Z] Uploading artifact public/logs/localconfig.json from file logs/localconfig.json with content encoding "gzip", mime type "application/json" and expiry 2023-04-10T09:40:10.629Z
[taskcluster 2023-01-10T11:47:29.522Z] Uploading redirect artifact public/logs/live.log to URL https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/EMUkgH7qQE2eAUwohKk7hw/runs/0/artifacts/public%2Flogs%2Flive_backing.log with mime type "text/plain; charset=utf-8" and expiry 2023-04-10T09:40:10.629Z
[taskcluster:error] Task aborted - max run time exceeded
Updated•2 years ago
|
Comment 1•2 years ago
|
||
Note that the problem here is with cloning the perf-automation repository:
[task 2023-01-10T11:34:58.889Z] 11:34:58 INFO - raptor-benchmark Info: Cloning the benchmarks to /var/folders/wz/g2zb5lgn6737640bv4sq9x6h000014/T/tmpi6msigff/performance-tests/perf-automation
[task 2023-01-10T11:34:58.893Z] 11:34:58 INFO - Cloning into '/private/var/folders/wz/g2zb5lgn6737640bv4sq9x6h000014/T/tmpi6msigff/performance-tests/perf-automation'...
[taskcluster:error] Aborting task...
[taskcluster:warn 2023-01-10T11:47:29.116Z] no such process
[taskcluster 2023-01-10T11:47:29.116Z] === Task Finished ===
Why is it stalled for nearly 13 minutes?
Updated•2 years ago
|
Updated•2 years ago
|
| Comment hidden (Intermittent Failures Robot) |
Comment 3•2 years ago
|
||
First occurrence so far: range of retriggers and backfills
Comment 4•2 years ago
|
||
Comment 5•2 years ago
|
||
Updated•2 years ago
|
Updated•2 years ago
|
| Comment hidden (Intermittent Failures Robot) |
Comment 7•2 years ago
|
||
Hi Nazim! Based on a try push linked to a patch that extends the runtime for the browsertime benchmark tests, looking at the log file, it seems that there might be an issue in the extra-profiler run that causes this timeout (hinted by the following line from the logfile for the try push job: [task 2023-01-20T14:42:40.409Z] 14:42:40 INFO - raptor-browsertime Info: Browsertime process for extra profiler run timed out after 2021 seconds). Do you have a suggestion for a potential fix for this, or how we can proceed with the investigation ?
| Comment hidden (Intermittent Failures Robot) |
Yes, it looks like there is a problem right after (or during) we stop the profiler. Inside the browsertime codebase we stop the profiler here: https://github.com/sitespeedio/browsertime/blob/7aa74245a5603a346a745c55dddddcb02d56cfd2/lib/firefox/geckoProfiler.js#L160-L164 And then start gzipping the profile data: https://github.com/sitespeedio/browsertime/blob/7aa74245a5603a346a745c55dddddcb02d56cfd2/lib/firefox/geckoProfiler.js#L171-L176 It's not 100% clear to me which one is failing because we don't have any logs right after we stop it, but I suspect that we are timing out in the latter part because it happens right after stopping the profiler and it doesn't have any timeout check.
I think we should add some more logs and add a timeout for that gzipping part just in case that part takes longer (just like this one. I can send a browsertime PR for that.
| Comment hidden (Intermittent Failures Robot) |
I sent a PR to browsertime to fix this issue and it's landed: https://github.com/sitespeedio/browsertime/pull/1896
Now, it should either fix this issue or give us more insight on the problem. The next step is to update our browsertime to use the latest version (I think). But I'm not sure how that works. Could you help me with that?
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 14•2 years ago
|
||
Hey Nazim,
As per your comment I wanted to followup and see if you still needed any help with next steps
(In reply to Nazım Can Altınova [:canova][:canaltinova on phabricator] from comment #11)
I sent a PR to browsertime to fix this issue and it's landed: https://github.com/sitespeedio/browsertime/pull/1896
Now, it should either fix this issue or give us more insight on the problem. The next step is to update our browsertime to use the latest version (I think). But I'm not sure how that works. Could you help me with that?
Comment 15•2 years ago
|
||
Browsertime was updated via bug 1817204, so we should wait to see if this improves our failure rate.
| Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Updated•2 years ago
|
| Comment hidden (Intermittent Failures Robot) |
Comment 18•2 years ago
|
||
:canova, do you think you could take another look at this failure? I think the timeout you added helped in some cases, but in these cases it looks like we're failing after the area of the timeout.
[task 2023-03-06T07:32:00.628Z] 07:32:00 INFO - raptor-browsertime Info: Skip setting default preferences for Firefox
[task 2023-03-06T07:32:03.274Z] 07:32:03 INFO - raptor-browsertime Info: Starting a browsertime benchamrk
[task 2023-03-06T07:32:03.275Z] 07:32:03 INFO - raptor-browsertime Info: Waiting for 1000 ms (post_startup_delay)
[task 2023-03-06T07:32:04.275Z] 07:32:04 INFO - raptor-browsertime Info: Navigating to about:blank
[task 2023-03-06T07:32:04.276Z] 07:32:04 INFO - raptor-browsertime Info: Navigating to url about:blank iteration 1
[task 2023-03-06T07:32:09.308Z] 07:32:09 INFO - raptor-browsertime Info: Cycle 0, waiting for 1000 ms
[task 2023-03-06T07:32:10.310Z] 07:32:10 INFO - raptor-browsertime Info: Cycle 0, starting the measure
[task 2023-03-06T07:32:10.311Z] 07:32:10 INFO - raptor-browsertime Info: Testing url http://127.0.0.1:49920/twitch-animation/index.html iteration 1
[task 2023-03-06T07:32:10.359Z] 07:32:10 INFO - raptor-browsertime Info: Start GeckoProfiler with features: ["js","stackwalk","cpu","screenshots"] and threads: ["GeckoMain","Compositor","Renderer"]
[task 2023-03-06T07:32:19.439Z] 07:32:19 INFO - raptor-browsertime Info: Collecting Gecko profile from C:\Users\task_1678061290\build\blobber_upload_dir\browsertime-results\twitch-animation\profiling\pages\127_0_0_1\twitch-animation\index.html\data\geckoProfile-1.json to C:\Users\task_1678061290\build\blobber_upload_dir\browsertime-results\twitch-animation\profiling\pages\127_0_0_1\twitch-animation\index.html\data\geckoProfile-1.json
[task 2023-03-06T07:32:21.883Z] 07:32:21 INFO - raptor-browsertime Info: Stop GeckoProfiler.
[task 2023-03-06T07:32:21.895Z] 07:32:21 INFO - raptor-browsertime Info: Gzip file the profile.
[task 2023-03-06T07:32:22.626Z] 07:32:22 INFO - raptor-browsertime Info: Done stopping GeckoProfiler.
[taskcluster:error] Aborting task...
[taskcluster 2023-03-06T07:44:02.260Z] SUCCESS: The process with PID 2676 (child process of PID 7152) has been terminated.
Updated•2 years ago
|
Comment 19•2 years ago
|
||
Along with the 10 failures here, there are also 17 failures in bug 1809667 that are related to this issue.
Hey :sparky!
It looks like the last profiler log message is being executed properly. So I don't think this is profiler code related. See the Done stopping GeckoProfiler. log here: https://github.com/sitespeedio/browsertime/blob/a8e8d133998c58947f49711b45717885b6351b14/lib/firefox/geckoProfiler.js#L184
When we arrive to this line, we are done with everything profiler related.
I specifically wanted to add this log at the end, to see if we are successfully exiting the profiler code. Because before that, we were only seeing a previous profiler log message but was never sure where it was failing.
So my theory is that, after this line, we return to this function and then we hang here somewhere: https://github.com/sitespeedio/browsertime/blob/a8e8d133998c58947f49711b45717885b6351b14/lib/firefox/webdriver/firefox.js#L121
I'm not sure where it's hanging here though, as we don't have log messages in that function. I suggest adding more logs here as well to understand what's happening.
Comment 21•2 years ago
|
||
Ok, thanks :canova!
:kshampur, can you look into this a bit to see if there's a way we could fix this? I think the best approach here would be to set a shorter timeout for the code that runs the profiler run, and ignore the profile if we fail: https://searchfox.org/mozilla-central/source/testing/raptor/raptor/browsertime/base.py#820-827
If we decrease the proc_timeout to 7 minutes at most, that should give enough time to complete a single test run for a profile in a passing case.
Comment 22•2 years ago
•
|
||
I tried to reproduce this but no luck as everything seems to be passing on multiple retriggers: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&author=kshampur%40mozilla.com&selectedTaskRun=HBsB8V2tQEWFveSTsaPUiw.0
and also I see from the latest orange factor graph it is now zero?
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2023-03-02&endday=2023-03-09&tree=trunk&failurehash=all&bug=1809652
and correspondingly, since this seeped into Bug 1809667, comment 42,
I see that the orange factor for that bug has also gone down considerably https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2023-03-02&endday=2023-03-09&tree=all&failurehash=all&bug=1809667
I will keep the ni? and keep monitoring
| Comment hidden (Intermittent Failures Robot) |
Comment 24•2 years ago
|
||
I see the failure rate is still high in CI but I am unable to reproduce on Try
recent Try run with rebuilds that are still largely passing
So I may just write a patch for the time out, and once landed monitor that to see if it helps m-c/autoland
| Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Description
•