Intermittent [taskcluster:error] Task aborted - max run time exceeded [taskcluster:error] Task timeout after X seconds. Force killing container.due to slow cloning
Categories
(Developer Services :: Mercurial: hg.mozilla.org, defect, P2)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Assigned: markco)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])
Attachments
(1 file)
![]() |
||
Updated•6 years ago
|
![]() |
||
Updated•6 years ago
|
![]() |
||
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 7•6 years ago
|
||
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 10•6 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 13•6 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 58•5 years ago
•
|
||
Today the Windows ccov job failed 4 times because cloning was slow (15-30 minutes). For the 99.9% percentile, it's 10 minutes: https://sql.telemetry.mozilla.org/queries/65011/source
Classifications in this bug indicate this mostly affects Linux.
Connor, can you check if their is a pattern for the log clone times?
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 63•5 years ago
|
||
The only pattern I seem to see are tasks taking a long time during the working directory update stage (ie writing all the files to disk once the worker has a copy of the repo itself). A few examples:
10 minutes to 124400/283483 files.
20 minutes to 200100/283497 files.
10 minutes to 191261/27776 files. Interestingly this one slows down considerably about halfway through.
An aside: apologies for taking so long to respond to your needinfo.
![]() |
||
Comment 64•5 years ago
|
||
The affected machines don't show higher failure rates for other tasks: https://sql.telemetry.mozilla.org/queries/65552/source
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 71•5 years ago
•
|
||
There are 35 total failures in the last 7 days and 114 total failures in the last 30 days on linux64-shippable and linux32-shippable opt.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=278169683&repo=autoland&lineNumber=343
fetches 2019-11-26T06:41:32.740Z] Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/amJy-JSaSQeIAec8oSLdAg/artifacts/public/build/target.tar.bz2
[fetches 2019-11-26T06:41:32.961Z] https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ZYGge5qsTr6LK-0URjRksA/artifacts/public/build/minidump_stackwalk.tar.xz resolved to 1059240 bytes with sha256 434e9ef66fd15c388a8d32141ec80de91d1434d543b4094d3efdf8a42610216d in 0.222s
[fetches 2019-11-26T06:41:32.961Z] Extracting /builds/worker/fetches/minidump_stackwalk.tar.xz to /builds/worker/fetches
[fetches 2019-11-26T06:41:33.058Z] /builds/worker/fetches/minidump_stackwalk.tar.xz extracted in 0.097s
[fetches 2019-11-26T06:41:33.058Z] Removing /builds/worker/fetches/minidump_stackwalk.tar.xz
[fetches 2019-11-26T06:41:35.702Z] https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/amJy-JSaSQeIAec8oSLdAg/artifacts/public/build/target.crashreporter-symbols.zip resolved to 46728139 bytes with sha256 dc4b4ee3c7dd901f90d90c490947613ce01f657efe87d416ec4925313296cb23 in 2.962s
[fetches 2019-11-26T06:41:35.702Z] Extracting /builds/worker/fetches/target.crashreporter-symbols.zip to /builds/worker/fetches
[fetches 2019-11-26T06:41:35.704Z] Archive: /builds/worker/fetches/target.crashreporter-symbols.zip
[fetches 2019-11-26T06:41:35.705Z] inflating: certutil/76E7CD108CC27257328B56C47CD334100/certutil.sym
[fetches 2019-11-26T06:41:35.709Z] inflating: crashreporter/A1FFF060CE84FE55D994BCEE60B9A50B0/crashreporter.sym
[fetches 2019-11-26T06:41:35.709Z] inflating: firefox/03EE84E97F81759D018D295F1A6A93B50/firefox.sym
[fetches 2019-11-26T06:41:35.718Z] inflating: firefox-bin/71AD26377FE876DE485640E729CA02ED0/firefox-bin.sym
[fetches 2019-11-26T06:41:35.719Z] inflating: libclearkey.so/580AE5298498DDFBB4125FF926C5574C0/libclearkey.so.sym
[fetches 2019-11-26T06:41:35.719Z] inflating: libfake.so/6B7A3E79866BEFFB4000C16C5BCE149F0/libfake.so.sym
[fetches 2019-11-26T06:41:35.720Z] inflating: libfakeopenh264.so/5435E7B18F1ED2629787D95AF83915BB0/libfakeopenh264.so.sym
[fetches 2019-11-26T06:41:35.720Z] inflating: libfreebl3.so/FBF49429510AD4BC4D58CE77A42937EB0/libfreebl3.so.sym
[fetches 2019-11-26T06:41:35.723Z] inflating: libfreeblpriv3.so/33F589E9D4312E2800BF25B0D078AE0A0/libfreeblpriv3.so.sym
[fetches 2019-11-26T06:41:35.724Z] inflating: liblgpllibs.so/678F58EE229BE9F45E61281490864D420/liblgpllibs.so.sym
[fetches 2019-11-26T06:41:35.930Z] inflating: libmozavcodec.so/E7854067DA8B6284B2FAB7D9BAC69C840/libmozavcodec.so.sym
[fetches 2019-11-26T06:41:35.932Z] inflating: libmozavutil.so/60815712CF6559ADCAD034645D02E4250/libmozavutil.so.sym
[fetches 2019-11-26T06:41:35.932Z] inflating: libmozgtk.so/57A90437BF2EF72D1EA14AC4E44926E30/libmozgtk.so.sym
[fetches 2019-11-26T06:41:35.932Z] inflating: libmozgtk.so/C5FE8C993BA0F6B5E77A2BCB71D84D2E0/libmozgtk.so.sym
[fetches 2019-11-26T06:41:35.933Z] inflating: libmozgtk_stub.so/3A4D6E41EBE27A9813BA4A152F985A900/libmozgtk_stub.so.sym
[fetches 2019-11-26T06:41:35.937Z] inflating: libmozsandbox.so/71E6EFC3EED512540C336A02A6F870560/libmozsandbox.so.sym
[fetches 2019-11-26T06:41:35.950Z] inflating: libmozsqlite3.so/8D3704DF18441DC39B430C977EF9E2960/libmozsqlite3.so.sym
[fetches 2019-11-26T06:41:35.950Z] inflating: libmozwayland.so/C79186E09B6D1CC1D2B66ADC1B65656D0/libmozwayland.so.sym
[fetches 2019-11-26T06:41:35.953Z] inflating: libnspr4.so/E8DCACEE7AC116F7E4F9E33441AF2BE00/libnspr4.so.sym
[fetches 2019-11-26T06:41:36.158Z] inflating: libnss3.so/F418568F16AF15E5C26C1140D89813C10/libnss3.so.sym
[fetches 2019-11-26T06:41:36.160Z] inflating: libnssckbi.so/5402DB9A5BB9B1CDD9919C51B7F0BE190/libnssckbi.so.sym
[fetches 2019-11-26T06:41:36.161Z] inflating: libnssdbm3.so/A1B4CC76EF52CCF1886F6A767FE2ADDD0/libnssdbm3.so.sym
[fetches 2019-11-26T06:41:36.162Z] inflating: libnssutil3.so/53F461198067A7408B6CE1A949EBC6F00/libnssutil3.so.sym
[fetches 2019-11-26T06:41:36.163Z] inflating: libplc4.so/65C1756078EA29F37A00CF496A220EA50/libplc4.so.sym
[fetches 2019-11-26T06:41:36.163Z] inflating: libplds4.so/AD755F5643F033D4E2642A673FAE52DC0/libplds4.so.sym
[fetches 2019-11-26T06:41:36.165Z] inflating: libsmime3.so/CDC81B706813F94C9C45B426FBE819130/libsmime3.so.sym
[fetches 2019-11-26T06:41:36.167Z] inflating: libsoftokn3.so/4C0994035F8D41A6DD0EE5E2F4DE67690/libsoftokn3.so.sym
[fetches 2019-11-26T06:41:36.171Z] inflating: libssl3.so/E2154F25538C375DDF46F2532F9552980/libssl3.so.sym
[fetches 2019-11-26T06:42:14.528Z] inflating: libxul.so/87D3F7C9536291DA200AFD00DA50798F0/libxul.so.sym https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/amJy-JSaSQeIAec8oSLdAg/artifacts/public/build/target.tar.bz2 resolved to 117839081 bytes with sha256 f0510d5434f4563bd33b8e8a89506e25a07433d5527fd839f63b562d64a322fd in 41.788s
[fetches 2019-11-26T06:42:14.528Z] Extracting /builds/worker/fetches/target.tar.bz2 to /builds/worker/fetches
[fetches 2019-11-26T06:43:43.460Z] https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/JtZkq06jSIOAKM4JBolPSg/artifacts/public/build/clang.tar.xz resolved to 182985204 bytes with sha256 950192ece3a8f114f22faab48f5b1c36fe75f2ac2c9dff729bc1361e19b5998e in 130.720s
[fetches 2019-11-26T06:43:43.460Z] Extracting /builds/worker/fetches/clang.tar.xz to /builds/worker/fetches
[taskcluster:error] Task timeout after 1200 seconds. Force killing container.
[taskcluster 2019-11-26 06:44:35.798Z] === Task Finished ===
[taskcluster 2019-11-26 06:44:35.798Z] Unsuccessful task run with exit code: -1 completed in 1236.448 seconds
Connor can you assign someone to work on this?
Comment hidden (Intermittent Failures Robot) |
Comment 73•5 years ago
|
||
In the last 7 days there were 22 occurrences most on linux32-shippable, some on linux64-shippable, all on build type opt.
Recent failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=279773029&repo=autoland&lineNumber=586
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Comment 118•3 years ago
|
||
See also the comments on bug 1580652 comment 112 up to bug 1580652 comment 118.
(In reply to Connor Sheehan [:sheehan] from comment #118)
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #114)
(In reply to Connor Sheehan [:sheehan] from comment #113)
IIRC the shared repository is a
.hg
directory mounted into the worker without a working directory checkout. This.hg
directory is shared across task containers so each container can use the repo to create a working directory checkout for the task. If the worker doesn't have a copy of the repo in it's shared mount, it has to do a full clone to get one. Later tasks running on the same worker re-use this cached repo to do their working directory checkouts. So we essentially do a full clone when we spawn new workers.How often do we create workers? It feels like that at some times during the day it's happening more often than in others, so that failures for test jobs are more likely to hit.
I'm not sure to be honest. We ought to have some telemetry that can answer these questions. I'm not entirely sure if that exists or where it might live.
Also given that the initial clone from a remote location can take such a long time would it be possible to already run this command when a new worker is getting created?
This was actually one of the things I had in mind - moving the clone time out of task-execution time and into worker-creation time, so the long clone time isn't developer facing. Then vcs operations in tasks would be much more consistent in their runtimes. This may affect some assumptions about how fast we can stand up workers when demand arises, so it's likely a more involved change I'll be looking into next year.
If the issue with these exceptionally slow tasks is filesystem/storage related we will obviously still see slow vcs operations, so that area is ripe for optimization regardless of whether we move the clone to worker-startup time.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 132•2 years ago
•
|
||
There have been 42 total failures in the last 7 days, recent failure log.
Affected platforms are:
- windows10-64
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 135•2 years ago
|
||
There have been 30 total failures in the last 7 days, recent failure log.
Affected platforms are:
- windows10-64
- android-hw-a51-11-0-aarch64-shippable-qr
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 143•2 years ago
|
||
There have been 42 total failures in the last 7 days, recent failure log.
Affected platforms are:
- android-hw-a51-11-0-aarch64-shippable-qr
- windows10-64
[vcs 2022-10-08T10:54:41.500Z] updating [=================================> ] 237000/327286 8m42s
[vcs 2022-10-08T10:54:42.708Z] updating [=================================> ] 237200/327286 8m40s
[vcs 2022-10-08T10:54:43.750Z] updating [=================================> ] 237400/327286 8m39s
[vcs 2022-10-08T10:54:45.272Z] updating [=================================> ] 237500/327286 8m42s
[vcs 2022-10-08T10:54:46.775Z] updating [=================================> ] 237700/327286 8m44s
[vcs 2022-10-08T10:54:47.840Z] updating [=================================> ] 237900/327286 8m46s
[vcs 2022-10-08T10:54:48.977Z] updating [=================================> ] 238200/327286 8m38s
[taskcluster:error] Aborting task...
[taskcluster 2022-10-08T10:54:50.641Z] SUCCESS: The process with PID 6432 (child process of PID 6164) has been terminated.
[taskcluster 2022-10-08T10:54:50.641Z] SUCCESS: The process with PID 5268 (child process of PID 8128) has been terminated.
[taskcluster 2022-10-08T10:54:50.641Z] SUCCESS: The process with PID 6164 (child process of PID 8128) has been terminated.
[taskcluster 2022-10-08T10:54:50.641Z] SUCCESS: The process with PID 8128 (child process of PID 4448) has been terminated.
[taskcluster 2022-10-08T10:54:50.641Z]
[taskcluster 2022-10-08T10:54:50.642Z] === Task Finished ===
[taskcluster 2022-10-08T10:54:50.642Z] Task Duration: 1h0m0.2658106s
[taskcluster 2022-10-08T10:54:50.651Z] [mounts] Preserving cache: Moving "Z:\\task_166522274304904\\build" to "Z:\\caches\\NWV9cMErQoOa8tIiNk8_yA"
[taskcluster 2022-10-08T10:54:50.991Z] [mounts] Denying task_166522274304904 access to 'Z:\caches\NWV9cMErQoOa8tIiNk8_yA'
[taskcluster 2022-10-08T10:55:45.346Z] Uploading link artifact public/logs/live.log to artifact public/logs/live_backing.log with expiry 2023-01-06T09:39:28.623Z
[taskcluster:error] Task aborted - max run time exceeded
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 145•2 years ago
|
||
Kash, could the browsertime test running on A51 be made more reliable?
Comment 146•2 years ago
|
||
:aryx Looking at the most recent week-to-date orange factor graph, looks like the btime failures have gone down, especially since Oct 6th onwards.
Regardless, I see some overlap of the a51 btime failures here and recent a51 failures in Bug 1589796 (which we are actively looking into, best case the solution for that works for this)
I will bring up this bug in our Triage meeting next week.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 151•2 years ago
|
||
a51 btime failures have gone down a lot. I think resetting the docker on all hosts, as done in Bug 1589796, comment548, has helped with this. Feel free to ni? me or someone else from my team if this reoccurs
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 154•2 years ago
|
||
There have been 30 total failures in the last 7 days, recent failure log.
Affected platforms are:
- windows10-64
[vcs 2022-11-12T10:52:08.088Z] updating [=======================================> ] 286400/332294 9m38s
[vcs 2022-11-12T10:52:09.553Z] updating [=======================================> ] 286500/332294 9m36s
[vcs 2022-11-12T10:52:10.829Z] updating [=======================================> ] 286600/332294 9m37s
[vcs 2022-11-12T10:52:12.195Z] updating [=======================================> ] 286700/332294 9m35s
[vcs 2022-11-12T10:52:13.606Z] updating [=======================================> ] 286800/332294 9m35s
[vcs 2022-11-12T10:52:14.893Z] updating [=======================================> ] 286900/332294 9m35s
[vcs 2022-11-12T10:52:16.298Z] updating [=======================================> ] 287000/332294 9m34s
[vcs 2022-11-12T10:52:17.920Z] updating [=======================================> ] 287100/332294 9m34s
[taskcluster:error] Aborting task...
[taskcluster 2022-11-12T10:52:18.399Z] SUCCESS: The process with PID 8432 (child process of PID 3532) has been terminated.
[taskcluster 2022-11-12T10:52:18.399Z] SUCCESS: The process with PID 256 (child process of PID 6584) has been terminated.
[taskcluster 2022-11-12T10:52:18.399Z] SUCCESS: The process with PID 3532 (child process of PID 6584) has been terminated.
[taskcluster 2022-11-12T10:52:18.399Z] SUCCESS: The process with PID 6584 (child process of PID 6592) has been terminated.
[taskcluster 2022-11-12T10:52:18.399Z]
[taskcluster 2022-11-12T10:52:18.399Z] === Task Finished ===
[taskcluster 2022-11-12T10:52:18.399Z] Task Duration: 1h0m0.163866s
[taskcluster 2022-11-12T10:52:18.399Z] [mounts] Preserving cache: Moving "Z:\\task_166823680117229\\build" to "Z:\\caches\\ILgLDzM5RhebSfLPMr0BNw"
[taskcluster 2022-11-12T10:52:18.879Z] [mounts] Denying task_166823680117229 access to 'Z:\caches\ILgLDzM5RhebSfLPMr0BNw'
[taskcluster 2022-11-12T10:53:13.089Z] Uploading link artifact public/logs/live.log to artifact public/logs/live_backing.log with expiry 2023-02-10T09:51:01.941Z
[taskcluster:error] Task aborted - max run time exceeded
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 164•2 years ago
|
||
There have been 42 total failures in the last 7 days, recent failure log.
Affected platforms are:
- windows10-64
[vcs 2023-01-04T22:42:12.157Z] updating [======================================> ] 285900/335224 11m57s
[vcs 2023-01-04T22:42:13.645Z] updating [======================================> ] 286000/335224 11m53s
[vcs 2023-01-04T22:42:14.868Z] updating [======================================> ] 286100/335224 11m52s
[vcs 2023-01-04T22:42:16.181Z] updating [======================================> ] 286200/335224 11m48s
[vcs 2023-01-04T22:42:17.714Z] updating [======================================> ] 286300/335224 11m45s
[vcs 2023-01-04T22:42:19.219Z] updating [======================================> ] 286400/335224 11m45s
[vcs 2023-01-04T22:42:20.572Z] updating [======================================> ] 286500/335224 11m44s
[vcs 2023-01-04T22:42:21.867Z] updating [======================================> ] 286600/335224 11m42s
[vcs 2023-01-04T22:42:23.159Z] updating [======================================> ] 286700/335224 11m38s
[vcs 2023-01-04T22:42:24.617Z] updating [======================================> ] 286800/335224 11m35s
[taskcluster:error] Aborting task...
[taskcluster 2023-01-04T22:42:25.937Z] SUCCESS: The process with PID 7712 (child process of PID 8700) has been terminated.
[taskcluster 2023-01-04T22:42:25.937Z] SUCCESS: The process with PID 8176 (child process of PID 1576) has been terminated.
[taskcluster 2023-01-04T22:42:25.937Z] SUCCESS: The process with PID 8700 (child process of PID 1576) has been terminated.
[taskcluster 2023-01-04T22:42:25.937Z] SUCCESS: The process with PID 1576 (child process of PID 4408) has been terminated.
[taskcluster 2023-01-04T22:42:25.937Z]
[taskcluster 2023-01-04T22:42:25.937Z] === Task Finished ===
[taskcluster 2023-01-04T22:42:25.938Z] Task Duration: 1h0m0.2030385s
[taskcluster 2023-01-04T22:42:25.938Z] [mounts] Preserving cache: Moving "Z:\\task_167286411738299\\build" to "Z:\\caches\\XDYx5yC0TfuQ7T4VYwAVyQ"
[taskcluster 2023-01-04T22:42:26.465Z] [mounts] Denying task_167286411738299 access to 'Z:\caches\XDYx5yC0TfuQ7T4VYwAVyQ'
[taskcluster 2023-01-04T22:43:21.201Z] Uploading link artifact public/logs/live.log to artifact public/logs/live_backing.log with expiry 2024-01-04T21:42:06.103Z
[taskcluster:error] Task aborted - max run time exceeded
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 178•2 years ago
|
||
In the last 14+ hours, several machines (both northcentralus and southcentralus datacenters) which ran Windows source tasks (linting, python tests) failed permanently during cloning:
vm-pa6xdm9qt6zw8jej6opwut9xgsogs460654
vm-ec7vjb9hryy1gthn3tu5vwbwdkqc6cqhi6h
vm-wq105tgwrygycmzzob0uyqc4lybzioqluiw
vm-fgobhmnhr2sstpqjkjm3lgiwtef8imtkmwo
vm-m7y3n7oftsk4gkbjax38qsefrgbsxtwmxuq
jcristau mentioned bug 1766735 as change to the workers, no idea how this could be related. Mark, could you investigate? This query will list machines which failed their last 3+ tasks.
Assignee | ||
Comment 179•2 years ago
|
||
Updated•2 years ago
|
Assignee | ||
Comment 180•2 years ago
|
||
I am not sure why increasing the disk would cause this behavior. I have a phab patch in to increase the disk size and to use a premium SSD. Hopefully this will improve the performance of the clone.
Fortunately we have improvements coming with the future win11-2009 and win10-2009 source pools.
Comment 181•2 years ago
|
||
Comment 182•2 years ago
|
||
Tentatively calling this fixed.
Comment hidden (Intermittent Failures Robot) |
Description
•