Intermittent Raptor, Talos [taskcluster:error] Aborting task... [hang while fetching artifacts]
Categories
(Testing :: Talos, defect, P3)
Tracking
(Not tracked)
People
(Reporter: malexandru, Unassigned)
Details
(Keywords: intermittent-failure)
We've seen a sudden increase in Raptor and Talos failures on a push whose changes were unrelated:
https://treeherder.mozilla.org/#/jobs?repo=autoland&collapsedPushes=676749&group_state=expanded&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception%2Crunnable&searchStr=performance&tochange=828ed052b1a9ee14fe39f7235ece9398812c7f4f&fromchange=542f439cf528e5c1bc6068ebc9ff5bfec05bd9f9
All retriggers are green, so this was most likely a transient hiccup.
I filed this bug to track this specific issue. I'm not sure the summary describes the failures correctly, so please change it if necessary.
Failure logs:
Comment 1•6 years ago
This happens for various artifacts, such as minidump_stackwalk.tar.xz. I wonder if it is still somewhat related to bug 1616556, but now also on Windows. Mike, any idea?
Comment 2•6 years ago
It doesn't seem related, and it doesn't only happen when downloading artifacts. Could this be a problem with the Windows workers?
Comment 3•6 years ago
:markco, was there reimaging of the workers recently that could affect this? I assume this is datacenter and not bitbar.
Updated•6 years ago
Comment 4•6 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)
:markco, was there reimaging of the workers recently that could affect this? I assume this is datacenter and not bitbar.
jmaher: There has been spot reimaging of nodes that were not taking tasks, but no mass reimaging. It looks like these tasks are failing on missing files under the c:\users\task_* directories. Those directories do not persist between reboots, since a new task user is created for each task.
pmoore: Any ideas?
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Updated•6 years ago
Comment 7•6 years ago
The last failure of this type was seen on April 7th, although more recent occurrences may have been classified against another bug.
Comment hidden (Intermittent Failures Robot)
The logs in comment 0 contain "[taskcluster:error] Task aborted - max run time exceeded", so the task's maxRunTime appears to be shorter than the time the task needs. This could be because the task hangs, because it now takes longer than it used to, or because maxRunTime is lower than it used to be.
Comment 10•5 years ago
No, here is an excerpt from the logs:
[fetches 2020-04-07T07:11:11.575Z] Removing C:\Users\task_1586238152\fetches\minidump_stackwalk.tar.xz
[taskcluster:error] Aborting task...
[taskcluster 2020-04-07T07:36:05.239Z] SUCCESS: The process with PID 356 (child process of PID 7416) has been terminated.
There is no timestamp on "Aborting task...", which would be great to have, but the log shows the task hanging while removing minidump_stackwalk.tar.xz: nearly 25 minutes pass between that line and the termination of the process.
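Since "Aborting task..." carries no timestamp, one way to locate a hang like this is to look for the longest silence between timestamped lines. Below is a minimal Python sketch, assuming every timestamped line follows the `[<source> <ISO-8601>Z] message` shape seen in the excerpt; the regex and the helper names (`parse_timestamped`, `longest_gap`) are hypothetical and not part of Taskcluster or generic-worker.

```python
import re
from datetime import datetime, timezone

# The excerpt from this comment; the "Aborting task..." line has no timestamp.
LOG = """\
[fetches 2020-04-07T07:11:11.575Z] Removing C:\\Users\\task_1586238152\\fetches\\minidump_stackwalk.tar.xz
[taskcluster:error] Aborting task...
[taskcluster 2020-04-07T07:36:05.239Z] SUCCESS: The process with PID 356 (child process of PID 7416) has been terminated.
"""

# Assumed format: "[<source> <UTC timestamp>Z] <message>". Untimestamped lines do not match.
LINE_RE = re.compile(
    r"^\[(?P<source>\S+) (?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z\] (?P<msg>.*)$"
)


def parse_timestamped(log: str):
    """Yield (source, datetime, message) for every line that carries a timestamp."""
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f").replace(tzinfo=timezone.utc)
            yield m["source"], ts, m["msg"]


def longest_gap(log: str):
    """Return (gap_seconds, message_before_gap) for the largest silence between timestamped lines."""
    entries = list(parse_timestamped(log))
    best = (0.0, None)
    for (_, t0, msg0), (_, t1, _) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, msg0)
    return best


gap, msg = longest_gap(LOG)
print(f"{gap:.0f} s of silence after: {msg}")
```

On this excerpt the largest gap (about 1494 s) follows the "Removing ...minidump_stackwalk.tar.xz" line. Gap detection is used here because it needs no knowledge of the task's maxRunTime; comparing the gap against that limit would require the task definition, which is not in the excerpt.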
Updated•5 years ago
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Updated•5 years ago