1627889 - Intermittent Raptor, Talos [taskcluster:error] Aborting task... [hang while fetching artifacts]

Reporter

Description

6 years ago

We've seen a sudden increase in Raptor and Talos failures on a push whose changes were unrelated:
https://treeherder.mozilla.org/#/jobs?repo=autoland&collapsedPushes=676749&group_state=expanded&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception%2Crunnable&searchStr=performance&tochange=828ed052b1a9ee14fe39f7235ece9398812c7f4f&fromchange=542f439cf528e5c1bc6068ebc9ff5bfec05bd9f9

Retriggers are all green, which suggests this was most likely a transient hiccup.
I filed this bug to track the issue. I'm not sure the summary describes the failures correctly, so please change it if necessary.

Failure logs:

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 1

6 years ago

This happens for various artifacts like minidump_stackwalk.tar.xz and others. I wonder if it is still somewhat related to bug 1616556, but also for Windows now. Mike, any idea?

Flags: needinfo?(mh+mozilla)

Summary: Talos and Raptor tests timing out while fetching artifacts → Intermittent Raptor, Talos [taskcluster:error] Aborting task... [hang while fetching artifacts]

Mike Hommey [:glandium]

Comment 2

6 years ago

It doesn't seem related, and it's not only happening when downloading artifacts. Seems like problems with Windows workers?

Flags: needinfo?(mh+mozilla)

Joel Maher ( :jmaher ) (UTC -8)

Comment 3

6 years ago

:markco, was there reimaging of the workers recently that could affect this? I assume this is datacenter and not bitbar.

Flags: needinfo?(mcornmesser)

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

6 years ago

Priority: -- → P3

Mark Cornmesser [:markco]

Comment 4

6 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)

> :markco, was there reimaging of the workers recently that could affect this? I assume this is datacenter and not bitbar.

jmaher: There has been spot reimaging of nodes that had not been taking tasks, but no mass reimaging. It looks like these tasks are failing on missing files under c:\users\task_* directories. Those directories do not persist between reboots, since a new task user is used per task.

pmoore: Any ideas?

Flags: needinfo?(mcornmesser) → needinfo?(pmoore)

Henrik Skupin [:whimboo][⌚️UTC+2]

Updated

6 years ago

Keywords: intermittent-failure

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 7

6 years ago

The last failure of this type was seen on April 7th, or maybe more recent occurrences have been classified against another bug.

Pete Moore [:pmoore][:pete] [PTO until 13 April 2026]

Comment 9

6 years ago

The logs in comment 0 contain "[taskcluster:error] Task aborted - max run time exceeded" so it looks like the task maxRunTime is less than the amount of time the tasks require. This could be due to the task hanging, or taking longer than it previously did, or the maxRunTime being less than it previously was.
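To check a downloaded log for the marker quoted above, something like the following could be used. This is only a rough sketch: the function name `abort_reason` and its return labels are made up for illustration, and the two marker strings are the only ones taken from this bug.

```python
# Marker strings as they appear in this bug; everything else is hypothetical.
MAX_RUN_TIME_MARKER = "[taskcluster:error] Task aborted - max run time exceeded"
ABORT_MARKER = "[taskcluster:error] Aborting task..."


def abort_reason(lines):
    """Return a coarse label for why a task log ended.

    "max-run-time": the worker reported that maxRunTime was exceeded.
    "aborted":      the task was aborted, but no maxRunTime marker was found.
    None:           neither marker is present.
    """
    saw_abort = False
    for line in lines:
        if MAX_RUN_TIME_MARKER in line:
            return "max-run-time"
        if ABORT_MARKER in line:
            saw_abort = True
    return "aborted" if saw_abort else None


if __name__ == "__main__":
    sample = [
        "[taskcluster:error] Aborting task...",
        "[taskcluster:error] Task aborted - max run time exceeded",
    ]
    print(abort_reason(sample))
```

A "max-run-time" result does not say which of the three causes applies; it only confirms that the worker hit the limit.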

Flags: needinfo?(pmoore)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 10

6 years ago

No, here is one excerpt from the logs:

[fetches 2020-04-07T07:11:11.575Z] Removing C:\Users\task_1586238152\fetches\minidump_stackwalk.tar.xz
[taskcluster:error] Aborting task...
[taskcluster 2020-04-07T07:36:05.239Z] SUCCESS: The process with PID 356 (child process of PID 7416) has been terminated.

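There is no timestamp for the "Aborting task..." line, which would be great to have, by the way. But as the excerpt shows, the task hangs while removing minidump_stackwalk.tar.xz: nothing is logged between 07:11:11 and the process being terminated at 07:36:05.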
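To locate this kind of stall in a full log, one option is to look for the widest gap between consecutive timestamped lines. The sketch below assumes every timestamped line starts with a `[component YYYY-MM-DDTHH:MM:SS.mmmZ]` prefix, as in the excerpt above; `parse_stamp` and `largest_gap` are hypothetical helper names, not part of any Taskcluster tooling.

```python
import re
from datetime import datetime

# Matches prefixes such as "[fetches 2020-04-07T07:11:11.575Z]".
# Lines like "[taskcluster:error] Aborting task..." carry no timestamp and do not match.
STAMP = re.compile(r"^\[(\w+) (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\]")


def parse_stamp(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(2), "%Y-%m-%dT%H:%M:%S.%f")


def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the widest gap
    between consecutive timestamped lines, or None if there are fewer than two."""
    best = None
    prev_line, prev_time = None, None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # skip lines without a timestamp
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_line, prev_time = line, stamp
    return best


# The three lines quoted in this comment.
EXCERPT = [
    r"[fetches 2020-04-07T07:11:11.575Z] Removing C:\Users\task_1586238152\fetches\minidump_stackwalk.tar.xz",
    "[taskcluster:error] Aborting task...",
    "[taskcluster 2020-04-07T07:36:05.239Z] SUCCESS: The process with PID 356 (child process of PID 7416) has been terminated.",
]

if __name__ == "__main__":
    seconds, before, after = largest_gap(EXCERPT)
    print(f"{seconds:.3f}s of silence after: {before}")
```

For the excerpt above the gap is about 1493.7 seconds, just under 25 minutes, and the line before it is the "Removing ... minidump_stackwalk.tar.xz" step.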

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

6 years ago

Severity: normal → S3

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

5 years ago

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → INCOMPLETE

Categories

(Testing :: Talos, defect, P3)

Tracking

(Not tracked)

People

(Reporter: malexandru, Unassigned)

References

Details

(Keywords: intermittent-failure)

Crash Data

Security

(public)
