Intermittent [worker:error] distutils.errors.DistutilsFileError: cannot copy tree '/builds/worker/artifacts': not a directory

RESOLVED FIXED in Firefox -esr60

Status

P5
normal
RESOLVED FIXED
4 months ago
a month ago

People

(Reporter: intermittent-bug-filer, Assigned: dragrom)

Tracking

({intermittent-failure})

Version 3
mozilla65
intermittent-failure
Points:
---

Firefox Tracking Flags

(firefox-esr60 fixed, firefox64 fixed, firefox65 fixed)

Details

(Whiteboard: [stockwell disable-recommended])

Attachments

(1 attachment)

Comment 1

4 months ago
possibly due to Bug 1474570
See Also: → bug 1474570
(In reply to Bob Clary [:bc:] from comment #1)
> possibly due to Bug 1474570

Yeah, this certainly is the cause. Sorry that I didn't catch this in review.

This broken task runs on taskcluster-worker ("provisionerId": "proj-autophone", "workerType": "gecko-t-ap-unit-p2"), so it looks like the taskcluster-worker implementation has been broken during the migration from taskcluster-worker to generic-worker for linux talos tasks.

I suspect this will be a relatively simple fix that we can roll out quickly.

The points of interest are:

In the logs, I see:

+ : WORKING_DIR /builds/worker/workspace
+ : WORKSPACE /builds/worker/workspace

From the task definition https://queue.taskcluster.net/v1/task/Ag2De52ITG6fa3SEaS2PBQ I see "WORKSPACE" is set to "/builds/worker/workspace" and WORKING_DIR isn't set, so it defaults to the current directory. It looks like taskcluster-worker runs processes from the /builds/worker/workspace directory, but I'll have to check the taskcluster-worker implementation to see whether it uses the "WORKSPACE" env var or chooses this path some other way (such as hardcoding ~/workspace).

I suspect the solution will be to pass in both WORKING_DIR _instead_ of WORKSPACE, with `WORKING_DIR=/builds/worker`. That should work with the updated test-linux.sh script.
Note, longer term, the preferred fix is to migrate to generic-worker from taskcluster-worker (bug 1488392) - I believe project-autophone tasks are the last remaining tasks that run on taskcluster-worker.
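
To make the defaulting behaviour described above concrete, here is a minimal Python sketch. It is not the actual test-linux.sh logic. It assumes the script only fills in WORKING_DIR and WORKSPACE when they are unset, that WORKING_DIR falls back to the current directory, and (an assumption not confirmed in this bug) that WORKSPACE falls back to a `workspace` directory under WORKING_DIR. `resolve_dirs` is a hypothetical helper name.

```python
import posixpath


def resolve_dirs(env, cwd):
    """Return (WORKING_DIR, WORKSPACE) after applying defaults.

    Values already present in env are never overwritten.
    """
    working_dir = env.get("WORKING_DIR") or cwd
    # Assumption: an unset WORKSPACE defaults to <WORKING_DIR>/workspace.
    workspace = env.get("WORKSPACE") or posixpath.join(working_dir, "workspace")
    return working_dir, workspace


# Failing task: only WORKSPACE is set and the process starts in the workspace,
# so WORKING_DIR ends up pointing at the workspace as well (as in the log).
broken = resolve_dirs({"WORKSPACE": "/builds/worker/workspace"},
                      cwd="/builds/worker/workspace")

# Proposed fix: pass WORKING_DIR explicitly instead of WORKSPACE.
fixed = resolve_dirs({"WORKING_DIR": "/builds/worker"},
                     cwd="/builds/worker/workspace")

print(broken)
print(fixed)
```

Under these assumptions the failing configuration resolves both variables to /builds/worker/workspace, while the proposed configuration keeps WORKING_DIR at /builds/worker.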

Comment 4

4 months ago
That is high on my list of todos and getting higher every minute. ;-)
(In reply to Bob Clary [:bc:] from comment #4)
> That is high on my list of todos and getting higher every minute. ;-)

Haha, no worries! :-)

Typo in comment 2:

> I suspect the solution will be to pass in both WORKING_DIR _instead_ of
> WORKSPACE, with `WORKING_DIR=/builds/worker`. That should work with the
> updated test-linux.sh script.

should have been:

> I suspect the solution will be to pass in WORKING_DIR _instead_ of
> WORKSPACE, with `WORKING_DIR=/builds/worker`. That should work with the
> updated test-linux.sh script.

I've created https://tools.taskcluster.net/groups/CsEKkSVZSYKzwFoZ5POEIA/tasks/CsEKkSVZSYKzwFoZ5POEIA/details to test this hypothesis. It is a copy of https://queue.taskcluster.net/v1/task/Ag2De52ITG6fa3SEaS2PBQ but with the env vars changed; I removed WORKSPACE and set WORKING_DIR to /builds/worker.

Let's see how it goes!
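
For reference, the env var change made to the copied task can be sketched as below. This is only an illustration: `with_working_dir` is a hypothetical helper, and the sketch assumes the environment variables live under `payload.env` in the task definition.

```python
import copy


def with_working_dir(task, working_dir="/builds/worker"):
    """Return a copy of task with WORKSPACE removed and WORKING_DIR set."""
    patched = copy.deepcopy(task)
    # Assumption: env vars are stored under payload.env.
    env = patched.setdefault("payload", {}).setdefault("env", {})
    env.pop("WORKSPACE", None)
    env["WORKING_DIR"] = working_dir
    return patched


original = {
    "provisionerId": "proj-autophone",
    "workerType": "gecko-t-ap-unit-p2",
    "payload": {"env": {"WORKSPACE": "/builds/worker/workspace"}},
}
patched = with_working_dir(original)
print(patched["payload"]["env"])
```

The original task dictionary is left untouched, so the two definitions can be compared side by side.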

Comment 7

4 months ago
We might need to update the bitbar docker container to handle WORKING_DIR. If WORKSPACE is not specified, it will set it to /builds/worker/workspace and pass WORKSPACE to the taskcluster-worker's environment but it won't know about WORKING_DIR and won't pass it at all. I have to run out to an appointment this morning and will be gone for 2-3 hours. I'll check back when I return.
(In reply to Pete Moore [:pmoore][:pete] from comment #6)

> I've created
> https://tools.taskcluster.net/groups/CsEKkSVZSYKzwFoZ5POEIA/tasks/
> CsEKkSVZSYKzwFoZ5POEIA/details to test this hypothesis.

This task is still pending after 20 minutes - does your tool to spawn new workers fetch the pending count from here?

  https://queue.taskcluster.net/v1/pending/proj-autophone/gecko-t-ap-unit-p2

I had a vague memory that maybe it queries treeherder for pending tasks, but this task won't appear on treeherder, so it might be better to fetch the pending count directly from taskcluster.

Many thanks!
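
A rough sketch of what fetching the pending count directly from the queue could look like, in case it helps. The URL shape is the one quoted above; the `pendingTasks` field name is an assumption about the response body and should be checked against the queue API documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

QUEUE_BASE = "https://queue.taskcluster.net/v1"


def pending_url(provisioner_id, worker_type, base=QUEUE_BASE):
    """Build the pending-count URL for a provisioner/worker type pair."""
    return "{}/pending/{}/{}".format(
        base, quote(provisioner_id, safe=""), quote(worker_type, safe=""))


def parse_pending(body):
    """Extract the pending count from a JSON response body."""
    # Assumption: the count is reported in a "pendingTasks" field.
    return int(json.loads(body)["pendingTasks"])


def fetch_pending(provisioner_id, worker_type, timeout=30):
    """Query the queue over HTTP and return the pending count."""
    url = pending_url(provisioner_id, worker_type)
    with urlopen(url, timeout=timeout) as resp:
        return parse_pending(resp.read().decode("utf-8"))


url = pending_url("proj-autophone", "gecko-t-ap-unit-p2")
# Made-up sample body for illustration only, not real queue output.
sample = '{"provisionerId": "proj-autophone", "workerType": "gecko-t-ap-unit-p2", "pendingTasks": 3}'
print(url)
print(parse_pending(sample))
```

Keeping URL construction and parsing separate from the HTTP call means the logic can be checked without touching the network.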
(In reply to Bob Clary [:bc:] from comment #7)
> We might need to update the bitbar docker container to handle WORKING_DIR.
> If WORKSPACE is not specified, it will set it to /builds/worker/workspace
> and pass WORKSPACE to the taskcluster-worker's environment but it won't know
> about WORKING_DIR and won't pass it at all. I have to run out to an
> appointment this morning and will be gone for 2-3 hours. I'll check back
> when I return.

Ah ok - many thanks. In that case we could set both explicitly in the task definition:

"WORKING_DIR": "/builds/worker",
"WORKSPACE": "/builds/worker/workspace",

The test-linux.sh script won't overwrite them if they are already set.
(Assignee)

Comment 10

4 months ago
Created attachment 9019384 [details] [diff] [review]
fix_bitbar_tests.patch
Attachment #9019384 - Flags: review?(pmoore)
(Assignee)

Updated

4 months ago
Assignee: nobody → dcrisan
Status: NEW → ASSIGNED

Comment 12

4 months ago
(In reply to Pete Moore [:pmoore][:pete] from comment #8)
> (In reply to Pete Moore [:pmoore][:pete] from comment #6)
> 
> > I've created
> > https://tools.taskcluster.net/groups/CsEKkSVZSYKzwFoZ5POEIA/tasks/
> > CsEKkSVZSYKzwFoZ5POEIA/details to test this hypothesis.
> 

That finally ran. Unfortunately most hit bug 1499246 but at least one hit this error.

> This task is still pending after 20 minutes - does your tool to spawn new
> workers fetch the pending count from here?
> 
>   https://queue.taskcluster.net/v1/pending/proj-autophone/gecko-t-ap-unit-p2
> 

No.

> I had a vague memory that maybe it queries treeherder for pending tasks, but
> this task won't appear on treeherder, so it might be better to fetch the
> pending count directly from taskcluster.
> 
> Many thanks!

It does use treeherder at the moment. I'll look into changing it to use the pending queue. Filed Bug 1501350. Thanks.

(In reply to Dragos Crisan [:dragrom] from comment #11)
> Test patch on try:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=aa3c248be83067e34d957928a85182a9a399a992

Unfortunately that didn't exercise the android-hw. This will work:

./mach try fuzzy --full --query "android-hw mda"

But if you like, I can submit your patch and check it out. Let me know.
(Assignee)

Comment 13

4 months ago
(In reply to Bob Clary [:bc:] from comment #12)
> (In reply to Pete Moore [:pmoore][:pete] from comment #8)
> > (In reply to Pete Moore [:pmoore][:pete] from comment #6)
> > 
> > > I've created
> > > https://tools.taskcluster.net/groups/CsEKkSVZSYKzwFoZ5POEIA/tasks/
> > > CsEKkSVZSYKzwFoZ5POEIA/details to test this hypothesis.
> > 
> 
> That finally ran. Unfortunately most hit bug 1499246 but at least one hit
> this error.
> 
> > This task is still pending after 20 minutes - does your tool to spawn new
> > workers fetch the pending count from here?
> > 
> >   https://queue.taskcluster.net/v1/pending/proj-autophone/gecko-t-ap-unit-p2
> > 
> 
> No.
> 
> > I had a vague memory that maybe it queries treeherder for pending tasks, but
> > this task won't appear on treeherder, so it might be better to fetch the
> > pending count directly from taskcluster.
> > 
> > Many thanks!
> 
> It does use treeherder at the moment. I'll look into changing it to use the
> pending queue. Filed Bug 1501350. Thanks.
> 
> (In reply to Dragos Crisan [:dragrom] from comment #11)
> > Test patch on try:
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=aa3c248be83067e34d957928a85182a9a399a992
> 
> Unfortunately that didn't exercise the android-hw. This will work:
> 
> ./mach try fuzzy --full --query "android-hw mda"
> 
> But if you like, I can submit your patch and check it out. Let me know.

Please submit my patch and let me know if it works. I also added the M tests from Android 8 in https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa3c248be83067e34d957928a85182a9a399a992.

Updated

4 months ago
Blocks: 1501364

Comment 14

4 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&group_state=expanded&revision=043f66d558d7173087740eeddf8248c34259c0d6

I don't think this will help as the bitbar containers are unaware of WORKING_DIR, but we'll see.

Comment 15

4 months ago
dragrom: This did seem to help. The failures in my try push are not related to this error.
Comment on attachment 9019384 [details] [diff] [review]
fix_bitbar_tests.patch

Review of attachment 9019384 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, many thanks!
Attachment #9019384 - Flags: review?(pmoore) → review+

Comment 18

4 months ago
Pushed by pmoore@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/460f9791ba8a
Intermittent [worker:error] distutils.errors.DistutilsFileError: cannot copy tree '/builds/worker/artifacts': not a directory, r=pmoore
(Assignee)

Updated

4 months ago
Attachment #9019384 - Flags: checked-in+

Comment 19

4 months ago
We'll want this on beta as well now that bug 1474570 has merged there.

Comment 20

4 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/460f9791ba8a
Status: ASSIGNED → RESOLVED
Last Resolved: 4 months ago
status-firefox65: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla65

Comment 22

4 months ago
Comment on attachment 9019384 [details] [diff] [review]
fix_bitbar_tests.patch

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1474570

User impact if declined: No android hardware testing on mozilla-beta

Is this code covered by automated tests?: No

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): Not risky as it is a simple change to add an environment variable to the test environment.

String changes made/needed:
Attachment #9019384 - Flags: approval-mozilla-beta?
Comment on attachment 9019384 [details] [diff] [review]
fix_bitbar_tests.patch

test-only changes don't need approval to land
Attachment #9019384 - Flags: approval-mozilla-beta?

Comment 24

4 months ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-beta/rev/b2d003653646
status-firefox64: --- → fixed

Comment 26

a month ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-esr60/rev/05f116b9b73a
status-firefox-esr60: --- → fixed