Closed
Bug 1501250
Opened 6 years ago
Closed 6 years ago
Intermittent [worker:error] distutils.errors.DistutilsFileError: cannot copy tree '/builds/worker/artifacts': not a directory
Categories
(Testing :: General, defect, P5)
Tracking
(firefox-esr60 fixed, firefox64 fixed, firefox65 fixed)
RESOLVED
FIXED
mozilla65
People
(Reporter: intermittent-bug-filer, Assigned: dragrom)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell disable-recommended])
Attachments
(1 file)
patch (1.05 KB) | pmoore: review+ | dragrom: checked-in+
Filed by: bclary [at] mozilla.com
https://treeherder.mozilla.org/logviewer.html#?job_id=207243128&repo=mozilla-inbound
https://queue.taskcluster.net/v1/task/Ag2De52ITG6fa3SEaS2PBQ/runs/0/artifacts/public/logs/live_backing.log
Comment 2•6 years ago
(In reply to Bob Clary [:bc:] from comment #1)
> possibly due to Bug 1474570

Yeah, this certainly is the cause. Sorry that I didn't catch this in review.

This broken task runs on taskcluster-worker ("provisionerId": "proj-autophone", "workerType": "gecko-t-ap-unit-p2"), so it looks like the taskcluster-worker implementation has been broken during the migration from taskcluster-worker to generic-worker for linux talos tasks. I suspect this will be a relatively simple fix that we can roll out quickly.

The points of interest are:

In the logs, I see:

+ : WORKING_DIR /builds/worker/workspace
+ : WORKSPACE /builds/worker/workspace

From task definition https://queue.taskcluster.net/v1/task/Ag2De52ITG6fa3SEaS2PBQ I see "WORKSPACE" is set to "/builds/worker/workspace" and WORKING_DIR isn't set, so it will default to the current directory. It looks like taskcluster-worker runs processes from the /builds/worker/workspace directory (but I'll have to check the taskcluster-worker implementation to see if it uses the "WORKSPACE" env var or if it chooses this path some other way, such as hardcoded to ~/workspace).

I suspect the solution will be to pass in both WORKING_DIR _instead_ of WORKSPACE, with `WORKING_DIR=/builds/worker`. That should work with the updated test-linux.sh script.
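
The defaulting described in this comment can be modelled roughly as follows. This is an illustrative Python sketch, not the actual test-linux.sh logic: the function name `resolve_dirs` is hypothetical, and deriving a default WORKSPACE from WORKING_DIR is an assumption. The comment itself only states that WORKING_DIR falls back to the current directory when it is unset.

```python
import posixpath


def resolve_dirs(env, cwd):
    """Return (working_dir, workspace) after applying fallbacks.

    Hypothetical model: a value already present in env is kept as-is;
    otherwise WORKING_DIR falls back to the current directory, and
    WORKSPACE (an assumption) falls back to <WORKING_DIR>/workspace.
    """
    working_dir = env.get("WORKING_DIR") or cwd
    workspace = env.get("WORKSPACE") or posixpath.join(working_dir, "workspace")
    return working_dir, workspace


# The failing task: only WORKSPACE is set, and the process starts in
# /builds/worker/workspace, so both variables end up pointing there.
broken = resolve_dirs({"WORKSPACE": "/builds/worker/workspace"},
                      cwd="/builds/worker/workspace")

# The proposed change: pass WORKING_DIR=/builds/worker instead.
fixed = resolve_dirs({"WORKING_DIR": "/builds/worker"},
                     cwd="/builds/worker/workspace")

print(broken)
print(fixed)
```

Under these assumptions the first case yields identical values for WORKING_DIR and WORKSPACE, which matches the two log lines quoted above, while the second case places WORKING_DIR one level up at /builds/worker.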
Comment 3•6 years ago
Note, longer term, the preferred fix is to migrate to generic-worker from taskcluster-worker (bug 1488392) - I believe project-autophone tasks are the last remaining tasks that run on taskcluster-worker.
Comment 4•6 years ago
That is high on my list of todos and getting higher every minute. ;-)
Comment 5•6 years ago
(In reply to Bob Clary [:bc:] from comment #4)
> That is high on my list of todos and getting higher every minute. ;-)

Haha, no worries! :-)

Typo in comment 2:

> I suspect the solution will be to pass in both WORKING_DIR _instead_ of
> WORKSPACE, with `WORKING_DIR=/builds/worker`. That should work with the
> updated test-linux.sh script.

should have been:

> I suspect the solution will be to pass in WORKING_DIR _instead_ of
> WORKSPACE, with `WORKING_DIR=/builds/worker`. That should work with the
> updated test-linux.sh script.
Comment 6•6 years ago
> I suspect the solution will be to pass in WORKING_DIR _instead_ of
> WORKSPACE, with `WORKING_DIR=/builds/worker`. That should work with the
> updated test-linux.sh script.

I've created https://tools.taskcluster.net/groups/CsEKkSVZSYKzwFoZ5POEIA/tasks/CsEKkSVZSYKzwFoZ5POEIA/details to test this hypothesis.

It is a copy of https://queue.taskcluster.net/v1/task/Ag2De52ITG6fa3SEaS2PBQ but with the env vars changed; I removed WORKSPACE and set WORKING_DIR to /builds/worker.

Let's see how it goes!
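
The env change made for this test task could be sketched like this. The helper name `with_working_dir` and the `payload.env` location of the environment variables are assumptions for illustration, and the sample dictionary is a made-up stand-in rather than the real task definition.

```python
import copy


def with_working_dir(task_def, working_dir="/builds/worker"):
    """Return a copy of task_def with WORKSPACE removed and WORKING_DIR set.

    Assumes (hypothetically) that env vars live under payload.env.
    """
    new_def = copy.deepcopy(task_def)  # leave the original definition untouched
    env = new_def.setdefault("payload", {}).setdefault("env", {})
    env.pop("WORKSPACE", None)         # drop the old variable if present
    env["WORKING_DIR"] = working_dir
    return new_def


# Made-up, minimal stand-in for the original task definition.
original = {
    "provisionerId": "proj-autophone",
    "workerType": "gecko-t-ap-unit-p2",
    "payload": {"env": {"WORKSPACE": "/builds/worker/workspace"}},
}

test_task = with_working_dir(original)
print(test_task["payload"]["env"])
```

Working on a deep copy keeps the original definition available for comparison if the experiment needs to be repeated with different values.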
Comment 7•6 years ago
We might need to update the bitbar docker container to handle WORKING_DIR. If WORKSPACE is not specified, it will set it to /builds/worker/workspace and pass WORKSPACE to the taskcluster-worker's environment but it won't know about WORKING_DIR and won't pass it at all. I have to run out to an appointment this morning and will be gone for 2-3 hours. I'll check back when I return.
Comment 8•6 years ago
(In reply to Pete Moore [:pmoore][:pete] from comment #6)
> I've created
> https://tools.taskcluster.net/groups/CsEKkSVZSYKzwFoZ5POEIA/tasks/CsEKkSVZSYKzwFoZ5POEIA/details
> to test this hypothesis.

This task is still pending after 20 minutes - does your tool to spawn new workers fetch the pending count from here?

https://queue.taskcluster.net/v1/pending/proj-autophone/gecko-t-ap-unit-p2

I had a vague memory that maybe it queries treeherder for pending tasks, but this task won't appear on treeherder, so it might be better to fetch the pending count directly from taskcluster.

Many thanks!
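
Reading the pending count directly from the queue might look roughly like the sketch below. The helper names are hypothetical, the `pendingTasks` field name is an assumption about the response format, and the sample body is made up so that the example runs without a live HTTP request.

```python
import json

QUEUE_ROOT = "https://queue.taskcluster.net/v1"


def pending_url(provisioner_id, worker_type):
    """Build the pending-count URL for a provisioner/worker type pair."""
    return f"{QUEUE_ROOT}/pending/{provisioner_id}/{worker_type}"


def parse_pending_count(body):
    """Extract the count from a JSON response body.

    Assumes the response carries a numeric "pendingTasks" field.
    """
    return int(json.loads(body)["pendingTasks"])


url = pending_url("proj-autophone", "gecko-t-ap-unit-p2")

# Made-up response body, used here instead of a live HTTP request.
sample_body = (
    '{"provisionerId": "proj-autophone", '
    '"workerType": "gecko-t-ap-unit-p2", "pendingTasks": 3}'
)
count = parse_pending_count(sample_body)
print(url, count)
```

A tool that spawns workers could poll this URL instead of treeherder, so that tasks which never appear on treeherder are still counted.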
Comment 9•6 years ago
(In reply to Bob Clary [:bc:] from comment #7)
> We might need to update the bitbar docker container to handle WORKING_DIR.
> If WORKSPACE is not specified, it will set it to /builds/worker/workspace
> and pass WORKSPACE to the taskcluster-worker's environment but it won't know
> about WORKING_DIR and won't pass it at all. I have to run out to an
> appointment this morning and will be gone for 2-3 hours. I'll check back
> when I return.

Ah ok - many thanks. In that case we could set both explicitly in the task definition:

"WORKING_DIR": "/builds/worker",
"WORKSPACE": "/builds/worker/workspace",

The test-linux.sh script won't overwrite them if they are already set.
Comment 10•6 years ago
Attachment #9019384 - Flags: review?(pmoore)
Updated•6 years ago
Assignee: nobody → dcrisan
Status: NEW → ASSIGNED
Comment 11•6 years ago
Test patch on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa3c248be83067e34d957928a85182a9a399a992
Comment 12•6 years ago
(In reply to Pete Moore [:pmoore][:pete] from comment #8)
> (In reply to Pete Moore [:pmoore][:pete] from comment #6)
> > I've created
> > https://tools.taskcluster.net/groups/CsEKkSVZSYKzwFoZ5POEIA/tasks/CsEKkSVZSYKzwFoZ5POEIA/details
> > to test this hypothesis.

That finally ran. Unfortunately most hit bug 1499246 but at least one hit this error.

> This task is still pending after 20 minutes - does your tool to spawn new
> workers fetch the pending count from here?
>
> https://queue.taskcluster.net/v1/pending/proj-autophone/gecko-t-ap-unit-p2

No.

> I had a vague memory that maybe it queries treeherder for pending tasks, but
> this task won't appear on treeherder, so it might be better to fetch the
> pending count directly from taskcluster.
>
> Many thanks!

It does use treeherder at the moment. I'll look into changing it to use the pending queue. Filed Bug 1501350. Thanks.

(In reply to Dragos Crisan [:dragrom] from comment #11)
> Test patch on try:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa3c248be83067e34d957928a85182a9a399a992

Unfortunately that didn't exercise the android-hw. This will work:

./mach try fuzzy --full --query "android-hw mda"

But if you like, I can submit your patch and check it out. Let me know.
Comment 13•6 years ago
(In reply to Bob Clary [:bc:] from comment #12)
> (In reply to Dragos Crisan [:dragrom] from comment #11)
> > Test patch on try:
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa3c248be83067e34d957928a85182a9a399a992
>
> Unfortunately that didn't exercise the android-hw. This will work:
>
> ./mach try fuzzy --full --query "android-hw mda"
>
> But if you like, I can submit your patch and check it out. Let me know.

Please submit my patch and let me know if it works. I also added the M tests from android 8 in https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa3c248be83067e34d957928a85182a9a399a992.
Comment 14•6 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&group_state=expanded&revision=043f66d558d7173087740eeddf8248c34259c0d6

I don't think this will help as the bitbar containers are unaware of WORKING_DIR, but we'll see.
Comment 15•6 years ago
dragrom: This did seem to help. The failures in my try push are not related to this error.
Comment 17•6 years ago
Comment on attachment 9019384
fix_bitbar_tests.patch

Review of attachment 9019384:
-----------------------------------------------------------------

Looks good, many thanks!

Attachment #9019384 - Flags: review?(pmoore) → review+
Comment 18•6 years ago
Pushed by pmoore@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/460f9791ba8a
Intermittent [worker:error] distutils.errors.DistutilsFileError: cannot copy tree '/builds/worker/artifacts': not a directory, r=pmoore
Updated•6 years ago
Attachment #9019384 - Flags: checked-in+
Comment 19•6 years ago
We'll want this on beta as well now that bug 1474570 has merged there.
Comment 20•6 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/460f9791ba8a
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox65: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Comment 22•6 years ago
Comment on attachment 9019384
fix_bitbar_tests.patch

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1474570
User impact if declined: No android hardware testing on mozilla-beta
Is this code covered by automated tests?: No
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): Not risky as it is a simple change to add an environment variable to the test environment.
String changes made/needed:

Attachment #9019384 - Flags: approval-mozilla-beta?
Comment 23•6 years ago
Comment on attachment 9019384
fix_bitbar_tests.patch

test-only changes don't need approval to land

Attachment #9019384 - Flags: approval-mozilla-beta?
Comment 24•6 years ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-beta/rev/b2d003653646
status-firefox64: --- → fixed
Comment 26•5 years ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-esr60/rev/05f116b9b73a
status-firefox-esr60: --- → fixed