Closed Bug 1356137 Opened 7 years ago Closed 7 years ago

clang and clang-tidy mac jobs intermittently don't have permissions to write to tooltool cache directory

Categories

(Taskcluster :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla55

People

(Reporter: glandium, Assigned: glandium)

Attachments

(1 file)

In bug 1355731, I'm replacing tooltool invocations with invocations of a new tool that wraps tooltool. Many scripts define and export TOOLTOOL_CACHE, but tooltool expects an explicit flag and doesn't read the variable from the environment, so I figured I'd fix that at the same time. In practice, all that change did was enable the tooltool cache.
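For illustration, here is a minimal sketch of the kind of wrapper logic described above: take TOOLTOOL_CACHE from the environment and pass it to tooltool as an explicit flag. The function name, the `--cache-folder` spelling of the flag, and the overall command-line shape are assumptions for this example, not the actual code from bug 1355731.

```python
import os


def tooltool_fetch_args(manifest, env=None, cache_flag="--cache-folder"):
    """Build a tooltool argument list (hypothetical helper).

    If TOOLTOOL_CACHE is set in the environment, it is turned into an
    explicit command-line flag, since tooltool does not read it itself.
    The flag name is an assumption and can be overridden.
    """
    env = os.environ if env is None else env
    args = ["tooltool.py"]
    cache = env.get("TOOLTOOL_CACHE")
    if cache:
        # Pass the cache directory explicitly instead of relying on the env.
        args += [cache_flag, cache]
    args += ["-m", manifest, "fetch"]
    return args
```

With the variable unset, the argument list simply omits the cache flag, which matches the old behaviour where the exported variable had no effect.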

The result is that clang and clang-tidy jobs failed with permission errors writing to the tooltool cache. That would be fine(ish) if it weren't intermittent, which doesn't make sense... https://treeherder.mozilla.org/#/jobs?repo=try&revision=e9b6a4e1834e2e588e78bf789c9990ef0646c9e4&filter-searchStr=clang
Blocks: 1356140
So my guess was that this was due to two different tasks running on the same host, one as "worker" and one as "root". But that instance only *ever* ran your tasks:

        task_id         | state  |         created         |      owner      |    platform    
------------------------+--------+-------------------------+-----------------+----------------
 XwZHFbOcTIedIjTaVVhCSw | failed | 2017-04-13 06:34:28.876 | mh@glandium.org | toolchains opt
 NgdU-kyESsOJlq0v9AP30g | failed | 2017-04-13 06:23:10.376 | mh@glandium.org | toolchains opt
 NBQEzVtyQfq65heAl0QSrw | failed | 2017-04-13 06:39:02.901 | mh@glandium.org | toolchains opt
 I_ZjYFjnQ7-9l5sqzyndtw | failed | 2017-04-13 06:45:07.531 | mh@glandium.org | toolchains opt
 LFQ_DZBwRLagks6c_WP3nw | failed | 2017-04-13 06:45:07.526 | mh@glandium.org | toolchains opt
 BfiHWX13QFuTOeX9Q9m_jg | failed | 2017-04-13 06:45:11.626 | mh@glandium.org | toolchains opt
 WGWdGTctTgWl8SqlpCusbA | failed | 2017-04-13 06:45:07.581 | mh@glandium.org | toolchains opt
 XTddyovRR5qIMlOKWIVoiw | failed | 2017-04-13 06:45:11.189 | mh@glandium.org | toolchains opt
 P0BXFccmRsCuWqnE6pqpZw | failed | 2017-04-13 06:45:07.561 | mh@glandium.org | toolchains opt
 bLfHO9rlQnWOv02fxOzDCQ | failed | 2017-04-13 06:45:12.236 | mh@glandium.org | toolchains opt

I think the *success* cases occur when you run after a "regular" Gecko job, which has

taskcluster/taskgraph/transforms/job/mozharness.py
        # Various caches/volumes are default owned by root:root.
        '--chown-recursive', '/home/worker/workspace',
        '--chown-recursive', '/home/worker/tooltool-cache',

I expect adding a similar thing to the toolchain builds will fix the issue.
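As a rough sketch of what "a similar thing" could look like, the helper below prepends one --chown-recursive pair per cache path to a run-task command line. The helper name, the run-task path, the `--` separator before the wrapped command, and the wrapped command itself are assumptions for illustration; only the --chown-recursive arguments and the two directories come from the mozharness.py excerpt.

```python
def run_task_command(inner_command, chown_paths,
                     run_task="/home/worker/bin/run-task"):
    """Build a run-task command line (hypothetical helper).

    Each path in chown_paths gets its own --chown-recursive argument so
    that caches/volumes owned by root:root become writable by the task
    user before the wrapped command starts.
    """
    cmd = [run_task]
    for path in chown_paths:
        cmd += ["--chown-recursive", path]
    # Assumed separator between run-task options and the wrapped command.
    cmd.append("--")
    cmd += list(inner_command)
    return cmd


# Example with a made-up toolchain build script name.
toolchain_cmd = run_task_command(
    ["bash", "build-toolchain.sh"],
    ["/home/worker/workspace", "/home/worker/tooltool-cache"],
)
```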

The underlying issue here is pretty annoying. It comes from a Docker bug/misfeature: Docker doesn't namespace uids, and 'worker' has a different uid on different Linux distros. We decided a long time ago to bury the hatchet and just use this run-task --chown-recursive fix where necessary.
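To make the uid mismatch concrete, a small diagnostic along these lines could be run inside a task to see whether a cache directory is owned by a different uid than the one the task runs as. This is only a sketch; the function name is made up, and it relies on nothing beyond the standard library.

```python
import os


def ownership_mismatch(path):
    """Return (owner_uid, current_uid) if they differ, else None."""
    owner = os.stat(path).st_uid   # uid that owns the directory
    me = os.geteuid()              # effective uid of this process
    return None if owner == me else (owner, me)
```

A non-None result for /home/worker/tooltool-cache would be consistent with the cache having been left behind by a task running under a different uid.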
Comment on attachment 8858169 [details]
Bug 1356137 - Ensure TC workspace and tooltool cache have the right permissions.

https://reviewboard.mozilla.org/r/130114/#review132988
Attachment #8858169 - Flags: review?(dustin) → review+
Assignee: nobody → mh+mozilla
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/ed794b3612db
Ensure TC workspace and tooltool cache have the right permissions. r=dustin
https://hg.mozilla.org/mozilla-central/rev/ed794b3612db
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55