Closed Bug 1408233 Opened 7 years ago Closed 6 years ago

Permissions/ACLs on the tooltool cache directory are causing problems

Categories

(Taskcluster :: Workers, enhancement, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: glandium, Assigned: grenade)

Details

I filed bug 1408224 to hide an error that has been happening for a while. For some reason, we fail to remove files from the tooltool cache directory:

  WindowsError: [Error 5] Access is denied: u'c:/builds/tooltool_cache\\b726645f9d26c5a3048720b3839166021c1cf91a02d2ff2f10c49adced7455c7352e18b5052084d80bf9d1c40ec1bf72d0397921b8cd23262f89fdbd10def58f'

In turn, that shows up in a confusing way in the log viewer and the failure classification on treeherder, leading to miscategorization of actual and different errors in bug 1391811 (see comment 12 there), or people being confused on try (e.g. bug 1408212).

Bug 1408224 merely hides the problem without solving it; this bug should solve it. If the problem is left unfixed, the tooltool cache will eventually fill up and may saturate disk space.

Pete, I don't know if you remember, but we discussed this a couple months ago on #taskcluster, and I never was able to find the root cause. Could you look into it?
Flags: needinfo?(pmoore)
My expertise is in the workings of generic-worker; :grenade is the expert in the environment setup.
Flags: needinfo?(pmoore) → needinfo?(rthijssen)
In the longer term, we should perhaps circle back on the idea of defining these caches in the task payload, similar to what we do for Linux. That way the worker would manage the caches between tasks (including permissions), and purge-cache should work as well.
- tooltool cache on tc win instances is on the y: drive at y:\tooltool-cache.
- environment variable TOOLTOOL_CACHE is set to "y:\tooltool-cache".
- acls are set for directory y:\tooltool-cache.

some aspect of some builds must not be using the TOOLTOOL_CACHE environment variable since errors seem to be related to directory c:/builds/tooltool_cache which does not have any acls set by occ.

it would be good if we could fix whatever code is trying to use the c: drive to store tooltool cache data to instead respect the environment variable since we don't really want tasks modifying arbitrary locations on c: if we can avoid it.
Flags: needinfo?(rthijssen)
Nothing in the tree sets the tooltool cache directory to y:. mozharness probably doesn't care about the TOOLTOOL_CACHE environment variable and sets its own value of c:/builds/tooltool_cache, a holdover from the not-so-ancient times when everything ran on buildbot.
i double checked the occ manifests and it looks like we have several problems

in occ:
- testers are correctly set to y:\tooltool-cache
- builders are incorrectly set to c:\builds\tooltool-cache

and in-tree taskcluster mozharness configs are incorrectly hard coded with:
'TOOLTOOL_CACHE': os.environ.get('c:/builds/tooltool_cache'),

which should be something like:
'TOOLTOOL_CACHE': os.environ.get('TOOLTOOL_CACHE', 'y:/tooltool-cache'),

i have temporarily updated occ builders to match what's in-tree:
https://github.com/mozilla-releng/OpenCloudConfig/commit/6cce00053d1e0ec52de889d56c10ce7e0b43fd73

i'll follow up with a patch to m-c to use:
'TOOLTOOL_CACHE': os.environ.get('TOOLTOOL_CACHE', 'y:/tooltool-cache'),

and then move occ builders to y: as well.
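
The fallback pattern proposed above can be sketched roughly as follows. This is a minimal illustration, not the actual mozharness config: the helper name `resolve_tooltool_cache`, the constant `DEFAULT_TOOLTOOL_CACHE`, and the surrounding `config` dict are hypothetical. Note that `os.environ.get` takes a variable *name* as its first argument, so a path passed there is looked up as a variable name and returns None unless a default is supplied.

```python
import os

# Hypothetical default; mirrors the y: location mentioned in this bug.
DEFAULT_TOOLTOOL_CACHE = 'y:/tooltool-cache'


def resolve_tooltool_cache(environ=None):
    """Return the cache directory named by TOOLTOOL_CACHE, else the default.

    `environ` is injectable so the lookup can be exercised without
    touching the real process environment.
    """
    if environ is None:
        environ = os.environ
    return environ.get('TOOLTOOL_CACHE', DEFAULT_TOOLTOOL_CACHE)


# Illustrative config fragment (shape assumed, not copied from the tree).
config = {
    'TOOLTOOL_CACHE': resolve_tooltool_cache(),
}
```

With this shape, a worker that exports TOOLTOOL_CACHE decides the location, and the hard-coded path is used only when the variable is absent.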
Priority: -- → P3
Assignee: nobody → rthijssen
This is perhaps related to Bug 1350956
since the infra changes are working, i'm going to leave well enough alone on this one.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Component: Generic-Worker → Workers