Closed Bug 1376191 Opened 7 years ago Closed 7 years ago

Can't create directory /home/worker/tooltool-cache!

Categories

(Testing :: Firefox UI Tests, defect)

All
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1385629

People

(Reporter: MatsPalmgren_bugz, Assigned: wcosta)

Details

Attachments

(1 file)

Can't create directory /home/worker/tooltool-cache! [log…]
Traceback (most recent call last): [log…]
OSError: [Errno 45] Operation not supported: '/home/worker' [log…] 

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c2d406007ed0990eff4ec664f713355be788bebf&selectedJob=109797673
This is a bug in how Firefox UI Tests are invoked.

Taskcluster tasks should use `run-task --chown-recursive <path>` (or equivalent) to ensure that all directories are writable by the task. If they fail to do this, they could get a permissions failure due to state left over from previous, unrelated tasks. This includes uid/gid mismatch between Docker images causing files to be owned by different users, even if the username is identical.
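
To illustrate what a recursive ownership fix amounts to, here is a minimal sketch. It is not the actual `run-task` implementation; `chown_recursive` is a hypothetical helper name, and changing ownership to a different user generally requires elevated privileges.

```python
import os


def chown_recursive(root, uid, gid):
    """Set owner and group on `root` and on everything below it.

    Hypothetical helper for illustration only; not taken from run-task.
    """
    os.lchown(root, uid, gid)
    # os.walk does not follow directory symlinks by default, so we never
    # leave the tree rooted at `root`.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lchown changes a symlink itself rather than its target.
            os.lchown(os.path.join(dirpath, name), uid, gid)
```

A task that starts with sufficient privileges could call something like `chown_recursive(cache_dir, task_uid, task_gid)` on each cache before dropping privileges, where `cache_dir`, `task_uid` and `task_gid` are placeholders for the task's own values.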
Component: Build Config → Firefox UI Tests
Product: Core → Testing
QA Contact: hskupin
There is nothing special we can do for the taskcluster configuration. We just do it like any other test job running via mozharness:

https://dxr.mozilla.org/mozilla-central/rev/20f32734df750bddada9d1edca665c2ea53946f0/taskcluster/ci/test/tests.yml#118-160

Looks like this is only happening for OS X, but not Linux. Wander, do you have any idea what this could be?
Flags: needinfo?(wcosta)
(In reply to Henrik Skupin (:whimboo) [partly available 07/10 -07/14] from comment #2)
> There is nothing special we can do for the taskcluster configuration. We
> just do it like any other test job running via mozharness:
> 
> https://dxr.mozilla.org/mozilla-central/rev/20f32734df750bddada9d1edca665c2ea53946f0/taskcluster/ci/test/tests.yml#118-160
> 
> Looks like this is only happening for OS X, but not Linux. Wander, do you
> have any idea what this could be?

It looks like the tasks have expired and are no longer available. Is there any other try push I can look at?
Flags: needinfo?(wcosta)
OS X doesn't use Docker. But depending on how caches are reused between tasks, you'll need to recursively set permissions on any caches to ensure permissions are sane.

In Docker/Linux, the task starts as uid/gid 0, so any file permission adjustment can be made before permissions are dropped to a non-privileged user/group. How this works in OS X, I have no clue. If there is a uid/gid mismatch between tasks and you don't have privileges to adjust permissions to the executing uid/gid from within the task, then this is a bug on the TaskCluster platform and the OS X worker. If all OS X tasks execute as the same uid/gid and it is impossible to switch uid/gid between tasks, then it should be possible to adjust filesystem permissions for caches.
(In reply to Henrik Skupin (:whimboo) [partly available 07/10 -07/14] from comment #4)
> It's basically for each and every Firefox ui test we run in TC:
> 
> https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=firefox%20ui%20os%20x&bugfiler&fromchange=914fc90581c481dc80b97da80e22b42c69f1068d&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=success&selectedJob=113396716
> 
> So grab one of those recent ones which should help to get more details.

Huh? I think I am missing something; I see no such error in the TH link provided. :/
Flags: needinfo?(wcosta)
Assignee: nobody → wcosta
Status: NEW → ASSIGNED
Hurray! That's great to hear. And yes, this remaining issue is unrelated and covered by bug 1374268.
Comment on attachment 8885729 [details]
Bug 1376191: Use system tooltool cache dir for mac.

https://reviewboard.mozilla.org/r/156534/#review161962

So as I just noticed, I picked this config from the Marionette one in mozharness. The latter seems to be also affected now when run via TC on OS X but wasn't when run via buildbot:

https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mn%20os%20x&bugfiler&selectedJob=113919257
https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&filter-searchStr=mn%20os%20x&filter-tier=1&filter-tier=2&filter-tier=3&selectedJob=92359201

So I think this is a more global issue with the new TC workers for OS X. Maybe we should simply remove this line from the config? Does mozharness have a fallback, or does it simply not use the tooltool cache then? I see that at least some other test jobs don't make use of this config setting.
Attachment #8885729 - Flags: review?(hskupin) → review-
(In reply to Henrik Skupin (:whimboo) [partly available 07/10 -07/14] from comment #10)
> Comment on attachment 8885729 [details]
> Bug 1376191: Use system tooltool cache dir for mac.
> 
> https://reviewboard.mozilla.org/r/156534/#review161962
> 
> So as I just noticed, I picked this config from the Marionette one in
> mozharness. The latter seems to be also affected now when run via TC on OS X
> but wasn't when run via buildbot:
> 
> https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mn%20os%20x&bugfiler&selectedJob=113919257
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&filter-searchStr=mn%20os%20x&filter-tier=1&filter-tier=2&filter-tier=3&selectedJob=92359201
> 
> So I think this is a more global issue with the new TC workers for OS X.
> Maybe we should simply remove this line from the config? Does mozharness
> have a fallback or don't use the tooltool cache then? I see that at least
> some other test jobs don't make use of this config setting.

Not sure if I am following you here. I looked at the TC job logs from the links above and couldn't spot any mkdir problem. Are you suggesting to remove tooltool_cache config because it does work for Marionette tests?
Flags: needinfo?(hskupin)
(In reply to Wander Lairson Costa [:wcosta] from comment #11)
> Not sure if I am following you here, I looked at TC jobs logs from link
> above and couldn't spot any mkdir problem. Are you suggesting to remove

The first link which points to autoland always shows this failure in any of the Mn jobs. Just click the Failure summary panel for a quick scan. The second is for aurora where we still use buildbot.

> tooltool_cache config because it does work for Marionette tests?

No, it doesn't work for Marionette and is a regression with porting the OS X machines from BuildBot to TC.
Flags: needinfo?(hskupin)
(In reply to Henrik Skupin (:whimboo) [partly available 07/10 -07/14] from comment #12)
> (In reply to Wander Lairson Costa [:wcosta] from comment #11)
> > Not sure if I am following you here, I looked at TC jobs logs from link
> > above and couldn't spot any mkdir problem. Are you suggesting to remove
> 
> The first link which points to autoland always shows this failure in any of
> the Mn jobs. Just click the Failure summary panel for a quick scan. The
> second is for aurora where we still use buildbot.
> 

<facepalm meme here>

> > tooltool_cache config because it does work for Marionette tests?
> 
> No, it doesn't work for Marionette and is a regression with porting the OS X
> machines from BuildBot to TC.

Marionette has the same issue [1], but I am in favor of changing the config rather than deleting it, because:

1 - Deleting it will make jobs run slightly slower, since tooltool artifacts would always need to be downloaded
2 - Explicit is better than implicit :)

[1] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/marionette/prod_config.py#40
So there are a couple of other test suites using something similar:
https://dxr.mozilla.org/mozilla-central/search?q=path%3Atesting%2Fmozharness+tooltool_cache&redirect=false

What seems to work fine is actually what AWSY is using:

> os.path.join(os.getcwd(), "tooltool_cache")

Maybe that is what should be used? Maybe even other test suites are affected here, and we should go through this list?
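
One way to picture the fallback idea raised here is the sketch below. `resolve_cache_dir` is a hypothetical helper, not an existing mozharness function, and whether mozharness should behave this way is exactly the open question in this bug. It assumes Python 3 for `exist_ok`.

```python
import os


def resolve_cache_dir(preferred):
    """Return a usable tooltool cache directory.

    Tries the configured path first. If it cannot be created (for example
    the "Operation not supported" OSError from this bug), falls back to a
    directory under the current working directory, as AWSY does.
    Hypothetical helper for illustration only.
    """
    try:
        os.makedirs(preferred, exist_ok=True)
        return preferred
    except OSError:
        fallback = os.path.join(os.getcwd(), "tooltool_cache")
        os.makedirs(fallback, exist_ok=True)
        return fallback
```

The trade-off is that a silent fallback hides a misconfigured cache path, so a real implementation would probably want to log a warning when the fallback is taken.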
(In reply to Henrik Skupin (:whimboo) from comment #14)
> So there are a couple of other test suites using something similar:
> https://dxr.mozilla.org/mozilla-central/search?q=path%3Atesting%2Fmozharness+tooltool_cache&redirect=false
> 

The only tests affected are those running on physical machines, which are all Mac OS X tests and Talos.

> What seems to work fine is actually what AWSY is using:
> 
> > os.path.join(os.getcwd(), "tooltool_cache")
> 
> Maybe that is what should be used? Maybe even other test suites are affected
> here, and we should go through this list?

It wouldn't work, as docker tests don't start mozharness in home dir [1]

[1] https://dxr.mozilla.org/mozilla-central/source/taskcluster/scripts/tester/test-linux.sh#35
That's my latest fix [1]. I did some investigation to try to find other cases, but it seems that these are the only ones.

[1] https://dxr.mozilla.org/mozilla-central/source/taskcluster/scripts/tester/test-linux.sh#35
Sorry for the late reply but I was sick and then on PTO.

(In reply to Wander Lairson Costa [:wcosta] from comment #15)
> > > os.path.join(os.getcwd(), "tooltool_cache")
> > 
> > Maybe that is what should be used? Maybe even other test suites are affected
> > here, and we should go through this list?
> 
> It wouldn't work, as docker tests don't start mozharness in home dir [1]

`os.getcwd()` doesn't retrieve the home dir but the current working dir. As the script is doing a `cd $WORKSPACE`, it should work perfectly.
Flags: needinfo?(wcosta)
(In reply to Henrik Skupin (:whimboo) from comment #17)
> Sorry for the late reply but I was sick and then on PTO.
> 
> (In reply to Wander Lairson Costa [:wcosta] from comment #15)
> > > > os.path.join(os.getcwd(), "tooltool_cache")
> > > 
> > > Maybe that is what should be used? Maybe even other test suites are affected
> > > here, and we should go through this list?
> > 
> > It wouldn't work, as docker tests don't start mozharness in home dir [1]
> 
> `os.getcwd()` doesn't retrieve the home dir but the current working dir. As
> the script is doing a `cd $WORKSPACE` it should perfectly work.

Yes, but then the path would be /home/worker/workspace/tooltool-cache instead of /home/worker/tooltool-cache. Moreover, for TC workers running on real machines, os.getcwd() != /builds/tooltool-cache.
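
The mismatch can be sketched as follows. The two configured paths are the ones mentioned in this bug; the dictionary keys and the `cwd_cache_dir` helper are hypothetical names used only for illustration, and the workspace path assumes the Docker layout described above.

```python
import posixpath

# Cache locations named in this bug; the keys are illustrative labels.
CONFIGURED = {
    "docker": "/home/worker/tooltool-cache",
    "hardware": "/builds/tooltool-cache",
}


def cwd_cache_dir(cwd):
    """Path the AWSY-style expression yields for a given working directory."""
    return posixpath.join(cwd, "tooltool_cache")


# After `cd $WORKSPACE` in a Docker task, the derived path lives inside the
# workspace and matches neither configured location.
derived = cwd_cache_dir("/home/worker/workspace")
print(derived)                         # /home/worker/workspace/tooltool_cache
print(derived in CONFIGURED.values())  # False
```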
Flags: needinfo?(wcosta)
As decided, we will take care of this in bug 1385629.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE