Closed Bug 1468084 Opened 6 years ago Closed 6 years ago

builds/scriptworker returned 1 instead of one of [0]

Categories

(Infrastructure & Operations :: RelOps: Puppet, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: apop, Assigned: tomprince)

Today, during daily monitoring, I received some emails from Puppet reporting the following problem:

Sun Jun 10 08:29:03 -0700 2018 Puppet (err): /tools/python3/bin/python -BE /tools/misc-python3/virtualenv.py                     --python=/tools/python3/bin/python --distribute --never-download /builds/scriptworker returned 1 instead of one of [0]
Sun Jun 10 08:29:03 -0700 2018 /Stage[main]/Bouncer_scriptworker/Python3::Virtualenv[/builds/scriptworker]/Exec[virtualenv /builds/scriptworker]/returns (err): change from notrun to 0 failed: /tools/python3/bin/python -BE /tools/misc-python3/virtualenv.py                     --python=/tools/python3/bin/python --distribute --never-download /builds/scriptworker returned 1 instead of one of [0]

Can you please check, or point me to someone who could help resolve this?
Puppet is a relops tool, and Aki knows about scriptworker.
Assignee: nobody → relops
Component: Worker → RelOps: Puppet
Product: Taskcluster → Infrastructure & Operations
QA Contact: pmoore → mcornmesser
Looks like this is for tb-bouncer. Tom, do you know what this is about? Want a hand?
Flags: needinfo?(mozilla)
It looks like, on tb-bouncer, puppet thinks `/tools` should have mode 0700 which means that cltbld can't access it. I've not been able to track down *why*. It might be related to https://github.com/mozilla/build-puppet/blob/af266054f23b38df26aa5c7f965ebcccbc9b5415/modules/bouncer_scriptworker/manifests/init.pp#L67-L73 but that doesn't explain why it is only the tb-* one that is hitting that.
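One way to confirm which directory is blocking access is to walk the path from the failing command and check the search (execute) bit on every ancestor. The sketch below is a hypothetical helper, not part of build-puppet. It only inspects the "other" permission bits, which assumes cltbld is neither the owner of those directories nor a member of their group.

```python
import os
import stat


def blocked_ancestors(path):
    """Return (directory, mode) pairs for ancestors of `path` lacking o+x.

    Assumption: the user of interest is neither the owner nor in the group
    of these directories, so only the "other" bits decide access.
    """
    blocked = []
    current = os.path.dirname(os.path.abspath(path))
    while True:
        mode = stat.S_IMODE(os.stat(current).st_mode)
        if not mode & stat.S_IXOTH:
            blocked.append((current, oct(mode)))
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return blocked
```

Calling `blocked_ancestors("/tools/python3/bin/python")` as root on an affected host should list `/tools` if its mode really is 0700. It needs to run as a user that can itself traverse the path, otherwise `os.stat` raises `PermissionError`.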
Hm, I wonder if we want that block at all.
We're continuing to get emailed about this multiple times per hour -- any timeframe in mind? We can remove that block, explicitly set perms, make sure the user is cltbld instead of root, chmod it manually and see if it sticks, or other fixes.
I've pinned these workers back to my environment for the moment, which has a fix, but I'm not sure that it is one we should land.
I've got tb-bouncer and tb-bouncer-dev pinned to my environment. The only patch there is one that hard-codes /tools and the misc python dir to have mode 0755. As soon as I drop that change from my environment, puppet tries to switch /tools to 0700.

I think this is because of the resource defaults that bouncer sets, and some aspect of dynamic scoping of resource defaults. But I am baffled by why this only affects the tb-bouncer workers, since it looks like their configuration is identical to the firefox ones. So I am reluctant to just get rid of the default (particularly since there appear to be many scriptworkers using them).
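To make the dynamic-scoping theory concrete, here is a toy Python model of how a resource default set in one scope can leak into resources declared by a scope it includes. This is not Puppet's implementation. The `Scope` class is hypothetical, the scope names are only illustrative labels, and it assumes the /tools resource is declared without an explicit mode.

```python
class Scope:
    """Toy model of a scope: holds File defaults and a link to its declarer."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # the scope that declared (included) this one
        self.file_defaults = {}

    def effective_defaults(self):
        """Merge defaults from the outermost declaring scope inwards."""
        merged = self.parent.effective_defaults() if self.parent else {}
        merged.update(self.file_defaults)
        return merged

    def declare_file(self, path, **attrs):
        """Explicit attributes win; anything unset falls back to defaults."""
        return {"path": path, **self.effective_defaults(), **attrs}


# Illustrative names only: a default set by the bouncer scope...
bouncer = Scope("bouncer_scriptworker")
bouncer.file_defaults = {"mode": "0700"}
# ...reaches a scope that is declared from inside it.
dirs_tools = Scope("dirs::tools", parent=bouncer)

print(dirs_tools.declare_file("/tools"))               # inherits mode 0700
print(dirs_tools.declare_file("/tools", mode="0755"))  # explicit mode wins
```

In this model the second call corresponds to the workaround of hard-coding the mode: an explicit attribute always overrides an inherited default. The model says nothing about why only some hosts would see the first behaviour.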
Flags: needinfo?(mozilla)
Shipit scriptworker doesn't have a tools clone. I'm betting fx bouncer scriptworker had someone manually clone tools or manually fix its perms.
Johan, did you write this? Did you intend for tools to have a 0700 perm?
Flags: needinfo?(jlorenzo)
It isn't a clone of build-tools, it is the toplevel directory where python gets installed.
(In reply to Tom Prince [:tomprince] from comment #12)
> It isn't a clone of build-tools, it is the toplevel directory where python
> gets installed.

Ah, /tools. I think the block is doing more than it should in both locations, and we should probably explicitly list the files we want to have 0700.
(In reply to Aki Sasaki [:aki] from comment #11)
> Johan, did you write this? Did you intend for tools to have a 0700 perm?

I confirm I wrote this. I did not intend for tools to have this set of permissions. I originally copied this file from what we have in other types of workers, like pushapk (that is to say, without tools). In there, my original intent was to make 0600 the default for files defined below (like script_config.json). I see this section[1] is actually redundant with this one[2]. It was changed in [3], 2 days before this bug got filed. I don't think [3] is the root cause, though. As far as I know "File" only applies to "file" entries defined within the same package.

Moreover, the python virtual env is defined to be 0700 at [4]. /tools is defined there[5]. 

There are a couple of things I don't understand:
a. Why doesn't this affect other types of scriptworker instances? Like Tom said, other scriptworker instances are configured the same way. For instance, pushapk has had the same config for more than 22 months[6].
b. What does the command in comment 0 try to do? Does it try to create the virtualenv, or does it try to install packages in this venv? Do you guys have fuller logs about the 755 error?



[1] https://github.com/mozilla-releng/build-puppet/blob/af266054f23b38df26aa5c7f965ebcccbc9b5415/modules/bouncer_scriptworker/manifests/init.pp#L67-L73
[2] https://github.com/mozilla-releng/build-puppet/blob/af266054f23b38df26aa5c7f965ebcccbc9b5415/modules/bouncer_scriptworker/manifests/init.pp#L79-L83
[3] https://github.com/mozilla-releng/build-puppet/pull/48
[4] https://github.com/mozilla-releng/build-puppet/blob/af266054f23b38df26aa5c7f965ebcccbc9b5415/modules/bouncer_scriptworker/manifests/init.pp#L35
[5] https://github.com/mozilla-releng/build-puppet/blob/master/modules/dirs/manifests/tools.pp#L15
[6] https://github.com/mozilla-releng/build-puppet/blame/2832467b9bc9c37d21f21b832587bafec873e1e4/modules/pushapk_scriptworker/manifests/init.pp#L84
Flags: needinfo?(jlorenzo)
I'm guessing that the scriptworker instances that were pre-existing had their /tools created long enough ago that this isn't an issue. If the block is redundant and causing problems, let's get rid of it. If people want to spend more time debugging why, I'm ok with that, but let's stop the bleeding in prod and then debug why in a dev env.
Assignee: relops → mozilla
Blocks: 1421062
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED