Closed Bug 1289822 Opened 8 years ago Closed 8 years ago

Deploy balrogworker staging instance

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: h.franciskang, Assigned: mtabara)

References

Details

Attachments

(2 files)

This bug tracks the deployment of balrogworker using the cloudops stack.
Depends on: 1288789
Depends on: 1246675
Blocks: 1277871
I talked to Ben about this since he is in the process of deploying a new service to cloudops. He suggested waiting to deploy until he has gone through that process himself and worked out any wrinkles. In the interim, we could get a code review on the repo in bug 1277871.
Rather than waiting for the cloudops stack to be ready, we are going to deploy this alongside signingscript in puppet.

signingscript is also scriptworker-based but has been deployed in releng infra under puppet. It would be great to deploy funsize-balrogworker (maybe balrogscript would be a more appropriate name?) in the same manner so that future maintenance and deployments are similar across scriptworker-based scripts. Or at least until cloudops is ready, so this doesn't block.

This will take three parts:

1) drop docker for env setup and instead revert to a simple venv+setuptools layout so that it can be puppetized (see the sketch after this list)
   a. https://github.com/mozilla-releng/funsize-balrogworker/blob/master/Dockerfile

2) port funsize-balrogworker and tools' balrog submitter api from py2->py3
   * http://hg.mozilla.org/build/tools/file/a5d75df726be/lib/python/balrog/submitter
      * maybe we should have a lib/python3/balrog/submitter ?

3) puppetize funsize-balrogworker by deploying it on a staging node. similar to signingscript
   * signingscript: http://hg.mozilla.org/build/puppet/rev/4f4b5d2b76a0

we may not require the py2->py3 conversion if (a) the puppet scriptworker-based nodes have python2 installed and it is a sufficient interpreter, or (b) the port yields little reward and is much harder than making the puppet scriptworker nodes py2-compliant.

I'm probably missing some things but this is the gist.
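For part 1, here is a minimal sketch of the environment setup the Dockerfile currently performs; the install path is a made-up example, fbs_requirements.txt comes from the repo root, and a py27 environment would use the virtualenv package rather than py3's venv module:

    import subprocess
    import sys

    VENV_DIR = "/builds/balrogworker/py35venv"   # hypothetical install path
    REQUIREMENTS = "fbs_requirements.txt"        # requirements file in the repo root

    def build_env():
        # create the virtualenv with the running interpreter (py3's venv module)
        subprocess.check_call([sys.executable, "-m", "venv", VENV_DIR])
        pip = VENV_DIR + "/bin/pip"
        # install the pinned requirements, then the package itself via setuptools
        subprocess.check_call([pip, "install", "-r", REQUIREMENTS])
        subprocess.check_call([pip, "install", "."])

    if __name__ == "__main__":
        build_env()

Puppet would drive the equivalent steps through its virtualenv resources rather than a script like this; the sketch just shows what has to replace the Dockerfile.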
Blocks: 1282187
Assignee: nobody → mtabara
Depends on: 1301613
See Also: → 1277871
Spent some time getting acquainted with the scriptworker deployment from [0] and picking up some basic puppet knowledge. Will follow up with [1] and [2] tomorrow morning to get myself an EC2 box to incrementally test the balrogworker puppet patches.

[0]: http://hg.mozilla.org/build/puppet/rev/4f4b5d2b76a0
[1]: https://wiki.mozilla.org/ReleaseEngineering/How_To/Loan_a_Slave
[2]: https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/HowTo/Set_up_a_user_environment#Pinning
Depends on: 1305391
Got myself a machine today. Initially I had some issues with missing certs and had to recreate it. FQDN is dev-linux64-ec2-mtabara2.dev.releng.use1.mozilla.com.
Created my own environment on releng-puppet2.srv.releng.scl3.mozilla.com and had the test instance pull changes from it.

I'm still not very confident in how everything works, but I'm trying to imitate Aki's patches there.
Incrementally started porting pieces from signing-scriptworker and the https://github.com/mozilla-releng/funsize-balrogworker/blob/master/Dockerfile.

All good now; hopefully I'll spin this up completely by today's tc migration mtg.
Both py27 and py35 happily coexist now.
However, some packages are missing from the puppet pypi mirror for py27:

---
boto==2.41.0
cryptography==1.2.3
enum34==1.1.2
idna==2.0
ipaddress==1.0.16
requests==2.10.0
six==1.10.0

--

I intend to copy them under releng-puppet2.srv.releng.scl3.mozilla.com and leave the sync job to propagate them across the other masters.
Will double-check with buildduty folks just to make sure this is a fine approach.
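For illustration, a quick way to verify which pins already exist on the mirror; the mirror URL and the tarball naming scheme here are assumptions, not the mirror's documented layout:

    import urllib2  # py27, to match the environment under discussion

    MIRROR = "http://releng-puppet2.srv.releng.scl3.mozilla.com/python/packages"  # assumed URL
    PINS = ["boto==2.41.0", "cryptography==1.2.3", "enum34==1.1.2",
            "idna==2.0", "ipaddress==1.0.16", "requests==2.10.0", "six==1.10.0"]

    for pin in PINS:
        name, version = pin.split("==")
        url = "%s/%s-%s.tar.gz" % (MIRROR, name, version)
        try:
            urllib2.urlopen(url)
            print "present: %s" % pin
        except urllib2.HTTPError:
            print "missing: %s" % pin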
Funsize-balrogworker, such as it is, has the following requirements: https://github.com/mozilla-releng/funsize-balrogworker/blob/master/fbs_requirements.txt

Clustered into groups, they are:

existing packages within puppet pypi:
mar==1.2
pyasn1==0.1.9

new packages under puppet pypi:
idna==2.0
ipaddress==1.0.16

upgraded packages in puppet pypi (as in, all references pointing to these py packages are pinned to some older existing version):
boto==2.41.0
cryptography==1.2.3
enum34==1.1.2

upgraded packages in puppet pypi potentially creating trouble:
requests==2.10.0

The existing requests packages in puppet are:
[mtabara@releng-puppet2.srv.releng.scl3.mozilla.com packages]$ ls -l | grep requests
-rw-r--r-- 1 puppetsync puppetsync   523254 Oct  1  2012 requests-0.14.1.tar.gz
-rw-r--r-- 1 puppetsync puppetsync   336280 Dec 22  2012 requests-1.0.4.tar.gz
-rw-r--r-- 1 puppetsync puppetsync   348854 Aug  2  2013 requests-1.2.3.tar.gz
-rw-r--r-- 1 puppetsync puppetsync   412648 Mar  7  2014 requests-2.0.1.tar.gz
-rw-r--r-- 1 puppetsync puppetsync   438132 Apr 17  2015 requests-2.4.3.tar.gz
-rw-r--r-- 1 puppetsync puppetsync   450389 Apr 17  2015 requests-2.6.0.tar.gz
-rw-rw-r-- 1 puppetsync puppetsync   480803 Oct 30  2015 requests-2.8.1.tar.gz
-rw-rw-r-- 1 puppetsync puppetsync     6285 Jan  5  2016 requests-hawk-1.0.0.tar.gz

So I'm thinking that the following two *unpinned* references, http://hg.mozilla.org/build/puppet/file/tip/modules/cruncher/manifests/reportor.pp#l32 and http://hg.mozilla.org/build/puppet/file/tip/modules/cruncher/manifests/slave_health.pp#l22, currently get 2.8.1. Uploading 2.10.0 would affect them as well, since they would then pick up the newer version.

Will touch base with buildduty folks to confirm what's to be done here.

Note to self: after deployment is complete, loop over the pypi packages and remove (potentially) unused ones.
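A hypothetical sketch for that cleanup: list the mirror's tarballs and grep a puppet checkout for references to each package name (both paths are assumptions):

    import os
    import re
    import subprocess

    PACKAGES_DIR = "/data/python/packages"    # assumed location of the mirror tarballs
    PUPPET_CHECKOUT = "/home/mtabara/puppet"  # hypothetical local hg clone

    for tarball in sorted(os.listdir(PACKAGES_DIR)):
        # strip the version and extension, e.g. requests-2.8.1.tar.gz -> requests
        name = re.sub(r"-\d.*$", "", tarball)
        # grep exits nonzero when nothing in the puppet tree references the package
        rc = subprocess.call(["grep", "-rq", name, PUPPET_CHECKOUT])
        if rc != 0:
            print "possibly unused: %s" % tarball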
As per IRC conversation:

mtabara> Callek: re:  "make sure everything referencing them in puppet itself pins them" - care to f? on https://bugzilla.mozilla.org/show_bug.cgi?id=1289822#c7 when you have a spare minute? 
15:04:07 <@Callek> mtabara: simplest would be to pin the latest-before-your-patch version and comment "Pinning to avoid investigating if an update would break" or something like that
15:04:08 <@Callek> ;-)
15:08:54 <mtabara> just to be sure, you mean pinning last-before-my-patch (hence requests-2.8.1, should I count correctly) in those two files cruncher/manifests/reportor.pp#l32 and cruncher/manifests/slave_health.pp#l22, right?
15:12:47 <@Callek> yea

Before I get review for pinning requests in those two files, I'll go ahead and copy over all the aforementioned python packages except requests, and temporarily make my balrogworker script point to 2.8.1.

I copied into puppet all the above packages except requests-2.10.0, for now.
Ran into some trouble with libffi-dev; still debugging.

Note to self: don't forget to tweak requests back from 2.8.1 to 2.11.0 once the patch is up for review.
Solved the above libffi issues. Both the py27 and py35 virtualenvs are working great. Cloned the tools repo as well.
Currently working at the last step in providing the scriptworker configs.
Before I forget, few notes here:
1. I've successfully tested parts of the scripts - the only missing piece is the scriptworker functionality for which I still need to add secrets in order to make it connect to TC

2. I haven't added the nagios part yet. I'll save that for production

3. I still need to change/add/point to requests-2.10.0 once we're ready to deploy to production

4. For now it's working, but in the near future I should debug and find a better path for the "funsize-balrogworker" and "tools" repos. They currently live under /builds/scriptworker but should have a dedicated folder, somewhere in /tools/checkouts or similar. I messed up the folder permissions there, so I was unable to make them live there. I may need to create another dedicated user for this, as Aki did for signingscript. It'll make things easier, I suppose.

Theoretically, leftovers here:
* have aki/jlund take a second look at this
* feed in missing secrets https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Secrets
* update my puppet environment with these changes
* deploy the catalog to my working ec2-box
* create a dummy task and watch as the ec2-box balrogworker claims it, something similar to what I did under https://bugzilla.mozilla.org/show_bug.cgi?id=1301613#c3 (see the sketch below)
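A hypothetical sketch of that last step with the taskcluster python client; the dummy provisioner/worker type values come from the puppet patch under review, while the metadata values are made up and TC credentials are assumed to be configured in the client options or environment:

    import taskcluster

    queue = taskcluster.Queue()  # assumes TC credentials are configured
    task_id = taskcluster.slugId()
    queue.createTask(task_id, {
        "provisionerId": "test-dummy-provisioner",  # values from the puppet patch
        "workerType": "dummy-worker-mtabara",
        "created": taskcluster.stringDate(taskcluster.fromNow("0 seconds")),
        "deadline": taskcluster.stringDate(taskcluster.fromNow("1 hour")),
        "payload": {},  # deliberately malformed; we only want to see the claim
        "metadata": {
            "name": "balrogworker staging test",
            "description": "dummy task to check the loaner claims work",
            "owner": "mtabara@mozilla.com",
            "source": "https://bugzilla.mozilla.org/show_bug.cgi?id=1289822",
        },
    })
    print "created task %s" % task_id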
Attachment #8795951 - Flags: feedback?(jlund)
Attachment #8795951 - Flags: feedback?(aki)
(In reply to Mihai Tabara [:mtabara] from comment #10)
> 1. I've successfully tested parts of the scripts - the only missing piece is
> the scriptworker functionality for which I still need to add secrets in
> order to make it connect to TC

I'm using hiera for all signing scriptworker secrets:
https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Secrets

> 2. I didn't add yet the nagios part. I'll save that for production

Ok.  There's bug 1295196 as an example.
We need another check for signing, the pending queue; we may need a check like that for balrog scriptworker as well.  We'll probably use Queue.pendingTasks for that: https://docs.taskcluster.net/reference/platform/queue/api-docs#pendingTasks
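A pending-queue check could be a thin wrapper around that endpoint; here is a hypothetical sketch with made-up provisioner/worker-type names and thresholds:

    import sys
    import taskcluster

    PROVISIONER_ID = "scriptworker-prov-v1"  # assumed production provisioner
    WORKER_TYPE = "balrog-worker-v1"         # hypothetical worker type
    WARN, CRIT = 10, 50                      # made-up thresholds

    queue = taskcluster.Queue()
    pending = queue.pendingTasks(PROVISIONER_ID, WORKER_TYPE)["pendingTasks"]
    if pending >= CRIT:
        print "CRITICAL: %d pending %s tasks" % (pending, WORKER_TYPE)
        sys.exit(2)
    elif pending >= WARN:
        print "WARNING: %d pending %s tasks" % (pending, WORKER_TYPE)
        sys.exit(1)
    print "OK: %d pending %s tasks" % (pending, WORKER_TYPE)
    sys.exit(0)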

> 3. I still need to change/add/point to requests-2.10.0 once we're ready to
> deploy to production

ok.

> 4. For now it's working but in the near future I should debug and find a
> better path for "funsize-balrogworker" and "tools" repos.

I think funsize-balrogworker is a misnomer; this balrog scriptworker will be for both full and partials, yes?

> They currently
> live under /builds/scriptworker but they should have a dedicated folder
> path, somewhere in /tools/checkouts or something.

I think either works.

> I messed up the folder
> rights there so I was unable to make them live there. I may need to create
> another dedicated user for this, as Aki did for signingscript. It'll make it
> easier I suppose.

I don't think that's a requirement.  You should be able to nuke /tools and recreate via puppet.  Before you're done here, you should make sure the puppet patches work against a clean EC2 instance: either bring up a 2nd instance with a 2nd hostname+ip, or bring down your current instance and bring up a 2nd with the same hostname+ip.

> Theoretically, leftovers here:
> * have aki/jlund have second look at this
> * feed in missing secrets
> https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Secrets
> * update my puppet environment with these changes
> * deploy the catalog to my working ec2-box
> * create a dummy task and watch as the ec2-box balrogworker claims it,
> something similar to what I did under
> https://bugzilla.mozilla.org/show_bug.cgi?id=1301613#c3

+1.  We'll need a new client for the rolled out production instances that don't use dummy worker types.  I set up https://tools.taskcluster.net/auth/clients/#project%252freleng%252fscriptworker%252fsigning-linux for the signing scriptworkers.  We may want to have a clientid per instance, but for now they share.
Comment on attachment 8795951 [details] [diff] [review]
Puppet changes to add balrogworker to testing environment

Looks like you're on the right track!

>diff --git a/manifests/moco-config.pp b/manifests/moco-config.pp
>index f9d83ca..7c35713 100644
>--- a/manifests/moco-config.pp
>+++ b/manifests/moco-config.pp
>@@ -417,6 +417,25 @@ class config inherits config::base {
>     $signing_scriptworker_artifact_upload_timeout = 600
>     $signing_scriptworker_verbose_logging = false
>
>+    # TC balrog scriptworkers
>+    $balrog_scriptworker_provisioner_id = "test-dummy-provisioner"
>+    $balrog_scriptworker_worker_group = "test-dummy-workers"
>+    $balrog_scriptworker_worker_type = "dummy-worker-mtabara"

The above are great for testing, but they'll need to change before we're production ready.

>+node "dev-linux64-ec2-mtabara2.dev.releng.use1.mozilla.com" {
>+    # the pins must come *before* the toplevel include
>+    $aspects = [ 'low-security' ]
>+    $slave_trustlevel = 'try'

We'll want high trust levels and security for production.

>+    $balrog_scriptworker_py35venv = "/builds/py35venv"
I made /builds/scriptworker the py35 venv, so /builds/scriptworker/bin would be the venv bin.  It could shell out to /builds/balrog.  Or, you could have /builds/balrog/scriptworker and /builds/balrog/$py27dir, and /builds/balrog/tools, if you wanted to make them siblings.  Clones could go into /tools as you were brainstorming, if you prefer.  But whatever directory you put the secrets into, you probably want to chmod 700 at some point in the parent directory hierarchy, so doing that at /builds/scriptworker or /builds/balrog would make sense.

>+    "valid_artifact_path_regexes": ["^/v1/task/(?P<taskId>[^/]+)(/runs/\\d+)?/artifacts/(?P<filepath>.*)$"],
>+    "verify_chain_of_trust": false,
>+    "sign_chain_of_trust": false,

These will probably have to change with the next release of scriptworker, but we don't have to worry about that atm.

>+    "artifact_expiration_hours": <%= scope.lookupvar("config::balrog_scriptworker_artifact_expiration_hours") %>,
>+    "artifact_upload_timeout": <%= scope.lookupvar("config::balrog_scriptworker_artifact_upload_timeout") %>,
>+    "task_script": ["<%= scope.lookupvar("config::balrog_scriptworker_py27venv") %>/bin/python",
>+                    "<%= scope.lookupvar("config::balrog_scriptworker_git_balrogworker_path") %>/bin/balrogworker.py",
>+                    "--taskdef", "<%= scope.lookupvar("config::balrog_scriptworker_root") %>/work/task.json",
>+                    "--verbose"],

Awesome, looks like you got the py27venv stuff right.
Attachment #8795951 - Flags: feedback?(aki) → feedback+
Thanks Aki for the detailed review!

So, I'll sum this up to aggregate all "note to self" pieces so far and to address all Aki's comments.

Firstly, two brief replies:

(In reply to Aki Sasaki [:aki] from comment #11)
> I think funsize-balrogworker is a misnomer; this balrog scriptworker will be
> for both full and partials, yes?

Yes, couldn't agree more. There are various workarounds for this in the medium/long term. I'm leaning towards migrating https://github.com/mozilla-releng/funsize-balrogworker/blob/master/bin/balrogworker.py into a standalone python package. The other solution would be to wait for Ben to finish his rewrite/py35 migration of the Balrog submitter client, which would basically solve this problem. For now, I suppose I can tweak the repo destination folder to "balrogworker" or "balrogscript" for clarity.

(In reply to Aki Sasaki [:aki] from comment #12)
> >+    $balrog_scriptworker_py35venv = "/builds/py35venv"
> I made /builds/scriptworker the py35 venv, so /builds/scriptworker/bin would
> be the venv bin.  It could shell out to /builds/balrog.  Or, you could have
> /builds/balrog/scriptworker and /builds/balrog/$py27dir, and
> /builds/balrog/tools, if you wanted to make them siblings.  Clones could go
> into /tools as you were brainstorming, if you prefer.  But whatever
> directory you put the secrets into, you probably want to chmod 700 at some
> point in the parent directory hierarchy, so doing that at
> /builds/scriptworker or /builds/balrog would make sense.

While mimicking your scripts I realized I'd have two virtualenvs, hence I could not rely on /builds/scriptworker alone and needed to split.
Thanks for the idea; I'll therefore take another approach:
 
/builds/balrog/ (chmod 700)
/builds/balrog/scriptworker/...
/builds/balrog/$py27env
/builds/balrog/$py35env
/builds/balrog/balrogscript
/builds/balrog/balrogscript/tools

I believe this addresses all your comments. The reason tools lives within balrogscript (aka funsize-balrogworker) is https://github.com/mozilla-releng/funsize-balrogworker/blob/master/bin/balrogworker.py#L14. It's a temporary fix anyway, until we roll out the new py3 Balrog submitter client and scripts.
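For context, the coupling is an import-path dependency: balrogworker.py pulls the balrog submitter api out of the tools checkout. A hypothetical reconstruction (the exact line is in the repo; the imported class name here is an assumption):

    import os
    import sys

    # the balrog submitter api lives in the tools repo, so its lib/python dir
    # has to be on sys.path before importing from it
    TOOLS_DIR = os.path.join(os.path.dirname(__file__), "tools")
    sys.path.insert(0, os.path.join(TOOLS_DIR, "lib", "python"))

    from balrog.submitter.cli import NightlySubmitterV4  # assumed class name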

Sum-up with leftovers and next steps:

Leftovers for loaner:

0. change the "funsize-balrogworker" name to something more accurate "balrogworker" or "balrogscript"
> done
1. deal with folders creation - rethink them a bit
> done
2. Feed in secrets to connect my loaner balrogworker to taskcluster - more here https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Secrets
3. update my puppet environment with these changes
4. deploy the catalog to my testing loaner
5. create a dummy task and watch as the ec2-box loaner balrogworker claims it, something similar to what I did under https://bugzilla.mozilla.org/show_bug.cgi?id=1301613#c3

Items for when switching to production:

6. before migrating to production, I need to make sure puppet patches work against a clean EC2 instance (either bring up a 2nd instance with a 2nd hostname+ip, or bring down  current instance and bring up a 2nd with the same hostname+ip)
7. have Coop (because git blame says so :P) review the cruncher configs pinning requests to 2.8.1 before puppetizing requests-2.10.0 into the pypi mirror
8. add nagios - use bug 1295196 as an example. Also we need another check for signing, the pending queue; we may need a check like that for balrog scriptworker as well.  We'll probably use Queue.pendingTasks for that: https://docs.taskcluster.net/reference/platform/queue/api-docs#pendingTasks
9. we'll need a new client for the rolled out production instances that don't use dummy worker types.  I set up https://tools.taskcluster.net/auth/clients/#project%252freleng%252fscriptworker%252fsigning-linux for the signing scriptworkers.  We may want to have a clientid per instance, but for now they share
10. moco-config.pp #TC balrog scriptworkers configs need to be altered to reflect the production environment
11. moco-nodes.pp for production should have high trust levels and security for production
12. some of the config.json chain of trust vars may have to change with the next release of scriptworker
Comment on attachment 8795951 [details] [diff] [review]
Puppet changes to add balrogworker to testing environment

Review of attachment 8795951 [details] [diff] [review]:
-----------------------------------------------------------------

looks great! well done on getting up to speed on puppet. few comments below

::: manifests/moco-nodes.pp
@@ +1165,5 @@
>  ## Loaners
> +node "dev-linux64-ec2-mtabara2.dev.releng.use1.mozilla.com" {
> +    # the pins must come *before* the toplevel include
> +    $aspects = [ 'low-security' ]
> +    $slave_trustlevel = 'try'

I suppose the trustlevel, along with the 'low-security' comment from aki will have to change when in production?

::: modules/balrog_scriptworker/manifests/init.pp
@@ +96,5 @@
> +            group       => "${users::builder::group}",
> +            content     => template("${module_name}/config.json.erb"),
> +            show_diff   => false;
> +        '/root/certs.sh':
> +            ensure => absent;

this line seems odd. though I could be pretty rusty with puppet

::: modules/balrog_scriptworker/manifests/settings.pp
@@ +1,1 @@
> +class balrog_scriptworker::settings {

is this file just so we have short references to the balrog scriptworker config?

::: modules/python35/manifests/virtualenv/settings.pp
@@ +6,4 @@
>      # the root package directory into which all Python package tarballs are copied
>      $misc_python_dir = $::operatingsystem ? {
>          windows => 'c:\mozilla-build',
> +        default => "/tools/misc-python35",

this seems dangerous.. is this planned to stick in prod? what is the expected behaviour outside of scriptworker nodes?
Attachment #8795951 - Flags: feedback?(jlund) → feedback+
(In reply to Jordan Lund (:jlund) from comment #14)
> Comment on attachment 8795951 [details] [diff] [review]
> Puppet changes to add balrogworker to testing environment
> 
> Review of attachment 8795951 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> looks great! well done on getting up to speed on puppet. few comments below
> 
> ::: manifests/moco-nodes.pp
> @@ +1165,5 @@
> >  ## Loaners
> > +node "dev-linux64-ec2-mtabara2.dev.releng.use1.mozilla.com" {
> > +    # the pins must come *before* the toplevel include
> > +    $aspects = [ 'low-security' ]
> > +    $slave_trustlevel = 'try'
> 
> I suppose the trustlevel, along with the 'low-security' comment from aki
> will have to change when in production?

Yep ;) Most of the configs will change with the production switch. For now I just want a staging instance successfully deployed via puppet.

> ::: modules/balrog_scriptworker/manifests/init.pp
> @@ +96,5 @@
> > +            group       => "${users::builder::group}",
> > +            content     => template("${module_name}/config.json.erb"),
> > +            show_diff   => false;
> > +        '/root/certs.sh':
> > +            ensure => absent;
> 
> this line seems odd. though I could be pretty rusty with puppet

Leftover from signing-script I suppose. Thanks for the heads-up, I removed it.
 
> ::: modules/balrog_scriptworker/manifests/settings.pp
> @@ +1,1 @@
> > +class balrog_scriptworker::settings {
> 
> is this file just so we have short references to the balrog scriptworker
> config?

Yes. Kinda like a pointer to the balrog_scriptworker module. Inheriting the base server stuff keeps nodes.pp clean; the toplevel adds the builds dir, puppetizing, etc. Quoting from an older conversation on IRC:
22:17:56 <aki> possibly, but the toplevel stuff adds the builds dir and some other stuff, like puppetizing
22:17:56 <dustin> rather than each node getting a potpourri of classes applied
22:18:16 <dustin> mtabara: and toplevel is basically an inheritance hierarchy
22:18:35 <dustin> and allows us to do things like aki said: "all nodes have ___"

 
> ::: modules/python35/manifests/virtualenv/settings.pp
> @@ +6,4 @@
> >      # the root package directory into which all Python package tarballs are copied
> >      $misc_python_dir = $::operatingsystem ? {
> >          windows => 'c:\mozilla-build',
> > +        default => "/tools/misc-python35",
> 
> this seems dangerous.. is this planned to stick in prod? what is the
> expected behaviour outside of scriptworker nodes?

Should be fine AFAIK. But I'll defer to Aki before switching to production.
(In reply to Mihai Tabara [:mtabara] from comment #13)
> Leftovers for loaner:
> 
> 0. change the "funsize-balrogworker" name to something more accurate
> "balrogworker" or "balrogscript"
> > done
> 1. deal with folders creation - rethink them a bit
> > done
> 2. Feed in secrets to connect my loaner balrogworker to taskcluster - more
> here https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Secrets
> 3. update my puppet environment with these changes
> 4. deploy the catalog to my testing loaner
> 5. create a dummy task and watch as the ec2-box loaner balrogworker claims
> it, something similar to what I did under
> https://bugzilla.mozilla.org/show_bug.cgi?id=1301613#c3

This is done now, Yay!
https://tools.taskcluster.net/task-inspector/#YuvymM1MTnanM9AxwNMN4A/ was successfully claimed by the loaner and (as expected) failed due to malformed payload.

Need to wrap this slowly towards production.
Attachment #8796292 - Flags: feedback?(aki)
(In reply to Mihai Tabara [:mtabara] from comment #15)
> (In reply to Jordan Lund (:jlund) from comment #14)
> > Comment on attachment 8795951 [details] [diff] [review]
> > Puppet changes to add balrogworker to testing environment
> > 
> > Review of attachment 8795951 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > looks great! well done on getting up to speed on puppet. few comments below
> > 
> > ::: manifests/moco-nodes.pp
> > @@ +1165,5 @@
> > >  ## Loaners
> > > +node "dev-linux64-ec2-mtabara2.dev.releng.use1.mozilla.com" {
> > > +    # the pins must come *before* the toplevel include
> > > +    $aspects = [ 'low-security' ]
> > > +    $slave_trustlevel = 'try'
> > 
> > I suppose the trustlevel, along with the 'low-security' comment from aki
> > will have to change when in production?
> 
> Yep ;) Most of the configs will change with the production switch. For now I
> just want a staging successfully deployed via puppet.
> 
> > ::: modules/balrog_scriptworker/manifests/init.pp
> > @@ +96,5 @@
> > > +            group       => "${users::builder::group}",
> > > +            content     => template("${module_name}/config.json.erb"),
> > > +            show_diff   => false;
> > > +        '/root/certs.sh':
> > > +            ensure => absent;
> > 
> > this line seems odd. though I could be pretty rusty with puppet
> 
> Leftover from signing-script I suppose. Thanks for the heads-up, I removed
> it.

This was requested during the signing scriptworker pentest.  https://bugzilla.mozilla.org/show_bug.cgi?id=1298199#c23

> > ::: modules/python35/manifests/virtualenv/settings.pp
> > @@ +6,4 @@
> > >      # the root package directory into which all Python package tarballs are copied
> > >      $misc_python_dir = $::operatingsystem ? {
> > >          windows => 'c:\mozilla-build',
> > > +        default => "/tools/misc-python35",
> > 
> > this seems dangerous.. is this planned to stick in prod? what is the
> > expected behaviour outside of scriptworker nodes?
> 
> Should be fine AFAIK. But I'll defer to Aki before switching to production.

This, or another change, is required for py35 to install alongside py27.  Not sure why Jordan thinks it's dangerous.
Comment on attachment 8796292 [details] [diff] [review]
Refactored Puppet changes to add deploy balrogworker in a testing environment - successfully tested.

The interdiff looks good, though we may want to continue to remove /root/certs.sh.
Attachment #8796292 - Flags: feedback?(aki) → feedback+
(In reply to Aki Sasaki [:aki] from comment #18)
> Comment on attachment 8796292 [details] [diff] [review]
> Refactored Puppet changes to add deploy balrogworker in a testing
> environment - successfully tested.
> 
> The interdiff looks good, though we may want to continue to remove
> /root/certs.sh.

Thanks Aki! I added it back in my local repo; I won't reupload here though, to avoid redundancy.
Now that we have the staging environment working, we need to switch to production.
In order for that to happen, we need the following:

5. find a way to add the missing public keys https://github.com/mozilla-releng/funsize-balrogworker/blob/master/Makefile#L13 to the puppet machine
6. before migrating to production, I need to make sure puppet patches work against a clean EC2 instance (either bring up a 2nd instance with a 2nd hostname+ip, or bring down  current instance and bring up a 2nd with the same hostname+ip)
7. have Coop (because git blame says so :P) review the cruncher configs pinning requests to 2.8.1 before puppetizing requests-2.10.0 into the pypi mirror
8. add nagios - use bug 1295196 as an example. Also we need another check for signing, the pending queue; we may need a check like that for balrog scriptworker as well.  We'll probably use Queue.pendingTasks for that: https://docs.taskcluster.net/reference/platform/queue/api-docs#pendingTasks
9. we'll need a new client for the rolled out production instances that don't use dummy worker types.  I set up https://tools.taskcluster.net/auth/clients/#project%252freleng%252fscriptworker%252fsigning-linux for the signing scriptworkers.  We may want to have a clientid per instance, but for now they share
10. moco-config.pp #TC balrog scriptworkers configs need to be altered to reflect the production environment
11. moco-nodes.pp for production should have high trust levels and security for production
12. some of the config.json chain of trust vars may have to change with the next release of scriptworker
(In reply to Mihai Tabara [:mtabara] from comment #20)
> Now that we have staging environment working, we need to switch to
> production.
> In order for that to happen, we need the following: 
> 
> 5. find a way to add the missing public keys
> https://github.com/mozilla-releng/funsize-balrogworker/blob/master/
> Makefile#L13 to the puppet machine

this is done now

> 6. before migrating to production, I need to make sure puppet patches work
> against a clean EC2 instance (either bring up a 2nd instance with a 2nd
> hostname+ip, or bring down  current instance and bring up a 2nd with the
> same hostname+ip)

I'll do this first thing in the morning when buildduty folks are around.

> 7. have Coop (because git blame says so :P) review the cruncher configs
> pinning to requests-2.8.1 before pypi puppetizing-  requests-2.10.0 
> 8. add nagios - use bug 1295196 as an example. Also we need another check
> for signing, the pending queue; we may need a check like that for balrog
> scriptworker as well.  We'll probably use Queue.pendingTasks for that:
> https://docs.taskcluster.net/reference/platform/queue/api-docs#pendingTasks
> 9. we'll need a new client for the rolled out production instances that
> don't use dummy worker types.  I set up
> https://tools.taskcluster.net/auth/clients/
> #project%252freleng%252fscriptworker%252fsigning-linux for the signing
> scriptworkers.  We may want to have a clientid per instance, but for now
> they share

I need to investigate this a bit. Which client scopes should I use for this new client?

> 10. moco-config.pp #TC balrog scriptworkers configs need to be altered to
> reflect the production environment

depending on above, we'll tweak that accordingly

> 11. moco-nodes.pp for production should have high trust levels and security
> for production
> 12. some of the config.json chain of trust vars may have to change with the
> next release of scriptworker

I'll adapt these accordingly.

One more question: so how does it work now in terms of machines? Do I need to request a permanent loaner/machine to serve the purpose of balrog-scriptworker?
(In reply to Mihai Tabara [:mtabara] from comment #21)
> > 7. have Coop (because git blame says so :P) review the cruncher configs
> > pinning to requests-2.8.1 before pypi puppetizing-  requests-2.10.0 
> > 8. add nagios - use bug 1295196 as an example. Also we need another check
> > for signing, the pending queue; we may need a check like that for balrog
> > scriptworker as well.  We'll probably use Queue.pendingTasks for that:
> > https://docs.taskcluster.net/reference/platform/queue/api-docs#pendingTasks
> > 9. we'll need a new client for the rolled out production instances that
> > don't use dummy worker types.  I set up
> > https://tools.taskcluster.net/auth/clients/
> > #project%252freleng%252fscriptworker%252fsigning-linux for the signing
> > scriptworkers.  We may want to have a clientid per instance, but for now
> > they share
> 
> I need to investigate this a bit. Which client scopes should I use for this
> new client?

Maybe

project:releng:balrog:*
queue:claim-task:scriptworker-prov-v1/balrog-*
queue:poll-task-urls:scriptworker-prov-v1/balrog-*
queue:worker-id:balrog-v1/balrog-*

or something?
Those need to match your production worker group, provisioner id, and worker id.
I'm not sure if there are other balrog scriptworker types, in which case you may want to have names that differentiate the different types.
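A hypothetical sketch of creating such a client with the taskcluster python client, using the scopes proposed above; the clientId and expiry are made up, and the caller is assumed to have credentials allowing client creation:

    import taskcluster

    auth = taskcluster.Auth()  # assumes credentials that can create clients
    client_id = "project/releng/scriptworker/balrog-linux"  # hypothetical name
    auth.createClient(client_id, {
        "description": "balrog scriptworker production client",
        "expires": taskcluster.stringDate(taskcluster.fromNow("1 year")),
        "scopes": [
            "project:releng:balrog:*",
            "queue:claim-task:scriptworker-prov-v1/balrog-*",
            "queue:poll-task-urls:scriptworker-prov-v1/balrog-*",
            "queue:worker-id:balrog-v1/balrog-*",
        ],
    })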

> > 11. moco-nodes.pp for production should have high trust levels and security
> > for production
> > 12. some of the config.json chain of trust vars may have to change with the
> > next release of scriptworker
> 
> I'll adapt these accordingly.
> 
> One more question: so how does it work now in terms of machines? Do I need
> to request a permanent loaner/machine to serve the purpose of
> balrog-scriptworker?

If this is a staging instance, it doesn't matter that much aiui.
If this is production, you'll need to make sure it's in the right subnet, and have instances spanning regions (use1, usw2) so if an aws region goes down we'll still have balrog scriptworkers.  Ben, Rail, or Amy might have opinions here.
See Also: → 1306610
Currently testing my changes against a fresh ec2-box to make sure everything works as expected.
Also, I just realized that switching to production is beyond this bug's scope.

@jlund: should we go ahead and proceed with a production node to serve our needs with nightlies?
Flags: needinfo?(jlund)
(In reply to Mihai Tabara [:mtabara] from comment #23)
> Currently testing my changes against a fresh ec2-box to make sure everything
> works as expected.
> Also, I just realized that switching to production outruns this bug's scope. 
> 
> @jlund: should we go ahead and proceed with a production node to serve our
> needs with nightlies?

please! :) nice status reporting in this bug btw
Flags: needinfo?(jlund)
Component: Release Automation → General Automation
QA Contact: rail → catlee
Blocks: 1306753
(In reply to Jordan Lund (:jlund) from comment #2)
> rather than waiting for cloudops state to be ready, we are going to deploy
> this along signingscript in puppet.
> 
> signingscript is also scriptworker based but has been deployed in releng
> infra under puppet. It would be great to deploy funsize-balrogworker (maybe
> balrogscript would be more appropriate name?) in the same manner so future
> maintenance and deployments are similar across scriptworker based scripts.
> Or at least until cloudops is ready so this doesn't block.
> 
> This will take three parts:
> 
> 1) drop docker for env setup and instead revert to simply venv+setuptools so
> that it can be puppetized
>    a.
> https://github.com/mozilla-releng/funsize-balrogworker/blob/master/Dockerfile
> 
> 2) port funsize-balrogworker and tools' balrog submitter api from py2->py3
>    *
> http://hg.mozilla.org/build/tools/file/a5d75df726be/lib/python/balrog/
> submitter
>       * maybe we should have a lib/python3/balrog/submitter ?
> 
> 3) puppetize funsize-balrogworker by deploying it on a staging node. similar
> to signingscript
>    * signingscript: http://hg.mozilla.org/build/puppet/rev/4f4b5d2b76a0
> 
> we may not require py2->py3 conversion if (a) the puppet scriptworker based
> nodes have python2 installed and is a sufficient interpreter. (b) the work
> to port yields little reward and is much harder than getting puppet
> scriptworker nodes py2 compliant.
> 
> I'm probably missing some things but this is the gist.

This is done now - we have a working staging environment and the puppet changes to set it up.
Since moving to production is beyond this bug's scope and is also a slightly different process, I'd like to track it in a different bug, bug 1306753.
So I'm closing this for now.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
See Also: → 1307565
For future needs: task sample to test if staging instance is okay - https://tools.taskcluster.net/task-inspector/#FlI1rRKvS5ybTOphnN_dyA/
Component: General Automation → General