Closed Bug 1154423 Opened 6 years ago Closed 6 years ago

deploy the buildbot bridge components

Categories

(Release Engineering :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

References

Details

Attachments

(2 files, 1 obsolete file)

The buildbot <-> tc bridge being developed in bug 1135192 has three services that need to be run:
1) A piece that listens to TaskCluster events through Pulse.
2) A piece that listens to Buildbot events through Pulse.
3) A piece that runs periodically to reflect some state from Buildbot to Taskcluster.

All three pieces need access to a few things:
* Read-write access to the buildbot scheduler db
* Taskcluster's API
* Read-write access to a new database for the buildbot bridge itself (this might end up just being an additional table in the buildbot scheduler db - TBD)

I would like to be able to deploy all of these pieces to multiple places. Eg, run each service in at least two datacentres (could be use1+usw2 or aws+scl3, or whatever). I _think_ that the two Pulse listeners are written in such a way that this will work fine. The reflector piece runs on a timer, and we may need some sort of in-database locking around some of its operations to make multiple instances work.
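
For illustration, the in-database locking mentioned above could look roughly like the sketch below. It is not the bridge's actual code: the `locks` table, the `try_acquire_lock` helper, and the lease-with-expiry approach are all assumptions, and sqlite3 is used only so the example runs on its own (the real databases are whatever the scheduler db and bridge db use).

```python
import sqlite3
import time


def try_acquire_lock(conn, name, owner, ttl_seconds, now=None):
    """Return True if `owner` holds the lock `name` after this call.

    Hypothetical helper: a lease that expires after `ttl_seconds`, so a
    reflector instance that dies does not block the others forever.
    """
    now = time.time() if now is None else now
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS locks ("
            "name TEXT PRIMARY KEY, owner TEXT NOT NULL, expires REAL NOT NULL)"
        )
        # Create the row if nobody has ever taken this lock.
        conn.execute(
            "INSERT OR IGNORE INTO locks (name, owner, expires) VALUES (?, ?, ?)",
            (name, owner, now + ttl_seconds),
        )
        # Take over (or renew) only if the lease expired or is already ours.
        conn.execute(
            "UPDATE locks SET owner = ?, expires = ? "
            "WHERE name = ? AND (expires <= ? OR owner = ?)",
            (owner, now + ttl_seconds, name, now, owner),
        )
        row = conn.execute(
            "SELECT owner FROM locks WHERE name = ?", (name,)
        ).fetchone()
        return row[0] == owner


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(try_acquire_lock(conn, "reflector", "use1", ttl_seconds=60))
    print(try_acquire_lock(conn, "reflector", "usw2", ttl_seconds=60))
```

Each reflector instance would call something like this at the start of a timer tick and skip the tick if it returns False. A lease is only one option; row-level locks or a leader election would also work, and the choice depends on what the target database supports.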

I'm thinking that the Buildbot masters might be a decent place to run these pieces. We already have them in multiple datacentres, they have some (maybe all) of the netflows needed, and we already run a few similar things on them.
The Buildbot Bridge is pretty far along at this point, and I think it's time to deploy it. I'm hoping to be testing against a twig in the near future, and I'd like to do that with a production deployment.

This is working well in my own testing. I've got it attached to one build master in each of use1, usw2, and scl3 for redundancy. The bblistener and tclistener are certainly fine to run multiple instances of, as they're using durable pulse queues. The reflector piece might need a couple of tweaks to safely run in multiple locations; that's something I'll work out in my twig testing. For now, I've only whitelisted builders with "bhearsum" in the name, because I'm not sure which twig I'll be using yet.
Attachment #8594760 - Flags: review?(dustin)
Comment on attachment 8594760 [details] [diff] [review]
puppet module for deploying the buildbot bridge

Review of attachment 8594760 [details] [diff] [review]:
-----------------------------------------------------------------

Good to go with really minor changes.

::: modules/buildbot_bridge/manifests/init.pp
@@ +16,5 @@
> +            user     => "${users::builder::username}",
> +            group    => "${users::builder::group}",
> +            packages => [
> +                # Taskcluster hard pins this version, so must we...
> +                # TODO: make sure these are available on releng-puppet

do & remove comment :)

@@ +36,5 @@
> +           ];
> +    }
> +
> +    file {
> +        "${buildbot_bridge::settings::root}/config.json":

Can you move this to buildbot_bridge::conf and include it from buildbot_bridge::services, too?  It's generally good form to include any classes that are require'd, and buildbot_bridge::services require's this file.
Attachment #8594760 - Flags: review?(dustin) → review+
(In reply to Dustin J. Mitchell [:dustin] from comment #2)
> Comment on attachment 8594760 [details] [diff] [review]
> puppet module for deploying the buildbot bridge
> 
> Review of attachment 8594760 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Good to go with really minor changes.
> 
> ::: modules/buildbot_bridge/manifests/init.pp
> @@ +16,5 @@
> > +            user     => "${users::builder::username}",
> > +            group    => "${users::builder::group}",
> > +            packages => [
> > +                # Taskcluster hard pins this version, so must we...
> > +                # TODO: make sure these are available on releng-puppet
> 
> do & remove comment :)

Whoops :). These are all there at this point...I had to grab them to test!

> @@ +36,5 @@
> > +           ];
> > +    }
> > +
> > +    file {
> > +        "${buildbot_bridge::settings::root}/config.json":
> 
> Can you move this to buildbot_bridge::conf and include it from
> buildbot_bridge::services, too?  It's generally good form to include any
> classes that are require'd, and buildbot_bridge::services require's this
> file.

Will do.

I need to hold off landing this at the moment until a new pulse user for the buildbot-bridge gets created. Hopefully I'll get this landed today or tomorrow though.
Here's the patch I'll be landing after I sort out the Pulse user and add the new secrets to Hiera. Carrying forward r+ (I think that's what was intended).
Attachment #8594760 - Attachment is obsolete: true
Attachment #8594816 - Flags: review+
Depends on: 1156350
I received database information via bug 1156775, and I've got all of the secrets added to Hiera. Tomorrow I'll land the Puppet patch, but I'm going to disable it from automatically starting for now (not sure how yet - maybe by turning off the supervisor parts completely) since I'll be away next week.
Comment on attachment 8594816 [details] [diff] [review]
move config file to separate class; adjust comments

OK, I've landed this with this small change:
diff --git a/modules/buildbot_bridge/manifests/services.pp b/modules/buildbot_bridge/manifests/services.pp
index 62323e8..dcff21a 100644
--- a/modules/buildbot_bridge/manifests/services.pp
+++ b/modules/buildbot_bridge/manifests/services.pp
@@ -22,16 +22,16 @@ class buildbot_bridge::services {
         "buildbot_bridge_reflector":
             command      => "${buildbot_bridge::settings::root}/bin/buildbot-bridge -c ${buildbot_bridge::settings::root}/config.json reflector",
             user         => $::config::builder_username,
             require      => [File["${buildbot_bridge::settings::root}/config.json"],
                             Python::Virtualenv["${buildbot_bridge::settings::root}"]],
             extra_config => template("${module_name}/reflector_supervisor_config.erb");
     }
 
-    exec {
-        "restart-buildbot-bridge":
-            command     => "/usr/bin/supervisorctl restart buildbot_bridge_bblistener buildbot_bridge_tclistener buildbot_bridge_reflector",
-            refreshonly => true,
-            subscribe   => [Python::Virtualenv["${buildbot_bridge::settings::root}"],
-                            File["${buildbot_bridge::settings::root}/config.json"]];
-    }
+#    exec {
+#        "restart-buildbot-bridge":
+#            command     => "/usr/bin/supervisorctl restart buildbot_bridge_bblistener buildbot_bridge_tclistener buildbot_bridge_reflector",
+#            refreshonly => true,
+#            subscribe   => [Python::Virtualenv["${buildbot_bridge::settings::root}"],
+#                            File["${buildbot_bridge::settings::root}/config.json"]];
+#    }
 }
diff --git a/modules/buildbot_bridge/templates/bblistener_supervisor_config.erb b/modules/buildbot_bridge/templates/bblistener_supervisor_config.erb
index ae75b88..e9e9d0c 100644
--- a/modules/buildbot_bridge/templates/bblistener_supervisor_config.erb
+++ b/modules/buildbot_bridge/templates/bblistener_supervisor_config.erb
@@ -1,5 +1,6 @@
 log_stderr=true
 log_stdout=true
 redirect_stderr=true
 stdout_logfile=/var/log/supervisor/bblistener.log
-autorestart=true
+autorestart=false
+autostart=false
diff --git a/modules/buildbot_bridge/templates/reflector_supervisor_config.erb b/modules/buildbot_bridge/templates/reflector_supervisor_config.erb
index c331238..3d8f93c 100644
--- a/modules/buildbot_bridge/templates/reflector_supervisor_config.erb
+++ b/modules/buildbot_bridge/templates/reflector_supervisor_config.erb
@@ -1,5 +1,6 @@
 log_stderr=true
 log_stdout=true
 redirect_stderr=true
 stdout_logfile=/var/log/supervisor/reflector.log
-autorestart=true
+autorestart=false
+autostart=false
diff --git a/modules/buildbot_bridge/templates/tclistener_supervisor_config.erb b/modules/buildbot_bridge/templates/tclistener_supervisor_config.erb
index cbee731..8ead978 100644
--- a/modules/buildbot_bridge/templates/tclistener_supervisor_config.erb
+++ b/modules/buildbot_bridge/templates/tclistener_supervisor_config.erb
@@ -1,5 +1,6 @@
 log_stderr=true
 log_stdout=true
 redirect_stderr=true
 stdout_logfile=/var/log/supervisor/tclistener.log
-autorestart=true
+autorestart=false
+autostart=false


Which will stop supervisord from starting any of these services unless explicitly requested.
Attachment #8594816 - Flags: checked-in+
The hosts puppetized _almost_ fine. I'd forgotten about this lovely issue with the "six" package though, as described on http://stackoverflow.com/questions/25185300/cant-find-six-but-its-installed.

The known fix is to run "easy_install --upgrade six", but that's no good for us because it pulls from pypi.python.org. I'm going to try to find a better fix.
It looks like the main difference between installing "six" with easy_install and a pip install is that you get an egg-info and .py file with pip, but you get an actual egg file with easy_install. Every other package in the virtualenv gets a _directory_ and an egg-info with pip.
It's still not clear to me what the problem is, but I discovered that not using the wheel packages of six that are on pypi.pub.build.mozilla.org fixes the problem. The Puppet pypi repos have 1.8.0, so using this version is somewhat of a hack to avoid getting the 1.9.0 from the other pypi.
Attachment #8596586 - Flags: review?(dustin)
Comment on attachment 8596586 [details] [diff] [review]
use older version of six

It sounds like we should just never have the wheel on our pypi's.

I'm confused by discussion of two pypi's -- puppet installs should only be using the pypi on the puppetmasters.  And that has 1.8.0 only, in tarball format -- http://puppetagain.pub.build.mozilla.org/data/python/packages/six-1.8.0.tar.gz

By the way, I see
  http://puppetagain.pub.build.mozilla.org/data/python/packages/six.tar.gz
modified on April 17 - was that yours?

Anyway, I don't think that downgrading the package without any indication of why it might be a bad idea to re-upgrade it is a good fix.
Attachment #8596586 - Flags: review?(dustin) → review-
(In reply to Dustin J. Mitchell [:dustin] from comment #10)
> Comment on attachment 8596586 [details] [diff] [review]
> use older version of six
> 
> It sounds like we should just never have the wheel on our pypi's.

Agreed. Are you willing to eat the risk of removing the wheels on http://pypi.pub.build.mozilla.org/pub/? I'm not.

> I'm confused by discussion of two pypi's -- puppet installs should only be
> using the pypi on the puppetmasters.  And that has 1.8.0 only, in tarball
> format --
> http://puppetagain.pub.build.mozilla.org/data/python/packages/six-1.8.0.tar.gz

Yeah, that surprised me as well. I'm guessing that package from http://pypi.pub.build.mozilla.org/pub/ got used because that server is specified in pip.conf. I could be wrong though.

> By the way, I see
>   http://puppetagain.pub.build.mozilla.org/data/python/packages/six.tar.gz
> modified on April 17 - was that yours?

Yeah. Looks like I screwed up when downloading that!

> Anyway, I don't think that downgrading the package without any indication of
> why it might be a bad idea to re-upgrade it is a good fix.

I don't either, but I don't think I can spend an enormous amount of time debugging what appears to be an issue with pip. Hopefully fixing the package name from above will fix the issue.
Okay, so even with six-1.9.0.tar.gz available on the puppet master I still end up with the wheel. I see this in the puppet log:
Debug: Exec[pip /builds/bbb||six==1.9.0](provider=posix): Executing '/builds/bbb/bin/pip install --no-deps --no-index  --find-links=http://releng-puppet1.srv.releng.use1.mozilla.com/python/packages  --find-links=http://releng-puppet1.srv.releng.usw2.mozilla.com/python/packages  --find-links=http://releng-puppet2.srv.releng.scl3.mozilla.com/python/packages  --find-links=http://releng-puppet1.srv.releng.scl3.mozilla.com/python/packages  six==1.9.0'
Debug: Executing '/builds/bbb/bin/pip install --no-deps --no-index  --find-links=http://releng-puppet1.srv.releng.use1.mozilla.com/python/packages  --find-links=http://releng-puppet1.srv.releng.usw2.mozilla.com/python/packages  --find-links=http://releng-puppet2.srv.releng.scl3.mozilla.com/python/packages  --find-links=http://releng-puppet1.srv.releng.scl3.mozilla.com/python/packages  six==1.9.0'
Notice: /Stage[main]/Buildbot_bridge/Python::Virtualenv[/builds/bbb]/Python::Virtualenv::Package[/builds/bbb||six==1.9.0]/Exec[pip /builds/bbb||six==1.9.0]/returns: executed successfully

But I suspect this ends up also using the mirrors listed in pip.conf. The help for --no-index says:
"  --no-index                  Ignore package index (only looking at --find-links URLs instead)."

which doesn't give me any confidence that --no-index makes it ignore the config file.
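
One way to check this suspicion without running pip is to list the find-links entries that the config file itself contributes. The sketch below is only that: `configured_find_links` is a hypothetical helper, and the sample pip.conf content is invented for illustration, since the actual file on the masters is not shown in this bug.

```python
import configparser


def configured_find_links(pip_conf_text):
    """Return find-links URLs from the [global] and [install] sections.

    Hypothetical helper; it only reads config text and does not
    reproduce how pip merges config with command-line options.
    """
    # interpolation=None so a literal '%' in a URL is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(pip_conf_text)
    urls = []
    for section in ("global", "install"):
        if parser.has_option(section, "find-links"):
            # Multi-line values are whitespace separated once joined.
            urls.extend(parser.get(section, "find-links").split())
    return urls


if __name__ == "__main__":
    # Assumed example content, not the real pip.conf from a build master.
    sample = (
        "[global]\n"
        "find-links =\n"
        "    http://pypi.pvt.build.mozilla.org/pub\n"
        "    http://pypi.pub.build.mozilla.org/pub\n"
    )
    for url in configured_find_links(sample):
        print(url)
```

If this prints anything for the real pip.conf, those URLs are candidates for being searched alongside the --find-links values given on the command line.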
I just confirmed my theory with some manual commands on bm82. Starting with six uninstalled, I ran the same command as puppet to get it installed:
 bin/pip install --no-deps --no-index  --find-links=http://releng-puppet1.srv.releng.use1.mozilla.com/python/packages  --find-links=http://releng-puppet1.srv.releng.usw2.mozilla.com/python/packages  --find-links=http://releng-puppet2.srv.releng.scl3.mozilla.com/python/packages  --find-links=http://releng-puppet1.srv.releng.scl3.mozilla.com/python/packages  six==1.9.0
Ignoring indexes: https://pypi.python.org/simple/
Downloading/unpacking six==1.9.0
  http://pypi.pvt.build.mozilla.org/pub uses an insecure transport scheme (http). Consider using https if pypi.pvt.build.mozilla.org has it available
  http://pypi.pub.build.mozilla.org/pub uses an insecure transport scheme (http). Consider using https if pypi.pub.build.mozilla.org has it available
  http://releng-puppet1.srv.releng.use1.mozilla.com/python/packages uses an insecure transport scheme (http). Consider using https if releng-puppet1.srv.releng.use1.mozilla.com has it available
  http://releng-puppet1.srv.releng.usw2.mozilla.com/python/packages uses an insecure transport scheme (http). Consider using https if releng-puppet1.srv.releng.usw2.mozilla.com has it available
  http://releng-puppet2.srv.releng.scl3.mozilla.com/python/packages uses an insecure transport scheme (http). Consider using https if releng-puppet2.srv.releng.scl3.mozilla.com has it available
  http://releng-puppet1.srv.releng.scl3.mozilla.com/python/packages uses an insecure transport scheme (http). Consider using https if releng-puppet1.srv.releng.scl3.mozilla.com has it available
  Downloading six-1.9.0-py2.py3-none-any.whl
Installing collected packages: six
Successfully installed six
Cleaning up...


It spews some stuff about the pypi domains, so it's clearly looking at them. The buildbot-bridge continues to complain about "six" after this.

I uninstalled six and then ran the same command with --no-use-wheel:
 bin/pip install --no-deps --no-index  --find-links=http://releng-puppet1.srv.releng.use1.mozilla.com/python/packages  --find-links=http://releng-puppet1.srv.releng.usw2.mozilla.com/python/packages  --find-links=http://releng-puppet2.srv.releng.scl3.mozilla.com/python/packages  --find-links=http://releng-puppet1.srv.releng.scl3.mozilla.com/python/packages --no-use-wheel six==1.9.0
Ignoring indexes: https://pypi.python.org/simple/
Downloading/unpacking six==1.9.0
  http://pypi.pvt.build.mozilla.org/pub uses an insecure transport scheme (http). Consider using https if pypi.pvt.build.mozilla.org has it available
  http://pypi.pub.build.mozilla.org/pub uses an insecure transport scheme (http). Consider using https if pypi.pub.build.mozilla.org has it available
  http://releng-puppet1.srv.releng.use1.mozilla.com/python/packages uses an insecure transport scheme (http). Consider using https if releng-puppet1.srv.releng.use1.mozilla.com has it available
  http://releng-puppet1.srv.releng.usw2.mozilla.com/python/packages uses an insecure transport scheme (http). Consider using https if releng-puppet1.srv.releng.usw2.mozilla.com has it available
  http://releng-puppet2.srv.releng.scl3.mozilla.com/python/packages uses an insecure transport scheme (http). Consider using https if releng-puppet2.srv.releng.scl3.mozilla.com has it available
  http://releng-puppet1.srv.releng.scl3.mozilla.com/python/packages uses an insecure transport scheme (http). Consider using https if releng-puppet1.srv.releng.scl3.mozilla.com has it available
  Downloading six-1.9.0.tar.gz
  Running setup.py (path:/builds/bbb/build/six/setup.py) egg_info for package six
    
    no previously-included directories found matching 'documentation/_build'
Installing collected packages: six
  Running setup.py install for six
    
    no previously-included directories found matching 'documentation/_build'
Successfully installed six
Cleaning up...


Which installed from the tarball rather than wheel, and the bridge now works:
[cltbld@buildbot-master82.bb.releng.scl3.mozilla.com bbb]$ bin/buildbot-bridge --help
usage: buildbot-bridge [-h] [-v] [-q] -c CONFIG
                       {bblistener,reflector,tclistener}

positional arguments:
  {bblistener,reflector,tclistener}

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose
  -q, --quiet
  -c CONFIG, --config CONFIG


It seems like the root problem here is that Puppet virtualenvs are using pypi.{pvt,pub}.build.mozilla.org when they shouldn't be. Generally this hasn't been a problem because we either don't have version conflicts, or don't have wheels available for most packages - so you end up with what you want no matter where it's installed from.

For now, I've gone ahead and manually reinstalled "six" in the buildbot bridge virtualenvs to unblock me. I'm not sure what the right path forward here is - I can't block the buildbot bridge on this bug in our Puppet manifests though.
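
Until the manifests are fixed, a quick way to spot this on other hosts is to scan pip's output for the file it actually downloaded. This is a rough sketch that assumes the output format shown in the logs above (a "Downloading <filename>" line); the function names are made up for the example.

```python
import re


def downloaded_files(pip_output):
    """Return the filenames from 'Downloading <file>' lines in pip output."""
    # 'Downloading/unpacking six==1.9.0' does not match: no space after
    # 'Downloading' on that line.
    return re.findall(r"^\s*Downloading (\S+)", pip_output, flags=re.MULTILINE)


def installed_from_wheel(pip_output):
    """True if any downloaded file was a wheel rather than a tarball."""
    return any(name.endswith(".whl") for name in downloaded_files(pip_output))


if __name__ == "__main__":
    wheel_log = (
        "Downloading/unpacking six==1.9.0\n"
        "  Downloading six-1.9.0-py2.py3-none-any.whl\n"
        "Installing collected packages: six\n"
    )
    print(downloaded_files(wheel_log))
    print(installed_from_wheel(wheel_log))
```

Other pip versions may print these lines differently, so treat a negative result as inconclusive rather than proof that the tarball was used.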
So basically, installing six from a wheel does not work.  So I'm not sure why we have a wheel on pypi.pvt.  Actually, I think I did that, although looking at command-line history maybe not.  At any rate, I replaced the 1.9.0 wheel on pypi.pvt with a tarball, so it should be fine now.
I filed bug 1157872 for the pypi issue
Thanks again for your help here Dustin.

I verified the deployment today by starting each component by hand. I had to fix up Hiera - apparently I screwed up the buildbot bridge db password - but other than that it worked fine. I'm going to call this fixed, and leave re-enabling autostart/restart to bug 1158240 after the app is in better shape.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Component: General Automation → General