Closed Bug 792836 Opened 10 years ago Closed 9 years ago

Manage slave secrets with puppet

Categories

(Infrastructure & Operations :: RelOps: Puppet, task, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rail, Assigned: dustin)

References

Details

(Whiteboard: [puppet])

Attachments

(4 files, 9 obsolete files)

5.59 KB, patch
Callek
: review+
dustin
: checked-in+
Details | Diff | Splinter Review
24.58 KB, patch
Callek
: review+
Details | Diff | Splinter Review
11.17 KB, patch
Details | Diff | Splinter Review
1.95 KB, patch
Callek
: review+
Details | Diff | Splinter Review
We thought about using slavealloc to deliver secrets (ssh keys, android signing secrets, etc.) because slavealloc has all the information we need to determine which set of secrets should be delivered to slaves. On the other hand, slavealloc doesn't provide a secure way to authorize slaves.

Puppet uses SSL certificates for slaves, so it's more secure than slavealloc. It is possible to create a module (or plugin) which talks to slavealloc and decides which set of secrets should be delivered to a slave.

Unfortunately, this work needs to be done for both build/puppet-manifests and build/puppet.
Priority: -- → P3
Attached patch proof of concept (obsolete) — Splinter Review
A working proof of concept.

* for build slaves only
* I used generate() to talk to slavealloc to look up slaves' trustlevel
* I wanted to avoid custom puppet functions, to avoid the risk of API changes and to possibly reuse the script somewhere else
* trustlevel.py is dead simple now; I need to add exception handling and cache the slaves JSON file on disk.
* ssh::userconfig manages the .ssh directory with purge+recurse, so it deletes all unmanaged files (when we move slaves from core to dev)
* we may want to add another trustlevel in slavealloc for loaned machines, so puppet removes all keys, since "loaned" doesn't fall under any of the listed trustlevels. Low priority, since we disable puppet when we loan machines.
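A sketch of what the trustlevel lookup could grow into, with the exception handling and on-disk caching the bullets above call for. The slavealloc URL, the cache path, and the JSON shape are illustrative assumptions, not the real API:

```python
#!/usr/bin/env python
"""Sketch of the trustlevel.py idea: look up a slave's trustlevel in the
slavealloc slaves JSON, with an atomically refreshed on-disk cache so a
slavealloc outage doesn't break puppet runs.  URL, cache path, and JSON
shape are assumptions for illustration."""
import json
import os
import sys
import urllib.request

URL = "http://slavealloc.example.com/api/slaves"    # hypothetical endpoint
CACHE = "/var/cache/slavealloc-slaves.json"

def fetch_slaves(url=URL, cache=CACHE):
    """Fetch the slaves JSON, refreshing the cache atomically; fall back
    to the cached copy if slavealloc is unreachable."""
    try:
        data = urllib.request.urlopen(url, timeout=10).read()
        with open(cache + ".new", "wb") as f:
            f.write(data)
        os.replace(cache + ".new", cache)           # atomic swap
    except OSError:
        with open(cache, "rb") as f:
            data = f.read()
    return json.loads(data)

def trustlevel(slaves, hostname):
    """Return the slave's trustlevel, or '' so an unknown host falls
    through the case statement and gets no secrets at all."""
    for slave in slaves:
        if slave.get("name") == hostname:
            return slave.get("trustlevel", "")
    return ""

if __name__ == "__main__" and len(sys.argv) > 1:
    print(trustlevel(fetch_slaves(), sys.argv[1]))
```

Returning an empty string for unknown slaves means generate() output simply fails to match any trustlevel regex, which is the safe default.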
Attachment #667103 - Flags: feedback?(dustin)
Attachment #667103 - Flags: feedback?(catlee)
Comment on attachment 667103 [details] [diff] [review]
proof of concept

Review of attachment 667103 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/secrets/manifests/init.pp
@@ +17,5 @@
> +    case generate("/etc/puppet/environments/raliiev/env/modules/secrets/files/trustlevel.py", "${hostname}") {
> +        # TODO: the files should live out of hg root. puppet/apache may require alias configs
> +
> +        /^(aws|core|dev)$/: {
> +            secrets::file {

Note: as a community member who plans to start making use of SeaMonkey's puppetAgain master very soon, I'm against this approach, for a few reasons:
1) It requires access to slavealloc [or a slavealloc instance if we abstract it a bit], which SeaMonkey doesn't have
2) It forces a very specific list of keys that SeaMonkey won't have, in any case.

So if we can find another solution for this (even if it's part of the magic we do with config symlinks/includes, like we do with http://mxr.mozilla.org/build/source/puppet/modules/ganglia/manifests/config-moco.pp), I'd be happier.
Attachment #667103 - Flags: feedback-
(adding moco confidential since this is security sensitive)

(In reply to Rail Aliiev [:rail] from comment #1)
> Created attachment 667103 [details] [diff] [review]
> proof of concept
> 
> A working proof of concept.
> 
> * for build slaves only
> * I used generate() to talk to slavealloc to look up slaves' trustlevel

I like not trusting the slave here.  However, it relies on slavealloc for security-sensitive information.  That's no worse than we're already doing, but now that I think about it, that's pretty bad, especially since slavealloc is currently unauthenticated on the releng network.  So, a few things to fix that:

If we ever get transitioned to the new releng web cluster, slavealloc will be LDAP-authenticated (and available with or without VPN, for added convenience), so that might not be so bad.

Once that's in place, I think we will still want some change-control in place, so we can know when a machine is moved to a particular trust level.  I have an idea that will solve both Callek's concerns and mine in this case: run a cron task on the masters that periodically polls slavealloc for all slaves' trust levels, and dumps that into a CSV.  Dump it into slave-trustlevels.csv.new, diff it against the current file, and send an email to releng-shared@ with the diffs.  Then atomically swap it in (with mv).  Callek and other users can just add a static/hand-edited file.

This also gets you the caching - if slavealloc is down, masters continue with no interruption.
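The dump-diff-swap cycle described above can be sketched as follows (the CSV path and the mail step are assumptions; fetching from slavealloc is left to the caller):

```python
"""Sketch of the proposed cron task: write slave,trustlevel rows to a
.new file, diff against the current file, swap atomically, and hand the
diff back for mailing.  Paths are assumptions for illustration."""
import difflib
import os

def sync_trustlevels(rows, csv_path):
    """rows: iterable of (slave, trustlevel) pairs already fetched from
    slavealloc.  Returns the unified diff lines (empty if no change)."""
    new_text = "".join("%s,%s\n" % (slave, level)
                       for slave, level in sorted(rows))
    try:
        with open(csv_path) as f:
            old_text = f.read()
    except FileNotFoundError:
        old_text = ""
    diff = list(difflib.unified_diff(
        old_text.splitlines(True), new_text.splitlines(True),
        "slave-trustlevels.csv", "slave-trustlevels.csv.new"))
    with open(csv_path + ".new", "w") as f:
        f.write(new_text)
    os.replace(csv_path + ".new", csv_path)     # atomic, like mv(1)
    return diff  # caller mails non-empty diffs to releng-shared@
```

A static, hand-edited CSV at the same path works just as well for SeaMonkey, since the sync never has to run there.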

> * I wanted to avoid using custom puppet functions to avoid risk of API
> changes and possibly reuse the script somewhere else

This isn't necessarily warranted - we have a few custom functions already.  But I agree it's better to avoid complexity where possible.

> * trustlevel.py is dead simple now; need to add exception handling and cache
> the slaves JSON file on disk.
> * ssh::userconfig manages .ssh directory with purge+recurse, so it deletes
> all unmanaged files (when we move slaves from core to dev)

+1

> * we may want to add another trustlevel in slavealloc for loaned machines,
> so puppet removes all keys since "loaned" doesn't fall under any of the
> listed trustlevels. Low priority, since we disable puppet when we loan machines.

I think we'd need some additional mapping of trustlevel to keys.  I'll loop back to the idea I've suggested a few times before: having an "organization" value in config that's set to moco for us, sea for seamonkey, addons for addons, etc.  Then the keys can be looked up by "${::config::organization}-${trustlevel}", and we can all stay out of each other's way in the manifests.

I like having all of the secret-handling stuff in a single module, so we know to pay extra-special attention to patches there.  However, a monolithic "include secrets" won't work very well for the variety of slave and non-slave machines in use.  It also removes the locus of control from the modules that would require the secrets.  One thought I have is to use something like this each place a secret is required:
  secrets::file { "${::users::builder::home}/.mozpass.cfg": secret => 'mozpass.cfg'; }
maybe adding mode, owner, and group options, too.  Then secrets::file could use an extlookup based on ${::config::organization}-${trustlevel} to get that secret value.  It may also be useful to store secrets in some single-line format - perhaps base64, since there's already a custom format for that?
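Round-tripping a multi-line key through a single-line base64 value, as floated above, is straightforward; the encoded form fits a line-oriented store like the extlookup CSVs without any escaping (the truncated key below is purely illustrative):

```python
"""Demonstrates storing a multi-line secret as a single base64 line.
The base64 alphabet contains no newlines or commas, so the encoded
value is safe in a CSV cell."""
import base64

def encode_secret(content):
    return base64.b64encode(content.encode()).decode()

def decode_secret(encoded):
    return base64.b64decode(encoded).decode()

key = ("-----BEGIN DSA PRIVATE KEY-----\n"
       "MIIBugIBAAKBgQC...\n"                    # truncated, illustrative
       "-----END DSA PRIVATE KEY-----\n")
one_line = encode_secret(key)
assert "\n" not in one_line and "," not in one_line   # CSV-safe
assert decode_secret(one_line) == key
```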
Attachment #667103 - Flags: feedback?(dustin) → feedback-
Comment on attachment 667103 [details] [diff] [review]
proof of concept

Review of attachment 667103 [details] [diff] [review]:
-----------------------------------------------------------------

Agree with most of the other feedback. Let's focus on using a local snapshot of trustlevel info so we can avoid security issues with slavealloc for now.
Attachment #667103 - Flags: feedback?(catlee)
Attached patch sikrits (obsolete) — Splinter Review
comments incoming
Attachment #667103 - Attachment is obsolete: true
Attachment #676615 - Flags: feedback?(dustin)
Comment on attachment 676615 [details] [diff] [review]
sikrits

Review of attachment 676615 [details] [diff] [review]:
-----------------------------------------------------------------

::: manifests/extlookup/moco-config.csv
@@ +7,5 @@
>  ganglia_config_class,ganglia::config-moco
>  global_authorized_keys,armenzg,asasaki,bhearsum,catlee,callek,coop,joduinn,nthomas,rail,dustin,cltbld,arr,zandr,mlarrain,jhopkins
> +slavealloc_dump_file,%{puppet_confdir}/slavealloc.json
> +organization,moco
> +slave_classification_script,%{puppet_manifestdir}/scripts/trustlevel.py

I added a custom fact to simplify testing the scripts without hardcoding their path.

::: manifests/scripts/slavealloc_diff.py
@@ +1,1 @@
> +#!/usr/bin/env python

Just a proof of concept; I need to add better exception handling here.

::: modules/puppetmaster/templates/sync_slavealloc.sh.erb
@@ +1,1 @@
> +#! /bin/bash

This script fetches the slaves JSON file but doesn't apply it, leaving the decision to us. It will spam us every hour until we "commit" the changes. Ignore the email subjects for now.

::: modules/secrets/manifests/buildslave/moco.pp
@@ +5,5 @@
> +    }
> +
> +    secrets::file {
> +        "${::users::builder::home}/.ssh/auspush":
> +            source => "puppet:///modules/secrets/${secrets_dir}/.ssh/auspush";

${secrets_dir} will be moved outside of the hg tree, or we should add those files to .hgignore.

I'd prefer to use real files instead of base64-encoded ones, because it'll be easier to copy them and compare them with the real files.

::: modules/secrets/manifests/buildslave/mocoshared.pp
@@ +1,1 @@
> +class secrets::buildslave::mocoshared ($secrets_dir){

Files shared between try and production. Well, not really the files, just the file names; the file sources are different.
Comment on attachment 676615 [details] [diff] [review]
sikrits

Review of attachment 676615 [details] [diff] [review]:
-----------------------------------------------------------------

Lots of questions and thoughts, but this looks good so far!

::: manifests/extlookup/moco-config.csv
@@ +7,5 @@
>  ganglia_config_class,ganglia::config-moco
>  global_authorized_keys,armenzg,asasaki,bhearsum,catlee,callek,coop,joduinn,nthomas,rail,dustin,cltbld,arr,zandr,mlarrain,jhopkins
> +slavealloc_dump_file,%{puppet_confdir}/slavealloc.json
> +organization,moco
> +slave_classification_script,%{puppet_manifestdir}/scripts/trustlevel.py

I like :)

::: manifests/scripts/slavealloc_diff.py
@@ +1,1 @@
> +#!/usr/bin/env python

Why not just use 'diff'?

::: manifests/scripts/trustlevel.py
@@ +1,1 @@
> +#!/usr/bin/env python

Would it be easier to make this a custom fact?

::: modules/puppetmaster/manifests/sync_slavealloc.pp
@@ +3,5 @@
> +        file {
> +            "/etc/puppet/sync_slavealloc.sh":
> +                mode   => '0755',
> +                source => template("puppet:///modules/puppetmaster/sync_slavealloc.sh.erb");
> +            "/etc/cron.d/sync_data.cron":

Shouldn't this be sync_slavealloc.cron?

::: modules/secrets/manifests/buildslave/moco.pp
@@ +5,5 @@
> +    }
> +
> +    secrets::file {
> +        "${::users::builder::home}/.ssh/auspush":
> +            source => "puppet:///modules/secrets/${secrets_dir}/.ssh/auspush";

True, but files are available to any client that has a certificate - not just clients that deserve the file.  So this would make build keys available for the asking on try slaves.

Also, eventually with Hiera we'll need to handle the file contents as strings within Puppet, rather than files.

You could change the way secrets::file is implemented to load the content locally based on an identifying string e.g., 
  secret => "${secrets_tree}/ssh/auspush"
and then secrets::file would load that data, probably with generate(), and then
  file {
    ..
      content => $secret_data;
  }

This means the puppetmaster will only send that data to hosts on which the File resource is instantiated.

::: modules/secrets/manifests/init.pp
@@ +7,5 @@
> +                                "${::hostname}")
> +
> +        case "${::config::organization}-${trustlevel}" {
> +
> +            /^(moco-(aws|core|dev(-aws)?))$/: {

Do you mean /^(moco-(aws|core|dev)(-aws)?)$/ here?
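The difference between the two patterns can be checked directly with Python's re module (the lookup strings below are hypothetical):

```python
"""The pattern in the patch only allows "-aws" after dev; the suggested
pattern allows "-aws" after any of the three trustlevels."""
import re

in_patch  = re.compile(r"^(moco-(aws|core|dev(-aws)?))$")
suggested = re.compile(r"^(moco-(aws|core|dev)(-aws)?)$")

# both patterns agree on the plain keys and on moco-dev-aws
for key in ("moco-aws", "moco-core", "moco-dev", "moco-dev-aws"):
    assert in_patch.match(key) and suggested.match(key)

assert not in_patch.match("moco-core-aws")   # rejected by the patch
assert suggested.match("moco-core-aws")      # accepted as suggested
```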

::: modules/toplevel/manifests/slave/build.pp
@@ +15,5 @@
>  
>      include packages::mozilla::git
>      include packages::mozilla::py27_virtualenv
>      include packages::mozilla::hgtool
> +    include secrets

This would end up in toplevel::slave eventually, right?  Or, if not, perhaps it should be
  include secrets::buildslave
so that we can later add
  include secrets::master
and so on?
Attachment #676615 - Flags: feedback?(dustin) → feedback+
(In reply to Dustin J. Mitchell [:dustin] from comment #7) 
> ::: manifests/scripts/slavealloc_diff.py
> @@ +1,1 @@
> > +#!/usr/bin/env python
> 
> Why not just use 'diff'?

A diff between two one-liner json files looks bad! :)
It's possible to dump both files with "python -m json.tool" and then diff. Either WFM.
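The pretty-print-then-diff idea can be sketched in a few lines: dumping with sorted keys and indentation turns a useless whole-line diff of one-liner JSON into a line-per-field diff (the slave data below is made up):

```python
"""Pretty-print one-liner JSON with sorted keys before diffing, so a
single changed field shows up as a single changed line."""
import difflib
import json

def pretty(blob):
    return json.dumps(json.loads(blob), indent=2, sort_keys=True) + "\n"

old = '{"bld-001": "core", "bld-002": "try"}'
new = '{"bld-001": "dev", "bld-002": "try"}'

diff = difflib.unified_diff(pretty(old).splitlines(True),
                            pretty(new).splitlines(True))
changed = [l for l in diff
           if l.startswith(("+", "-"))
           and not l.startswith(("+++", "---"))]
assert changed == ['-  "bld-001": "core",\n', '+  "bld-001": "dev",\n']
```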

> ::: manifests/scripts/trustlevel.py
> @@ +1,1 @@
> > +#!/usr/bin/env python
> 
> Would it be easier to make this a custom fact?

Maybe as a function... We don't trust anything from slaves, right? :)


> Shouldn't this be sync_slavealloc.cron?

Silly copy/paste!
 
> ::: modules/secrets/manifests/buildslave/moco.pp
> @@ +5,5 @@
> > +    }
> > +
> > +    secrets::file {
> > +        "${::users::builder::home}/.ssh/auspush":
> > +            source => "puppet:///modules/secrets/${secrets_dir}/.ssh/auspush";
> 
> True, but files are available to any client that has a certificate - not
> just clients that deserve the file.  So this would make build keys available
> for the asking on try slaves.

Oh, right.. We need a better approach to return the content. file() would help here...

> Also, eventually with Hiera we'll need to handle the file contents as
> strings within Puppet, rather than files.

yeah... I'm not sure what we should 
 
> You could change the way secrets::file is implemented to load the content
> locally based on an identifying string e.g., 
>   secret => "${secrets_tree}/ssh/auspush"
> and then secrets::file would load that data, probably with generate(), and
> then
>   file {
>     ..
>       content => $secret_data;
>   }

I think the file() function is the easiest way to map in content from outside of the puppet tree. Something like
  source => file("/etc/puppet/sikrits/${secrets_dir}/.ssh/auspush");

I need to do some testing here. In the worst-case scenario I may drop this part and replace it with custom functions...


> Do you mean /^(moco-(aws|core|dev)(-aws)?)$/ here?

Sure.

> ::: modules/toplevel/manifests/slave/build.pp
> This would end up in toplevel::slave eventually, right?  Or, if not, perhaps
> it should be
>   include secrets::buildslave
> so that we can later add
>   include secrets::master
> and so on?

Sounds reasonable.

Thanks for the feedback!!!
(In reply to Rail Aliiev [:rail] from comment #8)
> A diff between two one-liner json files looks bad! :)

Oh, silly me.  This is fine, then.

> > ::: manifests/scripts/trustlevel.py
> > @@ +1,1 @@
> > > +#!/usr/bin/env python
> > 
> > Would it be easier to make this a custom fact?
> 
> Maybe as a function... We don't trust anything from slaves, right? :)

Yes, that's what I should have said.

Glad the fb was helpful.  Sorry for the delay!
Component: Release Engineering: Machine Management → RelOps: Puppet
Depends on: 891853
Product: mozilla.org → Infrastructure & Operations
QA Contact: armenzg → dustin
I don't think that I can work on this bug this quarter. Back to the pool for now.
Assignee: rail → relops
Assignee: relops → dustin
Severity: normal → enhancement
Assignee: dustin → relops
Severity: enhancement → normal
Assignee: relops → dustin
Attached patch bug792836.patch (obsolete) — Splinter Review
OK, here's what I've got so far.  This moves trust levels into the node definitions, because it's easier and a good bit more secure.  Once we move out of mtv1 and scl1, the ugly conditionals on slave number will go away - slave type will be determined by VLAN and name.

This checks (though not in an unforgeable way) that a slave's trust level doesn't change without being reimaged, to prevent accidental changes.
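The reimage check described above boils down to recording the trustlevel the first time puppet runs after imaging and refusing a different value later. A minimal sketch, with an assumed marker path (and, as noted, not unforgeable by anyone with root on the slave):

```python
"""Sketch of the trust-level change-control check: record the declared
trustlevel at first run, and flag any later mismatch so the puppet run
can abort.  The marker path is an assumption."""
import os

def check_trustlevel(declared, marker="/etc/slave-trustlevel"):
    if not os.path.exists(marker):
        with open(marker, "w") as f:     # first run after (re)imaging
            f.write(declared)
        return True
    with open(marker) as f:
        recorded = f.read().strip()
    return recorded == declared          # False should abort the run
```

A reimage wipes the marker along with everything else, which is exactly what lets a deliberate trust-level change through.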

I've verified that an entire privkey can be easily bundled up with eyaml and added to a single key (although it's a very large secret!).

Remaining TODO: treat the Google API key as a secret, rather than a package.  That should be easy, but it's MFBT so this is what I've got :)
Attachment #816075 - Flags: feedback?(rail)
Comment on attachment 816075 [details] [diff] [review]
bug792836.patch

Review of attachment 816075 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for poking this bug. I think the idea should work.

Some comments below.

::: manifests/moco-config.pp
@@ +246,5 @@
>      $slaveapi_bugzilla_dev_url = "https://bugzilla-dev.allizom.org/rest/"
>      $slaveapi_bugzilla_prod_url = "https://bugzilla.mozilla.org/rest/"
> +
> +    $builder_ssh_keys_by_trustlevel = {
> +        "dev" => ['test'], # you're on your own

dev usually means the same key names, but for staging. "on your own" won't work here, because .ssh will be purged by puppet, so you can't just put anything there...

::: manifests/moco-nodes.pp
@@ +63,5 @@
> +        } elsif $1 <= 37 {
> +            $slave_trustlevel = 'core'
> +        } elsif $1 <= 53 {
> +            $slave_trustlevel = 'try'
> +        }

We are introducing another source of truth by adding these rules. I'd like to hear what people working on slaveapi think about this.
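Restated in Python, the slave-number rules quoted above look like this (only the two visible branches are reproduced; the cutoffs come from the quoted patch, and the behavior outside those ranges is an assumption):

```python
"""Sketch of the hostname-number-to-trustlevel ladder quoted above.
The earlier, elided branch of the patch is not reproduced; hostnames
are hypothetical."""
import re

def trustlevel_from_number(hostname):
    m = re.search(r"-(\d+)\.", hostname)
    if m is None:
        return None
    num = int(m.group(1))
    if num <= 37:            # first quoted branch
        return "core"
    elif num <= 53:          # second quoted branch
        return "try"
    return None              # out of range: gets no keys
```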

::: modules/config/manifests/base.pp
@@ +68,5 @@
>      $releaserunner_ssh_key = ""
>      $install_avds = "no"
> +    $slavealloc_dump_file = ""
> +    $slave_classification_script = ""
> +    $slavalloc_url = ""

Are these used anywhere?
Attachment #816075 - Flags: feedback?(rail)
Attachment #816075 - Flags: feedback?(bhearsum)
Attachment #816075 - Flags: feedback+
So should the dev keys be the same as the try keys?

This is another source of truth, but only until we empty out mtv1/scl1 - everywhere else segregates try/build by VLAN.  So I think it's tolerable.
(In reply to Dustin J. Mitchell [:dustin] from comment #13)
> So should the dev keys be the same as the try keys?

For staging we use the same key file names as for production, but with different content of files (keys are different).
 
> This is another source of truth, but only until we empty out mtv1/scl1 -
> everywhere else segregates try/build by VLAN.  So I think it's tolerable.

Ah, OK. Makes more sense now.

Thanks again!
Duplicate of this bug: 913041
Attached patch interdiff.patch (obsolete) — Splinter Review
Attached patch bug792836-p1.patch (obsolete) — Splinter Review
Attachment #816075 - Attachment is obsolete: true
Attachment #816075 - Flags: feedback?(bhearsum)
Attachment #817147 - Flags: review?(rail)
Attachment #817147 - Flags: feedback?(bhearsum)
(In reply to Rail Aliiev [:rail] from comment #14)
> (In reply to Dustin J. Mitchell [:dustin] from comment #13)
> > So should the dev keys be the same as the try keys?
> 
> For staging we use the same key file names as for production, but with
> different content of files (keys are different).

Just to spell this out explicitly, there are 3 different sets of keys a build machine may have:
1) Production:
auspush		authorized_keys	b2gbld_dsa	b2gbld_dsa.pub	config		ffxbld_dsa	ffxbld_dsa.pub	id_dsa		id_dsa.pub	known_hosts	tbirdbld_dsa	xrbld_dsa
2) Staging/dev - Any keys with duplicate names have different contents:
aus  authorized_keys  config  cvs  cvs.pub  ffxbld_dsa  ffxbld_dsa.pub  id_dsa  id_rsa  id_rsa.pub  known_hosts  tbirdbld_dsa  trybld_dsa  trybld_dsa.pub  xrbld_dsa  xrbld_dsa.pub
3) Try:
aus		authorized_keys	config		cvs		known_hosts	trybld_dsa	trybld_dsa.pub

(The above all taken from lion machines - I'm pretty sure the other platforms are the same).

(In reply to Rail Aliiev [:rail] from comment #12)
> ::: manifests/moco-nodes.pp
> @@ +63,5 @@
> > +        } elsif $1 <= 37 {
> > +            $slave_trustlevel = 'core'
> > +        } elsif $1 <= 53 {
> > +            $slave_trustlevel = 'try'
> > +        }
> 
> We are introducing another source of truth by adding these rules. I'd like
> to hear what people working on slaveapi think about this.

I don't think slaveapi really interacts here at all - it merely queries other systems for truth. It is unfortunate to duplicate this information that's already in slavealloc though. Is there any way that Puppet can get this information from slavealloc, or have it autogenerated on a cycle or something?
Comment on attachment 817147 [details] [diff] [review]
bug792836-p1.patch

Review of attachment 817147 [details] [diff] [review]:
-----------------------------------------------------------------

::: manifests/moco-config.pp
@@ +246,5 @@
>      $slaveapi_bugzilla_dev_url = "https://bugzilla-dev.allizom.org/rest/"
>      $slaveapi_bugzilla_prod_url = "https://bugzilla.mozilla.org/rest/"
> +
> +    $builder_ssh_keys_by_trustlevel = {
> +        "dev" => ['trybld_dsa', 'b2gtry_dsa'],

This list looks wrong, per my previous comment. Looks like the different contents per trust level are handled if I'm reading ssh_key.pp right though!
Attachment #817147 - Flags: feedback?(bhearsum) → feedback+
I assume those are directory contents of .ssh, rather than a key named 'authorized_keys', right?  Are you sure all of those keys are in use?  What about id_dsa - what's that?  I based the set of keys specified for try and core on rail's patch.  Should dev match core, then?

Regarding encoding different machine numbers in puppet, as I said above, that's temporary until we get out of those datacenters, so there's no sense in doing extra work to get that data from elsewhere.
(In reply to Dustin J. Mitchell [:dustin] from comment #20)
> I assume those are directory contents of .ssh, rather than a key named
> 'authorized_keys', right?

Correct.

> Are you sure all of those keys are in use?  What
> about id_dsa - what's that?

That's the old "cltbld" key. I'm actually not sure if it's in use anymore...

> I based the set of keys specified for try and
> core on rail's patch.  Should dev match core, then?

dev should be both core and try, with different contents. We don't split the dev pool between try and non-try.

> Regarding encoding different machine numbers in puppet, as I said above,
> that's temporary until we get out of those datacenters, so there's no sense
> in doing extra work to get that data from elsewhere.

Ahhh, ok.
I *really* don't want to cargo-cult old keys into this new world, and if we do, I'd love for them to not be named 'id'.  The questionable keys, then, are
 * aus - should this be auspush?
 * cvs
 * id_rsa
 * id_dsa

aside from that, my understanding of the config is

    $builder_ssh_keys_by_trustlevel = {
        "dev" => ['trybld_dsa', 'b2gtry_dsa', 'auspush', 'ffxbld_dsa', 'xrbld_dsa', 'tbirdbld_dsa', 'b2gbld_dsa'],
        "try" => ['trybld_dsa', 'b2gtry_dsa'],
        "core" => ['auspush', 'ffxbld_dsa', 'xrbld_dsa', 'tbirdbld_dsa', 'b2gbld_dsa'],
    }
(In reply to Dustin J. Mitchell [:dustin] from comment #22)
> I *really* don't want to cargo-cult old keys into this new world, and if we
> do, I'd love for them to not be named 'id'.
> The questionable keys, then, are
>  * aus - should this be auspush?
>  * cvs
>  * id_rsa
>  * id_dsa

Does figuring this out need to block you? I can look into it, but not right away.

If we do end up renaming any files, we'll have to keep the old and new names for a short transition period, because our automation will be pointed at a specific filename.

> 
> aside from that, my understanding of the config is
> 
>     $builder_ssh_keys_by_trustlevel = {
>         "dev" => ['trybld_dsa', 'b2gtry_dsa', 'auspush', 'ffxbld_dsa',
> 'xrbld_dsa', 'tbirdbld_dsa', 'b2gbld_dsa'],
>         "try" => ['trybld_dsa', 'b2gtry_dsa'],
>         "core" => ['auspush', 'ffxbld_dsa', 'xrbld_dsa', 'tbirdbld_dsa',
> 'b2gbld_dsa'],
>     }

Looks right to me.
It does block deploying this because the patch will purge unrecognized keys.  This is a Q4 goal, though, so a few days' waiting is OK by me.  I'll flag you as needinfo so bugzilla will nag you :)
Flags: needinfo?(bhearsum)
Found some time to look today after all!

(In reply to Dustin J. Mitchell [:dustin] from comment #22)
> I *really* don't want to cargo-cult old keys into this new world, and if we
> do, I'd love for them to not be named 'id'.  The questionable keys, then, are
>  * aus - should this be auspush?

Turns out this was the _staging_ auspush key on the try machines. Very silly.

>  * cvs
>  * id_rsa
>  * id_dsa

AFAICT all of these keys are dead. I don't see any references to them in our repositories. I'll ask around a bit, but I don't think you need to block on this.
Flags: needinfo?(bhearsum)
What keys need to be in place on buildmasters?  As it stands, this will purge unrecognized ssh keys from all hosts, so all ssh keys anywhere will need to be handled by puppet.
(In reply to Dustin J. Mitchell [:dustin] from comment #26)
> What keys need to be in place on buildmasters?  As it stands, this will
> purge unrecognized ssh keys from all hosts, so all ssh keys anywhere will
> need to be handled by puppet.

Masters (build, test, try) have:
aus  authorized_keys  b2gbld_dsa  b2gtry_dsa  config  ffxbld_dsa  id_rsa  known_hosts  tbirdbld_dsa  trybld_dsa  xrbld_dsa

Like the slaves, "aus" and "id_rsa" can go away here.

We should audit other hosts and make sure they don't need keys either, specifically:
foopies
signing machines
puppetmasters
slaveapi
(In reply to Ben Hearsum [:bhearsum] from comment #27)
> We should audit other hosts and make sure they don't need keys either,
> specifically:
> foopy's

These have a key atm for stats upload for http://mobile-dashboard.pub.build.mozilla.org/; however, we no longer even add keys after foopy reimages, and we are planning on fully decommissioning that dashboard, so it's not a requirement anymore.
Comment on attachment 817147 [details] [diff] [review]
bug792836-p1.patch

OK, so I'll redraft for masters.  Foopies are OK per Callek, puppetmasters are fine per me, and that just leaves slaveapi, which I assume is in your area of expertise, Ben?
Attachment #817147 - Flags: review?(rail)
(In reply to Dustin J. Mitchell [:dustin] from comment #29)
> Comment on attachment 817147 [details] [diff] [review]
> bug792836-p1.patch
> 
> OK, so I'll redraft for masters.  Foopies are OK per callek, puppetmasters
> are fine per me, and that just leaves slaveapi, which I assume is in your
> area of expertise, Ben?

Just doublechecked. No keys necessary on slaveapi.
Tested with --noop on
 buildbot-master80.srv.releng.usw2.mozilla.com
 bld-linux64-ec2-100.build.releng.use1.mozilla.com
 bld-lion-r5-030.try.releng.scl3.mozilla.com
 bld-centos6-hp-001.build.scl1.mozilla.com
and created the proper secrets in my environment.  There were no substantive changes to the keys (just removing trailing whitespace).

I need to figure out if secrets are getting sync'd between masters, then I'll make a plan to start deploying this.
Secrets are sync'd between masters, just not per-user secrets.  So that's solved.

However, it turns out that dev slaves are *not* identifiable by hostname, and in fact slaves often move back and forth between dev and prod.  So I think that, rather than a trustlevel, dev should be considered an aspect (and let's call it "staging", like other uses of that aspect) of production hosts.

I have myself mostly convinced that, at this point, the aspect is not a security-sensitive question -- a dev host *is* a prod host, and only has non-production keys on it to prevent accidental uploads to the wrong place.  The only questionable part here is that dev hosts are sometimes loaned to users, at which point we need to make sure that (a) they don't have prod keys and (b) they don't have any way to get prod keys.  Buildduty manually deletes keys when loaning, taking care of (a), and buildduty disables puppet when loaning, taking care of (b).

So we can determine the staging aspect using untrusted data.  I see two ways to determine that a host has this aspect:

 1. A custom fact, based on presence of a file on disk (e.g., /builds/STAGING).  This has the advantage of being immediate - touch the file and run puppet, and the keys are switched - but represents a different source of truth than slavealloc

 2. Sync data from slavealloc and use that to determine the aspect.  This will have a substantial replication delay: wait for the sync from slavealloc, wait for that to replicate from the distinguished master, then reboot the slave or re-run puppet.

Note that either one could be forged pretty easily.

I'll give #2 a shot.
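For reference, option 1 above is tiny: a custom fact keyed off the presence of a marker file (the /builds/STAGING path comes from the comment; the "true"/"false" string convention matches how facter facts are usually consumed):

```python
"""Sketch of option 1: a custom fact derived from a marker file on disk.
Facter facts are strings, hence "true"/"false" rather than booleans."""
import os

def staging_fact(marker="/builds/STAGING"):
    return "true" if os.path.exists(marker) else "false"
```

Touch the file and re-run puppet, and the keys switch immediately; option 2 trades that immediacy for keeping slavealloc as the single source of truth.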
Duplicate of this bug: 642790
Depends on: 931056
This sets up a sync of all of the data from slavealloc (which is a lot!)

I'll need to land this at least 10m before the next patch so that the data exists.
Attachment #676615 - Attachment is obsolete: true
Attachment #817144 - Attachment is obsolete: true
Attachment #817147 - Attachment is obsolete: true
Attachment #830414 - Flags: review?(bhearsum)
Attached patch bug792836-aspects.patch (obsolete) — Splinter Review
..and this patch sets $aspects appropriately, and sets up the rest of the secret-handling stuff.  I'll need to make sure that the secrets are set up before this lands.
Attachment #830417 - Flags: review?(bhearsum)
Comment on attachment 830414 [details] [diff] [review]
bug792836-slavealloc-extsync.patch

Just on a cursory skim I'm not too big a fan of this approach, but I'm happy to defer to Ben's opinion since I'm not the reviewer and just wish to share my opinion [bikeshed]

This makes a much bigger file than we need for this work, because it also pulls into YAML the comment from slavealloc, the pool, enabled/disabled state, etc., where we only really need the info necessary for trust level determination.

It might also be prudent to exclude decomm slaves from the YAML entirely.

The bloat in file size slows down the /etc/hiera rsync end to end time as well, of course.
Attachment #830414 - Flags: feedback-
Comment on attachment 830417 [details] [diff] [review]
bug792836-aspects.patch

Review of attachment 830417 [details] [diff] [review]:
-----------------------------------------------------------------

Haven't looked at everything yet, but a couple things I noticed on my first glance...

::: manifests/moco-nodes.pp
@@ +61,5 @@
>  
>  ## builders
>  
> +# builders' secrets are handled along two axes: $slave_trustlevel and $aspects.
> +# A host's trustlevel is the level of code it builds: try or core.  Either type

Unfortunately, this isn't quite right. We talked about this in comments 22 and 23. core/try each have their own sets of ssh keys, with no overlap. Dev/staging machines have _all_ of the dev/staging keys on them. These machines have "dev" for their trustlevel in addition to being in the "dev/pp" environment. I'm not sure if it's necessary to have both of these set, but it is necessary for the dev/pp slaves to have all of the dev/pp keys.

Sorry, I should've mentioned the trustlevel part when we were talking in #releng earlier.

@@ +85,5 @@
> +            $slave_trustlevel = 'core'
> +        } elsif $1 <= 53 {
> +            $slave_trustlevel = 'try'
> +        }
> +    }

I'm confused about why slave numbers are being hardcoded here. I thought a large part of the reason we were syncing from slavealloc was to use it as the source of truth for this?
Callek: how do I identify a decommed slave?

Ben: the trustlevel is determined from the hostname because we can't trust slavealloc's notion of trust level, since it's unauthenticated cleartext.  For all but mtv1/scl1, the VLAN determines the trustlevel.  I'm starting to think that's the root of the problem here, and that hacking around it in puppet is just asking for pain and vulnerabilities.

That said, I think the hack you're suggesting is to override the trustlevel *downward* to 'dev' when slavealloc indicates that a host has env 'dev/pp'.  So rather than do anything with aspects, we have:

 if (slavealloc_info('environment') == 'dev/pp') {
   $trustlevel = 'dev'
 } else {
   .. # determine trustlevel based on hostname
 }

Would that do the trick?
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #38)
> Ben: the trustlevel is determined from the hostname because we can't trust
> slavealloc's notion of trust level, since it's unauthenticated cleartext. 
> For all but mtv1/scl1, the VLAN determines the trustlevel.  I'm starting to
> think that's the root of the problem here, and that hacking around it in
> puppet is just asking for pain and vulnerabilities.
> 
> That said, I think the hack you're suggesting is to override the trustlevel
> *downward* to 'dev' when slavealloc indicates that a host has env 'dev/pp'. 
> So rather than do anything with aspects, we have:
> 
>  if (slavealloc_info('environment') == 'dev/pp') {
>    $trustlevel = 'dev'
>  } else {
>    .. # determine trustlevel based on hostname
>  }
> 
> Would that do the trick?

Yup, that should be fine. Thanks!
Attachment #830414 - Flags: review?(bhearsum) → review+
Comment on attachment 830417 [details] [diff] [review]
bug792836-aspects.patch

Review of attachment 830417 [details] [diff] [review]:
-----------------------------------------------------------------

This looks OK to me, pending the change from comment #38. Do you have a host or two that are puppetized with these that I could peek at too?
Attachment #830417 - Flags: review?(bhearsum) → feedback+
I don't have any hosts puppetized, but I can do so temporarily in relabs, once we sort out the below.

(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #38)
>  if (slavealloc_info('environment') == 'dev/pp') {
>    $trustlevel = 'dev'
>  } else {
>    .. # determine trustlevel based on hostname
>  }

To be clear, "dev/pp" just means that hosts are connected to staging masters and should be talking to staging services (dev-stage01, etc.), right?  So their trust level actually isn't any different from the trustlevel implied by their hostname?

I ask because I referred to "overrid[ing] the trustlevel *downward*" with slavealloc, e.g., from prod -> dev.  However, if such a dev host is made accessible to an untrusted party or code, it would only take a simple HTTP post to slavealloc to change the environment back to prod and run puppet; that would be an *upward* change in the trust level, dev -> prod.

I don't like detailed, special-case-laden arguments about security, especially about security of something as critical as this.  I'm still on the fence between "wait for a trustworthy slavealloc" and "implement and hope we're OK".

Thoughts?
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #41)
> I don't have any hosts puppetized, but I can do so temporarily in relabs,
> once we sort out the below.
> 
> (In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo
> me) from comment #38)
> >  if (slavealloc_info('environment') == 'dev/pp') {
> >    $trustlevel = 'dev'
> >  } else {
> >    .. # determine trustlevel based on hostname
> >  }
> 
> To be clear, "dev/pp" just means that hosts are connected to staging masters
> and should be talking to staging services (dev-stage01, etc.), right?

That's right.

> So
> their trust level actually isn't any different from the trustlevel implied
> by their hostname?

Depends on the host. For example, bld-lion-r5-087.build.releng.scl3.mozilla.com is currently in the dev/pp environment, but the hostname implies "core" trustlevel.

> I ask because I referred to "overrid[ing] the trustlevel *downward*" with
> slavealloc, e.g., from prod -> dev.  However, if such a dev host is made
> accessible to an untrusted party or code, it would only take a simple HTTP
> post to slavealloc to change the environment back to prod and run puppet;
> that would be an *upward* change in the trust level, dev -> prod.

It's not the best argument, but if you already have access to RelEng machines it's not terribly difficult to pwn them, even remotely, given that they're completely unpatched.

> 
> I don't like detailed, special-case-laden arguments about security,
> especially about security of something as critical as this.  I'm still on
> the fence between "wait for a trustworthy slavealloc" and "implement and
> hope we're OK".

Given the big benefit of managing the secrets, I don't think we should block this on slavealloc improvements. The fact that a very small number of people can route to these machines has been our foremost layer of security for a long time.
Ben and I talked directly.

Slavealloc itself has two columns -- trustlevel and environment.  The proposal is to make slavealloc's trustlevel mirror that determined from the hostname.  That's a hard, permanent fact about the host, is reflected in its VLAN, and requires a complete reimage to change.  Puppet will determine trustlevel from the hostname, not from untrusted slavealloc data.

The environment, on the other hand, can change frequently, with slavealloc as the source of truth on that matter.

Now, we want different sets of keys:

trust  env      keys
-----  -----    -----
core   prod     real core keys (e.g., ffxbld)
try    prod     real try keys (e.g., trybld)
core   dev/pp   all staging keys (e.g., staging ffxbld, staging trybld)
try    dev/pp   all staging keys (same as above)

So two things to fix:
 1. Remove the "dev" trustlevel in slavealloc (I'll make a new bug).
 2. Set up puppet to determine the key set more flexibly than just $trustlevel.

I need to give #2 a little thought to come up with an elegant design, but the principle is clear now.
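As a rough Python model of the table above (key names abbreviated and hypothetical; the actual key lists live in the puppet manifests):

```python
# Hypothetical model of keyset selection. Trustlevel comes from the
# hostname; environment comes from slavealloc. Both core and try hosts
# get the full staging keyset when in the dev/pp environment.
STAGING_KEYSET = {'staging_ffxbld', 'staging_trybld'}

def select_keyset(trustlevel, environment):
    """Return the set of SSH key names a slave should carry."""
    if environment == 'dev/pp':
        return STAGING_KEYSET
    if environment == 'prod':
        if trustlevel == 'core':
            return {'ffxbld'}   # real core keys
        if trustlevel == 'try':
            return {'trybld'}   # real try keys
    return set()  # unknown combinations (e.g., loaned) get no keys
```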
Depends on: 938251
Duplicate of this bug: 624622
Updated slavealloc patch that only syncs "environment".
Attachment #830414 - Attachment is obsolete: true
Attachment #832311 - Flags: review?(bugspam.Callek)
Attachment #832311 - Flags: review?(bugspam.Callek) → review+
Attached patch bug792836-slavesecrets.patch (obsolete) — Splinter Review
OK, this introduces the notion of a "keyset" for slaves, and also applies slave_secrets to all slaves, not just build.

For moco, the keyset is based on trust level and environment.
Attachment #832323 - Flags: review?(bhearsum)
Attachment #830417 - Attachment is obsolete: true
Comment on attachment 832311 [details] [diff] [review]
bug792836-slavealloc.patch

Landed, with a change to $puppetmaster_extsyncs for relabs.
Attachment #832311 - Flags: checked-in+
Duplicate of this bug: 678804
Comment on attachment 832323 [details] [diff] [review]
bug792836-slavesecrets.patch

Review of attachment 832323 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine overall, just a few questions.

::: manifests/moco-nodes.pp
@@ +95,5 @@
> +            'b2gbld_dsa' => 'builder_ssh_key_prod_b2gbld_dsa',
> +        },
> +        'dev/pp' => $staging_keyset,
> +    }
> +}

Would moco-config.pp be a better place for these? It seems a little weird to have them in with the nodes.

@@ +107,5 @@
> +            # decommed
> +        } elsif $1 <= 37 {
> +            $slave_trustlevel = 'core'
> +        } elsif $1 <= 53 {
> +            $slave_trustlevel = 'try'

38 through 48 got repurposed in bug 894394 - I updated slavealloc to reflect this.

@@ +109,5 @@
> +            $slave_trustlevel = 'core'
> +        } elsif $1 <= 53 {
> +            $slave_trustlevel = 'try'
> +        }
> +    }

I think you need a moco_slave_keyset call here too, to support staging?

::: modules/slave_secrets/manifests/init.pp
@@ +9,5 @@
> +    }
> +
> +    # compare the existing trustlevel to the specified; if they're not the same, we're in trouble
> +    if ($existing_slave_trustlevel != '' and $existing_slave_trustlevel != $slave_trustlevel) {
> +        fail("This host used to have trust level ${existing_slave_trustlevel}, and cannot be changed to ${slave_trustlevel} without a reimage")

My first thought here was "we need a re-image to switch to dev/staging"? But then I realized that staging isn't a trust level, so that's not the case, right?

::: modules/ssh/manifests/userconfig.pp
@@ +28,5 @@
>              ensure => directory,
>              mode => filemode(0700),
>              owner => $username,
> +            group => $group_,
> +            purge => true,

I'm pretty sure I'm just misunderstanding, but doesn't this mean that we'll always purge ssh keys? Or does the Anchoring around it (eg https://mxr.mozilla.org/build-central/source/puppet/modules/users/manifests/builder/setup.pp#24) prevent that?
Comment on attachment 832323 [details] [diff] [review]
bug792836-slavesecrets.patch

Review of attachment 832323 [details] [diff] [review]:
-----------------------------------------------------------------

::: manifests/moco-nodes.pp
@@ +95,5 @@
> +            'b2gbld_dsa' => 'builder_ssh_key_prod_b2gbld_dsa',
> +        },
> +        'dev/pp' => $staging_keyset,
> +    }
> +}

Callek was suggesting putting them in slave_secrets::moco or something like that.  I'll play around and see if I can find something more elegant.

@@ +109,5 @@
> +            $slave_trustlevel = 'core'
> +        } elsif $1 <= 53 {
> +            $slave_trustlevel = 'try'
> +        }
> +    }

Yes.

::: modules/slave_secrets/manifests/init.pp
@@ +9,5 @@
> +    }
> +
> +    # compare the existing trustlevel to the specified; if they're not the same, we're in trouble
> +    if ($existing_slave_trustlevel != '' and $existing_slave_trustlevel != $slave_trustlevel) {
> +        fail("This host used to have trust level ${existing_slave_trustlevel}, and cannot be changed to ${slave_trustlevel} without a reimage")

Correct.

::: modules/ssh/manifests/userconfig.pp
@@ +28,5 @@
>              ensure => directory,
>              mode => filemode(0700),
>              owner => $username,
> +            group => $group_,
> +            purge => true,

Yes, we'll always purge SSH keys that aren't specified in puppet.  That's kind of the goal here!  Is that a problem?
Attachment #832323 - Flags: review?(bhearsum) → review-
Hm, those replies in the splinter review didn't create the comment I'd hoped for.  Splinter itself will thread the replies, if it's not clear from the above.
Comment on attachment 832323 [details] [diff] [review]
bug792836-slavesecrets.patch

Review of attachment 832323 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/ssh/manifests/userconfig.pp
@@ +28,5 @@
>              ensure => directory,
>              mode => filemode(0700),
>              owner => $username,
> +            group => $group_,
> +            purge => true,

No, that's great! I just thought it would purge _all_ keys, even if you specified some. By the sounds of it, this purge is applied first, and then the manifests that sync the desired keys are applied.
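Conceptually, the purge behavior converges the directory to exactly the managed set. A toy Python model (not the actual puppet implementation, which handles this via a file resource with purge => true):

```python
import os

def sync_ssh_dir(ssh_dir, managed):
    """Toy model of puppet's purge+recurse on ~/.ssh: files not in the
    managed dict are removed, then managed contents are written, so the
    directory ends up containing exactly the managed files."""
    for name in os.listdir(ssh_dir):
        if name not in managed:
            os.remove(os.path.join(ssh_dir, name))
    for name, content in managed.items():
        with open(os.path.join(ssh_dir, name), 'w') as f:
            f.write(content)
```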
Attached patch bug792836-p2.patch (obsolete) — Splinter Review
This moves the specification of slave SSH keys into slave_secrets::ssh_keys, keyed by org.
Attachment #832323 - Attachment is obsolete: true
Attachment #8334533 - Flags: review?(bhearsum)
Attachment #8334533 - Flags: feedback?(bugspam.Callek)
Comment on attachment 8334533 [details] [diff] [review]
bug792836-p2.patch

Review of attachment 8334533 [details] [diff] [review]:
-----------------------------------------------------------------

Much better overall, one concern below.

::: modules/slave_secrets/manifests/google_api_key.pp
@@ +5,5 @@
> +class slave_secrets::google_api_key($ensure=present) {
> +    include dirs::builds
> +
> +    # this is only installed in the 'core' trustlevel
> +    if ($ensure == 'present' and $slave_trustlevel == 'core') {

We need this for try as well.
Attachment #8334533 - Flags: feedback?(bugspam.Callek) → feedback+
Comment on attachment 8334533 [details] [diff] [review]
bug792836-p2.patch

Review of attachment 8334533 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Justin Wood (:Callek) from comment #54)
> Comment on attachment 8334533 [details] [diff] [review]
> bug792836-p2.patch
> 
> Review of attachment 8334533 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Much better overall, one concern below.
> 
> ::: modules/slave_secrets/manifests/google_api_key.pp
> @@ +5,5 @@
> > +class slave_secrets::google_api_key($ensure=present) {
> > +    include dirs::builds
> > +
> > +    # this is only installed in the 'core' trustlevel
> > +    if ($ensure == 'present' and $slave_trustlevel == 'core') {
> 
> We need this for try as well.

r=me with this fixed. I had thought that we didn't put this key on the try slaves, but it turns out that we do.
Attachment #8334533 - Flags: review?(bhearsum) → review+
This turned out to need a few revisions:
 - keys for seamonkey
 - google API key on try builders too
 - make the google API key optional
 - use different keys for different slave types (test vs. build)
Attachment #8334533 - Attachment is obsolete: true
Attachment #8338586 - Flags: review?(bugspam.Callek)
Comment on attachment 8338586 [details] [diff] [review]
bug792836-p3.patch

Review of attachment 8338586 [details] [diff] [review]:
-----------------------------------------------------------------

r+ based on inspection of interdiff below

::: modules/slave_secrets/manifests/google_api_key.pp
@@ +6,5 @@
> +    include config
> +    include dirs::builds
> +
> +    # this is only installed in the 'core' trustlevel
> +    if ($ensure == 'present' and $config::install_google_api_key) {

note, previously google_api_key was set to always install on slaves, even for SeaMonkey, so I'm not sure we `need` the config conditional.

That said, the comment above this conditional is out of date, and we also don't actually want this on test slaves, but I believe from scanning that we don't try to include this class on test slaves.
Attachment #8338586 - Flags: review?(bugspam.Callek) → review+
You're right - here's a fix.
Attachment #8338618 - Flags: review+
Well, I was testing this with --noop on various types of machines, only to find out (from bug 795060) that AWS systems have different keys than other systems.  So back to the (now very badly in need of expansion) drawing board.
Word from the sec folks is that AWS is part of the build network, and there's no reason to hand out different keys -- with the same mechanism no less -- there.  So, back to my --noop testing.
Here's my check, running this patch from my environment.  These are truncated md5 hashes of private keys, all trailing newlines stripped; the hashes in the puppet messages are different because they don't strip newlines.  If you see any issues, speak now!
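Hashes like the ones below could be produced with something along these lines (a sketch; the exact command actually used isn't recorded here):

```python
import hashlib

def key_fingerprint(path, length=12):
    """Truncated md5 of a key file with trailing newlines stripped.
    Puppet's {md5} sums in the notices below hash the file as-is,
    including the trailing newline, which is why they differ."""
    with open(path, 'rb') as f:
        data = f.read().rstrip(b'\n')
    return hashlib.md5(data).hexdigest()[:length]
```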

I applied the following to my environment, fixing typos:

diff --git a/modules/slave_secrets/manifests/ssh_keys.pp b/modules/slave_secrets/manifests/ssh_keys.pp
index a312a97..9773366 100644
--- a/modules/slave_secrets/manifests/ssh_keys.pp
+++ b/modules/slave_secrets/manifests/ssh_keys.pp
@@ -19,8 +19,8 @@ class slave_secrets::ssh_keys($slave_type) {
                 'b2gbld_dsa' => 'builder_ssh_key_staging_b2gbld_dsa',
             }
             $prod_try_keyset = {
-                'trybld_dsa' => 'builder_ssh_key_prod_trybld_dsa',
-                'b2gtry_dsa' => 'builder_ssh_key_prod_b2gbld_dsa',
+                'trybld_dsa' => 'builder_ssh_key_try_trybld_dsa',
+                'b2gtry_dsa' => 'builder_ssh_key_try_b2gtry_dsa',
             }
             $prod_core_keyset = {
                 'auspush' => 'builder_ssh_key_prod_auspush',

Notes:
 - ffxbld_dsa, xrbld_dsa for masters and prod don't match.  Is that OK?
 - I only used keys from scl3.  The keys on hosts in AWS will change.  Should I file a new bug to remove those from authorized_keys on the relevant hosts?

Keys in puppet:

buildmaster_ssh_key_b2gbld_dsa c11d083a71c3  -
buildmaster_ssh_key_b2gtry_dsa f6d18a95fe4d  -
buildmaster_ssh_key_ffxbld_dsa 3d2b0d9d57e3  -
buildmaster_ssh_key_tbirdbld_dsa 644180525416  -
buildmaster_ssh_key_trybld_dsa 4cd7ca992730  -
buildmaster_ssh_key_xrbld_dsa 0689b3f3393b  -
builder_ssh_key_prod_auspush b8a55f2cfef3  -
builder_ssh_key_prod_ffxbld_dsa d106c84f3f47  -
builder_ssh_key_prod_xrbld_dsa dc3ba71c72b1  -
builder_ssh_key_prod_tbirdbld_dsa ea03b81a29fc  -
builder_ssh_key_prod_b2gbld_dsa 92e8e57fd9f4  -
builder_ssh_key_try_trybld_dsa 4cd7ca992730  -
builder_ssh_key_try_b2gtry_dsa f6d18a95fe4d  -
builder_ssh_key_staging_trybld_dsa 90736c2a29cd  -
builder_ssh_key_staging_b2gtry_dsa 5fe726ed1176  -
builder_ssh_key_staging_auspush 53c586e93ea4  -
builder_ssh_key_staging_ffxbld_dsa dfa1ee790387  -
builder_ssh_key_staging_xrbld_dsa cfb4af898ec6  -
builder_ssh_key_staging_tbirdbld_dsa 063658c5c2c1  -
builder_ssh_key_staging_b2gbld_dsa 319ac29ac16c  -

** buildbot-master63.srv.releng.use1.mozilla.com:

aus 53c586e93ea4  - should be removed
b2gbld_dsa c11d083a71c3  - correct
b2gtry_dsa f6d18a95fe4d  - correct
ffxbld_dsa 3d2b0d9d57e3  - correct
id_rsa 53c586e93ea4  - should be removed
tbirdbld_dsa 644180525416  - correct
trybld_dsa 4cd7ca992730  - correct
xrbld_dsa 0689b3f3393b  - correct

Notice: /File[/home/cltbld/.ssh/aus]/ensure: current_value file, should be absent (noop)
Notice: /File[/home/cltbld/.ssh/id_rsa]/ensure: current_value file, should be absent (noop)

** bld-lion-r5-017.try.scl3.mozilla.com

b2gtry_dsa f6d18a95fe4d - correct
b2gtry_dsa.pub 02dd46f6cfa1 - should be removed
id_dsa 4cd7ca992730 - should be removed
id_dsa.pub fbdc438ee2c0 - should be removed
trybld_dsa 4cd7ca992730 - correct
trybld_dsa.pub fbdc438ee2c0 - should be removed

Notice: /File[/Users/cltbld/.ssh/id_dsa]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/b2gtry_dsa.pub]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/trybld_dsa.pub]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/id_dsa.pub]/ensure: current_value file, should be absent (noop)

** bld-lion-r5-043.build.releng.scl3.mozilla.com (staging)

aus 53c586e93ea4 - should be removed
cvs 6a1512185e68 - should be removed
cvs.pub 9726ed3fc81e - should be removed
ffxbld_dsa dfa1ee790387 - correct
ffxbld_dsa.pub e043f5d1b10a - should be removed
id_dsa 53c586e93ea4 - should be removed
id_rsa 53c586e93ea4 - should be removed
id_rsa.pub 2e82042c073e - should be removed
tbirdbld_dsa 063658c5c2c1 - correct
trybld_dsa 90736c2a29cd - correct
trybld_dsa.pub fdce1ff66566 - should be removed
xrbld_dsa cfb4af898ec6 - correct
xrbld_dsa.pub 4cf45203360a - should be removed
b2gtry_dsa - needed
b2gbld_dsa - needed
auspush - needed

Notice: /File[/Users/cltbld/.ssh/aus]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/cvs]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/cvs.pub]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/ffxbld_dsa.pub]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/id_dsa]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/id_rsa]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/id_rsa.pub]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/trybld_dsa.pub]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/xrbld_dsa.pub]/ensure: current_value file, should be absent (noop)
Notice: /Stage[main]/Slave_secrets::Ssh_keys/Slave_secrets::Ssh_key[b2gtry_dsa]/Ssh::User_privkey[b2gtry_dsa]/File[/Users/cltbld/.ssh/b2gtry_dsa]/ensure: current_value absent, should be file (noop)
Notice: /Stage[main]/Slave_secrets::Ssh_keys/Slave_secrets::Ssh_key[b2gbld_dsa]/Ssh::User_privkey[b2gbld_dsa]/File[/Users/cltbld/.ssh/b2gbld_dsa]/ensure: current_value absent, should be file (noop)
Notice: /Stage[main]/Slave_secrets::Ssh_keys/Slave_secrets::Ssh_key[auspush]/Ssh::User_privkey[auspush]/File[/Users/cltbld/.ssh/auspush]/ensure: current_value absent, should be file (noop)

** talos-linux64-ix-030.test.releng.scl3.mozilla.com

nothing present - correct

puppet does nothing - correct

** bld-lion-r5-044.build.releng.scl3.mozilla.com

auspush 332a7f070b4f - !!
b2gbld_dsa c11d083a71c3 - !!
b2gbld_dsa.pub 94e85bec7430 - should be removed 
ffxbld_dsa 3d2b0d9d57e3 - !!
ffxbld_dsa.pub 376fa1a6254b - should be removed
id_dsa 2e8d2654c53b - should be removed
id_dsa.pub cc09d26ff850 - should be removed
tbirdbld_dsa 644180525416 - !!
xrbld_dsa 7ab2229edb64 - !!

The "!!" keys are different from what puppet will install, although they work fine:

[cltbld@bld-lion-r5-044.build.releng.scl3.mozilla.com .ssh]$ ssh -i auspush ffxbld@aus3-staging.mozilla.org
Warning: Permanently added the RSA host key for IP address '10.8.74.90' to the list of known hosts.
[cltbld@bld-lion-r5-044.build.releng.scl3.mozilla.com .ssh]$ ssh -i b2gbld_dsa b2gbld@stage.mozilla.org
Last login: Tue Nov 26 18:47:59 2013 from zlb1.dmz.scl3.mozilla.com
[cltbld@bld-lion-r5-044.build.releng.scl3.mozilla.com .ssh]$ ssh -i ffxbld_dsa ffxbld@stage.mozilla.org
Last login: Mon Dec  2 04:57:02 2013 from zlb1.dmz.scl3.mozilla.com
[cltbld@bld-lion-r5-044.build.releng.scl3.mozilla.com .ssh]$ ssh -i tbirdbld_dsa tbirdbld@stage.mozilla.org
Last login: Tue Nov 26 15:11:48 2013 from zlb1.dmz.scl3.mozilla.com
[cltbld@bld-lion-r5-044.build.releng.scl3.mozilla.com .ssh]$ ssh -i xrbld_dsa xrbld@stage.mozilla.org
Last login: Wed Dec  4 08:33:13 2013 from zlb1.dmz.scl3.mozilla.com

Notice: /File[/Users/cltbld/.ssh/b2gbld_dsa.pub]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/id_dsa]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/id_dsa.pub]/ensure: current_value file, should be absent (noop)
Notice: /File[/Users/cltbld/.ssh/ffxbld_dsa.pub]/ensure: current_value file, should be absent (noop)
Notice: /Stage[main]/Slave_secrets::Ssh_keys/Slave_secrets::Ssh_key[xrbld_dsa]/Ssh::User_privkey[xrbld_dsa]/File[/Users/cltbld/.ssh/xrbld_dsa]/content: current_value {md5}2f51dc98ab1e, should be {md5}073a4cb466a4 (noop)
Notice: /Stage[main]/Slave_secrets::Ssh_keys/Slave_secrets::Ssh_key[b2gbld_dsa]/Ssh::User_privkey[b2gbld_dsa]/File[/Users/cltbld/.ssh/b2gbld_dsa]/content: current_value {md5}b22de60b605a, should be {md5}ec90e39889b0 (noop)
Notice: /Stage[main]/Slave_secrets::Ssh_keys/Slave_secrets::Ssh_key[auspush]/Ssh::User_privkey[auspush]/File[/Users/cltbld/.ssh/auspush]/content: current_value {md5}4bb519d9d89c, should be {md5}5375ef84a6bf (noop)
Notice: /Stage[main]/Slave_secrets::Ssh_keys/Slave_secrets::Ssh_key[ffxbld_dsa]/Ssh::User_privkey[ffxbld_dsa]/File[/Users/cltbld/.ssh/ffxbld_dsa]/content: current_value {md5}166b900d8ef5, should be {md5}95f39334325d (noop)
Notice: /Stage[main]/Slave_secrets::Ssh_keys/Slave_secrets::Ssh_key[tbirdbld_dsa]/Ssh::User_privkey[tbirdbld_dsa]/File[/Users/cltbld/.ssh/tbirdbld_dsa]/content: current_value {md5}afcf3de48074, should be {md5}8e29eeba47fb (noop)
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #62)
> Notes:
>  - ffxbld_dsa, xrbld_dsa for masters and prod don't match.  Is that OK?

Assuming the new ones work to connect to the known places, yes. I also assume that they match "something" (e.g., perhaps the current AWS keys match these).

>  - I only used keys from scl3.  The keys on hosts in AWS will change. 
> Should I file a new bug to remove those from authorized_keys on the relevant
> hosts?

Please. I'll want to *try* and identify what was using any keys we're stripping from authorized_keys, but I don't know if doing an audit is going to be any easier than just removing them and watching for breakage.

I'd also likely not do the actual work for this until after the new year, to account for possible issues with chemspills and us not being around, to reduce risk.


> ** bld-lion-r5-044.build.releng.scl3.mozilla.com
> 
> auspush 332a7f070b4f - !!
> b2gbld_dsa c11d083a71c3 - !!
> b2gbld_dsa.pub 94e85bec7430 - should be removed 
> ffxbld_dsa 3d2b0d9d57e3 - !!
> ffxbld_dsa.pub 376fa1a6254b - should be removed
> id_dsa 2e8d2654c53b - should be removed
> id_dsa.pub cc09d26ff850 - should be removed
> tbirdbld_dsa 644180525416 - !!
> xrbld_dsa 7ab2229edb64 - !!
> 
> The "!!" keys are different from what puppet will install, although they
> work fine:

We think someone may have grabbed keys from an AWS host rather than another in-house host at some point, this ought to be fine given our needs here.

- I did not do a hefty audit of the keys used, since Ben has been pretty good with that thus far.
I don't know what the current AWS keys are.  The slaves are kind of a mess with what's where, so it's hard to reliably know what key is what.  The new keys do work, so let's just install those.
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #62)
> Notes:
>  - ffxbld_dsa, xrbld_dsa for masters and prod don't match.  Is that OK?

Can you expand on this? I compared bm63 to bld-centos6-hp-006 to bld-lion-r5-004 and didn't find differences here.

>  - I only used keys from scl3.  The keys on hosts in AWS will change. 
> Should I file a new bug to remove those from authorized_keys on the relevant
> hosts?

Yes please!

Based on the rest of your comment everything looks in order modulo the ffxbld/xrbld confusion getting cleared up.
The confusion was just the AWS and non-AWS key difference.  I changed the secret so that there's only one xrbld_dsa and one ffxbld_dsa in puppet, although given the proliferation of keys hither and yon I can't say whether that's the old or new key.  They seem to all work interchangeably anyway (as designed).
Landed, with :bhearsum's OK.  We're watching for problems.
There was a leading '\' in the b2gbld_dsa key.  I have no idea how that got there, but it's gone now.
It looks like this is good enough that we're not rolling back, at least.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.