Fix nagios alerts for preproduction and preproduction-stage

Status

RESOLVED WONTFIX
7 years ago
5 months ago

People

(Reporter: coop, Unassigned)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [monitoring][nagios][preproduction])

Attachments

(1 attachment)

On the old pp-master, the nagios checks were not puppetized.
To make them work with the new system, we need to fix the following checks:

* MySQL connectivity: should connect to localhost
* buildbot: min 1, max 6 processes
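The buildbot requirement is a process-count range. The check should go critical when fewer than 1 or more than 6 buildbot processes are running, which in Nagios threshold notation is written 1:6 (the value the --noop run produced for pp-master later in this bug). Below is a minimal Python sketch of that evaluation under stated assumptions. The function names are hypothetical, only the simple inclusive min:max form is handled, and this is not the check that was deployed.

```python
# Nagios plugin exit codes (0 = OK, 2 = CRITICAL).
OK = 0
CRITICAL = 2


def parse_range(spec):
    """Parse a simplified 'min:max' threshold such as '1:6' into (min, max).

    Hypothetical helper: only the inclusive two-sided form is handled; the
    full Nagios range syntax ('10:', '~:10', '@10:20', ...) is not supported.
    """
    low, high = spec.split(":")
    return int(low), int(high)


def check_process_count(count, spec):
    """Return (exit_code, message): OK inside the range, CRITICAL outside it."""
    low, high = parse_range(spec)
    if low <= count <= high:
        return OK, "OK: %d buildbot processes (allowed %s)" % (count, spec)
    return CRITICAL, "CRITICAL: %d buildbot processes (allowed %s)" % (count, spec)
```

For example, `check_process_count(0, "1:6")` returns `(2, "CRITICAL: 0 buildbot processes (allowed 1:6)")`, the situation where no master is running at all.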

Updated

6 years ago
Priority: -- → P3
Created attachment 638503
preproduction puppet changes

* Added min_masters/max_masters variables
* check_mysql moved to check_mysql.cfg.erb
* $libdir moved to paths::libdir

I tested it with --noop --environment=rail to check the diff, and it looked fine on pp-master and on one of the production build masters.
Attachment #638503 - Flags: review?(catlee)
Assignee: nobody → rail
I found a couple of other issues.

preproduction-master is trying to upload Thunderbird logs using the ffxbld key, so when that fails we end up with alerts like this:
 preproduction-master.srv.releng.scl3:Command Queue is CRITICAL: 9 dead items
If you look at /builds/buildbot/builder-master/postrun.cfg, it doesn't have everything you'd expect from a prod master. Is this a problem with how we set up the masters after tearing them down every week?

/builds/buildbot/release-master/master/postrun.cfg is also missing.

Puppet also seems to be busted:
Jul  4 14:30:53 preproduction-master puppet-agent[5775]: Could not retrieve catalog from remote server: Error 400 on SERVER: No support for http method POST
Jul  4 14:30:53 preproduction-master puppet-agent[5775]: Using cached catalog
Jul  4 14:30:53 preproduction-master puppet-agent[5775]: Could not run Puppet configuration client: interning empty string
Google says this error comes from running a v2.7 client against a v2.6 master, but we have 0.25.4 on pp-m and 0.25.5 on master-puppet1.
> /builds/buildbot/release-master/master/postrun.cfg is also missing.

Yeah, that's bug 739513

postrun.cfg is managed by puppet, but pp masters aren't really managed by it...
Comment on attachment 638503
preproduction puppet changes

Review of attachment 638503:
-----------------------------------------------------------------

::: modules/buildmaster/manifests/init.pp
@@ +25,5 @@
> +        $min_masters = $num_masters
> +    }
> +    if $max_masters == '' {
> +        $max_masters = $num_masters
> +    }

Are you sure this works? Puppet doesn't like overriding variables.
Comment on attachment 638503
preproduction puppet changes

(In reply to Chris AtLee [:catlee] from comment #5)
> Are you sure this works? Puppet doesn't like overriding variables.

--noop produced no diff for a production master and produced 1:6 for the pp master. I'll investigate this issue further.
Attachment #638503 - Flags: review?(catlee)
Back to the pool.
Assignee: rail → nobody
Priority: P3 → --
Blocks: 885560
We killed the preprod masters.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WONTFIX

Updated

5 years ago
Product: mozilla.org → Release Engineering

Updated

5 months ago
Component: General Automation → General
Product: Release Engineering → Release Engineering