Closed Bug 1321513 Opened 3 years ago Closed 3 years ago
Add nagios checks for pushapk scriptworkers
3.75 KB, patch
|Details | Diff | Splinter Review|
59 bytes, text/x-review-board-request
Nagios config files have been created on the nodes sides (added in bug 1307826). Pushapkworker doesn't rely on the scriptworker shared module yet (introduced in bug 1316702). If I understand bug 1317783 comment 3 correctly, this may not happen immediately (the scriptworker module enables chain of trust AFAUI). Therefore, I believe we can turn on nagios just like Aki did in bug 1295196  https://hg.mozilla.org/build/puppet/file/8fbdd01cb464/modules/pushapkworker/templates/nagios.cfg.erb
Things have changed since bug 1295196 comment 1, no more SVN repo is required. Releng folks have access to the git repos listed at  (and more precisely, the nagios part of it)  https://mana.mozilla.org/wiki/display/SYSADMIN/Git
The patch is totally similar to bug 1295196. Before asking for a formal review to the folks in #moc, I have a question for you, Aki. I'm not sure I need to add the resource "pushapk-scriptworker-procs", because "scriptworker-procs" already exist. Should I create it anyway, or is it better to just add a hostgroup to "scriptworker-procs"?
Attachment #8816127 - Flags: feedback?(aki)
Comment on attachment 8816127 [details] [diff] [review] add_pushapk_scriptworker_in_nagios.diff The current idea is to have each scriptworker instance use scriptworker-procs, rather than have one *-procs check per scriptworker instance type, especially since I believe they'll look exactly the same other than the name. If there isn't a strong reason to add another check, let's reuse the check.
Attachment #8816127 - Flags: feedback?(aki) → feedback+
Here's a new revision where pushapk-scriptworkers is now listed under "scriptworker-procs" instead of a newly created entry. Aki, do you have the rights to review and merge my patch? Or should I ask Amy?
Oops, I should've added --no-color when generating the diff.
For the record, the patch has also been rebased on top of the latest master.
Comment on attachment 8823221 [details] [diff] [review] add_pushapk_scriptworker_in_nagios_rev3.diff I think :arr is a better reviewer here.
Attachment #8823221 - Flags: review?(aki) → review?(arich)
Comment on attachment 8823221 [details] [diff] [review] add_pushapk_scriptworker_in_nagios_rev3.diff This looks fine to me. I presume that /builds/scriptworker/bin/scriptworker already exists on the client and that the nagios config has been modified in releng puppet to include the nrpe check?
Attachment #8823221 - Flags: review?(arich) → review+
The releng had been modified to support the nrpe check. However, thanks for calling out `/builds/scriptworker`! I noticed pushapk-scriptworker and signing-scriptworker don't share the same root. Therefore, something needs to be changed: a) either on the releng puppet; we make the root pushapk at /build/scriptworker b) or on the current nagios config; we extend the regex to support other roots. I see drawbacks in both scenarios: a) if you're logged on a machine, you may not know what type of scriptworker instance you have, unless you dig into its configuration b) the regex might become a nightmare after several instances being added. On the other had, having consistency across scriptworker instances is better, and will help setting up new ones. Moreover, if you're logged on a machine, there's great probability its FQDN refers to what type of scriptworker instance you're dealing with. Then, I'll change the puppet configuration.  https://hg.mozilla.org/build/puppet/file/3e2076f7aedc/modules/pushapkworker/manifests/init.pp#l63  Pushapk being defined as: https://hg.mozilla.org/build/puppet/file/3e2076f7aedc/manifests/moco-config.pp#l453 Signing-scriptworker as: https://hg.mozilla.org/build/puppet/file/3e2076f7aedc/modules/signing_scriptworker/manifests/settings.pp#l2
Comment on attachment 8823758 [details] Bug 1321513 - Make pushapkworker live under /build/scriptworker https://reviewboard.mozilla.org/r/102242/#review102620 At some point, we should get pushapkworker using modules/scriptworker, and we can make sure all workers using modules/scriptworker have a /builds/scriptworker root. I think this works until that point.
Attachment #8823758 - Flags: review?(aki) → review+
Even though the puppet run was passing yesterday, a nit slipped through the patch. I updated it. The latest revision of that patch passed lint tests there: https://github.com/mozilla/build-puppet/pull/24 I tested the changes against dev-linux64-ec2-jlorenzo.dev.releng.use1.m.c: * /build/pushapkworker doesn't exist anymore * /build/scriptworker does and contains all the files * `supervisorctl status pushapkworker` says the process is running * /builds/scriptworker/logs/worker.log shows the process does what it needs to do (polling TC's queue) I think we're good to merge and deploy attachment 8823758 [details] (which is necessary to land attachment 8823221 [details] [diff] [review]).
Landed on default  and merged in production . Puppet agent applied the changes restarted the process (which restarted properly). On the production machine, I see now /builds/scriptworker/ but not /build/pushapkworker/ anymore. This complies with .  https://hg.mozilla.org/build/puppet/rev/1a429ca163d9  https://hg.mozilla.org/build/puppet/rev/b4d2f4f777e4  https://papertrailapp.com/systems/526654063/events?focus=753674615796166664  https://papertrailapp.com/events?q=program%3Apuppet-agent&focus=753673798523449394
Attachment #8823758 - Flags: checked-in+
Comment on attachment 8823221 [details] [diff] [review] add_pushapk_scriptworker_in_nagios_rev3.diff This patch was landed as revision 568ad3460d032bf498f7bf0e9239938244f64784 on master in the git repo (linked in comment 1).
Attachment #8823221 - Flags: checked-in+
I connected to the releng nagios instance and looked for pushapkwoker. The entry there shows: * 7 checks (PING + the 6 new checks added with attachment 8823221 [details] [diff] [review]). * /builds/scriptworker/bin/scriptworker is declared as running. I can tell it's also working, because this job has just finished and it ran fine. * Every check is green. Looks like we're all set. Time to close this bug! Thanks for the help Ami and Aki!  Listed at https://mana.mozilla.org/wiki/display/SYSADMIN/Nagios#Nagios-Hardware  https://tools.taskcluster.net/task-inspector/#Tmlx52SQT9uwVhi9oEqagg/
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
(In reply to Aki Sasaki [:aki] from comment #12) > At some point, we should get pushapkworker using modules/scriptworker, and > we can make sure all workers using modules/scriptworker have a > /builds/scriptworker root. I agree. I looked more into using this module. I get the feeling I need a greater version than 0.7.1. That is to say, Chain of Trust has to be enabled. Hence, I guess the module can be added alongside bug 1317783.
Depends on: 1329944
You need to log in before you can comment on or make changes to this bug.