Closed
Bug 870853
Opened 11 years ago
Closed 10 years ago
move off of using ganglia to graphite/collectd
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dividehex, Assigned: dividehex)
References
Details
(Whiteboard: [2013Q4] [tracker])
Attachments
(13 files, 2 obsolete files)
1. 10.56 KB, patch (dustin: review+; dividehex: checked-in+)
2. 4.82 KB, patch (dustin: review+; Callek: feedback+; dividehex: checked-in+)
3. 675 bytes, patch (dustin: review+; dividehex: checked-in+)
4. 437 bytes, patch (rail: review+; dividehex: checked-in+)
5. 946 bytes, patch (coop: review+; dividehex: checked-in+)
6. 324 bytes, patch (coop: review+; dividehex: checked-in+)
7. 1.27 KB, patch (coop: review+; dividehex: checked-in+)
8. 21.98 KB, patch (dustin: review+; dividehex: checked-in+)
9. 683 bytes, patch (coop: review+; dividehex: checked-in+)
10. 1.18 KB, patch (dustin: review+; dividehex: checked-in+)
11. 41.12 KB, image/png
12. 5.67 KB, patch (dustin: review+; dividehex: checked-in+)
13. 2.66 KB, patch (dustin: review+; dividehex: checked-in+)
No description provided.
Comment 1 (Assignee) • 11 years ago

This is the base manifest (and config templates) for the collectd module. It is only written for CentOS right now, but will be expanded to cover other OSes as we build collectd packages >= 5.1 for them.

For documentation, see: https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/collectd

Attachment #762419 - Flags: review?(dustin)
Comment 2 (Assignee) • 11 years ago

+    Port "2003"
+    Prefix "test.dividehex."
+#   Postfix ""

That prefix is for debugging only and will change to:

+    Prefix "hosts.releng."
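For context, these lines belong to a write_graphite plugin block. A full block in collectd.conf (collectd 5.3-era syntax) has roughly the following shape; the host and prefix mirror values mentioned elsewhere in this bug, and the remaining fields are illustrative defaults rather than the exact deployed config:

```
LoadPlugin write_graphite
<Plugin write_graphite>
  <Node "releng">
    Host "graphite1.private.scl3.mozilla.com"
    Port "2003"
    Prefix "hosts.releng."
    EscapeCharacter "_"
    StoreRates true
  </Node>
</Plugin>
```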
Comment 3 • 11 years ago

Comment on attachment 762419 [details] [diff] [review]
Base collectd puppet module

Looks good overall, just some nits:

    $graphite_cluster_fqdn = "graphite1.private.scl3.mozilla.com"

I'd love to have this = "" in the base, and instead specify it in the org configs (e.g. moco/servo), since I *suspect* you won't open this to SeaMonkey. Even if you do open it to SeaMonkey, it sounds like the sort of thing where a good default is no default.

In that regard, I also suspect we want to *not* install collectd if graphite_cluster is not defined/blank. No sense in collecting stuff if we're not reporting anywhere.

Lastly,

    + Prefix "test.dividehex."

should be a config param as well; servo machines would want a different value than moco machines, I expect. And SeaMonkey, if they have access to the same graphite instance, will certainly want/need a different Prefix.

Attachment #762419 - Flags: feedback-
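A minimal sketch of the shape this review asks for. The class and variable names here are assumptions based on the comments in this bug, not the final landed patch: empty defaults in the base config, with org-level configs overriding them, so that no graphite destination exists unless an org opts in.

```puppet
# modules/config/manifests/base.pp (hypothetical shape):
# no default graphite destination in the base config.
class config::base {
    $collectd_graphite_cluster_fqdn = ""
    $collectd_graphite_prefix       = ""
}

# An org config (e.g. moco) overrides with real values.
class config inherits config::base {
    $collectd_graphite_cluster_fqdn = "graphite1.private.scl3.mozilla.com"
    $collectd_graphite_prefix       = "hosts.releng."
}
```

This follows the PuppetAgain pattern of a `config` class inheriting `config::base`; the exact file layout may differ in the landed module.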
Comment 4 • 11 years ago

Comment on attachment 762419 [details] [diff] [review]
Base collectd puppet module

Review of attachment 762419 [details] [diff] [review]:
-----------------------------------------------------------------

...plus what Callek said.

::: modules/collectd/manifests/params.pp
@@ +1,5 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +class collectd::params {
> +    include packages::collectd

Why is this include needed?

It'd probably be good to import all of the configuration parameters Callek mentioned into this class, e.g.,

    $prefix = $::config::collectd_prefix

Also, not really a big deal, but we tend to call these collection-of-variables classes whatever::settings, rather than whatever::params. It might be nice to be consistent.

::: modules/collectd/templates/collectd.conf.erb
@@ +1,2 @@
> +##### This file under configuration management control
> +##### DO NOT EDIT MANUALLY

I'm not keen on these headers. Nothing under /etc should be edited manually, whether or not it is under Puppet's control (in the former case, Puppet will revert the edit; in the latter, the system will behave unexpectedly and that behavior will not persist over a reimage).

Attachment #762419 - Flags: review?(dustin) → review-
Comment 5 (Assignee) • 11 years ago

Attachment #762419 - Attachment is obsolete: true
Attachment #764306 - Flags: review?(dustin)
Attachment #764306 - Flags: feedback?(bugspam.Callek)
Comment 6 • 11 years ago

Comment on attachment 764306 [details] [diff] [review]
Base collectd puppet module

Review of attachment 764306 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm with that fix

::: modules/config/manifests/base.pp
@@ +34,4 @@
>     $buildbot_configs_hg_repo = "https://hg.mozilla.org/build/buildbot-configs"
>     $buildbot_configs_branch = "production"
>     $buildbot_mail_to = "nobody@mozilla.com"
> +   $collectd_graphite_cluster_fqdn = ""

$collectd_graphite_prefix should be here too

Attachment #764306 - Flags: review?(dustin) → review+
Updated (Assignee) • 11 years ago

Attachment #764306 - Flags: feedback?(bugspam.Callek) → checked-in+
Comment 7 (Assignee) • 11 years ago

:callek brought up the point that if $collectd_graphite_cluster_fqdn is undefined or an empty string, the manifest should *NOT* fail() but should skip the collectd module altogether. I'll slip this change in on the next (ubuntu) patch.
Comment 8 (Assignee) • 11 years ago

collectd 5.3.0 ubuntu packages (amd64 and i386) have been merged into the repo. Before this was rsync'd, I backed up db/ and releng/dists/precise to /home/jwatkins/apt-repo-backup/

root@relabs07:~/data/apt# rsync . jwatkins@releng-puppet1.srv.releng.scl3.mozilla.com:/data/repos/apt/ -avn --progress
The authenticity of host 'releng-puppet1.srv.releng.scl3.mozilla.com (10.26.48.45)' can't be established.
RSA key fingerprint is c4:e1:71:61:a6:cf:61:47:a4:07:15:82:b2:a8:5e:85.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'releng-puppet1.srv.releng.scl3.mozilla.com,10.26.48.45' (RSA) to the list of known hosts.
sending incremental file list
db/
db/checksums.db
db/packages.db
db/references.db
db/release.caches.db
db/version
releng/dists/precise/
releng/dists/precise/Release
releng/dists/precise/main/binary-amd64/
releng/dists/precise/main/binary-amd64/Packages
releng/dists/precise/main/binary-amd64/Packages.bz2
releng/dists/precise/main/binary-i386/
releng/dists/precise/main/binary-i386/Packages
releng/dists/precise/main/binary-i386/Packages.bz2
releng/dists/precise/main/source/
releng/dists/precise/main/source/Sources.gz
releng/pool/main/
releng/pool/main/c/
releng/pool/main/c/collectd/
releng/pool/main/c/collectd/collectd-core_5.3.0_amd64.deb
releng/pool/main/c/collectd/collectd-core_5.3.0_i386.deb
releng/pool/main/c/collectd/collectd-dbg_5.3.0_amd64.deb
releng/pool/main/c/collectd/collectd-dbg_5.3.0_i386.deb
releng/pool/main/c/collectd/collectd-dev_5.3.0_all.deb
releng/pool/main/c/collectd/collectd-utils_5.3.0_amd64.deb
releng/pool/main/c/collectd/collectd-utils_5.3.0_i386.deb
releng/pool/main/c/collectd/collectd_5.3.0.debian.tar.gz
releng/pool/main/c/collectd/collectd_5.3.0.dsc
releng/pool/main/c/collectd/collectd_5.3.0.orig.tar.bz2
releng/pool/main/c/collectd/collectd_5.3.0_amd64.deb
releng/pool/main/c/collectd/collectd_5.3.0_i386.deb
releng/pool/main/c/collectd/libcollectdclient-dev_5.3.0_amd64.deb
releng/pool/main/c/collectd/libcollectdclient-dev_5.3.0_i386.deb
releng/pool/main/c/collectd/libcollectdclient1_5.3.0_amd64.deb
releng/pool/main/c/collectd/libcollectdclient1_5.3.0_i386.deb

sent 4484250 bytes  received 19820 bytes  191662.55 bytes/sec
total size is 117288026202  speedup is 26040.45 (DRY RUN)
Comment 9 (Assignee) • 11 years ago

This patch adds Ubuntu support and now does nothing if a graphite server isn't specified.

Attachment #765720 - Flags: review?(dustin)
Attachment #765720 - Flags: feedback?(bugspam.Callek)
Comment 10 • 11 years ago

Comment on attachment 765720 [details] [diff] [review]
Patch for collectd ubuntu support

Review of attachment 765720 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/collectd/manifests/init.pp
@@ +5,4 @@
>     include collectd::settings
>
> +   # do not configure unless graphite server is defined
> +   if $::config::collectd_graphite_cluster_fqdn or !$::config::collectd_graphite_cluster_fqdn == "" {

s/or/and/

Attachment #765720 - Flags: feedback?(bugspam.Callek) → feedback+
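With the s/or/and/ fix applied, the guard reads roughly as follows. This is a sketch of the corrected logic, not the verbatim landed manifest; the inner class names are assumptions. The comparison is also written as an explicit `!=`, since in Puppet `!` binds tighter than `==`, so `!$fqdn == ""` negates the value before comparing:

```puppet
class collectd {
    include collectd::settings

    # Do not configure collectd unless a graphite server is defined
    # and non-empty; otherwise the module does nothing at all.
    if $::config::collectd_graphite_cluster_fqdn and $::config::collectd_graphite_cluster_fqdn != "" {
        include collectd::install
        include collectd::service
    }
}
```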
Comment 11 • 11 years ago

Comment on attachment 765720 [details] [diff] [review]
Patch for collectd ubuntu support

Review of attachment 765720 [details] [diff] [review]:
-----------------------------------------------------------------

with callek's change

Attachment #765720 - Flags: review?(dustin) → review+
Comment 12 (Assignee) • 11 years ago

Comment on attachment 765720 [details] [diff] [review]
Patch for collectd ubuntu support

Checked in with 's/or/and/'. :callek, good catch btw.

Attachment #765720 - Flags: checked-in+
Comment 13 (Assignee) • 11 years ago

As discussed today in the relops meeting, we can start deploying the collectd module to select servers. We'll start with the mobile imaging servers.

Attachment #770632 - Flags: review?(dustin)
Updated • 11 years ago

Attachment #770632 - Flags: review?(dustin) → review+
Comment 14 (Assignee) • 11 years ago

Comment on attachment 770632 [details] [diff] [review]
include collectd module in mobile imaging server node defs

landed

Attachment #770632 - Flags: checked-in+
Updated • 11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations
Comment 15 • 11 years ago

BTW, I've been observing these kinds of messages in the logs:

Jul 23 05:02:57 buildbot-master79 /usr/sbin/gmond[1074]: Error creating multicast server mcast_join=127.0.0.1 port=8649 mcast_if=NULL family='inet4'. Exiting.#012

And puppet tries to start gmond every run.
Comment 16 • 11 years ago

Rail: this bug isn't for ganglia (gmond), it's for collectd. There's no statistics gathering that works in the AWS regions, because there are no ganglia servers there and collectd has not been added to them. If there are manifests that try to install/start gmond for AWS hosts, they should be disabled. If you would like us to add collectd to all buildbot masters (including those in AWS), please let us know here.
Flags: needinfo?(rail)
Comment 17 • 11 years ago
Oh, one of the patches made me think that this bug is related. Having some stats would be great though.
Flags: needinfo?(rail)
Comment 18 (Assignee) • 11 years ago

Attachment #783921 - Flags: review?(rail)
Updated • 11 years ago

Attachment #783921 - Flags: review?(rail) → review+
Comment 19 (Assignee) • 11 years ago

Comment on attachment 783921 [details] [diff] [review]
bug870853-buildmasters.patch

pushed to buildmasters

Attachment #783921 - Flags: checked-in+
Comment 20 • 11 years ago

I think we can target the rest of the silos for servers that are hosted by Mozilla: foopys, puppetmasters, signing machines. Am I missing any from that list?
Comment 21 • 11 years ago
At least some of the signing servers are OS X, so we'd need to put some more work into getting that rolling. More than happy to do that if we have your okay to push it out to the OS X signing servers. I think at that point we can declare it in the toplevel server definition. We also have the ability to push it out to linux slaves (builders, since there's no timing tests to worry about) as well if you'd like.
Comment 22 (Assignee) • 11 years ago

This adds collectd to the puppetmasters. Once we get collectd dmgs built for all of our OS X flavors, we can pull the per-node includes and just slip it into the toplevel::server module.

Attachment #789154 - Flags: review?(coop)
Updated • 11 years ago

Attachment #789154 - Flags: review?(coop) → review+
Updated (Assignee) • 11 years ago

Attachment #789154 - Flags: checked-in+
Comment 23 (Assignee) • 11 years ago
collectd-libvirt is failing to install on the aws puppetmasters. There are some multilib version conflicts.
Comment 24 (Assignee) • 11 years ago

These multilib version conflicts stem from version-release mismatches between the i386 and the x86_64 versions available in the repo. For example:

[root@releng-puppet1.srv.releng.use1.mozilla.com ~]# yum info libgcrypt
Loaded plugins: security
Installed Packages
Name        : libgcrypt
Arch        : x86_64
Version     : 1.4.5
Release     : 9.el6_2.2
Size        : 524 k
Repo        : installed
From repo   : base
Summary     : A general-purpose cryptography library
URL         : http://www.gnupg.org/
License     : LGPLv2+
Description : Libgcrypt is a general purpose crypto library based on the code used
            : in GNU Privacy Guard. This is a development version.

Available Packages
Name        : libgcrypt
Arch        : i686
Version     : 1.4.5
Release     : 9.el6
Size        : 228 k
Repo        : base
Summary     : A general-purpose cryptography library
URL         : http://www.gnupg.org/
License     : LGPLv2+
Description : Libgcrypt is a general purpose crypto library based on the code used
            : in GNU Privacy Guard. This is a development version.
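One way to spot these mismatches ahead of time is to compare the version-release string for each package name across arches; any name that appears with two different version-release values is a multilib conflict candidate. The sketch below runs against canned data shaped like `rpm -qa --qf '%{NAME} %{ARCH} %{VERSION}-%{RELEASE}\n'` output; the package list here is illustrative, not taken from a real host.

```shell
# Canned data standing in for real rpm query output (illustrative only).
cat > /tmp/pkgs.txt <<'EOF'
libgcrypt x86_64 1.4.5-9.el6_2.2
libgcrypt i686 1.4.5-9.el6
zlib x86_64 1.2.3-29.el6
zlib i686 1.2.3-29.el6
EOF

# Keep name + version-release, drop pairs that match across arches
# (uniq -u removes lines that appear more than once), and report each
# mismatched package name once.
awk '{print $1, $3}' /tmp/pkgs.txt | sort | uniq -u | awk '{print $1}' | sort -u
# prints: libgcrypt
```

Here zlib agrees across arches and disappears, while libgcrypt (9.el6 vs 9.el6_2.2) is flagged.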
Comment 25 • 11 years ago
Huh - the only instance of that release is /data/repos/yum/mirrors/centos/6/2012-03-07/updates/i386/Packages/libgcrypt-1.4.5-9.el6_2.2.i686.rpm Looking at /var/log/yum.log, the correct version was initially installed, and then after puppet was installed a 'yum upgrade' back in May resulted in an upgrade to 9.el6_2.2, from some repository other than ours (most likely upstream). This is a problem we've run into before, and was particularly fun on Ubuntu which uses its upstream security repo even if you tell it not to. I suspect that a manual 'yum downgrade libgcrypt' will fix this.
Comment 26 (Assignee) • 11 years ago

I downgraded the offending packages on all 4 AWS puppetmasters.

For releng-puppet2.srv.releng.usw2.mozilla.com, releng-puppet2.srv.releng.use1.mozilla.com, and releng-puppet1.srv.releng.usw2.mozilla.com, these packages were downgraded: zlib kexec-tools python python-libs libtasn1 libgcrypt gnutls cyrus-sasl-lib cyrus-sasl-plain cyrus-sasl db4 db4-utils

For releng-puppet1.srv.releng.use1.mozilla.com, these packages were downgraded: zlib kexec-tools python python-libs libtasn1 libgcrypt gnutls

[jwatkins@releng-puppet2.srv.releng.usw2.mozilla.com ~]$ sudo yum downgrade zlib kexec-tools python python-libs libtasn1 libgcrypt gnutls cyrus-sasl-lib cyrus-sasl-plain cyrus-sasl db4 db4-utils
Loaded plugins: security
Setting up Downgrade Process
Resolving Dependencies
--> Running transaction check
---> Package cyrus-sasl.x86_64 0:2.1.23-13.el6 will be a downgrade
---> Package cyrus-sasl.x86_64 0:2.1.23-13.el6_3.1 will be erased
---> Package cyrus-sasl-lib.x86_64 0:2.1.23-13.el6 will be a downgrade
---> Package cyrus-sasl-lib.x86_64 0:2.1.23-13.el6_3.1 will be erased
---> Package cyrus-sasl-plain.x86_64 0:2.1.23-13.el6 will be a downgrade
---> Package cyrus-sasl-plain.x86_64 0:2.1.23-13.el6_3.1 will be erased
---> Package db4.x86_64 0:4.7.25-16.el6 will be a downgrade
---> Package db4.x86_64 0:4.7.25-17.el6 will be erased
---> Package db4-utils.x86_64 0:4.7.25-16.el6 will be a downgrade
---> Package db4-utils.x86_64 0:4.7.25-17.el6 will be erased
---> Package gnutls.x86_64 0:2.8.5-4.el6 will be a downgrade
---> Package gnutls.x86_64 0:2.8.5-10.el6_4.1 will be erased
---> Package kexec-tools.x86_64 0:2.0.0-209.el6 will be a downgrade
---> Package kexec-tools.x86_64 0:2.0.0-258.el6 will be erased
---> Package libgcrypt.x86_64 0:1.4.5-9.el6 will be a downgrade
---> Package libgcrypt.x86_64 0:1.4.5-9.el6_2.2 will be erased
---> Package libtasn1.x86_64 0:2.3-3.el6 will be a downgrade
---> Package libtasn1.x86_64 0:2.3-3.el6_2.1 will be erased
---> Package python.x86_64 0:2.6.6-29.el6 will be a downgrade
---> Package python.x86_64 0:2.6.6-36.el6 will be erased
---> Package python-libs.x86_64 0:2.6.6-29.el6 will be a downgrade
---> Package python-libs.x86_64 0:2.6.6-36.el6 will be erased
---> Package zlib.x86_64 0:1.2.3-27.el6 will be a downgrade
---> Package zlib.x86_64 0:1.2.3-29.el6 will be erased
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================
 Package           Arch     Version          Repository   Size
=============================================================================
Downgrading:
 cyrus-sasl        x86_64   2.1.23-13.el6    base          78 k
 cyrus-sasl-lib    x86_64   2.1.23-13.el6    base         136 k
 cyrus-sasl-plain  x86_64   2.1.23-13.el6    base          30 k
 db4               x86_64   4.7.25-16.el6    base         565 k
 db4-utils         x86_64   4.7.25-16.el6    base         130 k
 gnutls            x86_64   2.8.5-4.el6      base         343 k
 kexec-tools       x86_64   2.0.0-209.el6    base         255 k
 libgcrypt         x86_64   1.4.5-9.el6      base         228 k
 libtasn1          x86_64   2.3-3.el6        base         238 k
 python            x86_64   2.6.6-29.el6     base         4.8 M
 python-libs       x86_64   2.6.6-29.el6     base         621 k
 zlib              x86_64   1.2.3-27.el6     base          72 k

Transaction Summary
=============================================================================
Downgrade    12 Package(s)

Total download size: 7.4 M
Is this ok [y/N]: y
Downloading Packages:
(1/12): cyrus-sasl-2.1.23-13.el6.x86_64.rpm        |  78 kB  00:00
(2/12): cyrus-sasl-lib-2.1.23-13.el6.x86_64.rpm    | 136 kB  00:00
(3/12): cyrus-sasl-plain-2.1.23-13.el6.x86_64.rpm  |  30 kB  00:00
(4/12): db4-4.7.25-16.el6.x86_64.rpm               | 565 kB  00:00
(5/12): db4-utils-4.7.25-16.el6.x86_64.rpm         | 130 kB  00:00
(6/12): gnutls-2.8.5-4.el6.x86_64.rpm              | 343 kB  00:00
(7/12): kexec-tools-2.0.0-209.el6.x86_64.rpm       | 255 kB  00:00
(8/12): libgcrypt-1.4.5-9.el6.x86_64.rpm           | 228 kB  00:00
(9/12): libtasn1-2.3-3.el6.x86_64.rpm              | 238 kB  00:00
(10/12): python-2.6.6-29.el6.x86_64.rpm            | 4.8 MB  00:00
(11/12): python-libs-2.6.6-29.el6.x86_64.rpm       | 621 kB  00:00
(12/12): zlib-1.2.3-27.el6.x86_64.rpm              |  72 kB  00:00
-----------------------------------------------------------------------------
Total                                     32 MB/s | 7.4 MB  00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : db4-4.7.25-16.el6.x86_64                        1/24
  Installing : zlib-1.2.3-27.el6.x86_64                        2/24
  Installing : cyrus-sasl-lib-2.1.23-13.el6.x86_64             3/24
  Installing : python-libs-2.6.6-29.el6.x86_64                 4/24
  Installing : python-2.6.6-29.el6.x86_64                      5/24
  Installing : libgcrypt-1.4.5-9.el6.x86_64                    6/24
  Installing : libtasn1-2.3-3.el6.x86_64                       7/24
  Installing : gnutls-2.8.5-4.el6.x86_64                       8/24
  Installing : cyrus-sasl-plain-2.1.23-13.el6.x86_64           9/24
  Installing : cyrus-sasl-2.1.23-13.el6.x86_64                10/24
  Installing : kexec-tools-2.0.0-209.el6.x86_64               11/24
  Installing : db4-utils-4.7.25-16.el6.x86_64                 12/24
  Cleanup    : python-libs-2.6.6-36.el6.x86_64                13/24
  Cleanup    : python-2.6.6-36.el6.x86_64                     14/24
  Cleanup    : gnutls-2.8.5-10.el6_4.1.x86_64                 15/24
  Cleanup    : cyrus-sasl-2.1.23-13.el6_3.1.x86_64            16/24
  Cleanup    : kexec-tools-2.0.0-258.el6.x86_64               17/24
  Cleanup    : cyrus-sasl-plain-2.1.23-13.el6_3.1.x86_64      18/24
  Cleanup    : cyrus-sasl-lib-2.1.23-13.el6_3.1.x86_64        19/24
  Cleanup    : db4-utils-4.7.25-17.el6.x86_64                 20/24
  Cleanup    : db4-4.7.25-17.el6.x86_64                       21/24
  Cleanup    : zlib-1.2.3-29.el6.x86_64                       22/24
  Cleanup    : libgcrypt-1.4.5-9.el6_2.2.x86_64               23/24
  Cleanup    : libtasn1-2.3-3.el6_2.1.x86_64                  24/24
  Verifying  : db4-utils-4.7.25-16.el6.x86_64                  1/24
  Verifying  : cyrus-sasl-plain-2.1.23-13.el6.x86_64           2/24
  Verifying  : zlib-1.2.3-27.el6.x86_64                        3/24
  Verifying  : cyrus-sasl-2.1.23-13.el6.x86_64                 4/24
  Verifying  : kexec-tools-2.0.0-209.el6.x86_64                5/24
  Verifying  : libtasn1-2.3-3.el6.x86_64                       6/24
  Verifying  : db4-4.7.25-16.el6.x86_64                        7/24
  Verifying  : gnutls-2.8.5-4.el6.x86_64                       8/24
  Verifying  : libgcrypt-1.4.5-9.el6.x86_64                    9/24
  Verifying  : python-2.6.6-29.el6.x86_64                     10/24
  Verifying  : python-libs-2.6.6-29.el6.x86_64                11/24
  Verifying  : cyrus-sasl-lib-2.1.23-13.el6.x86_64            12/24
  Verifying  : cyrus-sasl-lib-2.1.23-13.el6_3.1.x86_64        13/24
  Verifying  : libgcrypt-1.4.5-9.el6_2.2.x86_64               14/24
  Verifying  : db4-4.7.25-17.el6.x86_64                       15/24
  Verifying  : python-2.6.6-36.el6.x86_64                     16/24
  Verifying  : zlib-1.2.3-29.el6.x86_64                       17/24
  Verifying  : kexec-tools-2.0.0-258.el6.x86_64               18/24
  Verifying  : python-libs-2.6.6-36.el6.x86_64                19/24
  Verifying  : libtasn1-2.3-3.el6_2.1.x86_64                  20/24
  Verifying  : gnutls-2.8.5-10.el6_4.1.x86_64                 21/24
  Verifying  : cyrus-sasl-plain-2.1.23-13.el6_3.1.x86_64      22/24
  Verifying  : db4-utils-4.7.25-17.el6.x86_64                 23/24
  Verifying  : cyrus-sasl-2.1.23-13.el6_3.1.x86_64            24/24

Removed:
  cyrus-sasl.x86_64 0:2.1.23-13.el6_3.1    cyrus-sasl-lib.x86_64 0:2.1.23-13.el6_3.1
  cyrus-sasl-plain.x86_64 0:2.1.23-13.el6_3.1    db4.x86_64 0:4.7.25-17.el6
  db4-utils.x86_64 0:4.7.25-17.el6    gnutls.x86_64 0:2.8.5-10.el6_4.1
  kexec-tools.x86_64 0:2.0.0-258.el6    libgcrypt.x86_64 0:1.4.5-9.el6_2.2
  libtasn1.x86_64 0:2.3-3.el6_2.1    python.x86_64 0:2.6.6-36.el6
  python-libs.x86_64 0:2.6.6-36.el6    zlib.x86_64 0:1.2.3-29.el6

Installed:
  cyrus-sasl.x86_64 0:2.1.23-13.el6    cyrus-sasl-lib.x86_64 0:2.1.23-13.el6
  cyrus-sasl-plain.x86_64 0:2.1.23-13.el6    db4.x86_64 0:4.7.25-16.el6
  db4-utils.x86_64 0:4.7.25-16.el6    gnutls.x86_64 0:2.8.5-4.el6
  kexec-tools.x86_64 0:2.0.0-209.el6    libgcrypt.x86_64 0:1.4.5-9.el6
  libtasn1.x86_64 0:2.3-3.el6    python.x86_64 0:2.6.6-29.el6
  python-libs.x86_64 0:2.6.6-29.el6    zlib.x86_64 0:1.2.3-27.el6

Complete!
Comment 27 (Assignee) • 11 years ago

Deploys collectd to signing[456].srv.releng.scl3.mozilla.com.

Attachment #790822 - Flags: review?(coop)
Updated • 11 years ago

Attachment #790822 - Flags: review?(coop) → review+
Updated (Assignee) • 11 years ago

Attachment #790822 - Flags: checked-in+
Comment 28 • 11 years ago

Jake: can you please push collectd out to the linux builders (not testers)? This probably requires some coordination with rail for the AWS stuff to make sure everything works. Let's try this out on a few nodes in each datacenter first.
Updated • 11 years ago
Assignee: jwatkins → arich
Status: NEW → ASSIGNED
Updated • 11 years ago
Assignee: arich → jwatkins
Comment 29 (Assignee) • 11 years ago

:coop, can you update this bug with the linux builders I can use (to test collectd) when you get a chance? I will need at least an IX linux builder and an HP builder. If you can also get me an AWS node, that would be great too!
Flags: needinfo?(coop)
Comment 30 • 11 years ago

Still waiting for builds to finish on both boxes, but I've set aside the following machines for you:

bld-centos6-hp-006
bld-linux64-ix-027

I'll comment again when the builds are done.
Flags: needinfo?(coop)
Comment 31 • 11 years ago
bld-centos6-hp-006 is ready now, but bld-linux64-ix-027 is still building. Jake: how do you want me to test/verify once collectd is installed?
Comment 32 (Assignee) • 11 years ago

(In reply to Chris Cooper [:coop] from comment #31)
> bld-centos6-hp-006 is ready now, but bld-linux64-ix-027 is still building.
>
> Jake: how do you want me to test/verify once collectd is installed?

Honestly, I'm not sure there is anything to test on your end. I'm just going to make sure collectd installs properly and ensure we don't run into any problems like the multilib discrepancies seen on the AWS puppetmasters.
Comment 33 (Assignee) • 11 years ago
collectd is installed and running on bld-centos6-hp-006 without issue.
Comment 34 • 11 years ago

(In reply to Chris Cooper [:coop] from comment #30)
> bld-linux64-ix-027

This one is ready now too.
Comment 35 (Assignee) • 11 years ago

collectd is also installed and running on bld-linux64-ix-027 without issue.
Comment 36 (Assignee) • 11 years ago

To move forward here, we really need to test the deployment on a linux builder in AWS. :Rail (or :Coop), could one of you lend me one of these nodes from each AWS DC (use1 & usw2)? Thanks
Flags: needinfo?(rail)
Flags: needinfo?(coop)
Comment 37 • 11 years ago

(In reply to Jake Watkins [:dividehex] from comment #36)
> To move forward here, we really need to test the deployment on a linux
> builder in aws. :Rail (or :Coop), could one of you lend me an one of these
> nodes from each aws DC (use1 & usw2)? Thanks

I will grab you two slaves.

Also, both bld-centos6-hp-006 and bld-linux64-ix-027 are running normally, and builds (and build times) don't seem to be impacted.
Flags: needinfo?(rail)
Flags: needinfo?(coop)
Comment 38 • 11 years ago

(In reply to Chris Cooper [:coop] from comment #37)
> I will grab you two slaves.

bld-linux64-ec2-199.build.releng.use1.mozilla.com = 10.134.53.219
bld-linux64-ec2-300.build.releng.usw2.mozilla.com = 10.132.54.25

300 is available now, 199 is still building (ETA 1h).
Comment 39 • 11 years ago

(In reply to Chris Cooper [:coop] from comment #38)
> 300 is available now, 199 is still building (ETA 1h).

bld-linux64-ec2-199 is ready now too.
Comment 40 (Assignee) • 11 years ago
:coop, Thanks! I have installed collectd on both ec2 nodes without any issues.
Comment 41 (Assignee) • 11 years ago
:coop, if you are ready, I can push this into production for linux bld slaves
Attachment #802559 - Flags: review?(coop)
Updated • 11 years ago

Attachment #802559 - Flags: review?(coop) → review+
Updated (Assignee) • 11 years ago

Attachment #802559 - Flags: checked-in+
Comment 42 (Assignee) • 11 years ago
I also landed collectd::disable in case we need to back out and disable collectd at any point.
Updated • 11 years ago
Whiteboard: [2013Q4] [tracker]
Comment 43 (Assignee) • 11 years ago

* Adds darwin support for collectd
* Splits out the base plugins from common.conf
* Common plugins are included based on OS profile
* Adds modules 'logfile' and 'csv', primarily for debugging
* unixsock plugin refactored
* syslog loglevel can be adjusted

Attachment #807518 - Flags: review?(dustin)
Comment 44 (Assignee) • 11 years ago

Forgot to mention: collectd-dmg.sh is also included. Collectd packages are built for 10.6 through 10.9 and have been tested on all but 10.9. The dmgs are already in the public puppetagain dmg repo.
Comment 45 (Assignee) • 11 years ago

It looks like the 'disk' module isn't working on any version of OSX. It doesn't spit out any error, but it also doesn't output any data. This might be a bug in the module, possibly related to https://github.com/collectd/collectd/issues/245
Comment 46 (Assignee) • 11 years ago

I recompiled collectd with debugging enabled and got a little more useful info out of the logs. This just reinforces my belief that it is a bug in the module itself.

[2013-09-19 19:02:03] [debug] plugin_read_thread: Handling `disk'.
[2013-09-19 19:02:03] [debug] disk plugin: CFDictionaryGetValue(kIOBSDNameKey) failed.
[2013-09-19 19:02:03] [debug] disk plugin: CFDictionaryGetValue(kIOBSDNameKey) failed.
[2013-09-19 19:02:03] [debug] IORegistryEntryGetChildEntry (disk) failed: 0xe00002c0
[2013-09-19 19:02:03] [debug] disk plugin: CFDictionaryGetValue(kIOBSDNameKey) failed.
[2013-09-19 19:02:03] [debug] plugin_read_thread: Effective interval of the disk plugin is 10.000 seconds.
[2013-09-19 19:02:03] [debug] plugin_read_thread: Next read of the disk plugin at 1379642533.175.
[2013-09-19 19:02:03] [debug] pid = 13; name = diskarbitrationd;
[2013-09-19 19:02:03] [debug] pid = 3615; name = diskimages-helpe;
Comment 47 • 11 years ago

Comment on attachment 807518 [details] [diff] [review]
bug870853-darwin.patch

Review of attachment 807518 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great - just a few syntactic things, plus some trailing whitespace.

::: modules/collectd/manifests/plugins/csv.pp
@@ +10,5 @@
> +
> +    $plugin_name = 'csv'
> +
> +    case $::operatingsystem {
> +        /(CentOS|Ubuntu)/: {

You can do this with just a comma, too:

    CentOS, Ubuntu: {

::: modules/collectd/manifests/util.pp
@@ +4,5 @@
> +class collectd::util {
> +    include collectd
> +    include collectd::settings
> +
> +    define config_gen ($arg_array) {

Hm, I didn't even know this syntax worked. It's certainly unusual - this would ordinarily be in `modules/collectd/manifests/util/config_gen.pp`. Is there any strong reason to put it here?

Attachment #807518 - Flags: review?(dustin) → review+
Comment 48 (Assignee) • 11 years ago

Comment on attachment 807518 [details] [diff] [review]
bug870853-darwin.patch

Checked in with recommended changes.

Attachment #807518 - Flags: checked-in+
Comment 49 (Assignee) • 11 years ago
Enables collectd on OSX signing servers
Attachment #808770 - Flags: review?(coop)
Updated • 11 years ago

Attachment #808770 - Flags: review?(coop) → review+
Updated (Assignee) • 11 years ago

Attachment #808770 - Flags: checked-in+
Comment 50 • 11 years ago

What has been collected as OS metrics, and where can I find/see them? Which systems/functions in releng are not yet instrumented for OS-level metrics?
Comment 51 • 11 years ago

OS metrics are available for all linux builder systems, linux servers, OS X signing servers, and (starting today) OS X builders (not test machines) being managed by puppet, including those in AWS. Metrics for all of the above are stored in graphite (the same system we're using to store metrics for other IT systems): https://graphite.mozilla.org/

We are still working on a viable solution for Windows (see bug 918988) and expect to have that done in early Q4. Test machines are pending joint work with releng to make sure the software doesn't impact test numbers (or that we can at least filter the noise).
Comment 52 (Assignee) • 11 years ago
adds collectd to bld-lion and slaveapi
Attachment #809415 - Flags: review?(coop)
Updated (Assignee) • 11 years ago

Attachment #809415 - Flags: review?(coop) → review?(dustin)
Updated • 11 years ago

Attachment #809415 - Flags: review?(dustin) → review+
Updated (Assignee) • 11 years ago

Attachment #809415 - Flags: checked-in+
Comment 53 • 11 years ago
This is an example of the data we get from graphite. This is a puppet server which was heavily loaded in August, and became *very* heavily overloaded as September began. The sudden dip on the right occurred when we added some CPU resources to the host this morning.
Comment 54 • 11 years ago
Bug 918677 comment 8 suggests some additional, higher-level host metrics that we could feed into collectd to help determine the root cause of some ongoing, difficult-to-diagnose issues. It would also be helpful to pull data out of slavealloc, buildapi, and slaveapi, as these tools have a higher-level view of the releng automation: total number of buildslaves, number in various states, number and type of running and pending jobs, and so on.
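Any of these higher-level counts could be pushed into the same graphite instance without going through collectd at all, since graphite's carbon receiver speaks a trivial plaintext protocol on TCP port 2003: one "metric.path value unix_timestamp" line per datapoint. A sketch follows; the metric name, host, and value are made up for illustration, and only the prefix matches the one used for collectd data in this bug:

```shell
# Build one datapoint in graphite's plaintext protocol.
PREFIX="hosts.releng."
HOST="buildapi1_srv_releng_scl3_mozilla_com"  # dots replaced: '.' separates graphite path components
METRIC="pending_jobs"                          # hypothetical metric name
VALUE=42                                       # hypothetical value
TS=1379642533                                  # normally $(date +%s)

LINE="${PREFIX}${HOST}.${METRIC} ${VALUE} ${TS}"
echo "$LINE"
# prints: hosts.releng.buildapi1_srv_releng_scl3_mozilla_com.pending_jobs 42 1379642533

# In production this would be piped to the carbon relay, e.g.:
#   echo "$LINE" | nc graphite1.private.scl3.mozilla.com 2003
```

A small cron job on the slavealloc/buildapi/slaveapi hosts emitting lines like this would be enough to get slave-state counts onto the same dashboards as the collectd data.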
Comment 55 • 11 years ago
Please add rules for collectd on b-linux64-hp* as well. I'm installing them this week.
Comment 56 (Assignee) • 11 years ago

(In reply to Amy Rich [:arich] [:arr] from comment #55)
> Please add rules for collectd on b-linux64-hp* as well. I'm installing them
> this week.

Collectd was included for these nodes in changeset e582c0fa64f9.
Comment 57 (Assignee) • 11 years ago

Now that the collectd module supports OSX and is rolled out to all servers and build slaves, we can clean up and consolidate the includes down to toplevel::server and toplevel::slave::build.

Attachment #809917 - Flags: review?(dustin)
Updated • 11 years ago

Attachment #809917 - Flags: review?(dustin) → review+
Updated (Assignee) • 11 years ago

Attachment #809917 - Flags: checked-in+
Comment 58 (Assignee) • 11 years ago

Consolidates the collectd include into toplevel::slave.

Attachment #832403 - Flags: review?(dustin)
Comment 59 • 11 years ago

Comment on attachment 832403 [details] [diff] [review]
bug870853-consolidate2.patch

We can go all the way up to toplevel::base, rather than stopping at toplevel::slave and toplevel::server.

Attachment #832403 - Flags: review?(dustin) → review-
Comment 60 (Assignee) • 11 years ago

Moves the collectd include to toplevel::base.

Attachment #832403 - Attachment is obsolete: true
Attachment #832513 - Flags: review?(dustin)
Updated • 11 years ago

Attachment #832513 - Flags: review?(dustin) → review+
Updated (Assignee) • 11 years ago

Attachment #832513 - Flags: checked-in+
Updated • 10 years ago
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED