Closed Bug 755409 Opened 12 years ago Closed 12 years ago

Setup nagios + ganglia for github-sync1.dmz.releng.scl3.mozilla.com

Categories

(mozilla.org Graveyard :: Server Operations, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hwine, Assigned: ashish)

I'm going to begin applying more real world load to this machine, and would like to be able to see trends and problems.

It appears nrpe is running, but I either can't find the host in nagios or don't have permission to access it.

Ganglia trends would also be nice to have.

for reference, this vm was created via bug 731329
Developer Services set this up, so handing it off to them to grant more access.
I suspect we'll eventually want to be monitoring this via the releng nagios in scl3, but that's not functional yet.  shyam/bkero, what do you guys think we should be doing here?  Are you monitoring this box elsewhere already?
Assignee: server-ops-releng → server-ops-devservices
Component: Server Operations: RelEng → Server Operations: Developer Services
QA Contact: arich → shyam
Hal, 

This wasn't supposed to be production, I hope it's not turning into that.

What are the services running on it? (So I can toss it over to ops to setup monitoring).
(In reply to Shyam Mani [:fox2mike] from comment #2)
> This wasn't supposed to be production, I hope it's not turning into that.

Definitely not! Just trying to gather data to appropriately size the production setup.

> What are the services running on it? (So I can toss it over to ops to setup
> monitoring).

There are no "services" running on it. Just cron jobs I'll be setting up. I suspect the "standard" memory/swap/disk/network type measurements would be fine (but I'm not an ops person). The trending would be nice to help correlate spikes in resource usage with actions as I work through various scenarios.

If the standard monitoring setups aren't available, I'm more than happy to run something like a sar job in the background. However, I would like input from ops on what values to capture, so we have all the relevant data when we sit down to size the production setup.

As an example, we've already uncovered that a different FS is likely a requirement (bug 739100) - I want to ensure we have an option to spot any other such situation before we go live. :)
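For illustration only, here is a rough sketch of the kind of background sampler mentioned above, in case the standard monitoring is not available. It is not something anyone in this bug ran. The function names (`parse_meminfo`, `summarize`, `sample`) and the choice of values are assumptions; it reads memory and swap from `/proc/meminfo` and disk and inode headroom from `os.statvfs`, and it leaves network counters out entirely.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def summarize(meminfo, stat):
    """Reduce raw readings to a few values worth trending (hypothetical selection)."""
    inode_free_pct = None
    if stat.f_files:
        inode_free_pct = 100 * stat.f_favail // stat.f_files
    return {
        "mem_free_kb": meminfo.get("MemFree", 0),
        "swap_used_kb": meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0),
        "disk_free_mb": stat.f_bavail * stat.f_frsize // (1024 * 1024),
        "inode_free_pct": inode_free_pct,
    }


def sample(path="/"):
    """Take one timestamped reading for the filesystem containing `path` (Linux only)."""
    with open("/proc/meminfo") as handle:
        meminfo = parse_meminfo(handle.read())
    row = summarize(meminfo, os.statvfs(path))
    row["timestamp"] = int(time.time())
    return row


if __name__ == "__main__":
    # One reading per invocation; a cron job could append this to a log file.
    print(sample())
```

Run from cron once a minute, the output could be lined up against the timestamps of the sync jobs to correlate spikes with specific actions.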
Ops,

Please set up standard nagios checks on this box + ganglia. Puppetize the box if you have to (for ganglia). Nagios does NOT need to page oncall, IRC only is fine.
Assignee: server-ops-devservices → server-ops
Component: Server Operations: Developer Services → Server Operations
QA Contact: shyam → phong
Summary: need access to perf data for github-sync1.dmz.releng.scl3.mozilla.com → Setup nagiso + ganglia for github-sync1.dmz.releng.scl3.mozilla.com
Summary: Setup nagiso + ganglia for github-sync1.dmz.releng.scl3.mozilla.com → Setup nagios + ganglia for github-sync1.dmz.releng.scl3.mozilla.com
Since this is a poc box, we'd prefer the alerts not go to IRC #buildduty at this time. I.e. no notifications enabled, I'd prefer to poll if I have access to the web i/f for this host.

Ditto for ganglia - url preferred 

Thanks!
Ping - any progress update here?
Assignee: server-ops → rbryce
Please do this for github-sync1-dev.dmz.scl3.mozilla.com as well. Two hosts.
Hosts added to Nagioses[1][2] and Ganglias[3][4]

Also:
06:24:46 < nagios-scl3> [502] github-sync1-dev.dmz.scl3.mozilla.com:Disk - All is CRITICAL: DISK CRITICAL - free space: / 28576 MB (44% inode=4%)

[1] https://nagios.mozilla.org/scl3/cgi-bin/status.cgi?hostgroup=github-sync&style=detail
[2] https://nagios.mozilla.org/sjc1/cgi-bin/status.cgi?navbarsearch=1&host=github-sync1.dmz.releng.scl3
[3] http://ganglia1.dmz.releng.scl3.mozilla.com/ganglia/?c=github-sync
[4] https://ganglia-scl3.mozilla.org/ganglia/?c=github-sync
Assignee: rbryce → ashish
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
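A note on the disk alert quoted above: it reports 44% of space free but only 4% of inodes free, so the inode figure is the likelier trigger. The sketch below parses a line of that shape and reports which resource is low. It assumes both percentages are free capacity, as the "free space" label suggests; the 10% thresholds and the function names are hypothetical, since the actual check configuration is not shown in this bug.

```python
import re

# Matches the "free space: <mount> <MB> MB (<pct>% inode=<pct>%)" portion of the alert.
ALERT = re.compile(
    r"free space: (?P<mount>\S+) (?P<free_mb>\d+) MB "
    r"\((?P<free_pct>\d+)% inode=(?P<inode_pct>\d+)%\)"
)


def parse_disk_alert(line):
    """Extract mount point and free-capacity figures, or None if the line does not match."""
    match = ALERT.search(line)
    if match is None:
        return None
    return {
        "mount": match["mount"],
        "free_mb": int(match["free_mb"]),
        "free_pct": int(match["free_pct"]),
        "inode_pct": int(match["inode_pct"]),
    }


def low_resources(reading, space_crit_pct=10, inode_crit_pct=10):
    """List which resources fall below the (hypothetical) critical thresholds."""
    low = []
    if reading["free_pct"] < space_crit_pct:
        low.append("space")
    if reading["inode_pct"] < inode_crit_pct:
        low.append("inodes")
    return low


if __name__ == "__main__":
    line = "DISK CRITICAL - free space: / 28576 MB (44% inode=4%)"
    print(low_resources(parse_disk_alert(line)))
```

If inodes are indeed the constraint, freeing large files would not clear the alert; the number of files on the filesystem is what matters.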
Reopening as I can't access parts of it:

On scl3 nagios [1] I can log in and it shows the host page, but no services. The error message is "It appears as though you do not have permission to view information for any of the services you requested..."


[1] https://nagios.mozilla.org/scl3/cgi-bin/status.cgi?hostgroup=github-sync&style=detail
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Hal Wine [:hwine] from comment #5)
> Since this is a poc box, we'd prefer the alerts not go to IRC #buildduty at
> this time. I.e. no notifications enabled, I'd prefer to poll if I have
> access to the web i/f for this host.

Since I don't have access, can you disable notifications, please?
This was all complete, worked with Hal on irc.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard