Closed
Bug 755409
Opened 12 years ago
Closed 12 years ago
Setup nagios + ganglia for github-sync1.dmz.releng.scl3.mozilla.com
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: hwine, Assigned: ashish)
References
Details
I'm going to begin applying more real-world load to this machine, and would like to be able to see trends and problems. It appears nrpe is running, but I either can't find the host in nagios or don't have permission to access it. Ganglia trends would also be nice to have. For reference, this VM was created via bug 731329.
Comment 1•12 years ago
Developer Services set this up, so handing it off to them to grant more access. I suspect we'll eventually want to be monitoring this via the releng nagios in scl3, but that's not functional yet. shyam/bkero, what do you guys think we should be doing here? Are you monitoring this box elsewhere already?
Assignee: server-ops-releng → server-ops-devservices
Component: Server Operations: RelEng → Server Operations: Developer Services
QA Contact: arich → shyam
Comment 2•12 years ago
Hal, This wasn't supposed to be production, I hope it's not turning into that. What are the services running on it? (So I can toss it over to ops to setup monitoring).
Reporter
Comment 3•12 years ago
(In reply to Shyam Mani [:fox2mike] from comment #2)
> This wasn't supposed to be production, I hope it's not turning into that.
Definitely not! Just trying to gather data to appropriately size the production setup.
> What are the services running on it? (So I can toss it over to ops to setup
> monitoring).
There are no "services" running on it, just cron jobs I'll be setting up. I suspect the "standard" memory/swap/disk/network type measurements would be fine (but I'm not an ops person). The trending would be nice to have, to help correlate spikes in resource usage with actions as I work through various scenarios.
If the standard monitoring setups aren't available, I'm more than happy to run something like a sar job in the background. However, I would like input from ops on what values to capture, so we have all the relevant data when we sit down to size the production setup.
As an example, we've already uncovered that a different FS is likely a requirement (bug 739100). I want to ensure we have an option to spot any other such situation before we go live. :)
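For illustration only, the fallback mentioned in comment 3 (capturing memory, swap, and disk figures in the background when no standard monitoring is available) could be sketched roughly as below. This is not the job that was actually run on the host; the function names `parse_meminfo`, `disk_free`, and `sample` are hypothetical, and the sketch assumes a Linux host that exposes `/proc/meminfo`.

```python
import os
import time


def parse_meminfo(text):
    """Return selected /proc/meminfo fields (in kB) as a dict of ints."""
    wanted = {"MemTotal", "MemFree", "SwapTotal", "SwapFree"}
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            values[name] = int(rest.split()[0])
    return values


def disk_free(path="/"):
    """Return free space in MB and the free-inode percentage for a mount."""
    st = os.statvfs(path)
    free_mb = st.f_bavail * st.f_frsize // (1024 * 1024)
    # Some filesystems report zero total inodes; avoid dividing by zero.
    inode_free_pct = 100 * st.f_favail // st.f_files if st.f_files else None
    return {"free_mb": free_mb, "inode_free_pct": inode_free_pct}


def sample(meminfo_path="/proc/meminfo", path="/"):
    """Collect one timestamped row of memory, swap, and disk figures."""
    with open(meminfo_path) as fh:
        row = parse_meminfo(fh.read())
    row.update(disk_free(path))
    row["timestamp"] = int(time.time())
    return row


if __name__ == "__main__":
    # Print a single sample; a cron entry could append this to a log file.
    if os.path.exists("/proc/meminfo"):
        print(sample())
```

Running the script once a minute from cron and appending its output to a file would give a crude trend log that can be lined up against the times of the sync jobs.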
Comment 4•12 years ago
Ops, please set up standard nagios checks on this box + ganglia. Puppetize the box if you have to (for ganglia). Nagios does NOT need to page oncall; IRC only is fine.
Assignee: server-ops-devservices → server-ops
Component: Server Operations: Developer Services → Server Operations
QA Contact: shyam → phong
Summary: need access to perf data for github-sync1.dmz.releng.scl3.mozilla.com → Setup nagiso + ganglia for github-sync1.dmz.releng.scl3.mozilla.com
Updated•12 years ago
Summary: Setup nagiso + ganglia for github-sync1.dmz.releng.scl3.mozilla.com → Setup nagios + ganglia for github-sync1.dmz.releng.scl3.mozilla.com
Reporter
Comment 5•12 years ago
Since this is a poc box, we'd prefer the alerts not go to IRC #buildduty at this time. I.e. no notifications enabled, I'd prefer to poll if I have access to the web i/f for this host. Ditto for ganglia: a URL is preferred. Thanks!
Comment 6•12 years ago
Ping - any progress update here?
Updated•12 years ago
Assignee: server-ops → rbryce
Comment 7•12 years ago
Please do this for github-sync1-dev.dmz.scl3.mozilla.com as well. Two hosts.
Comment 8•12 years ago
ping?
Assignee
Comment 9•12 years ago
Hosts added to Nagioses[1][2] and Ganglias[3][4]
Also:
06:24:46 < nagios-scl3> [502] github-sync1-dev.dmz.scl3.mozilla.com:Disk - All is CRITICAL: DISK CRITICAL - free space: / 28576 MB (44% inode=4%)
[1] https://nagios.mozilla.org/scl3/cgi-bin/status.cgi?hostgroup=github-sync&style=detail
[2] https://nagios.mozilla.org/sjc1/cgi-bin/status.cgi?navbarsearch=1&host=github-sync1.dmz.releng.scl3
[3] http://ganglia1.dmz.releng.scl3.mozilla.com/ganglia/?c=github-sync
[4] https://ganglia-scl3.mozilla.org/ganglia/?c=github-sync
Assignee: rbryce → ashish
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
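As a reading aid for the disk alert quoted in comment 9: the line reports 44% free space but only 4% free inodes on /, so the CRITICAL state is more plausibly driven by the inode figure than by space. The sketch below parses that line and flags which figure falls under a threshold. It assumes the message follows the layout shown in the comment; the 10% thresholds and the function names `parse_disk_alert` and `breached` are hypothetical, not the values or code configured on the Nagios server.

```python
import re

# Matches the tail of the alert text as it appears in comment 9.
ALERT_RE = re.compile(
    r"free space: (?P<mount>\S+) (?P<free_mb>\d+) MB "
    r"\((?P<free_pct>\d+)% inode=(?P<inode_pct>\d+)%\)"
)


def parse_disk_alert(line):
    """Extract mount point, free MB, free %, and free inode % from an alert."""
    m = ALERT_RE.search(line)
    if m is None:
        return None
    return {
        "mount": m.group("mount"),
        "free_mb": int(m.group("free_mb")),
        "free_pct": int(m.group("free_pct")),
        "inode_free_pct": int(m.group("inode_pct")),
    }


def breached(fields, min_free_pct=10, min_inode_free_pct=10):
    """List which figures fall below the (assumed) minimum free percentages."""
    problems = []
    if fields["free_pct"] < min_free_pct:
        problems.append("space")
    if fields["inode_free_pct"] < min_inode_free_pct:
        problems.append("inodes")
    return problems


if __name__ == "__main__":
    line = ("github-sync1-dev.dmz.scl3.mozilla.com:Disk - All is CRITICAL: "
            "DISK CRITICAL - free space: / 28576 MB (44% inode=4%)")
    fields = parse_disk_alert(line)
    print(fields, breached(fields))
```

Under these assumed thresholds only the inode figure is flagged, which fits the earlier note in comment 3 that a different filesystem is likely a requirement (bug 739100).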
Reporter
Comment 10•12 years ago
Reopening as I can't access parts of it: on scl3 nagios [1] I can log in and it shows the host page, but no services. The error message is "It appears as though you do not have permission to view information for any of the services you requested..."
[1] https://nagios.mozilla.org/scl3/cgi-bin/status.cgi?hostgroup=github-sync&style=detail
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter
Comment 11•12 years ago
(In reply to Hal Wine [:hwine] from comment #5)
> Since this is a poc box, we'd prefer the alerts not go to IRC #buildduty at
> this time. I.e. no notifications enabled, I'd prefer to poll if I have
> access to the web i/f for this host.
Since I don't have access, can you disable notifications, please?
Assignee
Comment 12•12 years ago
This is all complete; worked with Hal on IRC.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard