Closed Bug 1472874 Opened 7 years ago Closed 7 years ago

monitor c7000s boa and switches in MDC2

Categories

(Infrastructure & Operations :: MOC: Service Requests, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: van, Assigned: ryanc)

please monitor these new hosts:

hpswitch1-access.gf135.ops.mdc2.mozilla.net: 10.50.8.17
boa-a1-chassis1.gf135.ops.mdc2.mozilla.com: 10.50.8.32
boa-a2-chassis1.gf135.ops.mdc2.mozilla.com: 10.50.8.33

hpswitch2-access.gf135.ops.mdc2.mozilla.net: 10.50.8.18
boa-a1-chassis2.gf135.ops.mdc2.mozilla.com: 10.50.8.50
boa-a2-chassis2.gf135.ops.mdc2.mozilla.com: 10.50.8.51
Summary: monitor c7000s in MDC2 → monitor c7000s boa and switches in MDC2
err, these are not hosts but rather onboard administrators and cisco switches.
Assignee: nobody → rchilds
Status: NEW → ASSIGNED
(In reply to Van Le [:van] from comment #0)
> please monitor these new hosts
>
> hpswitch1-access.gf135.ops.mdc2.mozilla.net: 10.50.8.17
> boa-a1-chassis1.gf135.ops.mdc2.mozilla.com: 10.50.8.32
> boa-a2-chassis1.gf135.ops.mdc2.mozilla.com: 10.50.8.33
>
> hpswitch2-access.gf135.ops.mdc2.mozilla.net:10.50.8.18
> boa-a1-chassis2.gf135.ops.mdc2.mozilla.com: 10.50.8.50
> boa-a2-chassis2.gf135.ops.mdc2.mozilla.com: 10.50.8.51

For the switches, are we only monitoring ping, or is there anything else we should be doing with SNMP etc.?
Flags: needinfo?(vle)
See Also: → 1470774
Initial stuff pushed in 9c73a0918d31db5765bece7302c2497cdececa22 -- Will report back
these should be monitored the same as our other cisco module switches. for reference:

switch1.r301-5.ops.scl3.mozilla.net (10.22.8.140)
switch1.r302-1.ops.scl3.mozilla.net (10.22.8.149)
Flags: needinfo?(vle)
These are not looking consistent. Please make sure ACLs are in place and that all BOAs are using the same community string.

05:17:11 <nagios-mdc2> ryanc: [Unknown] boa-a1-chassis1.gf135.ops.mdc2.mozilla.com:HP Blade Chassis is OK - OK - System: 'BladeSystem c7000 Enclosure G2', SN: 'USE951W478', Firmware: '4.22', hardware working fine, 1 blades, 2 i/o modules Last Checked: 2018-07-04 09:08:25 UTC
05:17:11 <nagios-mdc2> ryanc: [Unknown] boa-a1-chassis2.gf135.ops.mdc2.mozilla.com:HP Blade Chassis is CRITICAL - PSU 6 is Failed (generalFailure), input line status: linePowerLoss<br/>Enclosure overall health condition is Degraded Last Checked: 2018-07-04 09:08:31 UTC
05:17:11 <nagios-mdc2> ryanc: [Unknown] boa-a2-chassis1.gf135.ops.mdc2.mozilla.com:HP Blade Chassis is CRITICAL - SNMP CRITICAL: No response from remote host "10.50.8.33" Last Checked: 2018-07-04 09:08:45 UTC
05:17:11 <nagios-mdc2> ryanc: [Unknown] boa-a2-chassis2.gf135.ops.mdc2.mozilla.com:HP Blade Chassis is CRITICAL - SNMP CRITICAL: No response from remote host "10.50.8.51" Last Checked: 2018-07-04 09:16:50 UTC
(IRC) Fri 03:08:32 UTC [9204] [Unknown] boa-a1-chassis2.gf135.ops.mdc2.mozilla.com:HP Agents is CRITICAL: Compaq/HP Agent Check: cpqRackPowerEnclosureCondition (1:degraded) cpqRackPowerSupplyCondition (6:failed) (http://m.mozilla.org/HP+Agents)
(IRC) Fri 03:08:34 UTC [9205] [Unknown] boa-a1-chassis2.gf135.ops.mdc2.mozilla.com:HP Blade Chassis is CRITICAL: PSU 6 is Failed (generalFailure), input line status: linePowerLoss<br/>Enclosure overall health condition is Degraded (http://m.mozilla.org/HP+Blade+Chassis)
(IRC) Fri 03:08:57 UTC [9208] [Unknown] boa-a2-chassis1.gf135.ops.mdc2.mozilla.com:HP Blade Chassis is CRITICAL: SNMP CRITICAL: No response from remote host "10.50.8.33" (http://m.mozilla.org/HP+Blade+Chassis)

Downtimed 7d, waiting for the fixes for comment 5.
acls are not the issue. it appears that since these run as an active/standby pair, only the active boa responds to snmp. please monitor the "boa-a1*" boards with snmp. we can monitor the secondary (a2) boards for ping only, since they do respond to that.
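
The split described above can be sketched as a small hostname-to-checks mapping. This is only an illustration, not the change that was actually pushed: the function name `boa_checks` and the check labels "ping" and "snmp" are hypothetical, and it assumes, as the comment does, that the a1 board is the active one. If a chassis ever fails over to a2, the mapping would need revisiting.

```python
# Sketch only: "ping" and "snmp" are placeholder labels, not the names
# of real Nagios service definitions. boa_checks is a hypothetical helper.

def boa_checks(hostname):
    """Return the checks to run against a BOA host.

    Assumes the a1 board is active (answers SNMP) and the a2 board is
    the standby (answers ping only).
    """
    short_name = hostname.split(".", 1)[0]
    if short_name.startswith("boa-a1"):
        return ["ping", "snmp"]
    if short_name.startswith("boa-a2"):
        return ["ping"]
    # Switches and anything else are out of scope for this sketch.
    raise ValueError(f"not a BOA host: {hostname}")


if __name__ == "__main__":
    for host in (
        "boa-a1-chassis1.gf135.ops.mdc2.mozilla.com",
        "boa-a2-chassis1.gf135.ops.mdc2.mozilla.com",
        "boa-a1-chassis2.gf135.ops.mdc2.mozilla.com",
        "boa-a2-chassis2.gf135.ops.mdc2.mozilla.com",
    ):
        print(host, boa_checks(host))
```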
> 05:17:11 <nagios-mdc2> ryanc: [Unknown] boa-a1-chassis2.gf135.ops.mdc2.mozilla.com:HP Blade Chassis is CRITICAL - PSU 6 is Failed (generalFailure), input line status: linePowerLoss<br/>Enclosure overall health condition is Degraded Last Checked: 2018-07-04 09:08:31 UTC

just an fyi, i have opened bug 1472871 for the bad PSU.
Monitoring changes made in commit 59b102d6183fceb02fb55ce5c62a5781c691889b.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED