Closed Bug 947122 Opened 11 years ago Closed 9 years ago

Add new switches to Nagios

Categories

(Infrastructure & Operations :: MOC: Service Requests, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arzhel, Assigned: vinh)

Details

The new switches that DCops is deploying for the SCL3 expansion now have an interface on the ops network. Thus, they can be monitored directly from the Nagios box (in addition to the checks on the console interface that we should keep).

To start, could you please add those switches to Nagios (just ping checks):

switch1.r601-1.ops.scl3.mozilla.net: parents core[1|2].scl3
switch1.r601-2.ops.scl3.mozilla.net: parent switch1.r601-1.ops.scl3.mozilla.net
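The parent shorthand above (`core[1|2].scl3`, written `core[12].scl3.mozilla.net` later in this bug) stands for two parents. As a rough sketch of what a ping-only Nagios host object with both parents might look like, the Python below expands the brackets and renders a `define host` block. The `mozilla.net` suffix for short names and the `generic-switch` template (assumed to supply the ping `check_command`, contacts and notification settings) are hypothetical; the hosts in this bug were actually added through the sysadmins repository.

```python
import re

DOMAIN = "mozilla.net"  # assumption: short names such as "core1.scl3" live under this domain


def expand_brackets(name: str) -> list[str]:
    """Expand shorthand like 'core[1|2].scl3' or 'core[12].scl3' into one name per choice."""
    match = re.search(r"\[([^\]]+)\]", name)
    if match is None:
        return [name]
    body = match.group(1)
    # '[1|2]' separates alternatives with '|'; '[12]' uses one character per alternative.
    choices = body.split("|") if "|" in body else list(body)
    prefix, suffix = name[: match.start()], name[match.end():]
    expanded = []
    for choice in choices:
        # Recurse in case the name contains further bracket groups.
        expanded.extend(expand_brackets(prefix + choice + suffix))
    return expanded


def qualify(name: str) -> str:
    """Append the assumed domain unless the name is already fully qualified."""
    return name if name.endswith("." + DOMAIN) else f"{name}.{DOMAIN}"


def nagios_host(host: str, parent_spec: str, template: str = "generic-switch") -> str:
    """Render a Nagios host object; 'generic-switch' is a hypothetical template."""
    parents = ",".join(qualify(p) for p in expand_brackets(parent_spec))
    return "\n".join([
        "define host {",
        f"    use        {template}",
        f"    host_name  {host}",
        f"    address    {host}",
        f"    parents    {parents}",
        "}",
    ])


print(nagios_host("switch1.r601-1.ops.scl3.mozilla.net", "core[1|2].scl3"))
```

Listing both core switches in `parents` means Nagios only treats the child as unreachable when both of them are down, rather than when a single core fails.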

DCops will add new switches as they deploy them.

Then, they will be the Nagios parents of the machines plugged into them.

On a side note:
Van, switch1.r601-9.ops.scl3.mozilla.net has only 1 upstream interface called "to_core1" and its LLDP neighbor is called "switch1.r001.console.scl3.mozilla.net", which is not in DNS. What is it exactly, and what is its parent?

thanks!
Flags: needinfo?(vle)
>On a side note:
>Van, switch1.r601-9.ops.scl3.mozilla.net has only 1 upstream interface called "to_core1" and its LLDP neighbor is called "switch1.r001.console.scl3.mozilla.net", which is not in DNS. What is it exactly, and what is its parent?

This is a quick setup we did for the virtualization team to get their NetApps off the ground when they came on site to do their cabling. We're waiting for the overhead fibers to be completed, and then this switch will permanently tie back to its end-of-row aggregator, switch1.r601-1. (FYI: bug 938782)

switch1.r001.console.scl3.mozilla.net is agg1 in inventory, which was renamed to switch1.r101-14.console.scl3.mozilla.net a few months later. I logged in and noticed it had the wrong name configured, so I renamed it to its permanent name. There's also a bug to rename these switches, because we've shuffled the location names around after the recent SCL3 expansion (bug 937711).
Flags: needinfo?(vle)
switch1.r001.console.scl3.mozilla.net should actually be configured as switch1.r001-1.console.scl3.mozilla.net, which I've fixed.
turned up 3 new switches:

vle@vle-10516 ~ $ sudo fping switch1.r202-1.console.scl3.mozilla.net
switch1.r202-1.console.scl3.mozilla.net is alive
vle@vle-10516 ~ $ sudo fping switch1.r202-2.console.scl3.mozilla.net
switch1.r202-2.console.scl3.mozilla.net is alive
vle@vle-10516 ~ $ sudo fping switch1.r202-10.console.scl3.mozilla.net
switch1.r202-10.console.scl3.mozilla.net is alive
I renamed them and gave them an interface on the ops VLAN:
switch1.r202-1.ops.releng.scl3.mozilla.net: parent core1.releng
switch1.r202-2.ops.releng.scl3.mozilla.net: parent switch1.r202-1.ops.releng.scl3
switch1.r202-10.ops.releng.scl3.mozilla.net: parent switch1.r202-1.ops.releng.scl3
Added in sysadmins r79413.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee: server-ops → ashish
Keeping this open for new requests until all new switches are installed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
Out of curiosity, are we renaming all switches from *.console.scl3.mozilla.net to *.ops.scl3.mozilla.net? If so, we should probably fix DNS, as this can get quite confusing if inventory, configs, and DNS don't match up at all.

New aggregate switch I configured today. :Xionix, please confirm the configs on core[12] and on the aggregate if needed.

switch1.r602-1.ops.scl3.mozilla.net : parent  core[12].console.scl3.mozilla.net
In DNS we should still have the .console. interfaces; we're just adding a new one (in .ops.), which is now the default hostname in the config (so that the proper names show up in LLDP neighbors).
Inventory entries should still have the .console. as each entry is unique per member (while .ops. is for the full stack).
I hope this is clear enough :)

The switch looks fine! Parent is core1/2.scl3.mozilla.net (it needs to be a routable address).
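Since a parent has to be routable from the Nagios box, a request that names a `.console.` interface as the parent (as in the r602-1 request above) can be caught mechanically before it reaches Nagios. A minimal sketch, assuming that any hostname containing a `console` label is an out-of-band name that must not be used as a parent:

```python
def non_routable_parents(pairs: dict[str, str]) -> list[tuple[str, str]]:
    """Return (child, parent) pairs whose parent is a .console. name.

    Assumption, based on the discussion in this bug: names with a 'console'
    label are console interfaces the Nagios box cannot use as parents.
    """
    return [
        (child, parent)
        for child, parent in pairs.items()
        if "console" in parent.split(".")
    ]


requests = {
    "switch1.r602-1.ops.scl3.mozilla.net": "core1.console.scl3.mozilla.net",
    "switch1.r602-10.ops.scl3.mozilla.net": "core1.scl3.mozilla.net",
}
for child, parent in non_routable_parents(requests):
    print(f"{child}: parent {parent} is not routable; use its routable name instead")
```

Splitting on dots and matching the whole label avoids false positives on hostnames that merely contain the substring "console".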
this switch is in production now:
switch1.r602-10.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net

These 8 switches are for the Tegras. While their mgmt interfaces are reachable, their ops interfaces won't be until our shipment of SFPs arrives.

switch1.r602-11.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch2.r602-11.ops.releng.scl3.mozilla.net : parent switch1.r602-11.ops.releng.scl3.mozilla.net


switch1.r601-11.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch2.r601-11.ops.releng.scl3.mozilla.net : parent switch1.r601-11.ops.releng.scl3.mozilla.net


switch1.r202-11.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch2.r202-11.ops.releng.scl3.mozilla.net : parent switch1.r202-11.ops.releng.scl3.mozilla.net


switch1.r201-11.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch2.r201-11.ops.releng.scl3.mozilla.net : parent switch1.r201-11.ops.releng.scl3.mozilla.net
switch1.r202-9.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
All of these are added. Please close this bug if there are no more new switches expected. Thanks!
switch1.r301-13.ops.phx1.mozilla.net : parent agg1.s301.console.phx1.mozilla.net
I added an ops interface on agg1.s301. So now parent should be agg1.s301.ops.phx1.mozilla.net

switch1.r301-11.ops.phx1.mozilla.net
switch1.r301-1.ops.phx1.mozilla.net
switch1.r301-2.ops.phx1.mozilla.net
switch1.r301-18.ops.phx1.mozilla.net
switch1.r301-8.ops.phx1.mozilla.net
switch1.r301-7.ops.phx1.mozilla.net

Should also be added with the same parent.
Added these in sysadmins r82411.
switch1.r201-1.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r101-4.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net
switch1.r301-9.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net
switch1.r302-1.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net
switch1.r302-2.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net
switch1.r302-4.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net
switch1.r302-6.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r302-7.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r302-8.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net

switch1.r401-1.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net
I have an issue with switch1.r302-2.console.scl3.mozilla.net when I try to add it to Observium:

"Already got device with SNMP-read sysName (switch1.r302-2.console.scl3.mozilla.net.mozilla.net) and 'snmpEngineID' = 800000090300D4A02A422301 (switch1.r102-2.console.scl3.mozilla.net)."

Is it a host that has been renamed? Or is it a brand new switch?

Also, could you set the switches' hostnames to the .ops. name rather than the .console. one?

Thanks!
Actually, I found my answer: I renamed switch1.r102-2.console.scl3.mozilla.net to switch1.r302-2.ops.scl3.mozilla.net in Observium.
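The Observium error above is keyed on the SNMP engine ID: an engine ID already registered under a different hostname usually means the same SNMP agent under a new name, not a new switch. A sketch of that decision follows; the function name and the returned strings are hypothetical and are not Observium's own code or messages.

```python
def classify_new_device(hostname: str, engine_id: str, known: dict[str, str]) -> str:
    """Decide what to do before adding a device to a monitoring inventory.

    `known` maps snmpEngineID -> hostname already being monitored.
    """
    existing = known.get(engine_id)
    if existing is None:
        return "add"  # engine ID never seen: treat it as a new device
    if existing == hostname:
        return "already present"
    # Same agent, different name: rename the existing entry instead of adding a duplicate.
    return f"rename {existing} -> {hostname}"


known = {"800000090300D4A02A422301": "switch1.r102-2.console.scl3.mozilla.net"}
print(classify_new_device("switch1.r302-2.ops.scl3.mozilla.net", "800000090300D4A02A422301", known))
```

Renaming rather than re-adding keeps the device's existing history attached to one entry.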
switch1.r401-3.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r401-4.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r401-5.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r401-6.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r401-7.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r401-8.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r401-9.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r401-10.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net

switch1.r402-1.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net
switch1.r402-2.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net
switch1.r402-3.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net
switch1.r402-6.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net
switch1.r402-7.ops.scl3.mozilla.net : parent core[12].scl3.mozilla.net
switch1.r102-10.console.scl3.mozilla.net : parent core[12].scl3.mozilla.net
Thanks! Based on the hostname, I believe the last one is switch1.r302-10.ops.scl3.mozilla.net; I added it to DNS.

All of those have been added to inventory
Oops, by inventory I mean Observium.
these switches will be online by Monday.

switch1.r202-3.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net

switch1.r202-4.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net

switch1.r201-10.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r202-5.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r201-9.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r202-6.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r202-7.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r201-8.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r201-7.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r202-8.ops.releng.scl3.mozilla.net : parent switch1.r202-9.console.scl3.mozilla.net
switch2.r202-8.ops.releng.scl3.mozilla.net : parent switch1.r202-9.console.scl3.mozilla.net
switch3.r202-8.ops.releng.scl3.mozilla.net : parent switch1.r202-9.console.scl3.mozilla.net

This was the trial rack for the pandas' move to SCL3. Eventually we'll move this rack over to the 100s pod when it's built out.
By "parent switch1.r202-9.console.scl3.mozilla.net", do you mean switch1.r202-9.ops.releng.scl3.mozilla.net?
>By "parent switch1.r202-9.console.scl3.mozilla.net", do you mean switch1.r202-9.ops.releng.scl3.mozilla.net?

oops yeah. same switch but should have given you the routable hostname

these are the inband switches that have had an inband IP configured on them. i've updated inventory with these hostnames as well.

inband1.r201-7.inband.releng.scl3.mozilla.net : parent switch1.r201-7.ops.releng.scl3.mozilla.net
inband1.r202-10.inband.releng.scl3.mozilla.net : parent switch1.r202-10.ops.releng.scl3.mozilla.net
inband1.r202-3.inband.releng.scl3.mozilla.net : parent switch1.r202-3.ops.releng.scl3.mozilla.net
inband1.r202-4.inband.releng.scl3.mozilla.net : parent switch1.r202-4.ops.releng.scl3.mozilla.net
inband1.r202-5.inband.releng.scl3.mozilla.net : parent switch1.r202-5.ops.releng.scl3.mozilla.net
inband1.r202-6.inband.releng.scl3.mozilla.net : parent switch1.r202-6.ops.releng.scl3.mozilla.net
inband1.r202-7.inband.releng.scl3.mozilla.net : parent switch1.r202-7.ops.releng.scl3.mozilla.net
inband1.r202-9.inband.releng.scl3.mozilla.net : parent switch1.r202-9.ops.releng.scl3.mozilla.net
inband1.r401-2.inband.releng.scl3.mozilla.net : parent switch1.r401-2.ops.releng.scl3.mozilla.net
inband1.r401-3.inband.releng.scl3.mozilla.net : parent switch1.r401-3.ops.releng.scl3.mozilla.net
inband1.r401-4.inband.releng.scl3.mozilla.net : parent switch1.r401-4.ops.releng.scl3.mozilla.net
inband1.r401-5.inband.releng.scl3.mozilla.net : parent switch1.r401-5.ops.releng.scl3.mozilla.net
inband1.r401-6.inband.releng.scl3.mozilla.net : parent switch1.r401-6.ops.releng.scl3.mozilla.net
inband1.r402-9.inband.releng.scl3.mozilla.net : parent switch1.r402-9.ops.releng.scl3.mozilla.net
inband2.r401-2.inband.releng.scl3.mozilla.net : parent switch1.r401-2.ops.releng.scl3.mozilla.net
inband2.r401-3.inband.releng.scl3.mozilla.net : parent switch1.r401-3.ops.releng.scl3.mozilla.net
inband2.r401-4.inband.releng.scl3.mozilla.net : parent switch1.r401-4.ops.releng.scl3.mozilla.net
inband2.r401-5.inband.releng.scl3.mozilla.net : parent switch1.r401-5.ops.releng.scl3.mozilla.net
inband2.r401-6.inband.releng.scl3.mozilla.net : parent switch1.r401-6.ops.releng.scl3.mozilla.net
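Every inband entry above follows one pattern: `inbandN.<rack>.inband.releng.scl3.mozilla.net` hangs off `switch1.<rack>.ops.releng.scl3.mozilla.net`. A small check, assuming that convention (inferred from this list, not stated as a rule anywhere in the bug), can catch a mistyped rack number before the host reaches Nagios:

```python
import re
from typing import Optional

# Assumed convention: inbandN.<rack>.inband.<rest> -> switch1.<rack>.ops.<rest>
INBAND = re.compile(r"^inband\d+\.(r\d+-\d+)\.inband\.(.+)$")


def expected_parent(inband_host: str) -> Optional[str]:
    """Derive the parent the convention predicts, or None if the name does not fit it."""
    match = INBAND.match(inband_host)
    if match is None:
        return None
    rack, rest = match.groups()
    return f"switch1.{rack}.ops.{rest}"


def mismatches(pairs: dict[str, str]) -> list[tuple[str, str]]:
    """Return (host, requested parent) pairs that disagree with the convention."""
    return [(host, parent) for host, parent in pairs.items() if expected_parent(host) != parent]


pairs = {
    "inband1.r401-2.inband.releng.scl3.mozilla.net": "switch1.r401-2.ops.releng.scl3.mozilla.net",
    "inband2.r401-6.inband.releng.scl3.mozilla.net": "switch1.r401-6.ops.releng.scl3.mozilla.net",
}
print(mismatches(pairs))
```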
switch1.r602-2.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
switch2.r602-2.ops.releng.scl3.mozilla.net : parent switch1.r602-2.ops.releng.scl3.mozilla.net
switch3.r602-2.ops.releng.scl3.mozilla.net : parent switch1.r602-2.ops.releng.scl3.mozilla.net

switch1.r602-3.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
switch2.r602-3.ops.releng.scl3.mozilla.net : parent switch1.r602-3.ops.releng.scl3.mozilla.net
switch3.r602-3.ops.releng.scl3.mozilla.net : parent switch1.r602-3.ops.releng.scl3.mozilla.net

switch1.r602-4.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
switch2.r602-4.ops.releng.scl3.mozilla.net : parent switch1.r602-4.ops.releng.scl3.mozilla.net
switch3.r602-4.ops.releng.scl3.mozilla.net : parent switch1.r602-4.ops.releng.scl3.mozilla.net
:aj, regarding comment 30, these are the correct parent hostnames:

switch1.r202-8.ops.releng.scl3.mozilla.net : parent switch1.r202-9.ops.releng.scl3.mozilla.net
switch2.r202-8.ops.releng.scl3.mozilla.net : parent switch1.r202-9.ops.releng.scl3.mozilla.net
switch3.r202-8.ops.releng.scl3.mozilla.net : parent switch1.r202-9.ops.releng.scl3.mozilla.net
updated switch1.r102-10.console.scl3.mozilla.net to switch1.r102-10.ops.scl3.mozilla.net

After these hosts are added, please file a NEW bug for any more additional switches.
Assignee: ashish → afernandez
I like having 1 bug as it keeps everything in 1 location and it's easier to keep track of what's going on, rather than being CCed on a new bug each time a new switch is racked.

But otherwise please CC me on the related/similar bugs.
switch1.r602-5.ops.releng.scl3.mozilla.net : parent switch1.r201-1.ops.releng.scl3.mozilla.net
switch2.r602-5.ops.releng.scl3.mozilla.net : parent switch1.r602-5.ops.releng.scl3.mozilla.net
switch3.r602-5.ops.releng.scl3.mozilla.net : parent switch1.r602-5.ops.releng.scl3.mozilla.net
switch1.r602-6.ops.releng.scl3.mozilla.net : parent switch1.r201-1.ops.releng.scl3.mozilla.net
switch2.r602-6.ops.releng.scl3.mozilla.net : parent switch1.r602-6.ops.releng.scl3.mozilla.net 
switch3.r602-6.ops.releng.scl3.mozilla.net : parent switch1.r602-6.ops.releng.scl3.mozilla.net
switch1.r602-7.ops.releng.scl3.mozilla.net : parent switch1.r201-1.ops.releng.scl3.mozilla.net
switch2.r602-7.ops.releng.scl3.mozilla.net : parent switch1.r602-7.ops.releng.scl3.mozilla.net
switch3.r602-7.ops.releng.scl3.mozilla.net : parent switch1.r602-7.ops.releng.scl3.mozilla.net
new switches added for panda move train:

switch1.r602-8.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
switch2.r602-8.ops.releng.scl3.mozilla.net : parent switch1.r602-8.ops.releng.scl3.mozilla.net
switch3.r602-8.ops.releng.scl3.mozilla.net : parent switch1.r602-8.ops.releng.scl3.mozilla.net

switch1.r602-9.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
switch2.r602-9.ops.releng.scl3.mozilla.net : parent switch1.r602-9.ops.releng.scl3.mozilla.net
switch3.r602-9.ops.releng.scl3.mozilla.net : parent switch1.r602-9.ops.releng.scl3.mozilla.net

switch1.r601-5.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
switch2.r601-5.ops.releng.scl3.mozilla.net : parent switch1.r601-5.ops.releng.scl3.mozilla.net
switch3.r601-5.ops.releng.scl3.mozilla.net : parent switch1.r601-5.ops.releng.scl3.mozilla.net
switch4.r601-5.ops.releng.scl3.mozilla.net : parent switch1.r601-5.ops.releng.scl3.mozilla.net


I had the wrong parent for the following switches in comment 37. I put down switch1.r201-1 as the parent, but it should be switch1.r202-1:

switch1.r602-5.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
switch1.r602-6.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
switch1.r602-7.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
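Corrections like this one mean the latest comment for a host should win over earlier ones. The sketch below collects `child : parent` lines in comment order, lets later lines override earlier ones, and then looks for parent loops, which the Nagios configuration check (`nagios -v`) reports as errors. It assumes one parent per host, so multi-parent entries such as `core[12]` are kept as a single literal string here.

```python
import re
from typing import Optional

# One "child : parent X" request per line, as written throughout this bug.
LINE = re.compile(r"^\s*(\S+)\s*:\s*parent\s+(\S+)\s*$")


def parse_requests(*comments: str) -> dict[str, str]:
    """Collect child -> parent pairs; a later line for the same child replaces an earlier one."""
    parents: dict[str, str] = {}
    for comment in comments:
        for line in comment.splitlines():
            match = LINE.match(line)
            if match:
                child, parent = match.groups()
                parents[child] = parent
    return parents


def find_cycle(parents: dict[str, str]) -> Optional[list[str]]:
    """Return the hosts on a parent loop, or None if every chain ends outside the map."""
    for start in parents:
        path: list[str] = []
        node = start
        while node in parents and node not in path:
            path.append(node)
            node = parents[node]
        if node in path:
            return path[path.index(node):]
    return None


comment_37 = """
switch1.r602-5.ops.releng.scl3.mozilla.net : parent switch1.r201-1.ops.releng.scl3.mozilla.net
switch2.r602-5.ops.releng.scl3.mozilla.net : parent switch1.r602-5.ops.releng.scl3.mozilla.net
"""
correction = """
switch1.r602-5.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
"""
parents = parse_requests(comment_37, correction)
print(parents["switch1.r602-5.ops.releng.scl3.mozilla.net"])
print(find_cycle(parents))
```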
switch1.f3.ops.pek1.mozilla.net : parent core1.ops.pek1.mozilla.net
switch1.r101-10.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net
switch1.r102-10.ops.releng.scl3.mozilla.net : parent core1.releng.scl3.mozilla.net

added switches to observium.
switch2.r102-10.ops.releng.scl3.mozilla.net : parent switch1.r102-10.ops.releng.scl3.mozilla.net
switch2.r101-10.ops.releng.scl3.mozilla.net : parent switch1.r101-10.ops.releng.scl3.mozilla.net

switch2.r602-1.ops.scl3.mozilla.net : parent switch1.r602-1.ops.scl3.mozilla.net
switch2.r601-1.ops.scl3.mozilla.net : parent switch1.r601-1.ops.scl3.mozilla.net

the switches have been added to observium
switch2.r201-1.ops.releng.scl3.mozilla.net : parent switch1.r201-1.ops.releng.scl3.mozilla.net

switch2.r202-1.ops.releng.scl3.mozilla.net : parent switch1.r202-1.ops.releng.scl3.mozilla.net
switch1.r101-5.ops.releng.scl3.mozilla.net : parent switch1.r101-10.ops.releng.scl3.mozilla.net

switch1.r102-5.ops.releng.scl3.mozilla.net : parent switch1.r102-10.ops.releng.scl3.mozilla.net
Assignee: afernandez → server-ops
Group: infra
Component: Server Operations → MOC: Service Requests
Product: mozilla.org → Infrastructure & Operations
QA Contact: shyam → lypulong
Assignee: server-ops → vhua
Added a handful of switches to nagios today up to comment 32.
The rest of the switches have been added to nagios.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago → 9 years ago
Resolution: --- → FIXED
Pasting a sample of how I added the switches, per sal's request:


'switch2.r201-1.ops.releng.scl3.mozilla.net' => {
    parents => 'switch1.r201-1.ops.releng.scl3.mozilla.net',
    contact_groups => 'sysalerts, build, netopsalertslist',
    hostgroups => [
        'releng-switches'
    ]
},
'switch2.r202-1.ops.releng.scl3.mozilla.net' => {
    parents => 'switch1.r202-1.ops.releng.scl3.mozilla.net',
    contact_groups => 'sysalerts, build, netopsalertslist',
    hostgroups => [
        'releng-switches'
    ]
},
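The entries above appear to be a Puppet hash keyed by hostname. When a comment lists many `child : parent` pairs, entries of this shape could be generated instead of typed by hand. A sketch in Python that reproduces the layout of the sample; the default contact groups and the `releng-switches` hostgroup are copied from the sample and would presumably need to differ for non-releng switches:

```python
def puppet_host_entry(
    host: str,
    parent: str,
    contact_groups: str = "sysalerts, build, netopsalertslist",
    hostgroups: tuple[str, ...] = ("releng-switches",),
) -> str:
    """Render one hostname => { ... } entry in the same layout as the pasted sample."""
    groups = ",\n".join(f"        '{g}'" for g in hostgroups)
    return (
        f"'{host}' => {{\n"
        f"    parents => '{parent}',\n"
        f"    contact_groups => '{contact_groups}',\n"
        f"    hostgroups => [\n"
        f"{groups}\n"
        f"    ]\n"
        f"}},"
    )


print(puppet_host_entry(
    "switch2.r202-1.ops.releng.scl3.mozilla.net",
    "switch1.r202-1.ops.releng.scl3.mozilla.net",
))
```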