10:30 < nagios-releng> Fri 07:30:37 PST  buildbot-master82.srv.releng.scl3.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
10:30 < nagios-releng> Fri 07:30:57 PST  releng-puppet2.srv.releng.scl3.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%

Nothing else seems to be down; still looking into it. Trees are still open.
These are both on the same ESX cluster according to inventory, so I think there's something wrong there. I also just saw a few more go down:

10:39 < nagios-releng> Fri 07:39:48 PST  admin1b.private.releng.scl3.mozilla.com is DOWN :CRITICAL - Host Unreachable (10.26.75.7)
10:40 < nagios-releng> Fri 07:40:57 PST  buildbot-master84.srv.releng.scl3.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
10:40 < nagios-releng> Fri 07:40:58 PST  ns2.private.releng.scl3.mozilla.com is DOWN :CRITICAL - Host Unreachable (10.26.75.41)

Raising to blocker so we can try to fix this before the trees get closed because of it.
We are looking into this now.
We are moving our ESX boxes around in scl3, from one rack area to another. I just brought two of them back up after their physical move. I put some test boxes on them, they looked fine, so I put the hosts back in rotation. *speculation* It would appear that some releng VLAN wasn't trunked in properly, so when the cluster rebalanced releng VMs onto the 'new' hosts, they got cut off. Sorry for the trouble. The hosts are back in maintenance mode while I go look for the exact root cause.
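A rough sketch of the kind of pre-rotation check that would catch this: compare the VLANs trunked to each host against the VLANs the cluster's VMs need, and hold back any host with a gap. The host names, VLAN labels, and the `hosts_missing_vlans` helper are all made up for illustration; none of this comes from the real inventory or switch configs.

```python
# Hypothetical data: host names and VLAN labels are illustrative only.
def hosts_missing_vlans(host_trunks, required_vlans):
    """Return {host: sorted list of required VLANs not trunked to that host}."""
    missing = {}
    for host, trunked in host_trunks.items():
        gap = set(required_vlans) - set(trunked)
        if gap:
            missing[host] = sorted(gap)
    return missing


# VLANs each ESX host can actually reach (assumed values).
host_trunks = {
    "esx-old-1": {"srv", "private", "build"},
    "esx-moved-1": {"build"},
    "esx-moved-2": {"build"},
}

# VLANs the VMs in this cluster rely on (assumed values).
required = {"srv", "private"}

gaps = hosts_missing_vlans(host_trunks, required)
for host, vlans in sorted(gaps.items()):
    print(f"{host}: keep in maintenance mode, missing VLANs {vlans}")
```

Test VMs only exercise the VLANs they happen to sit on, which is presumably why they looked fine; a check like this covers every VLAN a rebalanced VM might need.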
Found the knowledge and cabling gap, and opened 961068 to get the right cabling into place. In the meantime, I've left the moved ESX hosts in maintenance mode and paused further physical moves.