The other addons*.stage.db seem fine, but

[email@example.com ~]# host puppet1.private.phx1.mozilla.com
puppet1.private.phx1.mozilla.com has address 10.8.75.10
[firstname.lastname@example.org ~]# nc -vz puppet1.private.phx1.mozilla.com 8140

times out. Looks like it started yesterday:

Sep 6 19:42:54 addons2 puppet-agent: (/Stage[main]/Ldap_users::Groups::Admin/Ldap_users::Dotfiles[ckolos]/File[/home/ckolos/]) Failed to generate additional resources using 'eval_generate: Connection timed out - connect(2)

I'm guessing this is a flow problem, since

[email@example.com ~]# nc -vz puppet1.private.scl3.mozilla.com 8140
Connection to puppet1.private.scl3.mozilla.com 8140 port [tcp/*] succeeded!

Can you check the flows, and if nothing's amiss bounce back to server ops?
I've pulled it from the zeus pool, even though mysql seemed to be OK.
We've also had ridiculous amounts of nagios flapping. Here's the link to yesterday (and today is similar):

http://nagios1.private.phx1.mozilla.com/phx1/cgi-bin/history.cgi?host=addons2.stage.db.phx1.mozilla.com&type=0&statetype=0&archive=1

Setting it to server operations: if nc works, then the flow is open and there's not really much network operations can do.
nc's working now, but my paste above where it didn't work was for real, so this is definitely flapping. But that suggests it's not a flow problem.
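Since a single nc run only shows the state at one moment, one way to pin down the flapping might be to repeat the same TCP check at intervals and count how often the result changes. The following is a minimal sketch, not something that was run on addons2; the function names (probe, count_flaps, watch) and the attempt count, interval and timeout values are assumptions chosen for illustration.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Roughly what `nc -vz host port` checks: connect, then close.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        return False


def count_flaps(results):
    """Count transitions between success and failure in a list of booleans."""
    return sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)


def watch(host, port, attempts=10, interval=30.0, timeout=3.0):
    """Probe repeatedly, print a timestamped line per attempt, return results."""
    results = []
    for i in range(attempts):
        ok = probe(host, port, timeout)
        stamp = time.strftime("%b %d %H:%M:%S")
        print(stamp, host, port, "ok" if ok else "FAILED")
        results.append(ok)
        if i < attempts - 1:
            time.sleep(interval)
    return results


# Example (hypothetical parameters):
# results = watch("puppet1.private.phx1.mozilla.com", 8140)
# print("state changes:", count_flaps(results))
```

A nonzero count from count_flaps over a run would be consistent with an intermittent network problem rather than a closed flow, which would fail every time.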
Replication is broken, because:

Last_IO_Error: error reconnecting to master 'firstname.lastname@example.org:3306' - retry-time: 60 retries: 86400

There are network-related problems here.
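For keeping an eye on replication while the network issue is chased, a small parser over the text output of SHOW SLAVE STATUS\G could flag the broken state automatically. This is only a sketch: the helper names are made up for illustration, and it assumes the output has already been captured as a string (for example from the mysql client).

```python
def parse_slave_status(text):
    """Parse `SHOW SLAVE STATUS\\G` text output into a dict of field -> value.

    Each data line looks like "   Field_Name: value". Only the first colon
    separates the field name from the value, so values that contain colons
    themselves (such as Last_IO_Error) are kept whole.
    """
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the "*** 1. row ***" separator and blank lines
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def replication_ok(status):
    """Replication is healthy only if both the IO and SQL threads report Yes."""
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
    )
```

With an IO thread stuck reconnecting to the master, Slave_IO_Running would not be Yes, so replication_ok would return False and the Last_IO_Error entry would carry the message quoted above.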
I did a yum update and the new kernel had a newer be2net driver for the NIC.

old: 4.0.160r
new (current): 4.1.307r

Let's keep an eye on that.
addons2.stage.db hasn't been flapping, so I'm going to mark this resolved.