addons2.stage.db.phx1 can't reach puppet1.private.phx1

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations
5 years ago
3 years ago

People

(Reporter: dustin, Assigned: dumitru)

(Reporter)

Description

5 years ago
The other addons*.stage.db seem fine, but

[root@addons2.stage.db.phx1 ~]# host puppet1.private.phx1.mozilla.com
puppet1.private.phx1.mozilla.com has address 10.8.75.10
[root@addons2.stage.db.phx1 ~]# nc -vz puppet1.private.phx1.mozilla.com 8140

times out

Looks like it started yesterday:

Sep  6 19:42:54 addons2 puppet-agent[1168]: (/Stage[main]/Ldap_users::Groups::Admin/Ldap_users::Dotfiles[ckolos]/File[/home/ckolos/]) Failed to generate additional resources using 'eval_generate: Connection timed out - connect(2)

I'm guessing this is a flow problem, since

[root@addons2.stage.db.phx1 ~]# nc -vz puppet1.private.scl3.mozilla.com 8140
Connection to puppet1.private.scl3.mozilla.com 8140 port [tcp/*] succeeded!

Can you check the flows, and if nothing's amiss bounce back to server ops?
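
For anyone repeating the comparison above without nc: the check is just a TCP connect with a timeout. A minimal Python sketch, assuming only the standard library; the names `port_open` and `compare`, and the 5-second timeout, are illustrative and not something that was run on the host.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False


def compare(hosts, port, timeout=5.0):
    """Probe the same port on several hosts and return {host: reachable}."""
    return {host: port_open(host, port, timeout) for host in hosts}
```

Calling `compare(["puppet1.private.phx1.mozilla.com", "puppet1.private.scl3.mozilla.com"], 8140)` from addons2 would give the same open/unreachable picture as the two nc runs. A single probe only shows the state at that moment, so one success does not rule out an intermittent problem.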
(Reporter)

Comment 1

5 years ago
I've pulled it from the Zeus pool, even though MySQL seemed to be OK.
We've also had ridiculous amounts of Nagios flapping; here's the link to yesterday's history:

http://nagios1.private.phx1.mozilla.com/phx1/cgi-bin/history.cgi?host=addons2.stage.db.phx1.mozilla.com&type=0&statetype=0&archive=1

(and today is similar).

Setting it to server operations - if nc works, then the flow is open and there's not really much network operations can do.
Assignee: network-operations → server-ops
Component: Server Operations: ACL Request → Server Operations
QA Contact: ravi → jdow
(Reporter)

Comment 3

5 years ago
nc is working now, but the timeout I pasted above was real, so the connection is definitely flapping. That suggests it's not a flow problem.
Replication is broken, because:

                Last_IO_Error: error reconnecting to master 'slave_user@10.8.70.139:3306' - retry-time: 60  retries: 86400


There are network-related problems here.
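
Since a single nc run can land on either side of a flap, one way to confirm it is to sample the port repeatedly and count state changes. A rough Python sketch under that assumption; `probe`, `watch`, `count_flaps`, and the sample count and interval are all hypothetical choices, not what was used here.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host, port, samples=10, interval=30.0, timeout=5.0):
    """Probe host:port `samples` times, sleeping `interval` seconds between probes."""
    results = []
    for i in range(samples):
        results.append(probe(host, port, timeout))
        if i < samples - 1:
            time.sleep(interval)
    return results


def count_flaps(results):
    """Count transitions between reachable and unreachable in a series of probes."""
    results = list(results)
    return sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)
```

For example, `count_flaps(watch("10.8.70.139", 3306))` would sample the replication master from the error above. A steady link gives 0, while a result above 0 means the port changed state during the window.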
(Assignee)

Comment 5

5 years ago
I ran a yum update, and the new kernel ships a newer be2net driver for the NIC.
old: 4.0.160r
new (current): 4.1.307r

Let's keep an eye on that.
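
To check which driver version a host is actually running, the version line can be pulled out of `ethtool -i` output and compared numerically. A small Python sketch, assuming the output contains a `version:` line; the sample text in the usage note is illustrative rather than captured from addons2, and the function names are made up for this example.

```python
import re


def driver_version(ethtool_output):
    """Extract the value of the 'version:' line from `ethtool -i` style text."""
    for line in ethtool_output.splitlines():
        key, _, value = line.partition(":")
        # Exact match on the key, so 'firmware-version' is not picked up.
        if key.strip() == "version":
            return value.strip()
    raise ValueError("no 'version:' line found")


def version_key(version):
    """Turn a string like '4.1.307r' into (4, 1, 307) for comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def is_newer(candidate, baseline):
    """Return True if candidate is a strictly newer version than baseline."""
    return version_key(candidate) > version_key(baseline)
```

With text such as `"driver: be2net\nversion: 4.1.307r\n"`, `driver_version` returns `"4.1.307r"`, and `is_newer("4.1.307r", "4.0.160r")` is true. The comparison ignores the trailing `r` suffix and looks only at the numeric parts.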
Assignee: server-ops → dgherman
addons2.stage.db hasn't been flapping since the update, so I'm going to mark this resolved.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard