Closed Bug 1436623 Opened 7 years ago Closed 6 years ago

Redistribute connections among Pulse nodes

Categories

(Webtools :: Pulse, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mcote, Unassigned)

Details

Right now there are 18 socket descriptors in use on Pulse node 1, 2 on node 2, and 127 on node 3. This makes memory usage on node 3 quite a bit higher than the rest. This is likely what triggered the memory-usage alerts of the past day or so. As far as I can tell, most of these connections are from taskcluster-queue. They *may* have shifted over after I rebooted node 3 and then node 1, although why almost none are on node 2, I'm not sure. Can we somehow redistribute these connections across the nodes to equalize load?
Assignee: nobody → dustin
dustin@jemison ~ $ dig pulse.mozilla.org
;; ANSWER SECTION:
pulse.mozilla.org. 35 IN CNAME orange-antelope.rmq.cloudamqp.com.
orange-antelope.rmq.cloudamqp.com. 5 IN CNAME ec2-52-52-230-243.us-west-1.compute.amazonaws.com.
ec2-52-52-230-243.us-west-1.compute.amazonaws.com. 86375 IN A 52.52.230.243

So we're not getting the DNS round-robin we might expect here. It looks like this is just connecting to one of the three instances (-01, specifically):

dustin@jemison ~ $ host orange-antelope-01.rmq.cloudamqp.com.
orange-antelope-01.rmq.cloudamqp.com is an alias for ec2-52-52-230-243.us-west-1.compute.amazonaws.com.
ec2-52-52-230-243.us-west-1.compute.amazonaws.com has address 52.52.230.243
dustin@jemison ~ $ host orange-antelope-02.rmq.cloudamqp.com.
orange-antelope-02.rmq.cloudamqp.com is an alias for ec2-52-52-230-113.us-west-1.compute.amazonaws.com.
ec2-52-52-230-113.us-west-1.compute.amazonaws.com has address 52.52.230.113
dustin@jemison ~ $ host orange-antelope-03.rmq.cloudamqp.com.
orange-antelope-03.rmq.cloudamqp.com is an alias for ec2-52-8-30-112.us-west-1.compute.amazonaws.com.
ec2-52-8-30-112.us-west-1.compute.amazonaws.com has address 52.8.30.112

Repeatedly querying the authoritative DNS server for this domain (route53) switches, apparently at random, between -01 and -03:

dustin@jemison ~ $ dig @ns-1998.awsdns-57.co.uk. orange-antelope.rmq.cloudamqp.com.
;; ANSWER SECTION:
orange-antelope.rmq.cloudamqp.com. 30 IN CNAME ec2-52-52-230-243.us-west-1.compute.amazonaws.com.
dustin@jemison ~ $ dig @ns-1998.awsdns-57.co.uk. orange-antelope.rmq.cloudamqp.com.
;; ANSWER SECTION:
orange-antelope.rmq.cloudamqp.com. 30 IN CNAME ec2-52-8-30-112.us-west-1.compute.amazonaws.com.

So I think this is a service misconfiguration rather than something in the taskcluster libs.
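To see which node a name actually lands on across many lookups, one can tally the answers. Below is a minimal Python sketch that uses only the standard library. The IP-to-node table is copied from the `host` output in this comment. `tally_nodes`, `system_resolve` and `make_fake_resolver` are hypothetical helpers and are not part of any Taskcluster or CloudAMQP tool. The fake resolver is an assumption that mimics the route53 behaviour seen here (answering with -01 or -03), so the sketch runs offline.

```python
import random
import socket
from collections import Counter
from typing import Callable, List

# Node addresses copied from the `host` lookups in this bug.
NODE_BY_IP = {
    "52.52.230.243": "orange-antelope-01",
    "52.52.230.113": "orange-antelope-02",
    "52.8.30.112": "orange-antelope-03",
}


def tally_nodes(resolve: Callable[[str], str], name: str, attempts: int) -> Counter:
    """Resolve `name` `attempts` times and count which node each answer maps to."""
    counts: Counter = Counter()
    for _ in range(attempts):
        ip = resolve(name)
        # Addresses not in the table are counted under the raw IP.
        counts[NODE_BY_IP.get(ip, ip)] += 1
    return counts


def system_resolve(name: str) -> str:
    """Real lookup through the OS resolver (follows CNAMEs, returns one IPv4 address)."""
    return socket.gethostbyname(name)


def make_fake_resolver(ips: List[str], seed: int = 0) -> Callable[[str], str]:
    """Hypothetical offline stand-in that picks uniformly among `ips` on every call."""
    rng = random.Random(seed)
    return lambda _name: rng.choice(ips)


if __name__ == "__main__":
    fake = make_fake_resolver(["52.52.230.243", "52.8.30.112"])
    print(tally_nodes(fake, "orange-antelope.rmq.cloudamqp.com", 100))
```

If you pass `system_resolve` instead of the fake resolver, the sketch queries the real name. The local resolver stack may cache the answer for the record's TTL (30 seconds in the route53 answers here), so rapid repeated lookups can all report the same node. Querying the authoritative server directly with `dig @ns-1998.awsdns-57.co.uk.` avoids that caching.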
Component: Operations → Pulse
Product: Taskcluster → Webtools
Version: unspecified → other
No longer blocks: 1436735
Assignee: dustin → nobody

This is fixed in the new TC-lib-pulse, which reconnects periodically to distribute connections.
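The bug does not show how the new library implements periodic reconnection. The Python sketch below is only a toy model of the idea and is not that library's code. It starts from the connection counts reported at the top of this bug (18, 2 and 127). It then has every client reconnect once to a node chosen at random, which stands in for DNS resolution. `skewed_start`, `reconnect_all`, `load` and the node labels are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List

# Hypothetical labels for the three Pulse nodes.
NODES = ["node1", "node2", "node3"]


def skewed_start() -> List[str]:
    """One entry per connection, using the counts reported in this bug."""
    return ["node1"] * 18 + ["node2"] * 2 + ["node3"] * 127


def reconnect_all(assignments: List[str], pick_node: Callable[[], str]) -> List[str]:
    """One reconnect cycle: every client drops its connection and reconnects
    to whatever node pick_node() returns (a stand-in for DNS resolution)."""
    return [pick_node() for _ in assignments]


def load(assignments: List[str]) -> Counter:
    """Number of connections per node."""
    return Counter(assignments)


if __name__ == "__main__":
    rng = random.Random(42)
    before = skewed_start()
    after = reconnect_all(before, lambda: rng.choice(NODES))
    print("before:", load(before))
    print("after: ", load(after))
    # A resolver that only ever answers with nodes 1 and 3 leaves node 2 empty.
    partial = reconnect_all(before, lambda: rng.choice(["node1", "node3"]))
    print("partial:", load(partial))
```

Each reconnect in this model is an independent draw. One full cycle therefore washes out whatever skew earlier events, such as node restarts, left behind. The result can only be as even as the DNS answers, though. If the name resolves only to -01 and -03, as the route53 queries in this bug showed, node 2 still receives no connections.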

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED