Redistribute connections among Pulse nodes

NEW
Unassigned

Status

10 months ago
10 months ago

People

(Reporter: mcote, Unassigned)

Description

10 months ago
Right now there are 18 socket descriptors in use on Pulse node 1, 2 on node 2, and 127 on node 3. This makes memory usage on node 3 considerably higher than on the other two nodes, which is likely what triggered the memory-usage alerts of the past day or so.

As far as I can tell, most of these connections are from taskcluster-queue.  They *may* have shifted over after I rebooted node 3 and then node 1, although why almost none are on node 2, I'm not sure.

Can we somehow redistribute these connections across the nodes to equalize load?
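
For a sense of scale, here is a rough sketch of what an even split would look like, using the socket-descriptor counts quoted above as a stand-in for connection counts (an assumption; not every descriptor is necessarily a client connection). The function name `rebalance_plan` and the node labels are hypothetical, chosen only for illustration.

```python
from typing import Dict


def rebalance_plan(connections: Dict[str, int]) -> Dict[str, int]:
    """Return the per-node change needed to reach an even split.

    A negative value means the node should shed that many connections,
    a positive value means it should gain that many.
    """
    total = sum(connections.values())
    base, extra = divmod(total, len(connections))
    plan = {}
    # Any remainder is handed out one connection at a time, in name order.
    for i, node in enumerate(sorted(connections)):
        target = base + (1 if i < extra else 0)
        plan[node] = target - connections[node]
    return plan


# Counts taken from the description above (socket descriptors per node).
observed = {"node1": 18, "node2": 2, "node3": 127}
plan = rebalance_plan(observed)
print(plan)
```

With these numbers the total is 147, so an even split is 49 per node, which would mean moving 78 connections off node 3.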
Assignee: nobody → dustin
dustin@jemison ~ $ dig pulse.mozilla.org
;; ANSWER SECTION:
pulse.mozilla.org.      35      IN      CNAME   orange-antelope.rmq.cloudamqp.com.
orange-antelope.rmq.cloudamqp.com. 5 IN CNAME   ec2-52-52-230-243.us-west-1.compute.amazonaws.com.
ec2-52-52-230-243.us-west-1.compute.amazonaws.com. 86375 IN A 52.52.230.243

So we're not getting the DNS round-robin we might expect here. It looks like this is just connecting to one of the three instances (-01, specifically):

dustin@jemison ~ $ host orange-antelope-01.rmq.cloudamqp.com.
orange-antelope-01.rmq.cloudamqp.com is an alias for ec2-52-52-230-243.us-west-1.compute.amazonaws.com.
ec2-52-52-230-243.us-west-1.compute.amazonaws.com has address 52.52.230.243
dustin@jemison ~ $ host orange-antelope-02.rmq.cloudamqp.com.
orange-antelope-02.rmq.cloudamqp.com is an alias for ec2-52-52-230-113.us-west-1.compute.amazonaws.com.
ec2-52-52-230-113.us-west-1.compute.amazonaws.com has address 52.52.230.113
dustin@jemison ~ $ host orange-antelope-03.rmq.cloudamqp.com.
orange-antelope-03.rmq.cloudamqp.com is an alias for ec2-52-8-30-112.us-west-1.compute.amazonaws.com.
ec2-52-8-30-112.us-west-1.compute.amazonaws.com has address 52.8.30.112

Repeatedly querying the authoritative DNS server for this domain (Route 53) returns either -01 or -03, apparently at random:

dustin@jemison ~ $ dig @ns-1998.awsdns-57.co.uk. orange-antelope.rmq.cloudamqp.com.
;; ANSWER SECTION:
orange-antelope.rmq.cloudamqp.com. 30 IN CNAME  ec2-52-52-230-243.us-west-1.compute.amazonaws.com.

dustin@jemison ~ $ dig @ns-1998.awsdns-57.co.uk. orange-antelope.rmq.cloudamqp.com.
;; ANSWER SECTION:
orange-antelope.rmq.cloudamqp.com. 30 IN CNAME  ec2-52-8-30-112.us-west-1.compute.amazonaws.com.

So I think this is a service misconfiguration, rather than something in the taskcluster libs.
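
The repeated-query check above could be automated along these lines, to see how the answers are distributed and whether any node never shows up. This is only a sketch: `tally_targets`, `missing_nodes`, and `dig_cname` are hypothetical helper names, and `dig_cname` assumes `dig` is installed and that the first line of `dig +short` output is the CNAME target.

```python
import subprocess
from collections import Counter
from typing import Callable, Iterable, List


def dig_cname(name: str, server: str = "ns-1998.awsdns-57.co.uk.") -> str:
    """Ask one DNS server for `name` and return the first answer line.

    Assumes the `dig` command-line tool is available on PATH.
    """
    result = subprocess.run(
        ["dig", "+short", "@" + server, name],
        capture_output=True, text=True, check=True,
    )
    lines = result.stdout.split()
    return lines[0] if lines else ""


def tally_targets(resolve: Callable[[str], str], name: str, attempts: int) -> Counter:
    """Resolve `name` `attempts` times and count each distinct answer."""
    counts = Counter()
    for _ in range(attempts):
        counts[resolve(name)] += 1
    return counts


def missing_nodes(counts: Counter, expected: Iterable[str]) -> List[str]:
    """Return the expected targets that never appeared in the answers."""
    return [node for node in expected if counts[node] == 0]
```

Calling `tally_targets(dig_cname, "orange-antelope.rmq.cloudamqp.com.", 50)` and passing the result to `missing_nodes` along with the three `ec2-...` names from the `host` output would show which backends the record actually rotates through.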
Component: Operations → Pulse
Product: Taskcluster → Webtools
Version: unspecified → other
No longer blocks: 1436735
Assignee: dustin → nobody