Right now there are 18 socket descriptors in use on Pulse node 1, 2 on node 2, and 127 on node 3. This makes memory usage on node 3 quite a bit higher than the rest. This is likely what triggered the memory-usage alerts of the past day or so. As far as I can tell, most of these connections are from taskcluster-queue. They *may* have shifted over after I rebooted node 3 and then node 1, although why almost none are on node 2, I'm not sure. Can we somehow redistribute these connections across the nodes to equalize load?
dustin@jemison ~ $ dig pulse.mozilla.org
;; ANSWER SECTION:
pulse.mozilla.org. 35 IN CNAME orange-antelope.rmq.cloudamqp.com.
orange-antelope.rmq.cloudamqp.com. 5 IN CNAME ec2-52-52-230-243.us-west-1.compute.amazonaws.com.
ec2-52-52-230-243.us-west-1.compute.amazonaws.com. 86375 IN A 126.96.36.199

So we're not getting the DNS round-robin we might expect here. It looks like this is just connecting to one of the three instances (-01, specifically):

dustin@jemison ~ $ host orange-antelope-01.rmq.cloudamqp.com.
orange-antelope-01.rmq.cloudamqp.com is an alias for ec2-52-52-230-243.us-west-1.compute.amazonaws.com.
ec2-52-52-230-243.us-west-1.compute.amazonaws.com has address 188.8.131.52

dustin@jemison ~ $ host orange-antelope-02.rmq.cloudamqp.com.
orange-antelope-02.rmq.cloudamqp.com is an alias for ec2-52-52-230-113.us-west-1.compute.amazonaws.com.
ec2-52-52-230-113.us-west-1.compute.amazonaws.com has address 184.108.40.206

dustin@jemison ~ $ host orange-antelope-03.rmq.cloudamqp.com.
orange-antelope-03.rmq.cloudamqp.com is an alias for ec2-52-8-30-112.us-west-1.compute.amazonaws.com.
ec2-52-8-30-112.us-west-1.compute.amazonaws.com has address 220.127.116.11

Repeatedly querying the authoritative DNS server for this domain (Route 53) switches, apparently at random, between -01 and -03:

dustin@jemison ~ $ dig @ns-1998.awsdns-57.co.uk. orange-antelope.rmq.cloudamqp.com.
;; ANSWER SECTION:
orange-antelope.rmq.cloudamqp.com. 30 IN CNAME ec2-52-52-230-243.us-west-1.compute.amazonaws.com.

dustin@jemison ~ $ dig @ns-1998.awsdns-57.co.uk. orange-antelope.rmq.cloudamqp.com.
;; ANSWER SECTION:
orange-antelope.rmq.cloudamqp.com. 30 IN CNAME ec2-52-8-30-112.us-west-1.compute.amazonaws.com.

So I think this is a service misconfiguration, rather than something in the taskcluster libs.
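The repeated lookups above could be automated along these lines. This is only a sketch: `tally_targets`, `missing_nodes`, and `system_resolver` are names invented here, and the default resolver goes through the local system resolver (`socket.gethostbyname_ex`), which may cache answers, so it is not equivalent to querying the authoritative server directly with `dig @ns-...`. The resolver is passed in as a parameter so a different lookup method can be substituted.

```python
import socket
from collections import Counter


def system_resolver(name):
    """Return the primary host name the local resolver reports for `name`."""
    canonical, _aliases, _addresses = socket.gethostbyname_ex(name)
    return canonical


def tally_targets(name, attempts=20, resolve=system_resolver):
    """Resolve `name` repeatedly and count how often each target comes back."""
    return Counter(resolve(name) for _ in range(attempts))


def missing_nodes(tally, expected):
    """List the expected targets that never appeared in the tally."""
    return [node for node in expected if tally[node] == 0]
```

For example, calling `tally_targets("orange-antelope.rmq.cloudamqp.com")` and passing the result to `missing_nodes` along with the three EC2 host names listed above would show whether the -02 instance ever appears in the answers.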
Component: Operations → Pulse
Product: Taskcluster → Webtools
Version: unspecified → other