Closed Bug 1446209 Opened 7 years ago Closed 7 years ago

Intermittent hostname resolution failure for workers

Categories

(Taskcluster :: Operations and Service Requests, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1206658

People

(Reporter: gerard-majax, Unassigned)

Details

Attachments

(1 file)

Intermittently, from several networks and computers, I have trouble resolving hostnames for Taskcluster workers, such as elmxvgaaaaaweksh7keym6sq3mcqflze5trpbgaj5soicbhc.taskcluster-worker.NET. The issue was sometimes reproducible a few months ago, then somehow went away, and I started noticing it again a few days ago. Each time, by the time I try to capture network packets to debug it, resolution works again. When it does reproduce, |host xxx.taskcluster-worker.net| resolves to an IPv4 address but then shows a timeout while trying to reach some other host.
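Since the failure tends to disappear before a packet capture can be started, one option is to leave a small probe running that records every resolution attempt with a timestamp. The following is only a sketch: the function names `try_resolve` and `probe` are made up for illustration, it relies on the system resolver through Python's standard `socket.getaddrinfo`, and a local DNS cache may hide some failures.

```python
import socket
import time
from typing import Callable, List, Tuple


def try_resolve(hostname: str) -> Tuple[bool, str]:
    """Resolve hostname once; return (success, addresses or error text)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        return False, str(exc)
    # Each entry's sockaddr tuple starts with the address string.
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)


def probe(
    hostname: str,
    attempts: int,
    interval_seconds: float,
    resolve: Callable[[str], Tuple[bool, str]] = try_resolve,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, bool, str]]:
    """Resolve hostname repeatedly and keep a timestamped record of each attempt."""
    results = []
    for attempt in range(attempts):
        ok, detail = resolve(hostname)
        results.append((time.time(), ok, detail))
        # Wait between attempts, but not after the last one.
        if attempt < attempts - 1:
            sleep(interval_seconds)
    return results
```

Calling `probe("xxx.taskcluster-worker.net", 60, 10.0)` with a real worker hostname in place of the placeholder would make 60 attempts over roughly ten minutes; the failed entries then show when resolution broke.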
The `taskcluster-worker.net` domain has no SOA or NS records. The TLD servers use the nameservers from the registry:

dustin@jemison ~ $ whois taskcluster-worker.net
[Querying whois.verisign-grs.com]
[Redirected to whois.markmonitor.com]
[Querying whois.markmonitor.com]
[whois.markmonitor.com]
Domain Name: taskcluster-worker.net
Registry Domain ID: 1905453488_DOMAIN_NET-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2017-09-02T04:00:35-0700
...
Name Server: stateless-dns-1.8ba909e3.cont.dockerapp.io
Name Server: stateless-dns.d85ca0db.svc.dockerapp.io
DNSSEC: unsigned
URL of the ICANN WHOIS Data Problem Reporting System: http://wdprs.internic.net/
>>> Last update of WHOIS database: 2018-03-16T06:23:05-0700 <<<

dustin@jemison ~ $ dig -tns taskcluster-worker.net @m.gtld-servers.NET

; <<>> DiG 9.10.5-P2-RedHat-9.10.5-2.P2.fc25 <<>> -tns taskcluster-worker.net @m.gtld-servers.NET
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13591
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;taskcluster-worker.net. IN NS

;; AUTHORITY SECTION:
taskcluster-worker.net. 172800 IN NS stateless-dns-1.8ba909e3.cont.dockerapp.io.
taskcluster-worker.net. 172800 IN NS stateless-dns.d85ca0db.svc.dockerapp.io.

;; Query time: 55 msec
;; SERVER: 2001:501:b1f9::30#53(2001:501:b1f9::30)
;; WHEN: Fri Mar 16 09:23:50 EDT 2018
;; MSG SIZE rcvd: 148

However, only one of those hostnames works:

dustin@jemison ~ $ dig -ta elroslaaaaawel2o7wchhiqmtzptmmchgkbh2rzfjwidhw3n.taskcluster-worker.net @stateless-dns.d85ca0db.svc.dockerapp.io.

; <<>> DiG 9.10.5-P2-RedHat-9.10.5-2.P2.fc25 <<>> -ta elroslaaaaawel2o7wchhiqmtzptmmchgkbh2rzfjwidhw3n.taskcluster-worker.net @stateless-dns.d85ca0db.svc.dockerapp.io.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38138
;; flags: qr rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;elroslaaaaawel2o7wchhiqmtzptmmchgkbh2rzfjwidhw3n.taskcluster-worker.net. IN A

;; ANSWER SECTION:
elroslaaaaawel2o7wchhiqmtzptmmchgkbh2rzfjwidhw3n.taskcluster-worker.net. 600 IN A 34.226.233.44

;; Query time: 77 msec
;; SERVER: 34.211.55.63#53(34.211.55.63)
;; WHEN: Fri Mar 16 09:26:20 EDT 2018
;; MSG SIZE rcvd: 105

dustin@jemison ~ $ dig -ta elroslaaaaawel2o7wchhiqmtzptmmchgkbh2rzfjwidhw3n.taskcluster-worker.net @stateless-dns-1.8ba909e3.cont.dockerapp.io.
dig: couldn't get address for 'stateless-dns-1.8ba909e3.cont.dockerapp.io.': not found

-- so I suspect that just removing that hostname from the registry would do the trick. It's in the shape of a container endpoint, and the name has since changed. If the registry doesn't allow listing only one nameserver, it might be best to run a second deployment and list both deployments (even if both are in docker-cloud).
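The manual check above (ask the registry which nameservers are delegated, then see whether each one can be reached) could be repeated with a short script. This is a rough sketch, not the tooling used in this bug: `resolves` and `unresolvable_nameservers` are hypothetical names, and the check only asks the system resolver whether the nameserver hostname has an address; it does not query the nameserver itself.

```python
import socket
from typing import Callable, Iterable, List


def resolves(hostname: str) -> bool:
    """Return True if the system resolver finds any address for hostname."""
    try:
        socket.getaddrinfo(hostname, 53)
        return True
    except socket.gaierror:
        return False


def unresolvable_nameservers(
    nameservers: Iterable[str],
    resolver: Callable[[str], bool] = resolves,
) -> List[str]:
    """Return the delegated nameserver names that have no resolvable address."""
    # Strip the trailing dot of a fully qualified name before resolving.
    return [ns for ns in nameservers if not resolver(ns.rstrip("."))]
```

Passing the two names from the AUTHORITY SECTION above would, at the time of this comment, be expected to flag stateless-dns-1.8ba909e3.cont.dockerapp.io, matching the dig error.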
Flags: needinfo?(jopsen)
Attached file dig
Reproducing from the Paris office just now
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
I think this folds into migrating away from Docker Cloud, which is going away in two months' time, so we're now on a deadline.
Flags: needinfo?(jopsen)
Component: Operations → Operations and Service Requests