Closed
Bug 1446209
Opened 7 years ago
Closed 7 years ago
Intermittent hostname resolution failure for workers
Categories
(Taskcluster :: Operations and Service Requests, task)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 1206658
People
(Reporter: gerard-majax, Unassigned)
Details
Attachments
(1 file)
3.18 KB,
text/plain
Intermittently, and from several networks and computers, I have trouble resolving hostnames for taskcluster workers, such as elmxvgaaaaaweksh7keym6sq3mcqflze5trpbgaj5soicbhc.taskcluster-worker.NET. The issue reproduced occasionally a few months ago, then went away, and I started noticing it again a few days ago.
Each time, by the time I try to capture network packets to debug, it ends up working.
When it does reproduce, |host xxx.taskcluster-worker.net| resolves to an IPv4 address but then reports a timeout while trying to reach some other host.
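Since the failure tends to disappear before a packet capture can be started, one option is to leave a small probe running that records the time of each failed lookup. The following is only a minimal sketch: the function names `resolve_once` and `probe`, the attempt count, and the interval are all assumptions made for illustration, and a caching local resolver may hide failures that this loop would otherwise see.

```python
import socket
import time
from datetime import datetime, timezone
from typing import Callable, List, Tuple


def resolve_once(hostname: str) -> Tuple[bool, str]:
    """Try one IPv4 lookup through the system resolver.

    Returns (True, first_address) on success or (False, error_text) on failure.
    """
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
        return True, infos[0][4][0]
    except socket.gaierror as exc:
        return False, str(exc)


def probe(
    hostname: str,
    attempts: int,
    interval_s: float = 5.0,
    resolve: Callable[[str], Tuple[bool, str]] = resolve_once,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, str]]:
    """Resolve `hostname` repeatedly and collect (UTC timestamp, error) per failure."""
    failures = []
    for i in range(attempts):
        ok, detail = resolve(hostname)
        if not ok:
            failures.append((datetime.now(timezone.utc).isoformat(), detail))
        if i < attempts - 1:
            sleep(interval_s)  # wait between attempts, but not after the last one
    return failures
```

Calling `probe("xxx.taskcluster-worker.net", attempts=100)` with a real worker hostname would return one entry per failed lookup, which gives timestamps to line up against a packet capture started in parallel.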
Comment 1•7 years ago
The `taskcluster-worker.net` domain has no SOA or NS records. The TLD servers use the nameservers from the registry:
dustin@jemison ~ $ whois taskcluster-worker.net
[Querying whois.verisign-grs.com]
[Redirected to whois.markmonitor.com]
[Querying whois.markmonitor.com]
[whois.markmonitor.com]
Domain Name: taskcluster-worker.net
Registry Domain ID: 1905453488_DOMAIN_NET-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2017-09-02T04:00:35-0700
...
Name Server: stateless-dns-1.8ba909e3.cont.dockerapp.io
Name Server: stateless-dns.d85ca0db.svc.dockerapp.io
DNSSEC: unsigned
URL of the ICANN WHOIS Data Problem Reporting System: http://wdprs.internic.net/
>>> Last update of WHOIS database: 2018-03-16T06:23:05-0700 <<<
dustin@jemison ~ $ dig -tns taskcluster-worker.net @m.gtld-servers.NET
; <<>> DiG 9.10.5-P2-RedHat-9.10.5-2.P2.fc25 <<>> -tns taskcluster-worker.net @m.gtld-servers.NET
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13591
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;taskcluster-worker.net. IN NS
;; AUTHORITY SECTION:
taskcluster-worker.net. 172800 IN NS stateless-dns-1.8ba909e3.cont.dockerapp.io.
taskcluster-worker.net. 172800 IN NS stateless-dns.d85ca0db.svc.dockerapp.io.
;; Query time: 55 msec
;; SERVER: 2001:501:b1f9::30#53(2001:501:b1f9::30)
;; WHEN: Fri Mar 16 09:23:50 EDT 2018
;; MSG SIZE rcvd: 148
However, only one of those hostnames works:
dustin@jemison ~ $ dig -ta elroslaaaaawel2o7wchhiqmtzptmmchgkbh2rzfjwidhw3n.taskcluster-worker.net @stateless-dns.d85ca0db.svc.dockerapp.io.
; <<>> DiG 9.10.5-P2-RedHat-9.10.5-2.P2.fc25 <<>> -ta elroslaaaaawel2o7wchhiqmtzptmmchgkbh2rzfjwidhw3n.taskcluster-worker.net @stateless-dns.d85ca0db.svc.dockerapp.io.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38138
;; flags: qr rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;elroslaaaaawel2o7wchhiqmtzptmmchgkbh2rzfjwidhw3n.taskcluster-worker.net. IN A
;; ANSWER SECTION:
elroslaaaaawel2o7wchhiqmtzptmmchgkbh2rzfjwidhw3n.taskcluster-worker.net. 600 IN A 34.226.233.44
;; Query time: 77 msec
;; SERVER: 34.211.55.63#53(34.211.55.63)
;; WHEN: Fri Mar 16 09:26:20 EDT 2018
;; MSG SIZE rcvd: 105
dustin@jemison ~ $ dig -ta elroslaaaaawel2o7wchhiqmtzptmmchgkbh2rzfjwidhw3n.taskcluster-worker.net @stateless-dns-1.8ba909e3.cont.dockerapp.io.
dig: couldn't get address for 'stateless-dns-1.8ba909e3.cont.dockerapp.io.': not found
--
So I suspect that just removing the unresolvable hostname (stateless-dns-1.8ba909e3.cont.dockerapp.io) from the registry would do the trick. It's in the shape of a container endpoint, and that name has since changed.
If the registry doesn't allow listing only one nameserver, it might be best to run a second deployment and list both deployments (even if both are in docker-cloud).
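The manual check in this comment (list the delegated nameservers, then see whether each name resolves at all) could be scripted roughly as follows. This is a sketch under assumptions, not an existing tool: `ns_targets`, `system_resolves`, and `check_delegation` are hypothetical helper names, the parser expects `dig`'s usual five-column record lines, and resolution goes through the local system resolver rather than querying the TLD servers directly.

```python
import socket
from typing import Callable, Dict, List


def ns_targets(dig_output: str, zone: str) -> List[str]:
    """Extract NS target names for `zone` from the text output of dig."""
    zone = zone.rstrip(".").lower() + "."
    targets = []
    for line in dig_output.splitlines():
        if line.startswith(";"):
            continue  # skip dig comments and the echoed question line
        fields = line.split()
        # Record lines look like: <owner> <ttl> IN NS <target>
        if len(fields) == 5 and fields[2:4] == ["IN", "NS"]:
            if fields[0].lower() == zone:
                targets.append(fields[4])
    return targets


def system_resolves(name: str) -> bool:
    """Return True if the system resolver finds any address for `name`."""
    try:
        socket.getaddrinfo(name, 53)
        return True
    except socket.gaierror:
        return False


def check_delegation(
    targets: List[str],
    resolves: Callable[[str], bool] = system_resolves,
) -> Dict[str, bool]:
    """Map each nameserver name to whether it currently resolves."""
    return {target: resolves(target) for target in targets}
```

Feeding the output of `dig -tns taskcluster-worker.net @m.gtld-servers.NET` into `ns_targets` and passing the result to `check_delegation` would flag any delegated nameserver whose own hostname no longer resolves.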
Flags: needinfo?(jopsen)
Reporter
Comment 2•7 years ago
Reproducing from the Paris office just now
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Comment 4•7 years ago
I think this folds into the migration away from Docker Cloud, which is going away in two months' time, so we're now on a deadline.
Flags: needinfo?(jopsen)
Assignee
Updated•6 years ago
Component: Operations → Operations and Service Requests