Closed Bug 1206658 Opened 9 years ago Closed 7 years ago

stateless-dns-server: DNS reply are broken (with some ISPs)

Categories

(Taskcluster :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gerard-majax, Assigned: bstack)

So, I am using a Hetzner VPS as a VPN endpoint. This VPS was configured to use Hetzner's DNS servers. When connected to my VPN, I cannot resolve the hosts used for the live log feature. Switching to Google's DNS fixed it. I contacted Hetzner support, who replied with the following:

> We are using several methods in order to harden our recursiv dns servers. Amongst other things dns 0x20.
>
> https://tools.ietf.org/html/draft-vixie-dnsext-dns0x20-00
>
> Your dns servers
>
> taskcluster-dns-server-1.taskcluster.cont.tutum.io.
> taskcluster-dns-server-2.taskcluster.cont.tutum.io.
>
> does not seem to support this type of query and therefore are sending malformed answers back which is why our recursive dns servers consider
> to not use these answers.

So it looks like we have a bug in our taskcluster infra :(
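For readers unfamiliar with dns 0x20, here is a minimal Python sketch of the idea in the linked draft: the resolver randomises the letter case of the query name and only accepts a reply whose question section echoes that name exactly. The two "server" functions are hypothetical stand-ins. This bug does not say what stateless-dns-server actually does to the name, so the lowercasing behaviour is an assumption used purely for illustration.

```python
import random


def encode_0x20(name: str, rng: random.Random) -> str:
    """Randomly flip the case of each ASCII letter in a DNS name."""
    return "".join(
        (ch.upper() if rng.random() < 0.5 else ch.lower())
        if (ch.isascii() and ch.isalpha())
        else ch
        for ch in name
    )


def reply_is_acceptable(sent_qname: str, echoed_qname: str) -> bool:
    """A 0x20-aware resolver wants the query name echoed back with identical case."""
    return sent_qname == echoed_qname


def case_preserving_server(qname: str) -> str:
    """Hypothetical server that copies the question name unchanged."""
    return qname


def lowercasing_server(qname: str) -> str:
    """Hypothetical server that normalises the name (assumed failure mode)."""
    return qname.lower()


if __name__ == "__main__":
    rng = random.Random()
    # Placeholder hostname, not a real livelog host.
    query = encode_0x20("some-worker.example.test", rng)
    print("query sent:      ", query)
    print("preserving reply:", reply_is_acceptable(query, case_preserving_server(query)))
    print("lowercased reply:", reply_is_acceptable(query, lowercasing_server(query)))
```

Names are still compared case-insensitively for lookup purposes, which is why a server that ignores case works with most resolvers and only fails with ones that enforce the echo check.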
I think long term we're talking about proxying livelogs, instead of doing fancy DNS tricks. This will also allow better isolation of workers from the internet.
Summary: DNS reply are broken → stateless-dns-server: DNS reply are broken (with some ISPs)
Jonas, will the webhook tunnel still make use of the stateless-dns-proxy?
Flags: needinfo?(jopsen)
Do we have a date/plan for when we intend to deploy a centralised log proxy service? Is this something for Q4/2017?
webhooktunnel will deprecate stateless-dns-server, so no need to fix this.
No idea about date/plan:
 - tc-worker is already using webhooktunnel
 - generic-worker could be any minute, patch should be fairly simple
 - docker-worker is in progress
Flags: needinfo?(jopsen)
So from the sounds of this, likely tc-worker and docker-worker by the end of Q3 or near that. generic-worker could make use of it anytime before then or in early Q4.
I think the near-term fix, since we won't have webhooktunnel for all workers soon, is to run this service on two EC2 instances, and put those two instances' permanent IP addresses in the registry. Probably without NS and SOA records.
Assignee: nobody → jopsen
Assignee: jopsen → bstack
Status: NEW → ASSIGNED
The work to move this service to EC2 instances is complete. Any ideas how we can verify that we have fixed this bug? Or should we just say fixed?
Flags: needinfo?(dustin)
I think there's a command-line tool to generate hostnames. So I guess generate a hostname and then run `dig @8.8.8.8 $hostname`, then the same against 8.8.4.4 and 1.1.1.1, and maybe @ip1 and @ip2 for the IPs of the service. If all of that works, it's probably fine :)
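The check described above could be scripted roughly as follows. This is a sketch, not a tool that exists in the Taskcluster repos: the function names are made up here, the hostname is a placeholder for whatever the hostname generator produces, and the service's own IPs (ip1, ip2) would have to be appended to the resolver list by hand. It assumes `dig` is installed and on the PATH.

```python
import subprocess

# Public resolvers named in the comment above.
PUBLIC_RESOLVERS = ["8.8.8.8", "8.8.4.4", "1.1.1.1"]


def dig_commands(hostname, resolvers):
    """Build one `dig @resolver hostname +short` argv list per resolver."""
    return [["dig", "@" + resolver, hostname, "+short"] for resolver in resolvers]


def check_all(hostname, resolvers):
    """Run dig against each resolver and map resolver -> answer text ('' if none)."""
    results = {}
    for resolver, cmd in zip(resolvers, dig_commands(hostname, resolvers)):
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
        results[resolver] = proc.stdout.strip()
    return results


if __name__ == "__main__":
    # Placeholder: substitute a hostname produced by the hostname generator.
    hostname = "generated-hostname.example.test"
    for resolver, answer in check_all(hostname, PUBLIC_RESOLVERS).items():
        print(resolver, "->", answer or "(no answer)")
```

An empty answer from one resolver while the others return an address points at that resolver rejecting the reply, rather than the record being missing.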
Flags: needinfo?(dustin)
This works fine for all of those normal resolvers. I think I found the Hetzner resolvers and tried it with them, but it did not work. gerard-majax, can you confirm for me that this is still broken in Hetzner? I'm not sure we'll ever get around to fixing the dns 0x20 stuff tbh. May as well wait on having webhooktunnel running rather than fixing that. Closing as fixed for now. Please reopen if you would like it fixed now instead.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Flags: needinfo?(lissyx+mozillians)
Resolution: --- → FIXED
Yeah, it seems it is still not working :(
Flags: needinfo?(lissyx+mozillians)
I'm with Brian: I'd rather not fix this. 0x20 is but one of hundreds of features missing from this DNS server and there's no sense in trying to implement all of them since we have webhooktunnel right around the corner.