Closed Bug 1451364 Opened 6 years ago Closed 5 years ago

Issues resolving DNS records for wiki.mozilla.org

Categories

(Infrastructure & Operations :: SRE, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Usul, Unassigned)

<jcristau> hmm.  did the wiki.m.o cname get switched back to something.appsvcs-generic.nubis.allizom.org?
<jcristau> i can't resolve it now
* Usul checks
<jcristau> wiki.mozilla.org. 59 IN CNAME www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org.
<jcristau> www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org. 173 IN CNAME wiki-prod-1394614349.us-west-2.elb.amazonaws.com.
<Usul> https://usul.pastebin.mozilla.org/9082147
<jcristau> my resolver doesn't like the second cname, likely dnssec-related, it already broke the other day and was worked around by having the cname directly to foo.elb.amazonaws.com
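
For readers following along, here is a minimal Python sketch (not from the thread) of walking the CNAME chain in dig-style answer lines. The function name `follow_cname_chain` and the `ANSWER` list are illustrative assumptions; the record lines themselves are copied from the dig output in this bug.

```python
# Record lines copied from the dig answer section in this bug.
ANSWER = [
    "wiki.mozilla.org. 60 IN CNAME www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org.",
    "www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org. 190 IN CNAME wiki-prod-1394614349.us-west-2.elb.amazonaws.com.",
    "wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 60 IN A 52.89.171.193",
    "wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 60 IN A 54.68.243.126",
    "wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 60 IN A 34.213.203.225",
]


def follow_cname_chain(answer_lines, qname):
    """Return (chain of names, A addresses of the final name).

    Hypothetical helper: expects 'owner ttl class type rdata' lines.
    """
    cnames = {}
    addresses = {}
    for line in answer_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # not a resource record line
        owner, rtype, rdata = fields[0].lower(), fields[3], fields[4]
        if rtype == "CNAME":
            cnames[owner] = rdata.lower()
        elif rtype == "A":
            addresses.setdefault(owner, []).append(rdata)

    chain = [qname.lower()]
    seen = set(chain)
    while chain[-1] in cnames:
        target = cnames[chain[-1]]
        if target in seen:
            break  # guard against CNAME loops
        chain.append(target)
        seen.add(target)
    return chain, addresses.get(chain[-1], [])


chain, addrs = follow_cname_chain(ANSWER, "wiki.mozilla.org.")
for name in chain:
    print(name)
print(addrs)
```

A validating resolver has to validate (or prove insecure) every zone this chain crosses, which is why a problem under allizom.org can break wiki.mozilla.org even though mozilla.org itself is fine.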

with VPN on
[ludovic@poney ~]$ dig wiki.mozilla.org 

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7_4.2 <<>> wiki.mozilla.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55060
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 4, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;wiki.mozilla.org.		IN	A

;; ANSWER SECTION:
wiki.mozilla.org.	60	IN	CNAME	www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org.
www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org.	190 IN CNAME wiki-prod-1394614349.us-west-2.elb.amazonaws.com.
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 60 IN	A 52.89.171.193
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 60 IN	A 54.68.243.126
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 60 IN	A 34.213.203.225

;; AUTHORITY SECTION:
us-west-2.elb.amazonaws.com. 1534 IN	NS	ns-1475.awsdns-56.org.
us-west-2.elb.amazonaws.com. 1534 IN	NS	ns-1769.awsdns-29.co.uk.
us-west-2.elb.amazonaws.com. 1534 IN	NS	ns-332.awsdns-41.com.
us-west-2.elb.amazonaws.com. 1534 IN	NS	ns-560.awsdns-06.net.

;; Query time: 20 msec
;; SERVER: 62.210.16.6#53(62.210.16.6)
;; WHEN: mer. avril 04 15:06:51  2018
;; MSG SIZE  rcvd: 362

[ludovic@poney ~]$

[ludovic@poney ~]$ dig wiki.mozilla.org @9.9.9.9

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7_4.2 <<>> wiki.mozilla.org @9.9.9.9
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 47945
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;wiki.mozilla.org.		IN	A

;; Query time: 455 msec
;; SERVER: 9.9.9.9#53(9.9.9.9)
;; WHEN: mer. avril 04 15:08:28  2018
;; MSG SIZE  rcvd: 45

[ludovic@poney ~]$ dig wiki.mozilla.org @8.8.8.8

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7_4.2 <<>> wiki.mozilla.org @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23114
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;wiki.mozilla.org.		IN	A

;; ANSWER SECTION:
wiki.mozilla.org.	59	IN	CNAME	www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org.
www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org.	299 IN CNAME wiki-prod-1394614349.us-west-2.elb.amazonaws.com.
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 59 IN	A 52.89.171.193
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 59 IN	A 34.213.203.225
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 59 IN	A 54.68.243.126

;; Query time: 46 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: mer. avril 04 15:08:41  2018
;; MSG SIZE  rcvd: 228
[ludo@Oulanl ~]$ dig wiki.mozilla.org @127.0.0.1

; <<>> DiG 9.11.3-RedHat-9.11.3-2.fc27 <<>> wiki.mozilla.org @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 53995
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;wiki.mozilla.org.		IN	A

;; Query time: 2195 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: mer. avril 04 21:59:44 CEST 2018
;; MSG SIZE  rcvd: 45

[ludo@Oulanl ~]$ dig wiki.mozilla.org @9.9.9.9

; <<>> DiG 9.11.3-RedHat-9.11.3-2.fc27 <<>> wiki.mozilla.org @9.9.9.9
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 24268
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;wiki.mozilla.org.		IN	A

;; Query time: 714 msec
;; SERVER: 9.9.9.9#53(9.9.9.9)
;; WHEN: mer. avril 04 22:00:07 CEST 2018
;; MSG SIZE  rcvd: 45

[ludo@Oulanl ~]$ dig wiki.mozilla.org @8.8.8.8

; <<>> DiG 9.11.3-RedHat-9.11.3-2.fc27 <<>> wiki.mozilla.org @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41154
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;wiki.mozilla.org.		IN	A

;; ANSWER SECTION:
wiki.mozilla.org.	59	IN	CNAME	www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org.
www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org.	189 IN CNAME wiki-prod-1394614349.us-west-2.elb.amazonaws.com.
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 59 IN	A 34.213.203.225
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 59 IN	A 52.89.171.193
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 59 IN	A 54.68.243.126

;; Query time: 41 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: mer. avril 04 22:00:15 CEST 2018
;; MSG SIZE  rcvd: 228

[ludo@Oulanl ~]$ dig wiki.mozilla.org @1.1.1.1

; <<>> DiG 9.11.3-RedHat-9.11.3-2.fc27 <<>> wiki.mozilla.org @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16482
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1536
;; QUESTION SECTION:
;wiki.mozilla.org.		IN	A

;; ANSWER SECTION:
wiki.mozilla.org.	41	IN	CNAME	www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org.
www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org.	281 IN CNAME wiki-prod-1394614349.us-west-2.elb.amazonaws.com.
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 41 IN	A 34.213.203.225
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 41 IN	A 52.89.171.193
wiki-prod-1394614349.us-west-2.elb.amazonaws.com. 41 IN	A 54.68.243.126

;; Query time: 77 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: mer. avril 04 22:00:22 CEST 2018
;; MSG SIZE  rcvd: 228

[ludo@Oulanl ~]$
[ludo@Oulanl ~]$ traceroute 9.9.9.9
traceroute to 9.9.9.9 (9.9.9.9), 30 hops max, 60 byte packets
 1  gateway (192.168.0.254)  2.996 ms  2.907 ms  2.862 ms
 2  gri82-1-78-212-88-254.fbx.proxad.net (78.212.88.254)  6.401 ms  6.596 ms  6.848 ms
 3  78.254.18.190 (78.254.18.190)  7.082 ms  7.307 ms  7.531 ms
 4  toulouse-6k-1-po3.intf.routers.proxad.net (212.27.57.141)  8.100 ms  8.765 ms  8.728 ms
 5  bzn-crs16-2-be1107.intf.routers.proxad.net (194.149.160.1)  19.646 ms  19.638 ms  19.598 ms
 6  londres-6k-1-po104.intf.routers.proxad.net (194.149.161.238)  24.498 ms  24.838 ms  25.210 ms
 7  195.66.225.238 (195.66.225.238)  25.203 ms  24.726 ms  24.902 ms
 8  dns.quad9.net (9.9.9.9)  23.368 ms !X  22.918 ms !X  23.219 ms !X
[ludo@Oulanl ~]$
avril 04 21:59:34 Oulanl unbound[25177]: [25177:0] info: start of service (unbound 1.6.8).
avril 04 21:59:43 Oulanl unbound[25177]: [25177:0] info: validation failure core.us-west-2.appsvcs-generic.nubis.allizom.org. A IN
avril 04 21:59:44 Oulanl unbound[25177]: [25177:0] info: validation failure wiki.mozilla.org. A IN
With logging enabled unbound barfs on :
avril 04 22:31:41 Oulanl unbound[27049]: [27049:0] info: start of service (unbound 1.6.8).
avril 04 22:32:10 Oulanl unbound[27049]: [27049:0] info: validation failure <appsvcs-generic.nubis.allizom.org. NS IN>: no NSEC3 closest encloser from 2600:1401:2::f0 for DS nubis.allizom.org. while building chain of trust
avril 04 22:32:10 Oulanl unbound[27049]: [27049:0] info: validation failure <core.us-west-2.appsvcs-generic.nubis.allizom.org. A IN>: no NSEC3 closest encloser from 184.85.248.65 for DS nubis.allizom.org. while building chain of trust
avril 04 22:32:10 Oulanl unbound[27049]: [27049:0] info: validation failure <wiki.mozilla.org. A IN>: key for validation nubis.allizom.org. is marked as invalid because of a previous validation failure <core.us-west-2.appsvcs-generic.nubis.allizom.org. A IN>: no NSEC3 closest encloser from 184.85.248.65 for DS nubis.allizom.org. while building chain of trust
It seems a little strange that the primary nameserver named in our SOA record resolves to an RFC1918 address and is not listed in an NS record. This should be "OK" as long as the nameservers can reach the SOA server at that address (unsure if that is the case).

$ dig @8.8.8.8 -t soa mozilla.org +short
infoblox1.private.mdc2.mozilla.com. sysadmins.mozilla.org. 2019040116 180 180 1209600 60

$ dig @8.8.8.8 -t ns mozilla.org +short
ns5-65.akam.net.
ns7-66.akam.net.
ns1-240.akam.net.
ns4-64.akam.net.

$ dig @8.8.8.8 infoblox1.private.mdc2.mozilla.com +short
10.50.75.120

Versus something like google.com

$ dig @8.8.8.8 -t soa google.com +short
ns1.google.com. dns-admin.google.com. 191622961 900 900 1800 60

$ dig @8.8.8.8 -t ns google.com +short
ns3.google.com.
ns4.google.com.
ns2.google.com.
ns1.google.com.

$ dig @8.8.8.8 ns1.google.com +short
216.239.32.10
(In reply to Ludovic Hirlimann [:Usul] from comment #4)
> With logging enabled unbound barfs on :
> avril 04 22:31:41 Oulanl unbound[27049]: [27049:0] info: start of service
> (unbound 1.6.8).
> avril 04 22:32:10 Oulanl unbound[27049]: [27049:0] info: validation failure
> <appsvcs-generic.nubis.allizom.org. NS IN>: no NSEC3 closest encloser from
> 2600:1401:2::f0 for DS nubis.allizom.org. while building chain of trust
> avril 04 22:32:10 Oulanl unbound[27049]: [27049:0] info: validation failure
> <core.us-west-2.appsvcs-generic.nubis.allizom.org. A IN>: no NSEC3 closest
> encloser from 184.85.248.65 for DS nubis.allizom.org. while building chain
> of trust
> avril 04 22:32:10 Oulanl unbound[27049]: [27049:0] info: validation failure
> <wiki.mozilla.org. A IN>: key for validation nubis.allizom.org. is marked as
> invalid because of a previous validation failure
> <core.us-west-2.appsvcs-generic.nubis.allizom.org. A IN>: no NSEC3 closest
> encloser from 184.85.248.65 for DS nubis.allizom.org. while building chain
> of trust

If I followed correctly, this is expected (as described by :digi). Our DS record passes a flag that lower delegations will not pass DNSSEC validation once we shift over to the amazonaws delegation [1].

[1] - https://tools.ietf.org/html/rfc5155#section-6
delegation is fine per https://zonemaster.net/test/39a314485b0424b8

Maybe we're hitting an unbound bug.
FWIW my log (after running "unbound-control verbosity 3" and "unbound-control flush_zone allizom.org") is http://paste.debian.net/hidden/efab4fde/
Running unbound 1.6.7.
Unsurprisingly "unbound-control insecure_add appsvcs-generic.nubis.allizom.org" makes things work.
$ dig +dnssec nubis.allizom.org ds @ns7-66.akam.net.

; <<>> DiG 9.11.3-1-Debian <<>> +dnssec nubis.allizom.org ds @ns7-66.akam.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56552
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;nubis.allizom.org.		IN	DS

;; AUTHORITY SECTION:
r5ui8gqs780ip552dgk1nl5u5tcsf3o3.allizom.org. 3600 IN NSEC3 1 0 1 6367D67C2F1FDC27 R5VS339K4QTGAVS5QN5RFDFFMU3FT0CP  CNAME RRSIG
r5ui8gqs780ip552dgk1nl5u5tcsf3o3.allizom.org. 3600 IN RRSIG NSEC3 7 3 3600 20180407125954 20180404115954 15617 allizom.org. YtuB1WpxXs0cthXwfqgNtYzvPs+tTQ9knoFRDbU1nMMWbTaE+CZfhLW+ tvH4RoxeMo9kcbE+4GnKqSJJ7Pfyiji1NSpWAsXTrrRnpfdZD9zcLLRA 8Xa4B2EnnBjb5WS1g+WEzqEpLDKrDli2UVLuDFPg+rfeZbTfDFUKjKG7 UAI=
allizom.org.		3600	IN	SOA	infoblox1.private.mdc2.mozilla.com. sysadmins.mozilla.org. 2018032724 180 180 1209600 3600
allizom.org.		3600	IN	RRSIG	SOA 7 2 3600 20180407125954 20180404115954 15617 allizom.org. UQ6bqng3XH49ngf5hxIWkoooeaKIhA7qR3o5o0+N18E7bNGAVdI8hc8b Uvpe84g5dqnoMev8VmwnretjXW2Vm69/f+HQPjJQFNCpAHfDyaRhtWUP 4mNuFAIHMnel4fpPqf2IU8vSGNGWLsXzvRAxfVEOLpT5zAEbpxFlMJvz tx8=

;; Query time: 14 msec
;; SERVER: 96.7.49.66#53(96.7.49.66)
;; WHEN: Thu Apr 05 11:24:53 CEST 2018
;; MSG SIZE  rcvd: 563

That NSEC3 record does not seem to have the opt-out flag set, unless I'm missing something?
Jd can you guys have a look, please? This breaks planet and wiki for anyone who's checking dnssec like France's 3 biggest ISP (free.fr).
Flags: needinfo?(jcrowe)
(In reply to Julien Cristau [:jcristau] from comment #9)
> That NSEC3 record does not seem to have the opt-out flag set, unless I'm
> missing something?

The opt-out flag can be observed with:

> $ dig +dnssec www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org @ns7-66.akam.net
> ;; AUTHORITY SECTION:
> appsvcs-generic.nubis.allizom.org. 3600	IN NS	ns-1743.awsdns-25.co.uk.
> appsvcs-generic.nubis.allizom.org. 3600	IN NS	ns-1109.awsdns-10.org.
> appsvcs-generic.nubis.allizom.org. 3600	IN NS	ns-795.awsdns-35.net.
> appsvcs-generic.nubis.allizom.org. 3600	IN NS	ns-426.awsdns-53.com.
> qjohc55d28rt0132v0vio8ikqd5s2n6o.allizom.org. 3600 IN NSEC3 1 0 1 B8B7DB93C1F45C6B QTJ5A1OVC94I6KTUE9VJEM2U8J7AD5DU NS
> qjohc55d28rt0132v0vio8ikqd5s2n6o.allizom.org. 3600 IN RRSIG NSEC3 7 3 3600 20180408125954 20180405115954 15617 allizom.org. F6DbE/S1gS2YECMjGB2+6+aNnoEE3BHVqtYgV6M+MdL0nX/7UQCqDbKc A9o8PdU1YQ/GF6pGFqtzyG5rlOWyHMA5typwIcqUmU4hvLYGxHowGfow wNUGRSWSeWeXkNZu3XahxbS4mk5dEEvjNUXUuhXXAPW/b6ZAQegp8LTj /O8=

This marks the first NS record that's returned from mozilla.org authorities - delegating the remainder of the query to route53. I think dnssec is a red herring here.

What's odd is a dig +trace of www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org shows an initial delegation of appsvcs-generic.nubis.allizom.org to route53, and a subsequent delegation of us-west-2.appsvcs-generic.nubis.allizom.org to a second set of route53 nameservers. Despite being unusual I don't think this is the cause.
Summary: wiki.mozilla.org doesn't resolve on quad9 → Issues resolving DNS records for wiki.mozilla.org
(In reply to Brian Hourigan [:digi] from comment #11)
> qjohc55d28rt0132v0vio8ikqd5s2n6o.allizom.org. 3600 IN NSEC3 1 0 1 B8B7DB93C1F45C6B QTJ5A1OVC94I6KTUE9VJEM2U8J7AD5DU NS

That also seems to have opt-out cleared?  Unless I'm misreading, nsec3 rdata is hash algorithm, flags, iterations, salt, next hashed owner name, and a list of rr types.
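
As a rough illustration of that field order, here is a small Python sketch (not from the thread) that splits NSEC3 presentation-format rdata and tests the opt-out bit. The names `Nsec3Rdata` and `parse_nsec3_rdata` are made up for this example; the opt-out bit being the least significant bit of the flags field is taken from RFC 5155.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Nsec3Rdata:
    hash_algorithm: int
    flags: int
    iterations: int
    salt: str               # hex string, empty when the record shows "-"
    next_hashed_owner: str  # base32hex label
    types: List[str]

    @property
    def opt_out(self) -> bool:
        # RFC 5155: opt-out is the least significant bit of the flags field.
        return bool(self.flags & 0x01)


def parse_nsec3_rdata(text: str) -> Nsec3Rdata:
    """Parse 'alg flags iterations salt next-hashed-owner type...' (hypothetical helper)."""
    fields = text.replace("(", " ").replace(")", " ").split()
    alg, flags, iterations, salt, next_owner = fields[:5]
    return Nsec3Rdata(
        hash_algorithm=int(alg),
        flags=int(flags),
        iterations=int(iterations),
        salt="" if salt == "-" else salt,
        next_hashed_owner=next_owner,
        types=fields[5:],
    )


# rdata copied from the NSEC3 record quoted above
record = parse_nsec3_rdata("1 0 1 B8B7DB93C1F45C6B QTJ5A1OVC94I6KTUE9VJEM2U8J7AD5DU NS")
print(record.flags, record.opt_out, record.types)
```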
Incidentally, a query for core.us-west-2.appsvcs-generic.nubis.allizom.org results in NXDOMAIN, which is wrong since that is non-terminal.  Which is likely to cause further issues with qname minimization.
from http://dnsviz.net/d/wiki.mozilla.org/dnssec/

    amazonaws.com to us-west-2.elb.amazonaws.com: No SOA RR was returned with the NODATA response. (205.251.192.27, 205.251.195.199, 2600:9000:5300:1b00::1, 2600:9000:5303:c700::1, UDP_0_EDNS0_32768_4096)
    amazonaws.com to us-west-2.elb.amazonaws.com: The Authoritative Answer (AA) flag was not set in the response. (205.251.192.27, 205.251.195.199, 2600:9000:5300:1b00::1, 2600:9000:5303:c700::1, UDP_0_EDNS0_32768_4096)
    appsvcs-generic.nubis.allizom.org to us-west-2.appsvcs-generic.nubis.allizom.org: No SOA RR was returned with the NODATA response. (205.251.193.170, 205.251.195.27, 205.251.196.85, 205.251.198.207, 2600:9000:5301:aa00::1, 2600:9000:5303:1b00::1, 2600:9000:5304:5500::1, 2600:9000:5306:cf00::1, UDP_0_EDNS0_32768_4096)
    appsvcs-generic.nubis.allizom.org to us-west-2.appsvcs-generic.nubis.allizom.org: The Authoritative Answer (AA) flag was not set in the response. (205.251.193.170, 205.251.195.27, 205.251.196.85, 205.251.198.207, 2600:9000:5301:aa00::1, 2600:9000:5303:1b00::1, 2600:9000:5304:5500::1, 2600:9000:5306:cf00::1, UDP_0_EDNS0_32768_4096)
    www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org/CNAME: A query for www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org results in a NOERROR response, while a query for its ancestor, core.us-west-2.appsvcs-generic.nubis.allizom.org, returns a name error (NXDOMAIN), which indicates that subdomains of core.us-west-2.appsvcs-generic.nubis.allizom.org, including www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org, don't exist. (205.251.192.60, 205.251.194.207, 205.251.197.154, 205.251.198.90, 2600:9000:5300:3c00::1, 2600:9000:5302:cf00::1, 2600:9000:5305:9a00::1, 2600:9000:5306:5a00::1, UDP_0_EDNS0_32768_4096)
Can we get wiki/planet CNAMEs pointed directly to the amazonaws.com names until the bogusness is sorted out on the allizom.org side?
(In reply to Julien Cristau [:jcristau] from comment #13)
> Incidentally, a query for core.us-west-2.appsvcs-generic.nubis.allizom.org
> results in NXDOMAIN, which is wrong since that is non-terminal.  Which is
> likely to cause further issues with qname minimization.

I'm with Julien.

I sent an email out-of-band to some folks last night with my final thoughts. If you scrutinize the dnsviz.net output, you can see how the .com zone properly has the opt-out bit set (just as a reference), but the NSEC3 record for nubis.allizom.org does not, and that is where my bind nameserver starts getting upset.

named[12300]: error (no valid RRSIG) resolving 'nubis.allizom.org/DS/IN'

$ dig com +noadditional +dnssec +multiline|grep NSEC3
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 893 IN RRSIG NSEC3 8 2 86400 (
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 893 IN NSEC3 1 1 0 - (
                                NS SOA RRSIG DNSKEY NSEC3PARAM )

$ dig nubis.allizom.org +noadditional +dnssec +multiline|grep NSEC3
tk97ggh9fv2362h7bvpqr44gb50v9a50.allizom.org. 3587 IN RRSIG NSEC3 7 3 3600 (
tk97ggh9fv2362h7bvpqr44gb50v9a50.allizom.org. 3587 IN NSEC3 1 0 1 FF54DABE98DC4522 (

(Clearing NI for :jd, nubis team has been consulted and there is no further info from them)
Flags: needinfo?(jcrowe)
Interesting, but nubis.allizom.org doesn't actually exist as a domain, in theory, it's the subdomains that will have NS records pointing to AWS / Route53

However, why is allizom.org reporting a private host in its SOA record ?

$> dig allizom.org  soa

; <<>> DiG 9.11.3-RedHat-9.11.3-2.fc27 <<>> allizom.org soa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37192
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;allizom.org.                   IN      SOA

;; ANSWER SECTION:
allizom.org.            3580    IN      SOA     infoblox1.private.mdc2.mozilla.com. sysadmins.mozilla.org. 2018032733 180 180 1209600 3600

$> host infoblox1.private.mdc2.mozilla.com
infoblox1.private.mdc2.mozilla.com has address 10.50.75.120
Just a nudge that this still exists, and users notice it.

[04-20 13:48:46] <andrew> fauweh is wikimo down?
[04-20 13:48:58] <andrew> https://irccloud.mozilla.com/pastebin/ze5WxlAJ/
I don't think this is an issue with either DNSSEC or Akamai.
Assignee: infra → nobody
Component: Infrastructure: Other → Infrastructure: AWS
I'm out of my depth, but if comment 16 and <https://lutz.donnerhacke.de/Blog/Outsourcing-mit-Hindernissen> are correct, it's an issue with how the allizom.org nameservers respond to queries for nubis.allizom.org.

"dig nubis.allizom.org" certainly fails with every validating resolver I've checked. (It should be a valid negative response, not an error.) And that doesn't involve Route 53 in any way.

I don't know how your DNS stack works, but it seems like a bug with how the signer, or Akamai's authoritative DNS software, or Mozilla's authoritative DNS software (Infoblox?) handles signed empty non-terminals.

If so, perhaps the zone just needs to be resigned, or the software needs to be fixed, or the software needs to be upgraded.

And maybe it would be possible to work around it by adding a classic "Do Not Remove" TXT record for nubis.allizom.org.
(In reply to Matt Nordhoff (aka Peng on IRC & forums) from comment #21)
> And maybe it would be possible to work around it by adding a classic "Do Not Remove" TXT record for nubis.allizom.org.
Please try that rather soon.^^
Your dns server problem sounds a bit like https://github.com/bluejekyll/trust-dns/issues/53 (but not exactly the same).
As of 20 minutes ago:

<http://dnsviz.net/d/nubis.allizom.org/WtqLog/dnssec/>

Queries for "nubis.allizom.org." returned an NSEC3 record with the hash for some other name. ("0ge0tg7g8mkoml9al6qcts98u20adkd7.allizom.org." instead of "up32ens15h11olfrlhq04p3b94rbsa88.allizom.org.".)

Since it was not proof of whether or not "nubis.allizom.org." existed, validators considered it bogus.

(A query for "appsvcs-generic.nubis.allizom.org." did seem to return its own correct NSEC3 record.)

(Opt-out is orthogonal to whether or not a zone has insecure delegations. A zone that doesn't use opt-out just needs to sign the referral properly.)

As of 5 minutes ago:

<http://dnsviz.net/d/nubis.allizom.org/WtqPIQ/dnssec/>

The zone has been resigned with a different salt and it's using the correct new hash ("3fg7kbvo3d99gfjalvnf3fls3tepqemr.allizom.org.").
(In reply to Matt Nordhoff (aka Peng on IRC & forums) from comment #23)
> Queries for "nubis.allizom.org." returned an NSEC3 record with the hash for
> some other name. ("0ge0tg7g8mkoml9al6qcts98u20adkd7.allizom.org." instead of
> "up32ens15h11olfrlhq04p3b94rbsa88.allizom.org.".)
> 
> Since it was not proof of whether or not "nubis.allizom.org." existed,
> validators considered it bogus.

Thank you, this was very helpful. We outsource zone signing and we're working to resolve this with them.
This is broken again for me now.
from #moc 
<jcristau> can't resolve planet or wiki again today...
http://dnsviz.net/d/wiki.mozilla.org/dnssec/
Down for me, too. (I'm so thankful that our PowerDNS just works out of the box.) P-256 or P-384 instead of RSA 1024 bit would be great. A CNAME to the unsecured *.amazonaws.com reduces your DNS authenticity to absurdity: Is there a way to directly set A/AAAA records? Thanks.
It's the same issue as before.

http://dnsviz.net/d/nubis.allizom.org/Wur4UA/dnssec/

$ dig @ns1-240.akam.net +dnssec +norecurse nubis.allizom.org

allizom.org.            3600    IN      SOA     infoblox1.private.mdc2.mozilla.com. sysadmins.mozilla.org. 2018032779 180 180 1209600 3600
allizom.org.            3600    IN      RRSIG   SOA 7 2 3600 20180505125954 20180502115954 42617 allizom.org. MM1PYs20A7cGAekOluAhD8I3H00sMPcmAlwMFMAwcNTHFGymSB3rdYwN vOkE6h1sInacER1v5iTI9Ysm3DcnNaffCFZIN1bTqN0ksKbvKOmSDRJW pvl7KUmXtEyBJAS4WW96itVw9KhMexQS2W0XrZt8cEnADOrh964K4kp6 QjQ=
ufbvu3fk9s21287d050gp8eohtue8g9b.allizom.org. 3600 IN NSEC3 1 0 1 1ECE85B1485ECC72 UIQPLPQM8DMCCMUMP782VMR3BGMOGVNL CNAME RRSIG
ufbvu3fk9s21287d050gp8eohtue8g9b.allizom.org. 3600 IN RRSIG NSEC3 7 3 3600 20180505125954 20180502115954 42617 allizom.org. BvZbg1MYoKaFi+uqkkFAqhyAuqBnY8DvBxVVMq4+YaxtiJ2ZO/nC4gxd fHAtdVVqwl1Bjb8ZyXoBaNWjlKb57G9r6LtVtmZMSIecneBDGNANeafi Zw/uYHI95Sjj+qBvUOXRNCGlOFocfntUa7JAT0PMCXPwYIhRTE+fEzd+ 1RE=

$ ldns-nsec3-hash -s 1ECE85B1485ECC72 -a 1 nubis.allizom.org.
j3t17gfviv9c9meh3jbg9norfg1g81vb.

j3t17gfviv9c9meh3jbg9norfg1g81vb != ufbvu3fk9s21287d050gp8eohtue8g9b
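
The same comparison can be sketched without ldns. Below is a minimal Python version of the RFC 5155 hash (SHA-1 over the lowercased wire-format name plus salt, re-hashed `iterations` more times, then base32hex). The function names are made up for this example, and the iteration count of 1 is an assumption read off the `1 0 1` fields of the NSEC3 record above.

```python
import base64
import hashlib

# Translate the standard base32 alphabet to base32hex (RFC 4648 "extended hex").
_B32_TO_B32HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def name_to_wire(name: str) -> bytes:
    """Lowercased DNS wire format: length-prefixed labels, then the root label."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    wire = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return wire + b"\x00"


def nsec3_hash(name: str, salt_hex: str, iterations: int) -> str:
    """NSEC3 hash for hash algorithm 1 (SHA-1), as a lowercase base32hex label."""
    salt = b"" if salt_hex in ("", "-") else bytes.fromhex(salt_hex)
    digest = hashlib.sha1(name_to_wire(name) + salt).digest()
    for _ in range(iterations):  # 'iterations' counts the additional rounds
        digest = hashlib.sha1(digest + salt).digest()
    return base64.b32encode(digest).translate(_B32_TO_B32HEX).decode("ascii").lower()


# Values from the dig output above; iterations=1 is an assumption (see lead-in).
returned_owner_label = "ufbvu3fk9s21287d050gp8eohtue8g9b"
computed = nsec3_hash("nubis.allizom.org.", "1ECE85B1485ECC72", 1)
print(computed, computed == returned_owner_label)
```

If the printed label does not equal the owner label of the NSEC3 record in the response, the record is not a match for nubis.allizom.org, and a validator cannot use it as the matching record for that name.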
Due to user reports from both :jcristau and :Peng_, and a recommendation from :gozer, I've changed the DNS public/private CNAME records for wiki.mozilla.org from www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org to www.wiki.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org as a temporary fix
Yikes, hit enter too fast: changed *to* wiki-prod-1394614349.us-west-2.elb.amazonaws.com
Same for Web of Things Gateway? http://iot.mozilla.org/gateway
Changed DNS on OPENNIC. Gateway works again. Might be unrelated.
Yes, it's the same thing for iot.mozilla.org. It has an equivalent DNS setup.

iot.mozilla.org.        18      IN      CNAME   www.haul.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org.
www.haul.prod.core.us-west-2.appsvcs-generic.nubis.allizom.org. 258 IN CNAME haul-web-prod-1392505985.us-west-2.elb.amazonaws.com.

(DNSSEC validation disabled for that query.)

Your new DNS resolver probably doesn't support DNSSEC.
This is apparently also affecting voice.mozilla.org.
Hi there, I'm working on voice.mozilla.org. I'm not seeing this issue personally, but do we know how many people are seeing this issue?

The reason I am asking is, we made a big announcement today, and are expecting an important influx of traffic.
https://blog.mozilla.org/blog/2018/06/07/parlez-vous-deutsch-rhagor-o-leisiau-i-common-voice/
Just note, voice.mozilla.org appears to be working now for the affected users thanks to :rtucker in bug 1467419. Big thanks!
A bug related to empty non-terminal records has been discovered, with help from users and the general public in identifying it, and it is currently being addressed. The permanent fix for this problem is scheduled to be released on June 25th. Workarounds should continue to be used until we can verify the fix.

Empty non-terminal records are "domain names that own no resource records, but have subdomains that do." For example, if a zone has a record a.b in the zone customer.com (full name a.b.customer.com), but doesn't have a b record (full name b.customer.com), an empty non-terminal record b.customer.com is created that returns NODATA/NOERROR. The empty non-terminal creation is necessary for dnssec conformance.
 
When a zone has dnssec, a question for an empty non-terminal should result in a NODATA/NOERROR response and the response should have a corresponding NSEC3 record. A validating resolver uses the NSEC3 record to authenticate the NODATA/NOERROR response. In some cases, the NSEC3 records returned for empty non-terminals were failing dnssec validation due to a bug in generating the NSEC3 records for empty non-terminals.
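
To make the empty non-terminal definition concrete, here is a small Python sketch (not from the vendor or the thread) that derives the empty non-terminals of a zone from its list of owner names. The function name `empty_non_terminals` is made up, and the allizom.org owner list is a simplified assumption based on the delegation discussed in this bug.

```python
def empty_non_terminals(zone, owner_names):
    """Names strictly between the zone apex and an owner name that own no records.

    Hypothetical helper; owner_names are the names that own records in the zone.
    """
    zone = zone.lower().rstrip(".")
    owners = {name.lower().rstrip(".") for name in owner_names}
    suffix = "." + zone
    found = set()
    for name in owners:
        if not name.endswith(suffix):
            continue  # the apex itself, or a name outside the zone
        parent = name.split(".", 1)[1]
        # Walk up towards the apex, collecting ancestors that own nothing.
        while parent != zone:
            if parent not in owners:
                found.add(parent)
            parent = parent.split(".", 1)[1]
    return found


# The example from the comment above.
print(empty_non_terminals("customer.com", ["customer.com", "a.b.customer.com"]))

# Simplified view of this bug: only the apex and the delegation own records.
print(empty_non_terminals(
    "allizom.org",
    ["allizom.org", "appsvcs-generic.nubis.allizom.org"],
))
```

Under that assumption nubis.allizom.org comes out as an empty non-terminal, which is also why the TXT record workaround helps: once the name owns a record, it is no longer an empty non-terminal.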
We have discussed and approved a change to add a TXT record under nubis.allizom.org to work around this bug. While larger sites have already received an alternate workaround, this change will extend a workaround to the remainder of Nubis service owners and users.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED