Closed Bug 1582413 Opened 5 years ago Closed 5 years ago

DoH resolution fails first attempt for some domains

Categories

(Core :: Networking: DNS, defect)

defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: ke5trel, Unassigned)

References

Details

(Whiteboard: [trr])

Attachments

(1 file)

72.02 KB, application/octet-stream
Attached file trr-failure.moz_log

STR:

  1. Create a new Nightly profile (Ubuntu 19.04) and set:
    network.trr.mode = 3
    network.trr.bootstrapAddress = 1.1.1.1
  2. Go to about:networking#dnslookuptool and resolve wikizilla.org.
  3. If it succeeds, wait a few hours and try again.

Expected:
IP address

Actual:
NS_ERROR_UNKNOWN_HOST

The first attempt fails and subsequent attempts work until some time has passed. Disabling the DNS cache (network.dnsCacheEntries = 0) makes no difference. I have been consistently seeing this with a few specific domains like wikizilla.org since DoH was added to Nightly.

This may sound similar to Bug 1540618, but the reporter there said the default mozilla.cloudflare-dns.com worked and did not mention it affecting only specific domains.

2019-09-19 09:28:17.310667 UTC - [Parent 12078: Main Thread]: D/nsHttp nsHttpChannel 0x7fa3ab170000 calling OnStopRequest
2019-09-19 09:28:17.310685 UTC - [Parent 12078: Main Thread]: D/nsHostResolver TRR:OnStopRequest 0x7fa3aaeba000 wikizilla.org 1 failed=0 code=0
2019-09-19 09:28:17.310707 UTC - [Parent 12078: Main Thread]: D/nsHostResolver doh decode wikizilla.org 138 bytes
2019-09-19 09:28:17.310725 UTC - [Parent 12078: Main Thread]: D/nsHostResolver TRR Decode wikizilla.org RCODE 2
2019-09-19 09:28:17.310741 UTC - [Parent 12078: Main Thread]: D/nsHostResolver TRR::On200Response DohDecode 80004005
2019-09-19 09:28:17.310758 UTC - [Parent 12078: Main Thread]: D/nsHostResolver TRR:OnStopRequest 0x7fa3aaeba000 status 0 mFailed 0
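
For anyone scanning a longer MOZ_LOG capture for the same symptom, here is a minimal Python sketch that pulls the host and RCODE out of the "TRR Decode" lines. The helper name `find_trr_rcodes` is hypothetical, and the pattern is taken only from the line format shown in the excerpt above, so it may not match logs from other Firefox versions.

```python
import re

# Matches lines of the form "... TRR Decode <host> RCODE <n>",
# as seen in the log excerpt above.
RCODE_LINE = re.compile(r"TRR Decode (\S+) RCODE (\d+)")


def find_trr_rcodes(log_text):
    """Return a list of (host, rcode) pairs, one per TRR decode line."""
    return [(m.group(1), int(m.group(2))) for m in RCODE_LINE.finditer(log_text)]


# Two lines copied from the excerpt above.
sample = (
    "2019-09-19 09:28:17.310707 UTC - [Parent 12078: Main Thread]: "
    "D/nsHostResolver doh decode wikizilla.org 138 bytes\n"
    "2019-09-19 09:28:17.310725 UTC - [Parent 12078: Main Thread]: "
    "D/nsHostResolver TRR Decode wikizilla.org RCODE 2\n"
)

print(find_trr_rcodes(sample))  # [('wikizilla.org', 2)]
```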

RCODE 2 means SERVFAIL, i.e. the DoH server itself reported that it could not process the query, so there's little we can do in the client code.
However, could you also try and see if you can still reproduce this using 104.16.248.249 as the bootstrap address?
Technically 1.1.1.1 is not the correct IP for mozilla.cloudflare-dns.com which might be causing Bug 1540618 too.
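
To illustrate where the RCODE in the log comes from: it is the low four bits of the flags field in the 12-byte DNS message header (RFC 1035, section 4.1.1). The following is a rough Python sketch of that extraction, not the Firefox TRR code. The function name `rcode_of` and the sample header bytes are made up for illustration.

```python
import struct

# RCODE values 0-5 as defined in RFC 1035, section 4.1.1.
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def rcode_of(message: bytes) -> int:
    """Return the RCODE from a raw DNS message (low 4 bits of the flags word)."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return flags & 0x000F


# Hypothetical response header: ID=0, QR=1, RD=1, RA=1, RCODE=2 (flags 0x8182),
# QDCOUNT=1, all other counts 0.
header = struct.pack("!HHHHHH", 0, 0x8182, 1, 0, 0, 0)

print(RCODE_NAMES[rcode_of(header)])  # SERVFAIL
```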

Flags: needinfo?(ke5trel)

It still happens with network.trr.bootstrapAddress = 104.16.248.249.

I tested non-DoH Cloudflare DNS and wikizilla.org takes unusually long to resolve the first time (5 seconds), indicating some server issue there, but importantly it does not fail like it does with DoH. With TRR-only the timeout is 30 seconds according to network.trr.request_timeout_mode_trronly_ms, so this delay should not be long enough to trigger a timeout.

Quad9 and Google DoH resolve this domain quickly for me without issues.
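
For comparing providers outside Firefox, the same A query can be expressed as an RFC 8484 GET URL: the wire-format DNS query is base64url-encoded without padding and passed in the `dns` parameter. Below is a small Python sketch that only builds the URL and does not send anything. The function names are hypothetical, and the `/dns-query` path on mozilla.cloudflare-dns.com is an assumption about the endpoint in use; substitute the endpoint of whichever provider is being tested.

```python
import base64
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal wire-format DNS query (QTYPE 1 = A, QCLASS 1 = IN)."""
    # Header: ID=0 (recommended by RFC 8484 for cache friendliness),
    # flags=0x0100 (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(endpoint: str, name: str) -> str:
    """Return the RFC 8484 GET URL for an A query on `name`."""
    encoded = base64.urlsafe_b64encode(build_query(name)).rstrip(b"=")
    return f"{endpoint}?dns={encoded.decode('ascii')}"


# Assumed endpoint; not confirmed by this bug report.
print(doh_get_url("https://mozilla.cloudflare-dns.com/dns-query", "wikizilla.org"))
```

The resulting URL can be fetched with any HTTP client that sends `Accept: application/dns-message`, and the RCODE of the response body can then be compared across providers.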

Flags: needinfo?(ke5trel)

Looks like there is nothing we can do on our side to make it better.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME

I reached out to Cloudflare about this bug and they found "an issue with another provider that's affecting our data centers". Cloudflare has contacted the other provider for a resolution.

Thanks for the follow-up. The first resolution is now working quickly and reliably.
