Closed Bug 1558495 Opened 6 years ago Closed 6 years ago

force retry after a TRR server was down intermittently (in mode 3 only)

Categories

(Core :: Networking: DNS, defect, P2)

67 Branch
defect

RESOLVED FIXED
mozilla69

People

(Reporter: mozilla-bugzilla, Assigned: valentin, NeedInfo)

References

Details

(Whiteboard: [necko-triaged][trr][mode3])

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0

Steps to reproduce:

  1. Set up your own DoH server and configure it in Firefox (e.g. https://dns.example.com/dns-query)
  2. Set network.trr.mode to 3
  3. Stop the DoH server, wait 5 seconds, start it again
  4. Try to resolve an address (e.g. https://example.com)
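
For reference, the configuration in these steps could be written out as user.js lines. The following is a minimal sketch, not part of the original report: the helper name render_user_prefs is made up for illustration, network.trr.uri is assumed to be the pref holding the DoH endpoint, and the bootstrap IP is a placeholder (the bootstrapAddress pref comes up later in this thread).

```python
import json


def render_user_prefs(prefs):
    """Render a dict of pref name -> value as user.js lines (hypothetical helper)."""
    lines = []
    for name, value in prefs.items():
        # json.dumps yields JavaScript-compatible literals for str, int and bool.
        lines.append(f"user_pref({json.dumps(name)}, {json.dumps(value)});")
    return "\n".join(lines)


# Assumed values mirroring the steps above; the IP address is a placeholder.
prefs = {
    "network.trr.mode": 3,
    "network.trr.uri": "https://dns.example.com/dns-query",
    "network.trr.bootstrapAddress": "192.0.2.53",
}

print(render_user_prefs(prefs))
```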

Actual results:

Firefox tells me it "has problems finding this page".

Hint: Due to the configured fallback address, I can open the REST interface of the DoH server and see the successfully resolved address (on https://dns.example.com/dns-query?name=example.com&type=A). But Firefox doesn't take that as a signal to consider the previously dead DoH server as healthy again.
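
The manual check described in the hint can be sketched roughly as follows. This is an illustration only: it builds the query URL shown above and parses a reply, assuming the server answers in the common DoH JSON style (an "Answer" list whose entries carry a "data" field). The function names are hypothetical, and no network request is made here.

```python
import json
from urllib.parse import urlencode


def build_query_url(base, name, rtype="A"):
    """Build a JSON-API style query URL such as the one in the hint."""
    return f"{base}?{urlencode({'name': name, 'type': rtype})}"


def extract_addresses(body):
    """Return the 'data' values from the 'Answer' list of a JSON reply.

    Assumption: the server uses the widespread DoH JSON layout.
    """
    reply = json.loads(body)
    return [answer["data"] for answer in reply.get("Answer", []) if "data" in answer]


url = build_query_url("https://dns.example.com/dns-query", "example.com")
print(url)
```

A non-empty list from extract_addresses would correspond to the "successfully resolved address" the reporter sees in the browser.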

Expected results:

Firefox opens the page

Component: Untriaged → Networking: DNS
Product: Firefox → Core
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
Summary: TRR force retry in mode 3 (only) → force retry after a TRR server was down intermittently (in mode 3 only)
Whiteboard: [necko-triaged]
Assignee: nobody → valentin.gosu
Whiteboard: [necko-triaged] → [necko-triaged][trr][mode3]

Thanks for the report.
One question, did you also set the network.trr.bootstrapAddress pref to the IP of the DoH server?

Flags: needinfo?(mozilla-bugzilla)

This patch adds:

  • tests that we restart the TRR connection if it gets abnormally shut down
  • a way to terminate the TRR connection when attempting to resolve closeme.com
  • makes sure that resolving excluded domains with the DISABLE_TRR flag does
    not fail. Before this we would return an error code without checking the
    excluded domains first.
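
The behaviour described in the list above can be sketched as follows. This is not the Firefox implementation (which lives in C++ inside necko); it is a simplified Python model under stated assumptions. The class, its methods and the connect/native callables are all hypothetical. It shows two ideas from the patch description: a dead connection is dropped and re-established once before giving up, and excluded domains are checked before the DISABLE_TRR case returns an error.

```python
class TrrClientSketch:
    """Toy model of a TRR-only (mode 3) resolver; all names are hypothetical."""

    def __init__(self, connect, native, excluded_domains=()):
        self._connect = connect    # returns a callable: conn(host) -> address
        self._native = native      # callable: native(host) -> address
        self._excluded = set(excluded_domains)
        self._conn = None

    def is_excluded(self, host):
        # A host is excluded if it, or any parent domain, is on the list.
        parts = host.split(".")
        return any(".".join(parts[i:]) in self._excluded for i in range(len(parts)))

    def resolve(self, host, disable_trr=False):
        # Check excluded domains first, so they reach the native resolver
        # instead of failing when TRR is disabled for the request.
        if self.is_excluded(host):
            return self._native(host)
        if disable_trr:
            raise RuntimeError("TRR disabled for this request in TRR-only mode")
        for _attempt in range(2):
            try:
                if self._conn is None:
                    self._conn = self._connect()
                return self._conn(host)
            except ConnectionError:
                # Drop the dead connection; the next pass reconnects.
                self._conn = None
        raise ConnectionError("TRR connection failed after one retry")
```

The single retry is a design choice of this sketch; the real patch may use a different policy.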

Yes, network.trr.bootstrapAddress is set to the IPv4 address of the DoH server. Without it, mode 3 didn't work at all.

The reported bug appears when the DoH server has been offline and its endpoint therefore returned a 404 (because I run it in a container behind Traefik).

The interesting part, as mentioned above: thanks to the bootstrapAddress, as soon as the DoH server is back, I'm able to resolve DNS requests using the HTTP API. But Firefox's TRR implementation does not seem to pick up that the server is available again.

Flags: needinfo?(mozilla-bugzilla)

Thanks for the info. It may be that yours is a different issue than what I'm able to reproduce with the unit tests.

Could you gather some logs for me using these instructions?
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

You can either upload them here, or send them to my private email address.
Thanks!

Flags: needinfo?(mozilla-bugzilla)
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/84ef0513116f Make sure we retry the TRR connection if it fails r=agrover
Keywords: leave-open

(In reply to Pulsebot from comment #5)

Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/84ef0513116f
Make sure we retry the TRR connection if it fails r=agrover

This patch, which landed in 69, seems to have fixed bug 1556194.

See Also: → 1556194

Closing this. Landed in Firefox 69.
If anyone can reproduce the situation in comment 3 and gather logs, please file a new bug.

Status: NEW → RESOLVED
Closed: 6 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla69