force retry after a TRR server was down intermittently (in mode 3 only)

Status: NEW
Type: defect, Priority: P2, Severity: normal
Opened 2 months ago, updated 2 months ago
People: Reporter: mozilla-bugzilla, Assigned: valentin.gosu, NeedInfo
Tracking: leave-open, 67 Branch
Firefox Tracking Flags: Not tracked
Whiteboard: [necko-triaged][trr][mode3]
Attachments: 1 attachment

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0

Steps to reproduce:

  1. Run your own DoH server and configure it in Firefox (e.g. https://dns.example.com/dns-query)
  2. Set network.trr.mode to 3
  3. Stop the DoH server, wait 5 seconds, then start it again
  4. Try to resolve an address (e.g. https://example.com)
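
For anyone trying to reproduce this, the prefs from the steps above could be written to a profile's user.js. The following is only a sketch: network.trr.uri is the pref that holds the DoH endpoint, the bootstrap address 192.0.2.1 is a documentation placeholder rather than the reporter's real server IP, and render_user_js is a hypothetical helper name.

```python
import json


def render_user_js(prefs):
    """Render a dict of pref name -> value as user.js lines."""
    lines = []
    for name, value in prefs.items():
        # json.dumps produces valid JS literals for str, int and bool values.
        lines.append(f"user_pref({json.dumps(name)}, {json.dumps(value)});")
    return "\n".join(lines)


# Assumed values: only the mode and the example URI come from the report.
prefs = {
    "network.trr.mode": 3,
    "network.trr.uri": "https://dns.example.com/dns-query",
    "network.trr.bootstrapAddress": "192.0.2.1",  # placeholder IP
}

if __name__ == "__main__":
    print(render_user_js(prefs))
```

The printed lines can be appended to user.js in a throwaway test profile.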

Actual results:

Firefox tells me it "has problems finding this page".

Hint: Thanks to the configured fallback address, I can open the REST interface of the DoH server and see the successfully resolved address (at https://dns.example.com/dns-query?name=example.com&type=A). But Firefox doesn't take that as a signal that the previously dead DoH server is healthy again.
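
The manual check described in the hint could be scripted roughly as below. This is a sketch under assumptions: it assumes the server's REST interface answers with the JSON layout used by common DoH JSON APIs (a "Status" code plus an "Answer" list), which the report does not confirm, and build_query_url and extract_addresses are hypothetical helper names. No request is sent here; the body would come from whatever HTTP client is at hand.

```python
import json
from urllib.parse import urlencode


def build_query_url(base, name, rtype="A"):
    """Build the REST lookup URL, e.g. .../dns-query?name=example.com&type=A."""
    return f"{base}?{urlencode({'name': name, 'type': rtype})}"


def extract_addresses(body):
    """Return the A-record addresses from a JSON reply body.

    Assumes a reply shaped like {"Status": 0, "Answer": [{"type": 1, "data": "..."}]}.
    """
    reply = json.loads(body)
    if reply.get("Status") != 0:  # non-zero means the lookup failed
        return []
    # Record type 1 is an A record; skip CNAMEs and other record types.
    return [a["data"] for a in reply.get("Answer", []) if a.get("type") == 1]
```

An empty list from extract_addresses would mean the server is reachable but did not return an address, which is a different failure from the connection itself being down.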

Expected results:

Firefox opens the page

Component: Untriaged → Networking: DNS
Product: Firefox → Core
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
Summary: TRR force retry in mode 3 (only) → force retry after a TRR server was down intermittently (in mode 3 only)
Whiteboard: [necko-triaged]
Assignee: nobody → valentin.gosu
Whiteboard: [necko-triaged] → [necko-triaged][trr][mode3]

Thanks for the report.
One question: did you also set the network.trr.bootstrapAddress pref to the IP of the DoH server?

Flags: needinfo?(mozilla-bugzilla)

This patch:

  • adds tests that we restart the TRR connection if it gets abnormally shut down
  • adds a way to terminate the TRR connection when attempting to resolve closeme.com
  • makes sure that resolving excluded domains with the DISABLE_TRR flag does
    not fail. Before this we would return an error code without checking the
    excluded domains first.
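
The restart behaviour the first bullet tests can be pictured with a toy model. This is not the patch or Firefox's code, only an illustration in Python: ReconnectingResolver, the connect factory, and the lookup method are all hypothetical names, and the single retry is an assumption made for the sketch.

```python
class ReconnectingResolver:
    """Toy model: reopen the connection once after an abnormal shutdown."""

    def __init__(self, connect):
        self._connect = connect  # hypothetical factory returning a connection
        self._conn = None

    def resolve(self, host):
        for _attempt in range(2):  # first try plus one retry
            try:
                if self._conn is None:
                    self._conn = self._connect()
                return self._conn.lookup(host)
            except ConnectionError:
                # Drop the dead connection so the next attempt opens a new one.
                self._conn = None
        raise ConnectionError(f"could not resolve {host}")
```

The point of the model is that a failed lookup discards the stale connection instead of leaving the resolver stuck on it, which is the state the report describes after the server comes back.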

Yes, network.trr.bootstrapAddress is set to the IPv4 address of the DoH server. Without it, mode 3 didn't work at all.

The reported bug appears when the DoH server was offline and requests to it therefore returned a 404 (because I run it in a container behind Traefik).

The interesting part, as mentioned above: thanks to the bootstrapAddress, as soon as the DoH server is back I'm able to resolve the DNS request using the HTTP API. But the TRR implementation in Firefox does not seem to pick up that the server is available again.

Flags: needinfo?(mozilla-bugzilla)

Thanks for the info. It may be that yours is a different issue than what I'm able to reproduce with the unit tests.

Could you gather some logs for me using these instructions?
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

You can either upload them here, or send them to my private email address.
Thanks!

Flags: needinfo?(mozilla-bugzilla)
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/84ef0513116f
Make sure we retry the TRR connection if it fails r=agrover
Keywords: leave-open