Closed Bug 1616982 Opened 5 years ago Closed 4 years ago

Consider moving DoH heuristics into necko

Categories

(Core :: Networking: DNS, task, P3)

task

Tracking

()

RESOLVED INVALID

People

(Reporter: grover, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

The functionality in the doh-rollout addon should be integrated into the platform, allowing the addon to eventually be removed. Opening this BZ now, in anticipation of this, and so we can talk about if/when this should happen.

  1. Changing TRR behavior by runtime modification of trr.mode is a hack. trr.mode (or its successors) should be only changed from about:preferences (or about:config) by the USER (or a policy), and then the platform can contain logic to use DoH or not, based on that pref and other considerations. Toggling trr.mode from the addon is only needed because of its lack of integration into the platform. This toggling is visible to users and potentially confusing.
  2. Integration of heuristics into the platform will allow finer-grained control over TRR usage. Platform already has ability to use TRR per-request or per-webshell, but heuristics in addon cannot use.
  3. There's not just ONE doh rollout, it is a likely a series of rollouts to different geos, with some geos maybe never getting it. Long-lived code should be in platform. Heuristics, while maybe not around forever, are also going to be around for years.

Do we have any architectural ideas for this already in mind? Will the TRR service run heuristics and maintain state separate from the mode pref?

Bug 1603779 is indirectly related. Depending on timelines, this bug might compete with that, in which case that bug can be duped or otherwise closed. However, if the platform-ification happens much later, it's still worth converting the add-on to a JSM.

This bug is something I'd like to try working on (with help) if we have some breathing room (I wouldn't be the fastest to get this done).

See Also: → 1603779

(In reply to Nihanth Subramanya [:nhnt11] from comment #1)

Do we have any architectural ideas for this already in mind? Will the TRR service run heuristics and maintain state separate from the mode pref?

Bug 1603779 is indirectly related. Depending on timelines, this bug might compete with that, in which case that bug can be duped or otherwise closed. However, if the platform-ification happens much later, it's still worth converting the add-on to a JSM.

This bug is something I'd like to try working on (with help) if we have some breathing room (I wouldn't be the fastest to get this done).

I can write up how this should look like, an architectural design, next week.

IMO, this requires reworking network.trr.mode pref values, and UI changes are likely to follow. I think we should not start implementation without discussion with Product and UX.

It's definitely something we should do, though, so let's have that discussion!

(In reply to Nhi Nguyen (:nhi) from comment #3)

IMO, this requires reworking network.trr.mode pref values, and UI changes are likely to follow. I think we should not start implementation without discussion with Product and UX.

It's definitely something we should do, though, so let's have that discussion!

I think we can make it work in the same way addon is working without any UI change. Give me a day or 2 to write something up next week.

Design:

  • We should have a service that is performing heuristics. (It does not need to have a xpcom interface). I will call it TRRHeuristicsService in the following text, but we should find a better name.
  • Add a new pref to disable/enable heuristics: network.trr.run_heuristics
  • TRRService already observer network change events and captive portal events and should trigger heuristic check. Heursitics check will be run only if TRRServcie is enabled (mode2 or mode3) and the network.trr.run_heuristics pref is set. (We should move this into something like mainTRRSerivice when we implement running multiple TRR services at the same time, but we are not there yet.)
  • TRRHeuristicsService will be only used by TRRService, but it may be used on different threads.
  • Add check for if TRR should be disabled due to heuristics here: https://searchfox.org/mozilla-central/rev/4d9cd186767978a99dafe77eb536a9525980e118/netwerk/dns/nsHostResolver.cpp#1408
    (I think we need refactoring all TRR checks there are spread on too many places, but that is really a different bug)
  • Use nsIDNSService asyncResolveNative and cancleAsyncResolveNative (here native means no js, not native resolver).

Let me know if you need help.

Summary: Decomission doh-rollout addon → Decommission doh-rollout addon
Blocks: doh-rollout
No longer depends on: doh-rollout

Retitling this bug since the add-on no longer exists.

To be honest at this point I'm not sure how I feel about moving the heuristics code to necko.

(In reply to Andy Grover [:grover] from comment #0)

The functionality in the doh-rollout addon should be integrated into the platform, allowing the addon to eventually be removed. Opening this BZ now, in anticipation of this, and so we can talk about if/when this should happen.

  1. Changing TRR behavior by runtime modification of trr.mode is a hack. trr.mode (or its successors) should be only changed from about:preferences (or about:config) by the USER (or a policy), and then the platform can contain logic to use DoH or not, based on that pref and other considerations. Toggling trr.mode from the addon is only needed because of its lack of integration into the platform. This toggling is visible to users and potentially confusing.

This has been fixed by the introduction of doh-rollout.mode and doh-rollout.uri. Further improvement is possible here, e.g. moving to something like the SetDetectedTrrURI API.

  1. Integration of heuristics into the platform will allow finer-grained control over TRR usage. Platform already has ability to use TRR per-request or per-webshell, but heuristics in addon cannot use.

Not sure I fully understand this. Frontend code is able to set TRR flags on individual requests IIUC - it even does this already for captive portal login page tabs.

  1. There's not just ONE doh rollout, it is a likely a series of rollouts to different geos, with some geos maybe never getting it. Long-lived code should be in platform. Heuristics, while maybe not around forever, are also going to be around for years.

I think this point was more about long-lived code not living as an add-on, but I think the new home as a frontend component is completely sustainable.

Summary: Decommission doh-rollout addon → Move DoH heuristics into necko
Summary: Move DoH heuristics into necko → Consider moving DoH heuristics into necko

Adding a bit to what I said in comment 6, I'm not sure there's any clear benefit at all of moving the heuristics into necko, whereas the cost is very significant - we have a ton of logic + test coverage now e.g. migrations, TRR selection, responding to network changes, etc.

Ok, I'm going to go ahead and close this, then. Thanks Nihanth.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.