Closed Bug 1651682 Opened 5 years ago Closed 5 years ago

DoH heuristics sometimes don't run when network goes up

Categories

(Firefox :: Security, defect, P1)

defect

Tracking

()

VERIFIED FIXED
Firefox 80
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox78 --- unaffected
firefox79 + verified
firefox80 --- verified

People

(Reporter: nhnt11, Assigned: nhnt11)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

Found this thanks to @dcicas and @bmaris. Sometimes, when the network goes up, we don't re-run heuristics.

The reason is that we receive a signal that the network link is up, but captive portal state is unknown, so we don't run heuristics. Later, we never receive a captive portal state change event. This implies that the observer notification ipc:network:captive-portal-set-state which is used by the captive portal web extension API is unreliable. However, I verified that network:captive-portal-connectivity IS sent, so that one should work.

We should change the captive portal handler to use onConnectivityAvailable rather than onStateChanged, at least for now. Once bug 1603779 lands we should solve this network handling issue once and for all.

I'll attach a patch specifically for beta - this bug does not manifest the same way in Nightly and I'd rather solve it in central after bug 1603779 lands.

Found some more stuff:

  1. We have a typo in background.js[1], it should be onStateChanged with a d at the end. This is failing silently because it's wrapped in a try/catch that's trying to swallow the case when the API throws when captive portal service is disabled.
  2. Even so, it seems like ipc:network:captive-portal-set-state is unreliable - I can't detect it by adding an observer in the console. I'm not sure what's going on since there's a test for that[2]. Maybe in real-world circumstances we are hitting an error when ContentParent.cpp tries to respond to the notification[3] and then it stops propagating? Not sure... Basically I tried fixing the typo and it didn't help.
  3. We should just use onConnectivityAvailable, which seems to be working as expected.

[1] https://searchfox.org/mozilla-central/rev/85ae3b911d5fcabd38ef315725df32e25edef83b/browser/extensions/doh-rollout/background.js#369,476
[2] https://searchfox.org/mozilla-central/rev/85ae3b911d5fcabd38ef315725df32e25edef83b/toolkit/components/extensions/test/xpcshell/test_ext_captivePortal.js
[3] https://searchfox.org/mozilla-central/rev/8d55e18875b89cdf2a22a7cba60dc40999c18356/dom/ipc/ContentParent.cpp#3282-3293

Assignee: nobody → nhnt11
Status: NEW → ASSIGNED

[Tracking Requested - why for this release]:

The patch in bug 1630093 is likely causing us to never run heuristics in certain scenarios that occur with significant frequency. This fix is important to restore the integrity of telemetry data especially because there is an ongoing analysis on the ratio of TRR vs native DNS lookups in the wild.

Severity: -- → S3
Root Cause: --- → Coding: Semantic Error
Priority: -- → P1
Summary: Heuristics sometimes don't run when network goes up → DoH heuristics sometimes don't run when network goes up

I think that I remember that you have made a patch that runs heuristics on each network change event "network up" and on captive portal events. Have you uplifted that patch to beta?

(In reply to Dragana Damjanovic [:dragana] from comment #5)

I think that I remember that you have made a patch that runs heuristics on each network change event "network up" and on captive portal events. Have you uplifted that patch to beta?

Yes. I'm going to request uplift on that one as well as this one. Now that I think about it, there's no reason not to uplift them together. We can abandon the [Beta-only] patch on this bug.

Attachment #9162482 - Attachment is obsolete: true
Pushed by nhnt11@gmail.com: https://hg.mozilla.org/integration/autoland/rev/fe785e0f468d React to captive portal connectivity available instead of state change. r=dragana
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 80

Comment on attachment 9162485 [details]
Bug 1651682 - React to captive portal connectivity available instead of state change. r=dragana!

Beta/Release Uplift Approval Request

  • User impact if declined: We won't be running DoH heuristics when we should, and the the overall benefit of the DoH rollout is reduced.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Mostly this is a simplification, and I've tested it manually on beta as well as central. The worst unexpected negative impact of this would be that we don't run heuristics as often as we should, but that's already the case.
  • String changes made/needed:
Attachment #9162485 - Flags: approval-mozilla-beta?

Comment on attachment 9162485 [details]
Bug 1651682 - React to captive portal connectivity available instead of state change. r=dragana!

Approved for 79.0b7.

Attachment #9162485 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Hello,

I can confirm that this issue is fixed in Fx79.0b7 and in Fx 80.0a1 (buildID:20200714083249) on Win10 x64 ,mac OS 10.15 and Ubuntu 18.04.

Status: RESOLVED → VERIFIED
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: