Closed Bug 1472662 Opened 6 years ago Closed 6 years ago

TRR: DNS_TRR_NS_VERFIFIED is wrong

Categories

(Core :: Networking: DNS, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: bagder, Assigned: bagder)

Details

(Whiteboard: [necko-triaged][trr])

Attachments

(1 file)

current TMO results show DNS_TRR_NS_VERFIFIED evaluate False (failure) on 99.46% on nightly-63, while it is expected to show a majority to be True (successful).

The wrong measurement has also been seen in live nightlies for which TRR works. This is data that doesn't reflect the functionality!
The NS verification is done once - at startup. Most startups are session restores when LOTS of name resolves are being issued almost at once. 

Current theory: the confirmation state variable is accessed and updated from several threads. The TRRService::CompleteLookup() function first assigns the state variable and then subsequently reads it again within the same method. There's then a risk that the variable contents gets modified by another thread between the assignment and the reading when it updates the DNS_TRR_NS_VERFIFIED boolean, a race condition. But is the risk really *that* high to almost always trigger this?

By avoiding the re-reading of the variable we will avoid the race condition.
my laptop does not have an entry for DNS_TRR_NS_VERIFIED at all in about:telemetry.. and yet its obviously using TRR (mode 2) based on both about:networking and other telemetry. How does this explain that?
What threads do you feel can write mConfirmationState ?

afaict (mt is main thread):

enabled() <- isTrrBlackListed mt
readprefs() mt
observe() mt
maybeconfirm() <- {enabled-mt, observe-mt, notify-mt)
notify() mt
* completelookup() mt

the telem is all handled in completelookup()
Comment on attachment 8989149 [details]
bug 1472662 - only send DNS_TRR_NS_VERIFIED telemetry once

https://reviewboard.mozilla.org/r/254220/#review261036
Attachment #8989149 - Flags: review?(mcmanus) → review-
The reason for this crazy number in TMO is that the "captive portal clear" message keeps getting received during a session, and if TRR failed to verify the NS domain, it will try again on every subsequent CP clear message. In a long-going session, that can sum up to a substantial amount of (failed) retries and for every such failed retry it will send a Telemetry::DNS_TRR_NS_VERIFIED false...

A simple fix is to just report success/failure once. Maybe send another report if the TRR URI is updated?
Comment on attachment 8989149 [details]
bug 1472662 - only send DNS_TRR_NS_VERIFIED telemetry once

https://reviewboard.mozilla.org/r/254220/#review263540

I think this still just papers over the telemtry.

in the case of captivepassed -> captivepassed where we are not in CONFIRM_OK, this just keeps trying to confirm the NS but doesn't report the telemetry, right?

the bug is that we keep trying to confirm the NS when there hasn't been any change in the cap-port state. The telemetry is showing us the bug.
Attachment #8989149 - Flags: review?(mcmanus) → review-
Comment on attachment 8989149 [details]
bug 1472662 - only send DNS_TRR_NS_VERIFIED telemetry once

https://reviewboard.mozilla.org/r/254220/#review263548
Attachment #8989149 - Flags: review?(mcmanus) → review+
Pushed by daniel@haxx.se:
https://hg.mozilla.org/integration/autoland/rev/f73236dac9ec
only send DNS_TRR_NS_VERIFIED telemetry once r=mcmanus
https://hg.mozilla.org/mozilla-central/rev/f73236dac9ec
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: