Open Bug 1315968 (captive-portal-telemetry) Opened 8 years ago Updated 2 years ago

Investigate how to include information about captive portal state in telemetry data/error reports

Categories

(Firefox :: General, defect)

Tracking

Tracking Status
firefox52 --- fix-optional
firefox53 --- fix-optional
firefox54 --- affected

People

(Reporter: nhnt11, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxprivacy])

      No description provided.

Hi Romain,
Currently, we don't have any telemetry about the Captive Portal. If we're going to do it, we hope to add probes that can be truly helpful, that people would be interested in monitoring and investigating.

In a recent discussion, you asked, "Do we have telemetry on captive portal usage/failures? I'm curious about the retention impact it can have."
Could you please elaborate on how you see usage/failures relating to retention impact? Or could you help us identify the specific questions we want Captive Portal telemetry to answer?

Flags: needinfo?(rtestard)

(In reply to Ethan Tseng [:ethan] from comment #1)

Hi Romain,
Currently, we don't have any telemetry about the Captive Portal. If we're going to do it, we hope to add probes that can be truly helpful, that people would be interested in monitoring and investigating.

In a recent discussion, you asked, "Do we have telemetry on captive portal usage/failures? I'm curious about the retention impact it can have."
Could you please elaborate on how you see usage/failures relating to retention impact? Or could you help us identify the specific questions we want Captive Portal telemetry to answer?

I was seeing reports of captive portal issues and was trying to better understand the share of users impacted.
A failure to connect through a captive portal is a good reason for users to leave Firefox and reconnect with another browser, which likely leads to churn.
My ask around captive portal telemetry would be:

  • Opt-out telemetry probe on release (beta won't be representative for this)
  • Share of DAU/MAU attempting to use the captive portal feature / per OS, per Firefox version, per country
  • Among users attempting to use the captive portal feature, how many times do they attempt to use it daily? (frequency of attempts should be a good proxy for problem detection when trended over time)
  • Share of DAU/MAU failing to connect through captive portal / per OS, per Firefox version, per country
  • Retention impact of using the captive portal feature successfully and of failing to connect through captive portal (i.e. relative new and existing user retention of users using the captive portal feature vs. not)

I see that captive portal can be implemented several ways (ICMP redirect, DNS redirect, HTTP redirect), and it may also be useful to collect more technical data that helps segment connections into usage buckets to help identify issues when they come up. I'm unsure what would make sense in terms of useful technical indicators.
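As a rough illustration of the share metrics requested above, here is a minimal Python sketch of how an analysis might compute, per segment (OS in this example), the share of active clients that attempted a captive portal login and the share that failed. The function name `portal_shares`, the input shapes, and the `"success"`/`"failure"` outcome labels are all assumptions made for illustration; they do not correspond to any existing probe or dataset.

```python
from collections import defaultdict


def portal_shares(active_clients, portal_events):
    """Compute per-segment attempt and failure shares.

    active_clients: dict mapping client_id -> segment (e.g. OS name).
    portal_events: iterable of (client_id, outcome) pairs, where outcome
        is the hypothetical label "success" or "failure".
    """
    # Group all active clients by segment (the denominator).
    active = defaultdict(set)
    for client_id, segment in active_clients.items():
        active[segment].add(client_id)

    # Count distinct clients, not events, for each numerator.
    attempted = defaultdict(set)
    failed = defaultdict(set)
    for client_id, outcome in portal_events:
        segment = active_clients.get(client_id)
        if segment is None:
            continue  # event from a client outside the active population
        attempted[segment].add(client_id)
        if outcome == "failure":
            failed[segment].add(client_id)

    return {
        segment: {
            "attempt_share": len(attempted[segment]) / len(clients),
            "failure_share": len(failed[segment]) / len(clients),
        }
        for segment, clients in active.items()
    }
```

Counting distinct clients rather than raw events keeps the shares comparable to DAU/MAU; the per-client attempt frequency asked for above would need a separate per-client event count.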

Flags: needinfo?(rtestard)
Alias: captive-portal-telemetry

The UJET team would like to land some telemetry in 92 for a gradually rolled-out VPN infobar promotion, so that we can at least see what the current state of affairs looks like, whether the promotion changes anything, and whether (on the expected, quite unlikely, chance) rolling it out breaks something around the captive portal functionality.

What Romain proposes sounds like it could be pretty helpful. I've poked around the code some, and it doesn't appear to me that the event model exactly matches that proposal, but since we'd like to get this in for 92, it seems like we'd do pretty well to simply send out metrics centered around what the existing code does.

Here are some not-yet-complete notes of notifications/actions that could be telemetry events:

  • captive-portal-login (state: LOCKED_PORTAL)
    ** portal has been detected and login is required

  • captive portal infobar login button clicked (does this miss stuff like autologin, cert error page, others...; if so, is there an easy way to send it?)
    ** As I understand it, the button isn't always displayed; is that correct?

  • captive-portal-login-abort
    ** what does this represent? Just that the user pressed (X) on the notification bar, possibly because they used the OS-level dialog to login or possibly for some other reason?

  • captive-portal-login-success
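To make the notes above concrete, here is a small language-neutral sketch (written in Python rather than the browser's own JavaScript) of tallying those notifications as telemetry events. The three topic strings come from the list above; the `PortalEventRecorder` class, its `observe` method, and the event labels are hypothetical names used only for illustration, not existing Firefox APIs.

```python
from collections import Counter

# Topic names are taken from the notes above; the event labels on the
# right are illustrative placeholders, not real probe names.
TOPIC_TO_EVENT = {
    "captive-portal-login": "login_required",
    "captive-portal-login-abort": "login_abort",
    "captive-portal-login-success": "login_success",
}


class PortalEventRecorder:
    """Hypothetical recorder that counts captive portal notifications."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, topic):
        """Record one notification; return False for unknown topics."""
        event = TOPIC_TO_EVENT.get(topic)
        if event is None:
            return False
        self.counts[event] += 1
        return True
```

A recorder like this only mirrors what the existing notifications report; UI interactions such as the infobar login button would need their own hook.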

Valentin, I'd love...

  • your thoughts on collecting this set of data
  • any insight you could shed on my questions (or really anything else relevant :-).
  • would you accept a patch implementing this? If so, where would you prefer it be collected? browser-captivePortal.js?

The idea is that this would be an opt-out telemetry probe on release (beta won't be sufficiently representative). It would presumably be gated on the "cat 1/2 data collection" pref UI. Assuming we use event telemetry, and since these would seem to be part of FHR, can we easily correlate these with OS, Firefox version, country, and date for free on the analysis side? Or do we need to encapsulate some or all of that data in our events themselves? Andrei?

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(andrei.br92)

Also, Romain, I was thinking of doing this work here, but if you'd rather I put it in a different bug, that's fine too...

Flags: needinfo?(rtestard)

(In reply to Dan Mosedale (:dmose, :dmosedale) from comment #4)

Also, Romain, I was thinking of doing this work here, but if you'd rather I put it in a different bug, that's fine too...

This sounds totally fine.
One comment regarding the infobar : with MR1 we had intended to capture all infobar telemetry. https://bugzilla.mozilla.org/show_bug.cgi?id=1690390#c8 did not land but describes an approach for implementing telemetry on infobars which may be useful here.

Flags: needinfo?(rtestard)

(In reply to Dan Mosedale (:dmose, :dmosedale) from comment #3)

  • captive-portal-login-abort
    ** what does this represent? Just that the user pressed (X) on the notification bar, possibly because they used the OS-level dialog to login or possibly for some other reason?

No, this happens when the captive portal check gets aborted due to too many failed checks. Usually happens when there is no actual connectivity, for example if the user turns off wifi. Normally it doesn't tell us anything about the captive portal state except that there is no check being done anymore.
Clicking X doesn't really stop the captive portal checks. We've had this kind of problem with misbehaving captive portals (or ISPs that hijack traffic without being a CP - see bug 1717954). We should try to rectify this, as shipping the promotion in the CP notification bar might come across as an annoying persistent ad rather than as a bug or misbehaving network.

  • your thoughts on collecting this set of data

I think it's definitely worth collecting this data.
I can't answer the questions about the UI interactions, since I have very little knowledge of how that is implemented.
If we want to have event telemetry that records all of the CP checks, I think the best place to have it is in CaptivePortalService.
If we have current telemetry events that could be improved by knowing the captive portal state at the moment they're recorded we could add that as an additional field to the probe similar to this using nsICaptivePortalService

  • would you accept a patch implementing this? If so, where would you prefer it be collected? browser-captivePortal.js?

I assume telemetry recording interactions with the captive portal bar would go there, yes.
Otherwise recording the observer notifications could be done in the CaptivePortalService or in a separate module. I don't have a strong opinion on that.

Flags: needinfo?(valentin.gosu)

OK, so we've scaled back our ambitions slightly in the interest of getting things in this week, and I'm going to move the ongoing discussion over to bug 1722834. The idea is threefold:

  • only implement the exact subset of this stuff that will be most useful for our experiment
  • not pollute this bug, which contains useful discussion about captive portal telemetry in the large
  • our bugzilla triage tool only knows how to show us stuff in our bugzilla component :-)
Flags: needinfo?(andrei.br92)
See Also: → 1722834
No longer blocks: 1721730

Thanks for the quick reply!

(In reply to Valentin Gosu [:valentin] (he/him) from comment #6)

(In reply to Dan Mosedale (:dmose, :dmosedale) from comment #3)

  • captive-portal-login-abort

** what does this represent? Just that the user pressed (X) on the notification bar, possibly because they used the OS-level dialog to login or possibly for some other reason?

No, this happens when the captive portal check gets aborted due to too many failed checks. Usually happens when there is no actual connectivity, for example if the user turns off wifi. Normally it doesn't tell us anything about the captive portal state except that there is no check being done anymore. Clicking X doesn't really stop the captive portal checks. We've had this kind of problem with misbehaving captive portals (or ISPs that hijack traffic without being a CP - see bug 1717954).

OK, that's really helpful; thanks!

We should try to rectify this, as shipping the promotion in the CP notification bar might come across as an annoying persistent ad rather than as a bug or misbehaving network.

The intent is to show the promo notification bar only after either the first successful captive portal login, or maybe only after one successful login every N (e.g. 5) weeks.

Does that alleviate your concern?
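For clarity, the gating rule described above could look roughly like the following sketch. It is written in Python purely for illustration; `should_show_promo`, `last_shown`, and `min_weeks` are hypothetical names, and the assumption is that the check runs when a successful captive portal login is observed.

```python
from datetime import datetime, timedelta


def should_show_promo(last_shown, now, min_weeks=5):
    """Decide whether to show the promo after a successful portal login.

    last_shown: datetime of the previous promo display, or None if the
        promo has never been shown (first successful login).
    now: datetime of the current successful login.
    min_weeks: minimum gap between displays (the "N" above; 5 is only
        the example value mentioned in the discussion).
    """
    if last_shown is None:
        return True  # first successful login: always eligible
    return now - last_shown >= timedelta(weeks=min_weeks)
```

Because the check only runs on a successful login, a network that never lets the login complete would never trigger the promo under this rule.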

If we have current telemetry events that could be improved by knowing the captive portal state at the moment they're recorded we could add that as an additional field to the probe similar to this using nsICaptivePortalService

That's really interesting! I don't think we have any telemetry needs that really work like that at the moment, though.

I'm going to make my responses to your comments about where to put this stuff over in bug 1722834, as I think that given our slight strategy shift, they'll be more useful to have there.

(In reply to Dan Mosedale (:dmose, :dmosedale) from comment #8)

The intent is to show the promo notification bar only after either the first successful captive portal login, or maybe only after one successful login every N (e.g. 5) weeks.
Does that alleviate your concern?

I wasn't too concerned, but the more I read about this the more confident I am we can deliver it in a way that is non-intrusive and actually useful for the user. 👍

Severity: normal → S3