Bug 1630384 (Open). Opened 5 years ago, updated 4 years ago.

Add telemetry for users that see "FirstStartup"

Categories

(Firefox :: Messaging System, enhancement, P3)



Tracking Status
firefox76 --- wontfix

People

(Reporter: k88hudson, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: nimbus)

In order to verify whether enrollment actually happened during the first-startup phase, we'd like to include that information in the enrollment ping. We could do this by passing a value into the onRecipe function (or _run in Normandy actions).
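A minimal sketch of the idea of passing a value into the enrollment handler, assuming a hypothetical `on_recipe` function and `EnrollmentEvent` record (neither is the real Messaging System or Normandy API, and the `firstStartup` extra key is also an assumption). The caller decides whether this is the first startup; the handler only copies that flag onto the enrollment record so it ends up in the ping.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EnrollmentEvent:
    """Hypothetical enrollment record; not the real Normandy/Nimbus ping schema."""
    experiment_slug: str
    branch: str
    extra: Dict[str, str] = field(default_factory=dict)


def on_recipe(recipe: Dict[str, str], *, is_first_startup: bool,
              sink: List[EnrollmentEvent]) -> EnrollmentEvent:
    """Enroll into `recipe` and tag the enrollment with the first-startup flag.

    The caller (e.g. the FirstStartup code path) decides `is_first_startup`;
    this function only copies it into the event. The value is stored as a
    string on the assumption that extra fields are string-valued.
    """
    event = EnrollmentEvent(
        experiment_slug=recipe["slug"],
        branch=recipe["branch"],
        extra={"firstStartup": "true" if is_first_startup else "false"},
    )
    sink.append(event)  # stand-in for actually submitting the enrollment ping
    return event
```

Threading the flag in as an explicit keyword argument keeps the handler itself unaware of how first startup is detected.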

Iteration: --- → 77.2 - Apr 20 - May 3
Priority: -- → P1
Priority: P1 → P2
Iteration: 77.2 - Apr 20 - May 3 → 78.1 - May 4 - May 17
Iteration: 78.1 - May 4 - May 17 → 78.2 - May 18 - May 31
Summary: Add telemetry about whether enrollment happened during the "FirstStartup" → Add telemetry for users that "FirstStartup"
Summary: Add telemetry for users that "FirstStartup" → Add telemetry for users that see "FirstStartup"
Assignee: nobody → najiang
Iteration: 78.2 - May 18 - May 31 → 79.1 - June 1 - June 14
Severity: -- → S3
Priority: P2 → P1

Looks like we already have some probes in FirstStartup.jsm.

They include the statusCode and the elapsed duration of the first startup. I think those are sufficient for us to understand how many users see the first startup.

Note that those probes will retire in Firefox 80.

Nan, I don't think the existing probes are sufficient for what we need, since they're only reported in main pings.

We're going to need this in new-profile pings (https://docs.telemetry.mozilla.org/concepts/choosing_a_dataset.html?highlight=new,profile#new-profile-ping), and it would also be useful in experiment enrollment pings.

The experiments team will need this specifically because we need to know how many new profiles go through firstStartup; that number is the baseline population we use to determine whether enrollment is correct.

It will also be useful for general analysis.

Unless implementing this has a significant cost, we should definitely add it to new-profile pings (and possibly enrollment pings).
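To make the "baseline population" argument concrete, here is a rough sketch of the check the experiments team describes, assuming a hypothetical boolean `first_startup` field in new-profile pings (the field this bug asks for; it does not exist in the pings discussed here) and a known enrollment sample rate. Expected enrollments are the FirstStartup baseline times the sample rate, and the observed count is compared against that.

```python
from typing import Iterable, Mapping


def first_startup_baseline(new_profile_pings: Iterable[Mapping[str, object]]) -> int:
    """Count new profiles whose new-profile ping says they went through FirstStartup.

    `first_startup` is a hypothetical field; missing or False values are not counted.
    """
    return sum(1 for ping in new_profile_pings if ping.get("first_startup") is True)


def enrollment_ratio(observed_enrollments: int, baseline: int, sample_rate: float) -> float:
    """Observed enrollments divided by the expected count (baseline * sample_rate).

    A ratio near 1.0 suggests enrollment reaches the targeted population; a ratio
    well below 1.0 suggests eligible FirstStartup profiles are being missed.
    """
    expected = baseline * sample_rate
    if expected == 0:
        raise ValueError("no eligible FirstStartup profiles in the baseline")
    return observed_enrollments / expected
```

Without the field in new-profile pings, `baseline` cannot be computed directly, which is the gap the comment above points out.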

Sure, we can still add probes for this for better accuracy than the existing ones.


FYI, we are landing bug 1642455 to improve enrollment for the first-run experiments. In particular, it allows us to enroll users precisely during the first startup rather than relying on indirect filters such as profile age and pref values.

Blocks: 1641286

Reset the iteration and priority based on the triage meeting. We will revisit this feature for the Firefox 80 iteration.

Assignee: najiang → nobody
Iteration: 79.1 - June 1 - June 14 → 80.1 - June 29 - July 12
Priority: P1 → P2
Iteration: 80.1 - June 29 - July 12 → 80.2 - July 13 - July 26

Given that our targeting improvements worked well, let's deprioritize this.

Iteration: 80.2 - July 13 - July 26 → ---
Priority: P2 → P3
Component: Messaging System → Nimbus Desktop Client
Component: Nimbus Desktop Client → Messaging System
Keywords: nimbus

FirstStartup targeting is stable with the fix from https://bugzilla.mozilla.org/show_bug.cgi?id=1642455. NI Nan to help triage this and close it if the telemetry is no longer needed.

Flags: needinfo?(najiang)

Was there any validation that FirstStartup targeting is working as expected? I imagine this is where this telemetry would be useful.


Yes, we validated the FirstStartup targeting in the multi-stage v1/v2/v3 experiments, and the results showed that user enrollments across branches matched the first-run impressions with the associated screen ID in the experiment.

Multi-stage v1: https://sql.telemetry.mozilla.org/dashboard/messaging-system-experiment-multi-stage-about-welcome-in-firefox-78
Multi-stage v2: https://sql.telemetry.mozilla.org/dashboard/messaging-system-experiment-multi-stage-about-welcome-v2-in-firefox-79
Multi-stage v3: https://analytics.amplitude.com/mozilla-corp/notebook/71nd852

Now that this targeting has proven effective, I'd recommend closing this for now.

Flags: needinfo?(najiang)

Nan, I'm not sure I'm following the validation logic here. I'm interpreting what you're saying to mean: we enrolled people into some experiments, and they saw the treatment about:welcome screen they were supposed to.

But not all users enrolled in the experiment see the correct about:welcome message, or any about:welcome message at all. Currently, I believe a non-negligible share of new-user experiment populations (~10%) still experience this.

The point of this telemetry is to give us clarity on why they're not seeing it: is it because they're not enrolled early enough? Is it a race condition? Is it something else?

So I don't think we've actually validated the FirstStartup targeting is working as intended (which this telemetry would accomplish).


Right, we just found that around 1-3% of treatment users ended up getting the default first-run experience in all the first-run experiments (multi-stage v1/v2/v3). Note that users in the control group never got an incorrect experience, so we don't believe this was caused by incorrect enrollment during the first startup.


Given all the observations so far, we're inclined to believe that user enrollment (including the FirstStartup targeting) is working as expected. The issue(s) are more likely to occur after experiment enrollment, specifically when Firefox tries to render about:welcome based on the experiment data. We suspect that for some users (e.g. slow clients), the experiment data (which requires disk I/O) may not be available yet, so about:welcome has to fall back to the default first-run experience, which would explain the issue above.

Punam and I discussed the following possible solutions:

  • Record a telemetry event when detecting the experiment data is unavailable and the fallback is happening
  • For first-run experiment users, we could unenroll them if the expected experience can't be fulfilled during first-run
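A rough sketch of the first option only (recording an event when the fallback happens), assuming a hypothetical `load_experiment_data` callable that returns None while the experiment data is still unavailable, and a hypothetical `TelemetryRecorder` standing in for a real telemetry event API. The category and method names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

DEFAULT_CONTENT = "default-first-run"


@dataclass
class TelemetryRecorder:
    """Hypothetical stand-in for a telemetry event API; it just collects events."""
    events: List[Tuple[str, str, dict]] = field(default_factory=list)

    def record(self, category: str, method: str, extra: dict) -> None:
        self.events.append((category, method, extra))


def choose_welcome_content(
    enrolled_slug: Optional[str],
    load_experiment_data: Callable[[], Optional[str]],
    telemetry: TelemetryRecorder,
) -> str:
    """Pick about:welcome content, recording an event when an enrolled user falls back."""
    if enrolled_slug is None:
        return DEFAULT_CONTENT  # not enrolled: the default experience is expected
    content = load_experiment_data()  # None models data not yet read from disk
    if content is None:
        # Enrolled but no experiment data: record the fallback, then show the default.
        telemetry.record("aboutwelcome", "fallback", {"experiment": enrolled_slug})
        return DEFAULT_CONTENT
    return content
```

Only enrolled users who fall back produce an event, so the event count can be compared directly against the treatment-branch enrollment count.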

What do you think, Su?

A new telemetry probe called activation will be added to Nimbus in bug 1675104. I believe that should allow us to understand what exactly happens to the experiment users.

See Also: → 1675104

Hey Nan,

That's interesting about the I/O error. Having more telemetry to detect those experiment data events will be super helpful; I think that's a great idea.

I don't think we should unenroll users if the expected experience can't be fulfilled, though, because:

  • one, we'd be introducing sampling bias at that point, and the experiment results would no longer be truly "random" (this breaks the experiment)
  • two, if we have a probe that tells us a user was enrolled in the experiment but didn't get the correct experience, we could potentially correct for it.

It makes sense to measure, but not intervene.
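One standard way to "correct for it" while keeping everyone enrolled is to analyze by assigned branch (intention-to-treat) and then rescale by how many treatment users actually got the treatment experience: the Wald estimator of the complier average causal effect. This is a general statistical technique, not something this bug says the experiments team uses. It assumes a probe that identifies who actually saw the treatment, that control users never see it (one-sided noncompliance), and that branch assignment affects outcomes only through the experience shown. `ArmStats` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArmStats:
    users: int        # users assigned (enrolled) to the branch
    conversions: int  # users with the outcome of interest
    treated: int      # users who actually saw the treatment experience


def itt_effect(treatment: ArmStats, control: ArmStats) -> float:
    """Intention-to-treat effect: difference in outcome rates by assigned branch."""
    return treatment.conversions / treatment.users - control.conversions / control.users


def cace_effect(treatment: ArmStats, control: ArmStats) -> float:
    """Wald estimator: ITT effect divided by the difference in treatment uptake.

    With one-sided noncompliance (control.treated == 0) the denominator is just
    the treatment branch's compliance rate.
    """
    uptake = treatment.treated / treatment.users - control.treated / control.users
    if uptake <= 0:
        raise ValueError("assignment did not increase treatment uptake")
    return itt_effect(treatment, control) / uptake
```

The ITT estimate stays unbiased because nobody is removed; the rescaled estimate is only as good as the probe that determines `treated`.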

But back to the original point about firstStartup: users seeing the correct about:welcome doesn't indicate that they're on firstStartup, which is the behavior we're trying to validate, right? For all those users who were enrolled in the v1/v2/v3 experiments, how do we know they're specifically firstStartup users, as opposed to other new profiles (profile manager, etc.)?

What we want to know is (1) whether everyone enrolled in those three experiments was a new profile on the firstStartup run (excluding new profiles created through other paths), and (2) whether the enrollment happened during firstStartup (as opposed to the second check-in with Remote Settings). The correct about:welcome can still appear when either condition is false, so its presence doesn't validate what we're trying to learn.
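The two conditions above can be expressed as a per-enrollment check. This is a sketch only, assuming hypothetical per-enrollment fields `profile_saw_first_startup` and `enrolled_during_first_startup`; current pings don't carry them, which is the gap this bug is about.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Enrollment:
    client_id: str
    # Hypothetical fields that the requested telemetry would need to provide.
    profile_saw_first_startup: bool
    enrolled_during_first_startup: bool


def classify(e: Enrollment) -> str:
    """Label one enrollment against the two conditions we want to confirm."""
    if not e.profile_saw_first_startup:
        return "not_first_startup_profile"  # e.g. a profile created via the profile manager
    if not e.enrolled_during_first_startup:
        return "enrolled_after_first_startup"  # e.g. a later Remote Settings check-in
    return "ok"


def summarize(enrollments: Iterable[Enrollment]) -> Counter:
    """Count enrollments per label; anything other than "ok" violates a condition."""
    return Counter(classify(e) for e in enrollments)
```

Because the check is per enrollment, it can flag violations even for users who happened to see the correct about:welcome.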

(In reply to Su-Young Hong from comment #12)


That's a good point, Su.


I wanted to clarify that the isFirstStartup targeting allows us to enroll users in the first-run experiments only during the first startup run. To my understanding, the first startup run only takes place when someone installs the browser via the Firefox installer on Windows. Since this is the only way to trigger the first startup run, I think the first-run experiment enrollment ping is essentially equivalent to telemetry on experiment users who see "FirstStartup". Does that make sense to you?

(In reply to Nan Jiang [:nanj] from comment #10)


After digging deeper into the 1-3% of treatment users who saw the default experience, I found that a browser restart right after installing and loading about:welcome seems to be the cause. During this subsequent restart, the user stays enrolled, but the experiment data is not available to about:welcome, so it falls back to the default experience. I created bug https://bugzilla.mozilla.org/show_bug.cgi?id=1678516 to track this investigation and fix the issue.
