Closed Bug 1627247 Opened 5 years ago Closed 4 years ago

Consider a more explicit strategy for unenrollment of experiments

Categories

(Firefox :: Nimbus Desktop Client, enhancement, P3)

RESOLVED WONTFIX
Tracking Status
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- fix-optional

People

(Reporter: k88hudson, Unassigned)

I have concerns that our current strategy of unenrolling users by relying on the absence of the recipe (including when targeting is no longer satisfied) is somewhat fragile and difficult to debug, and I would like to consider some possible alternatives. For example, could we rely on a more explicit concept of start and end dates for recipes?

I know there are some specific requirements around undesired pref states, but I'd like to take another look in the context of messaging experiment requirements.

Assignee: nobody → khudson
Priority: -- → P1
Priority: P1 → P2
Assignee: khudson → nobody

I agree with this assessment. I think the capability to specify the end date of the experiment in the recipe upon delivery will be worthwhile to add to our toolbox.

A couple of considerations when comparing the periodic recipe check strategy with specifying the end date at enrollment:

periodic recipe checks:

  • allows us to remotely hit the kill switch if the experiment is doing something bad
  • allows us to change experiment specs (keep it going longer, keep enrollment going longer, etc.) after it's been deployed
  • downside is it can be fragile

specifying at enrollment:

  • less fragile (hopefully)
  • can't change certain experiment specs after being deployed
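The two strategies above can be contrasted in a small sketch. This is only an illustration under stated assumptions, not how Normandy or the messaging system is implemented: the `Enrollment` record, the helper names, and the idea of passing in a set of currently delivered recipe slugs are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Set


@dataclass
class Enrollment:
    """Hypothetical client-side record of one experiment enrollment."""
    slug: str
    enrolled_at: datetime
    # Only used by the "specified at enrollment" strategy.
    ends_at: Optional[datetime] = None


def enroll(slug: str, now: datetime, duration_days: int) -> Enrollment:
    """Record the end date once, at enrollment time."""
    return Enrollment(slug=slug, enrolled_at=now,
                      ends_at=now + timedelta(days=duration_days))


def should_unenroll_by_recipe_check(enrollment: Enrollment,
                                    delivered_slugs: Set[str]) -> bool:
    """Periodic recipe check: unenroll when the recipe is no longer delivered.

    The server can end or extend the experiment at any time, but a recipe
    that goes missing for any other reason also ends the enrollment.
    """
    return enrollment.slug not in delivered_slugs


def should_unenroll_by_end_date(enrollment: Enrollment, now: datetime) -> bool:
    """Specified at enrollment: unenroll once the stored end date has passed.

    No server round trip is needed, but the end date cannot be changed later.
    """
    return enrollment.ends_at is not None and now >= enrollment.ends_at
```

In this sketch the first check depends entirely on what the server delivers at each check, while the second depends only on state written at enrollment, which is the trade-off the lists above describe.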

For onboarding messaging experiments (about:welcome and triplets), where the intervention is really brief, happens once, and happens at the beginning of the experiment, specifying experiment length at enrollment makes sense.

It would be worthwhile to see if using this strategy reduces some of the strangeness with unenrollment that we see in the wild.

I think a lot of the fragility of periodic recipe checks comes from three sources. The first is that preferences are hard, which doesn't apply to messaging experiments. The second comes from failures in filter evaluation. This one is more interesting. The third is complex targeting, which I think we have other plans to fix.

The main filter evaluation failure in recent history that comes to mind is that location filtering can fail, which caused unenrollment. With Normandy's new Suitabilities system (bug 1604363, effectively available in Firefox 77), this is much less of a concern. For compatible recipes (currently Pref and Add-on experiments, but not Messaging), infrastructure failures will not cause unenrollment. Clients will retry for up to 7 days before giving up and finally unenrolling. This treatment of temporary failures will make Normandy much more resilient to the second type of fragility. We will also receive better telemetry in this case.
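The retry behavior described above can be sketched roughly as follows. This is a simplified illustration, not the actual Suitabilities implementation from bug 1604363: the `CheckResult` values, the `EnrollmentState` record, and `apply_check` are hypothetical names, and only the 7-day limit comes from the comment above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

GRACE_PERIOD = timedelta(days=7)  # retry window mentioned in the comment above


class CheckResult(Enum):
    """Hypothetical outcomes of one periodic recipe check."""
    MATCH = "match"                      # filters evaluated, client still eligible
    NO_MATCH = "no_match"                # filters evaluated, client not eligible
    TEMPORARY_ERROR = "temporary_error"  # e.g. an infrastructure failure


@dataclass
class EnrollmentState:
    enrolled: bool = True
    first_error_at: Optional[datetime] = None  # start of the current error streak


def apply_check(state: EnrollmentState, result: CheckResult,
                now: datetime) -> EnrollmentState:
    """Update the enrollment after one check, tolerating temporary errors."""
    if not state.enrolled:
        return state
    if result is CheckResult.MATCH:
        state.first_error_at = None      # a successful check resets the streak
    elif result is CheckResult.NO_MATCH:
        state.enrolled = False           # a definite answer unenrolls immediately
    else:
        if state.first_error_at is None:
            state.first_error_at = now   # first failure: start the clock
        elif now - state.first_error_at >= GRACE_PERIOD:
            state.enrolled = False       # failures persisted for 7 days: give up
    return state
```

The design choice illustrated here is that only a definite "no match" or a sustained run of failures ends the enrollment; a single failed check does not.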

The third type of fragility isn't related to manual or automatic recipe ending. It's related to ongoing eligibility for experiments. In my opinion this ongoing eligibility check is important. The place where it falls short is that some targeting criteria are only enrollment criteria, not ongoing eligibility criteria. In the past we assumed everything was ongoing. Today we have tools to handle enrollment-only criteria well. Given that we are planning to significantly limit the choices that users can make about targeting, we can make sure to get the difference between ongoing and enrollment-only filters right and then re-use that choice.

Additionally, we can choose both. We can encode start and end dates in the recipe's targeting, and in fact have done this on several occasions. If we want to go all-in on this, we can then also mark all other targeting criteria as enroll-only, so the only ongoing eligibility check would be the date filter. This approach is nice because it doesn't require any client-side changes. It doesn't provide any clear route for things like "expire 7 days after enrolling" (though this may be possible as well).
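A minimal sketch of that "choose both" setup, with a date window as the only ongoing check and everything else marked enroll-only. The `Recipe` shape, the `Criterion` type, and the helper functions are hypothetical and do not reflect Normandy's real filter syntax.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List

Client = Dict[str, object]
Criterion = Callable[[Client, datetime], bool]


@dataclass
class Recipe:
    """Hypothetical recipe with targeting split into two groups."""
    slug: str
    enroll_only: List[Criterion] = field(default_factory=list)  # checked once
    ongoing: List[Criterion] = field(default_factory=list)      # checked every time


def date_window(start: datetime, end: datetime) -> Criterion:
    """Ongoing criterion: the experiment runs from start up to (not including) end."""
    return lambda _client, now: start <= now < end


def can_enroll(recipe: Recipe, client: Client, now: datetime) -> bool:
    """Enrollment requires every criterion, enroll-only and ongoing alike."""
    return all(c(client, now) for c in recipe.enroll_only + recipe.ongoing)


def stays_enrolled(recipe: Recipe, client: Client, now: datetime) -> bool:
    """After enrollment, only the ongoing criteria are re-evaluated."""
    return all(c(client, now) for c in recipe.ongoing)
```

With this split, a client whose locale changes after enrolling stays in the experiment, and unenrollment happens only once the date window closes.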

Historically, there are two reasons why the Normandy experiments system has not done date-based enrollment. The first is that launch dates and end dates are slippery, and keep moving around in many cases. We often don't know when the end date for an experiment is until it's actually over, and we often end early both for failure cases and for success cases ("we've learned everything we need, we can end early"). The second is that it makes it harder for external analysis systems (like Experimenter and Telemetry analysis) to decide when an experiment was actually available to users. With a simple "is enabled" boolean, the job is a lot easier for automated systems.

Component: Messaging System → Nimbus Desktop Client
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX