Closed Bug 1819456 Opened 1 year ago Closed 1 year ago

ASRouter doesn't update messages when natural experiment enrollment happens during runtime

Categories

(Firefox :: Messaging System, defect, P2)

defect

Tracking

()

RESOLVED FIXED
113 Branch
Tracking Status
firefox112 --- fixed
firefox113 --- fixed

People

(Reporter: aminomancer, Assigned: aminomancer)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

This bug is about changing the way ASRouter detects experiment enrollment. Right now we only hear about experiment enrollment at startup and after force enrollment. But I think it's possible for natural experiment enrollment to happen during startup, but after ASRouter initializes.

This doesn't seem right, it should either listen for natural enrollment at any point at runtime, or it should await resolution of whatever the experiment API is doing to find all the experiments it's gonna enroll in and enroll in them. I suspect this is the reason we are encountering some oddities while testing the patch for bug 1804480. That may also be influenced by variance in how different triggers are handled, I haven't looked too closely at that yet but it's worth considering.

As far as I can tell we don't have a meta bug for general desktop experiment API or ASRouter changes. Do you think we should make one?

I also have an example gist with a draft of my initial knee-jerk reaction to this problem. I'm not sure we really want to listen to experiment unenrollment, mainly because it's possible that might "orphan" some behaviors in more complex experiments that are expected to set prefs or queue up other messages. If the experiment gets unenrolled midway through the planned flow then cleaning it up might be a problem. So probably we should just leave messages where they are. I don't think it's that likely that users will actually wind up unenrolling from experiments while they are still in progress, but this protection doesn't cost us anything — actually it makes this more performant since it's one fewer notification to observe. Actually, without listening to unenrollment, it means we're listening to the same number of notifications. Instead of listening for forced enrollment, we'd just listen for enrollment in general. So we're still just watching one notification. The only difference is that notification will happen more frequently, since it will fire both for forced enrollment and for natural enrollment.

One of the ways I'm testing this is to compare how messages get loaded after enrolling in two different experiments: a multi-message experiment containing two nthTabClosed-triggered messages, and a single message experiment containing just one nthTabClosed-triggered message.

The multi-message experiment

  • Experimenter link
  • Enrollment instructions:
    1. Run mach run and then open about:config and set messaging-system.rsexperimentloader.collection_id to nimbus-preview
    2. Quit out of Firefox
    3. Run mach run --setpref app.normandy.user_id=b0fd9396-3785-4809-a129-c17f825bf6dc
    4. Open a new tab and close it
    5. A message should appear, dismiss it
    6. Open another new tab and close it
    7. A final message should appear, dismiss it

The single-message experiment

  • Experimenter link
  • Enrollment instructions:
    1. Run mach run and then open about:config and set messaging-system.rsexperimentloader.collection_id to nimbus-preview
    2. Quit out of Firefox
    3. Run mach run --setpref app.normandy.user_id=2e5de8f5-59f3-4e96-a8f6-5773b75acef9
    4. Open a new tab and close it
    5. A message should appear, dismiss it
See Also: → 1804480
Priority: -- → P2
See Also: → 1817319

I have some updates about this idea. I didn't realize this when I filed the bug, but It seems that ASRouter doesn't update messages on a timer — instead, when messages would be updated for other reasons (like a trigger), ASRouter checks if a minimum amount of time has elapsed since each provider was last updated, and skips updating if not. So this means messages are allowed to update hourly, but updates are not triggered hourly. Updates at runtime require prefs to be changed or something to call sendTriggerMessage, so generally I think in normal operation, when users are enrolled in an experiment, its messages will not be loaded without some user interaction. Which kind of defeats the purpose of bug 1817319, which was meant to give us the ability to avoid waiting for user events to show a message or record reach. If new messages can't be loaded without user events, then the new trigger doesn't really change the math there.

Possible solutions:

  • Give ASRouter an "enrollment listener" that subscribes to experiment enrollment changes and updates messages accordingly
  • Set ASRouter to update messages on an interval

Basically all of our dynamic messages are coming from nimbus, so loading on that event would make sense. Would be good to be careful about multiple concurrent/back-to-back updates as we can deploy messages across many "nimbus features" and there could be potential issue if refreshing on say "feature 1" then again on "feature 5." Not sure if there might be any issues within a given feature too, e.g., switching among experiments or rollouts, if we were to update "too frequently."

(In reply to Ed Lee :Mardak from comment #4)

Basically all of our dynamic messages are coming from nimbus, so loading on that event would make sense. Would be good to be careful about multiple concurrent/back-to-back updates as we can deploy messages across many "nimbus features" and there could be potential issue if refreshing on say "feature 1" then again on "feature 5." Not sure if there might be any issues within a given feature too, e.g., switching among experiments or rollouts, if we were to update "too frequently."

We could use the same updateCycleInMs/shouldProviderUpdate approach for this, so that, like with triggers causing a refresh, the enrollment listener can only cause updates once per hour per provider (or faster for tests that change prefs). Throttling updates would bias our reach measurements, so a faster option might be to 'debounce' updates instead of throttling them, like basically just preventing concurrent updates. So if update 1 is running while update 2 wants to happen, we could just make update 2 await a promise resolved by update 1. I'll see if I can do some testing and identify issues with that nearly unbounded approach.

Ah, relying on shouldProviderUpdate might result in the first messaging nimbus feature onUpdate while other messaging features need to wait until the next update. But maybe by the time nimbus would notify that a feature is enrolled, other features would be ready to check anyway. I'm not actually sure how quickly they could come in or if they could even overlap anyway.

I think what we want is to wait for some notification that nimbus has finished processing the current batch of experiments across all features (including the one feature for forced enrollment?? maybe not as it might honor the updateCycle and seemingly not see the new message…). We could be smart and conditionally load messages only when a messaging feature is changed, but maybe that's not actually necessary?

Oh I see what you mean. Yeah I like the idea of batching all the features and notifying ASRouter about them all at once.

Forced enrollment currently does its own thing, it enrolls and notifies ASRouter, and ASRouter immediately reloads the entire experiments provider in response. Regardless of the update cycle (if we pass an array of providers to loadMessagesFromAllProviders, it doesn't bother checking the last update time).

For natural enrollment, it looks like the updateRecipes method iterates over all the recipes, awaiting manager.onRecipe each time, which is where the enrollment happens and presumably where we'd fire our notification to ASRouter. So if we just did that, it would mean the first recipe's notification causes an update and the others don't even happen until that's done and has blocked further updates.

But we could refactor that method a bit so that, for the last recipe in the loop specifically, it passes a special parameter to onRecipe that determines whether onRecipe will fire a notification to ASRouter. Kinda hard to describe in english but I can write a draft for it.

Assignee: nobody → shughes
Status: NEW → ASSIGNED
Blocks: fxms-infra
Attachment #9321748 - Attachment description: Bug 1819456 - Update ASRouter messages on enrollment changes. r=#omc-reviewers → Bug 1819456 - Update ASRouter messages when experiment recipes are updated. r=#omc-reviewers
Pushed by shughes@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8d5e4dfff15e
Update ASRouter messages when experiment recipes are updated. r=omc-reviewers,barret

Backed out for causing xpcshell failures on test_RemoteSettingsExperimentLoader.js

Backout link

Push with failures

Failure log

Flags: needinfo?(shughes)
Pushed by shughes@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a0d0f5b92150
Update ASRouter messages when experiment recipes are updated. r=omc-reviewers,barret
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch

Comment on attachment 9321748 [details]
Bug 1819456 - Update ASRouter messages when experiment recipes are updated. r=#omc-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: We are currently running a two-part experiment to measure the "reach" of messaging triggers both before and after the resolution of this bug. The first part of the experiment targets Nightly users and we analyze the data by separating them into buckets based on whether their build includes this patch or not. The second part of the experiment will target the Release channel and should give us a more accurate assessment of the relative reach of triggers. It doesn't make the same comparison before/after this patch, but it does really want the release candidate to include this patch, since it increases the reach of many messages. Missing this patch in Fx112 release would have a minor user effect in that users enrolled in experiments may not see messages immediately, but the main impact is for our experiment linked above. It would mean we collect less accurate data and might have to redo the experiment later after the patch ships to the release channel. So, we're requesting beta uplift so that it can ride the train to Fx112 release instead of 113.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The behavioral changes are restricted to a module that loads messages from experiments, so it can only impact users who are enrolled in messaging experiments. The effects have also been experimentally verified in Nightly. The worst case scenario would have no user-facing effects.
  • String changes made/needed:
  • Is Android affected?: No
Flags: needinfo?(shughes)
Attachment #9321748 - Flags: approval-mozilla-beta?

Comment on attachment 9321748 [details]
Bug 1819456 - Update ASRouter messages when experiment recipes are updated. r=#omc-reviewers

Approved for 112.0b3

Attachment #9321748 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: