Open Bug 1810208 (reduce-impression-gc) Opened 2 years ago Updated 2 years ago

reduce complexity and increase robustness by making impression GC much more conservative

Categories

(Firefox :: Messaging System, task, P3)

People

(Reporter: dmosedale, Unassigned)

References

(Blocks 1 open bug)

Details

We should consider reducing or stopping garbage collection of FxMS impressions from the database.

When impressions are GCed "too soon", the frequency caps or lifetime caps of messages or groups are effectively ignored: users get shown content they've already seen that we either don't want them to see again at all, or don't want them to see again that soon because it would be too annoying.
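
To make the failure mode concrete, here is a minimal Python sketch (not the actual FxMS code). It assumes impressions are stored as a mapping from message id to a list of timestamps in milliseconds; `is_below_caps`, its parameters, and the sample message id are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional


def is_below_caps(
    timestamps: List[int],
    now_ms: int,
    lifetime_cap: Optional[int] = None,
    period_ms: Optional[int] = None,
    period_cap: Optional[int] = None,
) -> bool:
    """Return True if a message may still be shown given its recorded impressions."""
    # Lifetime cap: counts every impression ever recorded for the message.
    if lifetime_cap is not None and len(timestamps) >= lifetime_cap:
        return False
    # Frequency cap: counts only impressions inside the trailing period.
    if period_ms is not None and period_cap is not None:
        recent = [t for t in timestamps if now_ms - t < period_ms]
        if len(recent) >= period_cap:
            return False
    return True


now = 5_000

# The user has already seen the doorhanger once; a lifetime cap of 1 blocks it.
impressions: Dict[str, List[int]] = {"DOORHANGER_A": [1_000]}
blocked_before_gc = not is_below_caps(
    impressions.get("DOORHANGER_A", []), now, lifetime_cap=1
)

# A premature cleanup removes the record, so the cap can no longer be enforced.
impressions.pop("DOORHANGER_A")
shown_after_gc = is_below_caps(
    impressions.get("DOORHANGER_A", []), now, lifetime_cap=1
)
```

In this sketch the cap logic itself is unchanged; the message reappears only because the data the cap depends on has been collected.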

There are at least a couple of cases of "too soon" that we've run across (or that have made our discussions more complicated), including:

  • frequent reappearance of an already-seen doorhanger, caused by inadvertent cleanups (bug 1795037, which did require an uplift)
  • something gets GCed because Nimbus experiment/rollout override semantics interact with FxMS impression collection semantics in a way that we don't expect, owing to the complexity of reasoning about the factors in play when discussing upcoming experiment structure and sizing.

Stopping GCing would:

  • improve robustness by making both of these kinds of cases non-issues
  • simplify Product/Data Science/Engineering structuring/sizing/analysis of experiments by removing one class of variables that is currently part of the reasoning matrix

The amount of data a given user has, even over the long term, is likely to be measured in kilobytes (i.e. not large). We could also consider adding some sort of more conservative GC based on last-read or last-written information.
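
One possible shape for such a conservative GC, sketched in Python under stated assumptions: impressions are a mapping from message id to a list of millisecond timestamps, the newest timestamp stands in for "last written", and the one-year idle threshold is an arbitrary placeholder. `conservative_gc` and `max_idle_ms` are hypothetical names, not existing FxMS APIs.

```python
from typing import Dict, List

DAY_MS = 24 * 60 * 60 * 1000


def conservative_gc(
    impressions: Dict[str, List[int]],
    now_ms: int,
    max_idle_ms: int = 365 * DAY_MS,  # placeholder threshold, not a recommendation
) -> Dict[str, List[int]]:
    """Keep every message whose newest impression is within max_idle_ms.

    Unlike a cleanup keyed on whether a message is currently known, this never
    drops impressions just because the message is temporarily absent (for
    example, overridden by an experiment or rollout).
    """
    kept: Dict[str, List[int]] = {}
    for message_id, timestamps in impressions.items():
        # Entries with no timestamps carry no information and are dropped.
        if timestamps and now_ms - max(timestamps) <= max_idle_ms:
            kept[message_id] = list(timestamps)
    return kept


now = 400 * DAY_MS
store = {
    "OLD": [10 * DAY_MS],                   # idle for 390 days: collected
    "RECENT": [5 * DAY_MS, 100 * DAY_MS],   # idle for 300 days: kept whole
    "EMPTY": [],
}
result = conservative_gc(store, now)
```

Keeping the whole timestamp list for a surviving message, rather than trimming old entries, is deliberate here so that lifetime caps remain enforceable.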

As we consider what to do here, we may also want to decide to support running experiments on feature-ids that already have a rollout.

What's the "bug in our impression code" referring to?

Apologies, that was not clearly framed. I was referring to the frequent reappearances of an already-seen doorhanger having been caused by inadvertent cleanups (bug 1795037, which did require an uplift). I've updated comment 0 with that info for clarity.

Okay, that bug 1795037 was fixing a larger issue of messages getting removed when using private browsing but yes it had symptoms of impressions disappearing. Although if the impressions weren't cleaned up, the underlying bug of disappearing messages would have been much harder to spot.

Have there been examples of "impressions are GCed… caps of … groups are ignored" or is that theoretical?

(In reply to Ed Lee :Mardak from comment #3)

> Okay, that bug 1795037 was fixing a larger issue of messages getting removed when using private browsing but yes it had symptoms of impressions disappearing. Although if the impressions weren't cleaned up, the underlying bug of disappearing messages would have been much harder to spot.

A fair point.

> Have there been examples of "impressions are GCed… caps of … groups are ignored" or is that theoretical?

This is really a reaction to the complexity (and what feels like error-proneness) of our rollout/optimization experiment/sizing discussions over the past few weeks; it has not yet bitten us concretely. That said, maybe this stuff gets more straightforward as we get more experience.

Given the above, another way to think about this ticket is as a proposal to make it possible to experiment on featureIds that have rollouts on them, as a way to remove an error state from the system. Looked at that way, it does feel premature.

We had talked in one of those meetings about getting a smaller cross-discipline group together to discuss complexity & simplification. I'm now thinking this is better discussed as part of the big picture there.

It's also possible that that meeting wants to be framed more broadly as a retrospective specifically focused on managing rollouts and optimization experiments together, held after we've finished this current round of experiments and rollouts (e.g. mid-110).

Thoughts?

Type: defect → task
Priority: -- → P3
Blocks: 1815956
See Also: → 1859302
See Also: 1859302
See Also: → 1858010