Skylight production messages section includes some experiment messages
Categories
(Firefox :: Messaging System, defect, P2)
Tracking
()
People
(Reporter: dmosedale, Unassigned)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [omc])
Messages that are distributed by the messaging system directly (ie not Nimbus experiments or Rollouts) don't currently know how to exclude counting messages/actions (eg in their dashboards) that are caused by Nimbus experiments or Rollouts with the same message ID. My belief is that this isn't a huge problem:
a) very few messages are deployed directly by Remote Settings these days
b) once messages land in-tree and ride the release train, rollouts ** (and most holdback experiments except for maybe the long-term holdbacks?) are generally turned off. That said, a dashboard that spans both of those periods of time will show the hits from both inside and outside any experiments/rollouts.
This probably needs some more investigation & may want to be split up into multiple tickets.
@Aminomancer, do you have any thoughts here on the long-term holdback stuff?
Updated•1 year ago
|
Comment 1•1 year ago
•
|
||
Ideally, we incorporate the branch slug or some similar identifier into each screen.id
for experimental messages. At least my squad has been doing this, and this seems to give us one more reason to make it a standard practice. The screen id is incorporated into the message_id
value that is used for telemetry, so message ID does differ in this case. The motivation here is that we might run an exploratory experiment testing a message, then run it again as a rollout, then ultimately ship it in tree, and we don't want to show a user the same message 3 times. This gives us the ability to treat all of them as the same message for the purposes of frequency capping and blocking, while still being able to differentiate them in telemetry.
So at least for surfaces that use the aboutwelcome bundle, it is theoretically possible for Skylight to add a filter to the look that restricts it to just production messages. Since Skylight knows the entire content for production messages (or indeed for all messages), it can determine what the final message_id
will be for each screen and filter on that. And the same could be done for the experiment messages too.
You're right that rollouts and holdbacks aren't really an issue, especially long-term holdbacks which restrain us from landing the treatments in tree until they're finished. I think the more likely issue is where a production message already exists in tree, and a new variant of it is shipped via an experiment. We've done this several times with feature callouts, particularly the PDF.js annotation messages and Review Checker callouts. (Unfortunately in those cases we weren't able to include an identifier in the screen id.) We're likely to continue working in this way, iterating on the default experience with messages. So I think it would hinder the utility of Skylight if, when you visited the Dashboard for a production message, it included data from experimental variants.
That said, it seems like an opportunity to make the dashboards more sophisticated. The dashboard showing data for all messages with that ID is potentially useful, but only if we pivot on the message source. Then you could click one dashboard link and see how the default production message compares to any contemporaneous experimental variants. So rather than simply filtering on the screen id component of the message id, we could pivot on it instead.
Edit: Also, something else we could do is just include a string metric in all message telemetry to indicate the message's source. This seems relatively straightforward. ASRouter puts a provider
property on every message indicating where it came from. If it's from an experiment, it'll say messaging-experiments
; if it's from remote settings it should say cfr
(though could theoretically be something else if we used other collections); and if it's local it should say onboarding
(currently that's the only local provider we use). So, wherever message telemetry is collected, we can just include that message property as Glean metric called message_provider
.
Reporter | ||
Comment 2•1 year ago
|
||
Part of this is likely to be documenting various different message id creation/selection processes we've seen in the past and are in play now and then deciding if they are adequate.
Updated•1 year ago
|
Comment 3•11 months ago
|
||
Do we need to begin a spike on this work? I do think that it's going to be important for us to eventually be able to delineate the difference between data for in-production messages and experiment variants. It sounds like there are several different ways that this can be done. Is the next step to evaluate the options and select one of the paths?
Updated•11 months ago
|
Comment 4•1 month ago
•
|
||
Just to summarize some more context and findings from an initial attempt at this bug:
Glean metrics are added here, based on what's in event_context
, which is provided by the things that call submitGleanPingForPing
. submitGleanPingForPing
is either called directly or is called by sendTelemetry
. The solution here requires us to track down the callers of submitGleanPingForPing
, trace the code path to the last method that is aware of the message itself, and then make sure it passes the message's provider on. We would also need to add the new provider metric into asrouter/metrics.yaml
if we decide to unfold it from event_context
in our pings.
It might be easier to just find everything that defines event_context
and add the provider info there, but some of those are only aware of of the messageId
rather than the entire message, so they need to be traced further back. So callers like this need to be restructured to pass message
instead of messageId
.
In addition to adding the message_provider
to telemetry, it would also be nice to add the nimbus experiment and branch slugs. When a message from an experiment shows, ASRouter adds the slugs to the message making these properties exist on the message, just like provider does.
This is the most optimal solution but it will be tedious tracking down every code path that ends in submitGleanPingForPing
. The other approach is pivoting the looks on the screenId
component as described in Shane's previous comment, but that is relying on screenId
being unique, which is a convention that is not widely documented/enforced.
Description
•