Add more telemetry data around Normandy filters
Categories
(Firefox :: Normandy Client, enhancement)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox67 | --- | fixed |
People
(Reporter: glasserc, Assigned: glasserc)
Attachments
(4 files)
The Normandy system is a complicated one with many components. Recipes/experiments are adjusted on a server; those recipes and experiments are delivered to the browser; client code decides which recipes to execute, and then executes them; those recipes hopefully change the behavior of Firefox; eventually, these recipes end and the client unwinds them. Because of the complexity of the system and its role in performing experiments, it can be hard to "see inside" and make sure things are working the way we intended and that the results we get are valid.
We already have some uptake telemetry around Normandy, which gives us some visibility. However, there are a few questions that are hard to answer today.
1. Is a given recipe being considered by clients, and what ratio of our clients are choosing to apply it? Answering this question lets our data scientists confirm they have enough statistical power to estimate the significance of their results. Right now it is hard to answer because we only report the outcome of recipes that we receive AND decide to apply, and we have no way to answer the next question.
2. Is the Normandy client getting the right recipes? We've never had a problem with this per se, but it is a confounding factor for the question above, and answering it is a valuable health check that gives us confidence in this system's behavior going forward.
3. Is there any problem with the filter? We've had some bugs (e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1477156) where a filter was itself invalid. Most things in Normandy already report errors, but this particular code path isn't instrumented.
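To make the gap in the first question concrete, here is a minimal sketch of the ratio a data scientist would want to compute. It assumes every client that considers a recipe reports exactly one status, including a status for "filter did not match", which clients do not report today. The status strings and the function name are hypothetical and are not the Normandy or uptake telemetry API.

```python
from typing import Iterable, Optional

# Hypothetical status label for "considered the recipe, filter did not match".
DIDNT_MATCH_FILTER = "didnt_match_filter"


def apply_ratio(statuses: Iterable[str]) -> Optional[float]:
    """Return the fraction of reporting clients whose filter matched.

    `statuses` holds one status string per client that considered the
    recipe. Any status other than DIDNT_MATCH_FILTER is treated as
    "the filter matched and the client tried to apply the recipe".
    Returns None when no client reported anything.
    """
    reports = list(statuses)
    if not reports:
        return None
    rejected = sum(1 for status in reports if status == DIDNT_MATCH_FILTER)
    return (len(reports) - rejected) / len(reports)
```

Without the non-matching reports, the denominator only contains clients that applied the recipe, so the ratio cannot be computed from the existing data.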
To address this bug, I would like to do the following:
- Add uptake telemetry to report a status for recipes that we consider but whose filter does not match. Because we already report uptake telemetry for recipes that we consider and whose filter does match, this will solve #1, allowing us to calculate the ratio of clients that are choosing to apply a recipe. Additionally, this is the main "hole" in uptake telemetry for recipes, so it would give us some insight into point #2: are we getting the right recipes?
- Add a keyed histogram of recipe ID and recipe last_modified. This might be excessively paranoid, but it helps ensure that #2 is addressed, because we will be recording not only the recipe IDs but also information indicating the freshness of each recipe.
- Wrap the Jexl filter execution in a try/catch and log a CONTENT_ERROR in uptake telemetry for the recipe if the Jexl filtering throws an exception.
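The three steps above can be sketched together. This is an illustration in Python under stated assumptions, not the Normandy client code (which is JavaScript): `evaluate` stands in for the Jexl evaluator, the recipe field names and the `didnt_match_filter` label are assumptions, and the returned lists and dict stand in for real telemetry calls. Only the CONTENT_ERROR status is named in the proposal itself.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical label for the proposed "filter did not match" status.
DIDNT_MATCH_FILTER = "didnt_match_filter"
# Status named in the proposal for a filter that throws.
CONTENT_ERROR = "content_error"

Recipe = Dict[str, Any]


def filter_recipes(
    recipes: List[Recipe],
    evaluate: Callable[[str, Dict[str, Any]], Any],
    context: Dict[str, Any],
) -> Tuple[List[Recipe], List[Tuple[int, str]], Dict[str, int]]:
    """Split recipes into those to run and those to report on.

    Returns (recipes whose filter matched, (recipe id, status) reports
    for the rest, recipe id -> last_modified for every recipe seen).
    """
    to_run: List[Recipe] = []
    reports: List[Tuple[int, str]] = []
    freshness: Dict[str, int] = {}
    for recipe in recipes:
        # Stand-in for the keyed histogram: key by recipe ID,
        # record the recipe's last_modified value.
        freshness[str(recipe["id"])] = recipe["last_modified"]
        try:
            matched = bool(evaluate(recipe["filter_expression"], context))
        except Exception:
            # A broken filter is reported instead of failing silently.
            reports.append((recipe["id"], CONTENT_ERROR))
            continue
        if matched:
            to_run.append(recipe)
        else:
            reports.append((recipe["id"], DIDNT_MATCH_FILTER))
    return to_run, reports, freshness
```

Recipes that match are returned rather than reported here because, per the description, recipes that are applied already report their own outcome.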
Assignee
Comment 1•6 years ago
Hi :chutten, I know you have a little bit of the context here but feel free to pass review to someone better suited to it.
Comment 2•6 years ago
Assignee
Comment 3•6 years ago
This will make it easier to report recipe freshness.
Assignee
Comment 4•6 years ago
Depends on D22016
Assignee
Comment 5•6 years ago
Report when recipes don't match the filter. Report when Jexl filters
themselves fail, with an added test.
The existing test for remote-settings usage had a bug, so fix that
too.
Depends on D22017
Assignee
Comment 6•6 years ago
Comment 8•6 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/fa9146245803
https://hg.mozilla.org/mozilla-central/rev/0d1b3156e7c8
https://hg.mozilla.org/mozilla-central/rev/10333e6aa4b0