Closed Bug 1548631 Opened 5 years ago Closed 5 years ago

Add capability checking for recipes

Categories

(Firefox :: Normandy Client, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 70
Tracking Status
firefox70 --- fixed

People

(Reporter: mythmon, Assigned: mythmon)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

As we add new filter functionality, like we did in bug 1543484, or new actions like we are doing in bug 1547034 and bug 1536644, we are introducing backwards incompatible features into Normandy. Because all recipes are sent to all users, recipes that use newer features will cause older clients to raise errors or behave incorrectly.

As we move forwards, it is clear that sending all recipes to all clients is a very valuable detail of Normandy. It makes scaling the service very easy, and simplifies the execution model. We are actively moving to distribute recipes via Remote Settings in bug 1513854, which strongly prefers every client having the same data. In the future we may explore the idea of preloading recipes on clients, which requires all clients to have access to all recipes.

So instead we need some way for clients to filter out recipes that they are not compatible with. We could silently ignore features that aren't recognized, but that leaves room for typos and other silly mistakes to cause problems. Instead I think it would be better for each recipe to carry compatibility information that clients can check.

Publishing a minimum version could work, but it would be tedious to maintain as we would have to track features carefully between several environments (servers, admin panels, and clients). It also doesn't allow us to backport features or run experiments that change capabilities for some clients. Further, version filters are already a part of filter expressions, and I think that duplicating that functionality in a different place would be a mistake.

Instead I propose that each recipe carries a set of required capabilities. These capabilities would be simple strings like action:multi-pref-experiment or filter-transform:versionCompare. Each client would have a list of known capabilities. Recipes that carry an unrecognized capability would be ignored.

In the future, we may provide ways for system add-ons or other off-train code deployment tools to extend the capabilities of a client, but that is out of scope for this bug.

These capability checks should happen before any processing of the recipe (except possibly signature verification), to maximize the scope of areas we can cover by capabilities. They explicitly not be a part of the filtering phase, so that we can extend our filtering capabilities.

Blocks: 1562267

It would be so awesome if we could have JSON files published for each version that we could use to build a little caniuse type tool.

Assignee: nobody → mcooper
Status: NEW → ASSIGNED
Blocks: 1563564

I filed bug 1563564 to export capability data, like Rehan mentioned in comment 2.

(In reply to Michael Cooper [:mythmon] from comment #0)

Publishing a minimum version could work, but it would be tedious to maintain as we would have to track features carefully between several environments (servers, admin panels, and clients). It also doesn't allow us to backport features or run experiments that change capabilities for some clients. Further, version filters are already a part of filter expressions, and I think that duplicating that functionality in a different place would be a mistake.

I'm a bit confused. Remote Settings supports filtering functionality that could be used to ensure that we run in client versions that support the requisite functionality.

Instead of changing the recipe format and implementation, wouldn't a tool on the server / admin panel that automatically analyses what features a recipe needs, and produces a minimum version for the filter function, suffice for what we're trying to do here? If not, why not?

To me, Normandy is already pretty complex, and adding more complexity to the recipe format and thus more combinatorial risk, seems less than ideal. I'd prefer that logic to live on the server. It seems we want to eat the cost for having a server-side tool anyway per comment #2 / bug 1563564, so I'm not convinced we should be doing anything on the client...

Flags: needinfo?(mcooper)

Thanks for looking at this. There are two reasons that I want to do this filtering client side: it doesn't model the situation my team plans to implement accurately, and sending all recipes to all clients is a really powerful property I'd like to keep.

The first is rather theoretical: in the near-ish future we plan to have more Normandy clients. Namely, we want a Android Component version, and potentially an iOS version after that. Given how different these platforms are, I don't expect a simple scalar version number will work.

For example, if I could summon a version of Normandy for Fenix today, it wouldn't support filtering by add-ons even if it is the same version of the underlying code as on desktop. This is because Fenix doesn't support add-ons. So it isn't sound in this model to say that version N does not support add-on filtering and version N+1 does for any given N. Further, the Android Component client could actually have different capabilities depending on which mobile application it is running. As the two clients evolve further, the Venn-diagram of overlapping and unique features will likely become more interesting and less scalar.

As a potential alternative we could do some sort of capability modeling on the server, where we know which versions of which clients on which platforms support which features. That's exhausting just to write out though, much less implement. I'm sure that would inevitably get out of sync with the actual implementations. Generating these capabilities in the client seems more reliable long term to me.

The second reason I touched on in comment 0, but I can expand it a bit. Having all recipes available on all clients is really powerful, and I consider it one of Normandy's most important properties. We It means that we can use conceptually simple delivery mechanisms like CDNs or Remote Settings, simple caching which helps us scale the servers, and simple storage. If a client changes over time (say, upgrades to a new version of Firefox), that client could in theory take advantage of a previously incompatible recipe immediately. Adding any level of dynamism to the recipe serving would be a fairly large course change in how we are implementing the server side components of Normandy.

As an aside, you said "Remote Settings supports filtering functionality". This is true, however that filtering is done in a very similar way Normandy's existing filter expression based filtering. All data is delivered to every client, and the clients run their own filters. Often those filters are even JEXL, just like Normandy. My patch in this bug actually removed RS-level filtering of the recipe collection that we are already doing because it was too complicated to implement this two-stage filtering approach with the RS involvement. I believe that the existing RS based filtering would be vulnerable to the same backwards-compatibility issue that I'm trying to fix here.

You're right that Normandy is already quite complex. I think that the existing situation lacks backwards-compatibility affordances, and so extra work is hoisted on the Normandy operators (often me) to know which version are compatible and which aren't, and sometimes we make mistakes. I think that any solution to the backwards-compatibility problem will make the system-in-code more complex. Pushing it to the server doesn't remove that multiplicative factor, just move it out of m-c. However, I think it makes the system-in-practice simpler, because it writes down more of the constraints instead of leaving them implicit.

I'm currently convinced that this is the lowest risk option, but I'm happy to work to refine the idea. There is no great urgency to get this out now, and I'd rather take the time to make something we're all happy with.

eat the cost for having a server-side tool

That server side tool would be out of the execution work flow, so has a much lower cost than fully integrating compatibility checking into the server. If that compatibility tool breaks, then existing recipes continue working and new recipes are harder to write. If a dynamic delivery compatibility tools breaks, the Normandy server breaks and starts sending bad data to clients.

Flags: needinfo?(mcooper)

As an aside, you said "Remote Settings supports filtering functionality". This is true, however that filtering is done in a very similar way Normandy's existing filter expression based filtering.

I don't think this was addressed: why is it preferable to implement a new capability field rather than e.g. implementing a new hasCapability JEXL filter? Maybe it's part of the rollout strategy?

There are two main problems with implementing this as a JEXL transform (transforms being the shape that functions take in JEXL). The implementation of hasCapability I'm thinking of is something like "jexl.transform.versionCompare"|hasCapability.

First, transforms are not backwards-compatible friendly. If a client encounters a transform that it doesn't recognize, it throws an exception. Solving this problem with another transform doesn't actually make the situation better. Admittedly, capabilities have a similar weakness, and in fact the problem is worse with capabilities, as clients that don't support capabilities will happily ignore the whole system.

However, I have some other plans for that that involve segmenting the recipes so clients that don't understand capabilities only see "basic" recipes. This is easier to do with capabilities as simple strings than with hasCapability because it doesn't require parsing any JEXL.

Second, JEXL expressions don't have any form of short-circuiting logic. We can't say something like hasCapability && otherFilters and expect otherFilters to only be evaluated when the client is compatible. Both sides of the && are evaluated every time. So if the right side is incompatible, then the whole thing crashes.

I thought of another implementation of hasCapability that you might have meant: A context variable like normandy.capabilities.jexl.transform.versionCompare. This could take advantage of JEXL's dot operator "forwarding" undefineds, so that if normandy.capabilities isn't present, the whole expression is undefined, and thus falsy. This would bypass the first problem, but not the second. It also still isn't something we can easily parse from a recipe if it is possible for it to be in JEXL.

Both of the problems above are things we could change. We could make transforms not crash-on-unknown, and make && short circuiting. Those seem like useful changes. I think capabilities is a larger system that has more benefits though, and I'd still like to include it.

Thanks for the explanation. I'm OK with the addition of capabilities as an explicit field of recipes rather than as something that has to be inferred from JEXL strings.

Pushed by mcooper@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/47d39253dbc2
Add capability checks before evaluating Normandy recipes r=Gijs,glasserc

Backed out changeset 47d39253dbc2 (bug 1548631) for failing at /browser_RecipeRunner.js on a CLOSED TREE.

Backout link: https://hg.mozilla.org/integration/autoland/rev/020b4a73328ab03ee24746f800bcf5485fd013d6

**Push with failures:**https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=258523435&resultStatus=testfailed%2Cbusted%2Cexception&revision=47d39253dbc2e02ed29689e8dca4f49879c4d6ba

Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=258523435&repo=autoland&lineNumber=1782

Log snippet:
[task 2019-07-26T17:12:22.748Z] 17:12:22 INFO - TEST-PASS | toolkit/components/normandy/test/browser/browser_RecipeRunner.js | Filtered-out recipes should be reported - [[{"id":"noMatch","action":"noMatchAction","filter_expression":"false"},"backoff"]] deepEqual [[{"id":"noMatch","action":"noMatchAction","filter_expression":"false"},"backoff"]] -
[task 2019-07-26T17:12:22.749Z] 17:12:22 INFO - Leaving test bound testRun
[task 2019-07-26T17:12:22.750Z] 17:12:22 INFO - Entering test bound test_run_includesCapabilities
[task 2019-07-26T17:12:22.751Z] 17:12:22 INFO - Buffered messages finished
[task 2019-07-26T17:12:22.753Z] 17:12:22 INFO - TEST-UNEXPECTED-FAIL | toolkit/components/normandy/test/browser/browser_RecipeRunner.js | getCapabilities should be called -
[task 2019-07-26T17:12:22.754Z] 17:12:22 INFO - Stack trace:
[task 2019-07-26T17:12:22.755Z] 17:12:22 INFO - chrome://mochikit/content/browser-test.js:test_ok:1576
[task 2019-07-26T17:12:22.756Z] 17:12:22 INFO - chrome://mochitests/content/browser/toolkit/components/normandy/test/browser/browser_RecipeRunner.js:test_run_includesCapabilities:297
[task 2019-07-26T17:12:22.756Z] 17:12:22 INFO - chrome://mochitests/content/browser/toolkit/components/normandy/test/browser/head.js:wrappedTestFunction:306
[task 2019-07-26T17:12:22.757Z] 17:12:22 INFO - chrome://mochitests/content/browser/toolkit/components/normandy/test/browser/head.js:wrappedTestFunction:293
[task 2019-07-26T17:12:22.758Z] 17:12:22 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest/<:1346
[task 2019-07-26T17:12:22.759Z] 17:12:22 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest:1381
[task 2019-07-26T17:12:22.760Z] 17:12:22 INFO - chrome://mochikit/content/browser-test.js:nextTest/<:1203
[task 2019-07-26T17:12:22.760Z] 17:12:22 INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:803
[task 2019-07-26T17:12:22.762Z] 17:12:22 INFO - Not taking screenshot here: see the one that was previously logged
[task 2019-07-26T17:12:22.762Z] 17:12:22 INFO - TEST-UNEXPECTED-FAIL | toolkit/components/normandy/test/browser/browser_RecipeRunner.js | capabilities should be passed to loadRecipes -
[task 2019-07-26T17:12:22.762Z] 17:12:22 INFO - Stack trace:
[task 2019-07-26T17:12:22.763Z] 17:12:22 INFO - chrome://mochikit/content/browser-test.js:test_ok:1576
[task 2019-07-26T17:12:22.763Z] 17:12:22 INFO - chrome://mochitests/content/browser/toolkit/components/normandy/test/browser/browser_RecipeRunner.js:test_run_includesCapabilities:298
[task 2019-07-26T17:12:22.764Z] 17:12:22 INFO - chrome://mochitests/content/browser/toolkit/components/normandy/test/browser/head.js:wrappedTestFunction:306
[task 2019-07-26T17:12:22.765Z] 17:12:22 INFO - chrome://mochitests/content/browser/toolkit/components/normandy/test/browser/head.js:wrappedTestFunction:293
[task 2019-07-26T17:12:22.766Z] 17:12:22 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest/<:1346
[task 2019-07-26T17:12:22.766Z] 17:12:22 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest:1381
[task 2019-07-26T17:12:22.767Z] 17:12:22 INFO - chrome://mochikit/content/browser-test.js:nextTest/<:1203
[task 2019-07-26T17:12:22.768Z] 17:12:22 INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:803
[task 2019-07-26T17:12:22.770Z] 17:12:22 INFO - Leaving test bound test_run_includesCapabilities
[task 2019-07-26T17:12:22.771Z] 17:12:22 INFO - Entering test bound testReadFromRemoteSettings
[task 2019-07-26T17:12:22.772Z] 17:12:22 INFO - Console message: [JavaScript Error: "Unknown collection "main/normandy-recipes"" {file: "resource://services-settings/RemoteSettingsClient.jsm" line: 344}]
[task 2019-07-26T17:12:22.774Z] 17:12:22 INFO - sync@resource://services-settings/RemoteSettingsClient.jsm:344:13
[task 2019-07-26T17:12:22.775Z] 17:12:22 INFO - asyncget@resource://services-settings/RemoteSettingsClient.jsm:290:22
[task 2019-07-26T17:12:22.776Z] 17:12:22 INFO - async
loadRecipes@resource://normandy/lib/RecipeRunner.jsm:359:51
[task 2019-07-26T17:12:22.777Z] 17:12:22 INFO - invoke@resource://testing-common/sinon-7.2.7.js:2609:51
[task 2019-07-26T17:12:22.778Z] 17:12:22 INFO - proxy@resource://testing-common/sinon-7.2.7.js:2464:22

Flags: needinfo?(mcooper)
Attachment #9075843 - Attachment description: Bug 1548631 - Add capability checks before evaluating Normandy recipes r?Gijs!,glasserc! → Bug 1548631 - Add capability checks before evaluating Normandy recipes
Pushed by mcooper@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d1281ba0a85f
Add capability checks before evaluating Normandy recipes r=Gijs,glasserc
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 70
Flags: needinfo?(mcooper)
Blocks: 1587608
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: