Closed Bug 1458920 Opened 7 years ago Closed 6 years ago

Custom target filtering function

Categories

(Firefox :: Remote Settings Client, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 62
Tracking Status
firefox62 --- fixed

People

(Reporter: leplatrem, Assigned: leplatrem)

References

Details

Attachments

(4 files)

In Bug 1451031 we want to implement JEXL to mimic what is done in Normandy.

But legacy use-cases like blocklists have target information in fields like `targetApp` etc.

Since we are not ready on the UI side to support JEXL on entries, it would be simpler to be able to specify a custom target filtering function on the client side. 

For blocklists, this function would mimic what was done on the server in Bug 1434302
Assignee: nobody → mathieu
In RemoteSettings.get() why can't blocklists simply write their own filter for `targetApp`?  When we add JEXL won't adding this custom target filtering be confusing and immediately deprecated?
> why can't blocklists simply write their own filter for `targetApp`?

It would have to be called on sync events data, on .get() results etc. It makes it simpler to always consider the data to be filtered by target. Beyond blocklist remote settings clients initialization their usage will be exactly the same as any other use case.

Generally, I consider the blocklists use case a bit specific since it is a legacy package that we are transitioning to the new solution.


> When we add JEXL won't adding this custom target filtering be confusing and immediately deprecated?

I don't plan on documenting this custom target filtering. It would only be part of the migration from legacy (like other advanced options BTW)

Otherwise, yes we could plan to deprecate it indeed, but that won't be super obvious. 
For example:
- we don't have have a UI where JEXL rules can be edited yet, whereas we have a UI in prod where blocklists target can be expressed
- we need to be able to express the same rules using JEXL as the current formalism (minTargetApp=* etc.), and as of today I can't say if we'll have bijection between the two 
- editors will have to migrate existing targetApp rules to JEXL, and the client behaviour will have to be validated for each migrated entry (it makes it difficult for old entries, or some use cases like Gfx where only Jorge has an editor hat, etc.)
- In case a FFx version ships with this custom filtering but no JEXL capabilities, then we'll loose those clients when migrating

But, if we have that custom filtering function:
- we can serve the needs of Bug 1257565 
- we don't need to migrate the old blocklist entries
- when we introduce JEXL support, we can modify that function with a simple if statement: if entry has jexl then use jexl filter, otherwise use legacy targetApp filter
- when the new UI is ready, blocklist editors will be able to define targets using JEXL and both mechanism will be supported by clients
- once the new stack (UI + client JEXL filter) has shipped, we can modify the blocklists JSON schema on kinto-admin to hide the targetApp field and promote the new JEXL textarea
Blocks: 1451031
Comment on attachment 8974063 [details]
Bug 1458920 - Update RemoteSettings documentation

https://reviewboard.mozilla.org/r/242108/#review248454

::: services/common/docs/RemoteSettings.rst:185
(Diff revision 1)
>  The internal IndexedDBs of remote settings can be accessed via the Storage Inspector in the `browser toolbox <https://developer.mozilla.org/en-US/docs/Tools/Browser_Toolbox>`_.
>  
>  For example, the local data of the ``"key"`` collection can be accessed in the ``main/key`` IndexedDB store at *Browser Toolbox* > *Storage* > *IndexedDB* > *chrome* > *main/key*.
>  
>  
> -about:remotesettings
> +\about:remotesettings

I assume the backslash here is not intentional?
Attachment #8974063 - Flags: review?(gijskruitbosch+bugs) → review+
Comment on attachment 8974064 [details]
Bug 1458920 - Add RemoteSettings custom filtering function

https://reviewboard.mozilla.org/r/242110/#review248456

::: services/common/blocklist-clients.js:140
(Diff revision 1)
> + *
> + * When landing Bug 1451031, this function will have to check if the `entry`
> + * has a JEXL attribute and rely on the JEXL filter function in priority.
> + * The legacy target app mechanism will be kept in place for old entries.
> + */
> +async function targetAppFilter(entry, { appID, version: appVersion }) {

Does this really need to be async? I think all the awaits mean we wait for a lot of microtask checkpoints, which will slow this down...

::: services/common/blocklist-clients.js:161
(Diff revision 1)
> +        if (guid == appID && matchesRange) {
> +          return entry;
> +        }
> +      }
> +    } else {
> +      const { minVersion = "0", maxVersion = "*" } = vr;

I don't think this is right for everything. My understanding is that the versionRange, confusingly, refers to the version of the item (add-on/plugin, whatever) and then the targetApplication has version range information for the application.

So the XML version of the blocklist right now e.g. has a flash entry that has:

```
      <versionRange maxVersion="27.0.0.159" minVersion="0" severity="0" vulnerabilitystatus="1"/>

```

which is about the flash version, not the target application. Same with the blocklist in indexeddb, where the RemoteSettings object returns the following `vr` for a java block:

```
{"severity":0,"maxVersion":"Java 6 Update 45","minVersion":"Java 6 Update 42","targetApplication":[{"guid":"{ec8030f7-c20a-464f-9b0e-13a3a9e97384}","maxVersion":"57.0.*","minVersion":"17.0"},{"guid":"{92650c4d-4b8e-4d2a-b7eb-24ecf4f6b63a}","maxVersion":"*","minVersion":"2.14"}],"vulnerabilityStatus":1}
```

So we shouldn't be checking the version info that's directly on the `vr` object if there are no target applications - we should just return the original entry.

The only exception seems to be the gfx list, which has 1 Firefox-version related maxVersion item, but that's limited to Windows XP anyway and we don't support that anymore... In any case, I guess we should keep the "there are no target apps" filtering but only for the gfx items, or something?

I *think* this is done correctly in the server-side python, but if not it needs changing there too...

::: services/common/remote-settings.js:453
(Diff revision 1)
> +    return data.reduce(async (accP, entry) => {
> +      const acc = await accP;
> +      const processed = await this.filterFunc(entry, environment);
> +      return processed ? acc.concat(processed) : acc;
> +    }, Promise.resolve([]));

I'm a bit confused - I don't know that `array.reduce` in JS deals with async accumulators, but in any case, wouldn't it be easier to implement this with `array.filter` ?

If we need to keep the async-ness, you can parallelize this better by doing something like:

```js
const environment = cacheProxy(ClientEnvironment);
let dataPromises = data.map(e => this.filterFunc(e, environment));
let results = await Promise.all(dataPromises);
return results.filter(v => !!v);
```

which just runs all the filters, and when all of them have returned, returns all the non-null items.
Attachment #8974064 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8974065 [details]
Bug 1458920 - Update blocklists dumps in release

https://reviewboard.mozilla.org/r/242378/#review248464

rs=me on the assumption it's not useful for me to review this manually...
Attachment #8974065 - Flags: review?(gijskruitbosch+bugs) → review+
Comment on attachment 8974066 [details]
Bug 1458920 - Filter RemoteSettings sync event data

https://reviewboard.mozilla.org/r/242380/#review248466

I don't think I'm the best reviewer for this change. I'm not sure what it's doing or how to evaluate it...
Attachment #8974066 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8974063 [details]
Bug 1458920 - Update RemoteSettings documentation

https://reviewboard.mozilla.org/r/242108/#review248454

> I assume the backslash here is not intentional?

It is, it prevents Sphinx from parsing about:remotesettings as an URL in the section title (which is ugly IMO).

See https://screenshots.firefox.com/dPsQS2X0ZkdmL2AG/localhost compared to https://firefox-source-docs.mozilla.org/services/common/services/RemoteSettings.html
Comment on attachment 8974066 [details]
Bug 1458920 - Filter RemoteSettings sync event data

https://reviewboard.mozilla.org/r/242380/#review248466

The idea here is that if we have target filtering, then the event data should also be filtered.

If among the data that was pulled during synchronization, none has a matching target, then we don't even fire the event.

This test simulates a server interaction where remote changes have target info which matches or not. And then inspects the event data.

I can ask Mark but he may say the same :) since certificates is not concerned with targets ;)
Comment on attachment 8974064 [details]
Bug 1458920 - Add RemoteSettings custom filtering function

https://reviewboard.mozilla.org/r/242110/#review248456

> Does this really need to be async? I think all the awaits mean we wait for a lot of microtask checkpoints, which will slow this down...

For this concrete patch indeed, we could make it synchronuous. But my idea was to pave the way for Bug 1451031 in which the JEXL filter will have to be async anyway.

Do you think the filter function should take a list as parameter instead? I found the single entry approach more elegant, but I may be totally wrong.

> I don't think this is right for everything. My understanding is that the versionRange, confusingly, refers to the version of the item (add-on/plugin, whatever) and then the targetApplication has version range information for the application.
> 
> So the XML version of the blocklist right now e.g. has a flash entry that has:
> 
> ```
>       <versionRange maxVersion="27.0.0.159" minVersion="0" severity="0" vulnerabilitystatus="1"/>
> 
> ```
> 
> which is about the flash version, not the target application. Same with the blocklist in indexeddb, where the RemoteSettings object returns the following `vr` for a java block:
> 
> ```
> {"severity":0,"maxVersion":"Java 6 Update 45","minVersion":"Java 6 Update 42","targetApplication":[{"guid":"{ec8030f7-c20a-464f-9b0e-13a3a9e97384}","maxVersion":"57.0.*","minVersion":"17.0"},{"guid":"{92650c4d-4b8e-4d2a-b7eb-24ecf4f6b63a}","maxVersion":"*","minVersion":"2.14"}],"vulnerabilityStatus":1}
> ```
> 
> So we shouldn't be checking the version info that's directly on the `vr` object if there are no target applications - we should just return the original entry.
> 
> The only exception seems to be the gfx list, which has 1 Firefox-version related maxVersion item, but that's limited to Windows XP anyway and we don't support that anymore... In any case, I guess we should keep the "there are no target apps" filtering but only for the gfx items, or something?
> 
> I *think* this is done correctly in the server-side python, but if not it needs changing there too...

Thanks, I got confused indeed.

I inspected the addons collection, and yes it makes sense:

http https://firefox.settings.services.mozilla.com/v1/buckets/blocklists/collections/addons/records | jq -r '.data[] | .versionRange[] '

Regarding Gfx, the versionRange is an object instead of a list, so that'll help us distinguish it :)

I read the server side code again and yes, it looks correct. I'll adjust the patch here.

> I'm a bit confused - I don't know that `array.reduce` in JS deals with async accumulators, but in any case, wouldn't it be easier to implement this with `array.filter` ?
> 
> If we need to keep the async-ness, you can parallelize this better by doing something like:
> 
> ```js
> const environment = cacheProxy(ClientEnvironment);
> let dataPromises = data.map(e => this.filterFunc(e, environment));
> let results = await Promise.all(dataPromises);
> return results.filter(v => !!v);
> ```
> 
> which just runs all the filters, and when all of them have returned, returns all the non-null items.

I thought that it could be useful for the filter function to have the ability to return a transformed version of the input, instead of just returning true/false. That's why I didn't just do array.filter.

I like your parallel version better. I used array.reduce to avoid iterating twice, but going parallel is way more efficient! Thanks
Comment on attachment 8974064 [details]
Bug 1458920 - Add RemoteSettings custom filtering function

https://reviewboard.mozilla.org/r/242110/#review248604

r=me with the test expectations clarified, and note the perf optimization for `matchesRange`. :-)

::: services/common/blocklist-clients.js:174
(Diff revision 2)
> +        return entry;
> +      }
> +      const { minVersion = "0", maxVersion = "*" } = ta;
> +      const matchesRange = (Services.vc.compare(appVersion, minVersion) >= 0 &&
> +                            Services.vc.compare(appVersion, maxVersion) <= 0);
> +      if (guid == appID && matchesRange) {

I would inline `matchesRange` here. In most of the plugin cases it seems we've re-added plugin blocks for SeaMonkey or something, and we should avoid the (relatively expensive) version comparisons if the (relatively cheap) guid comparison fails.

::: services/common/tests/unit/test_blocklist_targetapp_filter.js:123
(Diff revision 2)
> +      }]
> +    }]
> +  }]);
> +
> +  const list = await client.get();
> +  equal(list.length, 3);

Here and elsewhere, should we verify which items we get back? As it is, for some of these tests they could accidenally pass as long as they return the rigth number, even if they return the 'wrong' items.

It's also not actually clear to me off-hand what version the xpcshell appinfo returns and, therefore, which of these items we expect to return, and why. :-)

::: services/common/tests/unit/test_blocklist_targetapp_filter.js:169
(Diff revision 2)
> +      }]
> +    }]
> +  }]);
> +
> +  const list = await client.get();
> +  equal(list.length, 2);

Same thing here for the version numbers.

::: services/common/tests/unit/test_blocklist_targetapp_filter.js:173
(Diff revision 2)
> +  const list = await client.get();
> +  equal(list.length, 2);
> +});
> +add_task(clear_state);
> +
> +add_task(async function test_complex_version() {

And here.
Attachment #8974064 - Flags: review?(gijskruitbosch+bugs) → review+
Comment on attachment 8974064 [details]
Bug 1458920 - Add RemoteSettings custom filtering function

https://reviewboard.mozilla.org/r/242110/#review248456

> For this concrete patch indeed, we could make it synchronuous. But my idea was to pave the way for Bug 1451031 in which the JEXL filter will have to be async anyway.
> 
> Do you think the filter function should take a list as parameter instead? I found the single entry approach more elegant, but I may be totally wrong.

I think you're right and the single entry option is better, and if we need this to be async for jexl anyway, then sure let's do it this way.
Comment on attachment 8974066 [details]
Bug 1458920 - Filter RemoteSettings sync event data

https://reviewboard.mozilla.org/r/242380/#review248740

Tentative r+ but it would really be best if someone more familiar like Mark signed off on this, too.

::: services/common/remote-settings.js:346
(Diff revision 3)
> +      const created = await this._filterEntries(allCreated);
> +      const deleted = await this._filterEntries(allDeleted);
> +      // For updates, keep entries whose updated form is matches the target.
> +      const updatedFiltered = await this._filterEntries(allUpdated.map(e => e.new));

I'm tempted to parallelize this too:

```js
const [created, deleted, updatedFiltered] =
  await Promise.all(
    [allCreated, allDeleted, allUpdated.map(e => e.new)].map(this._filterEntries.bind(this))
  );
```

::: services/common/remote-settings.js:354
(Diff revision 3)
> +      const updatedFiltered = await this._filterEntries(allUpdated.map(e => e.new));
> +      const updatedFilteredIds = new Set(updatedFiltered.map(e => e.id));
> +      const updated = allUpdated.filter(({ new: { id } }) => updatedFilteredIds.has(id));
> +
> +      // If every changed entry is filtered, we don't even fire the event.
> +      if ((created.length + updated.length + deleted.length) > 0) {

Nit: could just do:
```js
if (created.length || updated.length || deleted.length)
```

::: services/common/remote-settings.js:358
(Diff revision 3)
> +        // Fire the event: execute callbacks in order and sequentially.
>          // If one fails everything fails.

This is pre-existing, but do we really want to let listeners break other listeners? Seems odd to me, but this isn't my code. :-)

::: services/common/remote-settings.js:361
(Diff revision 3)
> +        const current = await this._filterEntries(allData);
> +        // Fire the event: execute callbacks in order and sequentially.
>          // If one fails everything fails.
> -        const { created, updated, deleted } = syncResult;
>          const event = { data: { current, created, updated, deleted } };
>          const callbacks = this._callbacks.get("sync");

What kind of callbacks are these, and is it really going to be OK to stop sending this event if there are no new items? Do any of them use the timestamp of the last update independently, for instance? :-)

::: services/common/tests/unit/test_blocklist_clients.js:178
(Diff revision 3)
> +    await client.maybeSync(2000, Date.now() - 3000, {loadDump: false});
> +    await client.maybeSync(3001, Date.now() - 2000);
> +
> +    // In ?_since=4000 entries, no target matches. The sync event is not called.
> +    let called = false;
> +    client.on("sync", e => called = true);
> +    await client.maybeSync(4001, Date.now() - 1000);

I'm not familiar with these primitives so it's hard for me to review this... what does `maybeSync` do, and what are the timestamps used for?

Note that this is async code and you're continuously updating `Date.now()`. Especially on debug builds, it may take more than a second for these operations to complete, so this code looks fragile, though it's hard to reason about what the failure modes are if the code is fast/slow, because I'm not sure what `maybeSync` does...

::: services/common/tests/unit/test_blocklist_clients.js:198
(Diff revision 3)
> +
> +    // Since we had entries whose target does not match, the internal storage list
> +    // and the current data should differ.
> +    const collection = await client.openCollection();
> +    const { data: internalData } = await collection.list();
> +    notEqual(internalData.length, current.length, `event current data for ${client.collectionName}`);

Can you assert that the filtered data list should be smaller? :-)
Attachment #8974066 - Flags: review?(gijskruitbosch+bugs) → review+
Comment on attachment 8974066 [details]
Bug 1458920 - Filter RemoteSettings sync event data

https://reviewboard.mozilla.org/r/242380/#review248740

> This is pre-existing, but do we really want to let listeners break other listeners? Seems odd to me, but this isn't my code. :-)

I addded a mention about this behaviour and created Bug 1460525 

In our current use-case, we only have one event handler ;)

> What kind of callbacks are these, and is it really going to be OK to stop sending this event if there are no new items? Do any of them use the timestamp of the last update independently, for instance? :-)

Yes, we don't want listeners to be aware of data changes that don't concern their platform. Exactly as if the data was filtered on the server actually.

If we synchronize everything and filter on the client side, it is only because of the protocol we have in place to verify content signatures.

Also, currently the only implemented use-cases that we have are in blocklists-clients.js

When introducing the broader filtering feature (JEXL) for targeting, I'll add a detailed section about filtered data in the docs.

> I'm not familiar with these primitives so it's hard for me to review this... what does `maybeSync` do, and what are the timestamps used for?
> 
> Note that this is async code and you're continuously updating `Date.now()`. Especially on debug builds, it may take more than a second for these operations to complete, so this code looks fragile, though it's hard to reason about what the failure modes are if the code is fast/slow, because I'm not sure what `maybeSync` does...

Thanks, I'm sorry if it looks misterious.

I added some comments to help the reader. Let's see if Mark has some bandwidth to give his feedback ;)
Comment on attachment 8974066 [details]
Bug 1458920 - Filter RemoteSettings sync event data

https://reviewboard.mozilla.org/r/242380/#review248968
Attachment #8974066 - Flags: review?(mgoodwin) → review+
Keywords: checkin-needed
Target Milestone: --- → Firefox 62
Version: 57 Branch → Trunk
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s 089e9abfa7c6cd21d22f84c1bb03116597ebcffc -d b4aafc195b9f: rebasing 463064:089e9abfa7c6 "Bug 1458920 - Update RemoteSettings documentation r=Gijs"
rebasing 463065:c6f813d1a0af "Bug 1458920 - Add RemoteSettings custom filtering function r=Gijs"
merging services/common/remote-settings.js
merging services/common/tests/unit/test_blocklist_clients.js
warning: conflicts while merging services/common/remote-settings.js! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/f0cc2bdcd8ef
Update RemoteSettings documentation r=Gijs
https://hg.mozilla.org/integration/autoland/rev/77b53df1f550
Add RemoteSettings custom filtering function r=Gijs
https://hg.mozilla.org/integration/autoland/rev/ce79538855b7
Update blocklists dumps in release r=Gijs
https://hg.mozilla.org/integration/autoland/rev/cfba5652a29c
Filter RemoteSettings sync event data r=Gijs,mgoodwin
Keywords: checkin-needed
Blocks: 1257565
See Also: 1257565
Depends on: 1463782
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: