Bug 1893086 Comment 0 Edit History


### Background and motivation

On desktop we recently implemented a potential-impressions experiment framework that lets us run Nimbus experiments to gather usage data on keywords without making any code changes (bug 1881873, [spec](https://docs.google.com/document/d/1Gdoj0eSRo2Qu1X90LbpX2SXC2Usq0chjsvlFF4uPvSU/edit?usp=sharing)).

Management has raised the idea of using the framework for Fakespot to measure potential Fakespot suggestion impressions before we spend a lot of time and resources implementing them, but there are a few technical problems.

The major problem is that the framework expects keywords to be defined in a Nimbus variable, but there is a very large number of Fakespot keywords, and the framework wasn't designed with numbers that large in mind. Could we put tens or hundreds of thousands of keywords in the variable? According to Barret, there's no hard limit on the size of this particular variable, so maybe? But it definitely doesn't seem appropriate. (There is a hard limit for features that set `isEarlyStartup` and variables that use `setPref`, but neither applies here.)

A minor problem is that the framework does our usual simplistic exact keyword matching, but we may end up using fts5 for Fakespot. This isn't a huge problem on its own because we could always manually tokenize the suggestions data to generate a final list of keywords, then give that final list to the framework and let it do exact matching.
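
The tokenize-then-exact-match workaround could look something like this rough sketch. The function name and the tokenization rules (lowercase, split on non-alphanumeric, dedupe) are purely illustrative, not a real Fakespot pipeline:

```rust
use std::collections::BTreeSet;

/// Illustrative only: derive a flat, deduplicated keyword list from
/// suggestion titles so the existing exact-match framework can consume it.
/// The real tokenization (and whether we do this at all) is TBD.
fn titles_to_keywords(titles: &[&str]) -> Vec<String> {
    let mut keywords = BTreeSet::new();
    for title in titles {
        for token in title
            .to_lowercase()
            .split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
        {
            keywords.insert(token.to_string());
        }
    }
    keywords.into_iter().collect()
}
```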

Another minor problem is that the framework is desktop-only. I don't think we intend to do Fakespot on mobile right now, but we might in the future. Regardless of Fakespot, it would be nice if mobile could use as much of the framework as possible.

### Implementation

I'd like to re-implement the framework in the Rust component using remote settings for keywords. That will let us store arbitrarily large keyword sets, use the same matching strategy(ies) we use for real suggestions, and allow mobile to use it. We can't implement the entire framework in the component because telemetry and Nimbus will be platform specific (I think?), but we can implement the keywords and core matching parts.

My implementation strategy is:

* Introduce a "phantom" suggestion type. (I'm not married to the name.) [Edit: I ended up calling them "exposure" suggestions in the final implementation.] The idea is that these suggestions behave pretty much like real suggestions and are matched the same way, but they aren't intended to be shown to the user. We can introduce new types of phantom suggestions without making any code changes.
* In remote settings:
    * All phantom suggestion records have the type `phantom-suggestions`.
    * A record's attachment can hold one or more phantom suggestions.
    * Each suggestion has a `type` string and keyword data. The `type` is unique across phantom suggestions. The keyword data is TBD and will depend on how we do matching. It might be a list of keywords for exact matching (like AMP), it might be a list of keywords where we do prefix matching on the client (like Pocket and addons), or it might be a list of titles that we tokenize on the client (if we use fts5, for example).
    * If a suggestion's keyword data is very large, the suggestion can be split up into multiple records to keep attachment sizes small. In that case, the suggestion in each attachment should use the same `type` to indicate the data represents the same logical suggestion.
* On the client:
    * The suggest component has a single `Phantom` provider and treats all phantom suggestions the same regardless of individual suggestion `type`. That is, there's at most one details table for all phantom suggestions, not per-`type` tables, and ingest and matching are the same. (Although in the future I can imagine consumers being able to specify a matching strategy if that's helpful, e.g., exact, prefix, fts.)
    * Introduce an optional `SuggestionQuery` string member called `phantom_suggestion_type`. A query will return all phantom suggestions of that type that match the query's keyword, along with all the usual suggestions that match the keyword.
    * The `Phantom` suggestion type returned to consumers includes a `matched_keyword` string whose value is the full keyword that matched the query. Hopefully identifying the full keyword remains possible with more sophisticated matching strategies we might implement.
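
To make the shape concrete, here's a rough Rust sketch of the pieces above. Everything in it is hypothetical: the struct and function names (`PhantomSuggestion`, `split_into_records`, `match_phantom`), the field names, and the keyword format (a flat list with exact matching, the simplest of the options) are all made up for illustration and are not the suggest component's real API.

```rust
/// Hypothetical shape of one phantom suggestion as stored in a record
/// attachment. The real keyword format is TBD (see above).
#[derive(Clone, Debug, PartialEq)]
struct PhantomSuggestion {
    /// Unique across phantom suggestions, e.g. "fakespot".
    suggestion_type: String,
    /// Assumes the simplest option: a flat keyword list, matched exactly.
    keywords: Vec<String>,
}

/// Split one logical suggestion across multiple records to keep attachment
/// sizes small; every chunk keeps the same `suggestion_type`.
fn split_into_records(
    suggestion_type: &str,
    keywords: &[String],
    max_keywords_per_record: usize,
) -> Vec<PhantomSuggestion> {
    keywords
        .chunks(max_keywords_per_record)
        .map(|chunk| PhantomSuggestion {
            suggestion_type: suggestion_type.to_string(),
            keywords: chunk.to_vec(),
        })
        .collect()
}

/// What a query might hand back to consumers: the matched suggestion's
/// `type` plus the full keyword that matched, for telemetry.
#[derive(Debug, PartialEq)]
struct PhantomMatch {
    suggestion_type: String,
    matched_keyword: String,
}

/// Exact matching against ingested suggestions, restricted to the `type`
/// the query asked for (the `phantom_suggestion_type` member above).
fn match_phantom(
    query_keyword: &str,
    phantom_suggestion_type: Option<&str>,
    ingested: &[PhantomSuggestion],
) -> Option<PhantomMatch> {
    let wanted = phantom_suggestion_type?;
    ingested
        .iter()
        .filter(|s| s.suggestion_type == wanted)
        .find_map(|s| {
            s.keywords
                .iter()
                .find(|k| k.as_str() == query_keyword)
                .map(|k| PhantomMatch {
                    suggestion_type: s.suggestion_type.clone(),
                    matched_keyword: k.clone(),
                })
        })
}
```

Note that `match_phantom` works the same whether the suggestion arrived in one record or was split across several, which is the point of keying the split on a shared `type`.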

### Consumers

The required one-time code changes in suggest consumers are:

* Add a Nimbus variable whose value will be the intended phantom suggestion `type` in each experiment.
* Add telemetry event(s) or other metrics for potential exposures/impressions.
* When the Nimbus variable is defined, pass it to queries using `phantom_suggestion_type`. When the suggest component returns a phantom suggestion, record the telemetry.
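
Tying those steps together, the consumer-side glue might look roughly like this. All the names here (`query_phantom`, `handle_query`, the store shape) are invented stand-ins; the real Nimbus, query, and telemetry APIs are platform specific:

```rust
/// Toy stand-in for the suggest component's query, backed by a fixed list of
/// (suggestion_type, keywords) pairs playing the role of ingested phantom
/// suggestions. Not a real API.
fn query_phantom(
    keyword: &str,
    phantom_type: Option<&str>,
    store: &[(&str, &[&str])],
) -> Option<(String, String)> {
    let wanted = phantom_type?;
    store
        .iter()
        .filter(|(t, _)| *t == wanted)
        .find_map(|(t, kws)| {
            kws.iter()
                .find(|k| **k == keyword)
                .map(|k| (t.to_string(), k.to_string()))
        })
}

/// Sketch of the per-query consumer flow: pass the Nimbus variable (when
/// set) into the query, and record telemetry for any phantom suggestion
/// that comes back.
fn handle_query(keyword: &str, nimbus_phantom_type: Option<&str>, store: &[(&str, &[&str])]) {
    if let Some((suggestion_type, matched_keyword)) =
        query_phantom(keyword, nimbus_phantom_type, store)
    {
        // Stand-in for a real telemetry event.
        println!("potential exposure: {suggestion_type} / {matched_keyword}");
    }
}
```

Users outside the experiment never hit the phantom path because their Nimbus variable is undefined, so `phantom_type` is `None` and the query returns only regular suggestions.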

To test a new set of keywords, we would do this:

* Come up with a `type` name that describes the set, e.g., "yelp", "fakespot", "sports", or "finance".
* Upload a new phantom suggestion to RS with the keywords and `type`. If the keyword set is very large, it can be split over multiple records as mentioned above.
* Create a new experiment and set the Nimbus variable to the `type`.