Closed Bug 1848201 Opened 2 years ago Closed 1 year ago

Introduce API to allow setting an experimentation ID

Categories

(Data Platform and Tools :: Glean: SDK, enhancement, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: csadilek, Assigned: travis_)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

We're in the process of integrating Nimbus into many more of our applications, following the new Nimbus on the Web architecture, which relies on server-side integration and a request to Cirrus. This request carries a unique ID and targeting context, and returns a set of active features. The unique ID cannot be Glean's client_id, because it is not currently exposed to consuming applications. It is also not desirable to expose Glean's client_id due to data integrity concerns, and to prevent accidental misuse.

Applications therefore need to generate (or derive) a new unique ID used for experimentation/enrolment, or rely on an existing one such as the Firefox Account ID. Defining this ID in our applications is desired anyway as it allows for more advanced use cases such as running a single experiment across multiple applications.
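As a rough illustration of the two options above (generate a fresh ID, or derive one from an existing identifier), here is a minimal sketch. The function names, the `namespace` parameter and the choice of SHA-256 hashing are assumptions made for the example; nothing in this bug prescribes how the ID should be derived.

```python
import hashlib
import uuid


def generate_experimentation_id() -> str:
    # Option 1: a fresh random ID that is unrelated to Glean's client_id.
    return str(uuid.uuid4())


def derive_experimentation_id(account_id: str, namespace: str = "experimentation") -> str:
    # Option 2 (assumption): hash an existing identifier together with a fixed
    # namespace, so the same account maps to the same ID in every application
    # without the raw account ID being used directly.
    digest = hashlib.sha256(f"{namespace}:{account_id}".encode("utf-8"))
    return digest.hexdigest()


random_id = generate_experimentation_id()
derived_id = derive_experimentation_id("hypothetical-account-id")
```

Either way the result is a plain string, which keeps the ID usable both for the Cirrus request and for whatever Glean ends up recording.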

This then prompts the requirement to include the new experimentation ID in all recorded Glean events. Otherwise, partitioning of data (or experiment analysis more generally) becomes impossible. In our Firefox clients, which rely on client-side Nimbus integration, the Nimbus SDK calls Glean.setExperimentActive(experiment) to achieve this connection. However, in this new integration scenario the clients are unaware of experiment details.

In our discussions, we concluded that adding some minimal new API surface to the Glean SDK would be ideal. If we left it up to each individual client development team to "manually" add this ID to all existing and future events, we would very likely end up with diverging names and implementation gaps, which would negatively impact data quality and therefore impede experimentation and analysis.

The API discussed so far was a simple call, e.g. glean.set_experimentation_id(id), to be used on the client.
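To make the intent of such a call concrete, here is a minimal sketch of the behaviour being asked for: the application sets the ID once, and every event recorded afterwards carries it automatically. `SketchGlean`, `record_event` and the event dictionary layout are hypothetical stand-ins for illustration and are not the Glean SDK's actual API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchGlean:
    """Hypothetical stand-in for a Glean-like client; not the real SDK."""

    experimentation_id: Optional[str] = None
    recorded: list = field(default_factory=list)

    def set_experimentation_id(self, experimentation_id: str) -> None:
        # The application sets this once, e.g. during start-up.
        self.experimentation_id = experimentation_id

    def record_event(self, category: str, name: str) -> dict:
        # Every event picks up the ID that is current at recording time,
        # so individual call sites never have to add it by hand.
        event = {
            "category": category,
            "name": name,
            "experimentation_id": self.experimentation_id,
        }
        self.recorded.append(event)
        return event


glean = SketchGlean()
glean.set_experimentation_id("alpha-beta-gamma")
event = glean.record_event("ui", "button_clicked")
```

The point of centralising this in the SDK is that the field name and its presence on each event are decided in one place rather than by each client team.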

We will have a follow-up discussion to verify whether we need a server-side API, but we felt that divergence is less of a concern there, since it's much easier to fix and roll out changes on the server. NB: Since server-side logic for recording events will likely run in the context of a user session, we can't rely on a "global" call to glean.set_experimentation_id. The client-side solution seems simple enough to address our biggest concerns, and we will discuss and file a follow-up enhancement for the server, if needed.
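To illustrate why a global setter does not fit the server case, here is a small sketch in which the ID travels with each session instead of living in process-wide state. `SessionRecorder` and `handle_request` are hypothetical names for the example only; no server-side API has been agreed in this bug.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRecorder:
    """Hypothetical per-session recorder; not an existing Glean server API."""

    experimentation_id: str
    events: list = field(default_factory=list)

    def record_event(self, name: str) -> dict:
        event = {"name": name, "experimentation_id": self.experimentation_id}
        self.events.append(event)
        return event


def handle_request(session_experimentation_id: str, event_name: str) -> dict:
    # Each request builds its own recorder from the session's ID, so
    # concurrent sessions cannot overwrite each other's ID the way a
    # single process-wide setter would allow.
    recorder = SessionRecorder(session_experimentation_id)
    return recorder.record_event(event_name)


first = handle_request("id-for-user-a", "page_view")
second = handle_request("id-for-user-b", "page_view")
```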

Assignee: nobody → tlong
Priority: -- → P1

Christian, I have a couple of follow-up questions regarding this.

First, what is the expected persistence of this information? Will the application set this with every execution or is it expected that Glean will persist this information once it is set?

Second, what are the expectations around the format of the identifier? Will this always be a UUID or does there need to be more flexibility for other forms of identifiers?

Flags: needinfo?(csadilek)

> First, what is the expected persistence of this information? Will the application set this with every execution or is it expected that Glean will persist this information once it is set?

Outside of any pending pings, I don't see a need for Glean to store this ID separately. I think client applications should set this ID as part of Glean initialization on startup / on load.

> Second, what are the expectations around the format of the identifier? Will this always be a UUID or does there need to be more flexibility for other forms of identifiers?

I think we should keep this more flexible. We have use cases for experiments running across multiple applications where we'll perhaps use (or derive) an ID from the Firefox Account. Would it be acceptable for this ID to just be a String? Looks like Cirrus defines it as a String too.

Please let me know if you disagree or have any concerns. Happy to discuss more!

Flags: needinfo?(csadilek)

Thanks Christian! I think that answers my questions and should be everything I need to know as I'm working on the implementation of this.

Blocks: 1850323
Blocks: 1850479
Attachment #9350600 - Flags: data-review?(chutten)

Comment on attachment 9350600 [details]
Data Collection Request

PRELIMINARY NOTES:

As an identifier, will this be sent with other identifiers? Will this bridge Cat3+ data with identifiers that aren't allowed to be used to reach that data?

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection can be controlled through the product's preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Travis Long is responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does the data collection use a third-party collection tool?

No.


Result: datareview+

Attachment #9350600 - Flags: data-review?(chutten) → data-review+
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED