Open Bug 2043535 Opened 14 days ago Updated 3 days ago

Support the distribution name reset

Categories

(Data Platform and Tools :: Glean: SDK, enhancement, P1)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: baku, Assigned: chutten)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

There is a need to reset the distribution name stored in Glean's internal persistent storage (see bug 2027788).
At the moment, this is not possible. The only available workaround is to update the distribution name to an empty string.
This bug proposes either introducing a reset_distribution API or allowing None as a valid value for the name parameter in update_distribution.
I'm not fully aware of all the implications of this change, so if this approach is not appropriate, please let me know.

Type: defect → enhancement

original proposal for that API: https://docs.google.com/document/d/1TIZhpBeZcJSEnIZJwj0Cj9saL2Tfs0X6oi_4gFU35eM/edit?tab=t.0
Notably includes a short discussion on clearing the value.
From the doc:

Updating an attribution field with a None value will leave any existing value in-place. Attribution fields are stored persistently as though they have user lifetime.

and attached discussion:

jer: Do we need a way to unset them though? Can attribution change over (client-)time?
chutten: It would simplify testing, but outside of that I don't believe so. Except to perhaps (accidentally?) identify problems in the service maintaining attribution data between runs
jrconlin: What about in the case of an error? (e.g. a value was specified that was later determined to be incorrect or misapplied?)
chutten: That's true. We might discover through data a mistake in attribution instrumentation then ship an improved algorithm in a later version. We need to be able to overwrite later, and a subset of that is being able to clear.

seems we're at that stage now where we have a use case for clearing.

cc :chutten for visibility

Flags: needinfo?(chutten)

Yup, we can get this added. When (which versions, dates) do you need it for?

Flags: needinfo?(chutten) → needinfo?(amarchesini)

Luckily, KPI queries are built on top of legacy telemetry. But sooner is better, so that Glean and legacy telemetry stay in sync after the MozillaOnline user migration.

Flags: needinfo?(amarchesini)

Okay, so design time.

Current Status:

In the Glean SDK, update_{att|dist}ribution uses None fields to signal "We're only updating some of these fields (the ones with Some(...)). The others (the ones with None) we are leaving alone."

In FOG, the JS FOG API is presently using xpidl's behaviour of defaulting to void/empty nsCString for unsupplied arguments. FOG then coerces empty nsCString values to None.
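To make the current behaviour concrete, here is a minimal sketch of the two layers described above. The `Distribution` struct, its `update` method, and the `coerce_empty` helper are hypothetical stand-ins written for illustration; they are not the Glean SDK's or FOG's actual types or signatures, and "MozillaOnline" is only a sample value.

```rust
// Hypothetical stand-in for the persisted distribution data.
// Not the Glean SDK's actual type.
#[derive(Debug, Default, Clone, PartialEq)]
struct Distribution {
    name: Option<String>,
}

impl Distribution {
    // Partial-update semantics as described in the comment above:
    // `Some(v)` overwrites the stored value, `None` leaves it alone.
    fn update(&mut self, new: Distribution) {
        if let Some(name) = new.name {
            self.name = Some(name);
        }
    }
}

// Models the FOG-side coercion (an assumption about its shape):
// an empty or unsupplied string argument becomes `None`.
fn coerce_empty(s: &str) -> Option<String> {
    if s.is_empty() {
        None
    } else {
        Some(s.to_string())
    }
}

// Usage: neither `None` nor a coerced empty string can clear the value.
fn demo_current_status() -> Distribution {
    let mut stored = Distribution {
        name: Some("MozillaOnline".to_string()),
    };
    stored.update(Distribution { name: None });
    stored.update(Distribution { name: coerce_empty("") });
    stored
}
```

Under these assumptions, a caller going through the coercing layer has no input that removes the stored name, which is the gap this bug is about.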

Options

  1. We could be a little more clever about FOG's JS API design for update{Att|Dist}ribution to support updating specific fields to an empty string "".
    • This doesn't satisfy the request, as it wouldn't clear values in a way that would turn into NULL in SQL, but would be clear semantically.
    • But at that point we might as well set "<not MozillaOnline any more>" as it'd have the same effect (a sentinel value).
    • Fixing the API might be a reasonable task outside of this request, as the API is a little clunky and does artificially restrict the acceptable values by treating "" as a sentinel for None.
  2. Add clear_{att|dist}ribution which clears the stored values.
    • Easy to understand and straightforward to use.
    • Necessarily makes the attribution and distribution fields act differently from typical string metrics (which cannot be cleared). Cannot be implemented against the public metric API.
  3. Change the meaning of None in the update_{att|dist}ribution API to mean "please clear this field".
    • Uses the existing API
    • Breaking change
    • How would we enable partial updates? (Do we even need that lever?)
  4. Actually, come to think on it, setting "<not MozillaOnline any more>" (or a more sensible indicator) would be quite a good thing to identify this population.
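The difference between options 2 and 3 can be sketched as follows. This is an illustration only: the `Attribution` struct, its two field names, and the methods `update`, `clear`, and `update_none_clears` are hypothetical and do not reflect an actual or planned Glean SDK signature.

```rust
// Hypothetical two-field stand-in for the stored attribution data.
#[derive(Debug, Default, Clone, PartialEq)]
struct Attribution {
    source: Option<String>,
    medium: Option<String>,
}

impl Attribution {
    // Existing semantics: `None` means "leave this field alone".
    fn update(&mut self, new: Attribution) {
        if new.source.is_some() {
            self.source = new.source;
        }
        if new.medium.is_some() {
            self.medium = new.medium;
        }
    }

    // Option 2: a separate call that clears every stored field.
    // `update` keeps its meaning, so existing callers are unaffected.
    fn clear(&mut self) {
        *self = Attribution::default();
    }

    // Option 3: `None` now means "clear this field". Every call replaces
    // all fields, so partial updates are no longer expressible.
    fn update_none_clears(&mut self, new: Attribution) {
        *self = new;
    }
}

// Usage: the same partial input under option 2 and option 3 semantics.
fn demo_options() -> (Attribution, Attribution, Attribution) {
    let start = Attribution {
        source: Some("google".to_string()),
        medium: Some("organic".to_string()),
    };
    let partial = Attribution {
        source: Some("bing".to_string()),
        medium: None,
    };

    let mut kept = start.clone();
    kept.update(partial.clone()); // medium survives

    let mut cleared = start.clone();
    cleared.clear(); // everything removed

    let mut replaced = start;
    replaced.update_none_clears(partial); // medium is wiped
    (kept, cleared, replaced)
}
```

In this sketch, option 2 leaves existing call sites untouched, while option 3 silently changes what a caller's `None` does, which is why it is listed as a breaking change.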

:baku, do you have opinions on which of these options (or another of your choice) would best suit your use case?

Flags: needinfo?(amarchesini)

I like option 2. It's "self-contained" (no impacts on existing methods), is easy to understand and to use. But anything works for me. Thanks!

Flags: needinfo?(amarchesini)
Assignee: nobody → chutten
Status: NEW → ASSIGNED
Priority: -- → P1
Depends on: 2046193