Implement the new format for Event extra keys
Categories
(Data Platform and Tools :: Glean: SDK, task, P1)
Tracking
(Not tracked)
People
(Reporter: Dexter, Assigned: janerik)
References
Details
Attachments
(6 files, 1 obsolete file)
107 bytes,
text/x-google-doc
|
Details | |
42 bytes,
text/x-github-pull-request
|
Details | Review | |
48 bytes,
text/x-github-pull-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
This is a bug for implementing the recommendations from the proposal in bug 1613944 in the Glean SDK.
Reporter | ||
Comment 1•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Comment 2•3 years ago
|
||
Comment 3•3 years ago
|
||
Travis just linked me to this during a tangentially related conversation; AIUI the proposal is that existing probes will continue emitting extras
indefinitely, which means that data science will have to look in either place depending on the specific key we're looking for. I think there's an opportunity to make that simpler.
I wonder about something like:
- As mentioned in the proposal, emit all defined
extras
as string-typedevent_properties
instead. - Stop emitting
extras
in pings altogether. - In the compatibility views for the events tables, replace each event's
extras
withCOALESCE(extras, event_properties.string)
- Tell data scientists to use event_properties for everything and deprecate extras.
May not be worth the cost, but maintaining two separate ways to define and retrieve a string key associated with an event will increase the cognitive burden of working with events.
Assignee | ||
Comment 4•3 years ago
|
||
Thanks Tim for the feedback. I fully agree that as it currently stands it brings more challenges to analysis than it solves.
On top of that it has some major implementation problem. I'm posting an update on this later today.
I think this needs some re-evaluation before we can move on with it.
Assignee | ||
Comment 5•3 years ago
|
||
So as promised: I looked into implementing this and it's hard.
I tried to summarize the problematic points in this document.
I'd like your feedback there. Did I miss a way to implement this? Do you think any of the dismissed options is viable?
I added a bunch of open questions already and your feedback there is welcome too.
However I think if we start answering those we might bump this proposal back into the process to figure out the answers.
Assignee | ||
Comment 6•3 years ago
|
||
(Oh note from my side: I'm away after tomorrow, so will only pick up the discussion once I get back 2 weeks later)
Comment 7•3 years ago
|
||
It sounds to me like we should, at least for now, split the work into "Client API" changes and "Payload" changes.
There appear to be no insurmountable problems (only bikesheds) in the proposed API. The benefits are not in dispute. Everyone's aligned with the designed approach. Work on this can and should proceed.
Payload-side we have difficult problems, bikesheds, disputed benefits, and misalignment. So we need a proposal to move these forward.
(but maybe I'm biased because I only wanted the API changes in the first place)
Comment 8•3 years ago
|
||
Is it the case that we can make the API changes, and defer the payload/database/analysis changes (and at least be no worse off than we are no over there)? Or is understanding that fully likely to feed back to changes we need to make to the API?
Comment 9•3 years ago
|
||
Only one question, otherwise I'm onboard with your proposal... Since the payload/schema changes are the painful bit, would it be less painful to wait on this for schema-v2
, or is there some pressing need to have it before then?
Reporter | ||
Comment 10•3 years ago
|
||
(In reply to Travis Long [:travis_] from comment #9)
Only one question, otherwise I'm onboard with your proposal... Since the payload/schema changes are the painful bit, would it be less painful to wait on this for
schema-v2
, or is there some pressing need to have it before then?
I second what Travis said.
I'm fine with pursuing the client API changes if that's at all possible without the schema change. Note that Bea should chime in on the API design as well, IMHO, given the consistency we should have across the Glean clients.
Comment 11•3 years ago
|
||
Sorry for the delay in responding here.
My main concern is the proposed API changes. I think the proposed API works nicely for languages that support default arguments and key=value arguments, but not otherwise. Rust doesn't support either and Javascript doesn't support key value arguments.
In Javascript the API would look like:
app.opened.record(new AppOpenedExtra({ tabCount: 10 }));
Not really sure what the purpose of the AppOpenedExtra
class would be in this case.
In Rust, I see in [:chutten]'s comment that it would look like:
app::opened.record(AppOpenedExtra{tabCount: 10})
What happens when there are multiple extras? Is it something like this?
app::opened.record(AppOpenedExtra{tabCount: 10, oneMoreExtra: None, andAnotherOne: None})
I wonder if a slight spin on [:chutten]'s suggested builder pattern API would work:
app::opened
.withQuantityExtra("tabCount", 10)
.withStringExtra("oneMoreExtra", "hello")
...
.record()
Assignee | ||
Comment 12•3 years ago
|
||
(In reply to Beatriz Rizental [:brizental] from comment #11)
Sorry for the delay in responding here.
My main concern is the proposed API changes. I think the proposed API works nicely for languages that support default arguments and key=value arguments, but not otherwise. Rust doesn't support either and Javascript doesn't support key value arguments.
In Javascript the API would look like:
app.opened.record(new AppOpenedExtra({ tabCount: 10 }));
I think JS should be what it is now, exactly because it can use "anonymous" objects easily. It's already the most convenient API.
app.opened.record({ tabCount: 10 });
Not sure about ensuring the right names though, but that's something we have to live in JS I guess.
I think I should write out all the APIs first. I want the most convenient APIs per language, not necessarily the closest match between languages.
In Rust, I see in [:chutten]'s comment that it would look like:
app::opened.record(AppOpenedExtra{tabCount: 10})
What happens when there are multiple extras? Is it something like this?
app::opened.record(AppOpenedExtra{tabCount: 10, oneMoreExtra: None, andAnotherOne: None})
or
app::opened.record(AppOpenedExtra{tabCount: 10, .. Default::default() })
but yes, not optimal. The builder pattern could make this better.
I wonder if a slight spin on [:chutten]'s suggested builder pattern API would work:
app::opened .withQuantityExtra("tabCount", 10) .withStringExtra("oneMoreExtra", "hello") ... .record()
That would require the user to use the right extra type and not mistype the extra name. IMO that makes the API worse.
What would work is this:
app::opened.record(Extras::new().with_tab_count(10).with_one_more_extra("hello"))
Not tested and might need even more code generation and new (wrapper) types:
app::opened.with_tab_count(10).with_one_more_extra("hello").record();
I like the first one better.
I'll take this to write out the APIs I see per language and then we can discuss that.
Comment 13•3 years ago
|
||
(In reply to Jan-Erik Rediger [:janerik] from comment #12)
I think JS should be what it is now, exactly because it can use "anonymous" objects easily. It's already the most convenient API.
app.opened.record({ tabCount: 10 });
Not sure about ensuring the right names though, but that's something we have to live in JS I guess.
I think I should write out all the APIs first. I want the most convenient APIs per language, not necessarily the closest match between languages.
In Typescript specifically it's possible to ensure the names. Even in Javascript the user's IDE might pick up the fields declared in the types declarations and provide some code completion in this regard. Same for type checking per property.
[...]
or
app::opened.record(AppOpenedExtra{tabCount: 10, .. Default::default() })
Ah, of course. I forgot about Default
.
but yes, not optimal. The builder pattern could make this better.
[...]
That would require the user to use the right extra type and not mistype the extra name. IMO that makes the API worse.
I agree. It's the extra code generation that comes with the "one method per extra" that made me suggest this, though.
What would work is this:
app::opened.record(Extras::new().with_tab_count(10).with_one_more_extra("hello"))
Not tested and might need even more code generation and new (wrapper) types:
app::opened.with_tab_count(10).with_one_more_extra("hello").record();
I like the first one better.
I'll take this to write out the APIs I see per language and then we can discuss that.
Thanks!
Assignee | ||
Comment 14•3 years ago
|
||
Here we go: API showcases per language
ni?travis for Swift & Kotlin
ni?bea for JS
ni?chutten for C++/JS/Rust
ni?mdboom for Python
ni?dexter for Kotlin
If we agree on the APIs we can implement them one by one.
The payload will be unchanged.
All values will be converted to strings and stored that way.
We will only support integers, strings and maybe UUIDs (if the language has a type for that).
I think it's implementable mostly backwards compatible.
Comment 15•3 years ago
|
||
Rust:
- I'm not a fan of the initializer pattern
let extra = metrics::views::LoginOpenedExtra::new(1, None);
, and builder looks verbose. - I think the type's good enough
C++
- For
mozilla::glean::views::login_opened.Record({ .tabCount = 10 });
will it really implicitly constructSome(10)
for you? Looking at Maybe.h, I'm not sure what ctor it's using. - Even if we need to
{ .tabCount = Some(10) }
it's still fairly ergonomic (especially compared to the existingnsTArray<Tuple<ExtraKey, nsCString>>
) - Do we know if
mozilla::Maybe
FFIs nicely toOption
or something? We may be saved from parallel arrays yet!
JS (FOG)
- wrt the at-runtime type checks, we'll need to generate transformation code to get from the
jsval
into something either the C++ API can speak or something the FFI can speak. We can put the checks in there, returningNS_ERROR_INVALID_ARG
if things go sour.- Ideally we'll transform from
jsval
toLoginOpenedExtra
so we get back in line with "All JS APIs implemented atop the C++ APIs".
- Ideally we'll transform from
- Alternatively we could try to make it a
class
act, generating some JS types (or xpidl types) to use... but that's not something I have much experience with so maybe it's best to not do that.
I'm happy with these however they fall out. I only have quibbles and thoughts I'm thinking out loud.
Comment 16•3 years ago
|
||
// Probably some code that can check the object at runtime?
It's unclear to me how that would look. Runtime type checks, at least for Glean.js, happen in the recording function. I guess that would be the same for type checks related to the event extras. We could potentially record errors when an invalid value is passed?
For Typescript generated code we can do something like:
class EventMetricType<T extends ExtraMap = {}> extends MetricType {
record(extra?: T): void { ... }
}
And then generated code would look like:
const myMetricType = new EventMetricType<{ tabCount: number }>(...);
Pure Javascript I am not sure that is possible, maybe we can always provide type declarations even for Javascript generated code so that IDE's can do some auto completion...
Anyways, just rambling at this point. I am satisfied with this API, the caveats I mention above can be discussed at implementation time.
Assignee | ||
Comment 17•3 years ago
|
||
(In reply to Chris H-C :chutten from comment #15)
Rust:
- I'm not a fan of the initializer pattern
let extra = metrics::views::LoginOpenedExtra::new(1, None);
, and builder looks verbose.- I think the type's good enough
Yeah, though it requires specifiying None
for unset values. The good thing is: It could be extended with the builder pattern if we ever want to.
C++
- For
mozilla::glean::views::login_opened.Record({ .tabCount = 10 });
will it really implicitly constructSome(10)
for you? Looking at Maybe.h, I'm not sure what ctor it's using.- Even if we need to
{ .tabCount = Some(10) }
it's still fairly ergonomic (especially compared to the existingnsTArray<Tuple<ExtraKey, nsCString>>
)
I actually need to test it with mozilla::Maybe
, so far I only tested it with std::optional
(because easier to do). I'll check that once more.
- Do we know if
mozilla::Maybe
FFIs nicely toOption
or something? We may be saved from parallel arrays yet!
We still convert it to the (int, char*)
list before passing it over FFI, because that API exists and allows an arbitrary number of extras already. We can't generate FFI per event.
JS (FOG)
- wrt the at-runtime type checks, we'll need to generate transformation code to get from the
jsval
into something either the C++ API can speak or something the FFI can speak. We can put the checks in there, returningNS_ERROR_INVALID_ARG
if things go sour.
- Ideally we'll transform from
jsval
toLoginOpenedExtra
so we get back in line with "All JS APIs implemented atop the C++ APIs".- Alternatively we could try to make it a
class
act, generating some JS types (or xpidl types) to use... but that's not something I have much experience with so maybe it's best to not do that.
Same here, I think the best is to generate conversion code in JS itself and pass that into the available APIs.
As this is done in the template I'm less worried about it.
(In reply to Beatriz Rizental [:brizental] from comment #16)
// Probably some code that can check the object at runtime?
It's unclear to me how that would look. Runtime type checks, at least for Glean.js, happen in the recording function. I guess that would be the same for type checks related to the event extras. We could potentially record errors when an invalid value is passed?
If we can generate better types for TypeScript that would be awesome.
For m-c however we can't do that, so conversion functions it is.
The user-facing API will still look the same, it changes when errors are visible I guess.
Comment 18•3 years ago
|
||
Python looks fine. (Just nit s/Option/Optional/
). We could use dataclasses instead in the future for a little more type-checking automation, but that requires Python 3.7, and we still support 3.6. I kinda prefer the non-magical approach in your example anyway.
Comment 20•3 years ago
|
||
(In reply to Jan-Erik Rediger [:janerik] from comment #14)
Here we go: API showcases per language
ni?travis for Swift & Kotlin
ni?bea for JS
ni?chutten for C++/JS/Rust
ni?mdboom for Python
ni?dexter for KotlinIf we agree on the APIs we can implement them one by one.
The payload will be unchanged.
All values will be converted to strings and stored that way.
We will only support integers, strings and maybe UUIDs (if the language has a type for that).
I think it's implementable mostly backwards compatible.
LGTM!
Assignee | ||
Updated•3 years ago
|
Comment 21•3 years ago
|
||
Comment 22•3 years ago
|
||
Assignee | ||
Comment 23•3 years ago
|
||
This drops the old API fully and implements support only for the new object-focused API,
regardless of type
defintions in metrics.yaml.
The new API will look like the following examples:
Rust
let extra = AnEventKeys {
extra1: Some("a-value".into()),
..Default::default()
};
category::an_event.record(extra);
(Note: not optimal yet, but we can extend this with a builder-like
pattern later)
C++
AnEventExtra extra = { .extra1 = Some("value"_ns) };
category::an_event.Record(Some(extra));
JavaScript (actually unchanged!)
let extra = { extra1: "value" };
Glean.category.anEvent.record(extra);
(Note: The JavaScript API accepts strings, booleans and integers for any
extra value and will stringify them regardless of their specified type.)
Assignee | ||
Comment 24•3 years ago
|
||
Assignee | ||
Comment 25•3 years ago
|
||
Assignee | ||
Comment 26•3 years ago
|
||
Depends on D116376
Updated•3 years ago
|
Comment 27•3 years ago
|
||
Pushed by jrediger@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/47c9e72b002e Update to Glean 39.0.0. r=chutten https://hg.mozilla.org/integration/autoland/rev/42a4427ba9e6 Update glean-parser to v3.4.0. r=chutten https://hg.mozilla.org/integration/autoland/rev/b64994205a5e Implement new event extra API. r=chutten
Comment 28•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/47c9e72b002e
https://hg.mozilla.org/mozilla-central/rev/42a4427ba9e6
https://hg.mozilla.org/mozilla-central/rev/b64994205a5e
Description
•