New Metric Type: "Surface" aka 2D Distributions aka ...
Categories
(Data Platform and Tools Graveyard :: Glean Metric Types, enhancement, P1)
Tracking
(Not tracked)
People
(Reporter: chutten, Assigned: chutten)
References
(Blocks 1 open bug)
Details
(Whiteboard: [telemetry-parity])
Attachments
(4 files)
Proposal for changing an existing or adding a new Glean metric type
Who is the individual/team requesting this change?
:chutten, for Project FOG and on behalf of generic use cases
Is this about changing an existing metric type or creating a new one?
Creating a new (perhaps compound) metric type
Can you describe the data that needs to be recorded?
To answer a question like "How many tabs does Firefox usually have open?" we need buckets that are tab counts and whose values are timing distributions. (For this specific question we could get away with the bucket values being timespans instead, but as soon as you know how many tabs Firefox usually has open you'll want to know whether that time was spent all at once or in several small pieces over the measurement window.) This is a use case for continuous-continuous surfaces, where the number of tabs is a continuous distribution and the timing samples for each tab count are continuous distributions. Firefox Telemetry approximates this with some really odd uses of keyed exponential histograms. (And there's MEMORY_DISTRIBUTION_AMONG_CONTENT, which probably wishes it was a surface.)
(And then there's VIDEO_SUSPEND_RECOVERY_TIME_MS and VIDEO_HIDDEN_PLAY_TIME_PERCENTAGE and friends which are discrete-continuous-continuous volumes)
Other data that could be handled by this sort of construct are those described in bug 1657470 and bug 1657473. They describe concrete use-cases of discrete-discrete surfaces (here meaning that both axes are discrete/discontinuous/categorical as opposed to being continuous (yes, our timing/memory distributions are actually discrete but they're approximating a continuous distribution)).
Keyed Histograms in Firefox Telemetry are an example of data that could be described by a discrete-continuous surface. (whether this is a good idea or not is not an evaluation I am prepared to make at this time).
Can you provide a raw sample of the data that needs to be recorded? (This is in the abstract, and not any particular implementation details about its representation in the payload or the database.)
For a session where Firefox has some distribution of times with 0 tabs, 1 tab, etc, it might look like
time_spent_with_tab_count: {0: {1: 23, 2: 42, ...}, 1: {1: 56, 2: 67, ...}, ...}
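To make the shape of that sample concrete, here is a minimal sketch of an in-memory "surface": the outer key is the tab count, the inner key is a timing bucket, and the inner value is the number of samples in that bucket. The class name `Surface` and its methods are hypothetical illustrations, not part of the Glean SDK or Firefox Telemetry.

```python
from collections import defaultdict


class Surface:
    """Hypothetical 2D distribution: outer key -> inner bucket -> sample count."""

    def __init__(self):
        # Nested defaultdicts so unseen keys and buckets start at zero.
        self.buckets = defaultdict(lambda: defaultdict(int))

    def accumulate(self, outer_key, inner_bucket, count=1):
        """Add `count` samples to the given inner bucket under `outer_key`."""
        self.buckets[outer_key][inner_bucket] += count

    def snapshot(self):
        """Return a plain nested dict, matching the abstract sample's shape."""
        return {outer: dict(inner) for outer, inner in self.buckets.items()}


# Rebuild the explicit values from the abstract sample above.
surface = Surface()
surface.accumulate(0, 1, 23)
surface.accumulate(0, 2, 42)
surface.accumulate(1, 1, 56)
surface.accumulate(1, 2, 67)
print(surface.snapshot())
```

The nested mapping is only one possible representation; it says nothing about how such data would be laid out in the payload or the database.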
What is the business question/use-case that requires the data to be recorded?
Various, but browser engagement is the original driver.
How would the data be consumed?
GLAM could do some wicked-neato datavis stuff with surfaces, I'm sure, but I assume initial consumption will be of summary statistics of the values in each bucket displayed on something like re:dash. UDFs will almost certainly need to be written to help manage this.
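As a rough illustration of "summary statistics of the values in each bucket", the sketch below computes a sample count and a weighted mean for each outer bucket of a nested histogram. It is written in Python only as a stand-in for whatever UDFs would actually be needed, the function name `summarize` is hypothetical, and it assumes each inner bucket key can be used directly as a representative value for the samples it holds.

```python
def summarize(surface):
    """Return {outer_key: {"count": n, "mean": m}} for a nested histogram.

    Assumption: each inner bucket key is treated as the value of every
    sample recorded in that bucket.
    """
    summary = {}
    for outer_key, histogram in surface.items():
        total = sum(histogram.values())
        weighted = sum(bucket * count for bucket, count in histogram.items())
        summary[outer_key] = {
            "count": total,
            "mean": weighted / total if total else None,
        }
    return summary


# Illustrative input in the same shape as the abstract sample.
sample = {0: {1: 23, 2: 42}, 1: {1: 56, 2: 67}}
stats = summarize(sample)
for tab_count, values in stats.items():
    print(tab_count, values)
```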
Why are existing metric types not enough?
There does not exist anything that handles this well in either the Glean SDK or Firefox Telemetry.
What is the timeline by which the data needs to be collected?
Unknown.
Comment 1•3 years ago
We have a use-case that I believe would be well suited to a discrete-continuous surface, i.e. a keyed timing distribution.
We are measuring performance timings within the network component.
But because each network request is prioritized (for example, trackers are given very low priority) it would be ideal to have the timings keyed on their classOfService rather than having all values merged.
It's not a performance concern if low-priority tracking resources are slow (in fact it's intentional), so we'd prefer to see the distributions independently grouped by priority. (There are currently eleven classOfService flags).
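A minimal sketch of what "timings keyed on their classOfService" could look like: one histogram per label, so that slow low-priority requests never mix with the others. Everything here is an assumption for illustration: the class name `LabeledTimingDistribution`, the label strings, and the power-of-two bucketing are not the Glean SDK's API or its bucketing algorithm.

```python
from collections import defaultdict


def bucket_for(sample_ms):
    """Lower bound of a power-of-two bucket (illustrative bucketing only)."""
    if sample_ms < 1:
        return 0
    return 1 << (int(sample_ms).bit_length() - 1)


class LabeledTimingDistribution:
    """Hypothetical keyed timing distribution: label -> bucket -> count."""

    def __init__(self):
        self._by_label = defaultdict(lambda: defaultdict(int))

    def accumulate(self, label, sample_ms):
        """Record one timing sample under the given label."""
        self._by_label[label][bucket_for(sample_ms)] += 1

    def snapshot(self):
        """Return a plain nested dict of the recorded histograms."""
        return {label: dict(hist) for label, hist in self._by_label.items()}


# Hypothetical labels standing in for two classOfService values.
timings = LabeledTimingDistribution()
for label, ms in [("leader", 12), ("leader", 15), ("tail", 900), ("tail", 1500)]:
    timings.accumulate(label, ms)
result = timings.snapshot()
print(result)
```

Keeping one histogram per label means each priority class can be analyzed on its own, at the cost of one distribution's worth of storage per label.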
Comment 2•3 years ago
Another use-case in Necko is breaking down performance metrics by protocol version, i.e. timing the same sequence under http1, http2, and http3.
While we could make three probes for each measurement, a keyed histogram would make development and analysis easier.
Comment 7•1 year ago (Assignee)
There are two remaining steps to add this new metric type to the data platform, as documented here: https://mozilla.github.io/glean/dev/core/new-metric-type/platform.html
ni?myself to get to those today.
Comment 9•1 year ago (Assignee)
Actually, I'm going to skip adding labeled_{custom|memory|timing}_distribution support to lookml-generator at this time as I don't think there's a way for me to add support in a way that'd allow people to use data from those metric types correctly.
Comment 10•1 year ago
Hi Chris, looking at the patches it seems like labeled_timing_distribution's may be ready for use quite soon.
Is that right?
This would be very useful to us, e.g. bug 1907418 and bug 1908234.
Comment 12•1 year ago (Assignee)
(In reply to Andrew Creskey [:acreskey] from comment #10)
> Hi Chris, looking at the patches it seems like labeled_timing_distribution's may be ready for use quite soon.
> Is that right?
> This would be very useful to us, e.g. bug 1907418 and bug 1908234.
Depends on your definition of "quite soon", but it will certainly be sooner rather than later. The steps between here and these metric types being useful in mozilla-central for bugs like 1907418 and 1908234 are:
1. Metric implementation review and landing: https://github.com/mozilla/glean/pull/2896
  - It's a little chunky of a review because it required changing how all labeled_* metrics are constructed. Jan-Erik has already started looking at it.
  - After this is done, this bug will be resolved.
2. New Glean SDK release
  - This is usually routine
  - We're expecting to craft this release this week
3. Vendor the new Glean SDK release into mozilla-central
  - This is usually routine
4. Expose the new metric types in Firefox on Glean so they're usable in Firefox
  - This is necessary because e.g. Firefox is IPC-aware and has gecko-specific datatypes and requires C++ and JS APIs in addition to Rust.
  - This is usually routine. This one may be a little more complex because it's labeled.
  - I've already begun work on this using the under-review implementation from step 1, and will continue working on this in parallel
  - You can follow this work in, and optionally block your instrumentation bugs on, bug 1907945. This bug will be closed after Step 1
So we're definitely a lot closer to them being useful than we were just a little while ago. But there's still non-trivial amounts of work to be done, even if it's mostly routine.
Comment 13•1 year ago
Thank you; we're happy to wait as this will greatly simplify our collection.
Comment 14•1 year ago (Assignee)
chutten merged PR [mozilla-services/mozilla-pipeline-schemas]: bug 1657947 - New Glean metric types labeled_{custom|memory|timing}_distribution (#821) in 01ceb83.
(this actually landed 3 days ago)
Comment 15•1 year ago (Assignee)
chutten merged PR [mozilla/glean]: Bug1657947 New metric types: labeled_{custom|memory|timing}_distribution (#2896) in fd68d93.
This concludes the implementation work of these new metric types in the Glean SDK (Rust bindings only). Work will continue with cutting a release and vendoring it into m-c (no bug yet), then adding Firefox Desktop-specific APIs, features, docs, and tests to make it usable (bug 1907945) (I've already got labeled_custom_distribution working against a local release of the Glean SDK, so progress is chugging along nicely).