Closed Bug 1704846 Opened 4 years ago Closed 4 years ago

FOG: Implement `quantity` metrics in Firefox-on-Glean

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
90 Branch
Tracking Status
firefox90 --- fixed

People

(Reporter: nalexander, Assigned: chutten)

Details

(Whiteboard: [telemetry:fog:m6])

Attachments

(2 files)

Proposal for changing an existing or adding a new Glean metric type

Who is the individual/team requesting this change?

:nalexander, Install/Update team

Is this about changing an existing metric type or creating a new one?

It's about exposing an existing metric type, quantity.

Can you describe the data that needs to be recorded?

I wanted to record the exit status code for a command line program (a background task).

Can you provide a raw sample of the data that needs to be recorded (this is in the abstract, and not any particular implementation details about its representation in the payload or the database)

It'd be an integer between 0 and 127.

What is the business question/use-case that requires the data to be recorded?

We use exit codes to capture various failure states; it's important for our operational metrics to know what the various failure states we see in the wild are.

How would the data be consumed?

I guess we'd use GLAM to see proportions of failures to successes, and then to break down the types of failures.

Why existing metric types are not enough?

Well, they are, if you squint. We could convert the integers to strings, for example. Or we can do what I did, which is to make things coarser, and have two booleans, one for success and one for failure. But surely there are numbers that we actually want to record that aren't distributions or timespans or whatever.

What is the timeline by which the data needs to be collected?

I've worked around this so there's no timeline. But it's weird to see quantity in various docs, and in use in GeckoView, and then to find that it's not supported in Firefox-on-Glean.

We started allowing quantity everywhere a while back and I don't see a reason to not allow it in Firefox Desktop.
It can therefore be turned into an implementation bug (the RLB already supports it, so it's really just missing the implementation)

Component: Glean Metric Types → Telemetry
Product: Data Platform and Tools → Toolkit
Summary: Allow Glean `quantity` metrics in Firefox-on-Glean → FOG: Implement `quantity` metrics in Firefox-on-Glean
Whiteboard: [telemetry:fog:m6]
Priority: -- → P3

(In reply to Jan-Erik Rediger [:janerik] from comment #1)

We started allowing quantity everywhere a while back and I don't see a reason to not allow it in Firefox Desktop.
It can therefore be turned into an implementation bug (the RLB already supports it, so it's really just missing the implementation)

In addition to this, I wonder if there is use for a error_code metric type. Something that is a sparse distribution of numbers, that carries along meaning for some specific numbers... something like

avif:
  decode_error:
    type: error_code
    description: The error code returned by `libavif`.
    ...
    codes:
      0: success
      1: generic_error
      127: invalid_dimensions
      255: stream_failure

(This here? This is scope creep)

This'd be a specialization of labeled_counter, more or less, I guess? Anyhoo, just an idea. Implementing quantity is the obvious way forward.

Assignee: nobody → chutten
Status: NEW → ASSIGNED
Priority: P3 → P1

(In reply to Chris H-C :chutten from comment #2)

(In reply to Jan-Erik Rediger [:janerik] from comment #1)

We started allowing quantity everywhere a while back and I don't see a reason to not allow it in Firefox Desktop.
It can therefore be turned into an implementation bug (the RLB already supports it, so it's really just missing the implementation)

In addition to this, I wonder if there is use for a error_code metric type. Something that is a sparse distribution of numbers, that carries along meaning for some specific numbers... something like

avif:
  decode_error:
    type: error_code
    description: The error code returned by `libavif`.
    ...
    codes:
      0: success
      1: generic_error
      127: invalid_dimensions
      255: stream_failure

(This here? This is scope creep)

This'd be a specialization of labeled_counter, more or less, I guess? Anyhoo, just an idea. Implementing quantity is the obvious way forward.

I quite like this proposal! Can I can get my hands on the defined codes: from consumers in some way? If not, then I have a "multiple sources of truth" problem, which isn't novel but would be pleasant to avoid.

(In reply to Nick Alexander :nalexander [he/him] from comment #3)

I quite like this proposal! Can I can get my hands on the defined codes: from consumers in some way? If not, then I have a "multiple sources of truth" problem, which isn't novel but would be pleasant to avoid.

I'm not sure I understand "Can I [...] get my hands on the defined codes: from consumers in some way?". If you mean "will glean_parser generate enums in JS, C++, and Rust for me to use" then, boy howdy would we ever! (Firefox Telemetry does this in C++ for histograms of kind: "categorical" after all, and we wouldn't want to be left behind).

And if you're wondering if we have a bug about this, we do. No timeline on implementation, though having concrete use cases lined up to use it really helps us prioritize work.

Now, if you mean the reverse (that the code already uses an existing enum and you want Glean to figure out what it means), then we have a bit of a trial ahead. We could (for strongly-typed languages) generate APIs that take only the specific enum type. But then we'd need some way of knowing what each of those enum values mean.

For example, NS_OK is 0 and NS_ERROR_FAILURE is 2147500037. We could totally write an API that in Rust and C++ takes an nsresult and in JS takes an... int? nserror? Something clever, I'm sure. But at some point we need to know that 2147500037 maps to NS_ERROR_FAILURE. Either on the client, so we can map from the value to the string and then store and transmit the string. Or on the pipeline, so we can store and transmit the value, but map it to the string in tooling.

I'm not sure how to instruct a Python codegen step to read Gecko C++ to generate a map from nsresult value to a string. And that's for nsresult where every enum value is specified. I'm not sure how to teach Python how to read C++ enum declaration syntax. Probably a surmountable problem (I haven't googled "How do I get the enum name from its value C++" yet). But probably not easy (I don't wanna think about generated enums. Need to run the preprocessor first, ye gads).

But having a single source of truth is crucial. Otherwise someone'll make a mistake, or we'll add a new value, and it all falls apart.

The advantage of codes above is that anything that isn't in the codes list gets mapped to __other__ so at least you know when you need to update your second source of truth : |

IMO we should copy that discussion to the Labeled Metrics with staticly-defined Labels bug. It's pretty good reasoning why we would want it.


This bug however is about exposing what we already have, and :chutten has that soon ready.

Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c007d7b505a9 Add Quantity metric type to FOG (Rust) r=janerik https://hg.mozilla.org/integration/autoland/rev/ed8e2f8d8f6c Add Quantity metric type to FOG (C++, JS) r=janerik

Backed out 2 changesets (bug 1704846) for backout conflict with Bug 1685402. CLOSED TREE

Backout:
https://hg.mozilla.org/integration/autoland/rev/1fc80942c629854899e60dc5d8aef0d55d32a6e8

Flags: needinfo?(chutten)
Flags: needinfo?(chutten)
Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fce291ab0507 Add Quantity metric type to FOG (Rust) r=janerik https://hg.mozilla.org/integration/autoland/rev/7ddd2af76d34 Add Quantity metric type to FOG (C++, JS) r=janerik
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: