Open Bug 1943642 Opened 1 month ago Updated 5 days ago

Implement JOG Dynamic Metric Registration (suitable for built-in addons)

Categories

(Toolkit :: Telemetry, enhancement)


People

(Reporter: mconley, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [hnt-trainhop])

We will need the ability to register new Glean metrics at runtime as part of the nascent effort to package New Tab as a built-in addon (and to allow it to update using the addon update mechanism).

Thankfully, it seems that the Glean gang already thought about this and designed an approach in bug 1662863. Now we just need to dust off that design and build it.

Having spoken with chutten briefly about this in late 2024, it's likely that the Glean folks already have their quarter's work cut out for them, which means that New Tab folks are likely going to be working on this implementation (with some coaching).

Summary: Implement FOG Dynamic Metric Registration (Web Extensions, Build Faster, Artefact Builds) → Implement JOG Dynamic Metric Registration (built-in addons)

Actually... is there anything to do here? I'm digging through bug 1698184, and it looks like this was already implemented? https://searchfox.org/mozilla-central/rev/f41e8cacb8e8ce254204e2526e98d49d024f1674/toolkit/components/glean/tests/xpcshell/test_JOG.js shows JOG in action.

chutten, am I misunderstanding? What's left to do for a built-in addon to take advantage of this?

Flags: needinfo?(chutten)
Summary: Implement JOG Dynamic Metric Registration (built-in addons) → Implement JOG Dynamic Metric Registration (suitable for built-in addons)
Blocks: 1938445

Well, it's this heading, and that design doc dates from a time before the Glean Dictionary, but, yes. What's needed is:

  1. A way for the addon to specify to JOG the metrics it intends to use (use glean_parser to generate the necessary jogfile-alike. Build in a means to pass that information at the appropriate time (and hopefully not on the main thread) to JOG)
    • This will ensure that FOG will know what's coming and be able to store and report the data when it does
  2. An API layer for the addon to use like Glean.myCategoryName.myMetricName.someOperation(args);
    • If this addon's running in a JSContext that has the Glean and GleanPings globals already, then great! Lucky built-in addon privilege. FOG should take care of this for you via named getters and WebIDL.
    • If not, you'll need to generate an API (likely in a way not dissimilar to how glean.js does it) for the addon to use which, behind the scenes, marshals the code to pierce its way over to a JSContext that does have the Glean and GleanPings globals to actually service the calls.
  3. An approach for explaining to the pipeline what sort of "thing" this addon is. Is it a library? An application? Each of the defined metrics will need columns in each of the defined pings' tables... what are they going to be named and where are they going to live?
    • Otherwise we'll go to all this effort and the data will either be dropped (unrecognized ping name) or stuffed into additional_properties (data coming in for an unrecognized metric name)
  4. A testing approach.
    • Maybe we get clever and build the addon standalone against glean.js for integration testing?
    • Maybe we use the jogfile-alike to autogen mocks for unit testing?
  5. Documentation

...that may be an incomplete list, but it's what came to mind when I tried to recall our conversation from last year on the subject.
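To make step 2's fallback concrete, here's a rough sketch of what a generated API layer could look like. Everything here is hypothetical (the function name `makeGleanFacade`, the shape of the metrics description, and the `marshal` callback are all invented for illustration); the real thing would be generated from the jogfile-alike and would marshal into a JSContext that actually has the Glean global.

```javascript
// Hypothetical sketch of step 2's fallback API layer: build a
// Glean-shaped facade from a jogfile-alike metrics description, where
// each operation is forwarded to a marshaling callback instead of
// touching Glean directly. All names here are invented for illustration.
function makeGleanFacade(metricsDef, marshal) {
  const facade = {};
  for (const [category, metrics] of Object.entries(metricsDef)) {
    facade[category] = {};
    for (const [name, { operations }] of Object.entries(metrics)) {
      const metric = {};
      for (const op of operations) {
        // Each call forwards (category, name, op, args) across the
        // boundary to wherever the real Glean global lives.
        metric[op] = (...args) => marshal(category, name, op, args);
      }
      facade[category][name] = metric;
    }
  }
  return facade;
}

// Usage with a recording stand-in for the real marshaling layer:
const calls = [];
const Glean = makeGleanFacade(
  { newtabContent: { click: { operations: ["record"] } } },
  (category, name, op, args) => calls.push({ category, name, op, args })
);
Glean.newtabContent.click.record({ position: 3 });
```

The point being: addon code gets to write `Glean.myCategoryName.myMetricName.someOperation(args)` either way, and only the plumbing underneath differs.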

Depends on: 1947194

The world has changed sufficiently since Comment #3 that it's not as helpful as it would've been. Consider this comment as a replacement.

Why? Because the newtab addon will remain in m-c and will ship in Nightly, and its only "addon-ness" from Glean's point of view will be that it can also be packaged and pushed to older versions of Firefox Desktop. This significantly simplifies the pipeline side of things because

  • metrics and pings for newtab will be defined and land in Nightly as per usual. The pipeline will happily accept data for those metrics and pings from earlier versions.
  • the newtab addon does not need to be considered as its own project or library or application with its own dataset namespace. It is currently and will remain a part of firefox_desktop.

There's still work to be done, mind you, but it's predominantly client-side work. In fact, as far as JOG, FOG, and Glean are concerned, the only task is:

  • Design, implement, and test a production API for runtime registration of metrics (and likely pings).
    • We have Services.fog.testRegisterRuntime{Metric|Ping} as templates, but the test prefix means that these APIs do not meet the performance, (crash) stability, or (API evolution) stability characteristics of the rest of the Glean API. They might be slow. They might crash. They might throw. And we might change the number, types, and meaning of the parameters at any time.
    • We need a production-ready non-test-prefixed version of this API for Firefox Desktop privileged system JS within the newtab built-in addon to call in situations when the addon isn't running against a version of Firefox Desktop that has that metric built in.
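For illustration only, the call pattern might end up looking something like the following. To be clear: the production API name (`registerRuntimeMetric`) and its parameters are invented here — designing them is exactly what this bug is about — and `Services`/`Glean` are stood in by minimal mocks so the sketch is self-contained.

```javascript
// Hypothetical sketch only: registerRuntimeMetric and its parameter
// list are invented; the real (non-test-prefixed) API hasn't been
// designed yet. Services and Glean are minimal mocks of Firefox's
// privileged globals.
const Glean = { newtab: {} }; // pretend newtab.blocked isn't built in here
const Services = {
  fog: {
    registerRuntimeMetric(type, category, name, pings, lifetime, extraArgs) {
      Glean[category] ??= {};
      Glean[category][name] = { type, pings, lifetime, extraArgs };
    },
  },
};

// The addon only registers when running against a Firefox that
// doesn't already ship the metric compiled in:
if (!Glean.newtab?.blocked) {
  Services.fog.registerRuntimeMetric(
    "counter", "newtab", "blocked", ["newtab"], "ping", "{}"
  );
}
```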

This will supply the necessary capability that home/newtab will need, but it won't be sufficient. We have other concerns:

  1. We need to concern ourselves with knowing when to register and when we can stop
    • Even if the addon ships with code that always calls if (!Glean?.myCategory?.myName) { register("my_category", "my_name"); } on every startup, even on versions where we know that my_category.my_name is compiled into the binary, we probably want to know when it's appropriate to remove that code. I assume there'll be a fixed and firm "minimum supported version" (perhaps everything >= the most recent ESR?), but in addition to that we'll need a rigorous and easy-to-use accounting of which versions shipped with which metrics.
    • This isn't a blocking concern, but a little effort here (perhaps a test that fails when it finds registration code for metrics present in all supported versions?) could save an untidy mess (like Histograms.json, histogram-allowlists.json) in the future.
  2. We need to concern ourselves with consistency
    • What will we put into place to ensure that the metric defined in metrics.yaml and shipping in Nightly is the exact same as the one registered via register("my_category", "my_name") which is collecting data in Release? In JavaScript especially it could be hard to catch type issues like defining a metric as a quantity but runtime-registering it as a string. It won't be until pings start arriving from old versions where the column metrics.counter.my_category_my_name is always NULL and the data for metrics.string["my_category.my_name"] is stuffed in additional_properties that we'll catch our error.
    • The data will still arrive, but capital-I Incidents have been called for less. And it's not an obvious place to look. I recommend solving this before shipping.
  3. We need to concern ourselves with evolution
    • Suppose we have a new metric type called labeled_labeled_counter which we intend to ship in, say, Fx139. If newtab adds a labeled_labeled_counter to Fx139 and ships the addon to Fx138... it won't be able to successfully register that metric at runtime. Not because the parameters are wrong or anything, but because the Glean SDK in Fx138 doesn't know what a labeled_labeled_counter is.
    • In the event of registration failure, therefore, we should both fail safe (no throwing/crashing. This isn't a test-prefixed API any more) and fail extra-safe (we need to register something with the category and name from the supplied definition. And we don't know what its APIs are gonna be (besides testGetValue which should throw, allowing local testing of old versions to uncover this lack), so maybe something like a Proxy?)
    • And, naturally, a health metric should record instances where this goes awry.
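The fail-extra-safe idea from point 3 — register *something* answering to the category and name, with testGetValue throwing — could look roughly like this Proxy sketch. The mechanism and names are illustrative only, not a proposal of the actual implementation:

```javascript
// Sketch of point 3's fail-extra-safe fallback: when runtime
// registration fails (e.g. the older SDK doesn't know the metric
// type), hand back a Proxy that silently swallows every operation,
// except testGetValue, which throws so local testing against old
// versions surfaces the problem. Shape and names are illustrative.
function makeInertMetric(category, name) {
  return new Proxy(
    {},
    {
      get(_target, prop) {
        if (prop === "testGetValue") {
          return () => {
            throw new Error(
              `${category}.${name} failed runtime registration; no value to get`
            );
          };
        }
        // Every other API (record, add, set, ...) becomes a no-op,
        // so production code neither throws nor crashes.
        return () => undefined;
      },
    }
  );
}

// Usage: operations are silently ignored, but tests catch the failure.
const metric = makeInertMetric("newtab", "fancy_new_thing");
metric.add(1);
metric.record({ reason: "whatever" });
```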

No doubt I'm missing some things, but I think this is the gist: versioning, evolution, stability, fail-safety, testing, confidence, and rigour.
