Closed Bug 1767336 Opened 2 months ago Closed 2 months ago

Consider introducing new telemetry probes for the extension startupCache usage

Categories

(WebExtensions :: General, task, P2)

task
Points:
2

Tracking

(firefox102 fixed)

RESOLVED FIXED
102 Branch
Tracking Status
firefox102 --- fixed

People

(Reporter: rpl, Assigned: rpl)

References

Details

(Whiteboard: [addons-jira])

Attachments

(3 files, 1 obsolete file)

As part of introducing support for the non-persistent background page (also known as "Event Page") we have introduced:

  • persistence into the startupCache for the listener of a number of additional API (in addition to the small subset, blocking webRequest and proxy listeners, which was already being done to support the "delayed background page startup" behavior)

  • persistence of the menu items created using the contextMenus API

To better assess additional followups related to the startupCache usage, like Bug 1755818 and Bug 1748535, we are considering adding the following new scalar telemetry probes:

  • total amount of data written into the startupCache
  • errors on reading data from the startupCache (keyed by error name), and may also be worth considering doing the same for errors raised on writing the data into the startupCache
  • total time to read and deserialize the data from the startupCache
Attached file data_review_bug1767336.md (obsolete) —

Hi Chris,
I'm attaching a data review request related to the telemetry probes we are planning to add as part of this bugzilla issue.

For the timestamps related to the extensions startupCache load, as also mentioned in the data review request, in the current version of the patch that I'm about to attach I've currently opted to record them using TelemetryTimestamps.jsm.

I was initially planning to add a telemetry scalar, but TelemetryTimestamps.jsm seems to be more specifically meant for tracking timestamps.

The downside of TelemetryTimestamps vs a Scalar is that TelemetryTimestamps doesn't have a definition file (like Scalars.yml) and so there wouldn't be an explicit definition to look at for the description of the two new timestamps.

I would be definitely happy to add a note in an rst page (as we did for prefs added to the telemetry environment) but it looks that at the moment none of the other timestamps is explicitly listed anywhere in our in-tree doc pages (and instead there is a link to a searchfox query that would also include these new ones in the patch I'm going to attach, https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/data/main-ping.html#simplemeasurements).

Let me know wdyt, and if TelemetryTimestamps isn't a good pick for this new timestamps, I'm happy to update the patch accordingly if that is the case (and I'm also happy to add some details in an rst page if you think we should add a list of timestamps in https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/data/main-ping.html, even just to create that list starting from these ones).

Attachment #9274820 - Flags: data-review?(chutten)

The "fun" thing with using TelemetryTimestamps is that they end up in simpleMeasurements. Getting the information from in there into a column in a dataset for an SQL query involves schema changes, and if you want it in the Measurement Dashboard you're looking at some fiddly changes in that infra as well.

So I wouldn't recommend using TelemetryTimestamps unless it's worth it to you to go through that effort. ((And in general we hoped to migrate simple measurements into Scalars as part of the old bug 1201837 initiative (before Glean made it a little more moot), so we here in Toolkit :: Telemetry don't regard patches like this with a particularly friendly mien : ) ))


According to how you're using it and what's in the Data Collection Review... are you mostly interested in the single duration of time between the start and end of the startup cache load? If so, may I recommend using Glean's timespan metric type? It has a very friendly API, that you can mirror to Telemetry with GIFFT ( the mirror will be a Scalar of kind uint which'll be the count of milliseconds. (The Glean timespan can be a finer resolution if you'd like). ) -- this seems like it might satisfy both the "timing-specific API" and "data ends up where you can analyze it" requirements.

(In reply to Chris H-C :chutten from comment #3)

The "fun" thing with using TelemetryTimestamps is that they end up in simpleMeasurements. ...

So I wouldn't recommend using TelemetryTimestamps unless it's worth it to you to go through that effort. ...


According to how you're using it and what's in the Data Collection Review... are you mostly interested in the single duration of time between the start and end of the startup cache load? If so, may I recommend using Glean's timespan metric type? ...

Thanks a lot :chutten, as I did mention in my comment I was kind of suspecting that based on the fact that it didn't seem to have been used to add more timestamps to the simpleMeasurements recently. I'm glad that I asked you explicitly.

Would it be good to add a explicit mention in the section related to simpleMeasurements? so that the docs would be pointing out explicitly that for new timestamps Glean's timespan (eventually paired with GIFFT if necessary) is the preferred approach (and to avoid to add new usage of TelemetryTimestamps as much as possible).

In the meantime, I'm going to take a look to Glean's timespan and GIFFT, and so I'm removing the data-review? and I'll put it back once I've updated patch and data review request accordingly.

In the meantime thanks again for these details and for the link to the docs!

Attachment #9274820 - Flags: data-review?(chutten)
Attachment #9274821 - Attachment description: Bug 1767336 - Record telemetry related to extension startupCache data size, read errors names and time to load timestamps. r?mixedpuppy! → Bug 1767336 - Record telemetry related to extension startupCache data size and read errors names. r?mixedpuppy!

Hi Chris,
I looked into introducing the first Glean metric in the WebExtensions internals, and I've attached a separate patch (https://phabricator.services.mozilla.com/D145419) which:

  • introduces a new metrics.yaml file in toolkit/components/extensions/metrics.yaml
  • define a new Glean timespan metric and mirrors it into a scalar
  • reworked the previous test case for the startup cache load time telemetry to covers the Glean metric and the mirrored scalar instead of the TelemetryTimestamps

I've added you as an additional optional reviewer because this is the first Glean metric we add in this part of the tree and I'd like to confirm that it is done correctly from a Telemetry / Glean perspective.
Feel free to redirect that review request if appropriate.
(As a side note I have left an inline TODO comment in the metrics.yaml scalar, as a reminder to replace the url listed in the data_reviews property of the Glean metric definition to the actual result of the data review).

This new data review request is a revised version of the previous one attached, the only parts that are being changed are the ones related to the Glean metric and mirrored scalar.

For the mirrored scalar I did set an expire version (set to "109"), following the suggestion from https://firefox-source-docs.mozilla.org/toolkit/components/glean/user/gifft.html#glean-interface-for-firefox-telemetry-gifft of setting an expire date after 7 versions on the mirrored scalar to consider removing it in favor of using only the Glean metric.

Let me know how this looks to you (and thanks again for the feedback and pointers you provided me in comment 3 yesterday).

Attachment #9274820 - Attachment is obsolete: true
Attachment #9274955 - Flags: data-review?(chutten)

Comment on attachment 9274955 [details]
data_review_bug1767336_rev2.md

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Luca Greco is responsible for the probes not expiring in Firefox 109.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does the data collection use a third-party collection tool?

No.


Result: datareview+

Attachment #9274955 - Flags: data-review?(chutten) → data-review+
Pushed by luca.greco@alcacoop.it:
https://hg.mozilla.org/integration/autoland/rev/c5976da1b860
Record telemetry related to extension startupCache data size and read errors names. r=mixedpuppy
https://hg.mozilla.org/integration/autoland/rev/62b41f3ffec2
Record extension startupCache time to load as a Glean metric mirrored into a telemetry scalar. r=chutten,mixedpuppy
Regressions: 1768482
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch

Did the pipeline bug get filed to get this new metrics file added to the pipeline? (This is an oft-overlooked step of adding a new metrics file)

Flags: needinfo?(lgreco)
Blocks: 1774218

(In reply to Chris H-C :chutten from comment #10)

Did the pipeline bug get filed to get this new metrics file added to the pipeline? (This is an oft-overlooked step of adding a new metrics file)

I have definitely missed that, I just filed it as Bug 1774218.

Thanks a lot for the reminder and needinfo!

Flags: needinfo?(lgreco)
You need to log in before you can comment on or make changes to this bug.