Open Bug 1628893 Opened 4 years ago Updated 2 years ago

Ensure new webext storage reports the same non-sync telemetry as the old storage.

Categories

(WebExtensions :: Storage, enhancement, P2)

enhancement
Points:
2

Tracking

(Not tracked)

People

(Reporter: tcsc, Unassigned)

References

Details

(Whiteboard: SACI)

Attachments

(2 files, 1 obsolete file)

Part of this is also determining the set of things that are currently reported. At a minimum, it seems like this includes histograms like WEBEXT_STORAGE_LOCAL_GET_MS, and a few similar, but I haven't dug deeply.

Note that SQLite telemetry we'll loose as part of moving from mozstorage connections to rusqlite connections is not covered here (It's covered by a different bug)

Linking to the bugzilla issue that is going to implement the new backend for storage.sync, and setting it as P2 until it is going to get assigned.

Depends on: 1623245
Priority: -- → P2

Actually, WEBEXT_STORAGE_LOCAL_* and friends are all about storage.local and not storage.sync, so it looks like this might be automatic.

We likely should extend this to gain confidence in the new system, though.

(In reply to Thom Chiovoloni [:tcsc] (please ni?) from comment #2)

Actually, WEBEXT_STORAGE_LOCAL_* and friends are all about storage.local and not storage.sync, so it looks like this might be automatic.

We likely should extend this to gain confidence in the new system, though.

In case it may be helpful to evaluate what you want to track, follows what storage.sync telemetry probes we used to have (removed in Firefox 76, Bug 1621742):

  • 3 histograms (see original definitions here):
    • STORAGE_SYNC_GET_OPS_SIZE: size of results of get() operations performed this subsession. The key is the addon ID.
    • STORAGE_SYNC_SET_OPS_SIZE: Track the size of set() operations performed by addons this subsession. The key is the addon ID.
    • STORAGE_SYNC_REMOVE_OPS: Track the number of remove() operations addons perform this subsession. The key is the addon ID.
  • and 3 scalars (see original definitions here:
    • storage.sync.api.usage.extensions_using: count of webextensions that have data stored in the chrome.storage.sync API.
    • storage.sync.api.usage.items_stored: count of items in storage.sync storage, broken down by extension ID.
    • storage.sync.api.usage.storage_consumed: count of bytes used in storage.sync, broken down by extension ID.

(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #3)

In case it may be helpful to evaluate what you want to track, follows what storage.sync telemetry probes we used to have (removed in Firefox 76, Bug 1621742):

Thanks - this begs a couple of questions:

  • How was this analyzed, and did the analysis provide any value? There's probably not much value in re-measuring things that weren't useful in practice :)
  • Is this analysis, or raw data, available anywhere today? I think TMO is very aggressive about killing data from old probes which is why I can't seem to find anything for those probes on beta.

(In reply to Mark Hammond [:markh] [:mhammond] from comment #4)

(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #3)

In case it may be helpful to evaluate what you want to track, follows what storage.sync telemetry probes we used to have (removed in Firefox 76, Bug 1621742):

Thanks - this begs a couple of questions:

  • How was this analyzed, and did the analysis provide any value? There's probably not much value in re-measuring things that weren't useful in practice :)

To be honest I haven't personally used that data very often (I may have used it just a couple of times and long time ago, if I'm not mistaken I was investigating a quota-related issue and I was interested into looking what was a common size for the set operation, but unfortunately I can't recall much more than that).

I'm not sure if Ethan did use it more often in the past and if he does have any opinion about how useful it was in practice.

I completely agree that just recreate what we had in the past isn't a great justification on its own and it may likely not turn out to not be as useful as we want, in a similar way it may not be enough to just replicate what we collect for storage.local, but all those probes are something to keep in mind while we discuss about this.

"looking to what to collect" based on "what we would have liked to know right now" may be a good start, e.g.

  • total amount of data stored by all the extensions using the storage.sync API (because it would have been a useful information to evaluate the data migration we are going to do now, and it maybe also a version of that keyed by addon id would be useful to keep eye on it from a per-extension quota perspective).

  • rate of the errors that prevents to sync the extensions data with the server side, also keyed by extension id (if that wasn't already going to be collected from the new syncing side of this API), and the kind of failure (e.g. quota exceeded, connections refused, http errors etc.).

  • error/success rate of the data migrations we are going to do (along with this it maybe also be reasonable to record if there was any actual data to migrate, and how much data we tried to migrate, and in case of errors which kind of error prevented the migration from happening).

  • the value of the new pref that will enable/disable the new implementation would be a useful one (e.g. part of the telemetry environment or collect its value as a scalar), because it would be useful to keep an eye on the clients stuck on the old version.

Other examples that Thom did mention in an separate email thread I've been added today:

  • Error/success rates for errors that occur outside of sync (ones that
    occur during a sync will be handled automatically).

This could be useful to know the rate of API calls that are failing and as a result the data may not be stored or retrieved as expected (e.g. because of a corrupted storage).

  • Timing of reads / writes (we currently have this for storage.local
    but not storage.sync).

In storage.local we measure the timing mainly because the amount of data stored may be big enough and even affect the browser performance, (e.g. because of the IPC)

I'm not entirely sure if we are as much worried about that for storage.sync (e.g. because the amount of data is supposed to not be that big, given the more restrictive quota applied on the server side) and I suspect that in many case the data stored in storage.sync may not be updated or accessed as frequently as the one in storage.local (frequently as in N times per seconds)

Nevertheless, it may still be a reasonable metric to collect, I may be wrong and some extensions may not be using the API as I'm describing (which is based on just a subset of extensions that I explicitly looked into in the past while investigating issues).

  • Is this analysis, or raw data, available anywhere today? I think TMO is very aggressive about killing data from old probes which is why I can't seem to find anything for those probes on beta.

I can definitely confirm that I wasn't able to get any data from the Measurement Dashboard (I used to be able from what I recall), briefly looked from Re:dash if I was able to find any data and there was some (I'll send you the "test queries" asap).

So, this (and related bugs) are now unblocked, and are still in progress. I wrote the following not knowing how long they would be blocked, but am posting it now in the hope that it makes the subsequent flurry of data-reviews go smoothly, and identify any large issues in advance. For clarity, this covers: this bug, bug 1629127, bug 1634483, and bug 1629116. Separate data reviews will be submitted for each, this is mostly to identify big issues in advance and ensure those go as smoothly as possible.


Currently, the SACI team is working on a replacement for the chrome.storage.sync web-extension API. This will need a decent amount of new telemetry, and while we were hoping to ask about it as it lands, various parts are still in-progress, and the telemetry is waiting on those. Because of this, we decided to get out ahead of things and ask for a pre-data-review (is there such a thing?) of the pieces we intend to measure.

The data comes in a few types:

  1. Migration telemetry.
  2. Additions to the sync ping.
  3. Histograms and scalars, replacing or extending the ones which were removed in bug 1621742.

Additions to the sync ping

Note: some migration data will also be added to the sync ping, see that section
for further information.

rustComponents list

We're in the process of replacing our desktop components with rust ones. We
would like to be able to filter and analyze telemetry based on the set of rust
components in use. Specifically, we would like this for error analysis to
determine which sorts of errors the rust components are experiencing/causing.

It's very likely that they will be behind preferences in some cases, and so the
browser version number is determine this.

{
    "version": 4,
    "type": "sync",
    // ...
    "payload": {
        "rustComponents": ["webext-storage"],
        // ...
    }
}

failureReason.rustError

Currently, rust code called from sync (for example, during bookmarks sync) tend
to emit errors via the otherError type, or the unexpectedError type. These
are hard to categorize as we have nothing but some string data from the error
body.

Rust errors have a errorcode available to us, that can be used to better map
distinguish errors. Additionally, while our rust code generally defines explicit
error codes, different crates do not coordinate error code values to ensure
global uniqueness. Because of this, a string indicating the source of the error
will be attached. This will happen at the same place the error is mapped to an
error code.

To support this, the list of failure reasons will be extended with a "rustError" type

This will have three properties:

  1. code: An integer error code mapped from error data, see https://github.com/mozilla/application-services/blob/webext-storage-ffi/components/webext-storage/ffi/src/error_codes.rs
  2. error: The sanitized error message, if available. This is the value we'd currently send in othererror.error.
  3. source: optional string describing the component which emitted the error, generally a crate name. For example, "webext-storage", "fxa-client". This is present in order for code to be meaningful, as without it, different components may (and do) use the same codes.

For example:

{
    "version": 4,
    "type": "sync",
    // ...
    "payload": {
        "syncs": [{
            // ...
            "failureReason": [{
                "type": "rustError",
                "source": "webext-storage",
                "code": 3,
                "error": "The database file is locked",
            }],
            // ...
        }]
        // ...
    }
}

Migration telemetry.

The current plan for this is to send information in the sync ping for sync
users, and a very broad health entry in the main ping as a histogram for
non-sync users.

Migration data in the sync ping

To the sync ping we add a new array to the payload: "migrations".

See: https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/data/sync-ping.html

{
    "version": 4,
    "type": "sync",
    // ...
    "payload": {
        // various existing data, `syncs`, `events`, etc...
        "migrations": [{
            "type": "webext-storage",
            // The number of records detected prior to migration.
            "numDetected": integer,
            // The number of records successfully migrated
            "numSucceeded": integer,
            // The number of records we failed to migrate.
            // `numFailed + numSucceeded` should equal `numDetected`.
            "numFailed": integer,
            // The amount of time the migration took in milliseconds
            "duration": integer,
            // True if any exceptions or `Result::Err`s were produced when
            // attempting to migrate.
            "anyFailures": boolean,
            // If migration failed, indicates why. Optional
            "failureReason": /* existing sync ping error instance */,
            // True if the existing table exceeded quotas (not an error
            // for migration)
            "quotasExceeded": boolean,
        }]
    }
}

This is largely based on https://github.com/mozilla-mobile/fenix/issues/5888.

The only major notable addition is that we include a quotasExceeded property.
We allow data in the DB to to exceed the quotas after a migration (and after a
sync, but that's not what this flag indicates), in which case we'll return for
insertion operations until the size of the store.

(The previous implementation had bugs around its handling here, and we believe
the quotas will be exceeded by user databases in practice in some cases).

Non-sync ping migration data

It's possible to use this API even if the user is not a sync user. In this case
we report overall status via a single telemetry probe.

"WEBEXT_MIGRATION_SUCCESS_PERCENT" : {
    "kind": "linear",
    "keyed": true,
    "high": 100,
    "n_buckets": 50,
    "description": "Percentage of records which were successfully migrated"
    // ...
},

It will only be recorded when migration occurs, and will only record the number
of records. A total failure unrelated to record counts is a zero.

This will not be sent for sync users both as to prevent correlation, and
because the data is available in the sync ping.

Other firefox telemetry

Histograms

New histograms

  • STORAGE_SYNC_QUOTA_REQUIRED is a histogram keyed by addon id that is
    emitted on quota errors, and reports the size that would have been required in
    order to not trigger a quota error. The intention is for us to potentially
    investigate increasing the quotas if it would be beneficial to users:

    "STORAGE_SYNC_QUOTA_REQUIRED": {
        "record_in_processes": ["main", "content"],
        "products": ["firefox"],
        "kind": "exponential",
        "high": 100000,
        "n_buckets": 30,
        "keyed": true,
        "description": "Tracks the size in bytes which would have been required when a quota error is hit. The key is the addon ID.",
        "releaseChannelCollection": "opt-out"
        // ...
    },
    
  • WEBEXT_STORAGE_SYNC_{GET,SET}_MS is the equivalent of
    WEBEXT_STORAGE_LOCAL_{GET,SET}_MS but for storage.sync. It records the
    amount of time it took to produce the result for a call to
    chrome.storate.sync.get. We would like to be aware of any major
    inefficiencies in our approach, as much of it will be used by other
    components.

    "WEBEXT_STORAGE_SYNC_GET_MS": {
        "record_in_processes": ["main", "content"],
        "products": ["firefox"],
        "kind": "exponential",
        "releaseChannelCollection": "opt-out",
        "high": 50000,
        "n_buckets": 100,
        "description": "The amount of time it takes to perform a get via storage.sync",
        "releaseChannelCollection": "opt-out"
        // ...
    },
    "WEBEXT_STORAGE_SYNC_SET_MS": {
        "record_in_processes": ["main", "content"],
        "products": ["firefox"],
        "kind": "exponential",
        "releaseChannelCollection": "opt-out",
        "high": 50000,
        "n_buckets": 100,
        "description": "The amount of time it takes to perform a set via storage.sync",
        "releaseChannelCollection": "opt-out"
        // ...
    },
    

Previously removed histograms

In https://bugzilla.mozilla.org/show_bug.cgi?id=1621742 these were removed.

  • STORAGE_SYNC_GET_OPS_SIZE is a histogram keyed by addon id that indicates
    the size of a value returned by chrome.storage.sync.get

    "STORAGE_SYNC_GET_OPS_SIZE": {
        "record_in_processes": ["main", "content"],
        "products": ["firefox"],
        "kind": "exponential",
        "high": 100000,
        "n_buckets": 30,
        "keyed": true,
        "description": "Track the size of results of get() operations performed this subsession. The key is the addon ID.",
        "releaseChannelCollection": "opt-out"
        // ...
    },
    
  • STORAGE_SYNC_SET_OPS_SIZE is a histogram keyed by addon id that indicates
    the size of a value passed to chrome.storage.sync.set

    "STORAGE_SYNC_SET_OPS_SIZE": {
        "record_in_processes": ["main", "content"],
        "products": ["firefox"],
        "kind": "exponential",
        "high": 100000,
        "n_buckets": 30,
        "keyed": true,
        "description": "Track the size of results of get() operations performed this subsession. The key is the addon ID.",
        "releaseChannelCollection": "opt-out"
        // ...
    },
    
  • STORAGE_SYNC_REMOVE_OPS_SIZE is a histogram counting the number of remove operations this subsession.

    "STORAGE_SYNC_REMOVE_OPS": {
        "record_in_processes": ["main", "content"],
        "products": ["firefox""],
        "kind": "count",
        "keyed": true,
        "description": "Track the number of remove() operations addons perform this subsession. The key is the addon ID.",
        "releaseChannelCollection": "opt-out"
        // ...
    },
    

storage.sync.api.usage Scalars

In https://bugzilla.mozilla.org/show_bug.cgi?id=1621742 these were removed.

They're being re-added more or less as they were.

storage.sync.api.usage:
  extensions_using:
    description: >
      The count of webextensions that have data stored in the chrome.storage.sync API.
      This includes extensions that have not used the storage.sync API this session.
      This includes items that were not stored this session.
      This scalar is collected after every sync.
    kind: uint
    keyed: false
    release_channel_collection: opt-out
    products:
      - 'firefox'
    record_in_processes:
      - main
    # ...
  items_stored:
    description: >
      The count of items in storage.sync storage, broken down by extension ID.
      This includes extensions that have not used the storage.sync API this session.
      This includes items that were not stored this session.
      This scalar is collected after every sync.
    expires: "64"
    kind: uint
    keyed: true
    release_channel_collection: opt-out
    products:
      - 'firefox'
    record_in_processes:
      - main
    # ...
  storage_consumed:
    description: >
      The count of bytes used in storage.sync, broken down by extension ID.
      This includes extensions that have not used the storage.sync API this session.
      This includes items that were not stored this session.
      This scalar is collected after every sync.
    expires: "64"
    kind: uint
    keyed: true
    release_channel_collection: opt-out
    products:
      - 'firefox'
    record_in_processes:
      - main
    # ...
Flags: needinfo?(chutten)

Very complete explanation of what you hope to collect. Please expand upon why the removed collections are needed (to answer which questions, for instance) in the individual data reviews.

One clarification about the sanitization of error codes I'd ask for is how it's being done. My read of this is that the error strings could contain arbitrary strings if they're coming from lower crates. (Maybe the db has difficulty serializing someone's bank account number). You should be a little more clear on what strings are expected and which are possible from this collection.

While I'm here, a note on TMO: it takes the most recent definition files from the tree and uses that to figure out which probes to display. The data's still around though and can be accessed using other self-service data analysis tools like sql.tmo.

A further, cautionary note about TMO: TMO should only ever be used for exploration purposes. Product decisions should not be made by looking at TMO as it has known failure modes for producing misleading data visualizations (most notably pseudo-replication where chatty users are overrepresented). We have a successor tool (GLAM) that does not suffer the same faults and will be replacing TMO in the not-too-distant future.

Flags: needinfo?(chutten)

Thanks Thom,
This is the first time I've seen this, so I've a few comments below - but in general it looks good!

(In reply to Chris H-C :chutten from comment #7)

One clarification about the sanitization of error codes I'd ask for is how it's being done. My read of this is that the error strings could contain arbitrary strings if they're coming from lower crates. (Maybe the db has difficulty serializing someone's bank account number). You should be a little more clear on what strings are expected and which are possible from this collection.

Yeah - I think the error codes will be problematic, and just make our existing bad situation worse. Error strings in particular are problematic because we already have loads of noise from error strings which include timestamps and guids, making it extremely difficult to know how often the underlying error actually occurs. Making things worse, this is going to duplicate errors which currently are fairly well defined. Eg, errors like "dns resolution failed" or http failure responses can be identified (without strings!) but adding rust codes here will mean there are now two distinct ways these errors will be recorded - making it even more difficult to say, eg, "x% of syncs fail due to [network/server] errors".

All our current dashboards have this issue, so I opened bug 1637798. We aren't going to be able to fix that before we land this, but I think it's worth trying to work out how to not make matters worse in this patch. Eg, if we really do end up with 2 ways of representing a http 500 error, let's try and make sure the "new" way is a way that makes sense for our existing JS code, and pings we send on mobile devices, to move to in the future.

Regarding migration: there seems to be a lot of data here which we will not care about (other than idle curiosity). For example, some of those items might identify that we are migrating from malformed databases - which isn't something we will ever mitigate. In general, we should try and limit what we collect to things where there's clear action we would take. I'd be inclined to suggest num_failed and duration is likely enough (lots of zeros or a high avg duration are probably actionable. Failure reasons probably aren't really, although maybe it does make sense to collect that with the general "error strings are fairly useless" caveats above.)

Somewhat related: For WEBEXT_MIGRATION_SUCCESS_PERCENT, I'd probably be more interested in "did we migrate successfully migrate anything?" Eg, failure to migrate some records is more likely to represent bad source data rather than a bad migration process, whereas high rates of failure to migrate anything seems more likely to be a problem with the process we are measuring?

Re "other telemetry" - I think that we only need to record in the parent process, but we should check that.

Re "Previously removed histograms" - it's not clear if you are asking for them to be reinstated? I assume so - but if so, I'm wondering if we need, say, both STORAGE_SYNC_GET_OPS_SIZE and the quota related ones? I'm not sure why we'd care about the size of request under the quota, nor how that could drive changes we might make given we have no real control over how the API is used.

A possible addition we should consider is to somehow capture the rate of requests - some way of determining whether addons misusing this API might be causing the browser to slow down and force us to implement some kind of throttling mechanism?

Bug 1628893 - Add histograms timing get/set operations for storage.sync r?markh

Assignee: nobody → tchiovoloni
Status: NEW → ASSIGNED

Ah, wish I had noticed marks comments, I'll work through them shortly. I'll have the data-review up for that PR shortly. It doesn't include scalars since that needs a little backend work.

This is the first time I've seen this...

I did ping you on it in slack...

Comment on attachment 9150846 [details]
Bug 1628893 - Add telemetry histograms for webext-storage-sync. r?markh

Request for data collection review form

All questions are mandatory. You must receive review from a data steward peer on your responses to these questions before shipping new data collection.

  1. What questions will you answer with this data?
  • How is the performance of operations made over our js/rust/xpcom bridge?
  • How many extensions are actively using sync storage?
  • How many read/write operations are extensions performing in a session, and how large are these operations?
  • etc.
  1. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:
  • The SACI team intends to land many more pieces of sync and accounts
    infrastructure written in rust, and we would like to know if the performance
    overhead of the current appproach is substantial. In an ideal world, we'd have
    a baseline to measure against, but to a certain extent objective measurements
    here will suffice, especially as they can be compared against the numbers for
    the non-sync storage backends.

  • We'd like to know usage information of this service, for the purpose of
    determining if certain changes that are within our ability to make would be
    beneficial for users.

  1. What alternative methods did you consider to answer these questions? Why were they not sufficient?

Some of this data was previously collected and was removed, however it is no
longer available. There's no real alternative to the timing information, as
it's a new system.

  1. Can current instrumentation answer these questions?

No.

  1. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

Note that the data steward reviewing your request will characterize your data collection based on the highest (and most sensitive) category.

Note: I'm not certain about the collection categories, as some of the histograms
are keyed on addon ID, which isn't really reflected in the category set. Maybe
somewhere between Interaction data and Web activity data would be appropriate. I
put "Interaction data" down, though, and it's possible I'm overthinking this

<table>
<tr>
<td>Measurement Description</td>
<td>Data Collection Category</td>
<td>Tracking Bug #</td>
</tr>
<tr>
<td>STORAGE_SYNC_GET_OPS_SIZE: a histogram keyed by addon id that indicates the size of a value returned by chrome.storage.sync.get</td>
<td>Interaction data, but see note above</td>
<td>1628893</td>
</tr>
<tr>
<td>STORAGE_SYNC_SET_OPS_SIZE: a histogram keyed by addon id that indicates the size of a value passed to chrome.storage.sync.set</td>
<td>Interaction data, but see note above</td>
<td>1628893</td>
</tr>
<tr>
<td>STORAGE_SYNC_REMOVE_OPS: a histogram counting the number of remove operations this subsession</td>
<td>Interaction data, but see note above</td>
<td>1628893</td>
</tr>
<tr>
<td>WEBEXT_STORAGE_SYNC_GET_MS: The amount of time it takes to perform a get via storage.sync</td>
<td>Interaction data</td>
<td>1628893</td>
</tr>
<tr>
<td>WEBEXT_STORAGE_SYNC_GET_MS: The amount of time it takes to perform a set via storage.sync</td>
<td>Interaction data</td>
<td>1628893</td>
</tr>
</table>

  1. How long will this data be collected? Choose one of the following:
  • I want this data to be collected for 6 months initially (potentially renewable).
  1. What populations will you measure?
  • Which release channels?

All.

  • Which countries?

All.

  • Which locales?

All.

  • Any other filters? Please describe in detail below.

None.

  1. If this data collection is default on, what is the opt-out mechanism for users?

Normal telemetry opt-out mechanism.

  1. Please provide a general description of how you will analyze this data.

Via telemetry.mozilla.org and redash.

  1. Where do you intend to share the results of your analysis?

Internally mostly, in meetings.

  1. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection? If so:

No, unless redash (stmo) counts.

  • Are you using that on the Mozilla backend? Or going directly to the third-party?

N/A

Attachment #9150846 - Flags: data-review?(chutten)

Note: I believe none of the concerns raised by mark apply to this patch in particular, although it will apply to future patches described in that preliminary review above.

It was pointed out that this is lacking tests. That's a totally fair complaint and I'll have tests for it up ASAP (probably tomorrow morning, realistically).

(The only defense for it that I can offer is that the old code it replaces didn't really have any I could find, and that testing telemetry is kind of inherently tricky).

(In reply to Thom Chiovoloni [:tcsc] (please ni?) from comment #13)

It was pointed out that this is lacking tests. That's a totally fair complaint and I'll have tests for it up ASAP (probably tomorrow morning, realistically).

(The only defense for it that I can offer is that the old code it replaces didn't really have any I could find, and that testing telemetry is kind of inherently tricky).

Hi Thom,
Shane's drive-by comment on phabricator is just meant to be a reminder that the patch should also provide some tests to be considered complete (it wasn't meant to make you feel guilty, just remind what makes the patch ready to be fully reviewed).

The test is useful to us in the first place to be sure that we are recording the telemetry when we are supposed to and that it will not regress without being noticed, and it also help us to catch earlier (during a "version bump simulation") when the telemetry is going to expire, and allow us to have some time to re-evaluate the probe (and decide if it should just be extended, or removed, or changed to become more useful).

It is definitely possible that some older code did not have tests on all the telemetry probes, but most of the webextensions probes are actually already being tests in automated tests, and so there are existing tests that you can use to know how to achieve some of the tricky parts, and some times ago TelemetryTestUtils.jsm provides some test helpers as well (and you can use our help to figure out what may not be clear just by looking to the existing code).

There are a number of existing test files that may be useful to you to get some ideas about how to test the telemetry probes on extensions, one that may be more immediately useful to you to look is likely test_ext_storage_telemetry.js (and a search like this one on searchfox may provide you some additional examples).

As an additional side note, the patch that adds the telemetry has to also be signed off from a data telemetry peer (along with the data-review sign off you already asked to chutten in comment 11).

Shane's drive-by comment on phabricator is just meant to be a reminder that the patch should also provide some tests to be considered complete (it wasn't meant to make you feel guilty, just remind what makes the patch ready to be fully reviewed).

It was more responding to a comment in a google doc -- I actually hadn't noticed the driveby. That said, I think you read too much into my comment, I'm not really discouraged by that or anything.

Thanks for your links regardless, though.

As an additional side note, the patch that adds the telemetry has to also be signed off from a data telemetry peer (along with the data-review sign off you already asked to chutten in comment 11).

Huh, okay. I've been working outside of the firefox tree for a while now (yay githhub), and don't think I've had to do this before. Any chance someone can point me in the direction of an appropriate peer?

(In reply to Thom Chiovoloni [:tcsc] (please ni?) from comment #15)

As an additional side note, the patch that adds the telemetry has to also be signed off from a data telemetry peer (along with the data-review sign off you already asked to chutten in comment 11).

Huh, okay. I've been working outside of the firefox tree for a while now (yay githhub), and don't think I've had to do this before. Any chance someone can point me in the direction of an appropriate peer?

Personally most of the times I do ask the sign-off on the patch side to :janerik, :chutten will know if there are others peers you can ask for it.

Unless you change how Telemetry works, Telemetry Peers aren't needed for review. Instrumentation (addition, removal, changes of data collections) are designed to be self-service in tree. If you'd like us to consult on best practices or help you with finding the documentation, then please do reach out.

Attached file data collection review

According to the Data Review process (https://wiki.mozilla.org/Firefox/Data_Collection) Data Reviews should be in attachments on bugs, and the data-review? flag placed on the review request.

Attachment #9152856 - Flags: data-review?(chutten)
Attachment #9150846 - Flags: data-review?(chutten)
Comment on attachment 9152856 [details]
data collection review

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is Telemetry so is documented in its definitions file [Histograms.json](https://hg.mozilla.org/mozilla-central/file/tip/toolkit/components/telemetry/Histograms.json) and the [Probe Dictionary](https://telemetry.mozilla.org/probe-dictionary/).

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

No. This collection will expire in Firefox 85.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

    Is the data collection request for default-on or default-off?

Default on for all channels.

    Does the instrumentation include the addition of any new identifiers?

No.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

Yes. :tcsc is responsible for renewing or removing the collection before it expires in Firefox 85.

---
Result: datareview+
Attachment #9152856 - Flags: data-review?(chutten) → data-review+

(In reply to Chris H-C :chutten from comment #17)

Unless you change how Telemetry works, Telemetry Peers aren't needed for review. Instrumentation (addition, removal, changes of data collections) are designed to be self-service in tree. If you'd like us to consult on best practices or help you with finding the documentation, then please do reach out.

Hi :chutten,
My apologies for my mistake then, we have been always asking a sign-off on the patch that adds the new telemetry and I assumed that it was required to confirm that the telemetry data collected does represents what described in the data review request and that there aren't any other potential issue from a data collection point of view (e.g. missing to notice that we may be collecting some arbitrary and potentially sensitive string, or collecting by mistake more than actually approved etc.).

(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #20)

Hi :chutten,
My apologies for my mistake then, we have been always asking a sign-off on the patch that adds the new telemetry and I assumed that it was required to confirm that the telemetry data collected does represents what described in the data review request and that there aren't any other potential issue from a data collection point of view (e.g. missing to notice that we may be collecting some arbitrary and potentially sensitive string, or collecting by mistake more than actually approved etc.).

Nope, no worries. We're happy to consult. Just letting you know it isn't necessary.

Hey Thom,
I was wondering if there is anything else I can do to help you with the changes needed to complete D76375.
Feel free to ping me on phabricator or needinfo-ing me on bugzilla if there are issues that are blocking you on D76375D76375
(or also on Matrix or slack if you discussing some details or a particular issue upfront my help).

Flags: needinfo?(tchiovoloni)

Sorry, it got put on the back burner in favor of a few other bugs, but they're done so I'll be getting back to this now.

Flags: needinfo?(tchiovoloni)
Attachment #9161687 - Attachment is obsolete: true
Attachment #9161687 - Attachment is obsolete: false
Attachment #9150846 - Attachment is obsolete: true
Severity: normal → N/A
Points: --- → 2
Assignee: nobody → chiovolonit
Status: NEW → ASSIGNED
Flags: needinfo?(mixedpuppy)
Assignee: chiovolonit → nobody
Status: ASSIGNED → NEW
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: