Closed Bug 1328974 Opened 3 years ago Closed 3 years ago

chrome.storage.sync should register metrics for usage

Categories

(WebExtensions :: General, defect)

defect
Not set

Tracking

(firefox54 fixed)

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: glasserc, Assigned: glasserc)

References

(Blocks 1 open bug)

Details

(Whiteboard: triaged)

Attachments

(1 file)

It would be good to expose metrics via e.g. Telemetry that exposed how much use chrome.storage.sync was getting, in particular number of extensions using it, total number of items being used, amount of storage being used.
Flags: needinfo?(amckay)
Blocks: 1329715
Component: WebExtensions: Untriaged → WebExtensions: General
Flags: needinfo?(amckay)
Whiteboard: triaged
Assignee: nobody → eglassercamp
Attachment #8827212 - Flags: review?(benjamin)
Comment on attachment 8827212 [details]
Bug 1328974: Record metrics for chrome.storage.sync usage,

Cancelling review per IRC because this is missing pieces.
Attachment #8827212 - Flags: review?(kmaglione+bmo)
Attachment #8827212 - Flags: review?(benjamin)
Attachment #8827212 - Flags: review?(benjamin)
Comment on attachment 8827212 [details]
Bug 1328974: Record metrics for chrome.storage.sync usage,

https://reviewboard.mozilla.org/r/104962/#review106008

I think this is fine, but I'd like to re-review updated scalar docs.

::: toolkit/components/telemetry/Scalars.yaml:172
(Diff revision 2)
> +storage.sync.api.usage:
> +  extensions_using:
> +    bug_numbers:
> +      - 1328974
> +    description: >
> +      The count of webextensions using the chrome.storage.sync API.

I think this means "the number of active webextensions who used the chrome.storage.sync API in this session". But it would be good to clarify: if an installed extension has something stored in the sync API but doesn't actually call chrome.storage.sync, will it be counted? If a disabled or uninstalled addon has something stored, what happens then?

::: toolkit/components/telemetry/Scalars.yaml:183
(Diff revision 2)
> +    release_channel_collection: opt-out
> +  items_stored:
> +    bug_numbers:
> +      - 1328974
> +    description: >
> +      The count of items stored in storage.sync, broken down by extension ID.

Does this mean "the number of items actively stored during this session" or "the total number of items passively stored in sync for this addon"? Will this continue to measure items stored for disabled or uninstalled addons? At what point in the session is this recorded (presumably short sessions may not have this included at all)?
Attachment #8827212 - Flags: review?(benjamin) → review-
Comment on attachment 8827212 [details]
Bug 1328974: Record metrics for chrome.storage.sync usage,

https://reviewboard.mozilla.org/r/104962/#review106012

The implementation looks good, but I'll echo Benjamin's concerns about the data we're collecting, and add some of my own:

- I think we want to know about how many extensions are actively using sync storage, especially vs. how many have data stored there.
- It would be good to have (probably histogram) data on how many read/write operations add-ons are performing in a session, and how large those operations are.
- It would also be good to have an idea of how much data is in storage which has not been recently accessed. Possibly a histogram of the size of existing data, in buckets based on when it was last accessed.
- We should probably report whether a given use is using a custom server, and collate usage data based on that.
- The collection keys we're using are user-specific, so the only correlations we can draw from them are for the same user accessing the same data on multiple systems. That might be useful (or it might be a privacy concern), but I'm not sure it's the most useful way to collect the data. We sould probably be better off keying it by raw extension ID.

::: services/common/kinto-storage-adapter.js:180
(Diff revision 2)
>    "clearCollectionMetadata": `DELETE FROM collection_metadata;`,
> +
> +  "calculateStorage": `
> +    SELECT collection_name, SUM(LENGTH(record)) as size, COUNT(record) as num_records
> +      FROM collection_data
> +        GROUP BY collection_name;`,

Hm. Well, I guess this is consistent with the style in the rest of this file, but the `FROM` and `GROUP BY` clauses should really have the same indentation as each other, and usually as the `SELECT` clause as well.

::: services/common/kinto-storage-adapter.js:391
(Diff revision 2)
>    }
>  
> +  calculateStorage() {
> +    return this._executeStatement(statements.calculateStorage, {})
> +      .then(result => {
> +        return Array.from(result).map(row => {

Nit: No need for the `.map()`. That just creates a separate garbage array, and then throws it away. Instead, it should be something like `Array.from(result, row => ...)`

::: services/common/kinto-storage-adapter.js:392
(Diff revision 2)
>  
> +  calculateStorage() {
> +    return this._executeStatement(statements.calculateStorage, {})
> +      .then(result => {
> +        return Array.from(result).map(row => {
> +          return {

Nit: You can avoid some newlines and the return clause with something like:

    return this._executeStatement(statements.calculateStorage, {})
      .then(result => {
        return Array.from(result, row => ({
          collectionName: row.getResultByName("collection_name"),
          size: row.getResultByName("size"),
          numRecords: row.getResultByName("num_records"),
        }));
      });

Or, if we're OK using async functions in this file:

    let result = await this._executeStatement(statements.calculateStorage, {});

    return Array.from(result, row => ({
      collectionName: row.getResultByName("collection_name"),
      size: row.getResultByName("size"),
      numRecords: row.getResultByName("num_records"),
    }));

::: services/common/kinto-storage-adapter.js:438
(Diff revision 2)
>        });
>        yield Promise.all(promises);
>        yield conn.execute(statements.clearCollectionMetadata);
>      });
>    }
> +

Nit: Please remove.
Attachment #8827212 - Flags: review?(kmaglione+bmo)
Attachment #8827212 - Flags: review?(benjamin)
Attachment #8827212 - Flags: review?(benjamin)
Comment on attachment 8827212 [details]
Bug 1328974: Record metrics for chrome.storage.sync usage,

https://reviewboard.mozilla.org/r/104962/#review106012

- The current version of this patch uses the keyed scalars to measure which extensions have data stored, and keyed histograms to measure how actively the API is being used. Does that seem OK to you?
- Does `get(["a", "b"])` count as one operation, or two? The current version counts it as two.
- What kind of question are you hoping to answer with this data? This is going to be challenging to collect for a couple of reasons. For one, I don't store any kind of "age" information in the database or in memory. For another, Telemetry doesn't really support storing any kind of data+age information. We'd have to come up with a clever encoding to get something like this done. In the meantime, I've left this alone.
- We sort of don't work at all with custom servers at the moment, so any data reported is against official servers. I'd prefer to defer any work on this to bug 1311764.
- No, the collection keys we use in the metrics are true extension IDs. We only use the obfuscated extension IDs as the IDs for remote collections. However, I would like a second opinion about whether this represents a privacy concern. :bsmedberg, could you maybe offer an opinion?

> Nit: No need for the `.map()`. That just creates a separate garbage array, and then throws it away. Instead, it should be something like `Array.from(result, row => ...)`

Thanks, I keep forgetting about that.
Comment on attachment 8827212 [details]
Bug 1328974: Record metrics for chrome.storage.sync usage,

https://reviewboard.mozilla.org/r/104962/#review106008

I've revised the docs, and I've also responded to these questions in the review request, although this may count as spoilers for the patch.

> I think this means "the number of active webextensions who used the chrome.storage.sync API in this session". But it would be good to clarify: if an installed extension has something stored in the sync API but doesn't actually call chrome.storage.sync, will it be counted? If a disabled or uninstalled addon has something stored, what happens then?

I had to doublecheck, and I think my intention changed as I was working with this patch series. As written right now, this scalar counts the extensions that have data in storage.sync, so this won't count extensions that have called `get()` and then nothing else, but it will count extensions that have been disabled or uninstalled.

> Does this mean "the number of items actively stored during this session" or "the total number of items passively stored in sync for this addon"? Will this continue to measure items stored for disabled or uninstalled addons? At what point in the session is this recorded (presumably short sessions may not have this included at all)?

This will continue to measure items stored for disabled or uninstalled addons, as their storage is never cleared (at present). I've also added a sentence explaining when the metric is recorded (after syncing).
Comment on attachment 8827212 [details]
Bug 1328974: Record metrics for chrome.storage.sync usage,

https://reviewboard.mozilla.org/r/104962/#review107190

I'd still like metrics on how much stale/unused data we're storing, particularly for add-ons that are no longer installed, but I suppose this is enough for now.
Attachment #8827212 - Flags: review?(kmaglione+bmo) → review+
(In reply to Ethan Glasser-Camp (:glasserc) from comment #9)
> Comment on attachment 8827212 [details]
> Bug 1328974: Record metrics for chrome.storage.sync usage,
> 
> https://reviewboard.mozilla.org/r/104962/#review106012
> 
> - The current version of this patch uses the keyed scalars to measure which
> extensions have data stored, and keyed histograms to measure how actively
> the API is being used. Does that seem OK to you?

Yes, that looks good.

> - Does `get(["a", "b"])` count as one operation, or two? The current version
> counts it as two.

I could make an argument either way, but this seems fine.

> - What kind of question are you hoping to answer with this data? This is
> going to be challenging to collect for a couple of reasons. For one, I don't
> store any kind of "age" information in the database or in memory. For
> another, Telemetry doesn't really support storing any kind of data+age
> information. We'd have to come up with a clever encoding to get something
> like this done. In the meantime, I've left this alone.

I'm mainly hoping to understand how many extensions store data that they don't
actually use, or leave stale data behind rather than cleaning it up. And I'm
particularly interested in data left behind by extensions after they're
uninstalled.

As far as how to store it, it seems like something we should be able to
represent as a histogram, but I guess there isn't an easy way to actually
achieve it.

> - We sort of don't work at all with custom servers at the moment, so any
> data reported is against official servers. I'd prefer to defer any work on
> this to bug 1311764.

Sounds good, as long as we can add that information down the road without
causing problems for the existing dataset.
Comment on attachment 8827212 [details]
Bug 1328974: Record metrics for chrome.storage.sync usage,

https://reviewboard.mozilla.org/r/104962/#review107864

::: toolkit/components/telemetry/Histograms.json:10586
(Diff revision 5)
> +    "alert_emails": ["eglassercamp@mozilla.com"],
> +    "expires_in_version": "58",
> +    "bug_numbers": [1328974],
> +    "kind": "count",
> +    "keyed": true,
> +    "description": "Track the number of get() (i.e. read) operations addons make."

This description and all the other keyed metrics need to include details about the key. Addon ID?

::: toolkit/components/telemetry/Histograms.json:10586
(Diff revision 5)
> +    "alert_emails": ["eglassercamp@mozilla.com"],
> +    "expires_in_version": "58",
> +    "bug_numbers": [1328974],
> +    "kind": "count",
> +    "keyed": true,
> +    "description": "Track the number of get() (i.e. read) operations addons make."

I presume this is the number of get operations performed during this subsession? It's worth being specific about the timeframe.

::: toolkit/components/telemetry/Histograms.json:10596
(Diff revision 5)
> +    "bug_numbers": [1328974],
> +    "kind": "exponential",
> +    "high": 100000,
> +    "n_buckets": 30,
> +    "keyed": true,
> +    "description": "Track the size of results of get() operations."

I don't think you actually need both the count and the histogram. The histogram will show you both, because you can count the number of measurements. Same comment for the set count/histogram.
Attachment #8827212 - Flags: review?(benjamin) → review-
Comment on attachment 8827212 [details]
Bug 1328974: Record metrics for chrome.storage.sync usage,

https://reviewboard.mozilla.org/r/104962/#review106012

:bsmedberg points out that telemetry already sends information about what add-ons are installed, so we don't need to worry about sending that information again here.
Comment on attachment 8827212 [details]
Bug 1328974: Record metrics for chrome.storage.sync usage,

https://reviewboard.mozilla.org/r/104962/#review108270

data-r=me either way

::: toolkit/components/telemetry/Histograms.json:10583
(Diff revision 6)
>      "expires_in_version": "never",
>      "bug_numbers": [1276006],
>      "kind": "count",
>      "description": "Tracking the total number of opened Containers."
>    },
> +  "STORAGE_SYNC_GET_OPS_SIZE": {

These aren't marked opt-out, but I suspect you intended them to be. I suggest making them opt-out.
Attachment #8827212 - Flags: review?(benjamin) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/b6e9e58ac9d7
Record metrics for chrome.storage.sync usage, r=bsmedberg,kmag
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/b6e9e58ac9d7
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Product: Toolkit → WebExtensions
You need to log in before you can comment on or make changes to this bug.