1598247 - Make it easier for applications to manage metrics emitted by dependencies

Reporter

Description

•

6 years ago

Glean makes it possible for re-useable component libraries to define and instrument their own metrics. An existing example of this is the sync-telemetry support component, but we're interested in instrumenting other re-usable components such as the app-services logins store.

This raises some interesting considerations around the handling of data-review and metrics opt-out for applications consuming these components.

We expect that any application with a Glean-using dependency anywhere in its dependency tree, will obtain data review for the collection of that dependency's metrics in the context of that app. IIUC the current mechanism for enforcing this is documentation-driven: glean-using components must include a notice like this one in their documentation to inform consuming apps about their responsibilities.

This is a good starting point, but seems like it could "fail open" in a number of ways:

Application developers may simply miss the notice in the docs and end up emitting surprise metrics, particularly if it's in a dependency that is several levels deep in their tree.
New versions of the component might add new metrics. Data review of that change risks missing some of the consuming applications, if we don't have a canonical list of all consumers. So applications updating to the new version might get unreviewed changes to their metrics.
The application may want to use the functionality of a component without emitting its metrics, perhaps because they're not relevant for that app, or because they wouldn't actually pass data review or because we want to give the user more granular choice. Allowing this currently requires the component to support some sort of metrics-enablement flag in its API, not visible to Glean.

That's all manageable, but it seems like an opportunity for Glean to provide some facility to make it easier to do the right thing, and harder to do the wrong thing.

Some ideas that got thrown around in slack, which may or may not actually be good ideas in practice:

Could the application-level metrics.yaml file have some easy way to explicitly opt-in to metrics from dependencies, and they otherwise don't get emitted?
Could the component-level metrics.yaml file somehow declare which applications have received data review for emitting these metrics?

Let's discuss! :-)

Alessio Placitelli [:Dexter]

Updated

•

6 years ago

Whiteboard: [telemetry:glean-rs:m?]

Alessio Placitelli [:Dexter]

Comment 1

•

6 years ago

(In reply to Ryan Kelly [:rfkelly] from comment #0)

We expect that any application with a Glean-using dependency anywhere in its dependency tree, will obtain data review for the collection of that dependency's metrics in the context of that app. IIUC the current mechanism for enforcing this is documentation-driven: glean-using components must include a notice like this one in their documentation to inform consuming apps about their responsibilities.

Thank you very much for filing this bug and taking time to talk about this! It was a great starting discussion over Slack :)

This is a good starting point, but seems like it could "fail open" in a number of ways:

We acknowledge that the current process can be error prone. We will definitely need to do better with respect to enforcing products to be intentional about what they are collecting.

Application developers may simply miss the notice in the docs and end up emitting surprise metrics, particularly if it's in a dependency that is several levels deep in their tree.

This is a great point and this will become a much bigger pain point once more products start using the Glean SDK. It would be great if we could detect "glean enabled dependencies" at build-time and fail if no data review was provided for them.

That would definitely make it clear to the product that data review is required, in addition to the documentation.

New versions of the component might add new metrics. Data review of that change risks missing some of the consuming applications, if we don't have a canonical list of all consumers. So applications updating to the new version might get unreviewed changes to their metrics.

When adding metrics to shared components (same as when adding metrics to the Glean SDK itself!), data review should be very specific about it being a "blanket review". See, for example, the review request for the baseline ping in the Glean SDK from bug 1512938 comment 2.

I personally think that components should not know about products using them. If some component needs to know that, maybe this is a good indication that the metric should be defined in the product, rather than in the shared component.

We had an example for that, with the activation ping: it was general enough for it to live in "Android Components", but Fenix was really the only product requesting it so it ended up being implemented there.

In case any controversial metric needs to be collected in a shared component, I think that this should be mentioned in the component changelog, for applications to consider when updating to new versions.

The application may want to use the functionality of a component without emitting its metrics, perhaps because they're not relevant for that app, or because they wouldn't actually pass data review or because we want to give the user more granular choice. Allowing this currently requires the component to support some sort of metrics-enablement flag in its API, not visible to Glean.

In general, this might have lots of implications along the whole stack: from the collection to the analysis. Let's break down the cases you suggested:

[metrics] they're not relevant for that app: in this case, this might be an indication that the metric itself should not live in the shared component but rather be a product specific metric. But maybe I'm missing something here: was there a specific case you were thinking of?
[metrics] they wouldn't actually pass data review: that's a good (and scary!) point. I'd expect metrics to not pass data-review at all for any product if there were PII issues. One potential case I could think of is sync, for example, adding some "personal id" to the built-in metrics ping in the Glean SDK (which sends the client id). This maybe wouldn't be a problem for some products, but might be for others. But, for that specific case, I'd suspect things would be sent on custom pings, which would solve the problem. Were you thinking of some specific case?
we want to give the user more granular choice: as Mike wrote in the Fenix issue: "This may have implications that make analysis much more complicated. For example, when correlating measurements that may or may not have been collected, will we have a way of telling whether something isn't there because it didn't happen or because the user opted out? I am not a data science, but I'd strongly encourage getting some in the loop on this.".

If products/components have concerns about sending telemetry (even if the user enabled telemetry), a reasonable way forward could be to release telemetry-free components. We have bug 1525904 for that ;)

That's all manageable, but it seems like an opportunity for Glean to provide some facility to make it easier to do the right thing, and harder to do the wrong thing.

Yes! This is definitely a good opportunity to "encode" best practices :) Thanks again for bringing this up!

Some ideas that got thrown around in slack, which may or may not actually be good ideas in practice:

Could the application-level metrics.yaml file have some easy way to explicitly opt-in to metrics from dependencies, and they otherwise don't get emitted?

Could the component-level metrics.yaml file somehow declare which applications have received data review for emitting these metrics?

I'd be up for a third option: what if we had a way, at build time, to detect glean-enabled components? We could fail at build time if we find some dependency using glean without an accompanying data-review.

See Also: → https://github.com/mozilla-mobile/fenix/issues/5626, 1525904

Ryan Kelly [:rfkelly]

Reporter

Comment 2

•

6 years ago

Were you thinking of some specific case?

I did not have any specific cases in mind for their of these.

I like the suggestion of an explicit "blanket review" in order to simplify things here. Thanks for providing a bit of prior art for reference on that front.

Ryan Kelly [:rfkelly]

Reporter

Updated

•

6 years ago

Updated

•

6 years ago

Whiteboard: [telemetry:glean-rs:m?] → [telemetry:glean-rs:m11]

Ryan Kelly [:rfkelly]

Reporter

Comment 3

•

6 years ago

The discussion in Bug 1600399 about which apps are covered by which privacy policy, makes we wonder whether it would be useful for Glean to reify the notion of a "policy" to intermediate between apps and components. You can imagine a data review for a component saying that it's OK for metrics to be collected in apps covered by privacy-policy "X" and declaring this in its metrics.yaml file. You can imagine an app declaring that it is covered by policy "X" as part of its glean config. And you can maybe imagine Glean doing something clever with those two pieces of information in order to do the right thing by default.

Just a thought-bubble that I wanted to share while it popped into my head :-)

Ryan Kelly [:rfkelly]

Reporter

Comment 4

•

6 years ago

Sharing for context, here's some ad-hoc logic we're looking to add in the logins component to give consuming apps a way to opt in to gathering telemetry from this dependency:

https://github.com/mozilla/application-services/pull/2325

I would love to remove this at some point in the future, in favour of a glean-native solution (but I'm hoping that it the ad-hoc logic will get us up and running in the meantime, and I don't have any expectation that a glean-native solution would look anything like this code in practice).

Alessio Placitelli [:Dexter]

Comment 5

•

6 years ago

(In reply to Ryan Kelly [:rfkelly] from comment #4)

Sharing for context, here's some ad-hoc logic we're looking to add in the logins component to give consuming apps a way to opt in to gathering telemetry from this dependency:

https://github.com/mozilla/application-services/pull/2325

As stated in the PR, this has implications for documentation, ingestion and analysis and I would encourage not to do that.

Is there a specific concern/case that this is trying to solve? Is this a requirement coming from a product? Can we have a meeting about this?

Flags: needinfo?(rfkelly)

Ryan Kelly [:rfkelly]

Reporter

Comment 6

•

6 years ago

Thanks for taking a look! I left a more detailed response here:

https://github.com/mozilla/application-services/pull/2325#discussion_r353938097

TL;DR: I think I may have described this poorly. The idea is not to have a general opt-in-or-out-at-any-time mechanism, but to give consuming apps a controlled way to "flip the switch" and enable the new metrics in their application after they've done the necessary data reviews etc. In other words, it's specifically focused on this concern from Comment 0:

Application developers may simply miss the notice in the docs and end up emitting surprise metrics,
particularly if it's in a dependency that is several levels deep in their tree.

Of course, it's possible that it's still a bad idea to do this! But hopefully that clarifies the scope/intent a little bit.

Flags: needinfo?(rfkelly)

Alessio Placitelli [:Dexter]

Comment 7

•

6 years ago

The conversation is spread across a number of PRs/issues (1, 2). Leaving the message here so that we can put pieces together, if needed.

Alessio Placitelli [:Dexter]

Updated

•

6 years ago

Whiteboard: [telemetry:glean-rs:m11] → [telemetry:glean-rs:m?]

Alessio Placitelli [:Dexter]

Comment 8

•

6 years ago

For our own sanity and future selves. From the various discussions, the problem boils down to, from what I understood:

how can we make sure products know that their components are collecting telemetry?
do we need a new data-review any time some new metric is added?

(1) can be addressed by failing at build time as described in comment 1, (2) will need to be determined by data stewards and products.

Alessio Placitelli [:Dexter]

Updated

•

6 years ago

Whiteboard: [telemetry:glean-rs:m?] → [telemetry:glean-rs:backlog]

Alessio Placitelli [:Dexter]

Updated

•

6 years ago

Comment 9

•

5 years ago

The answer to #2 is: The component does, but the product doesn't. We have a policy now.

The answer to #1 may end up being the Glean Dictionary.

But this bug still needs an answer for turning on/off collection per-component, which likely depends on component namespacing of metrics (bug 1578383)

Depends on: 1578383

Alessio Placitelli [:Dexter]

Comment 10

•

5 years ago

Closing. Explanation in comment 9.

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → WORKSFORME

Bugzilla

Make it easier for applications to manage metrics emitted by dependencies

Categories

(Data Platform and Tools :: Glean: SDK, enhancement, P3)

Tracking

(Not tracked)

People

(Reporter: rfkelly, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [telemetry:glean-rs:backlog])

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Updated

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Comment 8

Updated

Updated

Comment 9

Comment 10