Closed Bug 1581647 Opened 5 years ago Closed 5 years ago

Add telemetry to mozregression

Categories

(Testing :: mozregression, enhancement)

Version 3
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: wlach)

Attachments

(2 files)

We should add some basic telemetry to mozregression, so we can monitor how often the app is used and use that to decide how many resources to put towards maintaining it. At minimum, it's probably worth sending events when:

  1. A bisection is started.
  2. A bisection is finished.

Interesting dimensions for this would probably be:

  • operating system
  • architecture
  • version of mozregression
  • whether this is the command-line version or the GUI version of mozregression
  • dates for the bisection parameters
  • clientid (can be mozregression specific)
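
To make the proposal above concrete, here is a minimal sketch of what one such event could carry. It does not use Glean or any real mozregression code: the `BisectionEvent` class, the `build_event` helper, and every field name are hypothetical and only mirror the dimensions listed above.

```python
import platform
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass
class BisectionEvent:
    """Hypothetical event payload; field names are illustrative only."""

    event: str  # "bisection_started" or "bisection_finished"
    variant: str  # "console" or "gui"
    mozregression_version: str
    client_id: str  # a mozregression-specific identifier
    good_date: Optional[str]  # bisection parameters, ISO 8601
    bad_date: Optional[str]
    operating_system: str
    architecture: str


def build_event(
    event: str,
    variant: str,
    version: str,
    client_id: str,
    good: Optional[date] = None,
    bad: Optional[date] = None,
) -> BisectionEvent:
    """Collect the dimensions for a single bisection event."""
    return BisectionEvent(
        event=event,
        variant=variant,
        mozregression_version=version,
        client_id=client_id,
        good_date=good.isoformat() if good else None,
        bad_date=bad.isoformat() if bad else None,
        operating_system=platform.system(),
        architecture=platform.machine(),
    )
```

Calling `asdict(build_event(...))` gives a plain dictionary that could be serialized and handed to whatever transport is chosen.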

We should use our existing data pipeline for processing this data into a BigQuery table. I doubt we need any derived datasets for it, since the volume of mozregression pings per day should be fairly low.

We can probably base our implementation at least partly on the build system telemetry.

My inclination would be to make this telemetry opt-out, but we should probably double-check that's OK with the data stewards before proceeding.

Depends on: 1592006

If we're going to do this, we might as well use Glean (for which Python bindings are coming soon) from the start.

I have started work on this! Promising initial progress as described in this blog post:

https://wlach.github.io/blog/2020/02/this-week-in-glean-special-guest-post-mozregression-telemetry-part-1/

One of the main blockers to wide deployment is the fact that much of the world still uses python2 to run mozregression. In particular, the mach and GUI variants only support python2.

  • Porting the GUI to python3 is covered by bug 1581633 (work is in progress!).
  • Mach is (I think!) covered by bug 1616584.
Assignee: nobody → wlachance
Status: NEW → ASSIGNED
Type: task → enhancement
Depends on: 1616584
Depends on: 1622909

Ok, I think we're fully formed enough to request data review. Executive summary:

  1. After this patch is submitted, mozregression (all variants) will collect telemetry, including the variant of mozregression used (GUI, mach command, or console) and the application bisected (Firefox, GeckoView example, Thunderbird, etc.).
  2. Telemetry is opt-out. You can opt out by changing a setting in the config file (or by opening the advanced preferences in the mozregression GUI and unticking the telemetry checkbox).
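
As a rough illustration of the opt-out behaviour described above, the check could look something like the sketch below. The actual setting name and config file format are not given in this bug, so the INI layout, the `[telemetry]` section, the `enabled` key, and the `telemetry_enabled` helper are all assumptions made for the example.

```python
import configparser


def telemetry_enabled(config_text: str) -> bool:
    """Return True unless the (hypothetical) config explicitly opts out."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Opt-out semantics: a missing section or key means telemetry stays on.
    return parser.getboolean("telemetry", "enabled", fallback=True)
```

The important design point is the default: an empty or missing config leaves telemetry on, and only an explicit `enabled = no` turns it off.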

You can see the exact glean data we're gathering in the linked pull request.

:chutten seems like a natural person to ask for review on this.

=== Data Collection Review Request ===

  1. What questions will you answer with this data?

Understanding how many people are using the different variants of the internal mozregression tool (GUI, mach, and console) and which applications they are bisecting.

  2. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements?

mozregression is an internal QA tool. It is expected that the data provided by telemetry will be used to prioritize:

  • How many resources are put into maintaining mozregression
  • Which aspects of mozregression we should put more effort into maintaining/updating

  3. What alternative methods did you consider to answer these questions? Why were they not sufficient?

We could probably get some crude estimates of usage by looking at HTTP logs of archive.mozilla.org and taskcluster, but this is likely to be pretty coarse and would require a fair bit of post-processing to understand and put into context. It is also quite likely that our methodology to do this would need to change over time, as our release process inevitably changes. Telemetry is the only reasonably cost-effective way of solving this problem.

  4. Can current instrumentation answer these questions?

See above.

  5. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

Category 1 (technical):

  • variant of mozregression used (currently GUI, mach, or console)
  • general information on application (operating system, architecture, etc.) as provided by a Glean ping

Category 2 (usage):

  • app being bisected (e.g. Firefox, GeckoView example)

  6. How long will this data be collected?

I (William Lachance) want to permanently monitor this data.

  7. What populations will you measure?

All users of mozregression.

  8. If this data collection is default on, what is the opt-out mechanism for users?

Either editing a config file or a GUI setting, depending on the variant of mozregression used. Instructions will be provided (see: https://github.com/mozilla/mozregression/pull/572).

  9. Please provide a general description of how you will analyze this data.

A simple dashboard will be created on stmo showing a rough breakdown of the user population and activity day-to-day. The number of dimensions is relatively small.

  10. Where do you intend to share the results of your analysis?

I plan on providing an initial report on my blog (https://wlach.github.io/).

  11. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection? If so:

No, just Glean-based telemetry.

Flags: needinfo?(chutten)
Depends on: 1624695
Attached file data collection review

Moving to attachment with data-review? per https://wiki.mozilla.org/Firefox/Data_Collection

Flags: needinfo?(chutten)
Attachment #9135795 - Flags: data-review?(chutten)
Comment on attachment 9135795 [details]
data collection review

Before I can proceed with the review I'll need a URL at which documentation of this collection will live. `glean_parser` autogenerates documentation that's often placed in the `docs/` folder of a repo, but I don't see that in the PR.

Attachment #9135795 - Flags: data-review?(chutten)

(In reply to Chris H-C :chutten from comment #6)

Comment on attachment 9135795 [details]
data collection review

Before I can proceed with the review I'll need a url at which documentation
of this collection will live. glean_parser autogenerates documentation
that's often placed in the docs/ folder of a repo, but I don't see that in
the PR.

Generation of documentation for these types of python projects seems to be an unsolved problem per https://github.com/mozilla/mozregression/pull/569#pullrequestreview-380189918

Given the simplicity of the collection here, I was suggesting using manually generated documentation. You can see the content here: https://github.com/mozilla/mozregression/pull/572/files (I linked to this in the review request). When merged, this documentation should be accessible here: https://mozilla.github.io/mozregression/documentation/telemetry.html

Let me know how I should proceed here.

Flags: needinfo?(chutten)
Comment on attachment 9135795 [details]
data collection review

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is presently documented in [mozregression's docs](https://mozilla.github.io/mozregression/documentation/telemetry.html).

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection can be disabled by command-line switch or configuration file option.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, :wlach is responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction.

Is the data collection request for default-on or default-off?

This data collection is default-on.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.

---
Result: datareview+

Flags: needinfo?(chutten)
Attachment #9135795 - Flags: data-review+

I wouldn't mind some clarification that mozregression is indeed collecting data under the Firefox Privacy Notice (and if not, under which Privacy Notice it is collecting data). Do you have a Trust/Legal review comment in a bug someplace? (usually done by :agray)

(In reply to Chris H-C :chutten from comment #10)

I wouldn't mind some clarification that mozregression is indeed collecting data under the Firefox Privacy Notice (and if not, under which Privacy Notice it is collecting data). Do you have a Trust/Legal review comment in a bug someplace? (usually done by :agray)

Not yet. The Firefox privacy notice does not seem to apply here, since mozregression isn't part of Firefox. I'd be happy to have the copy in https://github.com/mozilla/mozregression/pull/572/files reviewed by legal, though. Do you know how best to do that?

(In reply to William Lachance (:wlach) (use needinfo!) from comment #7)

Generation of documentation for these types of python projects seems to be an unsolved problem per https://github.com/mozilla/mozregression/pull/569#pullrequestreview-380189918

Documentation can already be generated, and it should be. You can run the glean parser manually; see my comments.

Given the simplicity of the collection here, I was suggesting using manually generated documentation. You can see the content here: https://github.com/mozilla/mozregression/pull/572/files (I linked to this in the review request). When merged, this documentation should be accessible here: https://mozilla.github.io/mozregression/documentation/telemetry.html

I'd recommend not doing this, as discussed over the PR: the format of the docs we generate is consistent across all products and was vetted by the various teams working in the Glean ecosystem. They convey most of the information required for both analysts and users to know what's going on. Some of these things are missing from the docs that you are submitting. Please do generate the docs using the glean_parser tool.

Flags: needinfo?(wlachance)

(In reply to William Lachance (:wlach) (use needinfo!) from comment #11)

Not yet. The Firefox privacy notice does not seem to apply here, since mozregression isn't part of Firefox. I'd be happy to have the copy in https://github.com/mozilla/mozregression/pull/572/files reviewed by legal, though. Do you know how best to do that?

Alicia's pretty good about seeing and responding to needinfo : )

Alicia, mozregression is a developer tool that Mozilla develops. It would like to collect some usage data using the Glean SDK. Mostly Cat 1/2 data as well as (I only now notice) a persistent user identifier (a client_id). What are the rules for how it should point to our privacy policies?

Flags: needinfo?(agray)

(In reply to Alessio Placitelli [:Dexter] from comment #12)

I'd recommend not doing this, as discussed over the PR: the format of the docs we generate is consistent across all products and was vetted by the various teams working in the Glean ecosystem. They convey most of the information required for both analysts and users to know what's going on. Some of these things are missing from the docs that you are submitting. Please do generate the docs using the glean_parser tool.

Ok great, I'm automatically generating this report now, and referencing it from the mozregression docs.

Flags: needinfo?(wlachance)

(In reply to Chris H-C :chutten from comment #13)

Alicia's pretty good about seeing and responding to needinfo : )

Alicia, mozregression is a developer tool that Mozilla develops. It would like to collect some usage data using the Glean SDK. Mostly Cat 1/2 data as well as (I only now notice) a persistent user identifier (a client_id). What are the rules for how it should point to our privacy policies?

Hi there. This seems like the same question that was fielded for the telemetry instrumentation in moz-phab. I'm NI'ing MFeldman here to help guide where he thinks this would fit best from a legal notices POV.

Flags: needinfo?(agray) → needinfo?(mfeldman)
Depends on: 1622883

Doesn't strictly depend on bug 1626086, but would be nice to have.

Also adding a "see also" for bug 1621025, which is doing similar things with Glean.

Depends on: 1626086
See Also: → 1621025
Flags: needinfo?(mfeldman)

(In reply to Alicia Gray from comment #15)

Hi there. This seems like the same question that was fielded for the telemetry instrumentation in moz-phab. I'm NI'ing MFeldman here to help guide where he thinks this would fit best from a legal notices POV.

I talked privately with Michael about this. Without going into detail about what was discussed, I'm going to take the following actions:

  1. Link to https://www.mozilla.org/en-US/privacy/websites/ inside the telemetry documentation.
  2. Note that I talked to legal in this comment I am currently writing.
Depends on: 1628320
Depends on: 1628340
No longer depends on: 1628340

This has landed. We still need to work through some PyInstaller problems with Glean before we can release, but we can track that separately; the main body of this is done.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED