Closed Bug 1621025 Opened 4 years ago Closed 4 years ago

Implement telemetry into moz-phab

Categories

(Conduit :: moz-phab, enhancement, P2)

Production
enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: glob, Assigned: zalun)

References

Details

(Keywords: conduit-triaged)

Attachments

(4 files)

Implement telemetry into moz-phab:

  • privacy/legal review
  • opt-in/out mechanisms
  • unique client identification
  • glean integration
  • implementing pings for specific questions
  • dashboards
Keywords: conduit-triaged
Priority: -- → P2
Type: defect → enhancement
Attachment #9132175 - Flags: data-review?(chutten)
Comment on attachment 9132175 [details]
moz-phab-data-review.md

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

I don't know. :glob, please clarify. (In the past the in-tree Firefox Source Docs have been a good place to put these sorts of things. Conduit's docs might work, too)

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection will be able to be opted into and out of.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, :glob is responsible.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

High Category. (Cat 3 at least, possibly Cat4). :agray, could we have Trust's input on the category of the user identifier and whether the mitigation (opt-out collection restricted to Mozilla Employees) is sufficient to permit its collection?

    Is the data collection request for default-on or default-off?

Default on for Mozilla employees (members of the `mozilla-employee-confidential` Bugzilla group). Default off for all others.

    Does the instrumentation include the addition of any new identifiers?

Yes, a persistent (and reversible? :glob?) user identifier, and a persistent (random) installation identifier.

    Is the data collection covered by the existing Firefox privacy notice?

Nope. (:agray, another question here about which privacy notice that would apply)

    Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.

---
Result: datareview-, pending clarification of the identified questions by Trust and :glob.
Flags: needinfo?(glob)
Flags: needinfo?(agray)
Attachment #9132175 - Flags: data-review?(chutten) → data-review-

(In reply to Chris H-C :chutten from comment #2)

Is there or will there be documentation that describes the schema for
the ultimate data set available publicly, complete and accurate?

I don't know. :glob, please clarify. (In the past the in-tree Firefox Source
Docs have been a good place to put these sorts of things. Conduit's docs
might work, too)

Would a file added to the moz-phab repository describing the telemetry be sufficient, linked from the README.md?
I'd prefer to keep the docs for telemetry in the same repo as the source.

Does the instrumentation include the addition of any new identifiers?

Yes, a persistent (and reversible? :glob?) user identifier, and a persistent
(random) installation identifier.

For moz-phab a non-reversible identifier per-person is suitable.

In practice this is difficult to achieve, as the only identifier that we can use across systems is their email address.
Solutions such as a randomly salted hash would effectively change the per-user identifier into a per-installation one.

The current plan calls for a straight hash of the email address, which is indirectly reversible.
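To illustrate the trade-off between a straight hash and a randomly salted one, here is a minimal sketch. It assumes SHA-256 and lower-case normalisation of the address, neither of which is specified in this bug; the function names and the example address are hypothetical.

```python
import hashlib
import secrets


def hash_email(email: str) -> str:
    """Unsalted hash: the same email yields the same ID on every machine."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def salted_hash_email(email: str, salt: bytes) -> str:
    """Salted hash: the ID depends on the salt as well as the email."""
    normalized = email.strip().lower()
    return hashlib.sha256(salt + normalized.encode("utf-8")).hexdigest()


# Two installations belonging to the same (made-up) user,
# each with its own locally generated random salt.
email = "dev@example.com"
salt_laptop = secrets.token_bytes(16)
salt_desktop = secrets.token_bytes(16)

# The unsalted hash matches across installations, so it identifies the person.
same_user = hash_email(email) == hash_email(email)

# The salted hashes differ, so they only identify the installation.
same_install = (
    salted_hash_email(email, salt_laptop) == salted_hash_email(email, salt_desktop)
)
```

With a per-installation random salt the two values no longer match, which is why the salted variant cannot serve as a per-user identifier.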

Flags: needinfo?(glob)

(In reply to Byron Jones ‹:glob› 🎈 from comment #3)

(In reply to Chris H-C :chutten from comment #2)

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

...

Would a file added to the moz-phab repository describing the telemetry be sufficient, linked from the README.md?

Yup! Just so long as it's available publicly (I'll need a URL for the Data Review).

Does the instrumentation include the addition of any new identifiers?

...

The current plan calls for a straight hash of the email address, which is indirectly reversible.

That's what I figured, thank you for clarifying!

(In reply to Chris H-C :chutten from comment #4)

Would a file added to the moz-phab repository describing the telemetry be sufficient, linked from the README.md?

Yup! Just so long as it's available publicly (I'll need a URL for the Data Review).

It'll be https://github.com/mozilla-conduit/review/blob/master/TELEMETRY.md (currently does not exist, because of chickens and eggs).

(In reply to Chris H-C :chutten from comment #2)

High Category. (Cat 3 at least, possibly Cat4). :agray, could we have Trust's input on the category of the user identifier and whether the
mitigation (opt-out collection restricted to Mozilla Employees) is sufficient to permit its collection?

The identifier is the hashed email address, but the notes say this is indirectly reversible? Can :glob expand on this?

Does the instrumentation include the addition of any new identifiers?

Yes, a persistent (and reversible? :glob?) user identifier, and a persistent (random) installation identifier.

We've not allowed persistent (random) installation identifiers historically because there is no method for the user to control these. We have made a conscious decision not to allow this for Firefox installations vs. profiles, for instance.

A couple of additional questions, though:

  1. If an employee-user opts out of the telemetry, will this be treated as a telemetry deletion ping and remove all related data?
  2. All installations will be tied to a hashed email address? There are no occurrences of non-account moz-phab access scenarios that need to be considered?

Is the data collection covered by the existing Firefox privacy notice?

Nope. (:agray, another question here about which privacy notice that would apply)

Good question. We'd have to figure out where this would be most appropriate. Let's determine the answer to the data collection piece first and then once that decision is made we can address this question.

Flags: needinfo?(agray) → needinfo?(glob)

(In reply to Alicia Gray from comment #6)

The identifier is the hashed email address, but the notes say this is indirectly reversible? Can :glob expand on this?

It isn't possible from a hash alone to determine the email address which was used to generate it.

However, as we, the owners of Phabricator, have access to all users' email addresses, it would be possible to generate hashes for all Phabricator user email addresses, then use that table to match against the hash provided via moz-phab telemetry.
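The "indirectly reversible" property described above can be sketched as a lookup table. This is an illustration only: SHA-256 is an assumption (the bug does not name a hash function), and the addresses and function names are made up.

```python
import hashlib


def hash_email(email: str) -> str:
    """Hash an email address (SHA-256 is assumed for illustration)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def build_lookup(known_emails):
    """Map hash -> email for every address the operator already knows."""
    return {hash_email(address): address for address in known_emails}


# Whoever holds the full list of user emails can precompute every hash.
known_emails = ["alice@example.com", "bob@example.com"]
lookup = build_lookup(known_emails)

# A hash arriving via telemetry can then be matched back to its owner.
reported = hash_email("bob@example.com")
recovered = lookup.get(reported)

# A hash for an address outside the known list cannot be resolved this way.
unknown = lookup.get(hash_email("carol@example.com"))
```

The hash itself is never inverted; the match works only because the set of possible inputs is already known.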

Yes, a persistent (and reversible? :glob?) user identifier, and a persistent (random) installation identifier.

We've not allowed persistent (random) installation identifiers historically because there is no method for the user to control these. We have made a conscious decision not to allow this for Firefox installations vs. profiles, for instance.

As it isn't unusual for Firefox developers to use multiple operating systems (e.g. primarily macOS, with Windows on the side for Windows-specific issues), each would need to be counted as a separate installation, capturing different data points, and impacting our planning in different ways. A persistent per-installation identifier is the only way to achieve this.

If employees wish to control their random installation identifiers, that would be possible by editing a file (similar to how they configure moz-phab by editing a file). I can't think of any good reason why anyone would want to do this rather than opting out completely.
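A persistent, random, file-backed installation identifier of the kind discussed here could look roughly like the following. This is a sketch under assumptions: the file name and the function are hypothetical, and moz-phab's actual storage may differ.

```python
import tempfile
import uuid
from pathlib import Path


def get_installation_id(path: Path) -> str:
    """Return the stored ID, creating a random one on first use."""
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        try:
            return str(uuid.UUID(stored))
        except ValueError:
            pass  # File content is not a valid UUID: fall through and regenerate.
    new_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_id + "\n", encoding="utf-8")
    return new_id


with tempfile.TemporaryDirectory() as tmp:
    id_file = Path(tmp) / "installation-id"  # hypothetical location

    first = get_installation_id(id_file)   # generated and written
    second = get_installation_id(id_file)  # read back, unchanged

    # Deleting (or editing) the file is how a user would reset the ID.
    id_file.unlink()
    third = get_installation_id(id_file)
```

Because the value is random and stored per machine, it distinguishes installations without being derived from anything about the person.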

  1. If an employee-user opts-out of the telemetry, will this be treated as a telemetry deletion ping and remove all related data?

No.

We could delay the first telemetry ping for 30 minutes to ensure that if someone opts out after initially installing moz-phab we don't accidentally capture data.
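The proposed grace period could be expressed as a simple check before any ping is sent. The sketch below is hypothetical: the function name, and the idea of recording an installation timestamp, are assumptions made for illustration rather than details from this bug.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Grace period before the first ping, as suggested in the comment above.
FIRST_PING_DELAY = timedelta(minutes=30)


def may_send_ping(
    installed_at: datetime, opted_out: bool, now: Optional[datetime] = None
) -> bool:
    """Allow a ping only if the user has not opted out and the delay has passed."""
    if opted_out:
        return False
    current = now or datetime.now(timezone.utc)
    return current - installed_at >= FIRST_PING_DELAY


installed = datetime(2020, 3, 10, 12, 0, tzinfo=timezone.utc)

# Ten minutes after install: still inside the grace period.
too_early = may_send_ping(
    installed, opted_out=False, now=installed + timedelta(minutes=10)
)

# Thirty minutes after install: the grace period has elapsed.
allowed = may_send_ping(
    installed, opted_out=False, now=installed + timedelta(minutes=30)
)

# Opting out blocks pings regardless of elapsed time.
after_opt_out = may_send_ping(
    installed, opted_out=True, now=installed + timedelta(hours=2)
)
```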

  2. All installations will be tied to a hashed email address? There are no occurrences of non-account moz-phab access scenarios that need to be considered?

Correct - moz-phab's operation requires a Phabricator API token to perform its primary function, and that token then leads us to the email address.

Flags: needinfo?(glob)

I think there might be a slight communication clarification needed around the "persistent installation identifier": :glob, by this do you mean something akin to Firefox Telemetry's client_id?

(Also re-adding Alicia now that we have the answers necessary to start the hunt for a Privacy Policy)

Flags: needinfo?(glob)
Flags: needinfo?(agray)

(In reply to Chris H-C :chutten from comment #8)

I think there might be a slight communication clarification needed around the "persistent installation identifier": :glob, by this do you mean something akin to Firefox Telemetry's client_id?

For reference here's the docs on that:

client_id UUID Optional A UUID identifying a profile and allowing user-oriented correlation of data

Yes; with the goal of allowing install-oriented correlation of data, rather than per-user (assuming user means "a person").

Flags: needinfo?(glob)

Trust review completed; apologies for the time lag here.

Approved for data collection default on/opt-out for Mozilla employees and default off/opt-in for non-employees. If a user opts-out, no requirement to delete the telemetry such as we are doing with Firefox.

What is the retention time for the telemetry data set? Will you use 13 months like standard telemetry?

Flags: needinfo?(agray)

(In reply to Alicia Gray from comment #11)

Approved for data collection default on/opt-out for Mozilla employees and default off/opt-in for non-employees. If a user opts-out, no requirement to delete the telemetry such as we are doing with Firefox.

Great - thank you.

What is the retention time for the telemetry data set? Will you use 13 months like standard telemetry?

13 months should work for us.

Attachment #9133139 - Attachment description: Bug 1621025 - WIP Telemetry → WIP Telemetry

(In reply to Byron Jones ‹:glob› 🎈 from comment #12)

(In reply to Alicia Gray from comment #11)

Approved for data collection default on/opt-out for Mozilla employees and default off/opt-in for non-employees. If a user opts-out, no requirement to delete the telemetry such as we are doing with Firefox.

Great - thank you.

What is the retention time for the telemetry data set? Will you use 13 months like standard telemetry?

13 months should work for us.

That sounds good. I'm still chasing down the privacy notice question. I'll update the bug once we know for sure.

Depends on: 1623587
Depends on: 1622909
No longer depends on: 1623587

Circling back on the relevant privacy notice piece: Legal can add a bullet point to https://www.mozilla.org/en-US/privacy/websites/ under the Analytics & Optimization bullet point in the cookies preference/web analytics/optimization tools once this is set up and ready. When the UI is available with the settings to manage the data collection, please put the link here and NI me. I can then bring in product counsel for the privacy notice update.

In the meantime, please let me know if you have more questions or need anything else.

(In reply to Alicia Gray from comment #14)

Circling back on the relevant privacy notice piece: Legal can add a bullet point to https://www.mozilla.org/en-US/privacy/websites/ under the Analytics & Optimization bullet point in the cookies preference/web analytics/optimization tools once this is set up and ready.

From that page:

This privacy notice applies to Mozilla operated websites and mobile apps, which include the domains mozillians.org, mozilla.org, firefox.com, and webmaker.org, among others. This includes, for example, bugzilla.mozilla.org, reps.mozilla.org, careers.mozilla.org, developers.mozilla.org, support.mozilla.org, addons.mozilla.org, and wiki.mozilla.org

moz-phab isn't a website nor is it a mobile app - it's a command line developer tool for those working on Firefox itself (similar to mozregression and mach).

If moz-phab is added to that page Legal should probably update that sentence to something like:
This privacy notice applies to Mozilla operated websites, mobile apps, and developer tooling, which includes...

When the UI is available with the settings to manage the data collection, please put the link here and NI me

There won't be a UI to manage settings; instead moz-phab is configured by editing a text file under the user's home directory.
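As a rough sketch of how a file-based preference could combine with the approved defaults (on for employees, off for everyone else): the `[telemetry]` section and `enabled` key below are hypothetical names chosen for illustration, not moz-phab's documented configuration.

```python
import configparser


def telemetry_enabled(config_text: str, is_employee: bool) -> bool:
    """Resolve the telemetry preference from INI-style config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # An explicit setting in the file always wins over the default.
    if parser.has_option("telemetry", "enabled"):
        return parser.getboolean("telemetry", "enabled")
    # No explicit setting: default on for employees, off for everyone else.
    return is_employee


OPT_OUT = "[telemetry]\nenabled = false\n"
OPT_IN = "[telemetry]\nenabled = true\n"

employee_default = telemetry_enabled("", is_employee=True)
employee_opt_out = telemetry_enabled(OPT_OUT, is_employee=True)
contributor_default = telemetry_enabled("", is_employee=False)
contributor_opt_in = telemetry_enabled(OPT_IN, is_employee=False)
```

Keeping the explicit setting separate from the default means an employee's opt-out and a contributor's opt-in are both honoured by the same check.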

Flags: needinfo?(agray)

(In reply to Byron Jones ‹:glob› 🎈 from comment #15)

(In reply to Alicia Gray from comment #14)

Circling back on the relevant privacy notice piece: Legal can add a bullet point to https://www.mozilla.org/en-US/privacy/websites/ under the Analytics & Optimization bullet point in the cookies preference/web analytics/optimization tools once this is set up and ready.

From that page:

This privacy notice applies to Mozilla operated websites and mobile apps, which include the domains mozillians.org, mozilla.org, firefox.com, and webmaker.org, among others. This includes, for example, bugzilla.mozilla.org, reps.mozilla.org, careers.mozilla.org, developers.mozilla.org, support.mozilla.org, addons.mozilla.org, and wiki.mozilla.org

moz-phab isn't a website nor is it a mobile app - it's a command line developer tool for those working on Firefox itself (similar to mozregression and mach).

If moz-phab is added to that page Legal should probably update that sentence to something like:
This privacy notice applies to Mozilla operated websites, mobile apps, and developer tooling, which includes...

When the UI is available with the settings to manage the data collection, please put the link here and NI me

There won't be a UI to manage settings; instead moz-phab is configured by editing a text file under the user's home directory.

Understood; Legal thought this was the best place, but it's not set in stone. Once the instrumentation is in place, let me know and I can re-review with Legal. We can always adjust the wording as necessary to incorporate additional references.

Flags: needinfo?(agray)
Attachment #9133139 - Attachment description: WIP Telemetry → Bug 1621025 - WIP Telemetry

Found this while looking for prior art re: legal notices. I think this probably depends on bug 1624695, so the deletion ping request can be sent between runs.

See Also: → 1581647

(In reply to William Lachance (:wlach) (use needinfo!) from comment #17)

Found this while looking for prior art re: legal notices. I think this probably depends on bug 1624695, so the deletion ping request can be sent between runs.

We're not deleting data when a user opts out (see comment 7).

Assignee: nobody → pzalewa
Attachment #9133139 - Attachment description: Bug 1621025 - WIP Telemetry → Added Telemetry
Status: NEW → ASSIGNED
Attachment #9133139 - Attachment description: Added Telemetry → Bug 1621025 - Added Telemetry r=glob!
Attached file mozphab data-review
Attachment #9151642 - Flags: data-review?(chutten)
Comment on attachment 9151642 [details]
mozphab data-review

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is documented in the [mozphab repo](https://github.com/mozilla-conduit/review/blob/master/TELEMETRY.md)

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This data is collected via the Glean SDK, so it must be configurable. In this case mozphab has a configuration file where this preference can be changed.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Piotr Zalewa is responsible.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Most are Cat1/2, but there are some identifiers that are Category 4, Identifiable information.

    Is the data collection request for default-on or default-off?

Default on for Mozilla Employees, default off for everyone else.

    Does the instrumentation include the addition of any new identifiers?

Yes. One client-id-like uuid per mozphab install, one hashed email address. Both were [reviewed by Trust](https://bugzilla.mozilla.org/show_bug.cgi?id=1621025#c11).

    Is the data collection covered by the existing Firefox privacy notice?

Yes. (I think. There appears to be some conversation about clarifying something as necessary)

    Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.

---
Result: datareview+
Attachment #9151642 - Flags: data-review?(chutten) → data-review+
Attachment #9133139 - Attachment description: Bug 1621025 - Added Telemetry r=glob! → Bug 1621025 - Added Telemetry r=glob! r=zeid!
Attachment #9133139 - Attachment description: Bug 1621025 - Added Telemetry r=glob! r=zeid! → Bug 1621025 - Added Telemetry r=glob!,zeid!
Attachment #9133139 - Attachment description: Bug 1621025 - Added Telemetry r=glob!,zeid! → Bug 1621025 - Added Telemetry r=dexter,glob,zeid!
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Attachment #9133139 - Attachment is obsolete: true
Attachment #9133139 - Attachment is obsolete: false