Closed Bug 1587095 Opened 4 months ago Closed 2 months ago

Add support for sending a deletion request when a client opts out of data collection

Categories

(Data Platform and Tools :: Glean: SDK, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dexter, Assigned: janerik)

References

(Blocks 1 open bug)

Details

(Whiteboard: [telemetry:glean-rs:m11])

Attachments

(2 files)

Similar to Bug 1585410 on Desktop, we need a mechanism for sending a data deletion request when a user opts out of data collection.

Per discussion with @Dexterp37, there is an existing API for disabling data collection: Glean.setUploadEnabled(false). We may want to send a deletion request at that point, or design a new mechanism for sending one.

This functionality is needed by the end of 2019.

Priority: -- → P3
See Also: → 1585410
Whiteboard: [telemetry:glean-rs:m?]
Duplicate of this bug: 1546947
Whiteboard: [telemetry:glean-rs:m?] → [telemetry:glean-rs:m11]
Blocks: 1598720
Assignee: nobody → jrediger
Priority: P3 → P1

Alright, I took a first look at this and how it could work.
First: having pings.yaml is great!

Now on to the difficulties:

  1. Glean won't persist a collected ping if it would be empty (that is: no metrics and no events)
    1. Solution a) Add new send_if_empty field to pings to skip this
    2. Solution b) Add a dummy metric for the deletion-request ping (glean.deletion_request.optout=true for all pings)
  2. On disabling upload we clear out the whole pending pings directory and all its files
    1. For now I think we need to start looking into the files to determine what kind of ping it is (or include that ping name in the file name)
    2. Later, with the changes to the ping uploader we might be smarter about it
  3. On upload disabled we stop all upload tasks.
    1. We need to keep them working for a bit, to pick up the last ping to send (the deletion-request ping itself).
  4. (On Android) The upload task checks Glean.isUploadEnabled() itself and refuses work if it isn't. We either need to remove that or special-case for the one ping

I think part 1 can be quickly resolved with a decision here.
The rest might have some bigger implications, thus might require a bit more design work.

Definitely want :chutten's input here with the knowledge from the Desktop side, and :Dexter's input from the Glean side.

Flags: needinfo?(chutten)
Flags: needinfo?(alessio.placitelli)

(In reply to Jan-Erik Rediger [:janerik] from comment #2)

> Alright, I took a first look at this and how it could work.
> First: having pings.yaml is great!
>
> Now on to the difficulties:
>
>   1. Glean won't persist a collected ping if it would be empty (that is: no metrics and no events)
>     1. Solution a) Add new send_if_empty field to pings to skip this
>     2. Solution b) Add a dummy metric for the deletion-request ping (glean.deletion_request.optout=true for all pings)

I think I'm leaning towards solution (a) here: it avoids sending useless data when all we're really looking for is a signal. This could be a property on the ping definitions in pings.yaml files.
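
To make solution (a) concrete, here is a minimal sketch of how a per-ping `send_if_empty` flag could gate the "don't persist empty pings" rule. This is not the Glean SDK's code: `PingType`, `collect`, and the payload shape are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PingType:
    # Hypothetical stand-in for a ping definition loaded from pings.yaml.
    name: str
    send_if_empty: bool = False


def collect(ping: PingType, metrics: dict, events: list) -> Optional[dict]:
    """Return the payload to persist, or None when the ping should be dropped."""
    is_empty = not metrics and not events
    if is_empty and not ping.send_if_empty:
        # Existing behaviour described above: empty pings are not persisted.
        return None
    return {"ping": ping.name, "metrics": metrics, "events": events}


baseline = PingType("baseline")
deletion_request = PingType("deletion-request", send_if_empty=True)
```

With this shape, an empty `baseline` ping is still dropped, while an empty `deletion-request` ping is persisted because the flag is set on its definition.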

>   2. On disabling upload we clear out the whole pending pings directory and all its files
>     1. For now I think we need to start looking into the files to determine what kind of ping it is (or include that ping name in the file name)

One other thing we could do is to save the 'deletion' ping specifically in a separate directory that doesn't get cleared.
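
A rough sketch of the separate-directory idea, assuming a layout where regular pings live in one directory and the deletion ping in another. The directory names, `clear_pending_pings`, and `on_upload_disabled` are assumptions made for this example, not the SDK's real layout or API.

```python
from pathlib import Path

PENDING_DIR = "pending_pings"      # assumed directory name
DELETION_DIR = "deletion_request"  # assumed directory name


def clear_pending_pings(data_dir: Path) -> int:
    """Delete every file in the pending directory and return how many were removed.

    The deletion-request directory is never touched here.
    """
    pending = data_dir / PENDING_DIR
    removed = 0
    if pending.is_dir():
        for entry in pending.iterdir():
            if entry.is_file():
                entry.unlink()
                removed += 1
    return removed


def on_upload_disabled(data_dir: Path, document_id: str) -> Path:
    """Clear regular pings, then store the deletion ping in its own directory."""
    clear_pending_pings(data_dir)
    target = data_dir / DELETION_DIR
    target.mkdir(parents=True, exist_ok=True)
    ping_file = target / document_id
    ping_file.write_text("{}")  # empty payload; the ping itself is the signal
    return ping_file
```

Because the clearing step only ever looks at the pending directory, it does not need to inspect file contents or file names to spare the deletion ping.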

>   3. On upload disabled we stop all upload tasks.
>     1. We need to keep them working for a bit, to pick up the last ping to send (the deletion-request ping itself).

Yes, good point. If we're going to save the ping to a separate directory, we might make our life simpler here: we could just spin up a new WorkManager job, without touching the other one.

>   4. (On Android) The upload task checks Glean.isUploadEnabled() itself and refuses work if it isn't. We either need to remove that or special-case for the one ping

That's a bit more quirky. However, if we settle for a separate WorkManager upload job just for the deletion ping, we might work around this.
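
The two-job arrangement could look roughly like the sketch below. It is plain Python rather than Android WorkManager code, and `regular_upload_job`, `deletion_upload_job`, and the `send` callback are hypothetical; the point is only that the regular job keeps its upload-enabled check while the dedicated job reads nothing but the deletion queue.

```python
from typing import Callable, Iterable, List


def regular_upload_job(
    queue: Iterable[str], upload_enabled: bool, send: Callable[[str], bool]
) -> List[str]:
    """Normal job: refuse all work while upload is disabled."""
    if not upload_enabled:
        return []
    return [ping for ping in queue if send(ping)]


def deletion_upload_job(
    deletion_queue: Iterable[str], send: Callable[[str], bool]
) -> List[str]:
    """Dedicated job: only ever sees the deletion queue, so it skips the flag check."""
    return [ping for ping in deletion_queue if send(ping)]
```

Keeping the queues separate means the regular job's guard is left untouched, so regular pings cannot be uploaded through the deletion path.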

The risks I see, in general:

  • we mess up the normal uploading mechanism and start uploading other non-deletion pings when we shouldn't;
  • we fail to send the 'deletion' ping and never try again (what's the design around this case? Do we keep trying forever?)
Flags: needinfo?(alessio.placitelli)

(In reply to Jan-Erik Rediger [:janerik] from comment #2)

> Alright, I took a first look at this and how it could work.
> First: having pings.yaml is great!
>
> Now on to the difficulties:
>
>   1. Glean won't persist a collected ping if it would be empty (that is: no metrics and no events)
>     1. Solution a) Add new send_if_empty field to pings to skip this
>     2. Solution b) Add a dummy metric for the deletion-request ping (glean.deletion_request.optout=true for all pings)

Definitely not solution b.

>   2. On disabling upload we clear out the whole pending pings directory and all its files
>     1. For now I think we need to start looking into the files to determine what kind of ping it is (or include that ping name in the file name)
>     2. Later, with the changes to the ping uploader we might be smarter about it

I can't comment this deeply on the Glean SDK's internals, but I can note that pings on disk on Desktop contain the ping name.

>   3. On upload disabled we stop all upload tasks.
>     1. We need to keep them working for a bit, to pick up the last ping to send (the deletion-request ping itself).

It might be worth considering what service level you hope to provide with this ping. What if we're unable to send it immediately? We could take the "optout" ping approach and say "you had your chance". Or we could continue trying. The Desktop impl takes the latter approach (because it was (relatively) easy to work in).
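
The two service levels can be contrasted with a small sketch. `Policy` and `attempt_upload` are hypothetical names; the function models a single upload attempt and reports whether the ping should stay on disk for a later attempt.

```python
from enum import Enum
from typing import Callable


class Policy(Enum):
    ONE_SHOT = "one-shot"        # "you had your chance"
    KEEP_TRYING = "keep-trying"  # retry on later attempts until it succeeds


def attempt_upload(send: Callable[[], bool], policy: Policy) -> bool:
    """Make one upload attempt; return True if the ping should be kept for retry."""
    if send():
        return False  # sent successfully, nothing left to keep
    # On failure, only the keep-trying policy leaves the ping on disk.
    return policy is Policy.KEEP_TRYING
```

Under `ONE_SHOT` a failed attempt discards the ping, whereas `KEEP_TRYING` leaves it in place to be picked up again.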

>   4. (On Android) The upload task checks Glean.isUploadEnabled() itself and refuses work if it isn't. We either need to remove that or special-case for the one ping

I got nothing here. I think it depends wholly on what your solutions to earlier points end up being.

Flags: needinfo?(chutten)
Attached file GitHub Pull Request
Depends on: 1599422
Blocks: 1599427
Depends on: 1599439
Blocks: 1599455
Attachment #9112488 - Flags: data-review?(bmiroglio)
Blocks: 1600249
Depends on: 1600259
Blocks: 1600259
No longer depends on: 1600259
Comment on attachment 9112488 [details]
data-review-request.txt

# Data Review Form

1) Is there or will there be **documentation** that describes the schema for the ultimate data set in a public, complete, and accurate way? 

This will be documented here: https://mozilla.github.io/glean/book/user/pings/index.html

2) Is there a control mechanism that allows the user to turn the data collection on and off? (Note, for data collection not needed for security purposes, Mozilla provides such a control mechanism) Provide details as to the control mechanism available.

This ping is needed so that Mozilla can adhere to certain legislative requirements pertaining to the deletion of data on a per-user basis (CCPA, for example). Therefore there is not a way to turn it off. 

3) If the request is for permanent data collection, is there someone who will monitor the data over time?

:janerik will permanently monitor this data.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?

I'd consider this Category 2: Interaction data. This allows us to identify *profiles* that have turned off Telemetry, which is akin to changing a preference in Firefox. This won't affect the review result.

5) Is the data collection request for default-on or default-off?

default-on

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

No.

7) Is the data collection covered by the existing Firefox privacy notice? **If unsure: escalate to legal if:**

Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data? (Yes/No) (If yes, set a todo reminder or file a bug if appropriate)

No.

9) Does the data collection use a third-party collection tool? **If yes, escalate to legal.**

No.

data-review: r+
Attachment #9112488 - Flags: data-review?(bmiroglio) → data-review+
Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Blocks: 1601567
Blocks: 1601902