Add support for sending a deletion request when a client opts out of data collection
Categories
(Data Platform and Tools :: Glean: SDK, task, P1)
Tracking
(Not tracked)
People
(Reporter: Dexter, Assigned: janerik)
References
(Blocks 1 open bug)
Details
(Whiteboard: [telemetry:glean-rs:m11])
Attachments
(2 files)
41 bytes, text/x-github-pull-request
2.39 KB, text/plain (bmiroglio: data-review+)
Similar to Bug 1585410 on Desktop, we need a mechanism for sending a data deletion request when a user opts out of data collection.
Per discussion with @Dexterp37, there is an existing API for disabling data collection: Glean.setUploadEnabled(false). We may want to send a deletion request at this point, or design a new mechanism for sending a deletion request.
This functionality is needed by the end of 2019.
Assignee
Comment 2•4 years ago
Alright, I took a first look at this and how it could work.
First: having pings.yaml is great!
Now on to the difficulties:
- Glean won't persist a collected ping if it would be empty (that is: no metrics and no events)
  - Solution a) Add a new send_if_empty field to pings to skip this
  - Solution b) Add a dummy metric for the deletion-request ping (glean.deletion_request.optout=true for all pings)
- On disabling upload we clear out the whole pending pings directory and all its files
  - For now I think we need to start looking into the files to determine what kind of ping it is (or include that ping name in the file name)
  - Later, with the changes to the ping uploader we might be smarter about it
- On upload disabled we stop all upload tasks.
  - We need to keep them working for a bit, to pick up the last ping to send (the deletion-request ping itself).
- (On Android) The upload task checks Glean.isUploadEnabled() itself and refuses work if it isn't. We either need to remove that or special-case for the one ping
I think part 1 can be quickly resolved with a decision here.
The rest might have some bigger implications, thus might require a bit more design work.
Definitely want :chutten's input here with the knowledge from the Desktop side, and :Dexter's input from the Glean side.
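For illustration, solution (a) from the first difficulty could look roughly like the sketch below. It is plain Python, not the Glean SDK: `PingType`, `PingPayload`, and `collect` are hypothetical names, and only the `send_if_empty` field and the `deletion-request` ping name come from this comment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PingType:
    # Hypothetical stand-in for one ping definition from pings.yaml.
    name: str
    send_if_empty: bool = False


@dataclass
class PingPayload:
    # Hypothetical container for what would be written to disk.
    ping_name: str
    metrics: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


def collect(ping: PingType, metrics: dict, events: list) -> Optional[PingPayload]:
    """Return a payload to persist, or None when the ping should be dropped."""
    is_empty = not metrics and not events
    if is_empty and not ping.send_if_empty:
        # Current behaviour described above: empty pings are not persisted.
        return None
    return PingPayload(ping.name, dict(metrics), list(events))


baseline = PingType("baseline")
deletion_request = PingType("deletion-request", send_if_empty=True)
```

With this shape, an empty `deletion-request` ping would still be persisted, while every other ping type keeps the existing "skip if empty" behaviour.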
Reporter
Comment 3•4 years ago
(In reply to Jan-Erik Rediger [:janerik] from comment #2)
> Alright, I took a first look at this and how it could work.
> First: having pings.yaml is great!
> Now on to the difficulties:
> - Glean won't persist a collected ping if it would be empty (that is: no metrics and no events)
>   - Solution a) Add a new send_if_empty field to pings to skip this
>   - Solution b) Add a dummy metric for the deletion-request ping (glean.deletion_request.optout=true for all pings)
I think I'm leaning towards solution (a) here: it will prevent sending useless data when we're really looking for a signal. This could be a per-ping property in the pings.yaml files.
> - On disabling upload we clear out the whole pending pings directory and all its files
>   - For now I think we need to start looking into the files to determine what kind of ping it is (or include that ping name in the file name)
One other thing we could do is to save the 'deletion' ping specifically in a separate directory that doesn't get cleared.
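As a rough sketch of the separate-directory idea, assuming a file-per-ping layout: the directory names `pending_pings` and `deletion_request`, the file names, and the helper functions below are all made up for illustration and are not taken from the Glean SDK.

```python
import shutil
import tempfile
from pathlib import Path

PENDING_DIR = "pending_pings"      # assumed name for the regular ping queue
DELETION_DIR = "deletion_request"  # assumed name for the separate directory


def write_ping(data_dir: Path, subdir: str, file_name: str, body: str) -> Path:
    """Persist one ping file under data_dir/subdir and return its path."""
    target = data_dir / subdir
    target.mkdir(parents=True, exist_ok=True)
    path = target / file_name
    path.write_text(body)
    return path


def on_upload_disabled(data_dir: Path) -> Path:
    """Clear regular pending pings, then queue one deletion-request ping."""
    shutil.rmtree(data_dir / PENDING_DIR, ignore_errors=True)
    # The deletion ping lives outside PENDING_DIR, so later clears skip it.
    return write_ping(data_dir, DELETION_DIR, "deletion-request.json", "{}")


def demo() -> tuple:
    """Return (pending dir still exists, deletion ping exists) after opt-out."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        write_ping(data_dir, PENDING_DIR, "baseline.json", "{}")
        queued = on_upload_disabled(data_dir)
        return (data_dir / PENDING_DIR).exists(), queued.exists()
```

Because the deletion ping never shares a directory with regular pings, clearing the pending queue needs no inspection of file contents or file names.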
> - On upload disabled we stop all upload tasks.
>   - We need to keep them working for a bit, to pick up the last ping to send (the deletion-request ping itself).
Yes, good point. If we're going to save the ping to a separate directory, we might make our life simpler here. We might just spin up a new WorkManager job, without touching the other one.
> - (On Android) The upload task checks Glean.isUploadEnabled() itself and refuses work if it isn't. We either need to remove that or special-case for the one ping
That's a bit more quirky. However, if we settle for a separate WorkManager upload job just for the deletion ping, we might work around this.
The risks I see, in general:
- we mess up the normal uploading mechanism and start uploading other non-deletion pings when we shouldn't;
- we fail to send the 'deletion' ping and never try again (what's the design around this case? Do we keep trying forever?)
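To make the first risk concrete, here is a minimal sketch of an upload gate that special-cases the one ping. `may_upload` and `select_uploads` are hypothetical helpers, not Glean SDK functions, and `baseline` and `metrics` stand in for any regular ping.

```python
DELETION_PING = "deletion-request"


def may_upload(ping_name: str, upload_enabled: bool) -> bool:
    """Gate applied by the upload worker before sending a ping."""
    if upload_enabled:
        return True
    # Upload disabled: only the deletion request itself may still go out.
    return ping_name == DELETION_PING


def select_uploads(queued: list, upload_enabled: bool) -> list:
    """Filter a queue of ping names down to those allowed to be sent."""
    return [name for name in queued if may_upload(name, upload_enabled)]
```

For example, `select_uploads(["baseline", "deletion-request", "metrics"], False)` keeps only `"deletion-request"`. A dedicated upload job reading from a separate directory would achieve the same isolation structurally instead of by name check.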
Comment 4•4 years ago
(In reply to Jan-Erik Rediger [:janerik] from comment #2)
> Alright, I took a first look at this and how it could work.
> First: having pings.yaml is great!
> Now on to the difficulties:
> - Glean won't persist a collected ping if it would be empty (that is: no metrics and no events)
>   - Solution a) Add a new send_if_empty field to pings to skip this
>   - Solution b) Add a dummy metric for the deletion-request ping (glean.deletion_request.optout=true for all pings)
Definitely not solution b.
> - On disabling upload we clear out the whole pending pings directory and all its files
>   - For now I think we need to start looking into the files to determine what kind of ping it is (or include that ping name in the file name)
>   - Later, with the changes to the ping uploader we might be smarter about it
I can't comment this deeply on the Glean SDK's internals, but I can note that pings on disk on Desktop contain the ping name.
> - On upload disabled we stop all upload tasks.
>   - We need to keep them working for a bit, to pick up the last ping to send (the deletion-request ping itself).
It might be worth considering what service level you hope to provide with this ping. What if we're unable to send it immediately? We could take the "optout" ping approach and say "you had your chance". Or we could continue trying. The Desktop impl takes the latter approach (because it was (relatively) easy to work in).
> - (On Android) The upload task checks Glean.isUploadEnabled() itself and refuses work if it isn't. We either need to remove that or special-case for the one ping
I got nothing here. I think it depends wholly on what your solutions to earlier points end up being.
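The two service levels mentioned above ("you had your chance" versus continuing to try) can be contrasted in a small sketch. `DeletionRequestState` and its `keep_trying` flag are hypothetical names for illustration only, and `send` stands in for whatever upload call the SDK would actually make.

```python
from typing import Callable


class DeletionRequestState:
    """Tracks whether a deletion-request ping still needs to be sent."""

    def __init__(self, keep_trying: bool) -> None:
        self.keep_trying = keep_trying
        self.pending = True
        self.attempts = 0

    def attempt(self, send: Callable[[], bool]) -> bool:
        """Try one upload; return True if this attempt delivered the ping."""
        if not self.pending:
            return False
        self.attempts += 1
        if send():
            self.pending = False
            return True
        if not self.keep_trying:
            # "You had your chance": drop the ping after a single failure.
            self.pending = False
        return False
```

With `keep_trying=True` the ping stays pending after a failed upload and can be retried later (for example on the next application start); with `keep_trying=False` a single failure discards it.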
Comment 7•4 years ago
Comment on attachment 9112488 [details]
data-review-request.txt

# Data Review Form

1) Is there or will there be **documentation** that describes the schema for the ultimate data set in a public, complete, and accurate way?
This will be documented here: https://mozilla.github.io/glean/book/user/pings/index.html

2) Is there a control mechanism that allows the user to turn the data collection on and off? (Note, for data collection not needed for security purposes, Mozilla provides such a control mechanism) Provide details as to the control mechanism available.
This ping is needed so that Mozilla can adhere to certain legislative requirements pertaining to the deletion of data on a per-user basis (CCPA, for example). Therefore there is not a way to turn it off.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?
:janerik will permanently monitor this data.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?
I'd consider this Category 2: Interaction data. This allows us to identify *profiles* who have turned off Telemetry, which is akin to changing a preference in Firefox. This won't affect the review result.

5) Is the data collection request for default-on or default-off?
default-on

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?
No.

7) Is the data collection covered by the existing Firefox privacy notice? **If unsure: escalate to legal if:**
Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data? (Yes/No) (If yes, set a todo reminder or file a bug if appropriate)
No.

9) Does the data collection use a third-party collection tool? **If yes, escalate to legal.**
No.

data-review: r+