Add a "deletion request" ping
Categories
(Toolkit :: Telemetry, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox72 | --- | fixed |
People
(Reporter: mreid, Assigned: chutten)
References
Details
Attachments
(2 files)
We removed the previous "deletion" ping in Bug 1445921, but we now plan to handle deletion of data on the backend, and as such will need to know the client id to perform the deletion.
We may need more than one submission to handle other identifiers that are not correlated with the Telemetry client id.
Assignee | ||
Comment 1•6 years ago
|
||
Hoooooo this is gonna have corner cases aplenty. I have questions.
- Is this Telemetry-specific, or should we expand the scope (or dupe the bug) to include Glean?
- How will the client code know that the order to be deleted has been successfully received? Is the usual "we received a 2xx HTTP status code" sufficient? (or, put generally: should this ping have different/stricter send and resend behaviours?)
- What's the user signal for sending off this ping? Is it a new preference control? Part of
about:telemetry
?- Is it available to only part of our audience? (only on release, only Californians, etc)
- Are we keying it off of a user unchecking the existing telemetry box in the prefs? If so, what happens if the user opts back in?
- Will we need code to find all of the
client_id
s for all of the profiles on this install? On this machine? - What should we do with the local Telemetry Archive?
- Are we responsible for signalling to other systems that they need to delete their data?
- Are we going to need meta-data on this process, like:
- Ratio of success/fail attempts
- UI interaction events/counts for the user signal
- (Number of profiles with
c0ffee
client_ids when attempt received, etc.) - And if so, how does that work wrt the deletion request.
- Will we need changes to the first-run Policy Notice data reporting policy code to display more information?
- Is there a deadline before which this must hit Release?
- What version does that translate into?
- How does this work's priority stack up against our other plans?
- Does the request to delete mean we should infer that the user opts out of Telemetry?
- Including Telemetry that isn't
client_id
based? - Or does it "just" mean we rotate the
client_id
and start afresh?
- Including Telemetry that isn't
That's what comes to mind at the moment. I might come up with more given some time to think.... oh wait. Right. There's some meta questions: Who's doing which parts of this? Is it starting life as a Proposal, is there a PRD we need to satisfy, are we expected to attend meetings, who is our contact for getting these questions answered, ...
Hm. Maybe I'd best stop here until those last ones get answered.
Reporter | ||
Comment 2•6 years ago
|
||
(In reply to Chris H-C :chutten from comment #1)
- Is this Telemetry-specific, or should we expand the scope (or dupe the bug) to include Glean?
Let's create a separate bug to handle Glean.
- How will the client code know that the order to be deleted has been successfully received? Is the usual "we received a 2xx HTTP status code" sufficient? (or, put generally: should this ping have different/stricter send and resend behaviours?)
I am inclined to use the normal 2xx. I am open to other suggestions here though. In particular, we may want to include some means of tracing the request to ensure it has been processed.
- What's the user signal for sending off this ping? Is it a new preference control? Part of
about:telemetry
?
The signal is the "opt out" signal, and should use the same preference that sends the "optout" ping today.
* Is it available to only part of our audience? (only on release, only Californians, etc)
It is available to all Firefox users.
* Are we keying it off of a user unchecking the existing telemetry box in the prefs?
Yes.
If so, what happens if the user opts back in?
Ideally, opting back in would cancel any pending deletion requests for that id. That is not strictly a requirement, though, so the current behaviour of resetting the id upon opt-out may be fine. Alicia Gray on the Trust team confirms that it is fine to continue to use a new client id when opting back in. This may have data analysis implications, so I’ll file a Data Science bug to investigate.
- Will we need code to find all of the
client_id
s for all of the profiles on this install? On this machine?
Opting in or out of data collection is done at the profile level, so I think it makes sense for the deletion request to match.
- What should we do with the local Telemetry Archive?
Locally stored telemetry data should also be deleted.
- Are we responsible for signalling to other systems that they need to delete their data?
"Downstream" consumers of data, such as Amplitude, will also be notified of the deletion request. As we learned in Armagaddon, other systems respond to the same pref, such as Activity Stream and Screenshots - any of these that have a unique, stable identifier will also be covered here. Did you have other systems in mind?
- Are we going to need meta-data on this process, like:
- Ratio of success/fail attempts
- UI interaction events/counts for the user signal
- (Number of profiles with
c0ffee
client_ids when attempt received, etc.)- And if so, how does that work wrt the deletion request.
Excellent questions - again I will defer to Data Science to help define requirements here - Emily, do you have thoughts on what signals we might need around opt-out and data deletion?
- Will we need changes to the first-run Policy Notice data reporting policy code to display more information?
I don’t think so, but ni? Alicia Gray for thoughts here.
- Is there a deadline before which this must hit Release?
- What version does that translate into?
- How does this work's priority stack up against our other plans?
Ideally this would land in Firefox by the end of 2019, so we should target 71 or 72. Is this possible? From talking with :kparlante, this is a very high priority. We’ll review the full set of Q4 priorities ASAP and I’ll update here if there’s a change.
- Does the request to delete mean we should infer that the user opts out of Telemetry?
- Including Telemetry that isn't
client_id
based?
Can you clarify what you mean here? Do you mean pings that do not include a client_id, such as the “anonymous” ping or the “launcher-process-failure” ping?
* Or does it "just" mean we rotate the `client_id` and start afresh?
The deletion request is tied to the opt-out action, so opting out implies requesting deletion. Rotating the client id is not sufficient here.
That's what comes to mind at the moment. I might come up with more given some time to think.... oh wait. Right. There's some meta questions: Who's doing which parts of this? Is it starting life as a Proposal, is there a PRD we need to satisfy, are we expected to attend meetings, who is our contact for getting these questions answered, …
This is the first piece to get filed as a bug, since it will need to ride the trains. I’m happy point at the right people to get questions answered for now, but this may need more organizational effort than I can provide. Stay tuned.
Hm. Maybe I'd best stop here until those last ones get answered.
Great questions, thanks!
Reporter | ||
Comment 3•6 years ago
|
||
If so, what happens if the user opts back in?
Ideally, opting back in would cancel any pending deletion requests for that id. That is not strictly a requirement, though, so the current behaviour of resetting the id upon opt-out may be fine. Alicia Gray on the Trust team confirms that it is fine to continue to use a new client id when opting back in. This may have data analysis implications, so I’ll file a Data Science bug to investigate.
I filed Bug 1586745 for the Data Science investigation.
Assignee | ||
Comment 4•6 years ago
|
||
Thank you for the many and comprehensive answers. I think I identified the two you had for me in return : )
(In reply to Mark Reid [:mreid] from comment #2)
- Are we responsible for signalling to other systems that they need to delete their data?
"Downstream" consumers of data, such as Amplitude, will also be notified of the deletion request. As we learned in Armagaddon, other systems respond to the same pref, such as Activity Stream and Screenshots - any of these that have a unique, stable identifier will also be covered here. Did you have other systems in mind?
I was thinking client-side systems like Activity Stream that hang off of the same pref as Telemetry but don't use Telemetry. Sounds like that's covered.
- Does the request to delete mean we should infer that the user opts out of Telemetry?
- Including Telemetry that isn't
client_id
based?Can you clarify what you mean here? Do you mean pings that do not include a client_id, such as the “anonymous” ping or the “launcher-process-failure” ping?
Or the "sync" ping. When the user opts out of data collection that is a clear signal that they don't want any of it. But if we were building a separate system to delete data associated with an id, that's a rather different signal of user intent.
Since we're proposing to put both actions inside the same control, it becomes moot. In order to delete client_id-keyed telemetry, you have to opt out. And in order to opt out, you have to delete client_id-keyed telemetry. It does mean that locally-stored telemetry in the archive will need to be deleted in both cases, and clearing the client_id will need to be done in both cases, but better safe than sorry.
Comment 5•6 years ago
|
||
Re question in comment 2: Will we need changes to the first-run Policy Notice data reporting policy code to display more information?
What is currently included in the data reporting policy code?
Comment 6•6 years ago
|
||
(In reply to Alicia Gray from comment #5)
Re question in comment 2: Will we need changes to the first-run Policy Notice data reporting policy code to display more information?
What is currently included in the data reporting policy code?
Hi Mark,
Assuming the question is about whether we need to change the first run Policy Notice data collection (where first run creates a telemetry id, but doesn’t send it until the browser is closed for the first time.) I don't think we need a change here. We should still continue to measure opt-out and this is where the delete would occur.
Let me know if I'm missing the question.
Reporter | ||
Comment 7•6 years ago
|
||
(In reply to Alicia Gray from comment #6)
(In reply to Alicia Gray from comment #5)
Re question in comment 2: Will we need changes to the first-run Policy Notice data reporting policy code to display more information?
What is currently included in the data reporting policy code?
Hi Mark,
Assuming the question is about whether we need to change the first run Policy Notice data collection (where first run creates a telemetry id, but doesn’t send it until the browser is closed for the first time.) I don't think we need a change here. We should still continue to measure opt-out and this is where the delete would occur.Let me know if I'm missing the question.
That makes sense to me, thanks!
Assignee | ||
Comment 8•6 years ago
|
||
Mark, Jan-Erik and I need to have a clarification meeting. We have a few further questions about
- Archive deletion: do we really want to? Have to?
- Does this obsolete the "optout" ping? (and does it have any bearing on the scalar
telemetry.data_upload_optin
?) - How is CCPA's "give me my data" provision being implemented? Is it the Telemetry Archive? (and if so, see above)
- Do we need to provide any debouncing or batching or something to try and prevent someone from toggling it over and over and over again for fun and overwhelming our systems? (If so, maybe we should have that protection already for the "optout" ping)
And other stuff, too. (These are just the points in my notes at the moment)
Assignee | ||
Comment 9•6 years ago
|
||
We had a meeting! Here's some answers:
(In reply to Chris H-C :chutten from comment #8)
Mark, Jan-Erik and I need to have a clarification meeting. We have a few further questions about
- Archive deletion: do we really want to? Have to?
We don't have to, and there's at present no reason to do so.
- Does this obsolete the "optout" ping? (and does it have any bearing on the scalar
telemetry.data_upload_optin
?)
Yes. The scalar can remain.
- How is CCPA's "give me my data" provision being implemented? Is it the Telemetry Archive? (and if so, see above)
It isn't and can't be the Telemetry Archive (because the user could delete the Archive using filesystem operations and we're still tasked to provide them the data when asked). The "give me my data" provision will be handled separately.
- Do we need to provide any debouncing or batching or something to try and prevent someone from toggling it over and over and over again for fun and overwhelming our systems? (If so, maybe we should have that protection already for the "optout" ping)
Ping-volume-wise, no debouncing needed. We can handle the volume. As for the requests, which will be expensive: those are being batched and processed periodically on the pipeline side. (in short: It's handled)
Another thing that came up was that we might want to, in replacing the "optout" ping, consider going back to retaining instead of cycling the client_id
across optout -> optin transitions. The point of cycling the client_id
was to break the link from data we might already have to data we are receiving... and since the data we already have will be deleted by the optout, the same purpose is served.
This will result in less code that could go wrong on the client, so I'm broadly in favour of the proposal.
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 10•6 years ago
|
||
Slight wrinkle if we choose to keep the client_id
across optout -> optin transitions: the client_id
isn't the only thing we reset on optout. We also reset the count of subsessions that profile's participated in.
Now, we could keep resetting that count to 0 while retaining the client_id
... but that means any given client_id
can have arbitrarily many pings with the same profileSubsessionCounter
. At present, these {client_id, profileSubsessionCounter}
tuples are expected to be unique (ie, a profile can only ever have 1 "first" subsession). I expect this to cause problems in analysis.
An alternative would be to keep the existing profileSubsessionCounter
intact in the same way that we keep client_id
intact. But then we're persisting information across a deletion request. Which might be legal and compliant (IANAL, so I don't venture to interpret if that's the case)... but even if so I'm not sure we'd want to.
So for now I'm going to retain the existing behaviour of cycling the client_id
across optout -> optin transitions, which means that our friend the canary client id c0ffeec0-ffee-c0ff-eec0-ffeec0ffeec0
might be with us a while longer.
Assignee | ||
Comment 11•6 years ago
|
||
needinfos for thoughts on Comment#10, and approval/disapproval for my proposed solution.
Comment 12•6 years ago
|
||
I'm absolutely ok with keeping c0ffee
. IMO that's still a very useful signal for us to have and also for the user (also we can easily dismiss if anyone ever asks for data associated with c0ffee
).
Keeping profileSubsessionCounter
intact seems weird to me, especially as it would allow us to learn something about a client's past when they re-enable telemetry in the future.
Assignee | ||
Comment 13•6 years ago
|
||
Blocked on a patch from bug 1559976 that creates a new test based on the optout marionette test. Gonna wait for that one to land, then I'll convert it to a "deletion-request" ping test.
Assignee | ||
Comment 14•6 years ago
|
||
If we opt out of Telemetry and then back into it, we might end up with
out-of-order writes and deletes to the clientid state file. This would result
in us sending pings with c0ffee canary client ids.
So let's wait for pending save tasks before we process our removal task.
Also, while I'm here, let's add some trace logging to client id operations.
Assignee | ||
Comment 15•6 years ago
|
||
The 'deletion-request' ping, which supercedes the 'optout' ping, notifies the
pipeline when a profile opts out of FHR upload. (IOW, when a user on a specific
profile unchecks the box in about:preferences#privacy about sharing
technical and interaction data with Mozilla).
This ping tries its best to reach the pipeline to let them know that we need
to delete data associated with the provided clientId. This means it will remain
pending on the client even after opt out and it will try to resend if upload is
ever re-enabled.
Depends on D51709
Comment 16•6 years ago
|
||
Comment 17•6 years ago
|
||
Backed out 2 changesets (Bug 1585410) for causing failures in test_TelemetryController.js CLOSED TREE
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=275880368&resultStatus=testfailed%2Cbusted%2Cexception&revision=5205263485b69a2a9661a8a3a8572c829053df31
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=275871543&repo=autoland&lineNumber=983
Backout: https://hg.mozilla.org/integration/autoland/rev/2ea61656154e395c149f6a025cf53f1b0fd2b878
Reporter | ||
Comment 18•6 years ago
|
||
Sorry for the slow reply, I'm fine with the behaviour you outlined in Comment 10.
Assignee | ||
Comment 19•6 years ago
|
||
Not sure what happened. The test passes locally. I'm rerunning on try here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbc261982e0c9ade4d7539df75a03a86b92af717
Assignee | ||
Comment 20•6 years ago
|
||
Yeah, the tests are passing on try for me. I'm gonna try repushing.
Comment 21•6 years ago
|
||
Comment 22•6 years ago
|
||
Comment 23•6 years ago
|
||
Backed out 2 changesets (bug 1585410) for causing xpcshell failures in test_TelemetryController.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/8c1ac1675109471d8e302229f34998b6d4e73928
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276229835&repo=autoland&lineNumber=8904
Assignee | ||
Comment 24•6 years ago
|
||
Okay, now I can get it to fail on try: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=276227813&revision=4b4a52cb39e352f22d28e2484c29150b61004ff2
Windows-only failure. Not looking forward to debugging this.
Assignee | ||
Comment 25•6 years ago
|
||
It seems as though fs operations are slow enough on Windows to make the test fail in an interestingly new way. By waiting for the deletion-request
ping to finish being submitted before continuing with the test (because it now pauses to wait until the c0ffee
canary client id is persisted to disk), that seems to get it to work properly.
Here's a broad-spectrum xpcshell run I've pushed to see whether that was enough: https://treeherder.mozilla.org/#/jobs?repo=try&author=chutten%40mozilla.com&fromchange=72febdd8375eabda4a362b1ea1f8eed26457607c
Comment 26•6 years ago
|
||
![]() |
||
Comment 27•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/eb9a2831d5f0
https://hg.mozilla.org/mozilla-central/rev/a427f1cbe75c
Updated•6 years ago
|
Description
•