1585410 - Add a "deletion request" ping

Reporter

Description

•

6 years ago

We removed the previous "deletion" ping in Bug 1445921, but we now plan to handle deletion of data on the backend, and as such will need to know the client id to perform the deletion.

We may need more than one submission to handle other identifiers that are not correlated with the Telemetry client id.

Chris H-C :chutten

Assignee

Comment 1

•

6 years ago

Hoooooo this is gonna have corner cases aplenty. I have questions.

Is this Telemetry-specific, or should we expand the scope (or dupe the bug) to include Glean?
How will the client code know that the order to be deleted has been successfully received? Is the usual "we received a 2xx HTTP status code" sufficient? (or, put generally: should this ping have different/stricter send and resend behaviours?)
What's the user signal for sending off this ping? Is it a new preference control? Part of about:telemetry?
- Is it available to only part of our audience? (only on release, only Californians, etc)
- Are we keying it off of a user unchecking the existing telemetry box in the prefs? If so, what happens if the user opts back in?
Will we need code to find all of the client_ids for all of the profiles on this install? On this machine?
What should we do with the local Telemetry Archive?
Are we responsible for signalling to other systems that they need to delete their data?
Are we going to need meta-data on this process, like:
- Ratio of success/fail attempts
- UI interaction events/counts for the user signal
- (Number of profiles with c0ffee client_ids when attempt received, etc.)
- And if so, how does that work wrt the deletion request.
Will we need changes to the first-run Policy Notice data reporting policy code to display more information?
Is there a deadline before which this must hit Release?
- What version does that translate into?
- How does this work's priority stack up against our other plans?
Does the request to delete mean we should infer that the user opts out of Telemetry?
- Including Telemetry that isn't client_id based?
- Or does it "just" mean we rotate the client_id and start afresh?

That's what comes to mind at the moment. I might come up with more given some time to think.... oh wait. Right. There's some meta questions: Who's doing which parts of this? Is it starting life as a Proposal, is there a PRD we need to satisfy, are we expected to attend meetings, who is our contact for getting these questions answered, ...

Hm. Maybe I'd best stop here until those last ones get answered.

Flags: needinfo?(mreid)

Mark Reid [:mreid]

Reporter

Comment 2

•

6 years ago

(In reply to Chris H-C :chutten from comment #1)

Is this Telemetry-specific, or should we expand the scope (or dupe the bug) to include Glean?

Let's create a separate bug to handle Glean.

How will the client code know that the order to be deleted has been successfully received? Is the usual "we received a 2xx HTTP status code" sufficient? (or, put generally: should this ping have different/stricter send and resend behaviours?)

I am inclined to use the normal 2xx. I am open to other suggestions here though. In particular, we may want to include some means of tracing the request to ensure it has been processed.

What's the user signal for sending off this ping? Is it a new preference control? Part of about:telemetry?

The signal is the "opt out" signal, and should use the same preference that sends the "optout" ping today.

* Is it available to only part of our audience? (only on release, only Californians, etc)

It is available to all Firefox users.

* Are we keying it off of a user unchecking the existing telemetry box in the prefs?

Yes.

If so, what happens if the user opts back in?

Ideally, opting back in would cancel any pending deletion requests for that id. That is not strictly a requirement, though, so the current behaviour of resetting the id upon opt-out may be fine. Alicia Gray on the Trust team confirms that it is fine to continue to use a new client id when opting back in. This may have data analysis implications, so I’ll file a Data Science bug to investigate.

Will we need code to find all of the client_ids for all of the profiles on this install? On this machine?

Opting in or out of data collection is done at the profile level, so I think it makes sense for the deletion request to match.

What should we do with the local Telemetry Archive?

Locally stored telemetry data should also be deleted.

Are we responsible for signalling to other systems that they need to delete their data?

"Downstream" consumers of data, such as Amplitude, will also be notified of the deletion request. As we learned in Armagaddon, other systems respond to the same pref, such as Activity Stream and Screenshots - any of these that have a unique, stable identifier will also be covered here. Did you have other systems in mind?

Are we going to need meta-data on this process, like:

Ratio of success/fail attempts

UI interaction events/counts for the user signal

(Number of profiles with c0ffee client_ids when attempt received, etc.)

And if so, how does that work wrt the deletion request.

Excellent questions - again I will defer to Data Science to help define requirements here - Emily, do you have thoughts on what signals we might need around opt-out and data deletion?

Will we need changes to the first-run Policy Notice data reporting policy code to display more information?

I don’t think so, but ni? Alicia Gray for thoughts here.

Is there a deadline before which this must hit Release?

What version does that translate into?

How does this work's priority stack up against our other plans?

Ideally this would land in Firefox by the end of 2019, so we should target 71 or 72. Is this possible? From talking with :kparlante, this is a very high priority. We’ll review the full set of Q4 priorities ASAP and I’ll update here if there’s a change.

Does the request to delete mean we should infer that the user opts out of Telemetry?

Including Telemetry that isn't client_id based?

Can you clarify what you mean here? Do you mean pings that do not include a client_id, such as the “anonymous” ping or the “launcher-process-failure” ping?

* Or does it "just" mean we rotate the `client_id` and start afresh?

The deletion request is tied to the opt-out action, so opting out implies requesting deletion. Rotating the client id is not sufficient here.

That's what comes to mind at the moment. I might come up with more given some time to think.... oh wait. Right. There's some meta questions: Who's doing which parts of this? Is it starting life as a Proposal, is there a PRD we need to satisfy, are we expected to attend meetings, who is our contact for getting these questions answered, …

This is the first piece to get filed as a bug, since it will need to ride the trains. I’m happy point at the right people to get questions answered for now, but this may need more organizational effort than I can provide. Stay tuned.

Hm. Maybe I'd best stop here until those last ones get answered.

Great questions, thanks!

Flags: needinfo?(mreid)

Flags: needinfo?(ethompson)

Flags: needinfo?(chutten)

Flags: needinfo?(agray)

Mark Reid [:mreid]

Reporter

Comment 3

•

6 years ago

If so, what happens if the user opts back in?

Ideally, opting back in would cancel any pending deletion requests for that id. That is not strictly a requirement, though, so the current behaviour of resetting the id upon opt-out may be fine. Alicia Gray on the Trust team confirms that it is fine to continue to use a new client id when opting back in. This may have data analysis implications, so I’ll file a Data Science bug to investigate.

I filed Bug 1586745 for the Data Science investigation.

Comment 4

•

6 years ago

Thank you for the many and comprehensive answers. I think I identified the two you had for me in return : )

(In reply to Mark Reid [:mreid] from comment #2)

Are we responsible for signalling to other systems that they need to delete their data?

"Downstream" consumers of data, such as Amplitude, will also be notified of the deletion request. As we learned in Armagaddon, other systems respond to the same pref, such as Activity Stream and Screenshots - any of these that have a unique, stable identifier will also be covered here. Did you have other systems in mind?

I was thinking client-side systems like Activity Stream that hang off of the same pref as Telemetry but don't use Telemetry. Sounds like that's covered.

Does the request to delete mean we should infer that the user opts out of Telemetry?

Including Telemetry that isn't client_id based?

Can you clarify what you mean here? Do you mean pings that do not include a client_id, such as the “anonymous” ping or the “launcher-process-failure” ping?

Or the "sync" ping. When the user opts out of data collection that is a clear signal that they don't want any of it. But if we were building a separate system to delete data associated with an id, that's a rather different signal of user intent.

Since we're proposing to put both actions inside the same control, it becomes moot. In order to delete client_id-keyed telemetry, you have to opt out. And in order to opt out, you have to delete client_id-keyed telemetry. It does mean that locally-stored telemetry in the archive will need to be deleted in both cases, and clearing the client_id will need to be done in both cases, but better safe than sorry.

Flags: needinfo?(chutten)

Alicia Gray

Comment 5

•

6 years ago

Re question in comment 2: Will we need changes to the first-run Policy Notice data reporting policy code to display more information?

What is currently included in the data reporting policy code?

Flags: needinfo?(agray) → needinfo?(mreid)

Alessio Placitelli [:Dexter]

Updated

•

6 years ago

Comment 6

•

6 years ago

(In reply to Alicia Gray from comment #5)

Re question in comment 2: Will we need changes to the first-run Policy Notice data reporting policy code to display more information?

What is currently included in the data reporting policy code?

Hi Mark,
Assuming the question is about whether we need to change the first run Policy Notice data collection (where first run creates a telemetry id, but doesn’t send it until the browser is closed for the first time.) I don't think we need a change here. We should still continue to measure opt-out and this is where the delete would occur.

Let me know if I'm missing the question.

Mark Reid [:mreid]

Reporter

Comment 7

•

6 years ago

(In reply to Alicia Gray from comment #6)

(In reply to Alicia Gray from comment #5)

Re question in comment 2: Will we need changes to the first-run Policy Notice data reporting policy code to display more information?

What is currently included in the data reporting policy code?

Hi Mark,
Assuming the question is about whether we need to change the first run Policy Notice data collection (where first run creates a telemetry id, but doesn’t send it until the browser is closed for the first time.) I don't think we need a change here. We should still continue to measure opt-out and this is where the delete would occur.

Let me know if I'm missing the question.

That makes sense to me, thanks!

Flags: needinfo?(mreid)

Chris H-C :chutten

Assignee

Comment 8

•

6 years ago

Mark, Jan-Erik and I need to have a clarification meeting. We have a few further questions about

Archive deletion: do we really want to? Have to?
Does this obsolete the "optout" ping? (and does it have any bearing on the scalar telemetry.data_upload_optin?)
How is CCPA's "give me my data" provision being implemented? Is it the Telemetry Archive? (and if so, see above)
Do we need to provide any debouncing or batching or something to try and prevent someone from toggling it over and over and over again for fun and overwhelming our systems? (If so, maybe we should have that protection already for the "optout" ping)

And other stuff, too. (These are just the points in my notes at the moment)

Points: --- → 3

Priority: -- → P2

Chris H-C :chutten

Assignee

Comment 9

•

6 years ago

We had a meeting! Here's some answers:

(In reply to Chris H-C :chutten from comment #8)

Mark, Jan-Erik and I need to have a clarification meeting. We have a few further questions about

Archive deletion: do we really want to? Have to?

We don't have to, and there's at present no reason to do so.

Does this obsolete the "optout" ping? (and does it have any bearing on the scalar telemetry.data_upload_optin?)

Yes. The scalar can remain.

How is CCPA's "give me my data" provision being implemented? Is it the Telemetry Archive? (and if so, see above)

It isn't and can't be the Telemetry Archive (because the user could delete the Archive using filesystem operations and we're still tasked to provide them the data when asked). The "give me my data" provision will be handled separately.

Do we need to provide any debouncing or batching or something to try and prevent someone from toggling it over and over and over again for fun and overwhelming our systems? (If so, maybe we should have that protection already for the "optout" ping)

Ping-volume-wise, no debouncing needed. We can handle the volume. As for the requests, which will be expensive: those are being batched and processed periodically on the pipeline side. (in short: It's handled)

Another thing that came up was that we might want to, in replacing the "optout" ping, consider going back to retaining instead of cycling the client_id across optout -> optin transitions. The point of cycling the client_id was to break the link from data we might already have to data we are receiving... and since the data we already have will be deleted by the optout, the same purpose is served.

This will result in less code that could go wrong on the client, so I'm broadly in favour of the proposal.

Chris H-C :chutten

Assignee

Updated

•

6 years ago

Assignee: nobody → chutten

Status: NEW → ASSIGNED

Priority: P2 → P1

Chris H-C :chutten

Assignee

Comment 10

•

6 years ago

Slight wrinkle if we choose to keep the client_id across optout -> optin transitions: the client_id isn't the only thing we reset on optout. We also reset the count of subsessions that profile's participated in.

Now, we could keep resetting that count to 0 while retaining the client_id... but that means any given client_id can have arbitrarily many pings with the same profileSubsessionCounter. At present, these {client_id, profileSubsessionCounter} tuples are expected to be unique (ie, a profile can only ever have 1 "first" subsession). I expect this to cause problems in analysis.

An alternative would be to keep the existing profileSubsessionCounter intact in the same way that we keep client_id intact. But then we're persisting information across a deletion request. Which might be legal and compliant (IANAL, so I don't venture to interpret if that's the case)... but even if so I'm not sure we'd want to.

So for now I'm going to retain the existing behaviour of cycling the client_id across optout -> optin transitions, which means that our friend the canary client id c0ffeec0-ffee-c0ff-eec0-ffeec0ffeec0 might be with us a while longer.

Chris H-C :chutten

Assignee

Comment 11

•

6 years ago

needinfos for thoughts on Comment#10, and approval/disapproval for my proposed solution.

Flags: needinfo?(mreid)

Flags: needinfo?(jrediger)

Jan-Erik Rediger [:janerik]

Comment 12

•

6 years ago

I'm absolutely ok with keeping c0ffee. IMO that's still a very useful signal for us to have and also for the user (also we can easily dismiss if anyone ever asks for data associated with c0ffee).

Keeping profileSubsessionCounter intact seems weird to me, especially as it would allow us to learn something about a client's past when they re-enable telemetry in the future.

Flags: needinfo?(jrediger)

Chris H-C :chutten

Assignee

Comment 13

•

6 years ago

Blocked on a patch from bug 1559976 that creates a new test based on the optout marionette test. Gonna wait for that one to land, then I'll convert it to a "deletion-request" ping test.

Comment 14

•

6 years ago

Attached file Bug 1585410 - Fix race in ClientID.jsm r?janerik — Details

If we opt out of Telemetry and then back into it, we might end up with
out-of-order writes and deletes to the clientid state file. This would result
in us sending pings with c0ffee canary client ids.

So let's wait for pending save tasks before we process our removal task.

Also, while I'm here, let's add some trace logging to client id operations.

Chris H-C :chutten

Assignee

Comment 15

•

6 years ago

Attached file Bug 1585410 - Implement and document 'deletion-request' ping r?janerik — Details

The 'deletion-request' ping, which supercedes the 'optout' ping, notifies the
pipeline when a profile opts out of FHR upload. (IOW, when a user on a specific
profile unchecks the box in about:preferences#privacy about sharing
technical and interaction data with Mozilla).

This ping tries its best to reach the pipeline to let them know that we need
to delete data associated with the provided clientId. This means it will remain
pending on the client even after opt out and it will try to resend if upload is
ever re-enabled.

Depends on D51709

Pulsebot

Comment 16

•

6 years ago

Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2f9292f4fbd1 Fix race in ClientID.jsm r=janerik https://hg.mozilla.org/integration/autoland/rev/5205263485b6 Implement and document 'deletion-request' ping r=janerik,Dexter

Noemi Erli[:noemi_erli]

Comment 17

•

6 years ago

Backed out 2 changesets (Bug 1585410) for causing failures in test_TelemetryController.js CLOSED TREE

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=275880368&resultStatus=testfailed%2Cbusted%2Cexception&revision=5205263485b69a2a9661a8a3a8572c829053df31

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=275871543&repo=autoland&lineNumber=983

Backout: https://hg.mozilla.org/integration/autoland/rev/2ea61656154e395c149f6a025cf53f1b0fd2b878

Flags: needinfo?(chutten)

Mark Reid [:mreid]

Reporter

Comment 18

•

6 years ago

Sorry for the slow reply, I'm fine with the behaviour you outlined in Comment 10.

Flags: needinfo?(mreid)

Chris H-C :chutten

Assignee

Comment 19

•

6 years ago

Not sure what happened. The test passes locally. I'm rerunning on try here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbc261982e0c9ade4d7539df75a03a86b92af717

Flags: needinfo?(chutten)

Chris H-C :chutten

Assignee

Comment 20

•

6 years ago

Yeah, the tests are passing on try for me. I'm gonna try repushing.

Pulsebot

Comment 21

•

6 years ago

Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/474430e7ea37 Fix race in ClientID.jsm r=janerik https://hg.mozilla.org/integration/autoland/rev/bbe85df3365a Implement and document 'deletion-request' ping r=janerik,Dexter

Pulsebot

Comment 22

•

6 years ago

Backout by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8c1ac1675109 Backed out 2 changesets for causing xpcshell failures in test_TelemetryController.js CLOSED TREE

Alexandru Michis [:malexandru]

Comment 23

•

6 years ago

Backed out 2 changesets (bug 1585410) for causing xpcshell failures in test_TelemetryController.js

Backout link: https://hg.mozilla.org/integration/autoland/rev/8c1ac1675109471d8e302229f34998b6d4e73928

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&searchStr=%28x&tochange=8c1ac1675109471d8e302229f34998b6d4e73928&fromchange=bbe85df3365a23b6400c9f0b453f316d950b64c4&group_state=expanded

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276229835&repo=autoland&lineNumber=8904

Flags: needinfo?(chutten)

Chris H-C :chutten

Assignee

Comment 24

•

6 years ago

Okay, now I can get it to fail on try: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=276227813&revision=4b4a52cb39e352f22d28e2484c29150b61004ff2

Windows-only failure. Not looking forward to debugging this.

Flags: needinfo?(chutten)

Chris H-C :chutten

Assignee

Comment 25

•

6 years ago

It seems as though fs operations are slow enough on Windows to make the test fail in an interestingly new way. By waiting for the deletion-request ping to finish being submitted before continuing with the test (because it now pauses to wait until the c0ffee canary client id is persisted to disk), that seems to get it to work properly.

Here's a broad-spectrum xpcshell run I've pushed to see whether that was enough: https://treeherder.mozilla.org/#/jobs?repo=try&author=chutten%40mozilla.com&fromchange=72febdd8375eabda4a362b1ea1f8eed26457607c

Pulsebot

Comment 26

•

6 years ago

Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eb9a2831d5f0 Fix race in ClientID.jsm r=janerik https://hg.mozilla.org/integration/autoland/rev/a427f1cbe75c Implement and document 'deletion-request' ping r=janerik,Dexter

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 27

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/eb9a2831d5f0
https://hg.mozilla.org/mozilla-central/rev/a427f1cbe75c

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

status-firefox72: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla72

Emily Thompson

Updated

•

6 years ago

Flags: needinfo?(ethompson)

:shell escalante

Updated

•

6 years ago

Blocks: 1598720

Chris H-C :chutten

Assignee

Updated

•

6 years ago

Blocks: 1599145

Chris H-C :chutten

Assignee

Updated

•

6 years ago

Blocks: 1600718

Nan Jiang [:nanj]

Updated

•

6 years ago

Blocks: 1602064

Bug 1585410 - Fix race in ClientID.jsm r?janerik 6 years ago Chris H-C :chutten 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1585410 - Implement and document 'deletion-request' ping r?janerik 6 years ago Chris H-C :chutten 47 bytes, text/x-phabricator-request		Details \| Review