Closed Bug 1585410 Opened 6 years ago Closed 6 years ago

Add a "deletion request" ping

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement
Points:
3

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: mreid, Assigned: chutten)

References

Details

Attachments

(2 files)

We removed the previous "deletion" ping in Bug 1445921, but we now plan to handle deletion of data on the backend, and as such will need to know the client id to perform the deletion.

We may need more than one submission to handle other identifiers that are not correlated with the Telemetry client id.

Hoooooo this is gonna have corner cases aplenty. I have questions.

  • Is this Telemetry-specific, or should we expand the scope (or dupe the bug) to include Glean?
  • How will the client code know that the order to be deleted has been successfully received? Is the usual "we received a 2xx HTTP status code" sufficient? (or, put generally: should this ping have different/stricter send and resend behaviours?)
  • What's the user signal for sending off this ping? Is it a new preference control? Part of about:telemetry?
    • Is it available to only part of our audience? (only on release, only Californians, etc)
    • Are we keying it off of a user unchecking the existing telemetry box in the prefs? If so, what happens if the user opts back in?
  • Will we need code to find all of the client_ids for all of the profiles on this install? On this machine?
  • What should we do with the local Telemetry Archive?
  • Are we responsible for signalling to other systems that they need to delete their data?
  • Are we going to need meta-data on this process, like:
    • Ratio of success/fail attempts
    • UI interaction events/counts for the user signal
    • (Number of profiles with c0ffee client_ids when attempt received, etc.)
    • And if so, how does that work wrt the deletion request.
  • Will we need changes to the first-run Policy Notice data reporting policy code to display more information?
  • Is there a deadline before which this must hit Release?
    • What version does that translate into?
    • How does this work's priority stack up against our other plans?
  • Does the request to delete mean we should infer that the user opts out of Telemetry?
    • Including Telemetry that isn't client_id based?
    • Or does it "just" mean we rotate the client_id and start afresh?

That's what comes to mind at the moment. I might come up with more given some time to think.... oh wait. Right. There's some meta questions: Who's doing which parts of this? Is it starting life as a Proposal, is there a PRD we need to satisfy, are we expected to attend meetings, who is our contact for getting these questions answered, ...

Hm. Maybe I'd best stop here until those last ones get answered.

Flags: needinfo?(mreid)

(In reply to Chris H-C :chutten from comment #1)

  • Is this Telemetry-specific, or should we expand the scope (or dupe the bug) to include Glean?

Let's create a separate bug to handle Glean.

  • How will the client code know that the order to be deleted has been successfully received? Is the usual "we received a 2xx HTTP status code" sufficient? (or, put generally: should this ping have different/stricter send and resend behaviours?)

I am inclined to use the normal 2xx. I am open to other suggestions here though. In particular, we may want to include some means of tracing the request to ensure it has been processed.

  • What's the user signal for sending off this ping? Is it a new preference control? Part of about:telemetry?

The signal is the "opt out" signal, and should use the same preference that sends the "optout" ping today.

* Is it available to only part of our audience? (only on release, only Californians, etc)

It is available to all Firefox users.

* Are we keying it off of a user unchecking the existing telemetry box in the prefs?

Yes.

If so, what happens if the user opts back in?

Ideally, opting back in would cancel any pending deletion requests for that id. That is not strictly a requirement, though, so the current behaviour of resetting the id upon opt-out may be fine. Alicia Gray on the Trust team confirms that it is fine to continue to use a new client id when opting back in. This may have data analysis implications, so I’ll file a Data Science bug to investigate.

  • Will we need code to find all of the client_ids for all of the profiles on this install? On this machine?

Opting in or out of data collection is done at the profile level, so I think it makes sense for the deletion request to match.

  • What should we do with the local Telemetry Archive?

Locally stored telemetry data should also be deleted.

  • Are we responsible for signalling to other systems that they need to delete their data?

"Downstream" consumers of data, such as Amplitude, will also be notified of the deletion request. As we learned in Armagaddon, other systems respond to the same pref, such as Activity Stream and Screenshots - any of these that have a unique, stable identifier will also be covered here. Did you have other systems in mind?

  • Are we going to need meta-data on this process, like:
    • Ratio of success/fail attempts
    • UI interaction events/counts for the user signal
    • (Number of profiles with c0ffee client_ids when attempt received, etc.)
    • And if so, how does that work wrt the deletion request.

Excellent questions - again I will defer to Data Science to help define requirements here - Emily, do you have thoughts on what signals we might need around opt-out and data deletion?

  • Will we need changes to the first-run Policy Notice data reporting policy code to display more information?

I don’t think so, but ni? Alicia Gray for thoughts here.

  • Is there a deadline before which this must hit Release?
    • What version does that translate into?
    • How does this work's priority stack up against our other plans?

Ideally this would land in Firefox by the end of 2019, so we should target 71 or 72. Is this possible? From talking with :kparlante, this is a very high priority. We’ll review the full set of Q4 priorities ASAP and I’ll update here if there’s a change.

  • Does the request to delete mean we should infer that the user opts out of Telemetry?
    • Including Telemetry that isn't client_id based?

Can you clarify what you mean here? Do you mean pings that do not include a client_id, such as the “anonymous” ping or the “launcher-process-failure” ping?

* Or does it "just" mean we rotate the `client_id` and start afresh?

The deletion request is tied to the opt-out action, so opting out implies requesting deletion. Rotating the client id is not sufficient here.

That's what comes to mind at the moment. I might come up with more given some time to think.... oh wait. Right. There's some meta questions: Who's doing which parts of this? Is it starting life as a Proposal, is there a PRD we need to satisfy, are we expected to attend meetings, who is our contact for getting these questions answered, …

This is the first piece to get filed as a bug, since it will need to ride the trains. I’m happy point at the right people to get questions answered for now, but this may need more organizational effort than I can provide. Stay tuned.

Hm. Maybe I'd best stop here until those last ones get answered.

Great questions, thanks!

Flags: needinfo?(mreid)
Flags: needinfo?(ethompson)
Flags: needinfo?(chutten)
Flags: needinfo?(agray)

If so, what happens if the user opts back in?

Ideally, opting back in would cancel any pending deletion requests for that id. That is not strictly a requirement, though, so the current behaviour of resetting the id upon opt-out may be fine. Alicia Gray on the Trust team confirms that it is fine to continue to use a new client id when opting back in. This may have data analysis implications, so I’ll file a Data Science bug to investigate.

I filed Bug 1586745 for the Data Science investigation.

See Also: → 1586745

Thank you for the many and comprehensive answers. I think I identified the two you had for me in return : )

(In reply to Mark Reid [:mreid] from comment #2)

  • Are we responsible for signalling to other systems that they need to delete their data?

"Downstream" consumers of data, such as Amplitude, will also be notified of the deletion request. As we learned in Armagaddon, other systems respond to the same pref, such as Activity Stream and Screenshots - any of these that have a unique, stable identifier will also be covered here. Did you have other systems in mind?

I was thinking client-side systems like Activity Stream that hang off of the same pref as Telemetry but don't use Telemetry. Sounds like that's covered.

  • Does the request to delete mean we should infer that the user opts out of Telemetry?
    • Including Telemetry that isn't client_id based?

Can you clarify what you mean here? Do you mean pings that do not include a client_id, such as the “anonymous” ping or the “launcher-process-failure” ping?

Or the "sync" ping. When the user opts out of data collection that is a clear signal that they don't want any of it. But if we were building a separate system to delete data associated with an id, that's a rather different signal of user intent.

Since we're proposing to put both actions inside the same control, it becomes moot. In order to delete client_id-keyed telemetry, you have to opt out. And in order to opt out, you have to delete client_id-keyed telemetry. It does mean that locally-stored telemetry in the archive will need to be deleted in both cases, and clearing the client_id will need to be done in both cases, but better safe than sorry.

Flags: needinfo?(chutten)

Re question in comment 2: Will we need changes to the first-run Policy Notice data reporting policy code to display more information?

What is currently included in the data reporting policy code?

Flags: needinfo?(agray) → needinfo?(mreid)
See Also: → 1587095

(In reply to Alicia Gray from comment #5)

Re question in comment 2: Will we need changes to the first-run Policy Notice data reporting policy code to display more information?

What is currently included in the data reporting policy code?

Hi Mark,
Assuming the question is about whether we need to change the first run Policy Notice data collection (where first run creates a telemetry id, but doesn’t send it until the browser is closed for the first time.) I don't think we need a change here. We should still continue to measure opt-out and this is where the delete would occur.

Let me know if I'm missing the question.

(In reply to Alicia Gray from comment #6)

(In reply to Alicia Gray from comment #5)

Re question in comment 2: Will we need changes to the first-run Policy Notice data reporting policy code to display more information?

What is currently included in the data reporting policy code?

Hi Mark,
Assuming the question is about whether we need to change the first run Policy Notice data collection (where first run creates a telemetry id, but doesn’t send it until the browser is closed for the first time.) I don't think we need a change here. We should still continue to measure opt-out and this is where the delete would occur.

Let me know if I'm missing the question.

That makes sense to me, thanks!

Flags: needinfo?(mreid)

Mark, Jan-Erik and I need to have a clarification meeting. We have a few further questions about

  • Archive deletion: do we really want to? Have to?
  • Does this obsolete the "optout" ping? (and does it have any bearing on the scalar telemetry.data_upload_optin?)
  • How is CCPA's "give me my data" provision being implemented? Is it the Telemetry Archive? (and if so, see above)
  • Do we need to provide any debouncing or batching or something to try and prevent someone from toggling it over and over and over again for fun and overwhelming our systems? (If so, maybe we should have that protection already for the "optout" ping)

And other stuff, too. (These are just the points in my notes at the moment)

Points: --- → 3
Priority: -- → P2

We had a meeting! Here's some answers:

(In reply to Chris H-C :chutten from comment #8)

Mark, Jan-Erik and I need to have a clarification meeting. We have a few further questions about

  • Archive deletion: do we really want to? Have to?

We don't have to, and there's at present no reason to do so.

  • Does this obsolete the "optout" ping? (and does it have any bearing on the scalar telemetry.data_upload_optin?)

Yes. The scalar can remain.

  • How is CCPA's "give me my data" provision being implemented? Is it the Telemetry Archive? (and if so, see above)

It isn't and can't be the Telemetry Archive (because the user could delete the Archive using filesystem operations and we're still tasked to provide them the data when asked). The "give me my data" provision will be handled separately.

  • Do we need to provide any debouncing or batching or something to try and prevent someone from toggling it over and over and over again for fun and overwhelming our systems? (If so, maybe we should have that protection already for the "optout" ping)

Ping-volume-wise, no debouncing needed. We can handle the volume. As for the requests, which will be expensive: those are being batched and processed periodically on the pipeline side. (in short: It's handled)


Another thing that came up was that we might want to, in replacing the "optout" ping, consider going back to retaining instead of cycling the client_id across optout -> optin transitions. The point of cycling the client_id was to break the link from data we might already have to data we are receiving... and since the data we already have will be deleted by the optout, the same purpose is served.

This will result in less code that could go wrong on the client, so I'm broadly in favour of the proposal.

Assignee: nobody → chutten
Status: NEW → ASSIGNED
Priority: P2 → P1

Slight wrinkle if we choose to keep the client_id across optout -> optin transitions: the client_id isn't the only thing we reset on optout. We also reset the count of subsessions that profile's participated in.

Now, we could keep resetting that count to 0 while retaining the client_id... but that means any given client_id can have arbitrarily many pings with the same profileSubsessionCounter. At present, these {client_id, profileSubsessionCounter} tuples are expected to be unique (ie, a profile can only ever have 1 "first" subsession). I expect this to cause problems in analysis.

An alternative would be to keep the existing profileSubsessionCounter intact in the same way that we keep client_id intact. But then we're persisting information across a deletion request. Which might be legal and compliant (IANAL, so I don't venture to interpret if that's the case)... but even if so I'm not sure we'd want to.

So for now I'm going to retain the existing behaviour of cycling the client_id across optout -> optin transitions, which means that our friend the canary client id c0ffeec0-ffee-c0ff-eec0-ffeec0ffeec0 might be with us a while longer.

needinfos for thoughts on Comment#10, and approval/disapproval for my proposed solution.

Flags: needinfo?(mreid)
Flags: needinfo?(jrediger)

I'm absolutely ok with keeping c0ffee. IMO that's still a very useful signal for us to have and also for the user (also we can easily dismiss if anyone ever asks for data associated with c0ffee).

Keeping profileSubsessionCounter intact seems weird to me, especially as it would allow us to learn something about a client's past when they re-enable telemetry in the future.

Flags: needinfo?(jrediger)

Blocked on a patch from bug 1559976 that creates a new test based on the optout marionette test. Gonna wait for that one to land, then I'll convert it to a "deletion-request" ping test.

See Also: → 1559976

If we opt out of Telemetry and then back into it, we might end up with
out-of-order writes and deletes to the clientid state file. This would result
in us sending pings with c0ffee canary client ids.

So let's wait for pending save tasks before we process our removal task.

Also, while I'm here, let's add some trace logging to client id operations.

The 'deletion-request' ping, which supercedes the 'optout' ping, notifies the
pipeline when a profile opts out of FHR upload. (IOW, when a user on a specific
profile unchecks the box in about:preferences#privacy about sharing
technical and interaction data with Mozilla).

This ping tries its best to reach the pipeline to let them know that we need
to delete data associated with the provided clientId. This means it will remain
pending on the client even after opt out and it will try to resend if upload is
ever re-enabled.

Depends on D51709

Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2f9292f4fbd1 Fix race in ClientID.jsm r=janerik https://hg.mozilla.org/integration/autoland/rev/5205263485b6 Implement and document 'deletion-request' ping r=janerik,Dexter

Sorry for the slow reply, I'm fine with the behaviour you outlined in Comment 10.

Flags: needinfo?(mreid)

Not sure what happened. The test passes locally. I'm rerunning on try here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbc261982e0c9ade4d7539df75a03a86b92af717

Flags: needinfo?(chutten)

Yeah, the tests are passing on try for me. I'm gonna try repushing.

Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/474430e7ea37 Fix race in ClientID.jsm r=janerik https://hg.mozilla.org/integration/autoland/rev/bbe85df3365a Implement and document 'deletion-request' ping r=janerik,Dexter
Backout by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8c1ac1675109 Backed out 2 changesets for causing xpcshell failures in test_TelemetryController.js CLOSED TREE

Okay, now I can get it to fail on try: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=276227813&revision=4b4a52cb39e352f22d28e2484c29150b61004ff2

Windows-only failure. Not looking forward to debugging this.

Flags: needinfo?(chutten)

It seems as though fs operations are slow enough on Windows to make the test fail in an interestingly new way. By waiting for the deletion-request ping to finish being submitted before continuing with the test (because it now pauses to wait until the c0ffee canary client id is persisted to disk), that seems to get it to work properly.

Here's a broad-spectrum xpcshell run I've pushed to see whether that was enough: https://treeherder.mozilla.org/#/jobs?repo=try&author=chutten%40mozilla.com&fromchange=72febdd8375eabda4a362b1ea1f8eed26457607c

Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eb9a2831d5f0 Fix race in ClientID.jsm r=janerik https://hg.mozilla.org/integration/autoland/rev/a427f1cbe75c Implement and document 'deletion-request' ping r=janerik,Dexter
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
Flags: needinfo?(ethompson)
Blocks: 1598720
Blocks: 1599145
Blocks: 1600718
Blocks: 1602064
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: