Closed Bug 1602064 Opened 6 years ago Closed 6 years ago

Properly handle delete-pings for the impression_id in the telemetry subsystem

Categories

(Firefox :: Messaging System, enhancement, P1)

enhancement

Tracking

()

VERIFIED FIXED
Firefox 73
Iteration:
73.2 - Dec 16 - Jan 5
Tracking Status
firefox73 --- verified

People

(Reporter: nanj, Assigned: nanj)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

This is a placeholder to track all the necessary work on the client side in order to handle deletion requests based on impression_id (covers ActivityStream and Messaging System)

Blocks: 1598720

Note there's a slight wrinkle that, by the time you receive any notification that the optout has happened, ping upload is already disabled. So if we were hoping to send a ping with the necessary ids in response to the optout, that won't work without some painful special-casing.

This limits the mechanisms by which we might be able to communicate this to the pipeline. We could try piggy-backing on the "deletion-request" ping if it's okay sending your ids alongside the Telemetry client id (we could store your ids inside Telemetry on a "deletion-request" store which the "deletion-request" ping would snapshot as it is assembled. Or maybe we call out to an API and get the ids that way).

Either way we'll probably want to involve Data Platform to ensure that the approach we take will result in a data record that is useful for the actual deletion in the server.

impression_id is persisited in the pref browser.newtabpage.activity-stream.impressionId, so it's easy to fetch.

My main concern around piggy-backing it on the "deletion-request" is that impression_id and client_id have never shown up together in the same ping before, as the former is used in some Cat 3 telemetry (e.g. Pocket), doing so could uncover this connection. Since this request acts as the user signal to optout, and all their previous telemetry will be deleted soon, can we allow it as a special case?

If not, can we then issue two separate "deletion-requests"? Provided that this request supports to be sent without including the client_id.

I'd rather send two separate requests - could we consider adding a "identifier type" and "identifier value" to the deletion request ping to distinguish impression id from client id?

Assignee: nobody → najiang
Iteration: --- → 73.1 - Dec 2 - Dec 15
Priority: -- → P1
Target Milestone: --- → Firefox 73

The client id is sent in the common ping info area of the ping document. I imagine the impression_id would be in the payload, so it wouldn't be a problem there.

Unfortunately, everything else is a problem. The schema for deletion-request pings require a non-null clientId. Also, the exemption that allows for it to be sent might only work for a single ping (I'm not sure, I haven't tested it). Also also, if the point of sending it in multiple pings would be to make it so we couldn't ever (not that we would want to) connect the client id to the impression_id, then we have the matter of fingerprinting to consider wrt the timing.

Since we're deleting these things anyway, I propose we include any ids (regardless of our previous efforts to keep them separate) in a single ping. I think we'll need to bring in :agray to get Trust's approval on this idea, but only if you (Mark and Nan) think this would be a good idea.

Flags: needinfo?(najiang)
Flags: needinfo?(mreid)

I am fine with sending those ids in a single ping if that also works for Mark.

Hi Alicia - would it be OK for us to include the client_id and impression_id in the same ping for this particular case (to delete all the telemetry for this user)?

For the context, the impression_id is used for Pocket and CFR (Contextual Feature Recommendation) related telemetry whenever it's inappropriate to use the regular Firefox telemetry id (i.e. client_id). In general, we always keep the telemetry pings that contain impression_id isolated from other pings with client_id. However, to respond to the user's request for telemetry opt-out, we'd need to include both ids in order to wipe out all the telemetry for that user. Therefore, client_id and impression_id will be bundled together in a single ping in this case. From your perspective, would it be OK to do so?

Flags: needinfo?(najiang) → needinfo?(agray)
Depends on: 1585410

(In reply to Nan Jiang [:nanj] from comment #5)

I am fine with sending those ids in a single ping if that also works for Mark.

Hi Alicia - would it be OK for us to include the client_id and impression_id in the same ping for this particular case (to delete all the telemetry for this user)?

For the context, the impression_id is used for Pocket and CFR (Contextual Feature Recommendation) related telemetry whenever it's inappropriate to use the regular Firefox telemetry id (i.e. client_id). In general, we always keep the telemetry pings that contain impression_id isolated from other pings with client_id. However, to respond to the user's request for telemetry opt-out, we'd need to include both ids in order to wipe out all the telemetry for that user. Therefore, client_id and impression_id will be bundled together in a single ping in this case. From your perspective, would it be OK to do so?

HI Nan,

Trust is ok with bundling the 2 id's in a single ping for this particular purpose. Please let me know if I can help further.

Flags: needinfo?(agray)

Alicia - Thanks for the confirmation!.

chutten, if we bundle impression_id into the deletion-request ping, do we need to update the ping schema for this change? Looks like the current version doesn't support additional properties.

Mark, as you mentioned earlier that there might be some other similar ids to handle such as for FxA, would you be able to confirm this send-all-ids-in-one-ping proposal?

I was wondering about adding the following structure to the current deletion-reuqest ping,

{
  ...,
  "other_app_identifiers": [
      {
          "type": "impression_id",
          "value": "{IMPRESSION_UUID}"
      },
      {
          "type": "FxA",
          "value": "{FxA_UUID}"
      },
      ...
  ],
  ...
}

Thoughts?

Flags: needinfo?(chutten)

We do indeed need to update the schema. ...and I'm not sure if this change will necessitate a change of ping name or not. (Mark?)

Instead of a custom "other_app_identifiers" needing sync API calls on user optout, we could instead store the ids in Scalars with the store "deletion-request". Then, when it's ping time, Telemetry code would snapshot all Scalars in the "deletion-request" store and add them to the "deletion-request" ping's payload. Maybe that way we could reuse some pipeline code and prepare ourselves for additional ids we might find ourselves wanting to include (presuming they pass review).

Mark, do you have opinions?

Flags: needinfo?(chutten)
Summary: Properly handle CCPA in the telemetry subsystem → Properly handle delete-pings for the impression_id in the telemetry subsystem

I'm OK with including these IDs together if that simplifies things on the implementation side.

We don't need to change the ping name as far as I'm concerned. The change to add scalars to the payload field should be compatible on the pipeline side. It will require some work in the schema generator to turn scalars into fields for the 'deletion-request' ping.

Using scalars sounds fine to me. I have two opinions on the matter:

  • The set of IDs included should be explicitly encoded somewhere (in this case Scalars.yaml, so I'm happy), and we should be able to add more if need be (again, happy).
  • We should do something similar to what we'll do in Glean to keep things as consistent as possible.

Also, ni? :relud for further thoughts / visibility.

Flags: needinfo?(mreid) → needinfo?(dthorn)
Iteration: 73.1 - Dec 2 - Dec 15 → 73.2 - Dec 16 - Jan 5

Also, ni? :relud for further thoughts / visibility.

adding identifiers as scalars and keeping the ping name the same sounds good to me.

Flags: needinfo?(dthorn)

Thank y'all for confirming this!

I will create a Scalar store deletion-request for this, and add impression_id to it.

This adds a Scalar category "deletion.request" for the deletion request, it also adds a probe "impression_id" into this category.

Depends on: 1604312
Pushed by najiang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/244d2121b0c3 Handle deletion-ping for the impression_id in Messaging System and Activity Stream r=chutten

As this touches "Scalars.yml", data review is required.

:chutten, Could you?

Thanks!

Attachment #9116592 - Flags: data-review?(chutten)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment on attachment 9116592 [details] data_review_request_impression_id.txt Passing to Ben as I advised on the implementation.
Attachment #9116592 - Flags: data-review?(chutten) → data-review?(bmiroglio)
Comment on attachment 9116592 [details] data_review_request_impression_id.txt # Data Review Form 1) Is there or will there be **documentation** that describes the schema for the ultimate data set in a public, complete, and accurate way? This will be documented in the [probe dictionary](https://probes.telemetry.mozilla.org/). 2) Is there a control mechanism that allows the user to turn the data collection on and off? (Note, for data collection not needed for security purposes, Mozilla provides such a control mechanism) Provide details as to the control mechanism available. Yes, this can be disabled by turning of Telemetry from Firefox's Preferences. 3) If the request is for permanent data collection, is there someone who will monitor the data over time? Nan Jiang (najiang@mozill.com) will be monitoring those metrics. 4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under? Category 2: Interaction Data 5) Is the data collection request for default-on or default-off? default-on 6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)? No. 7) Is the data collection covered by the existing Firefox privacy notice? Yes. 8) Does there need to be a check-in in the future to determine whether to renew the data? No. 9) Does the data collection use a third-party collection tool? **If yes, escalate to legal.** No. data-review: r+
Attachment #9116592 - Flags: data-review?(bmiroglio) → data-review+

I have verified that the deletion.request.impression_id scalar is collected, is viewable in about:telemetry, and has the same value as the browser.newtabpage.activity-stream.impressionId pref on the latest Nightly 73.0a1, Build ID 20200106092427 using Windows 10 x64, macOS 10.14.6, and Arch Linux 5.3.6 x64.

Status: RESOLVED → VERIFIED
See Also: → 1729474
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: