Properly handle delete-pings for the impression_id in the telemetry subsystem
Categories
(Firefox :: Messaging System, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox73 | --- | verified |
People
(Reporter: nanj, Assigned: nanj)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
1.38 KB,
text/plain
|
bmiroglio
:
data-review+
|
Details |
This is a placeholder to track all the necessary work on the client side in order to handle deletion requests based on impression_id
(covers ActivityStream and Messaging System)
Comment 1•6 years ago
|
||
Note there's a slight wrinkle that, by the time you receive any notification that the optout has happened, ping upload is already disabled. So if we were hoping to send a ping with the necessary ids in response to the optout, that won't work without some painful special-casing.
This limits the mechanisms by which we might be able to communicate this to the pipeline. We could try piggy-backing on the "deletion-request" ping if it's okay sending your ids alongside the Telemetry client id (we could store your ids inside Telemetry on a "deletion-request" store which the "deletion-request" ping would snapshot as it is assembled. Or maybe we call out to an API and get the ids that way).
Either way we'll probably want to involve Data Platform to ensure that the approach we take will result in a data record that is useful for the actual deletion in the server.
Assignee | ||
Comment 2•6 years ago
|
||
impression_id
is persisited in the pref browser.newtabpage.activity-stream.impressionId
, so it's easy to fetch.
My main concern around piggy-backing it on the "deletion-request" is that impression_id
and client_id
have never shown up together in the same ping before, as the former is used in some Cat 3 telemetry (e.g. Pocket), doing so could uncover this connection. Since this request acts as the user signal to optout, and all their previous telemetry will be deleted soon, can we allow it as a special case?
If not, can we then issue two separate "deletion-requests"? Provided that this request supports to be sent without including the client_id.
Comment 3•6 years ago
|
||
I'd rather send two separate requests - could we consider adding a "identifier type" and "identifier value" to the deletion request ping to distinguish impression id
from client id
?
Updated•6 years ago
|
Comment 4•6 years ago
|
||
The client id is sent in the common ping info area of the ping document. I imagine the impression_id
would be in the payload, so it wouldn't be a problem there.
Unfortunately, everything else is a problem. The schema for deletion-request
pings require a non-null clientId
. Also, the exemption that allows for it to be sent might only work for a single ping (I'm not sure, I haven't tested it). Also also, if the point of sending it in multiple pings would be to make it so we couldn't ever (not that we would want to) connect the client id to the impression_id
, then we have the matter of fingerprinting to consider wrt the timing.
Since we're deleting these things anyway, I propose we include any ids (regardless of our previous efforts to keep them separate) in a single ping. I think we'll need to bring in :agray to get Trust's approval on this idea, but only if you (Mark and Nan) think this would be a good idea.
Assignee | ||
Comment 5•6 years ago
|
||
I am fine with sending those ids in a single ping if that also works for Mark.
Hi Alicia - would it be OK for us to include the client_id
and impression_id
in the same ping for this particular case (to delete all the telemetry for this user)?
For the context, the impression_id
is used for Pocket and CFR (Contextual Feature Recommendation) related telemetry whenever it's inappropriate to use the regular Firefox telemetry id (i.e. client_id). In general, we always keep the telemetry pings that contain impression_id
isolated from other pings with client_id. However, to respond to the user's request for telemetry opt-out, we'd need to include both ids in order to wipe out all the telemetry for that user. Therefore, client_id and impression_id will be bundled together in a single ping in this case. From your perspective, would it be OK to do so?
Comment 6•6 years ago
|
||
(In reply to Nan Jiang [:nanj] from comment #5)
I am fine with sending those ids in a single ping if that also works for Mark.
Hi Alicia - would it be OK for us to include the
client_id
andimpression_id
in the same ping for this particular case (to delete all the telemetry for this user)?For the context, the
impression_id
is used for Pocket and CFR (Contextual Feature Recommendation) related telemetry whenever it's inappropriate to use the regular Firefox telemetry id (i.e. client_id). In general, we always keep the telemetry pings that containimpression_id
isolated from other pings with client_id. However, to respond to the user's request for telemetry opt-out, we'd need to include both ids in order to wipe out all the telemetry for that user. Therefore, client_id and impression_id will be bundled together in a single ping in this case. From your perspective, would it be OK to do so?
HI Nan,
Trust is ok with bundling the 2 id's in a single ping for this particular purpose. Please let me know if I can help further.
Assignee | ||
Comment 7•6 years ago
|
||
Alicia - Thanks for the confirmation!.
chutten, if we bundle impression_id into the deletion-request ping, do we need to update the ping schema for this change? Looks like the current version doesn't support additional properties.
Mark, as you mentioned earlier that there might be some other similar ids to handle such as for FxA, would you be able to confirm this send-all-ids-in-one-ping proposal?
I was wondering about adding the following structure to the current deletion-reuqest
ping,
{
...,
"other_app_identifiers": [
{
"type": "impression_id",
"value": "{IMPRESSION_UUID}"
},
{
"type": "FxA",
"value": "{FxA_UUID}"
},
...
],
...
}
Thoughts?
Comment 8•6 years ago
|
||
We do indeed need to update the schema. ...and I'm not sure if this change will necessitate a change of ping name or not. (Mark?)
Instead of a custom "other_app_identifiers" needing sync API calls on user optout, we could instead store the ids in Scalars with the store "deletion-request". Then, when it's ping time, Telemetry code would snapshot all Scalars in the "deletion-request" store and add them to the "deletion-request" ping's payload. Maybe that way we could reuse some pipeline code and prepare ourselves for additional ids we might find ourselves wanting to include (presuming they pass review).
Mark, do you have opinions?
Updated•6 years ago
|
Comment 9•6 years ago
|
||
I'm OK with including these IDs together if that simplifies things on the implementation side.
We don't need to change the ping name as far as I'm concerned. The change to add scalars to the payload
field should be compatible on the pipeline side. It will require some work in the schema generator to turn scalars into fields for the 'deletion-request' ping.
Using scalars sounds fine to me. I have two opinions on the matter:
- The set of IDs included should be explicitly encoded somewhere (in this case
Scalars.yaml
, so I'm happy), and we should be able to add more if need be (again, happy). - We should do something similar to what we'll do in Glean to keep things as consistent as possible.
Also, ni? :relud for further thoughts / visibility.
Updated•6 years ago
|
Comment 10•6 years ago
|
||
Also, ni? :relud for further thoughts / visibility.
adding identifiers as scalars and keeping the ping name the same sounds good to me.
Assignee | ||
Comment 11•6 years ago
|
||
Thank y'all for confirming this!
I will create a Scalar store deletion-request
for this, and add impression_id
to it.
Assignee | ||
Comment 12•6 years ago
|
||
This adds a Scalar category "deletion.request" for the deletion request, it also adds a probe "impression_id" into this category.
Comment 13•6 years ago
|
||
Assignee | ||
Comment 14•6 years ago
|
||
As this touches "Scalars.yml", data review is required.
:chutten, Could you?
Thanks!
Comment 15•6 years ago
|
||
bugherder |
Comment 16•6 years ago
|
||
Comment 17•6 years ago
|
||
Comment 18•5 years ago
|
||
I have verified that the deletion.request.impression_id
scalar is collected, is viewable in about:telemetry
, and has the same value as the browser.newtabpage.activity-stream.impressionId
pref on the latest Nightly 73.0a1, Build ID 20200106092427 using Windows 10 x64, macOS 10.14.6, and Arch Linux 5.3.6 x64.
Description
•