add Pioneer deletion ping
Categories
(Firefox :: Pioneer, task, P1)
Tracking
()
People
(Reporter: rhelmer, Assigned: rhelmer)
References
(Blocks 1 open bug)
Details
Attachments
(5 files)
We need to support a standalone Telemetry deletion ping for Pioneer, since we can't combine the Pioneer ID and Telemetry client ID in the same ping.
Comment 1•5 years ago
|
||
Bugbug thinks this bug should belong to this component, but please revert this change in case of error.
Updated•5 years ago
|
Assignee | ||
Comment 2•5 years ago
|
||
Should this be a P1? I thought from recent meetings that we weren't ready to do this in 80.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 3•5 years ago
|
||
Mark, can we use a custom ping for Pioneer deletion? I guess we'd want to only include the Pioneer ID and nothing else.
How should I proceed with this? I'm presuming that we register a schema somewhere, who handles this on the Telemetry server side?
Assignee | ||
Updated•5 years ago
|
Comment 4•5 years ago
|
||
Yes, I think a custom ping with just the pioneer ID is the right thing to do here. Anthony, can you help with the schema / server side q's?
Comment 5•5 years ago
•
|
||
I would recommend a deletion-request
ping per study. The upside of this is that the schema definition is minimal and the shredding process is isolated to a study project. However, when a client wants to delete their data globally, the must send a ping to each study they were enrolled in.
This would take the following form:
{
"$comment": "Bug 1646151 - add Pioneer deletion ping",
"$schema": "http://json-schema.org/draft-04/schema#",
"properties": {
"pioneerId": {
"description": "Pioneer ID to be deleted.",
"pattern":"^[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}$",
"type": "string"
}
},
"required": [
"pioneerId"
],
"title": "deletion-request",
"type": "object"
}
Using the debug study as an example, the directory structure would take the following form:
pioneer-debug
├── debug
│ └── debug.1.schema.json
└── deletion-request
└── deletion-request.1.schema.json
The alternative is to support a single deletion ping. Since all pioneer pings are being rerouted from the telemetry pipeline family into the pioneer pipeline family, it does not make sense to the deletion request to live within the telemetry/deletion-request/deletion-request.4.schema.json
ping. It is also desirable for all Pioneer components to be isolated from the rest of the data platform. A new namespace (logically, a "study") would need to be allocated for deletion requests and assigned a keypair for acceptance of pings.
pioneer-deletion
└── deletion-request
└── deletion-request.1.schema.json
The deletion-request
schema is the same. However, shredder will now delete data across all studies and pings. Note that if data were encrypted at rest, and the ingestion sink project did not have access to all of the study keys, then this method of shredding data would not be viable. We trust the ACL's and give centralized automation control over all of the studies, so this is something we can do in practice.
I recommend a deletion-request
per study because there is more control over what is deleted. If requirements dictate that deletion-requests
are centralized, then it is possible to do so through a separate study. I do not think it's a good idea to mix data between telemetry and pioneer (e.g. telemetry.deletion-request.v4
) because it leaks information about the set of pioneer ids into the telemetry datasets (e.g. count(distinct pioneer_id)
)
Updated•5 years ago
|
Comment 6•5 years ago
|
||
Comment 7•5 years ago
|
||
After some discussion with :rhelmer, we've decided that the one ping per study approach will be sufficient for deletions. This follows the Glean-style of deletion requests (one table per dataset). This means the client will need to keep track of all of the studies it is currently and historically been part of in order to send the requests to the correct places. The ingestion pipeline will need to be modified to accept empty payloads, since the keys for the study may not exist anymore. The PioneerId will get injected into the table since it's part of the envelope metadata, so the schema is just the empty schema.
Assignee | ||
Comment 8•5 years ago
|
||
Comment 9•5 years ago
|
||
Comment 10•5 years ago
•
|
||
[Tracking Requested - why for this release]: this feature is required by trust and legal for our full launch in 81; we cannot launch without it.
Comment 11•5 years ago
|
||
Comment 12•5 years ago
|
||
(In reply to Marnie Pasciuto-Wood [:marnie] from comment #10)
[Tracking Requested - why for this release]:
In the future, filling this out with a reason would be appreciated.
Comment 13•5 years ago
|
||
Updated comment 10 to include a reason for 81 tracking.
Comment 14•5 years ago
|
||
Comment 15•5 years ago
|
||
Backed out for bc failures on browser_pioneer_ui.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/9b0bd3262a66f20f5fccbb7a25be46eae4ad7cf9
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313694751&repo=autoland&lineNumber=5272
Assignee | ||
Comment 16•5 years ago
|
||
(In reply to Narcis Beleuzu [:NarcisB] from comment #15)
Backed out for bc failures on browser_pioneer_ui.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/9b0bd3262a66f20f5fccbb7a25be46eae4ad7cf9
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313694751&repo=autoland&lineNumber=5272
Thanks... huh, not seeing this locally w/ a debug build. Could be a race condition in the test.
Assignee | ||
Comment 17•5 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #16)
(In reply to Narcis Beleuzu [:NarcisB] from comment #15)
Backed out for bc failures on browser_pioneer_ui.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/9b0bd3262a66f20f5fccbb7a25be46eae4ad7cf9
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313694751&repo=autoland&lineNumber=5272Thanks... huh, not seeing this locally w/ a debug build. Could be a race condition in the test.
Oh sheesh, I just needed to rebase, and the test code is too explicit about the number of pings.
Comment 18•5 years ago
|
||
Comment 19•5 years ago
|
||
Backed out for perma failures.
Log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313763646&repo=autoland&lineNumber=2873
Backout: https://hg.mozilla.org/integration/autoland/rev/2a4db66b5e82350811e6d0d9b2d246d930c5c601
Assignee | ||
Comment 20•5 years ago
|
||
OK now there's a race condition (lots of async code, the test isn't waiting until the telemetry is sent before ending). Fixing that now.
Assignee | ||
Comment 21•5 years ago
•
|
||
(In reply to Robert Helmer [:rhelmer] from comment #20)
OK now there's a race condition (lots of async code, the test isn't waiting until the telemetry is sent before ending). Fixing that now.
I ended up turning off TV and filed bug 1660926 to follow-up (I'm pretty confident it's the test and not the code at this point, but I haven't been able to reproduce the problem locally yet even running --verify
multiple times).
Try run before attempting re-landing: https://treeherder.mozilla.org/#/jobs?repo=try&revision=be81126a00d7a73e18209010256cad4098905cf1
Comment 22•5 years ago
|
||
Comment 23•5 years ago
|
||
Backed out changeset 5b5e0d17dbc3 (Bug 1646151) for causing failures in browser_pioneer_ui.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/331c529201854224d437d87427526e2b44419466
Failure logs:
- https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=314054836&repo=autoland&lineNumber=4231
- https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=314053870&repo=autoland&lineNumber=2875
Bug 1661198 and Bug 1661188 where created for these issues when they first appeared, though further retriggers unveiled they were high frequency and warranted a backout.
Assignee | ||
Comment 24•5 years ago
|
||
OK. So a bunch of intermittent bugs are popping up for what should be a pretty simple patch, I'm going to go back to the drawing board a little bit and figure out a more minimal patch. I suspect that I'm doing something wrong in the new test code I've added.
Assignee | ||
Comment 25•5 years ago
|
||
Sorry for all the intermittent trouble, I've done a try run https://treeherder.mozilla.org/#/jobs?repo=try&revision=55e5ba98ff9e9106f195bdc1b542d906ec0d426f and also have had mach test --verify
going all day, hopefully that got the last of these.
Comment 26•5 years ago
|
||
Comment 27•5 years ago
|
||
bugherder |
Comment 28•5 years ago
|
||
Comment on attachment 9171448 [details] [review]
Link to GitHub pull-request: https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/597
Beta/Release Uplift Approval Request
- User impact if declined: We will be unable to uphold our legal obligations to prove that a user has successfully been deleted from Pioneer.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This is a simple telemetry ping, on the about:pioneer page. No risk to Firefox.
- String changes made/needed: None
Updated•5 years ago
|
Updated•5 years ago
|
Comment 29•5 years ago
|
||
Comment on attachment 9170824 [details]
Bug 1646151 - send deletion ping for each unenrolled Pioneer study r?sfoster
Approved for 81.0b4.
Comment 30•5 years ago
|
||
bugherder uplift |
Comment 31•5 years ago
|
||
Hey Chris,
this was lacking data-review, so dr?
Comment 32•5 years ago
|
||
Comment on attachment 9175513 [details]
pioneer-deletionrequest.md
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
Yes, Ted Han is responsible.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 1, Technical.
Is the data collection request for default-on or default-off?
Default off.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does there need to be a check-in in the future to determine whether to renew the data?
No. This collection is permanent.
Result: datareview+
Description
•