Pings with `metadata.follows_collection_enabled: false` but `metadata.include_info_sections: true` will fail schema validation after data upload opt-out
Categories
(Data Platform and Tools :: Glean: SDK, defect, P2)
Tracking
(firefox138 wontfix, firefox139 verified, firefox140 verified)
People
(Reporter: chutten, Assigned: janerik)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
As discovered by Cosmin while testing the "quick-suggest" ping updated in bug 1940808 to manually process collection enabled, if you disable data collection, the Glean SDK will dump its db. This includes the core metrics that supply all the information for client_info and ping_info.
(This was a necessary and agreed-upon shortcut to support the Terms of Service project's timelines in Q42024)
This means that subsequent submissions of a ping with metadata.include_info_sections: true (the default) and metadata.follows_collection_enabled: false will not have the data needed to fill in client_info or ping_info fields. Quite a few of these fields are required in the glean.1 schema (like client_info.app_build) meaning that these pings will fail schema validation and be rejected.
| Reporter | ||
Comment 1•1 year ago
|
||
This has resulted in a QA "yellow" sign-off on the related project.
Comment 2•1 year ago
|
||
Not only our project but this is a blocker for the TOS change, right?
| Assignee | ||
Comment 3•1 year ago
|
||
This is a bit ... tough.
client_info is Glean-internal managed and stored internally in separate storage, so we can easily clear it out in total.
For the usage reporting project we essentially "cloned" important client info fields into normal metrics, that are set by app code.
That would be a possible way forward for the quick-suggest ping, if it would go the include_info_sections: false route (though of course that has other problems, like ... we can't just switch the ping, we would need a new name for the table schemas to work out)
To change this in Glean requires a bit of a re-design on how we do things.
(1) We could just not clear the client info data on disabling collection (except the client ID).
(2) Or we store client info data per ping instead of once.
Code-wise the first one might be easier to do.
There's a few info fields that could be problematic:
ping_info.experiments. If telemetry is disabled a client won't have experiments, so this would be empty. No problem.client_info.client_id. As said needs to be removed when disabling anyway.quick-suggest(and otherfollows_collection_enabled: falsepings) don't have a client ID anyway.client_info.first_run_date. This is a user-specific value, which we even do keep across disabling/enabling telemetry. But should it go into other pings when telemetry is disabled? It's coarse, but might still allow connection back to old telemetry. Note that in the schema it's required, so we might need to change that.
Other ping_info fields are per ping (seq, times, reason), thus unproblematic.
Other client_info fields are app/device specific (app channel/version/build, device/os/...). Somewhat coarse and/or not user-specific. Seems fine to keep them.
now about a timeline tackling this: :chutten, willing to take this? I think if we clarify the remaining questions about the fields, the solution (1) might be not too much code-changes (though who knows what edge cases it holds).
| Reporter | ||
Comment 4•1 year ago
|
||
I think solution (1) is sensible and that first_run_date isn't that big of a deal because "allow connection back to old telemetry" isn't actually a risk. Because "old telemetry" gets deleted by "deletion-request" ping when collection_enabled switches to false anyway : )
ping_info.experiments on the other hand... that I'm not so sure about. We'll see what happens as I get around to trying this.
Comment 5•1 year ago
|
||
| Assignee | ||
Updated•1 year ago
|
Comment 6•1 year ago
|
||
This started showing up in our weekly data platform health checks with quick_suggest pings requiring a first_run_date, which seems to be missing in some cases.
| Assignee | ||
Comment 7•1 year ago
|
||
badboy merged PR [mozilla/glean]: Bug 1944865 - Do not clear client-info on disabling collection (#3068) in c34fec1.
We have that code landed. We plan on doing a release still this week and getting that into m-c quickly.
| Assignee | ||
Updated•1 year ago
|
Comment 8•1 year ago
|
||
I have verified this issue on the latest Firefox Beta 139.0b8 (Build ID: 20250514034321) and the latest Nightly 140.0a1 (Build ID: 20250515230342) on Windows 10 x64, macOS 14.4.1 and Linux Mint 20.1.
- The error is no longer displayed for the "quick-suggest" pings after setting the “datareporting.healthreport.uploadEnabled” pref to false.
Considering this, I will mark this issue as verified-fixed.
Updated•1 year ago
|
Description
•