Implement the new events thresholds for custom pings
Categories
(Data Platform and Tools :: Glean: SDK, task, P1)
Tracking
(Not tracked)
People
(Reporter: Dexter, Assigned: chutten)
Attachments
(2 files)
42 bytes, text/x-github-pull-request
2.62 KB, text/plain (travis_: data-review+)
Implement this proposal from bug 1696135.
Updated•3 years ago
Comment 1 (Assignee)•1 year ago
As mentioned in bug 1784911 comment 17, this new and improved behaviour highlights an existing Glean SDK problem: what if a ping is defined with ping-lifetime metrics but is never submitted? The SDK'll happily store information for that ping, assuming it'll have a chance to wipe the storage on submit... but it'll never wipe it.
For non-event metrics, this would introduce a small but fixed amount of cruft in the db. Storing a counter or timing_distribution for all time will waste some bytes, but not too many.
For event metrics, each record will be added to the db and stored until we reach sizes that trigger db clearing. We'll fill up. There's no bound on the number of events that might be recorded and never cleared by ping submission.
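To make the difference concrete, here is a minimal toy model of the problem. It is only a sketch: `ToyPingStore` and its methods are hypothetical and are not the Glean SDK API. It assumes, as described above, that ping-lifetime data is wiped only when the ping is submitted.

```python
from collections import defaultdict


class ToyPingStore:
    """Hypothetical in-memory stand-in for ping-lifetime storage (not the Glean API)."""

    def __init__(self):
        self.counters = defaultdict(dict)  # ping name -> {metric name: value}
        self.events = defaultdict(list)    # ping name -> [event records]

    def add_to_counter(self, ping, metric, amount=1):
        # A counter updates one slot in place, so its footprint stays fixed.
        self.counters[ping][metric] = self.counters[ping].get(metric, 0) + amount

    def record_event(self, ping, name):
        # Every event appends a new record, so storage grows with each call.
        self.events[ping].append(name)

    def submit(self, ping):
        # Submission is the only place where ping-lifetime data is wiped.
        return self.counters.pop(ping, {}), self.events.pop(ping, [])


store = ToyPingStore()
for _ in range(1000):
    store.add_to_counter("background-update", "checks")
    store.record_event("background-update", "check_started")
    store.record_event("newtab", "opened")

# In this mode only "newtab" is ever submitted; "background-update" never is.
store.submit("newtab")
```

After the loop, the never-submitted ping holds one counter slot but a thousand event records, and nothing in this model will ever remove them.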
Our current solution of "ask folks not to do this" won't work for firefox_desktop and firefox_desktop_background_agent because both app_ids submit from the same binary running in different modes. They'll each have their own custom pings with events (firefox_desktop has "newtab", firefox_desktop_background_agent has "background-update"), so each'll exhibit this problem.
We'll need some sort of solution for this.
Comment 3 (Assignee)•1 year ago
The design's solution for ever-increasing event storage is to record and report invalid_overflow errors for the events recorded beyond the max. This'll stop the worst ramifications, but these errors will start and never stop, which reduces the efficacy of the Glean Error Reporting Mechanism.
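A rough sketch of that capping behaviour follows, under stated assumptions: `CappedEventStore`, `max_events`, and the per-ping error counter are hypothetical names for illustration, and the real threshold value and storage layout are not given in this bug. Only the idea comes from the text above: events past the max are not stored, and an invalid_overflow error is counted for each one.

```python
class CappedEventStore:
    """Hypothetical event store with a per-ping cap (not the Glean API)."""

    def __init__(self, max_events):
        self.max_events = max_events  # assumed threshold, chosen by the caller
        self.events = {}              # ping name -> [event records]
        self.invalid_overflow = {}    # ping name -> count of rejected events

    def record_event(self, ping, name):
        stored = self.events.setdefault(ping, [])
        if len(stored) >= self.max_events:
            # Past the cap: drop the event and count an overflow error instead.
            self.invalid_overflow[ping] = self.invalid_overflow.get(ping, 0) + 1
            return False
        stored.append(name)
        return True


capped = CappedEventStore(max_events=3)
results = [capped.record_event("background-update", f"event_{i}") for i in range(10)]
```

Storage stays bounded at `max_events` records, but for a ping that is never submitted the error count keeps climbing with every further event, which is the "start and never stop" drawback noted above.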
But since it isn't gonna be the worst thing ever, maybe we implement the solution in a FOG-agnostic way to begin with and take "elegantly handle multiple app ids with disjoint pings from the same binary" to a follow-up? We'll see what suggests itself as I start digging in.
Comment 4•1 year ago
Comment 5 (Assignee)•1 year ago
Comment 6•1 year ago
Comment on attachment 9305367 [details]
data collection review request
Data Review
- Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?
Yes, through the metrics.yaml file and the Glean Dictionary.
- Is there a control mechanism that allows the user to turn the data collection on and off?
Yes, through the telemetry preference in the application settings.
- If the request is for permanent data collection, is there someone who will monitor the data over time?
Permanent collection to be monitored by :chutten and the Glean Team.
- Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 2, Interaction Data
- Is the data collection request for default-on or default-off?
Default-on
- Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?
No
- Is the data collection covered by the existing Firefox privacy notice?
Yes
- Does the data collection use a third-party collection tool?
No
Result
data-review+
Comment 7 (Assignee)•1 year ago
Updated•1 year ago