Closed Bug 1716725 Opened 3 years ago Closed 1 year ago

Implement the new events thresholds for custom pings

Categories

(Data Platform and Tools :: Glean: SDK, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dexter, Assigned: chutten)

Attachments

(2 files)

Implement this proposal from bug 1696135.

See Also: → 1716724
Priority: -- → P3

As mentioned in bug 1784911 comment 17, this new and improved behaviour highlights an existing Glean SDK problem: what if a ping is defined with ping-lifetime metrics but is never submitted? The SDK'll happily store information for that ping, assuming it'll have a chance to wipe the storage on submit... but it'll never wipe it.

For non-event metrics, this would introduce a small but fixed amount of cruft in the db. Storing a counter or timing_distribution for all time will waste some bytes, but not too many.

For event metrics, each record will be added to the db and stored until we reach sizes that trigger db clearing. We'll fill up. There's no bound on the number of events that might be recorded and never cleared by ping submission.

Our current solution of "ask folks not to do this" won't work for firefox_desktop and firefox_desktop_background_agent because both app_ids submit from the same binary running in different modes. They'll each have their own custom pings with events (firefox_desktop has "newtab", firefox_desktop_background_agent has "background-update"), so each'll exhibit this problem.

We'll need some sort of solution for this.
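A toy model of the failure mode described above may help. This is a sketch, not Glean's actual storage code: `PingLifetimeStore` and its methods are hypothetical names invented for illustration. Storage is wiped only on submit, so a ping that is never submitted keeps every event ever recorded for it.

```python
from collections import defaultdict


class PingLifetimeStore:
    """Hypothetical ping-lifetime event storage: cleared only on submit."""

    def __init__(self):
        self._events = defaultdict(list)

    def record(self, ping_name, event):
        # No upper bound: every recorded event is kept until submit.
        self._events[ping_name].append(event)

    def submit(self, ping_name):
        # Submitting returns the payload and wipes that ping's storage.
        return self._events.pop(ping_name, [])

    def stored(self, ping_name):
        return len(self._events.get(ping_name, []))


store = PingLifetimeStore()
for i in range(1000):
    store.record("newtab", {"name": "opened", "seq": i})
    store.record("background-update", {"name": "checked", "seq": i})

# Assume this process runs in the mode that only ever submits "newtab".
store.submit("newtab")
newtab_left = store.stored("newtab")
background_left = store.stored("background-update")
```

After the submit, the "newtab" storage is empty while the "background-update" events are all still stored, and they would keep growing for as long as the process keeps recording them.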

Duplicate of this bug: 1780588

The design's solution for ever-increasing event storage is to record and report invalid_overflow errors for the events recorded beyond the max. This'll stop the worst ramifications, but these errors will start and never stop, which reduces the efficacy of the Glean Error Reporting Mechanism.

But since it isn't gonna be the worst thing ever, maybe we implement the solution in a FOG-agnostic way to begin with and take "elegantly handle multiple app ids with disjoint pings from the same binary" to a follow-up? We'll see what suggests itself as I start digging in.
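The overflow behaviour can be sketched roughly like this. It is a minimal illustration under stated assumptions, not the SDK's implementation: `BoundedEventStore`, `max_events`, and the per-ping error counter are hypothetical names, and it assumes that events beyond the max are dropped rather than stored.

```python
class BoundedEventStore:
    """Hypothetical event storage with a per-ping cap on stored events."""

    def __init__(self, max_events):
        self.max_events = max_events
        self._events = {}
        # Stand-in for the invalid_overflow error count, kept per ping.
        self.invalid_overflow = {}

    def record(self, ping_name, event):
        events = self._events.setdefault(ping_name, [])
        if len(events) >= self.max_events:
            # Assumption: the event is dropped and an error is counted instead.
            self.invalid_overflow[ping_name] = (
                self.invalid_overflow.get(ping_name, 0) + 1
            )
            return False
        events.append(event)
        return True

    def stored(self, ping_name):
        return len(self._events.get(ping_name, []))


bounded = BoundedEventStore(max_events=3)
results = [bounded.record("background-update", {"seq": i}) for i in range(10)]
```

Storage stays capped at `max_events`, but for a ping that is never submitted the error count rises with every further event, which is the "start and never stop" cost mentioned above.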

Assignee: nobody → chutten
Status: NEW → ASSIGNED
Priority: P3 → P1
Attachment #9305367 - Flags: data-review?(tlong)

Comment on attachment 9305367 [details]
data collection review request

Data Review

  1. Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes, through the metrics.yaml file and the Glean Dictionary.

  2. Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, through the telemetry preference in the application settings.

  3. If the request is for permanent data collection, is there someone who will monitor the data over time?

Permanent collection to be monitored by :chutten and the Glean Team.

  4. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction Data

  5. Is the data collection request for default-on or default-off?

Default-on

  6. Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

No

  7. Is the data collection covered by the existing Firefox privacy notice?

Yes

  8. Does the data collection use a third-party collection tool?

No

Result

data-review+

Attachment #9305367 - Flags: data-review?(tlong) → data-review+
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
See Also: → 1804915
Regressions: 1811872