Closed Bug 1313592 Opened 8 years ago Closed 8 years ago

Estimate storage impact of Event Telemetry

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: gfritzsche, Assigned: gfritzsche)

References

(Blocks 1 open bug)

Details

(Whiteboard: [measurement:client])

Georg Fritzsche [:gfritzsche]

Assignee

Description

•

8 years ago

We need to pessimistically estimate the storage impact of Event Telemetry.

We will also need to decide on what numbers are acceptable and tune the client-side limits from that.

Georg Fritzsche [:gfritzsche]

Assignee

Updated

•

8 years ago

Priority: P2 → P1

Georg Fritzsche [:gfritzsche]

Assignee

Updated

•

8 years ago

Points: --- → 2

Georg Fritzsche [:gfritzsche]

Assignee

Updated

•

8 years ago

Assignee: nobody → gfritzsche

Georg Fritzsche [:gfritzsche]

Assignee

Updated

•

8 years ago

status-firefox52: affected → ---

Georg Fritzsche [:gfritzsche]

Assignee

Comment 1

•

8 years ago

I started to look at storage estimation here:
https://docs.google.com/spreadsheets/d/1o1ZLfiEEj1nA0ViKA67PAP2q8adzmEzQt4BT-gsyPsA/

This only looks at the raw, upper-bound size impact, but it should give us a worst-case scenario to work from.

E.g. for a simple form of [timestamp,"category","method","object","value",null], sending 1k events per ping costs us ~0.16MB, and 10k events ~1.6MB.
For the event-driven data collection being considered (clicks, navigations, tab opens, ...), 1k events doesn't seem like very much.
For comparison, the payload size of the whole opt-out "main" ping from release is currently ~0.15MB, and we just discard any pings >1MB (raw or compressed).

Adding any information in the "extra" dictionary quickly makes this more expensive; I've added some example rows with different "extra" submission ratios.
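
As a rough illustration of this kind of raw size estimate, here is a minimal Python sketch. It is not the spreadsheet's calculation: the helper names (`make_event`, `raw_size_bytes`) and the per-field string length `str_len` are assumptions made for illustration, so the output will not necessarily match the figures quoted above.

```python
import json

def make_event(timestamp, str_len=20, extra=None):
    """Build one event in the [timestamp, category, method, object, value, extra] form."""
    field = "x" * str_len  # assumed worst-case length for each string field
    return [timestamp, field, field, field, field, extra]

def raw_size_bytes(n_events, str_len=20, extra=None):
    """Size in bytes of n_events serialized as compact UTF-8 JSON."""
    events = [make_event(i, str_len, extra) for i in range(n_events)]
    return len(json.dumps(events, separators=(",", ":")).encode("utf-8"))

# Raw cost for the event counts discussed above, without and with "extra" data.
for n in (1_000, 10_000):
    plain = raw_size_bytes(n)
    with_extra = raw_size_bytes(n, extra={"key": "value"})
    print(n, "events:", plain / 1e6, "MB raw,", with_extra / 1e6, "MB with extra")
```

Raising `str_len` or passing a larger `extra` dictionary shows how quickly the per-ping cost grows with the number of events.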

There are different parts in the whole pipeline where this might be problematic:
- client side storage
- client bandwidth & upload times
- pipeline storage? (although this should compress away well?)
- worker processing?

As-is, I think we can't use this on any population without strict limits in place.
We can probably talk about different approaches to this:
- optimization/compression of event submissions
- population sampling (1% of clients only)
- different limits & policies for release & pre-release
- accepting hard cutoff after reaching limit of N events
- ...?

Georg Fritzsche [:gfritzsche]

Assignee

Comment 2

•

8 years ago

I added data on how events impact ping size, raw & compressed, based on an opt-out & opt-in sample ping:
https://docs.google.com/spreadsheets/d/1o1ZLfiEEj1nA0ViKA67PAP2q8adzmEzQt4BT-gsyPsA/

The script to generate this is here:
https://gist.github.com/georgf/989d484da9b75bc86eb858dfe02b3768
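
For readers without access to the gist, the general idea of comparing raw and compressed ping size can be sketched as follows. This is not the linked script: the sample ping contents, the `events` key, and the use of `zlib` as the compressor are all assumptions for illustration.

```python
import json
import zlib

def ping_sizes(ping, events):
    """Return (raw_bytes, compressed_bytes) for a ping with events attached."""
    payload = dict(ping)
    payload["events"] = events  # hypothetical location of events in the ping
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return len(raw), len(zlib.compress(raw))

# Placeholder ping; a real estimate would load an actual sample ping instead.
sample_ping = {"type": "main", "payload": {"info": {"reason": "daily"}}}
events = [[i, "category", "method", "object", "value", None] for i in range(1000)]

raw, compressed = ping_sizes(sample_ping, events)
print("raw:", raw, "bytes; compressed:", compressed, "bytes")
```

Highly repetitive event lists like this one compress far better than real data would, so the compressed figure from such a sketch is optimistic.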

Georg Fritzsche [:gfritzsche]

Assignee

Comment 3

•

8 years ago

We discussed the initial options here in a smaller group:
https://docs.google.com/document/d/1hxpqQefc2QiIdZlNhaIAd3GpgN9VbABD71q96nh9Xec/

While we have good options to do things more cleverly in the medium to longer term, there is a short-term path we are taking for Fx52:
* cap after N=1000 per subsession
* only pre-release for now, disable recording on release
* no sampling for now as we don't go to release
* ride this on fx52, including at least the initial search probe (bug 1316281)
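
The cap and the release switch above could look roughly like the following Python sketch. It is an illustration only, not the Firefox implementation: the `EventRecorder` class, its method names, and the `dropped` counter are hypothetical.

```python
class EventRecorder:
    """Records events up to a hard limit per subsession (hypothetical sketch)."""

    def __init__(self, limit=1000, recording_enabled=True):
        self.limit = limit                          # N=1000 per subsession
        self.recording_enabled = recording_enabled  # False would model release
        self.events = []
        self.dropped = 0                            # events discarded after the cap

    def record(self, timestamp, category, method, obj, value=None, extra=None):
        """Store one event; return False if recording is off or the cap is hit."""
        if not self.recording_enabled:
            return False
        if len(self.events) >= self.limit:
            self.dropped += 1
            return False
        self.events.append([timestamp, category, method, obj, value, extra])
        return True

    def snapshot_and_clear(self):
        """At the end of a subsession, hand back the events and reset the cap."""
        events, self.events = self.events, []
        self.dropped = 0
        return events
```

Counting dropped events is a design choice made here for the sketch; it would let an analysis tell apart quiet clients from clients that hit the limit.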

There will be another meeting with more people about the next steps from there, for which we need to:
* enable others to make budget decisions and state requirements better
* summarize options and concerns to make them more actionable
* build out size estimation with raw worst case & snappy compression
* estimate some expected event impact based on engagement measurements

Georg Fritzsche [:gfritzsche]

Assignee

Updated

•

8 years ago

Blocks: 1316810

Georg Fritzsche [:gfritzsche]

Assignee

Updated

•

8 years ago

Points: 2 → 3

Georg Fritzsche [:gfritzsche]

Assignee

Comment 4

•

8 years ago

We have more updated numbers in the notes in [1] and settled on roughly the following:
* collect events only from a sample of the population (bug 1320716)
* hard-limit event collection (1000 for built-in events on pre-release for now, bug 1316810)
* limit event collection to pre-release until we are confident about the mechanism (bug 1319102)
* on release, use a much lower limit for built-in events (e.g. 100, bug 1320713)
* allow dynamic event registration to override with higher limits (commented on bug 1302681)
  - this would be for smaller studies or experiments, which would not have as broad an impact
* we will monitor impact on telemetry sending via a minimal health ping (bug 1318297)
* we will need a tool that makes it easy to estimate the budget impact of new event collections (bug 1320711)

1: https://docs.google.com/document/d/1QJhCnuBWR5xVc0zegXDoFXBdswCGIcDBYcQl8DS-UJI/
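
To make the combination of sampling and per-channel limits concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the channel keys, hashing the client id to pick the sample, and the 1% default rate (taken from the example in comment 1) are not from the linked bugs.

```python
import hashlib

# Limits from the list above; release uses the "e.g. 100" example value.
EVENT_LIMITS = {"prerelease": 1000, "release": 100}

def in_sample(client_id, sample_rate=0.01):
    """Deterministically place a client in or out of the sampled population."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # stable bucket 0..9999
    return bucket < sample_rate * 10_000

def event_limit(channel, client_id, sample_rate=0.01, override=None):
    """Return how many built-in events this client may record."""
    if not in_sample(client_id, sample_rate):
        return 0              # clients outside the sample record nothing
    if override is not None:
        return override       # e.g. a small study registering a higher limit
    return EVENT_LIMITS[channel]
```

Hashing the client id keeps the sample stable across sessions, so the same clients keep reporting events and their data can be followed over time.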

Georg Fritzsche [:gfritzsche]

Assignee

Updated

•

8 years ago

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Categories

(Toolkit :: Telemetry, defect, P1)
