Closed Bug 1328678 Opened 9 years ago Closed 4 years ago

Aggregator should have more buckets for count histograms

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: frank, Unassigned)

Frank Bertsch [:frank]

Reporter

Description

•

9 years ago

For some count histograms, most counts are beyond 10000 [0]. We need buckets that continue past 10000 [1].

My idea would be linear buckets for the first 25 or so, then calculated exponential buckets for the rest, tacking on new buckets when we need them. This would allow differentiation far beyond 10000, while still keeping precision for low counts.

The big issue would be backfill. We could either do an actual backfill on a few recent count histograms (such as the one mentioned), or include some sort of note on TMO to let people know the difference.

[0] https://mzl.la/2jamLUb

[1] https://github.com/mozilla/python_mozaggregator/blob/master/mozaggregator/aggregator.py#L16
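The bucketing idea in the description can be sketched roughly as follows. This is an illustration only, not the aggregator's code: the class name `GrowingCountBuckets`, the growth factor of 1.5, and the exact cutoff of 25 linear buckets are all assumptions made for the example.

```python
import bisect


class GrowingCountBuckets:
    """Hypothetical sketch: exact buckets for low counts, exponential after."""

    def __init__(self, linear=25, growth=1.5):
        # Assumed parameters: 25 one-wide buckets (0..24), then each new
        # lower bound is roughly `growth` times the previous one.
        self.growth = growth
        self.bounds = list(range(linear))  # sorted bucket lower bounds
        self.counts = [0] * linear

    def _extend_to(self, value):
        # Tack on exponential buckets until one starts above `value`,
        # so `value` always falls in a bucket with a finite upper edge.
        while self.bounds[-1] <= value:
            last = self.bounds[-1]
            self.bounds.append(max(last + 1, int(last * self.growth)))
            self.counts.append(0)

    def add(self, value):
        """Record one count value and return its bucket's lower bound."""
        if value < 0:
            raise ValueError("count histograms hold non-negative values")
        self._extend_to(value)
        idx = bisect.bisect_right(self.bounds, value) - 1
        self.counts[idx] += 1
        return self.bounds[idx]


if __name__ == "__main__":
    h = GrowingCountBuckets()
    for v in (7, 12000, 50000):
        print(v, "-> bucket starting at", h.add(v))
```

Low counts keep their exact value, while 12000 and 50000 land in different buckets instead of sharing one top bucket. The backfill problem mentioned above still applies, since previously stored aggregates would use the old labels.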

Ryan Hunt [:rhunt]

Updated

•

9 years ago

Blocks: 1297867

Roberto Agostino Vitillo (:rvitillo)

Comment 1

•

9 years ago

Variations of this issue show up time and time again; we should consider the use of histograms with dynamic range (e.g. [1][2]) to solve this class of problems.

[1] https://github.com/HdrHistogram/HdrHistogram

[2] https://github.com/vitillo/lua_tdigest
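To make "dynamic range" concrete, here is a minimal sketch of log-linear bucketing, where bucket width grows with the value so that relative error stays bounded. It is only loosely in the spirit of the libraries linked above and does not use or reproduce either library's API; the function names and the `sub_bits` parameter are assumptions for this example.

```python
from collections import Counter


def log_linear_bucket(value, sub_bits=3):
    """Return the lower bound of the bucket holding `value`.

    Keeps the leading bit plus `sub_bits` further bits and zeroes the
    rest, so bucket width is at most 2**-sub_bits of the lower bound.
    """
    if value < 0:
        raise ValueError("expected a non-negative count")
    shift = max(value.bit_length() - (sub_bits + 1), 0)
    return (value >> shift) << shift


def summarize(values, sub_bits=3):
    """Count how many values fall in each bucket (keyed by lower bound)."""
    return Counter(log_linear_bucket(v, sub_bits) for v in values)


if __name__ == "__main__":
    print(log_linear_bucket(10))     # 10 (small values are kept exactly)
    print(log_linear_bucket(10000))  # 9216
    print(log_linear_bucket(50000))  # 49152
```

No fixed upper limit is needed here: any non-negative integer maps to a bucket, and the set of buckets only grows as larger values are seen.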

Thomas Huelbert

Updated

•

9 years ago

Points: --- → 3

Priority: -- → P3

:Harald Kirschner :digitarald

Comment 2

•

8 years ago

Any action here? This skews one of the Quantum engagement metrics. If we collect data and then make it less useful in aggregation, we can stop collecting the data and save the bandwidth/storage.

Frank Bertsch [:frank]

Reporter

Comment 3

•

8 years ago

(In reply to :Harald Kirschner :digitarald from comment #2)
> Any action here? This skews one of Quantum engagement metrics. If we collect
> data and then make it less useful in aggregation; we can stop collecting the
> data and save the bandwidth/storage

If this is blocking Quantum work we can certainly move it up the priority queue.

Question: How and why are you using aggregate data for engagement measures? Are you using the data to create a dashboard somewhere? Or is this just for viewing in TMO?

Flags: needinfo?(hkirschner)

:Harald Kirschner :digitarald

Comment 4

•

8 years ago

We are planning to use scroll engagement as a proxy for improved performance in pref-flipping experiments.

Flags: needinfo?(hkirschner)

Frank Bertsch [:frank]

Reporter

Comment 5

•

8 years ago

Is this usage then predicated on bug 1336989? Do you also need to see experiments and branches?

Flags: needinfo?(hkirschner)

:Harald Kirschner :digitarald

Comment 6

•

8 years ago

If "experiments" means the pref-flipping experiment pipeline, then yes.

Flags: needinfo?(hkirschner)

Frank Bertsch [:frank]

Reporter

Updated

•

8 years ago

No longer blocks: 1255755

Component: Metrics: Pipeline → Datasets: Telemetry Aggregates

Product: Cloud Services → Data Platform and Tools

Jeff Klukas [:klukas] (UTC-4)

Updated

•

4 years ago

Status: NEW → RESOLVED

Closed: 4 years ago

Resolution: --- → WONTFIX

Nobody; OK to take it and work on it

Assignee

Updated

•

3 years ago

Component: Datasets: Telemetry Aggregates → General

Categories

(Data Platform and Tools :: General, defect, P3)

Security

(public)
