Bug 1328678
Opened 8 years ago
Closed 4 years ago
Aggregator should have more buckets for count histograms
Categories
(Data Platform and Tools :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: frank, Unassigned)
For some count histograms, most counts are beyond 10000 [0]. We need the buckets to continue to 10000 and beyond [1]. My idea would be linear buckets for the first 25 or so, then calculated exponential buckets for the rest, tacking on new buckets when we need them. This would allow differentiation far beyond 10000 while still keeping precision for low counts.
The big issue would be backfill. We could either do an actual backfill on a few recent count histograms (such as the one mentioned), or include some sort of note on TMO to let people know the difference.
[0] https://mzl.la/2jamLUb
[1] https://github.com/mozilla/python_mozaggregator/blob/master/mozaggregator/aggregator.py#L16
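The bucket scheme proposed in the description could be sketched roughly as follows. This is only an illustration, not the aggregator's code: the function names, the growth factor of 1.08, and the choice of exactly 25 linear buckets are all assumptions made for the example.

```python
import bisect
import math

LINEAR_BUCKETS = 25  # assumption: "the first 25 or so" from the description
GROWTH = 1.08        # assumption: hypothetical exponential growth factor


def initial_buckets():
    """Lower bounds of the linear part: one bucket per count, 0..24."""
    return list(range(LINEAR_BUCKETS))


def extend_buckets(buckets, value, growth=GROWTH):
    """Tack on exponentially growing lower bounds until `value` is covered."""
    while True:
        # Always advance by at least 1 so the bounds stay strictly increasing.
        nxt = max(buckets[-1] + 1, math.ceil(buckets[-1] * growth))
        if nxt > value:
            return buckets
        buckets.append(nxt)


def bucket_for(buckets, value):
    """Return the lower bound of the bucket holding `value`, growing the
    bucket list in place if the value lies past the current last bucket."""
    if value < 0:
        raise ValueError("count histograms hold non-negative values")
    extend_buckets(buckets, value)
    # Largest lower bound that is <= value.
    return buckets[bisect.bisect_right(buckets, value) - 1]
```

Because buckets are only ever appended, existing lower bounds never move, so data aggregated before an extension keeps its meaning. The backfill concern still applies to counts that were previously clamped into the old last bucket.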
Comment 1•8 years ago
Variations of this issue show up time and time again; we should consider the use of histograms with dynamic range (e.g. [1][2]) to solve this class of problems.
[1] https://github.com/HdrHistogram/HdrHistogram
[2] https://github.com/vitillo/lua_tdigest
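To illustrate what "dynamic range" buys here, the sketch below derives each bucket from the value itself by keeping a fixed number of leading significant bits, so no fixed bucket list is needed. It is a simplified illustration in the spirit of that idea, not the HdrHistogram or t-digest API; the class name, method names, and `sig_bits` parameter are hypothetical.

```python
from collections import Counter


def lower_bound(value, sig_bits):
    """Round `value` down so only its top `sig_bits` bits are kept."""
    shift = max(0, value.bit_length() - sig_bits)
    return (value >> shift) << shift


class DynamicRangeHistogram:
    """Sparse histogram keyed by bucket lower bound (hypothetical sketch)."""

    def __init__(self, sig_bits=5):
        if sig_bits < 1:
            raise ValueError("sig_bits must be at least 1")
        self.sig_bits = sig_bits
        self.counts = Counter()

    def record(self, value):
        if value < 0:
            raise ValueError("only non-negative counts are supported")
        self.counts[lower_bound(value, self.sig_bits)] += 1

    def merge(self, other):
        """Combine two histograms; bucket bounds match when sig_bits match."""
        if other.sig_bits != self.sig_bits:
            raise ValueError("cannot merge histograms with different sig_bits")
        merged = DynamicRangeHistogram(self.sig_bits)
        merged.counts = self.counts + other.counts
        return merged
```

With `sig_bits=5`, values below 32 are stored exactly, 1000 lands in the bucket starting at 992, and 50000 lands in the bucket starting at 49152. Since bucket bounds depend only on the value and `sig_bits`, histograms from different submissions can be merged by adding counts.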
Updated•8 years ago
Points: --- → 3
Priority: -- → P3
Comment 2•8 years ago
Any action here? This skews one of the Quantum engagement metrics. If we collect data and then make it less useful in aggregation, we can stop collecting the data and save the bandwidth/storage.
Reporter
Comment 3•8 years ago
(In reply to :Harald Kirschner :digitarald from comment #2)
> Any action here? This skews one of the Quantum engagement metrics. If we
> collect data and then make it less useful in aggregation, we can stop
> collecting the data and save the bandwidth/storage.
If this is blocking Quantum work we can certainly move it up the priority queue.
Question: How and why are you using aggregate data for engagement measures? Are you using the data to create a dashboard somewhere? Or is this just for viewing in TMO?
Flags: needinfo?(hkirschner)
Comment 4•8 years ago
We are planning to use scroll engagement as a proxy for improved performance in pref-flipping experiments.
Flags: needinfo?(hkirschner)
Reporter
Comment 5•8 years ago
Is this usage then predicated on bug 1336989? Do you also need to see experiments and branches?
Flags: needinfo?(hkirschner)
Comment 6•8 years ago
If "experiments" means the pref-flipping experiment pipeline, then yes.
Flags: needinfo?(hkirschner)
Reporter
Updated•8 years ago
No longer blocks: 1255755
Component: Metrics: Pipeline → Datasets: Telemetry Aggregates
Product: Cloud Services → Data Platform and Tools
Updated•4 years ago
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Assignee
Updated•3 years ago
Component: Datasets: Telemetry Aggregates → General