Bug 1565253 (Closed) · Opened 5 years ago · Closed 5 years ago

Investigate the suitability of proposed functional bucketing in Glean

Categories

(Data Science :: Investigation, task)

Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: RESOLVED FIXED

People

(Reporter: mdroettboom, Assigned: ccd)

References

Details

Brief Description of the request (required):

The bucketing of many of the existing histograms collected from Desktop Firefox is known to be less than ideal. Data too often overflows the range, or there aren't enough buckets to cover the "interesting" parts of the data. For the Glean project (already shipping in Fenix and FFTV, and expanding to iOS and Desktop in the future), we would like to provide a better way to perform bucketing of timing distributions (a special case of histograms) that has fewer parameters and less potential for error.

A detailed proposal for a new approach has been written; it would effectively provide exponential histograms of unbounded range, with a default scale designed to match the needs of most timing data. What remains is to verify that approach against existing data to confirm that it will in fact improve the situation. As part of bringing GeckoView telemetry to Fenix (via Glean), the performance and graphics teams have identified the most important histograms; those are the ones this analysis should focus on.
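For concreteness, here is a minimal sketch (in Python) of the kind of functional bucketing the proposal describes. The formula, function names, and coefficient parameter are illustrative assumptions, not necessarily the exact form in the proposal:

    import math

    def bucket_index(sample: float, coefficient: float) -> int:
        # Exponential bucketing with unbounded range: the index is
        # floor(coefficient * log2(sample)), so a single parameter
        # (the coefficient) fixes the number of buckets per power of two.
        return math.floor(coefficient * math.log2(sample))

    def bucket_lower_bound(index: int, coefficient: float) -> float:
        # Lower edge of the bucket with the given index: 2 ** (index / coefficient).
        return 2.0 ** (index / coefficient)

    # Example: a 1500 ms sample with coefficient 8 lands in bucket 84,
    # whose lower edge is 2 ** (84 / 8) ~= 1448.2 ms.
    print(bucket_index(1500, 8))        # 84
    print(bucket_lower_bound(84, 8))    # ~1448.15

Under this scheme no positive sample can overflow: every sample maps to some bucket index, which addresses the range problem described above.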

Business purpose for this request (required):

The existing bucketing of many histograms makes the data harder to analyze and reason about, since much of the information (in range or in resolution) is lost before it even leaves the client. This in turn makes technical and business decisions harder. We have a window of opportunity to do something better for Fenix now (and for other Glean-using products in the future); bucketing will be much harder to change once data is being collected.

Requested timelines for the request or how this fits into roadmaps or critical decisions (required):

Ideally this would be complete by mid-August 2019; sooner would be better.

In 2019Q3, the top-priority project for the client telemetry team is bringing the most important GeckoView performance and graphics histograms into Glean for collection from Fenix. If we can validate that the new approach to bucketing will work, we can start collecting better data as part of that work. If this timeline were to slip, our default position would be to continue collecting these histograms with the same bucketing used on Desktop, which is known to be suboptimal.

Links to any assets (e.g. start of a PRD or BRD; any document that helps describe the project):

Name of Data Scientist (If Applicable):

Felix Lawrence and Corey Dow-Hygelund have already been assisting with the proposal and have a lot of context here.

Flags: needinfo?(flawrence)
Flags: needinfo?(cdowhygelund)
Assignee: nobody → flawrence
Status: NEW → ASSIGNED

Corey will investigate later next week.

Assignee: flawrence → cdowhygelund
Flags: needinfo?(flawrence)

Additional comment: The proposal linked here is written specifically with timing distributions in mind. The spreadsheet of "important histograms" includes histograms of other types (memory, etc.) that we will also want to account for. It's possible we'll want an entirely different approach there, or the same approach with a different scaling.

See Also: → 1564989
Blocks: 1568589

Initial analysis complete. Doubling the coefficient to 8 will maintain the existing resolution for the high-priority timing histograms.
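For intuition on what the coefficient buys (assuming the floor(c * log2(x)) form sketched above): consecutive bucket edges differ by a constant factor of 2 ** (1 / c), so the per-bucket relative width follows directly from the coefficient:

    for c in (4, 8):
        width_pct = (2 ** (1 / c) - 1) * 100
        print(f"coefficient {c}: ~{width_pct:.1f}% per bucket")
    # coefficient 4: ~18.9% per bucket
    # coefficient 8: ~9.1% per bucket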

Flags: needinfo?(cdowhygelund)

Great, :ccd. Thanks so much for this analysis.

For the remaining metrics, which are memory related, we went through them one by one in a meeting, and they seem to fall into two categories: (1) things that are very custom and one-off, which will probably need to retain an explicit range and bucket count, and (2) things that seem to follow a fairly standard exponential distribution and might benefit from this approach.

Would you mind extending your analysis to the following metrics (from category (2)) to confirm that we would be OK using this approach for them?

MEMORY_HEAP_ALLOCATED
MEMORY_IMAGES_CONTENT_USED_UNCOMPRESSED
MEMORY_RESIDENT_FAST
MEMORY_RESIDENT_PEAK
MEMORY_TOTAL
MEMORY_UNIQUE
MEMORY_UNIQUE_CONTENT_STARTUP
MEMORY_VSIZE
MEMORY_VSIZE_MAX_CONTIGUOUS

Flags: needinfo?(cdowhygelund)

The memory probes have been analyzed. Doubling the coefficient again, to 16, will maintain the existing resolution.
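By the same reasoning as above, doubling the coefficient roughly halves the per-bucket relative width, giving the memory probes buckets of about 4.4%:

    width_pct = (2 ** (1 / 16) - 1) * 100
    print(f"~{width_pct:.1f}% per bucket")  # ~4.4% per bucket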

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(cdowhygelund)
Resolution: --- → FIXED