Open Bug 1906664 Opened 3 months ago Updated 1 month ago

Telemetry collection on android can add up to 30ms main thread time during pageload (parent process)

Categories

(Data Platform and Tools :: Glean: SDK, defect, P2)

Tracking

(Not tracked)

People

(Reporter: acreskey, Assigned: janerik)

References

(Blocks 1 open bug)

Details

In bug 1899169 we are quantifying the impact of Glean on Fenix pageload.

With Bug 1892230, Bug 1898515, and Bug 1898649 fixed, we are still measuring a performance hit in CI pageload tests with Telemetry enabled.

The baseline revision has telemetry disabled in Fenix, while the "New" revision (i.e. right hand side) has telemetry enabled.
Performance Compare

With the "replicates" view enabled I am seeing a series of performance impacts, medium confidence unless otherwise noted.

• allrecipes ContentfulSpeedIndex opt warm, 4.43% slower
• amazon-search largestContentfulPaint opt warm, 2.32% slower
• cnn visual metrics, 3-4% slower
• cnn fnbpaint opt warm, 6.71% slower, high confidence
• ebay-kleinanzeigen-search ContentfulSpeedIndex opt warm, 3.85% slower, high confidence
• imdb fcp opt warm, 2.95% slower
• sina visual metrics, up to 18% slower, but high noise site
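
For reference, a "percent slower" figure like the ones above can be derived from replicate values roughly as follows. This is a minimal sketch that assumes the comparison is made on medians, which may differ from what Performance Compare actually computes; the function name and the replicate values are hypothetical and not taken from the CI run.

```python
from statistics import median


def percent_slower(base_replicates, new_replicates):
    """Return how much slower (in %) the new median is relative to the base median."""
    base = median(base_replicates)
    new = median(new_replicates)
    return (new - base) / base * 100.0


# Hypothetical replicate values in ms (illustration only, not from the bug).
base = [1000.0, 1010.0, 990.0]   # telemetry disabled
new = [1040.0, 1050.0, 1030.0]   # telemetry enabled

print(f"{percent_slower(base, new):.2f}% slower")
```

A positive result means the telemetry-enabled revision is slower; a negative one means it is faster.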

One source (not necessarily the primary one) is raw collection time on the main thread of the parent process.

Some examples:

sina pageload:
~35ms total in glean_core::metrics::counter::CounterMetric::add and fog_timing_distribution_accumulate_raw_nanos
https://share.firefox.dev/3xC3AdK

ebay, warm pageload:
12ms of samples in glean_core::metrics::timing_distribution::TimingDistributionMetric::accumulate_raw_duration and mozilla::glean::impl::DenominatorMetric::Add
https://share.firefox.dev/45T44c1

This profile has 10 samples (50ms) in Glean, mostly in CounterMetric::add:

https://share.firefox.dev/4606iGG

Thanks! Those are some numbers we can work with and try to reduce now.
First, I think we need some way to reliably and quickly get some of these numbers in test scenarios.
I do think some of that might be overhead from UniFFI and JNA.
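
One possible shape for such a test scenario is a small timing harness that calls a recording function many times and reports the total and per-call cost. The sketch below only illustrates the idea: `StandInCounter` and `time_calls` are hypothetical names, the counter is not the Glean API, and it does not model the UniFFI or JNA call path, so the numbers it prints say nothing about Glean itself.

```python
import time


class StandInCounter:
    """Hypothetical stand-in for a counter metric; NOT the Glean API."""

    def __init__(self):
        self.value = 0

    def add(self, amount=1):
        self.value += amount


def time_calls(fn, iterations=10_000):
    """Call fn() `iterations` times; return (total_ns, mean_ns_per_call)."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    total = time.perf_counter_ns() - start
    return total, total / iterations


counter = StandInCounter()
total_ns, per_call_ns = time_calls(counter.add)
print(f"{total_ns / 1e6:.3f} ms total, {per_call_ns:.0f} ns per call")
```

In a real test the callable passed to `time_calls` would be the actual metric call made through the real bindings, so that any binding overhead is included in the measurement.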

Assignee: nobody → jrediger
Priority: -- → P2

All three profiles of amazon-search pageload show 30-50ms of parent process, main thread usage.

Mostly in glean_core::metrics::counter::CounterMetric::add from networking calls, and in mozilla::glean::impl::DenominatorMetric::Add coming from mozilla::net::CookieService::CanSetCookie.

https://share.firefox.dev/3WjgNBH
https://share.firefox.dev/3W0KWEg
https://share.firefox.dev/3xGqe4J
