We spend too much time in the glean.dispatcher thread
Categories
(Data Platform and Tools :: Glean: SDK, defect, P2)
Tracking
(Not tracked)
People
(Reporter: jrmuizel, Assigned: janerik)
References
(Blocks 1 open bug)
Details
This is a profile of https://faraday.basschouten.com/mozilla/networkrequesttest/test.html where we do 300 network requests.
It looks like we're spending more CPU time in the glean.dispatcher thread than Chrome does in their network stack:
https://share.firefox.dev/43jxpwq
Do we wake up this thread every time we make a histogram accumulation?
Comment 1 (Assignee) • 8 months ago
I'll check the profile in more detail later.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #0)
Do we wake up this thread every time we make a histogram accumulation?
Yes. We essentially queue a task per accumulation, which the dispatcher thread picks up and runs.
That sure seems costly in this case. I'm taking a look.
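For readers unfamiliar with the pattern, here is a minimal sketch of "one queued task per accumulation", assuming a plain channel-backed worker thread. This is not the Glean SDK's actual dispatcher; `Dispatcher`, `Histogram`, `launch`, `accumulate` and the thread name are hypothetical names used only for illustration.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// A unit of work queued for the worker thread.
type Task = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical, simplified dispatcher: one worker thread draining a FIFO queue.
pub struct Dispatcher {
    sender: Option<Sender<Task>>,
    worker: Option<JoinHandle<()>>,
}

impl Dispatcher {
    pub fn new() -> Self {
        let (sender, receiver) = channel::<Task>();
        let worker = thread::Builder::new()
            .name("sketch.dispatcher".into())
            .spawn(move || {
                // Blocks while the queue is empty, so a send into an idle
                // queue has to wake this thread up again.
                for task in receiver {
                    task();
                }
            })
            .expect("failed to spawn dispatcher thread");
        Dispatcher {
            sender: Some(sender),
            worker: Some(worker),
        }
    }

    /// Queue a task; the calling thread returns immediately.
    pub fn launch(&self, task: impl FnOnce() + Send + 'static) {
        if let Some(sender) = &self.sender {
            sender
                .send(Box::new(task))
                .expect("dispatcher thread is gone");
        }
    }

    /// Close the queue and wait until every queued task has run.
    pub fn shutdown(mut self) {
        drop(self.sender.take());
        if let Some(worker) = self.worker.take() {
            worker.join().expect("dispatcher thread panicked");
        }
    }
}

/// Hypothetical histogram whose samples live in shared storage.
#[derive(Clone, Default)]
pub struct Histogram {
    samples: Arc<Mutex<Vec<u64>>>,
}

impl Histogram {
    /// Each accumulation becomes its own queued task.
    pub fn accumulate(&self, dispatcher: &Dispatcher, sample: u64) {
        let samples = Arc::clone(&self.samples);
        dispatcher.launch(move || samples.lock().unwrap().push(sample));
    }

    pub fn snapshot(&self) -> Vec<u64> {
        self.samples.lock().unwrap().clone()
    }
}
```

In this sketch, 300 calls to `accumulate` produce 300 boxed closures and 300 channel sends, which is the per-accumulation overhead being discussed. The tasks still run in the order they were queued.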
Comment 2 (Reporter) • 8 months ago
How did legacy telemetry avoid this overhead?
Comment 3 • 8 months ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #2)
How did legacy telemetry avoid this overhead?
By being a completely different data collection system.
Comment 4 • 8 months ago
To sum up a long follow-up conversation we had on Matrix:
- Legacy Telemetry had the calling thread pay the cost of synchronously updating volatile storage
- The Glean SDK has different requirements which dictated different decisions, like persistent storage of metric data
- We're moving at a sustainable speed towards a better world, with work like bug 1944248 finally within our planning horizon
- Quick Fixes are something we're wary of, after tight-deadlined features from last year have come back to haunt us this year
- We think this will be a tough problem to solve because of how integrated the dispatcher is (we rely on it for order of operations and preinit queuing at least) and how most metric instances don't know what their value is without asking storage (e.g. see that there's no owned String in string metrics).
  - e.g. we can't have a mix of sync and async calls without violating order of operations. Is that important? Can we account for it?
- Please keep finding and filing performance issues with FOG and Glean so we can craft more comprehensive solutions
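The order-of-operations concern in the list above can be sketched as follows. This is a toy model and not Glean code: `Storage`, `Queue`, `mixed_calls` and `all_queued` are hypothetical, and the queue is drained by hand instead of by a thread so that the result is deterministic.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for metric storage: a single string value.
#[derive(Default)]
pub struct Storage {
    pub value: Option<String>,
}

/// Hypothetical stand-in for the dispatcher queue; tasks run only when drained.
#[derive(Default)]
pub struct Queue {
    tasks: VecDeque<Box<dyn FnOnce(&mut Storage)>>,
}

impl Queue {
    /// "Async" call: record the operation to run later.
    pub fn launch(&mut self, task: impl FnOnce(&mut Storage) + 'static) {
        self.tasks.push_back(Box::new(task));
    }

    /// Run every queued task in FIFO order.
    pub fn drain(&mut self, storage: &mut Storage) {
        while let Some(task) = self.tasks.pop_front() {
            task(storage);
        }
    }
}

/// The caller sets "first" through the queue, then "second" synchronously.
pub fn mixed_calls() -> Option<String> {
    let mut storage = Storage::default();
    let mut queue = Queue::default();
    queue.launch(|s| s.value = Some("first".to_string()));
    // The sync write lands immediately, before the queued one has run.
    storage.value = Some("second".to_string());
    // When the queue catches up, the older write overwrites the newer one.
    queue.drain(&mut storage);
    storage.value
}

/// The same two calls, both routed through the queue.
pub fn all_queued() -> Option<String> {
    let mut storage = Storage::default();
    let mut queue = Queue::default();
    queue.launch(|s| s.value = Some("first".to_string()));
    queue.launch(|s| s.value = Some("second".to_string()));
    queue.drain(&mut storage);
    storage.value
}
```

In this model `mixed_calls` ends with "first" stored even though the caller set "second" last, while `all_queued` keeps program order. Whether that kind of reordering matters for a given metric type is the open question raised in the list.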