Do not block Glean init on flushing pending tasks, if possible
Categories
(Data Platform and Tools :: Glean: SDK, defect, P1)
Tracking
(Not tracked)
People
(Reporter: Dexter, Assigned: Dexter)
Details
(Whiteboard: [telemetry:glean-rs:m11])
Attachments
(2 files)
Started from this issue on GH, the perf team noticed that the glean init is blocked on flushing actions from consumers who attempted recording before the SDK was initialized. From that issue:
In flushQueuedInitialTasks, it looks like we are blocking the main thread waiting for async tasks to complete: instead of blocking, can we do the work following the join() on a background thread or reacquire the main thread when needed? If not, maybe we can do all of this work after startup?
We should investigate if that's possible.
| Assignee | ||
Updated•5 years ago
|
| Assignee | ||
Updated•5 years ago
|
| Assignee | ||
Comment 1•5 years ago
|
||
Note: this has big implications with the interaction of setUploadEnabled. Scenario:
- setUploadEnabled(false) is called
- many operations and ping submission are enqueued
- Glean.initialize() is called
- setUploadEnabled(true) is called
Currently, by blocking (3) on the flush of the queued tasks, we guarantee that none of the operations enqueued at (2) are sent. If we were to make it async, we'd need to make sure we don't mistakenly sent that data.
| Assignee | ||
Comment 2•5 years ago
|
||
In Slack, Alessio asked me to look into calling Glean.initialize as soon as possible: I tried this and found no significant difference. Removing my 100ms outliers (1 each on each impl) and not clearing data between runs, on my Pixel 2 my results were:
- Current impl: 79.29ms in Glean.initialize; 49.85ms in flushQueued... over 2 runs
- Moved impl: 70.35ms in Glean.initialize; 41.45ms in flushQueued... over 3 runs
For specifics of my implementation, you can see my branch. I removed the coroutine launch – which was deferring Glean.initialize until some time after HomeActivity.onResume but before the first frame was drawn – and moved it as early as possible – right after we async start initializing the megazord - and the difference in performance was negligible.
I think we have two options:
- Move
Glean.initializeuntil after the first frame is drawn (which is probably what was intended by the current implementation's coroutine launch) - Continue looking for a solution that doesn't require blocking the main thread in
flushQueued
I didn't take this into account when writing the above but I noticed that Sebastian landed a commit today that moves Glean.initialize to as early as possible during startup, i.e. it even blocks initing the megazord asynchronously. The new implementation is slower for Glean:
- Master impl: 102.44ms in Glean.initialize; 14.21ms in flushQueued... over 3 runs
This slightly improves overall cold startup on my P2 – 1s387.68ms before and 1s369.67ms after for 3 runs each on forPerformanceTest variants – but this is well within normal noisyness of startup.
I'm guessing the reason for the slow down in Glean.initialize is that we're doing more native initialization here (that may otherwise happen in Gecko or megazord init) and that the earliest moments of startup might have more resource content (e.g. touching the disk).
Comment 5•5 years ago
|
||
iOS PR
| Assignee | ||
Updated•5 years ago
|
Description
•