Closed Bug 1609052 Opened 5 years ago Closed 5 years ago

Do not block Glean init on flushing pending tasks, if possible

Categories

(Data Platform and Tools :: Glean: SDK, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dexter, Assigned: Dexter)

Details

(Whiteboard: [telemetry:glean-rs:m11])

Attachments

(2 files)

Started from this issue on GH, the perf team noticed that the glean init is blocked on flushing actions from consumers who attempted recording before the SDK was initialized. From that issue:

In flushQueuedInitialTasks, it looks like we are blocking the main thread waiting for async tasks to complete: instead of blocking, can we do the work following the join() on a background thread or reacquire the main thread when needed? If not, maybe we can do all of this work after startup?

We should investigate if that's possible.

Assignee: nobody → alessio.placitelli
Priority: -- → P1
Whiteboard: [telemetry:glean-rs:m?]
Whiteboard: [telemetry:glean-rs:m?] → [telemetry:glean-rs:m11]

Note: this has big implications with the interaction of setUploadEnabled. Scenario:

  1. setUploadEnabled(false) is called
  2. many operations and ping submission are enqueued
  3. Glean.initialize() is called
  4. setUploadEnabled(true) is called

Currently, by blocking (3) on the flush of the queued tasks, we guarantee that none of the operations enqueued at (2) are sent. If we were to make it async, we'd need to make sure we don't mistakenly sent that data.

Attached file glean WIP PR

In Slack, Alessio asked me to look into calling Glean.initialize as soon as possible: I tried this and found no significant difference. Removing my 100ms outliers (1 each on each impl) and not clearing data between runs, on my Pixel 2 my results were:

  • Current impl: 79.29ms in Glean.initialize; 49.85ms in flushQueued... over 2 runs
  • Moved impl: 70.35ms in Glean.initialize; 41.45ms in flushQueued... over 3 runs

For specifics of my implementation, you can see my branch. I removed the coroutine launch – which was deferring Glean.initialize until some time after HomeActivity.onResume but before the first frame was drawn – and moved it as early as possible – right after we async start initializing the megazord - and the difference in performance was negligible.

I think we have two options:

  • Move Glean.initialize until after the first frame is drawn (which is probably what was intended by the current implementation's coroutine launch)
  • Continue looking for a solution that doesn't require blocking the main thread in flushQueued

I didn't take this into account when writing the above but I noticed that Sebastian landed a commit today that moves Glean.initialize to as early as possible during startup, i.e. it even blocks initing the megazord asynchronously. The new implementation is slower for Glean:

  • Master impl: 102.44ms in Glean.initialize; 14.21ms in flushQueued... over 3 runs

This slightly improves overall cold startup on my P2 – 1s387.68ms before and 1s369.67ms after for 3 runs each on forPerformanceTest variants – but this is well within normal noisyness of startup.

I'm guessing the reason for the slow down in Glean.initialize is that we're doing more native initialization here (that may otherwise happen in Gecko or megazord init) and that the earliest moments of startup might have more resource content (e.g. touching the disk).

Attached file GitHub Pull Request

iOS PR

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: