Use timers and lifecycle events to flush the GV -> Glean EXTRACT buffer
Categories
(Toolkit :: Telemetry, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox75 | --- | fixed |
People
(Reporter: mdroettboom, Assigned: chutten)
References
Details
(Whiteboard: [telemetry:glean-rs:m?])
Attachments
(2 files)
We could use a combination of timers and lifetime events to flush the EXTRACT buffer to reduce the likelihood of losing metrics in the buffer. Care would need to be taken so this doesn't have performance impact or unnecessary wakeups.
:dexter added: "What about this: when we start,we kick off a timer. When the timer hits, we flush (if there's data in the batch). When going to background, we kill the timer. When going to foreground, we recreate the timer (so that we don't mistakenly wake up the app)."
This should hopefully reduce the amount of telemetry that is lost in the buffer.
Assignee | ||
Comment 2•3 years ago
|
||
Sounds good to me. I'm tempted to restart the timer if there's an accumulation after we're backgrounded so we can guarantee an upper bound of staleness universally... but I'm loathe to be the reason for a wakeup (especially on mobile), so I think this is the better choice.
Do we have information about how GV is notified about lifecycle events? This is all precluded on GV Streaming Telemetry being able to, by some mechanism (observers?) learn about these lifecycle state changes.
Updated•3 years ago
|
Comment 3•3 years ago
|
||
(In reply to Chris H-C :chutten from comment #2)
Sounds good to me. I'm tempted to restart the timer if there's an accumulation after we're backgrounded so we can guarantee an upper bound of staleness universally... but I'm loathe to be the reason for a wakeup (especially on mobile), so I think this is the better choice.
Yes, if we can have timers work like, that, I like the idea of the universal staleness :-)
Do we have information about how GV is notified about lifecycle events? This is all precluded on GV Streaming Telemetry being able to, by some mechanism (observers?) learn about these lifecycle state changes.
Yes, we do :) I just asked :esawin and :snorp and they pointed me to this code: apparently the application-background
and application-foreground
are the notifications we're looking for.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 4•3 years ago
|
||
Alrighty, I've thought about the design and what we can accomplish with the minimal state. How about this:
- Start a "Just in Case" timer when we start a batch (let's say, 60s)
- Clear the timer whenever we send the batch
- Send the batch when we go to background
This way GV streaming telemetry need not maintain any state about the app's "foregroundness" and we end up with a universal guarantee of staleness (60s). If there are background measurements, they will trigger a wakeup 60s later only if there isn't another background measurement between 5s and 60s to trigger the batch.
What do you think, Alessio? Does this design meet your needs?
Comment 5•3 years ago
|
||
(In reply to Chris H-C :chutten from comment #4)
Alrighty, I've thought about the design and what we can accomplish with the minimal state. How about this:
- Start a "Just in Case" timer when we start a batch (let's say, 60s)
This will restart the timer as well, right? So that it will be 60s since the latest sample?
- Clear the timer whenever we send the batch
- Send the batch when we go to background
This way GV streaming telemetry need not maintain any state about the app's "foregroundness" and we end up with a universal guarantee of staleness (60s). If there are background measurements, they will trigger a wakeup 60s later only if there isn't another background measurement between 5s and 60s to trigger the batch.
This looks good to me. Just to be clear: this means that we'll wake up at least once in the worst case of some consumer accumulating only a handful of metrics and then nothing else after going to background.?
Assignee | ||
Comment 6•3 years ago
|
||
We assumed that geckoview would be generating a consistent stream of data that
would keep the batches from getting much staler than their default 5s.
Turns out we're far too efficient, so the data dries up.
To still catch data stale in the buffer, we now send the batch when the app
goes to background (on the "application-background" topic).
Assignee | ||
Comment 7•3 years ago
|
||
Depends on D61837
Assignee | ||
Comment 8•3 years ago
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #5)
(In reply to Chris H-C :chutten from comment #4)
Alrighty, I've thought about the design and what we can accomplish with the minimal state. How about this:
- Start a "Just in Case" timer when we start a batch (let's say, 60s)
This will restart the timer as well, right? So that it will be 60s since the latest sample?
Nope. The goal is to put an upper bound of 60s for the oldest/most stale data in the batch.
- Clear the timer whenever we send the batch
- Send the batch when we go to background
This way GV streaming telemetry need not maintain any state about the app's "foregroundness" and we end up with a universal guarantee of staleness (60s). If there are background measurements, they will trigger a wakeup 60s later only if there isn't another background measurement between 5s and 60s to trigger the batch.
This looks good to me. Just to be clear: this means that we'll wake up at least once in the worst case of some consumer accumulating only a handful of metrics and then nothing else after going to background.?
Nope. If a consumer accumulates a handful of metrics and goes to the background, then on the background we'll send the batch and clear the timer. We won't wake up at all.
But if any accumulations happen after that batch is sent (ie, while the application is in the background, or after it comes back into the fg) then we'll have a 60s timer ensuring that the accumulations wait at most that long before being sent.
Make sense?
Comment 9•3 years ago
|
||
Make sense?
It does!
Comment 11•3 years ago
|
||
Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2a906606a73e Send GV Telemetry batches on application background r=janerik https://hg.mozilla.org/integration/autoland/rev/6235a81b9882 Enforce max staleness of 60s in GV Streaming Telemetry r=janerik
Comment 12•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/2a906606a73e
https://hg.mozilla.org/mozilla-central/rev/6235a81b9882
Assignee | ||
Comment 13•3 years ago
|
||
ni?alessio for a manual qa pass (presuming this has made it into Fenix by now)
Comment 14•3 years ago
|
||
Well, due to our well known circumstances this is now way past overdue and made it into the wild. Clearing the ni?.
Description
•