Closed Bug 1612283 Opened 10 months ago Closed 10 months ago

Use timers and lifecycle events to flush the GV -> Glean EXTRACT buffer

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement
Points:
2

Tracking

()

RESOLVED FIXED
mozilla75
Tracking Status
firefox75 --- fixed

People

(Reporter: mdroettboom, Assigned: chutten|PTO)

References

Details

(Whiteboard: [telemetry:glean-rs:m?])

Attachments

(2 files)

We could use a combination of timers and lifetime events to flush the EXTRACT buffer to reduce the likelihood of losing metrics in the buffer. Care would need to be taken so this doesn't have performance impact or unnecessary wakeups.

:dexter added: "What about this: when we start,we kick off a timer. When the timer hits, we flush (if there's data in the batch). When going to background, we kill the timer. When going to foreground, we recreate the timer (so that we don't mistakenly wake up the app)."

This should hopefully reduce the amount of telemetry that is lost in the buffer.

Chris, how do you feel about this solution?

Flags: needinfo?(chutten)

Sounds good to me. I'm tempted to restart the timer if there's an accumulation after we're backgrounded so we can guarantee an upper bound of staleness universally... but I'm loathe to be the reason for a wakeup (especially on mobile), so I think this is the better choice.

Do we have information about how GV is notified about lifecycle events? This is all precluded on GV Streaming Telemetry being able to, by some mechanism (observers?) learn about these lifecycle state changes.

Flags: needinfo?(chutten) → needinfo?(alessio.placitelli)
See Also: → 1612282
Component: Glean: SDK → Telemetry
Flags: needinfo?(alessio.placitelli)
Priority: P3 → --
Product: Data Platform and Tools → Toolkit

(In reply to Chris H-C :chutten from comment #2)

Sounds good to me. I'm tempted to restart the timer if there's an accumulation after we're backgrounded so we can guarantee an upper bound of staleness universally... but I'm loathe to be the reason for a wakeup (especially on mobile), so I think this is the better choice.

Yes, if we can have timers work like, that, I like the idea of the universal staleness :-)

Do we have information about how GV is notified about lifecycle events? This is all precluded on GV Streaming Telemetry being able to, by some mechanism (observers?) learn about these lifecycle state changes.

Yes, we do :) I just asked :esawin and :snorp and they pointed me to this code: apparently the application-background and application-foreground are the notifications we're looking for.

Assignee: nobody → chutten
Status: NEW → ASSIGNED
Type: defect → enhancement
Points: --- → 2
Priority: -- → P1

Alrighty, I've thought about the design and what we can accomplish with the minimal state. How about this:

  • Start a "Just in Case" timer when we start a batch (let's say, 60s)
  • Clear the timer whenever we send the batch
  • Send the batch when we go to background

This way GV streaming telemetry need not maintain any state about the app's "foregroundness" and we end up with a universal guarantee of staleness (60s). If there are background measurements, they will trigger a wakeup 60s later only if there isn't another background measurement between 5s and 60s to trigger the batch.

What do you think, Alessio? Does this design meet your needs?

Flags: needinfo?(alessio.placitelli)

(In reply to Chris H-C :chutten from comment #4)

Alrighty, I've thought about the design and what we can accomplish with the minimal state. How about this:

  • Start a "Just in Case" timer when we start a batch (let's say, 60s)

This will restart the timer as well, right? So that it will be 60s since the latest sample?

  • Clear the timer whenever we send the batch
  • Send the batch when we go to background

This way GV streaming telemetry need not maintain any state about the app's "foregroundness" and we end up with a universal guarantee of staleness (60s). If there are background measurements, they will trigger a wakeup 60s later only if there isn't another background measurement between 5s and 60s to trigger the batch.

This looks good to me. Just to be clear: this means that we'll wake up at least once in the worst case of some consumer accumulating only a handful of metrics and then nothing else after going to background.?

Flags: needinfo?(alessio.placitelli)

We assumed that geckoview would be generating a consistent stream of data that
would keep the batches from getting much staler than their default 5s.

Turns out we're far too efficient, so the data dries up.

To still catch data stale in the buffer, we now send the batch when the app
goes to background (on the "application-background" topic).

(In reply to Alessio Placitelli [:Dexter] from comment #5)

(In reply to Chris H-C :chutten from comment #4)

Alrighty, I've thought about the design and what we can accomplish with the minimal state. How about this:

  • Start a "Just in Case" timer when we start a batch (let's say, 60s)

This will restart the timer as well, right? So that it will be 60s since the latest sample?

Nope. The goal is to put an upper bound of 60s for the oldest/most stale data in the batch.

  • Clear the timer whenever we send the batch
  • Send the batch when we go to background

This way GV streaming telemetry need not maintain any state about the app's "foregroundness" and we end up with a universal guarantee of staleness (60s). If there are background measurements, they will trigger a wakeup 60s later only if there isn't another background measurement between 5s and 60s to trigger the batch.

This looks good to me. Just to be clear: this means that we'll wake up at least once in the worst case of some consumer accumulating only a handful of metrics and then nothing else after going to background.?

Nope. If a consumer accumulates a handful of metrics and goes to the background, then on the background we'll send the batch and clear the timer. We won't wake up at all.

But if any accumulations happen after that batch is sent (ie, while the application is in the background, or after it comes back into the fg) then we'll have a 60s timer ensuring that the accumulations wait at most that long before being sent.

Make sense?

Make sense?

It does!

Duplicate of this bug: 1612282
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2a906606a73e
Send GV Telemetry batches on application background r=janerik
https://hg.mozilla.org/integration/autoland/rev/6235a81b9882
Enforce max staleness of 60s in GV Streaming Telemetry r=janerik
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75

ni?alessio for a manual qa pass (presuming this has made it into Fenix by now)

Flags: needinfo?(alessio.placitelli)

Well, due to our well known circumstances this is now way past overdue and made it into the wild. Clearing the ni?.

Flags: needinfo?(alessio.placitelli)
You need to log in before you can comment on or make changes to this bug.