Closed Bug 1612283 Opened 4 years ago Closed 4 years ago

Use timers and lifecycle events to flush the GV -> Glean EXTRACT buffer

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla75

Tracking Flags:

Tracking

Status

firefox75

---

fixed

People

(Reporter: mdroettboom, Assigned: chutten)

References

Details

(Whiteboard: [telemetry:glean-rs:m?])

Attachments

(2 files)

Bug 1612283 - Send GV Telemetry batches on application background r?janerik! 4 years ago Chris H-C :chutten 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1612283 - Enforce max staleness of 60s in GV Streaming Telemetry r?janerik 4 years ago Chris H-C :chutten 47 bytes, text/x-phabricator-request		Details \| Review

Michael Droettboom [:mdroettboom]

Reporter

Description

•

4 years ago

We could use a combination of timers and lifetime events to flush the EXTRACT buffer to reduce the likelihood of losing metrics in the buffer. Care would need to be taken so this doesn't have performance impact or unnecessary wakeups.

:dexter added: "What about this: when we start,we kick off a timer. When the timer hits, we flush (if there's data in the batch). When going to background, we kill the timer. When going to foreground, we recreate the timer (so that we don't mistakenly wake up the app)."

This should hopefully reduce the amount of telemetry that is lost in the buffer.

Alessio Placitelli [:Dexter]

Comment 1

•

4 years ago

Chris, how do you feel about this solution?

Flags: needinfo?(chutten)

Chris H-C :chutten

Assignee

Comment 2

•

4 years ago

Sounds good to me. I'm tempted to restart the timer if there's an accumulation after we're backgrounded so we can guarantee an upper bound of staleness universally... but I'm loathe to be the reason for a wakeup (especially on mobile), so I think this is the better choice.

Do we have information about how GV is notified about lifecycle events? This is all precluded on GV Streaming Telemetry being able to, by some mechanism (observers?) learn about these lifecycle state changes.

Flags: needinfo?(chutten) → needinfo?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Updated

•

4 years ago

Updated

•

4 years ago

Component: Glean: SDK → Telemetry

Flags: needinfo?(alessio.placitelli)

Priority: P3 → --

Product: Data Platform and Tools → Toolkit

Alessio Placitelli [:Dexter]

Comment 3

•

4 years ago

(In reply to Chris H-C :chutten from comment #2)

Sounds good to me. I'm tempted to restart the timer if there's an accumulation after we're backgrounded so we can guarantee an upper bound of staleness universally... but I'm loathe to be the reason for a wakeup (especially on mobile), so I think this is the better choice.

Yes, if we can have timers work like, that, I like the idea of the universal staleness :-)

Do we have information about how GV is notified about lifecycle events? This is all precluded on GV Streaming Telemetry being able to, by some mechanism (observers?) learn about these lifecycle state changes.

Yes, we do :) I just asked :esawin and :snorp and they pointed me to this code: apparently the application-background and application-foreground are the notifications we're looking for.

Chris H-C :chutten

Assignee

Updated

•

4 years ago

Assignee: nobody → chutten

Status: NEW → ASSIGNED

Type: defect → enhancement

Points: --- → 2

Priority: -- → P1

Chris H-C :chutten

Assignee

Comment 4

•

4 years ago

Alrighty, I've thought about the design and what we can accomplish with the minimal state. How about this:

Start a "Just in Case" timer when we start a batch (let's say, 60s)
Clear the timer whenever we send the batch
Send the batch when we go to background

This way GV streaming telemetry need not maintain any state about the app's "foregroundness" and we end up with a universal guarantee of staleness (60s). If there are background measurements, they will trigger a wakeup 60s later only if there isn't another background measurement between 5s and 60s to trigger the batch.

What do you think, Alessio? Does this design meet your needs?

Flags: needinfo?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Comment 5

•

4 years ago

(In reply to Chris H-C :chutten from comment #4)

Alrighty, I've thought about the design and what we can accomplish with the minimal state. How about this:

Start a "Just in Case" timer when we start a batch (let's say, 60s)

This will restart the timer as well, right? So that it will be 60s since the latest sample?

Clear the timer whenever we send the batch

Send the batch when we go to background

This way GV streaming telemetry need not maintain any state about the app's "foregroundness" and we end up with a universal guarantee of staleness (60s). If there are background measurements, they will trigger a wakeup 60s later only if there isn't another background measurement between 5s and 60s to trigger the batch.

This looks good to me. Just to be clear: this means that we'll wake up at least once in the worst case of some consumer accumulating only a handful of metrics and then nothing else after going to background.?

Flags: needinfo?(alessio.placitelli)

Chris H-C :chutten

Assignee

Comment 6

•

4 years ago

Attached file Bug 1612283 - Send GV Telemetry batches on application background r?janerik! — Details

We assumed that geckoview would be generating a consistent stream of data that
would keep the batches from getting much staler than their default 5s.

Turns out we're far too efficient, so the data dries up.

To still catch data stale in the buffer, we now send the batch when the app
goes to background (on the "application-background" topic).

Chris H-C :chutten

Assignee

Comment 7

•

4 years ago

Attached file Bug 1612283 - Enforce max staleness of 60s in GV Streaming Telemetry r?janerik — Details

Depends on D61837

Chris H-C :chutten

Assignee

Comment 8

•

4 years ago

(In reply to Alessio Placitelli [:Dexter] from comment #5)

(In reply to Chris H-C :chutten from comment #4)

Alrighty, I've thought about the design and what we can accomplish with the minimal state. How about this:

Start a "Just in Case" timer when we start a batch (let's say, 60s)

This will restart the timer as well, right? So that it will be 60s since the latest sample?

Nope. The goal is to put an upper bound of 60s for the oldest/most stale data in the batch.

Clear the timer whenever we send the batch

Send the batch when we go to background

This way GV streaming telemetry need not maintain any state about the app's "foregroundness" and we end up with a universal guarantee of staleness (60s). If there are background measurements, they will trigger a wakeup 60s later only if there isn't another background measurement between 5s and 60s to trigger the batch.

This looks good to me. Just to be clear: this means that we'll wake up at least once in the worst case of some consumer accumulating only a handful of metrics and then nothing else after going to background.?

Nope. If a consumer accumulates a handful of metrics and goes to the background, then on the background we'll send the batch and clear the timer. We won't wake up at all.

But if any accumulations happen after that batch is sent (ie, while the application is in the background, or after it comes back into the fg) then we'll have a 60s timer ensuring that the accumulations wait at most that long before being sent.

Make sense?

Alessio Placitelli [:Dexter]

Comment 9

•

4 years ago

Make sense?

It does!

Pulsebot

Comment 11

•

4 years ago

Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2a906606a73e
Send GV Telemetry batches on application background r=janerik
https://hg.mozilla.org/integration/autoland/rev/6235a81b9882
Enforce max staleness of 60s in GV Streaming Telemetry r=janerik

Natalia Csoregi [:nataliaCs]

Comment 12

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/2a906606a73e
https://hg.mozilla.org/mozilla-central/rev/6235a81b9882

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox75: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla75

Chris H-C :chutten

Assignee

Comment 13

•

4 years ago

ni?alessio for a manual qa pass (presuming this has made it into Fenix by now)

Flags: needinfo?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Comment 14

•

4 years ago

Well, due to our well known circumstances this is now way past overdue and made it into the wild. Clearing the ni?.

Flags: needinfo?(alessio.placitelli)

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Use timers and lifecycle events to flush the GV -> Glean EXTRACT buffer

Categories

(Toolkit :: Telemetry, enhancement, P1)

Tracking

()

People

(Reporter: mdroettboom, Assigned: chutten)

References

Details

(Whiteboard: [telemetry:glean-rs:m?])

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Comment 2

Updated

Updated

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 11

Comment 12

Comment 13

Comment 14

Attachment

General

Description

File Name

Content Type