Closed Bug 1365978 Opened 7 years ago Closed 7 years ago

Validate sending shutdown pings using the PingSender [Beta]

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: Dexter, Assigned: Dexter)

References

Details

(Whiteboard: [measurement:client])

As done with Nightly in bug 1343282, we should check how the pingsender behaves with the shutdown ping once the code reaches Beta.

The analysis from nightly lives here: https://github.com/mozilla/mozilla-reports/blob/master/projects/main_ping_delays_pingsender.kp/knowledge.md
Blocks: 1343277
Priority: -- → P3
Whiteboard: [measurement:client]
Priority: P3 → P2
The current analysis, in review, lives at https://github.com/mozilla/mozilla-reports/pull/66
Assignee: nobody → alessio.placitelli
Status: NEW → ASSIGNED
Priority: P2 → P1
The analysis was reviewed and merged, it's now available at: http://reports.telemetry.mozilla.org/post/projects/main_ping_delays_pingsender_beta.kp (1 week of data).

Key findings:

- we get 85% of the shutdown main pings within an hour from their generation (vs only 25% without the pingsender);
- we get as much as 95% of the pings within 8 hours, a threshold that is not reached before 90 (NINETY!) hours without the pingsender.

@Ehsan, the analysis also contains the information about the shutdown duration, which *includes* the overhead introduced by CreateProcess on Beta (see bug 1343282 comment 21). The data shows that the time it takes to shut down Firefox increased a tiny bit on Beta too:

- nightly values were +1ms (50 percentile), +2ms (90 percentile), +4ms (99 percentile);
- beta values are +2ms (50 percentile), +7ms (90 percentile), +8ms (99 percentile).

It's still under 10ms, so I don't think it's a real threat to performance or perceptible by users.

(In reply to Jeff Griffiths (:canuckistani) (:⚡︎) from bug 1343282 comment #11)
> ...
> Mike, Ehsan: IMO if this feature gets landed and has a larger-than-expected
> shutdown regression (  >=5ms ), we back it out.

@Jeff, in bug 1343282 you mentioned that you wanted us to disable the pingsender in case the delay was greater than 5ms on Beta. The maximum regression we're getting from Beta is 8ms (see above). However, I think the latency benefits out-weight the small shutdown duration regression and that the pingsender should still be left active.

Let me know if you have concerns about that or if we need to talk this through a bit more!
Flags: needinfo?(jgriffiths)
Flags: needinfo?(ehsan)
(In reply to Alessio Placitelli [:Dexter] from comment #2)
...
> @Jeff, in bug 1343282 you mentioned that you wanted us to disable the
> pingsender in case the delay was greater than 5ms on Beta. The maximum
> regression we're getting from Beta is 8ms (see above). However, I think the
> latency benefits out-weight the small shutdown duration regression and that
> the pingsender should still be left active.
> 
> Let me know if you have concerns about that or if we need to talk this
> through a bit more!

Thanks for raising this - I did want to revisit if we saw longer pings. 8ms still seems reasonable to me.
Flags: needinfo?(jgriffiths)
Yeah I agree, I don't think this is a huge reason for concern (especially given what we are getting in return, these are impressive results BTW, congrats to you and the team!)

It's curious that the overhead of the CreateProcess() is different on the different channels.  I do expect that to have a correlation with the different profile of software that users can have installed on the different channels.  I *think* (but I can be wrong here) that software such as AntiVirus packages can hook into the CreateProcess pipeline to make it even slower than it is, for example by scanning the executable forcefully before CreateProcess is allowed to return(!), so this difference is interesting even though I can't directly explain it.

If you ever happened to run a similar study that revealed the overhead of the CreateProcess() call on the release channel, I'm very interested to know about its results as well.  :-)  Thanks again.
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #4)
> Yeah I agree, I don't think this is a huge reason for concern (especially
> given what we are getting in return, these are impressive results BTW,
> congrats to you and the team!)

Thanks Ehsan!

> It's curious that the overhead of the CreateProcess() is different on the
> different channels.  I do expect that to have a correlation with the
> different profile of software that users can have installed on the different
> channels.  I *think* (but I can be wrong here) that software such as
> AntiVirus packages can hook into the CreateProcess pipeline to make it even
> slower than it is, for example by scanning the executable forcefully before
> CreateProcess is allowed to return(!), so this difference is interesting
> even though I can't directly explain it.

Yeah, I see your point here. If you feel like this should be investigated more, maybe we can craft a temporary probe to record the process creation times for a couple of releases. Feel free to file a bug if you're interested!

> If you ever happened to run a similar study that revealed the overhead of
> the CreateProcess() call on the release channel, I'm very interested to know
> about its results as well.  :-)  Thanks again.

Sure, I'll keep you posted!
(In reply to Alessio Placitelli [:Dexter] from comment #5)
> > It's curious that the overhead of the CreateProcess() is different on the
> > different channels.  I do expect that to have a correlation with the
> > different profile of software that users can have installed on the different
> > channels.  I *think* (but I can be wrong here) that software such as
> > AntiVirus packages can hook into the CreateProcess pipeline to make it even
> > slower than it is, for example by scanning the executable forcefully before
> > CreateProcess is allowed to return(!), so this difference is interesting
> > even though I can't directly explain it.
> 
> Yeah, I see your point here. If you feel like this should be investigated
> more, maybe we can craft a temporary probe to record the process creation
> times for a couple of releases. Feel free to file a bug if you're interested!

Nah, I was just thinking out loud / documenting this somewhere.  :-)  I don't think we need to spend more energy on this at the moment.
Closing this as Fixed, as the discussion settled and we decided to retain the pingsender on Beta.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.