Closed Bug 1251623 Opened 4 years ago Closed 2 years ago

Crash pings apparently getting sent faster than main pings

Categories

(Toolkit :: Telemetry, defect)

defect
Not set

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: kairo, Unassigned)

References

(Blocks 1 open bug)

Details

See bug 1222890 comment #48 - "Telemetry is registering crash pings quicker than main."

In all the crash statistics I have seen, crash rates (count of crash pings divided by the subsessionLength of main pings) are consistently higher for the most recent days than for earlier days, which could also point to this.
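The following is a rough sketch of why ping latency alone would inflate recent-day rates. The functions compute the observed rate from only the pings received so far. The rate is expressed as crash pings per 1000 usage hours, a scaling assumed here for readability. The arrival profiles give the share of pings that arrive 0, 1, 2... days after the activity date. They are invented numbers for illustration, not measurements, and the function names are hypothetical.

```python
from typing import Sequence


def observed_fraction(days_since: int, arrival_profile: Sequence[float]) -> float:
    """Share of a day's pings received `days_since` days after that day.

    arrival_profile[k] is the hypothetical share of pings arriving k days
    after the activity date; the shares are assumed to sum to 1.
    """
    return sum(arrival_profile[: days_since + 1])


def observed_crash_rate(crashes: float, usage_hours: float, days_since: int,
                        crash_profile: Sequence[float],
                        main_profile: Sequence[float]) -> float:
    """Crash pings per 1000 usage hours, counting only pings received so far."""
    seen_crashes = crashes * observed_fraction(days_since, crash_profile)
    seen_hours = usage_hours * observed_fraction(days_since, main_profile)
    return 1000.0 * seen_crashes / seen_hours
```

Consider 50 crashes over 10,000 usage hours, which is a true rate of 5.0. With crash_profile=[0.9, 0.1] and main_profile=[0.5, 0.3, 0.2], the observed rate is 9.0 on day 0 and 6.25 on day 1. It settles at 5.0 once all main pings have arrived on day 2. In this model, the faster crash pings arrive relative to main pings, the more the most recent days are overstated.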

Since much of my interest is in getting data quickly after a build has been released, either to qualify it or to react to issues ASAP, it is important that crash counts match the data from main pings.
Flags: needinfo?(gfritzsche)
This is inherently finicky, just as crash-stats and ADI have different arrival times. However, I'm skeptical that there is a major problem here.

We send a crash ping on next startup after the crash (after a 60-second delay).
The matching aborted-session main ping should also be sent at startup, though perhaps after a slightly longer delay.

We roll both of those up daily, so it shouldn't be that much different by the time it hits the dashboard.

Once we have a longitudinal dataset that includes crash pings, it's worth using it to verify the arrival dates of crash and main pings for a sample.
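One way to run that check is sketched here under stated assumptions. Take a sample of rows that each carry an activity date and a submission date, then compare the median submission lag of crash pings with that of main pings. The PingRecord layout and the median_lag_days helper are hypothetical and do not describe the actual longitudinal dataset schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class PingRecord:
    # Hypothetical flattened row; field names are illustrative only.
    ping_type: str          # e.g. "crash" or "main"
    activity_date: date     # day the crash / session activity happened
    submission_date: date   # day the server received the ping


def median_lag_days(records):
    """Median (submission_date - activity_date) in whole days, per ping type."""
    lags = defaultdict(list)
    for r in records:
        lags[r.ping_type].append((r.submission_date - r.activity_date).days)
    return {ping_type: median(values) for ping_type, values in lags.items()}
```

If crash pings consistently show a smaller median lag than the matching main pings, that would support the original report. Similar lags would support the view that daily rollups converge.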

I'm actually more worried about the possibility that having recently updated *causes* crashes, and that's why we're seeing higher crash rates soon after each build is released. The crash rate for release is typically 4-5, and for beta it's 9-10. What if updating twice a week *causes* a much higher crash rate on beta?
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #1)
> I'm actually more worried about the possibility that having recently updated
> *causes* crashes, and that's why we're seeing higher crash rates soon after
> each build is released. The crash rate for release is typically 4-5, and for
> beta it's 9-10. What if updating twice a week *causes* a much higher crash
> rate on beta?
See bug 1134620: these are crashes reported to Socorro under the old version. Maybe Telemetry is counting them as newer?
Jonathan, I don't understand your comment. "Newer" than what?
Flags: needinfo?(jonathan)
Say someone running b6 hits this crash while updating to b10. Socorro links the report to b6. I don't know whether Telemetry also counts it as coming from b6 or links it to b10. (The crash is probably more likely with a full update, which is why I use b6 rather than b9.)
Flags: needinfo?(jonathan)
It should be linked to b6. See https://gecko.readthedocs.org/en/latest/toolkit/components/telemetry/telemetry/crash-ping.html for how we record the "crashing environment" if it is available; it should be available unless the crash happens within the first 60 seconds of the session.
Flags: needinfo?(gfritzsche)
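The attribution rule described above can be sketched as follows. The sketch assumes, which the comment does not state, that when the crashing environment is missing, the build of the session that sends the ping is used instead. The CrashPing fields and the attributed_build function are hypothetical names for illustration, not the real crash ping schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrashPing:
    # Illustrative fields only, not the real crash ping layout.
    # Build of the crashing session; per the comment above, normally present
    # unless the crash happened within the first 60 seconds of the session.
    crashing_build_id: Optional[str]
    # Build of the session that sends the ping after restart (assumed fallback).
    current_build_id: str


def attributed_build(ping: CrashPing) -> str:
    """Attribute the crash to the crashing session's build when it is known,
    otherwise (assumption) to the build that submitted the ping."""
    if ping.crashing_build_id is not None:
        return ping.crashing_build_id
    return ping.current_build_id
```

In the b6 to b10 example, the crash is attributed to b6 when the crashing environment was recorded. Under this assumed fallback, it would only be counted against b10 when the environment is missing.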
The latency of "main" vs. "crash" pings was discussed above and will always be an issue, although bug 1120372 and bug 1120370 will help with uptake numbers.

Can we close this or is there a separate issue here?
Blocks: 1257321
This seems to have settled; I haven't heard of any specific needs or concerns about this.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME