470620 - Create a new stat: crashes per user

Reporter

Description

•

16 years ago

I'd like to have a new report generated that tells us crashes per user. This is different from what can be gathered from the MTBF reports now.

Currently, MTBF reports also tell us: Firefox 3.0.4- MTBF 33425 seconds based on 2200917 crash reports of 1926046 users (blackboxen) from period between 2008-11-05 and 2009-01-03

From that, we can do some quick division (2200917 / 1926046, or about 1.14) and see how many crashes happen per user who reports them.

The new stat I'd like to see would take total crashes on a given day for a given version and divide it by the total number of blocklist pings we had for that day. We should also be able to drill down by OS.

For example (all data is fake), on December 1:
  * 3000 crashes for Firefox 3.0.4 Mac
  * 1m blocklist pings for Firefox 3.0.4 Mac.
  * 3000/1,000,000 = 0.003 crashes per user
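
The arithmetic above can be sketched as a small helper. This is only an illustration of the proposed ratio, not Socorro code: the function names and the (day, version, os) key layout are assumptions made for the example.

```python
def crashes_per_user(crashes: int, blocklist_pings: int) -> float:
    """Crashes submitted in a day divided by blocklist pings for that day."""
    if blocklist_pings <= 0:
        raise ValueError("need at least one blocklist ping to compute a rate")
    return crashes / blocklist_pings


def rates_by_key(crash_counts: dict, ping_counts: dict) -> dict:
    """Compute the rate for every (day, version, os) key present in both inputs."""
    return {
        key: crashes_per_user(crash_counts[key], ping_counts[key])
        for key in crash_counts.keys() & ping_counts.keys()
    }


# The fake numbers from the example above (hypothetical key layout).
key = ("December 1", "3.0.4", "Mac")
print(rates_by_key({key: 3000}, {key: 1_000_000})[key])  # 0.003
```

Keying on (day, version, os) keeps the OS drill-down as a plain dictionary lookup.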

Obviously this stat doesn't take into account the number of users who don't report crashes at all, but I don't care about that. I want a number that we can track over time.

I'd also like to graph this stat and provide a tab for the daily data in use.

chofmann might say we should average this data over a ten day window. He might be right. But I'll let him sell it... Doing so would take out the weird bumps we're seeing with MTBF right now. (We should probably move to a ten day floating window for MTBF as well...)

chris hofmann

Comment 1

•

16 years ago

In Sam's example, the sessions (1m) that ended in crash reports (3000) are really interesting data, and calculating this per day would be interesting.

smoothing this by accumulating the data for the last 10 days (or maybe 7 days to avoid the decline in use over weekends) would also be interesting data to look at.

-chofmann

Samuel Sidler (old account; do not CC)

Reporter

Comment 2

•

16 years ago

(In reply to comment #1)
> smoothing this by accumulating the data for the last 10 days (or maybe 7 days
> to avoid the decline in use over weekends) would also be interesting data to
> look at.

Not sure you need to avoid the decline in use over the weekends. The amount of submitted crashes should go down with it, really.

chris hofmann

Comment 3

•

16 years ago

the purpose of the averaging over a longer period is to smooth out the data to see the trends more easily. having some ten day periods with two weekends, and other ten day periods with only one weekend, will introduce noise in the smoothing.

a 7 or 14 day sliding window means we will have more uniform smoothing.  It might also be interesting to show the individual daily calculations as points on the graph, and then show the 7 and 14 day smoothed series as a line using the same color for each release.
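
A minimal sketch of the sliding window, assuming the smoothed value is total crashes over total pings within the window (one reading of "accumulating the data"; averaging the daily ratios would be another). The function name and input shape are made up for the illustration.

```python
from collections import deque


def windowed_rates(daily, window=7):
    """Smooth (crashes, pings) pairs given in date order.

    Returns one rate per day, starting on the first day a full window
    is available. Each rate is sum(crashes) / sum(pings) over the window.
    """
    buf = deque(maxlen=window)  # oldest day drops out automatically
    rates = []
    for crashes, pings in daily:
        buf.append((crashes, pings))
        if len(buf) == window:
            total_crashes = sum(c for c, _ in buf)
            total_pings = sum(p for _, p in buf)
            rates.append(total_crashes / total_pings)
    return rates
```

With window=7 or window=14 every window covers the same number of weekends, which is the uniformity argued for above.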

Frank Griswold [:griswolf] [:fgriswold]

Comment 4

•

15 years ago

We no longer collect user id or data that is (reasonably) easy to connect to a particular user. Thus we cannot add this functionality.

timeless

Comment 5

•

15 years ago

not precisely accurate.

the clients themselves could be taught to keep track of their crash count and we could do some evil math to try to discard their older crash counts.

Frank Griswold [:griswolf] [:fgriswold]

Comment 6

•

15 years ago

timeless makes a good point in comment 5. We could do other things on the client side, too, such as looking at sequences of crashes per user. We are apparently already able to store crash data on the client (see bug 495700).

Frank Griswold [:griswolf] [:fgriswold]

Comment 7

•

15 years ago

Extending from comment 5 and comment 6. 

We would have to get the client to send us another field in the crash report if we want to summarize such things for our own purposes. Right now, we have header lines for OS, CPU, Crash, Module. We would need one more. Maybe RecentCrashTimes

When it is sent, RecentCrashTimes would hold some reasonable number of date-timestamps as known to the client. Some hard count (50?), which could be held in a circular buffer maybe. On the server side, we could count only the ones that are 'recent' by our own definition. If it matters, we can use an offset of the server's reports:date_processed versus the most recent stamp to approximate the 'actual' crash dates, since the client won't have the actual date_processed to give us (and we wouldn't trust it anyway).
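
A rough sketch of that idea, under stated assumptions: the 50-entry cap is the number floated above, the 7-day definition of 'recent' is invented for the example, and none of these function names exist in the client or in Socorro.

```python
from collections import deque
from datetime import datetime, timedelta

MAX_STAMPS = 50  # the "hard count (50?)" suggested above


def new_crash_buffer():
    """Client side: a circular buffer that silently drops the oldest stamp."""
    return deque(maxlen=MAX_STAMPS)


def count_recent(stamps, date_processed, recent=timedelta(days=7)):
    """Server side: count stamps that are 'recent' by the server's definition.

    The newest client stamp is assumed to be the crash being processed, so
    date_processed minus that stamp approximates the client clock offset.
    """
    if not stamps:
        return 0
    offset = date_processed - max(stamps)
    return sum(1 for s in stamps if date_processed - (s + offset) <= recent)
```

Because only differences between client stamps and the newest stamp matter after the offset is applied, a badly set client clock does not change the count.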

Samuel Sidler (old account; do not CC)

Reporter

Comment 8

•

15 years ago

I don't care about exact data... That's not what I filed here and when I filed this we no longer had data on individual users.

What I want is exactly as I said in comment 0. Take the total number of crashes submitted on a day for a version and divide it by the total number of blocklist pings for that same 24 hour period.

Let's not over-engineer this.

Samuel Sidler (old account; do not CC)

Reporter

Comment 9

•

15 years ago

chofmann made this graph which is basically what I want (but with more data, of course): http://people.mozilla.com/~chofmann/crash-data/crashes-per-user.png

Michael Morgan [:morgamic]

Updated

•

15 years ago

Whiteboard: cloud

Target Milestone: --- → Future

Michael Morgan [:morgamic]

Comment 10

•

15 years ago

This is probably one of the most important reports we don't have.  Bumping priority.

Assignee: nobody → lars

Severity: normal → critical

Target Milestone: Future → 1.2

Michael Morgan [:morgamic]

Updated

•

15 years ago

Target Milestone: 1.2 → 1.3

chris hofmann

Comment 11

•

15 years ago

changing title.  let's not overload cpu...

chart should look something like this.
http://people.mozilla.com/~chofmann/crash-data/crashes-per-1-users.png

Summary: Create a new stat: cpu (crashes per user) → Create a new stat: crashes per user

Michael Morgan [:morgamic]

Comment 12

•

15 years ago

So pretty!

chris hofmann

Comment 13

•

15 years ago

in looking at the data that shows up on metrics and the times assigned to incoming crash reports during the day, it looks like they might not be aligned.

for instance I see 2009-11-10 data already on metrics several hours before midnight, which may be the result of using a different time zone for adu data.

we should try and align the timezones the best we can or at least understand how that might throw things off.

Austin King [:ozten]

Comment 14

•

15 years ago

From CrashKill meeting:

The following wiki page is created manually and will start to be added to the header of the CrashKill notes.

https://wiki.mozilla.org/CrashKill/Crashr
Notes on that wiki page
#crashes / # ADU
Metrics granularity: crashes per 100 users
For throttled builds the metric is adjusted, otherwise the adjusted column is empty
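
For illustration, the per-100-users granularity could be computed as below. The wiki notes do not say how the adjustment is made, so scaling by the inverse of the accepted fraction is purely an assumption here, as are the function and parameter names.

```python
def crashes_per_100_users(crashes, adu, throttle_fraction=None):
    """Return (raw, adjusted) crashes per 100 active daily users.

    throttle_fraction is the assumed share of submitted reports that the
    collector keeps (0 < f <= 1). For unthrottled builds pass None and the
    adjusted value stays empty (None), matching the empty column.
    """
    if adu <= 0:
        raise ValueError("ADU must be positive")
    raw = 100.0 * crashes / adu
    if throttle_fraction is None:
        return raw, None
    if not 0 < throttle_fraction <= 1:
        raise ValueError("throttle_fraction must be in (0, 1]")
    return raw, raw / throttle_fraction
```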

chris hofmann

Comment 15

•

15 years ago

can anyone comment on the possible time align problem mentioned in comment 13 ?

are adu's on GMT and socorro timestamps PST or something like that?

ken kovash

Comment 16

•

15 years ago

ADU's are on GMT.

chris hofmann

Comment 17

•

15 years ago

It's turning out to be hard to use the adu and crash volume numbers right now, since adu's are reported in GMT and the time snaps of data I have are in PST.

See the prototype crash per user report for various releases at 
https://wiki.mozilla.org/CrashKill/Crashr#Release_Snapshots

this has the biggest effect when releases ramp up by a significant number of users in a single day

have we thought about converting all our collection systems to the same timezone?

chris hofmann

Comment 18

•

15 years ago

for the purposes of the crashes per user reports we could 

  just calculate crash numbers using the cut off time of 4pm pacific each day.

or we could

   change socorro over to using gmt 

the latter has the advantage of syncing all the data up to adu collection and maybe other data gathering systems.
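
Both options amount to bucketing crashes by GMT day, since 4pm PST is midnight GMT. A minimal sketch, assuming crash timestamps are naive PST values and ignoring daylight saving (a fixed UTC-8 offset); the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed offset for Pacific Standard Time; daylight saving is ignored here.
PST = timezone(timedelta(hours=-8), "PST")


def crashes_by_gmt_day(pst_timestamps):
    """Count crashes per GMT calendar day from naive PST timestamps."""
    counts = Counter()
    for ts in pst_timestamps:
        gmt = ts.replace(tzinfo=PST).astimezone(timezone.utc)
        counts[gmt.date()] += 1
    return counts


stamps = [datetime(2009, 11, 10, 15, 59), datetime(2009, 11, 10, 16, 0)]
for day, n in sorted(crashes_by_gmt_day(stamps).items()):
    print(day, n)
# 2009-11-10 1
# 2009-11-11 1
```

The two sample crashes are one minute apart in Pacific time but land on different GMT days, which is the misalignment described in comment 13.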

Samuel Sidler (old account; do not CC)

Reporter

Updated

•

15 years ago

Blocks: 534697

K Lars Lohn [:lars] [:klohn]

Assignee

Updated

•

15 years ago

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

13 years ago

Component: Socorro → General

Product: Webtools → Socorro

Categories

(Socorro :: General, task)

Tracking

(Not tracked)

People

(Reporter: samuel.sidler+old, Assigned: lars)

References

Details

(Whiteboard: cloud)

Crash Data

Security

(public)
