Closed Bug 411424 Opened 17 years ago Closed 15 years ago

Need to have a report for MTBF per build

Categories

(Socorro :: General, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: samuel.sidler+old, Assigned: ozten)

Details

Attachments

(2 files, 2 obsolete files)

Reported by morgamic, Nov 29, 2007

Need to add a report for mean time between failures (MTBF).
Priority: -- → P2
morgamic, ok to move this to P1 now?   

let's hash out how we can compute the number and report it.

in the past I think we just added up the time-of-crash minus browser-start-time values for every black box from a specific release to get the total number of hours run, then divided that total by the number of crashes.

sample size has been last 10 days, but we could switch to two weeks if we think that has some value.

for example a report like this would allow us to directly compare to 2.x releases.

 Total blackboxes in this sample:  288999
 Total unique users:  147090
 MTBF For these builds is estimated at 25.625648 hours,
 based on 273144 reports and 6999491.939167 hours of user testing
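The sample figures are consistent with MTBF = total hours of user testing ÷ number of crash reports. A quick arithmetic check using only the numbers quoted in the report:

```python
# Figures copied from the sample report.
total_hours = 6999491.939167   # summed (time of crash - browser start time), in hours
crash_reports = 273144         # number of crash reports in the sample

# MTBF as described: total hours run divided by the number of crashes.
mtbf_hours = total_hours / crash_reports
print(f"MTBF estimated at {mtbf_hours:.6f} hours")
```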

and we should do any needed sanity checking and cleanup of the db and the sample before we do the calculations, as in bug 422549.
ken, let's work out how we want to calculate this.

I think the old crash reporting system was basically using something like this.

-pull a sample for all blackboxes or a particular release (e.g. grab all the reports for windows, mac, and linux build numbers for the release like firefox 3.0 beta 4, or final)
-throw out any outliers like 0 or negative time since start up, or anything that looked like a duplicate submission
-add up all the time since start up times.
-divide by the number of blackboxes in the sample.
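A minimal sketch of those four steps, assuming each blackbox is one submitted crash report with hypothetical fields `client_id`, `crash_time` and `uptime_seconds` (the real Socorro schema is not shown here), and treating reports with identical values in all three fields as duplicate submissions. The earlier comment divides by the number of crashes while this list says blackboxes; the sketch follows the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blackbox:
    # Hypothetical fields for illustration; not the actual Socorro schema.
    client_id: str        # identifies the submitting installation
    crash_time: int       # crash timestamp, seconds since the epoch
    uptime_seconds: int   # time of crash minus browser start time


def estimate_mtbf_hours(sample: list[Blackbox]) -> float:
    """Drop outliers and duplicates, sum uptime, divide by blackboxes kept."""
    seen = set()
    kept = []
    for box in sample:
        if box.uptime_seconds <= 0:      # outlier: zero or negative time since startup
            continue
        key = (box.client_id, box.crash_time, box.uptime_seconds)
        if key in seen:                  # looks like a duplicate submission
            continue
        seen.add(key)
        kept.append(box)
    if not kept:
        raise ValueError("no usable blackboxes in the sample")
    total_hours = sum(b.uptime_seconds for b in kept) / 3600
    return total_hours / len(kept)
```

For example, a sample containing one duplicate and one zero-uptime report keeps only the two clean reports before averaging.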


jay can confirm what we used in the custom report we built for past tracking.

sample size has been "10 days of data" previously, but there might be good reasons to move to a two-week window, since we know active users drop over weekends, which might create wiggles in the reporting as a different kind of user base comes in and out on weekends.
and of course the holy grail here is to tie it into AUS active user data. We will be eyeballing a correlation between active users and total crashes received until we can tie the two together with automation, but that shouldn't hold us up for now.

right now the fact that we only receive crashes from users who "opt in" can give us a distorted view of what is going on, but having that same distortion applied across multiple releases has yielded valuable feedback...

e.g.   
14 days into beta 4 MTBF was 30.3 hours
14 days into beta 5 MTBF was 35.5 hours

so we must have fixed the right set of top crashers to improve stability and not introduced any crash regressions. those are really the kind of numbers we are after here.

we also used to have graphs that aligned the releases to show the changes for MTBF over time since release.  it would be cool if we could also get those going again at some point

Chofmann and I recently spoke about some new stats that would be useful:
(1) daily number of crashes.  this is sort of a raw/ignorant look at the data, but it could be helpful.
(2) median tbf + anything else that describes the distribution of failure time.  e.g., is one user responsible for all crashes?, is the tbf normally distributed?, etc.
(3) ratio of comments to crashes
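The three proposed stats could be computed along these lines. This is a sketch that assumes hypothetical per-report tuples of (crash date, client id, uptime in seconds, user comment); "top client share" is one illustrative way to answer "is one user responsible for all crashes?".

```python
from collections import Counter
from datetime import date
from statistics import median

# Hypothetical report shape: (crash_date, client_id, uptime_seconds, comment_text)
Report = tuple[date, str, int, str]


def crash_stats(reports: list[Report]) -> dict:
    if not reports:
        raise ValueError("no reports")
    daily = Counter(r[0] for r in reports)                  # (1) crashes per day
    uptimes = [r[2] for r in reports]
    per_client = Counter(r[1] for r in reports)
    with_comment = sum(1 for r in reports if r[3].strip())  # ignore blank comments
    return {
        "daily_crashes": dict(sorted(daily.items())),
        "median_tbf_hours": median(uptimes) / 3600,         # (2) median time before failure
        # (2) share of all crashes coming from the single busiest client
        "top_client_share": max(per_client.values()) / len(reports),
        "comment_ratio": with_comment / len(reports),       # (3) comments per crash
    }
```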
Assignee: nobody → aking
Target Milestone: --- → 0.7
We need a mockup that shows what this type of report would look like.
Chofmann wants a graph with these properties:
* x-axis: days since releases
* y-axis: hours
* series: release versions only
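One way to produce those series is to align each release on its own "day 0" and compute a running MTBF per day. The sketch below assumes hypothetical daily aggregate rows of (version, day, uptime hours, crash count) and a cumulative definition (hours so far ÷ crashes so far), which matches phrasing like "14 days into beta 4 MTBF was 30.3 hours" but is not confirmed as the report's formula.

```python
from collections import defaultdict
from datetime import date


def mtbf_series(release_starts: dict[str, date],
                daily_facts: list[tuple[str, date, float, int]]) -> dict[str, list[tuple[int, float]]]:
    """Return {version: [(days_since_release, cumulative_mtbf_hours), ...]}."""
    by_version = defaultdict(list)
    for version, day, hours, crashes in daily_facts:
        start = release_starts.get(version)
        if start is None or day < start:
            continue  # release versions only, and nothing before day 0
        by_version[version].append(((day - start).days, hours, crashes))

    series = {}
    for version, rows in by_version.items():
        total_hours, total_crashes, points = 0.0, 0, []
        for offset, hours, crashes in sorted(rows):
            total_hours += hours
            total_crashes += crashes
            if total_crashes:  # skip days before the first crash
                points.append((offset, total_hours / total_crashes))
        series[version] = points
    return series
```

The x values are days since release and the y values are hours, so releases with different start dates line up on one graph.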
Attached patch A first cut at MTBF (obsolete)
Development URL:
http://aking.khan.mozilla.org/reporter/mtbf/of/Firefox/major

Screenshots:
http://people.mozilla.org/~aking/Socorro/mtbf.html

See the following attachments with the DB schema for more context.
Attachment #351635 - Flags: review?(morgamic)
Attachment #351635 - Flags: review?(lars)
Cron Script:
When run, startMtbf.py populates the MTBF facts table for the previous day. The date can be overridden, e.g.:
startMtbf.py -d 2008-12-01 
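The patch's startMtbf.py is not reproduced here; the following is only a hypothetical sketch of how a daily cron job could default to the previous day while accepting a `-d YYYY-MM-DD` override. The function names and the `today` parameter (added so the default is testable) are illustrative.

```python
import argparse
from datetime import date, datetime, timedelta
from typing import List, Optional


def parse_day(text: str) -> date:
    """Parse a YYYY-MM-DD string such as 2008-12-01."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def target_day(argv: Optional[List[str]] = None, today: Optional[date] = None) -> date:
    """Return the day to aggregate: the previous day by default, or the -d override."""
    parser = argparse.ArgumentParser(description="Populate MTBF facts for one day (sketch only).")
    parser.add_argument("-d", dest="day", type=parse_day,
                        help="day to process as YYYY-MM-DD (default: yesterday)")
    args = parser.parse_args(argv)
    if args.day is not None:
        return args.day
    return (today or date.today()) - timedelta(days=1)
```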

Database Changes:
To see more realistic data, look at the breakpad_aking DB on Postgres on khan.mozilla.org. That DB shows realistic values in all three tables. I don't have much data to work with, so it covers 5 days of data instead of several release builds over days 1 through 30 or 60.

TODO:
I know of a couple of bugs (the tables need indexes, there's a flot redisplay bug, etc.), but wanted to get a review.

Thanks.
Attached patch A cleaner patch. Minor updates. (obsolete)
Attachment #351635 - Attachment is obsolete: true
Attachment #353470 - Flags: review?(morgamic)
Attachment #353470 - Flags: review?(lars)
Attachment #351635 - Flags: review?(morgamic)
Attachment #351635 - Flags: review?(lars)
is the plan to have thunderbird be one of the products reported?
I don't have a firm plan around products and versions.

If you give me versions and start dates then I will set this up.
Optionally you can give me end dates or 60 days will be default.

Example (made-up data):
Thunderbird
2.0.0.19 - 12/10 - major release
2.0.0.20 - 1/10/2009 - major release
3.0a3 - 9/12 - milestone release
3.0b2pre - 11/15 - developer release

etc
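The 60-day default can be derived from the start date alone. The SeaMonkey windows given later in this bug (2008-10-05 to 2008-12-03, 2008-12-10 to 2009-02-07) count day 0 inclusively, i.e. end = start + 59 days; this sketch assumes that convention and a hypothetical `report_window` helper.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

DEFAULT_WINDOW_DAYS = 60  # default reporting window when no end date is given


def report_window(start: date, end: Optional[date] = None) -> Tuple[date, date]:
    """Return (start, end); the default end makes a 60-day window that includes day 0."""
    if end is None:
        end = start + timedelta(days=DEFAULT_WINDOW_DAYS - 1)
    if end < start:
        raise ValueError("end date precedes start date")
    return start, end
```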

I will be getting this info for Firefox from S.S, but I don't have any other data or person for any other products yet.
Setting this up for Thunderbird would be fantastic.  

I think all the data for released versions is likely to be available on the release pages linked from <https://wiki.mozilla.org/Releases/>. It would be great to track all the Thunderbird 3 releases there (3.0a1, 3.0a2, 3.0a3, 3.0b1).
At least the last several Thunderbird 2 releases would be very helpful as well.

I believe our branch nightlies are 3.0b2pre and our trunk nightlies are 3.1a1pre.  gozer probably has exact start dates for those.

60 days sounds like a perfectly reasonable default to start with.

Thanks!
Attachment #353470 - Attachment is obsolete: true
Attachment #353593 - Flags: review?(morgamic)
Attachment #353593 - Flags: review?(lars)
Attachment #353470 - Flags: review?(morgamic)
Attachment #353470 - Flags: review?(lars)
Here are my comments for the reporter changes.

- the data should be listed in a table under the graph in case scaling makes it hard to interpret
- the major/milestone/development links shouldn't rotate, all three should be visible at all times
- text for top nav should be "Release type: Major Milestone Development"

More on table layout:
# Firefox 3.0- MTBF 13010 seconds based on 50103 crash reports of 32726 users (blackboxen) from period between 2008-08-01 and 2008-11-20
# Firefox 3.0.1- MTBF 250139 seconds based on 765446 crash reports of 496840 users (blackboxen) from period between 2008-08-01 and 2008-11-20
# Firefox 3.0 Win- MTBF 10119 seconds based on 39161 crash reports of 24196 users (blackboxen) from period between 2008-08-01 and 2008-11-20

Should be changed to:
Product | Version | OS | MTBF | # Reports | # Users | Start | End
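A sketch of rendering rows in the proposed column order, using the "Firefox 3.0 Win" figures from the current layout. The current layout reports MTBF in seconds; converting to hours here is an assumption for readability, and the `MtbfRow` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MtbfRow:
    product: str
    version: str
    os: str
    mtbf_seconds: int
    reports: int
    users: int
    start: str
    end: str


HEADER = ["Product", "Version", "OS", "MTBF", "# Reports", "# Users", "Start", "End"]


def render_table(rows: list[MtbfRow]) -> str:
    lines = [" | ".join(HEADER)]
    for r in rows:
        lines.append(" | ".join([
            r.product, r.version, r.os,
            f"{r.mtbf_seconds / 3600:.2f} h",  # assumption: show hours rather than raw seconds
            str(r.reports), str(r.users), r.start, r.end,
        ]))
    return "\n".join(lines)
```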

That was the UX stuff; now the PHP code.
Indentation is messed up in load_product_info().  Looks like there are tabs mixed in with spaces, so the code is littered with some indentation issues.

Question - for the zero-case (no data) seems like some of the behavior is to show an empty white box -- is that expected?

Functionally, it works for me, so let's move forward and iterate on it.
Attachment #353593 - Flags: review?(morgamic) → review+
This code is checked in and scheduled to be released tonight.
r751 with some initial configuration checked in under r753.
Status: NEW → ASSIGNED
I'm not sure how much history you have, but I'd like to do MTBF for the following builds:
  * Firefox 3.0.3 (starting Sept 24)
  * Firefox 3.0.4 (starting Nov 5)
  * Firefox 3.0.5 (starting Dec 10)
  * All Firefox 3.0.x pre builds starting with 3.0.4pre (start these when
    3.0.[n-1] started; i.e., start 3.0.4pre on Sept 24)
  * Firefox 3.1b1 (starting Oct 7)
  * Firefox 3.1b2 (starting Dec 1)
  * All Firefox 3.1pre builds starting with 3.1b2pre (starting Oct 7)

For Thunderbird, do the following builds:
  * Thunderbird 3.0a3 (starting Oct 7)
  * Thunderbird 3.0b1 (starting Dec 2)
  * Thunderbird 3.0b1pre (starting Oct 7)
  * Thunderbird 3.0b2pre (starting Nov 28)

If you have data prior to Sept 24 (when the first one of these starts), let me know and we can add more, but this is a great start.
(In reply to comment #15)
> At least the last several Thunderbird 2 releases would be very helpful as well.

Thunderbird 2 can't be done in this style since the report depends on Socorro, but you can look at MTBF for Thunderbird 2 builds at:

  http://talkback-public.mozilla.org/reports/thunderbird/

Simply select a release (e.g., Thunderbird 2.0.0.18) and under "Smart Analysis" on the left side, select "All Platforms". MTBF appears at the top of the smart analysis report. Note: This isn't comparing apples to apples since the crash reporting is very different between 1.8 and 1.9.
Oh, and 60-day default is a good start. We can start specifying end-dates as needed later (I'll let you know what those are when we get there). Let's get this going! :)
What about SeaMonkey? 2.0 alpha 1 and 2 have been released by now, so I suppose the following SeaMonkey builds (or build families) could be added to the list (subject, I suppose, to some agreed-upon time-limit such as that in comment #22).
2.0a1pre
2.0a1
2.0a2pre
2.0a2
2.0a3pre

Also, what about Firefox 3.2a1pre, which is already coming out in the form of nightlies? AFAIK, they're the only builds already being done based on Gecko 1.9.2.

Not sure how much statistical data would be available as yet, but wouldn't it be worthwhile to have the MTBF reports up and rolling by the time Sm 2.0 and/or Fx 3.2 are ready for a release, or maybe even for a beta?
Depends on: 470621
Depends on: 470622
Austin, I filed a couple of follow-ups to look at, since some of this is live already. See the "Depends On" field.
(In reply to comment #23)
> What about SeaMonkey? 2.0 alpha 1 and 2 have been released by now, so I suppose
> the following SeaMonkey builds (or build families) could be added to the list
> (subject, I suppose, to some agreed-upon time-limit such as that in comment
> #22).
> 2.0a1pre
> 2.0a1
> 2.0a2pre
> 2.0a2
> 2.0a3pre
> 
> Also, what about Firefox 3.2a1pre, which is already coming out in the form of
> nightlies? AFAIK, they're the only builds already being done based on Gecko
> 1.9.2.
> 
> Not sure how much statistical data would be available as yet, but wouldn't it
> be worth while to have the MTBF reports up and rolling by the time Sm 2.0
> and/or Fx 3.2 are ready for a release, or maybe even for a beta?

I am happy to add these to the MTBF reports. I need start dates, which are "day 0" for calculating uptime.

I will add SeaMonkey 2.0a2 and 2.0a3pre to the top crash by url reports also.

As for 3.2a1pre, is the product Minefield or Firefox?
I still need two more pieces of information for all the SeaMonkey builds.

1) major|milestone|dev
2) start and end dates (60 days)

I've taken a guess at these. Please fill in and confirm.

SeaMonkey 2.0a1pre, developer, ?? - ??
SeaMonkey 2.0a1, milestone, 2008-10-05, 2008-12-03
SeaMonkey 2.0a2pre, developer, ?? - ??
SeaMonkey 2.0a2, milestone, 2008-12-10, 2009-02-07
SeaMonkey 2.0a3pre, developer, ?? - ??
SeaMonkey 2.0a1pre, developer, 2007-07-09 - (60 days)
SeaMonkey 2.0a1, milestone, 2008-10-05, 2008-12-03
SeaMonkey 2.0a2pre, developer, 2008-09-25 - (60 days)
SeaMonkey 2.0a2, milestone, 2008-12-10, 2009-02-07
SeaMonkey 2.0a3pre, developer, 2008-12-02 - (60 days)
Adding dependency on 477914 which has the SeaMonkey update. Will schedule a push with IT after SQL, shell script review.
Depends on: 477914
(In reply to comment #29)
> SeaMonkey 2.0a1pre, developer, 2007-07-09 - (60 days)
> is either wrong, or we don't have data for it back in 2007.

I got to this date by trying to find out when SeaMonkey got crash reporter support, but it may not have worked correctly from the start. Can we find out when we got the first SeaMonkey 2.0a1pre crash reports and start the window then?

Also, we started the 2.0b1pre dev cycle on 2009-02-19 and released the 2.0a3 milestone yesterday, what's the process for getting those added?
Please open a new bug for MTBF entries.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Attachment #353593 - Flags: review?(lars)
not sure we are still planning to do this, but it appears that we also have values like

  "Install Age"	7057413 seconds (11.7 weeks) since version was first installed.

We should also integrate that into the calculation, or add a parallel metric that takes the lower of TimeSinceLastCrash and InstallAge to produce MTBF_For_Current_Build.

This would be a somewhat different number than total MTBF, but also useful for understanding the time between failures on individual builds.
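A minimal sketch of that parallel metric, assuming each report carries the two values named above in seconds (the actual crash annotation keys may differ). Taking the minimum caps the interval at install age, so time spent on an earlier build does not count toward the current one; treating a missing TimeSinceLastCrash as "no previous crash, use install age" is an assumption.

```python
from typing import List, Optional, Tuple


def mtbf_for_current_build(reports: List[Tuple[Optional[int], int]]) -> float:
    """Mean per-build time between failures in hours.

    Each report is (time_since_last_crash_s, install_age_s).
    """
    if not reports:
        raise ValueError("no reports")
    intervals = []
    for since_last_crash, install_age in reports:
        if since_last_crash is None:   # assumption: no earlier crash recorded
            intervals.append(install_age)
        else:
            intervals.append(min(since_last_crash, install_age))
    return sum(intervals) / len(intervals) / 3600
```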
Component: Socorro → General
Product: Webtools → Socorro
