Closed Bug 915317 Opened 11 years ago Closed 11 years ago

Display total volume of (Nightly) GC crashes over time

Categories

(Socorro :: Webapp, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kairo, Assigned: espressive)

References

Details

(Whiteboard: [QA+])

Attachments

(2 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #803209 +++

The actually important outcome of that bug would have been some display of the total volume of GC crashes on a certain version (Nightly, most importantly) over (build) time to be able to detect regressions and spikes happening on certain build dates.
Ideally by buildid instead of by date, as in bug 915246
I do not want Scorro to go into doing anything in smaller units than full days, not even for build time.
I don't understand what that comment means. Whenever we have reports focused on nightly/aurora regression ranges, we should switch them to be by buildid instead of by date. Our current system of measuring crashes on nightly and aurora by date instead of build introduces delay and uncertainty in measuring regression/fix ranges. So I want to insist that all new reports of this sort group and order by buildid first.
I was talking about build date, with a day granularity, in comment #0, and that's what I am rooting for being implemented. I just don't want to have time invested into figuring out the multiple-builds-per-day issue that comes when you go with the buildid itself.
Assignee: nobody → schalk.neethling.bugs
Schalk: what's the status here?
This should just need UI work at this point. The MW/cron should already exist.
(In reply to Brandon Savage [:brandon] from comment #6)
> This should just need UI work at this point. The MW/cron should already
> exist.

Is there any UI for this at the moment or is it all new?
Couple of questions:

1) Where does this fit into the current Socorro workflow/pages?
2) If this is a new page, can someone give me a rough idea of what we need to display? Is it a graph displaying total volume of (Nightly) GC crashes over time on a page much similar to crash trends?
3) So from the title it sounds like this will only be for Nightly but, is this for all nightlies or only featured.
4) I assume this will also be for all products including Thunderbird and Firefox OS?

Thanks!
Flags: needinfo?(kairo)
Flags: needinfo?(bsavage)
This is a new report.

This should be a graph of GC crashes/ADU for each nightly, as in bug 915246 except rather than for a signature it's for all crashes with that specific annotation.

For now, this is for Firefox nightly only. None of the other products has enough volume of nightly users to produce meaningful results. Maybe as a future step it would be good to have trends of GC crashes/ADU for other builds (beta/release), but since you can't graph them the same way it's less interesting. The primary purpose of this report is to monitor whether changes in nightly are causing memory corruption/GC issues that we can't catch with normal by-signature reports because they would be in many different signatures.
Flags: needinfo?(kairo)
Target Milestone: --- → 71
Target Milestone: 71 → 70
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/0a04b23ab6b7ad2f877f3b31a8ce2de2b9014eda
Fix Bug 915317 UI, display total volume of gc crashes over time
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
(In reply to Bill McCloskey (:billm) from comment #13)
> How do we get to this page?

This is unlinked from the menu and will be on stage in the next 20 minutes or so:

https://crash-stats.allizom.org/gccrashes/products/Firefox
Whiteboard: [QA+]
(In reply to Robert Helmer [:rhelmer] from comment #14)
> (In reply to Bill McCloskey (:billm) from comment #13)
> > How do we get to this page?
> 
> This is unlinked from the menu and will be on stage in the next 20 minutes
> or so:
> 
> https://crash-stats.allizom.org/gccrashes/products/Firefox

It's up now, but looks like there's a CSRF problem :/ I'll take a look.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/0a0502cc735c0a9c6119175b1131876eb4e39ad0
bug 915317 - add csrf protection to gccrash page

https://github.com/mozilla/socorro/commit/d3e826f09ebe386d18373eda9e9538cf459e3e98
Merge pull request #1756 from rhelmer/bug915317-gccrashes-fix-csrf

bug 915317 - add csrf protection to gccrash page
I just pushed a fix for the CSRF problem, I am seeing any data but this may depend on reports that have not run yet etc.

needinfo'ing Schalk for info on the status of this feature.
Flags: needinfo?(schalk.neethling.bugs)
(In reply to Robert Helmer [:rhelmer] from comment #17)
> I just pushed a fix for the CSRF problem, I am seeing any data but this may
> depend on reports that have not run yet etc.
> 
> needinfo'ing Schalk for info on the status of this feature.

s/I am seeing any data/I am *not* seeing any data/?
(In reply to Stephen Donner [:stephend] from comment #18)
> (In reply to Robert Helmer [:rhelmer] from comment #17)
> > I just pushed a fix for the CSRF problem, I am seeing any data but this may
> > depend on reports that have not run yet etc.
> > 
> > needinfo'ing Schalk for info on the status of this feature.
> 
> s/I am seeing any data/I am *not* seeing any data/?

Yes, thank you :)
:rhelmer and :stephend So, I guess there is a little confusion here, which is good ;) So, the page in questions for this bug 'Display total volume of (Nightly) GC crashes over time' is indeed as Rob indicated but, that is not the bug myself and Rob has been working on.

The one that should be targeted for 70 is https://bugzilla.mozilla.org/show_bug.cgi?id=915373 as the PR suggests. To get to this page, the URL is: https://crash-stats.allizom.org/topcrasher_ranks_bybug/
Flags: needinfo?(schalk.neethling.bugs)
Flags: needinfo?(bsavage)
As far as this bug goes, I am not sure if there is middleware for it yet.
Flags: needinfo?(rhelmer)
(In reply to Schalk Neethling [:espressive] from comment #20)
> The one that should be targeted for 70 is
> https://bugzilla.mozilla.org/show_bug.cgi?id=915373 as the PR suggests. To
> get to this page, the URL is:
> https://crash-stats.allizom.org/topcrasher_ranks_bybug/

That one has nothing to do with the bug here, it's for bug 915373, which is something completely different.
Although part of this has landed, I mistakenly moved it to milestone 70. It will not be ready until 71, sorry for the noise!
Flags: needinfo?(rhelmer)
Target Milestone: 70 → 71
Target Milestone: 71 → 72
Target Milestone: 72 → 73
Any new updates on this feature? I am not sure what, if anything, is left to be done from a front end perspective. Thanks!
Target Milestone: 73 → 71
A lot of major changes are being made to the GC right now and we really need this feature. Can anything be done to prioritize this? I went to this link:
  https://crash-stats.allizom.org/gccrashes/products/Firefox
but there's no data there.

Also, as Benjamin said, it's really important that this data be broken down by buildid.
Do we have middleware for this? I believe that is all that is missing for this bug. Rob, you know anything about that?
Flags: needinfo?(rhelmer)
Depends on: 967593
(In reply to Schalk Neethling [:espressive] from comment #26)
> Do we have middleware for this? I believe that is all that is missing for
> this bug. Rob, you know anything about that?

Filed bug 967593 to track this.
Flags: needinfo?(rhelmer)
Attachment #8370340 - Flags: review?(peterbe)
Target Milestone: 71 → 73
OK I've created a stored procedure to populate a new "gccrashes" matview, and there's a corresponding middleware service and Django model+view. It's live on stage, you can see the output here:

https://crash-stats.allizom.org/gccrashes/json_data/products/Firefox/versions/29.0a1/start_date/2014-01-27/end_date/2014-02-04

The format is:

{
  'hits': 
    [
      [{{ build_id }}, {{ is_gc_count }}]
    ]
  'total': 1
}

You should be able to plot build_id and is_gc_count - I think you can probably drop this directly into flot and get the desired result, although I haven't tried it.
Flags: needinfo?(schalk.neethling.bugs)
Awesome, I am going to work on this and will let you know.
Flags: needinfo?(schalk.neethling.bugs)
Target Milestone: 73 → 74
Attachment #8370340 - Flags: review?(peterbe)
Status: REOPENED → ASSIGNED
hmm so the SP and matview are running find in production now, and I've backfilled a few days to see what it looks like:

https://crash-stats.mozilla.com/gccrashes/json_data/products/FennecAndroid/versions/27.0/start_date/2014-01-27/end_date/2014-02-10

However, there's a problem with the query, I think - note that's returning duplicate build IDs, because it's not exposing the report_date. I *think* what's probably desired here is to GROUP BY build and sum(is_gc_count) - that would give the total for a particular build_id for the date range specified in the URL, but just a single data point would be graphed for each build ID (ignoring report_date in the graph).

Make sense?
Flags: needinfo?(schalk.neethling.bugs)
Hey Rob,

I also saw that for some ranges the last item in the array sent back to the JS is:

[null, 0]

Not sure if it is related?
Flags: needinfo?(schalk.neethling.bugs)
(In reply to Schalk Neethling [:espressive] from comment #33)
> Hey Rob,
> 
> I also saw that for some ranges the last item in the array sent back to the
> JS is:
> 
> [null, 0]
> 
> Not sure if it is related?

We actually get some null build IDs in reports_clean, for some reason :/ I can filter these out if you like.
(In reply to Robert Helmer [:rhelmer] from comment #34)
> (In reply to Schalk Neethling [:espressive] from comment #33)
> > Hey Rob,
> > 
> > I also saw that for some ranges the last item in the array sent back to the
> > JS is:
> > 
> > [null, 0]
> > 
> > Not sure if it is related?
> 
> We actually get some null build IDs in reports_clean, for some reason :/ I
> can filter these out if you like.

That would be great! One check I do not have to do on the client. Thanks!
Target Milestone: 74 → 75
Depends on: 972165
Attached image Screen Shot 2014-02-13 at 22.31.01.png (obsolete) —
Is this what you would need for GCCrashes?
Attachment #8375760 - Flags: review?(kairo)
Here is the data I get/use to plot the graph in the attachment in comment 36:
http://sneethling.pastebin.mozilla.org/4280209
Depends on: 972563
Comment on attachment 8375760 [details]
Screen Shot 2014-02-13 at 22.31.01.png

r- because we need bug 972563 before this makes sense (and you promised a new request when that's done).

That said, it this is total crashes, the decimals are probably superfluous as we don't have half or smaller fractals of crashes. ;-)
Attachment #8375760 - Flags: review?(kairo) → review-
Depends on: 972612
All dependent bugs have landed, but we need migration to run on stage. Once that is done we should have:
* gccrashes per ADU
* only crashes from nightly channel considered
(In reply to Robert Helmer [:rhelmer] from comment #39)
> All dependent bugs have landed, but we need migration to run on stage. Once
> that is done we should have:
> * gccrashes per ADU
> * only crashes from nightly channel considered

This needs to be manually run (I don't have access, needinfo'ing selena - please upgrade to latest alembic version :) )
Flags: needinfo?(sdeckelmann)
(In reply to Robert Helmer [:rhelmer] from comment #40)
> (In reply to Robert Helmer [:rhelmer] from comment #39)
> > All dependent bugs have landed, but we need migration to run on stage. Once
> > that is done we should have:
> > * gccrashes per ADU
> > * only crashes from nightly channel considered
> 
> This needs to be manually run (I don't have access, needinfo'ing selena -
> please upgrade to latest alembic version :) )

Oh! Actually I checked and I *do* have access.
OK there's a bit of cleanup work I need to do wrt DB migrations in bug 972612, but I went ahead and tested the migration waiting for review on stage and it looks like it did the right thing (just a matter of using alembic correctly so the column type gets altered).

Schalk, please give it a try now, the following should work on stage:

(In reply to Robert Helmer [:rhelmer] from comment #39)
> * gccrashes per ADU
> * only crashes from nightly channel considered
Flags: needinfo?(sdeckelmann) → needinfo?(schalk.neethling.bugs)
(In reply to Robert Helmer [:rhelmer] from comment #42)
> OK there's a bit of cleanup work I need to do wrt DB migrations in bug
> 972612, but I went ahead and tested the migration waiting for review on
> stage and it looks like it did the right thing (just a matter of using
> alembic correctly so the column type gets altered).
> 
> Schalk, please give it a try now, the following should work on stage:
> 
> (In reply to Robert Helmer [:rhelmer] from comment #39)
> > * gccrashes per ADU
> > * only crashes from nightly channel considered

Hmmm so now, I am getting the following data with a call such as:

http://127.0.0.1:8000/gccrashes/json_data/products/Firefox/versions/30.0a1?start_date=2013-12-13&end_date=2014-02-10

http://sneethling.pastebin.mozilla.org/4286141
Flags: needinfo?(schalk.neethling.bugs) → needinfo?(rhelmer)
(In reply to Schalk Neethling [:espressive] from comment #43)
> (In reply to Robert Helmer [:rhelmer] from comment #42)
> > OK there's a bit of cleanup work I need to do wrt DB migrations in bug
> > 972612, but I went ahead and tested the migration waiting for review on
> > stage and it looks like it did the right thing (just a matter of using
> > alembic correctly so the column type gets altered).
> > 
> > Schalk, please give it a try now, the following should work on stage:
> > 
> > (In reply to Robert Helmer [:rhelmer] from comment #39)
> > > * gccrashes per ADU
> > > * only crashes from nightly channel considered
> 
> Hmmm so now, I am getting the following data with a call such as:
> 
> http://127.0.0.1:8000/gccrashes/json_data/products/Firefox/versions/30.
> 0a1?start_date=2013-12-13&end_date=2014-02-10
> 
> http://sneethling.pastebin.mozilla.org/4286141

Some numbers are very small now, since bug 972612 does is_gc_count / ADI:

breakpad=# select 5 / 110896::real as gc_per_adu;
      gc_per_adu      
----------------------
 4.50872889914875e-05

You can use these numbers directly in JS, in JSON as well:

> var n = 4.50872889914875e-05; n
0.0000450872889914875

> JSON.parse('{"n": 4.50872889914875e-05}');
{ n: 0.0000450872889914875 }

Let me know if I am missing your point, or if this does not make sense!
Flags: needinfo?(rhelmer)
I actually do not know where those numbers come from, but as I said on IRC, we probably can take numbers per million ADI.

That said, how do we get ADI per build ID when we take a table as a source that isn't by build ID, see bug 972612 comment #6?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #45)
> I actually do not know where those numbers come from, but as I said on IRC,
> we probably can take numbers per million ADI.

OK this is doable.

> That said, how do we get ADI per build ID when we take a table as a source
> that isn't by build ID, see bug 972612 comment #6?

Working on that over there (short answer is that we are currently using ADI for all builds on the product/verision/channel which we don't want).
(In reply to Robert Helmer [:rhelmer] from comment #46)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #45)
> > I actually do not know where those numbers come from, but as I said on IRC,
> > we probably can take numbers per million ADI.
> 
> OK this is doable.

Let's review the unit multiplicator (1M, 1k, or whatever) and the amount of decimals in display once we have corrected the data in bug 972612, though, as the order of the numbers might change.
:rhelmer So there is going to be one more PR from you on this right?

:KaiRo Did we decide on dropping the decimal places (if any) or, are we going to use a precision of 2 decimal places?

Thanks!
Flags: needinfo?(rhelmer)
Flags: needinfo?(kairo)
(In reply to Schalk Neethling [:espressive] from comment #48)
> :rhelmer So there is going to be one more PR from you on this right?
> 
> :KaiRo Did we decide on dropping the decimal places (if any) or, are we
> going to use a precision of 2 decimal places?
> 
> Thanks!

Yep making requested changes in blocking bug 972612 (crashes per million ADI, use ADI for specific build ID, a few other related details). Will let you know when it lands and ready on stage!
Flags: needinfo?(rhelmer)
This is as the graph is plotted now with the data after the latest changes by :rhelmer. http://sneethling.pastebin.mozilla.org/4316328
Attachment #8375760 - Attachment is obsolete: true
Attachment #8377027 - Flags: review?(kairo)
Target Milestone: 75 → 76
Comment on attachment 8377027 [details]
Screen Shot 2014-02-17 at 12.13.46.png

This looks pretty good, I think we just should lose the decimals on the axis, though, they don't give us any additional value.
Attachment #8377027 - Flags: review?(kairo) → review+
Flags: needinfo?(kairo)
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/47f0b7c1e05e675165e1534523cb2bc012eb7130
Fix Bug 915317, display total volume of gccrashes

https://github.com/mozilla/socorro/commit/164049c1d436fab863c19b5a57b4633750732618
Merge pull request #1900 from ossreleasefeed/bug915317-display-total-volume-of-gccrashes

Fix Bug 915317, display total volume of gccrashes
Status: ASSIGNED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Steps to QA
-------------

1) Load up the main page
 -- If the selected product is Fireox, there should be an entry for GC Crashes in the Reports drop down.
 -- If any other product is selected, the entry should no exist.

2) Select GC Crashes
 -- A new page should load for the latest nightly release for Fx

JS Validation to test
----------------------

Using the form on the left:

1) If any of the two date fields is set to a future date, clicking Generate should show an error messages stating that dates cannot be in the future.
2) If the from date is greater than the to date, clicking Generate should show an error indicating that the from date should be before the to date
3) If there is no product selected, clicking Generate should show an error indicating that a product should be selected
4) If there is no version selected, clicking Generate should show an error that a version needs to be selected.
5) If a product other than Firefox is selected, a error messages should indicate that the current report only supports Firefox.

If all is well, a graph should be drawn on the right.
QA r?
Flags: needinfo?(mbrandt)
Flags: needinfo?(stephen.donner)
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/0460aeddd5196155bf2b2789a332c357f58cb59e
Fix Bug 915317, display total volume of gccrashes

https://github.com/mozilla/socorro/commit/f23990270616ae03bedecdecb57df9255251bed8
Merge pull request #1914 from ossreleasefeed/bug915317-display-total-volume-of-gccrashes

Fix Bug 976674, use minified files for d3
Is this ready for QA? It looks like there are a few dependencies that are still open.
Flags: needinfo?(mbrandt) → needinfo?(schalk.neethling.bugs)
All dependencies will be fixed by https://github.com/mozilla/socorro/pull/1918 which should be merged latest tomorrow I would think.
Flags: needinfo?(schalk.neethling.bugs)
Flags: needinfo?(stephen.donner)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: