Closed Bug 1019262 Opened 10 years ago Closed 10 years ago

implement graphs of ADU per signature in the front end

Categories

(Socorro Graveyard :: Middleware, defect, P2)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lonnen, Assigned: espressive)

References

Details

(Whiteboard: [QA+])

now that the data source for crashes per adu is exposed in the middleware, django models, and api, its time to finish this with some UI.
Having not looked at this particular bug for some time, I have some questions that I need to get some clarity on before I jump in and start coding so, here goes:

For reference I am using this graph by bsmedberg:
http://benjamin.smedbergs.us/tests/OnMaybeDeQueueOne.svg (From the original bug https://bugzilla.mozilla.org/show_bug.cgi?id=915246#c0)

And then for the JSON this:
http://socorro.readthedocs.org/en/latest/middleware.html#crashes-per-adu-by-signature-service

Looking at the reference graph, I can see a single 'line' of points i.e. crashes/100adu plotted by date. Looking at the JSON however, there are two other points of interest, so to speak i.e. The os_name and the channel.

My question regarding this is:

1) Do we care about this in terms of the graph?
2) If yes, do we need to group by OS or channel?

If [2], would the end goal be to end up with a line graph similar to the current one under the graph tab on report/list? Seeing that in https://bugzilla.mozilla.org/show_bug.cgi?id=915246#c13 the idea is indeed to replace the current graph, I am thinking that answer is yes (Similar graph type, different data).

Assuming [2], this means we will have crashes/100adu plotted by date, grouped by OS (or potentially channel)

In terms of the little modal that displays when one clicks on an individual data point (on the reference graph), am I correct in my assumption below:

JSON
-----

{
    "hits": [
        {
            "signature": "gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*)",
            "adu_date": "2014-03-01",
            "build_date": "2014-03-01",
            "buildid": '201403010101',
            "crash_count": 3,
            "adu_count": 1023,
            "os_name": "Mac OS X",
            "channel": "release"
        }...
    ],
    "total": 2,
}

From JSON   | Graph Modal
--------------------------
buildid     => 20130809004002
adu_count   => Users: 42994
crash_count => Crashes: 46


X and Y axis
-------------

X => adu_date
Y => adu_count (Is this already divided by 100 or do I still need to divide this number before plotting on the graph?)

I believe that is all for now, thanks!
Flags: needinfo?(rhelmer)
Flags: needinfo?(benjamin)
(In reply to Schalk Neethling [:espressive] from comment #1)
> Having not looked at this particular bug for some time, I have some
> questions that I need to get some clarity on before I jump in and start
> coding so, here goes:
> 
> For reference I am using this graph by bsmedberg:
> http://benjamin.smedbergs.us/tests/OnMaybeDeQueueOne.svg (From the original
> bug https://bugzilla.mozilla.org/show_bug.cgi?id=915246#c0)
> 
> And then for the JSON this:
> http://socorro.readthedocs.org/en/latest/middleware.html#crashes-per-adu-by-
> signature-service
> 
> Looking at the reference graph, I can see a single 'line' of points i.e.
> crashes/100adu plotted by date. Looking at the JSON however, there are two
> other points of interest, so to speak i.e. The os_name and the channel.
> 
> My question regarding this is:
> 
> 1) Do we care about this in terms of the graph?
> 2) If yes, do we need to group by OS or channel?


As far as I can tell, bsmedberg's implementation restricted crashes to only Windows and produced a graph per channel. So I think it'd be OK for us to generate separate graphs for each as well, unless bsmedberg has specific guidance here.


> If [2], would the end goal be to end up with a line graph similar to the
> current one under the graph tab on report/list? Seeing that in
> https://bugzilla.mozilla.org/show_bug.cgi?id=915246#c13 the idea is indeed
> to replace the current graph, I am thinking that answer is yes (Similar
> graph type, different data).
> 
> Assuming [2], this means we will have crashes/100adu plotted by date,
> grouped by OS (or potentially channel)
> 
> In terms of the little modal that displays when one clicks on an individual
> data point (on the reference graph), am I correct in my assumption below:
> 
> JSON
> -----
> 
> {
>     "hits": [
>         {
>             "signature":
> "gfxContext::PushClipsToDT(mozilla::gfx::DrawTarget*)",
>             "adu_date": "2014-03-01",
>             "build_date": "2014-03-01",
>             "buildid": '201403010101',
>             "crash_count": 3,
>             "adu_count": 1023,
>             "os_name": "Mac OS X",
>             "channel": "release"
>         }...
>     ],
>     "total": 2,
> }
> 
> From JSON   | Graph Modal
> --------------------------
> buildid     => 20130809004002
> adu_count   => Users: 42994
> crash_count => Crashes: 46
> 
> 
> X and Y axis
> -------------
> 
> X => adu_date
> Y => adu_count (Is this already divided by 100 or do I still need to divide
> this number before plotting on the graph?)


I believe this is just the raw ADU number (I just read through the code for build_adu where this comes from and looks like it is straight from raw_adu), so if you want crashes-per-hundred-ADU you need to divide it.


> 
> I believe that is all for now, thanks!


Let me know if I missed anything - I am sure bsmedberg will chime in if this doesn't match his expectations too.
Flags: needinfo?(rhelmer)
In general we're only going to be looking at one particular OS and channel at a time, and we should optimize for that case. But if we want to do something more complex:

* If there are multiple channels, they should be separate lines on the graph.
* If there are multiple OSes, they should be grouped/SUMmed together.
Flags: needinfo?(benjamin)
(In reply to Robert Helmer [:rhelmer] from comment #2)
> so if you want crashes-per-hundred-ADU you need to divide it.

FWIW, in my explosiveness reports, I  found that on per-signature reports, crashes-per-million-ADI works better than per-hundred. But as long as you specify what's being used, all those are easily interconvertible anyhow. :)
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → 91
The schedule for this is as follows:

Ready for review and testing: Monday 23 June, 2014
Ready for push to production: Wednesday 25 June, 2014
Adding in a conversation, with some inline comments and a question, on IRC that cleared up how to move forward on this:

rhelmer: espressive: hmm isn't this going into a new report, how is the report_list table/graph stuff involved?
espressive: rhelmer: this is supposed to replace the current graph
rhelmer: espressive: I thought we'd just have a brand-new page that only showed this graph
espressive: nope
espressive: let me get the orinal bug
espressive: rhelmer: https://bugzilla.mozilla.org/show_bug.cgi?id=915246#c13

firebot: Bug 915246 — FIXED, sdeckelmann — Graphs of crashes/ADU for individual signatures

rhelmer: espressive: oh right you are
rhelmer: espressive: sounds it like it should be additional, not replace the current ones... I'd just ignore the current stuff, leave it in place, implement the new one w/ d3
rhelmer: espressive: we can remove the old ones as a separate bug
espressive: rhelmer: so you mean we create a new page for this graph and remove the graph tab from report/list ?
espressive: as far as I understood this graph will replace the current one under the graph tab on report/list
rhelmer: espressive: hmm yeah I am not sure, that's not how I read it but I think that is a reasonable interpretatino too

rhelmer: bsmedberg: hey when you opened bug 915246, did you intend for the adu-by-signature graph to *replace* the one in the graph tab on the /report/list page
rhelmer: ?
rhelmer: bsmedberg: or for us to have two graphs in there?

bsmedberg: rhelmer: well...
bsmedberg: I have never found that graph useful as it is. But I don't know who uses it.

>>>> KaiRo, do you perhaps know whether there is a need to keep the current graph on report/list's graph tab or, are you fine with removing it?

bsmedberg: This one is strictly superior as long as it's separated by channel.

rhelmer: espressive: ah ok I didn't read the rest of the comments
rhelmer: I am not arguing for the old one :) just wanted to make sure, thanks bsmedberg 

rhelmer: espressive: so if I am understanding right, I'd break it out of it's association with the table partial... you're leaving that tab alone for now, correct?

>> This change will only affect the graph tab.

rhelmer: espressive: I really don't have strong feelings either way to be honest :) this is the first graph we're doing this way instead of flot, I'd do what makes sense in this context and not worry about being consistent with the old graphs right now... I think we'll probably want to switch everything over at some point, but we shouldn't over-engineer

>> There is a bug for that ;) https://bugzilla.mozilla.org/show_bug.cgi?id=1027660

espressive: rhelmer: ok, thanks so much for your input. I reckon I am going to do what feels right and we can take it from there once I open the PR.
rhelmer: espressive: I totally agree
espressive: awesome
Flags: needinfo?(kairo)
If this graph respects the same parameters as the overall report/list page, I'm all for this replacing the current graph. And actually, we could replace the "table" with the data from this graph as well.
Flags: needinfo?(kairo)
Quick question, seeing we are basically going to be showing (data.adu_count / data.crash_count) for a specific date (for one channel, combining all OSes), is there a preference for the type of graph?

I guess the options are, line, scatter, bar? Thoughts?

Just FYI, I have the data all ready to go I believe, so just unsure which of the above is the best/preferred option. Thanks!
Flags: needinfo?(kairo)
Flags: needinfo?(chris.lonnen)
Flags: needinfo?(benjamin)
Either a dot plot as in my sample or a line with per-build dots would be best.
Flags: needinfo?(benjamin)
(In reply to Schalk Neethling [:espressive] from comment #9)
> Quick question, seeing we are basically going to be showing (data.adu_count
> / data.crash_count) for a specific date (for one channel, combining all
> OSes), is there a preference for the type of graph?

It's (data.crash_count / data.adu_count) for a date *range*. :)

That said, I agree with bsmedberg's comment #10.
Flags: needinfo?(kairo)
Houston we may have a problem...

So, this morning I was getting ready to write the tests for the changes to the view and forms. I expected some tests to fail so, I ran the whole lot and presto, a couple of the report_list tests failed. I dug into the possible reasons and also spoke to Adrian about it on IRC but, on further investigation I found the real underlying problem.

One of the mandatory parameters of the ADUBySignature service is channel i.e. nightly, aurora etc. Now, in order to determine which channel a version of a product is, I wrote this little function:

def get_channel_for_release(version):
    api = models.CurrentProducts()
    version_info = api.get(
        versions=version
    )

    return version_info['hits'][0]['build_type']

The problem is, the version parameter to report_list is optional and has no fallback if not specified so, if someone does a search across say, ALL version of Fx, clicking on a signature in the result list, will result in a key error because, the version I pass to this function is null and hence the result set returned from the middleware is different to the one that is returned when a valid version is actually specified.

There is of course no way to infer the version, and hence no way to know which channel, which is needed for the above mentioned service.

One possibility is to check whether a channel exists, and if no channel was passed, I can show a simple message to say that the graph cannot be shown, as no channel/release/build_type was specified.

Will that work? All suggestions and ideas most welcome. Thanks!
Flags: needinfo?(kairo)
Flags: needinfo?(chris.lonnen)
Flags: needinfo?(benjamin)
Flags: needinfo?(adrian)
So, I see multiple possibilities here:
* Ideally, we'd match with the graph what the other tabs are showing, in which case we'd need to request all channels and sum them up accordingly.
* If we can't do that, we need to annotate in the top line of the tab what data selection (channel etc.) we are looking at. That said, graphs across channels and trains don't make much sense (when it's in our code and not a 3rd-party issue), they need to be either along a channel (usual case) or along a train. So, we can go two ways here:
** Either we go and display an error and no graph if multiple channels are affected by the selection. That means that someone analyzing crash data needs to do a complicated dance to get from there to a graph that helps them. That could be eased by showing some links to channel-specific versions of that reports/list so they can actually get to a graph.
** Or we could go and just default to a channel in that case (and annotate that, of course, see above). I think that's what I'd prefer, and I think defaulting to Nightly makes most sense there as "finding a volume regression range" usually means looking for that on Nightly.
Flags: needinfo?(kairo)
It doesn't make any sense to have a graph showing multiple channels in the same line.

My preference would be to just let users pick a channel in a dropdown; in cases where we're unsure, the default should be nightly. And of course the chart legend should say which channel it's showing.

This is intentionally not limited by version like other queries are, since tracking nightly you want 29.0a1/30.0a1/31.0a1/etc.
Flags: needinfo?(benjamin)
Thanks KaiRo and bsmedberg, I like the idea of the drop down, and defaulting to nightly. I will implement it as such.
Flags: needinfo?(adrian)
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/eba54c629c874f5bd58dc502a24be7a1af4c1abc
Fix Bug 1019262, implement graphs for adu by signature

https://github.com/mozilla/socorro/commit/953a58bc85dd57f7d14d272e02f1dccb36c39c44
Merge pull request #2152 from ossreleasefeed/bug1019262-implement-graphs-adu-per-signature

Fix Bug 1019262, implement graphs for adu by signature
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
How to QA
-----------

There are a couple of scenarios to test.

1) Load up the front page, navigate to top crashers and click on one of the signature.
This will load report/list
On the resulting page, click on the graph tab

1.a If there was data, you should see a graph drawn
1.b If there was no data, you should see a message that states this.

2) Navigate over to the search page.
Do a search
Click on one of the signatures in the results

If the graph tab is not shown be default, click on the graph tab.

2.a If there was no data, there should be a message as such
2.b If there was data, the graph should be drawn and there should be a select field to switch between different channels.
2.b.1 Switching to a different channel should either cause the 'no data' message to go away and be replaced by a graph or, there should be no change.
2.b.1.a Note that the heading should change to reflect the new channel i.e 
    if it was 'ADU By Signature For Nightly Channel' and you switched to release, it should change to 'ADU By Signature For Release'

There might be more, but this is what I think of right now.
Whiteboard: [QA+]
Why are the plotted points different sizes? The dots are often very small, making them difficult to hit as a mouse target.

I was able to verify "No graph data returned for signature" and some signatures with graphs in one instance, but in two others the graph tab said "loading graph" and then sat empty:

https://crash-stats.mozilla.com/report/list?signature=arena_dalloc+|+je_free+|+FinalizeArenas&product=Firefox#tab-graph

https://crash-stats.mozilla.com/report/list?signature=nsTArray_base%3CnsTArrayFallibleAllocator%2C+nsTArray_CopyWithConstructors%3CJS%3A%3AHeap%3CJSObject*%3E+%3E+%3E%3A%3AIncrementLength%28unsigned+int%29+|+mozilla%3A%3Anet%3A%3AHttpBaseChannel%3A%3AApplyContentConversions%28%29&product=Firefox#tab-graph
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Going to https://crash-stats.mozilla.com/report/list?signature=CDevice%3A%3ADriverInternalErrorCB%28long%29

I can see the graph (for Aurora), but only over a very small date range.

I'm happy to file a followup bug, but we should expect to need to extend the date ranges of this graph back several months in order to be able to find regression ranges.
(In reply to Chris Lonnen :lonnen from comment #18)
> Why are the plotted points different sizes? The dots are often very small,
> making them difficult to hit as a mouse target.

It was an experiment in which the number of crashes for each plot point roughly determines the size of the point, so the more crashes the larger (basically highlighting the ones with higher crash numbers). I guess this is not ideal and does make some points with say, less than 10 crashes small. Probably better to go with a default size for all. I will play around with some different values, thinking around 15 should be a good size.

> I was able to verify "No graph data returned for signature" and some signatures with graphs in one 
> instance, but in two others the graph tab said "loading graph" and then sat empty:

I will look into this.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #19)
> Going to
> https://crash-stats.mozilla.com/report/
> list?signature=CDevice%3A%3ADriverInternalErrorCB%28long%29
> 
> I can see the graph (for Aurora), but only over a very small date range.
> 
> I'm happy to file a followup bug, but we should expect to need to extend the
> date ranges of this graph back several months in order to be able to find
> regression ranges.

Currently this uses the date range off the overall page so, between 3 and 28 days. If we need more, perhaps we should add fields so the user can select their own date range?
(In reply to Chris Lonnen :lonnen from comment #20)
> variable size plot points leads to this:
> https://crash-stats.allizom.org/report/
> list?product=Firefox&range_value=7&range_unit=days&date=2014-06-
> 29&signature=nsTArray_base%3CnsTArrayFallibleAllocator%2C+nsTArray_CopyWithCo
> nstructors%3CJS%3A%3AHeap%3CJSObject*%3E+%3E+%3E%3A%3AIncrementLength%28unsig
> ned+int%29+|+mozilla%3A%3Anet%3A%3AHttpBaseChannel%3A%3AApplyContentConversio
> ns%28%29&version=Firefox%3A32.0a2#tab-graph

Holy Shnickeys!!! Well, that was bound to happen. I am guessing that is the spiky crashes in 32? :)
(In reply to Chris Lonnen :lonnen from comment #18)
> 
> https://crash-stats.mozilla.com/report/
> list?signature=arena_dalloc+|+je_free+|+FinalizeArenas&product=Firefox#tab-
> graph
> 
> https://crash-stats.mozilla.com/report/
> list?signature=nsTArray_base%3CnsTArrayFallibleAllocator%2C+nsTArray_CopyWith
> Constructors%3CJS%3A%3AHeap%3CJSObject*%3E+%3E+%3E%3A%3AIncrementLength%28uns
> igned+int%29+|+mozilla%3A%3Anet%3A%3AHttpBaseChannel%3A%3AApplyContentConvers
> ions%28%29&product=Firefox#tab-graph

So, the problem is that the code to remove the '.hide' class from the adubysig div is not run so, the text is there just not visible. Not 100% sure, but this is most likely a case where it is trying to select the div before it exists in the DOM.

Will investigate this further.
Found the bug, will fix.
(In reply to Schalk Neethling [:espressive] from comment #23)
> Holy Shnickeys!!! Well, that was bound to happen. I am guessing that is the
> spiky crashes in 32? :)

Yes, that's one of them.

(In reply to Schalk Neethling [:espressive] from comment #22)
> Currently this uses the date range off the overall page so, between 3 and 28
> days. If we need more, perhaps we should add fields so the user can select
> their own date range?

We definitely need way longer ranges to determine regression ranges, often up to a few months back.
Target Milestone: 91 → 92
The problems mentioned here has been resolved with:

https://github.com/mozilla/socorro/pull/2159
https://github.com/mozilla/socorro/pull/2160

https://github.com/mozilla/socorro/pull/2195
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Target Milestone: 92 → 94
Product: Socorro → Socorro Graveyard
You need to log in before you can comment on or make changes to this bug.