Closed Bug 915246 Opened 11 years ago Closed 10 years ago

Graphs of crashes/ADU for individual signatures

Categories

(Socorro :: Webapp, task)

x86_64
Windows 7
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: benjamin, Assigned: selenamarie)

Attachments

(1 file)

Currently when a regression happens or is fixed on nightly/aurora, it can be difficult to identify the regression range. This is because it takes several days to get users onto builds, and users often lag behind by several days. Date-based queries are therefore very noisy.

I've prototyped a new report type which is very good at fixing this noise: it takes the 10-day window after a particular build, counts the number of ADI and the number of crashes for that build across the 10 days, and then computes the crashes/ADI number.
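A minimal sketch of that calculation, not the prototype script itself. The row layout and field names (buildid, build_date, adu_date, adu_count, crash_count) are assumptions for illustration, and so is the choice to count the build day itself as day 0 of the window:

```python
from collections import defaultdict

WINDOW_DAYS = 10  # window length described in the comment above


def crashes_per_adi(rows, window_days=WINDOW_DAYS):
    """Return {buildid: crashes/ADI}, summed over the window after each build.

    Each row is a dict with hypothetical keys: buildid, build_date,
    adu_date (both datetime.date objects), adu_count and crash_count.
    """
    adi = defaultdict(int)
    crashes = defaultdict(int)
    for row in rows:
        # Age of the build, in days, on the day the ADI was recorded.
        age = (row["adu_date"] - row["build_date"]).days
        if 0 <= age < window_days:
            adi[row["buildid"]] += row["adu_count"]
            crashes[row["buildid"]] += row["crash_count"]
    # Builds with no ADI in the window are skipped to avoid dividing by zero.
    return {b: crashes[b] / adi[b] for b in adi if adi[b] > 0}
```

Summing both counts before dividing gives one ratio per build, so a build's data point does not depend on which day its users happened to update.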

I'd like to get this report to be part of the normal graphs which appear on the graphs tab of report/list.

For an example of the report in graphical form, see http://benjamin.smedbergs.us/tests/OnMaybeDeQueueOne.svg

My prototype collects data using this script:
https://github.com/mozilla-metrics/socorro-toolbox/blob/master/src/main/python/nightly-signature-frequency.py

And graphs in SVG using this script:
https://github.com/bsmedberg/bsmedberg-graphing-playground/blob/master/nightly-signature-graph.py

I suspect that the data collection script will need to be significantly refactored, because it currently queries against build_adus *and* releases_raw, neither of which is indexed, so both queries use full-table scans.

I also suspect that you'll want to rewrite the graph-generation in some other tool (D3? flot sucks although an earlier prototype of this used flot).

Earlier prototype using flot:
http://benjamin.smedbergs.us/blog/2013-04-22/graph-of-the-day-empty-minidump-crashes-per-user/
https://github.com/bsmedberg/bsmedberg-graphing-playground/blob/master/emptydump-nightly-frequency.html and .js
Assignee: nobody → bsavage
Selena, there's a PR for this; can you review?

https://github.com/mozilla/socorro/pull/1547
Flags: needinfo?(sdeckelmann)
Reviewed last week!
Flags: needinfo?(sdeckelmann)
It looks like two PRs have been opened for this:

https://github.com/mozilla/socorro/pull/1547 
https://github.com/mozilla/socorro/pull/1606


Both appear to have been closed without merging.
An update on this bug: after talking with Laura in a 1:1 we determined that the first step we want to take is to offer you the ability to get the data through the middleware and have you verify that we're getting what you need (e.g. that the data matches what you expect). Then we will create a report for it (probably in Q1).
The data/middleware changes will land this week.
Hoping to get the report done this Q as well - I think we may have a miscommunication.
The pull request is here: https://github.com/mozilla/socorro/pull/1701 but it looks stalled for the last week. What's the status?
This is pretty much ready to land once we have a UI for me to hook up to. How is that bug coming along, Schalk? Do you need anything from me?
Flags: needinfo?(schalk.neethling.bugs)
Looks like there is PG stuff still to do on the PR - Brandon, can you take care of that?
Flags: needinfo?(bsavage)
Presumably the API and the UI can land separately also, right? Once the API is landed and available via HTTP I can hack up a quick presentation for the QA people who need it; it won't be pretty but at least they'll have the data they need without having to ask me every time.
(In reply to Brandon Savage [:brandon] from comment #8)
> This is pretty much ready to land once we have a UI for me to hook up to.
> How is that bug coming along, Schalk? Do you need anything from me?

Brandon, can you give me a sense of the type of data structure I would be looking at, maybe a sample JSON file I can use as a mock?
Flags: needinfo?(schalk.neethling.bugs)
Couple of items here:

1) It is mentioned that this is to be included in the graphs tab container on report/list. Has this been decided as final?
2) Do you want the two graphs (the current and the new) just stacked one on top of the other, or would you prefer to be able to switch between the two?
Flags: needinfo?(kairo)
It should definitely be on report/list. It can completely replace the existing by-build-date graph since it contains strictly a superset of the current information.

Although I thought report/list used to have a graph by crash date as well, which is occasionally useful for crashes caused by external events and not our code.

In the future this style of crashes/ADU report will be useful for other kinds of reports, e.g. a graph of crashes/ADU for any arbitrary supersearch results. But for now report/list and the base API is what we need, as long as the base API can be used to build the other things by hand.
Flags: needinfo?(kairo)
FYI, report/list never had a graph by crash date as far as I can remember.
Target Milestone: --- → 70
Didn't land in time for 70.
Target Milestone: 70 → 71
There's a UI for this, clearing the needinfo.
Flags: needinfo?(bsavage)
Target Milestone: 71 → 73
Reassigning to Rob, since he and :espressive had been doing some work on this, too.
Assignee: bsavage → rhelmer
Target Milestone: 73 → 74
Target Milestone: 74 → 75
Target Milestone: 75 → 76
Target Milestone: 76 → 77
Target Milestone: 77 → 78
Status: NEW → ASSIGNED
Target Milestone: 78 → 80
This PR has landed; the GitHub robot did not comment here due to a typo in the comment:

https://github.com/mozilla/socorro/pull/1701

I don't see any middleware or UI changes outstanding or landed.
(In reply to Robert Helmer [:rhelmer] from comment #19)
> This PR was landed, the github robot did not comment here due to a typo in
> the comment:

Oops, I take that back - the branch name had a typo but the commit message was fine, comment 15.

I am working on middleware for this now.
Hm looks like the cron job is running correctly, but we haven't had new data for a while:

breakpad=# select max(build_date) from crash_adu_by_build_signature;
    max     
------------
 2014-02-05
(1 row)

I found this in the crontabber log for today:

2014-03-31 03:03:56,011 INFO  - MainThread - Notices from calling update_crash_adu_by_build_signature([datetime.date(2014, 3, 30)]): ['NOTICE:  no new build adus for day 2014-03-30\n']

It looks like the build_adu table is used by the query in the above job, but that job does not depend on the update_build_adu job, which ran later:

2014-03-31 03:21:04,077 INFO  - MainThread - Result from calling update_build_adu([datetime.date(2014, 3, 30)]): (True,)

I'll add an explicit dependency here, hopefully that is enough to get it working consistently every day.
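To illustrate why the explicit dependency matters, here is a small sketch of dependency-ordered scheduling using Python's standard graphlib module (3.9+). It is not crontabber's own mechanism. The keys reuse the stored-procedure names from the log lines above; the real job names may differ.

```python
from graphlib import TopologicalSorter

# Map each job to the set of jobs that must finish before it starts.
# Names are taken from the log output above and are used here only as
# illustrative labels, not as the actual crontabber job names.
DEPENDS_ON = {
    "update_build_adu": set(),
    "update_crash_adu_by_build_signature": {"update_build_adu"},
}


def run_order(dependencies):
    """Return job names in an order that satisfies every dependency."""
    return list(TopologicalSorter(dependencies).static_order())
```

With the dependency declared, `run_order(DEPENDS_ON)` always places update_build_adu first, so the signature job no longer depends on which job happens to be scheduled earlier in the night.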
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/95cf268ef15e990231dbe88b61d7bc992e29d3fd
bug 915246 - explicitly depend on build-adu-matview job

https://github.com/mozilla/socorro/commit/8cc7fa300cf8e05c6ced6fd126c44783ca4af579
Merge pull request #1974 from rhelmer/bug915246-crashes-by-adu-dependency

bug 915246 - explicitly depend on build-adu-matview job
OK this seems to be working on stage now:

breakpad=# select max(build_date) from crash_adu_by_build_signature;
    max     
------------
 2014-03-31
(1 row)

Working on mware.
(In reply to Robert Helmer [:rhelmer] from comment #23)
> OK this seems to be working on stage now:
> 
> breakpad=# select max(build_date) from crash_adu_by_build_signature;
>     max     
> ------------
>  2014-03-31
> (1 row)
> 
> Working on mware.

Let me know once the middleware is ready, then I can hook this up to the existing UI and make any tweaks and changes we need there, such as switching to D3 for the graph etc.
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/7dfe203cdd134cb5b1ce5971227a4d1954ff6d68
bug 915246 - add mware for crash_adu_by_build_signature table

https://github.com/mozilla/socorro/commit/d807752a0a37229c684b26a2be9879c9599d3232
Merge pull request #1978 from rhelmer/bug915246-crashes-by-adu-mware

bug 915246 - add mware for crash_adu_by_build_signature table
(In reply to Schalk Neethling [:espressive] from comment #24)
> (In reply to Robert Helmer [:rhelmer] from comment #23)
> > OK this seems to be working on stage now:
> > 
> > breakpad=# select max(build_date) from crash_adu_by_build_signature;
> >     max     
> > ------------
> >  2014-03-31
> > (1 row)
> > 
> > Working on mware.
> 
> Let me know once the middleware is ready, then I can hook this up to the
> existing UI and make any tweaks and changes we need there, such as switching
> to D3 for the graph etc.

Just landed the middleware change, it'll be up on stage shortly - it's documented at http://socorro.readthedocs.org/en/latest/middleware.html#crashes-per-adu-by-signature-service
Assignee: rhelmer → schalk.neethling.bugs
Request from Crashkill this week to build out the minimum necessary bits to expose this in the public API first and then worry about the UI second.
(In reply to Chris Lonnen :lonnen from comment #28)
> Request from Crashkill this week to build out the minimum necessary bits to
> expose this in the public API first and then worry about the UI second.

Since the mware bit is done, all we need to do is add a model on the Django side and whitelist it so it'll appear on the API. Schalk why don't I do this part?
Flags: needinfo?(schalk.neethling.bugs)
As mentioned to Rob on IRC, I am going to take a good ol' stab at this today, but if I find I am running in circles, or I just do not have the time, I will kick the model part back over the wall to Rob.
Flags: needinfo?(schalk.neethling.bugs)
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/863219a9e331bcbc4aba48f59e8426698bc56ef4
bug 915246 - add model for AduBySignature

https://github.com/mozilla/socorro/commit/5799a1a832cafac0556ef4659df28851f8b99eee
Merge pull request #1994 from rhelmer/bug915246-crash_adu_by_build_signature-model

bug 915246 - add model for AduBySignature
(In reply to Chris Lonnen :lonnen from comment #28)
> Request from Crashkill this week to build out the minimum necessary bits to
> expose this in the public API first and then worry about the UI second.

This is done and up on stage:
https://crash-stats.allizom.org/api/AduBySignature/?channel=nightly&signature=JS_GetFunctionDisplayId%28JSFunction*%29&end_date=2014-04-14&start_date=2014-04-14

However I think this should be storing and exposing a report_date, it's pretty confusing figuring out how to graph this without that.
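For anyone scripting against the endpoint, a hedged sketch of building that query URL with the standard library. The parameter names come from the URL above (product_name appears in later queries in this bug); the helper name is made up, and nothing here has been checked against the live service:

```python
from urllib.parse import urlencode

BASE_URL = "https://crash-stats.allizom.org/api/AduBySignature/"


def adu_by_signature_url(signature, channel, start_date, end_date=None,
                         product_name=None):
    """Build a query URL for the AduBySignature API (hypothetical helper)."""
    params = {
        "channel": channel,
        "signature": signature,
        "start_date": start_date,
    }
    # Optional parameters are only sent when given.
    if end_date is not None:
        params["end_date"] = end_date
    if product_name is not None:
        params["product_name"] = product_name
    # urlencode percent-encodes characters such as "(", ")" and "*".
    return BASE_URL + "?" + urlencode(params)
```

Letting urlencode handle the escaping avoids hand-encoding signatures that contain parentheses, colons or asterisks.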
Something is wrong with that data, yes. For one thing it has three entries, but these keys stay the same for each:

"build_date": "2014-04-10", 
"os_name": "Windows", 
"buildid": "20140410150427", 
"adu_date": "2014-04-14", 
"signature": "JS_GetFunctionDisplayId(JSFunction*)", 
"channel": "nightly"

The only thing that is changing is adu_count and crash_count.
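A quick way to spot this kind of duplication in an API response is to count rows per identifying key. This is a sketch that assumes the response has already been parsed into a list of dicts; the key fields are the ones listed above:

```python
from collections import Counter

# Fields that, per the comment above, stay identical across the repeated rows.
KEY_FIELDS = ("adu_date", "build_date", "buildid",
              "signature", "os_name", "channel")


def duplicate_keys(rows, key_fields=KEY_FIELDS):
    """Return {key_tuple: row_count} for every key seen more than once."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}
```

An empty result means every row is unique on those fields; a non-empty one points at a grouping column that is missing from the underlying query.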
(In reply to Robert Helmer [:rhelmer] from comment #32)
> (In reply to Chris Lonnen :lonnen from comment #28)
> > Request from Crashkill this week to build out the minimum necessary bits to
> > expose this in the public API first and then worry about the UI second.
> 
> This is done and up on stage:
> https://crash-stats.allizom.org/api/AduBySignature/
> ?channel=nightly&signature=JS_GetFunctionDisplayId%28JSFunction*%29&end_date=
> 2014-04-14&start_date=2014-04-14
> 
> However I think this should be storing and exposing a report_date, it's
> pretty confusing figuring out how to graph this without that.

Actually thinking about this, I am not sure the query is doing the right thing.. I would think we'd get one row per adu_date/build_date/signature/os_name/channel but instead we have multiple.

Selena do you have time to help me take a look at this? The query is:
https://github.com/mozilla/socorro/blob/master/socorro/external/postgresql/raw_sql/procs/update_crash_adu_by_build_signature.sql

It should be based on the query in:
https://github.com/mozilla/socorro/blob/bug915246-crash_adu_by_build_signature-model/socorro/external/postgresql/raw_sql/procs/update_crash_adu_by_build_signature.sql
Flags: needinfo?(sdeckelmann)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #33)
> Something is wrong with that data, yes. For one thing it has three entries,
> but these keys stay the same for each:
> 
> "build_date": "2014-04-10", 
> "os_name": "Windows", 
> "buildid": "20140410150427", 
> "adu_date": "2014-04-14", 
> "signature": "JS_GetFunctionDisplayId(JSFunction*)", 
> "channel": "nightly"
> 
> The only thing that is changing is adu_count and crash_count.

Agreed
(In reply to Robert Helmer [:rhelmer] from comment #34)
> Selena do you have time to help me take a look at this? The query is:
> https://github.com/mozilla/socorro/blob/master/socorro/external/postgresql/
> raw_sql/procs/update_crash_adu_by_build_signature.sql

Ah, never mind - I just did a little debugging and this query is pulling in all products, but there's no way to tell them apart in the matview. We should group by product_name (from product_versions) and store that in the matview, and the middleware and Django model need a way to specify the product.
Flags: needinfo?(sdeckelmann)
Depends on: 996806
Target Milestone: 80 → 81
Target Milestone: 81 → 82
Target Milestone: 82 → 83
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/760c0932911d37d2d43cad81340ea248abed2aa2
bug 915246 - ensure that product_version_id matches in sigreports and
restrict to 1 day
OK I think I've eliminated all the sources of duplicate rows, here is a quick example. The AduBySignature API is on stage right now and migrations have run, let me know if you see any issues with it - it's on track to ship today most likely.
Flags: needinfo?(benjamin)
I ran this:

https://crash-stats.allizom.org/api/AduBySignature/?channel=nightly&product_name=Firefox&signature=js%3A%3Atypes%3A%3ATypeObject%3A%3Asweep(js%3A%3AFreeOp*)&start_date=2014-03-01

I don't understand the data. I'm getting multiple rows per buildid:

    {
      "build_date": "2014-03-20", 
      "os_name": "Windows", 
      "buildid": "20140320030203", 
      "adu_count": 7349, 
      "crash_count": 1, 
      "adu_date": "2014-03-20", 
      "signature": "js::types::TypeObject::sweep(js::FreeOp*)", 
      "product_name": "Firefox", 
      "channel": "nightly"
    }, 
    {
      "build_date": "2014-03-20", 
      "os_name": "Windows", 
      "buildid": "20140320030203", 
      "adu_count": 30326, 
      "crash_count": 1, 
      "adu_date": "2014-03-21", 
      "signature": "js::types::TypeObject::sweep(js::FreeOp*)", 
      "product_name": "Firefox", 
      "channel": "nightly"
    },

Am I supposed to be getting a row for each "adu_date and buildid pair"? If so I'd expect to get either 7 or 10 days of adu_dates, not just one or two.
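One way to check that expectation against a response is to count the distinct adu_date values seen for each buildid. A sketch, assuming the rows are parsed into dicts with the field names shown above:

```python
from collections import defaultdict


def adu_dates_per_build(rows):
    """Return {buildid: number of distinct adu_date values} for the rows."""
    dates = defaultdict(set)
    for row in rows:
        dates[row["buildid"]].add(row["adu_date"])
    return {buildid: len(seen) for buildid, seen in dates.items()}
```

For the two rows quoted above this reports 2 dates for build 20140320030203, well short of the 7 or 10 that a full window would give.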
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #39)
> I ran this:
> 
> https://crash-stats.allizom.org/api/AduBySignature/
> ?channel=nightly&product_name=Firefox&signature=js%3A%3Atypes%3A%3ATypeObject
> %3A%3Asweep(js%3A%3AFreeOp*)&start_date=2014-03-01
> 
> I don't understand the data. I'm getting multiple rows per buildid:
> 
>     {
>       "build_date": "2014-03-20", 
>       "os_name": "Windows", 
>       "buildid": "20140320030203", 
>       "adu_count": 7349, 
>       "crash_count": 1, 
>       "adu_date": "2014-03-20", 
>       "signature": "js::types::TypeObject::sweep(js::FreeOp*)", 
>       "product_name": "Firefox", 
>       "channel": "nightly"
>     }, 
>     {
>       "build_date": "2014-03-20", 
>       "os_name": "Windows", 
>       "buildid": "20140320030203", 
>       "adu_count": 30326, 
>       "crash_count": 1, 
>       "adu_date": "2014-03-21", 
>       "signature": "js::types::TypeObject::sweep(js::FreeOp*)", 
>       "product_name": "Firefox", 
>       "channel": "nightly"
>     },
> 
> Am I supposed to be getting a row for each "adu_date and buildid pair"? If
> so I'd expect to get either 7 or 10 days of adu_dates, not just one or two.

Yes that's what I'd expect looking at the SP (unless I am misunderstanding the goal here). Note that I had to truncate and backfill the matview, so there's only 30 days of data (could backfill further if you'd like).

I'll take a closer look at the data and see what we're missing here.
Target Milestone: 83 → 82
Target Milestone: 82 → 83
Target Milestone: 83 → 84
Quick update - we backed this out due to an unrelated migration error (production data has NULL product_name in some cases which we didn't see on stage).

However, we know from looking at the data already that something is wrong with the underlying query. Selena is investigating.

The code to expose this via the API is all ready to go, I'll reland once we get the query worked out.
Assignee: schalk.neethling.bugs → sdeckelmann
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/55c8a036fca910288bb9204bf20c5ef3a06ffce8
Fixes bug 915246 Corrects JOINs and adds a column for product_name

https://github.com/mozilla/socorro/commit/61ef8f08bba74e7b324c8de2b74088f4aceeec68
Migration for bug 915246

https://github.com/mozilla/socorro/commit/a3c187c28b223fedcd8d33d166541c76a72f9ba4
Merge pull request #2031 from selenamarie/bug915246-adu-by-signature-fixup

Fixes bug 915246 Corrects JOINs and adds a column for product_name
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1003562
This is all landed and working better I think:

https://crash-stats.allizom.org/api/AduBySignature/?channel=nightly&signature=ScanTypeObject&product_name=Firefox

bsmedberg what do you think?
Flags: needinfo?(benjamin)
As mentioned on IRC, I think this looks roughly correct, although it's filtering out builds that have no crashes. I'd prefer those to still be present if possible, since that clarifies in the final chart that we're not missing data but that a crash actually started/went away properly.
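Until the API returns those builds, a client could fill them back in itself. The sketch below assumes one already-aggregated row per buildid and a separately obtained list of all known buildids; both inputs and the helper name are hypothetical:

```python
def fill_missing_builds(rows, all_buildids):
    """Return one row per known build, sorted by buildid.

    Builds absent from `rows` get a placeholder with crash_count 0.
    Their ADU count is not known from the crash data, so it is left out.
    """
    by_build = {row["buildid"]: row for row in rows}
    filled = []
    for buildid in sorted(all_buildids):
        placeholder = {"buildid": buildid, "crash_count": 0}
        filled.append(by_build.get(buildid, placeholder))
    return filled
```

Plotting the filled list shows an explicit zero for builds without the crash, which distinguishes "no crashes" from "no data" in the chart.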
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #45)
> As mentioned on IRC, I think this looks roughly correct, although it's
> filtering out builds that have no crashes. I'd prefer those to still be
> present if possible, since that clarifies in the final chart that we're not
> missing data but that a crash actually started/went away properly.

Thanks! I filed bug 1007379 to follow up on that.
(In reply to Robert Helmer [:rhelmer] from comment #46)
> (In reply to Benjamin Smedberg  [:bsmedberg] from comment #45)
> > As mentioned on IRC, I think this looks roughly correct, although it's
> > filtering out builds that have no crashes. I'd prefer those to still be
> > present if possible, since that clarifies in the final chart that we're not
> > missing data but that a crash actually started/went away properly.
> 
> Thanks! I filed bug 1007379 to follow up on that.

So I reckon I am still going to hold off a little on finishing up the UI bits?
Because I show up here looking for it periodically, bug 1019262 covers the front end implementation.