Bug 915246 (Closed): Graphs of crashes/ADU for individual signatures
Opened 11 years ago, closed 11 years ago
Categories: Socorro :: Webapp, task
Tracking: Not tracked
Status: RESOLVED FIXED
Target Milestone: 84
People: Reporter: benjamin; Assigned: selenamarie
Attachments: 1 file
Description
Currently, when a regression appears or is fixed on nightly/aurora, it can be difficult to identify the regression range. It takes several days to get users onto a new build, and many users lag behind by several more days, so date-based queries are very noisy.
I've prototyped a new report type that is very good at removing this noise: it takes the 10-day window after a particular build, totals the ADI and the crashes for that build across those 10 days, and then computes crashes/ADI.
I'd like this report to be part of the normal graphs that appear on the graphs tab of report/list.
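A minimal Python sketch of the computation described above (Python because the prototype scripts are Python). It is not the prototype's code: the input shapes (a buildid-to-build-date map plus (buildid, date, count) tuples for ADI and crashes) and the choice that the window starts on the build date itself are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW_DAYS = 10  # window length described in the prototype


def crashes_per_adi(build_dates: dict[str, date], adi_rows, crash_rows,
                    window_days: int = WINDOW_DAYS) -> dict[str, float]:
    """Return {buildid: crashes/ADI} over the window that starts on each build's date.

    build_dates: {buildid: date the build was produced}   (assumed shape)
    adi_rows:    iterable of (buildid, adi_date, adi_count)   (assumed shape)
    crash_rows:  iterable of (buildid, crash_date, crash_count) (assumed shape)
    """
    adi = defaultdict(int)
    crashes = defaultdict(int)

    def in_window(buildid: str, day: date) -> bool:
        # Assumption: the window is [build date, build date + window_days).
        start = build_dates.get(buildid)
        return start is not None and start <= day < start + timedelta(days=window_days)

    for buildid, day, count in adi_rows:
        if in_window(buildid, day):
            adi[buildid] += count
    for buildid, day, count in crash_rows:
        if in_window(buildid, day):
            crashes[buildid] += count

    # Sum first, divide once; skip builds with no ADI to avoid dividing by zero.
    return {b: crashes[b] / adi[b] for b in build_dates if adi[b] > 0}
```

Both the numerator and the denominator are summed over the whole window before dividing, which weights each day by its ADI instead of averaging per-day ratios.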
For an example of the report in graphical form, see http://benjamin.smedbergs.us/tests/OnMaybeDeQueueOne.svg
My prototype collects data using this script:
https://github.com/mozilla-metrics/socorro-toolbox/blob/master/src/main/python/nightly-signature-frequency.py
And graphs in SVG using this script:
https://github.com/bsmedberg/bsmedberg-graphing-playground/blob/master/nightly-signature-graph.py
I suspect that the data collection script will need to be significantly refactored, because it currently queries against build_adus *and* releases_raw, neither of which is indexed, so both queries use full-table scans.
I also suspect that you'll want to rewrite the graph generation in some other tool (D3? flot sucks, although an earlier prototype of this used flot).
Earlier prototype using flot:
http://benjamin.smedbergs.us/blog/2013-04-22/graph-of-the-day-empty-minidump-crashes-per-user/
https://github.com/bsmedberg/bsmedberg-graphing-playground/blob/master/emptydump-nightly-frequency.html and .js
Updated•11 years ago
Assignee: nobody → bsavage
Comment 1•11 years ago
Selena, there's a PR for this; can you review?
https://github.com/mozilla/socorro/pull/1547
Flags: needinfo?(sdeckelmann)
Comment 3•11 years ago
It looks like two PRs have been opened for this:
https://github.com/mozilla/socorro/pull/1547
https://github.com/mozilla/socorro/pull/1606
Both appear to have been closed without being merged.
Comment 4•11 years ago
An update on this bug: after talking with Laura in a 1:1, we determined that the first step is to let you get the data through the middleware and verify that it is what you need (e.g. that the data matches what you expect). Then we will create a report for it (probably in Q1).
Comment 5•11 years ago
The data/middleware changes will land this week.
Comment 6•11 years ago
Hoping to get the report done this Q as well - I think we may have a miscommunication.
Comment 7•11 years ago
The pull request is here: https://github.com/mozilla/socorro/pull/1701 but it looks stalled for the last week. What's the status?
Comment 8•11 years ago
This is pretty much ready to land once we have a UI for me to hook up to. How is that bug coming along, Schalk? Do you need anything from me?
Flags: needinfo?(schalk.neethling.bugs)
Comment 9•11 years ago
Looks like there is PG stuff still to do on the PR - Brandon, can you take care of that?
Flags: needinfo?(bsavage)
Comment 10•11 years ago (Reporter)
Presumably the API and the UI can land separately also, right? Once the API is landed and available via HTTP I can hack up a quick presentation for the QA people who need it; it won't be pretty but at least they'll have the data they need without having to ask me every time.
Comment 11•11 years ago
(In reply to Brandon Savage [:brandon] from comment #8)
> This is pretty much ready to land once we have a UI for me to hook up to.
> How is that bug coming along, Schalk? Do you need anything from me?
Brandon, can you give me a sense of the type of data structure I would be looking at, maybe a sample JSON file I can use as a mock?
Flags: needinfo?(schalk.neethling.bugs)
Comment 12•11 years ago
Couple of items here:
1) It is mentioned that this is to be included in the graphs tab container on report/list; has this been decided as final?
2) Do you want the two graphs (the current one and the new one) simply stacked one on top of the other, or would you prefer to be able to switch between them?
Flags: needinfo?(kairo)
Comment 13•11 years ago (Reporter)
It should definitely be on report/list. It can completely replace the existing by-build-date graph since it contains strictly a superset of the current information.
That said, I thought report/list used to have a graph by crash date as well, which is occasionally useful for crashes caused by external events rather than by our code.
In the future this style of crashes/ADU report will be useful for other kinds of reports, e.g. a graph of crashes/ADU for any arbitrary supersearch results. But for now report/list and the base API is what we need, as long as the base API can be used to build the other things by hand.
Flags: needinfo?(kairo)
Comment 14•11 years ago
FYI, report/list never had a graph by crash date as far as I can remember.
Comment 15•11 years ago
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/bf0399fa78b88691da815891eb2fac34f3e83401
Adding matview and migration for bug 915246.
Updated•11 years ago
Target Milestone: --- → 70
Updated•11 years ago
Target Milestone: 71 → 73
Comment 18•11 years ago
Reassigning to Rob, since he and :espressive had been doing some work on this, too.
Assignee: bsavage → rhelmer
Updated•11 years ago
Target Milestone: 73 → 74
Updated•11 years ago
Target Milestone: 74 → 75
Updated•11 years ago
Target Milestone: 75 → 76
Updated•11 years ago
Target Milestone: 76 → 77
Updated•11 years ago
Target Milestone: 77 → 78
Updated•11 years ago
Status: NEW → ASSIGNED
Target Milestone: 78 → 80
Comment 19•11 years ago
This PR was landed; the GitHub robot did not comment here due to a typo in the comment:
https://github.com/mozilla/socorro/pull/1701
I don't see any middleware or UI changes outstanding or landed.
Comment 20•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #19)
> This PR was landed, the github robot did not comment here due to a typo in
> the comment:
Oops, I take that back: the branch name had a typo, but the commit message was fine (see comment 15).
I am working on middleware for this now.
Comment 21•11 years ago
Hm, looks like the cron job is running correctly, but we haven't had new data for a while:
breakpad=# select max(build_date) from crash_adu_by_build_signature;
max
------------
2014-02-05
(1 row)
I found this in the crontabber log for today:
2014-03-31 03:03:56,011 INFO - MainThread - Notices from calling update_crash_adu_by_build_signature([datetime.date(2014, 3, 30)]): ['NOTICE: no new build adus for day 2014-03-30\n']
It looks like the build_adu table is used by the query in the above job, but that job does not depend on the update_build_adu job, which ran later:
2014-03-31 03:21:04,077 INFO - MainThread - Result from calling update_build_adu([datetime.date(2014, 3, 30)]): (True,)
I'll add an explicit dependency here, hopefully that is enough to get it working consistently every day.
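The fix here is about ordering: the job that reads build_adu has to run after the job that fills it. As an illustration only (this is not crontabber's configuration syntax, which this bug does not show), Python's standard-library graphlib can derive such an order from a job-to-dependencies map. The job names are hypothetical, loosely based on the build-adu-matview name used in the commit message.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical job names; each job maps to the set of jobs it depends on.
JOB_DEPENDENCIES = {
    "build-adu-matview": set(),
    "crash-adu-by-build-signature": {"build-adu-matview"},
}


def run_order(dependencies):
    """Return job names so that every job comes after all of its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


print(run_order(JOB_DEPENDENCIES))
# ['build-adu-matview', 'crash-adu-by-build-signature']
```

Without the declared edge, nothing forces the reader job to wait, which matches the log above where update_build_adu ran about 17 minutes after the job that needed its output.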
Comment 22•11 years ago
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/95cf268ef15e990231dbe88b61d7bc992e29d3fd
bug 915246 - explicitly depend on build-adu-matview job
https://github.com/mozilla/socorro/commit/8cc7fa300cf8e05c6ced6fd126c44783ca4af579
Merge pull request #1974 from rhelmer/bug915246-crashes-by-adu-dependency
bug 915246 - explicitly depend on build-adu-matview job
Comment 23•11 years ago
OK this seems to be working on stage now:
breakpad=# select max(build_date) from crash_adu_by_build_signature;
max
------------
2014-03-31
(1 row)
Working on mware.
Comment 24•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #23)
> OK this seems to be working on stage now:
>
> breakpad=# select max(build_date) from crash_adu_by_build_signature;
> max
> ------------
> 2014-03-31
> (1 row)
>
> Working on mware.
Let me know once the middleware is ready, then I can hook this up to the existing UI and make any tweaks and changes we need there, such as switching to D3 for the graph etc.
Comment 25•11 years ago
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/7dfe203cdd134cb5b1ce5971227a4d1954ff6d68
bug 915246 - add mware for crash_adu_by_build_signature table
https://github.com/mozilla/socorro/commit/d807752a0a37229c684b26a2be9879c9599d3232
Merge pull request #1978 from rhelmer/bug915246-crashes-by-adu-mware
bug 915246 - add mware for crash_adu_by_build_signature table
Comment 26•11 years ago
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/43799b3e0e91a792cc32c643b3d2f3af93d504b6
bug 915246 - fix link in docs
https://github.com/mozilla/socorro/commit/e1e77b55dc3d358e718c0908fdd6a81a71edd779
Merge pull request #1983 from rhelmer/bug915246-crashes-by-adu-mware
bug 915246 - fix link in docs
Comment 27•11 years ago
(In reply to Schalk Neethling [:espressive] from comment #24)
> (In reply to Robert Helmer [:rhelmer] from comment #23)
> > OK this seems to be working on stage now:
> >
> > breakpad=# select max(build_date) from crash_adu_by_build_signature;
> > max
> > ------------
> > 2014-03-31
> > (1 row)
> >
> > Working on mware.
>
> Let me know once the middleware is ready, then I can hook this up to the
> existing UI and make any tweaks and changes we need there, such as switching
> to D3 for the graph etc.
Just landed the middleware change; it'll be up on stage shortly. It's documented at http://socorro.readthedocs.org/en/latest/middleware.html#crashes-per-adu-by-signature-service
Assignee: rhelmer → schalk.neethling.bugs
Comment 28•11 years ago
Request from Crashkill this week to build out the minimum necessary bits to expose this in the public API first and then worry about the UI second.
Comment 29•11 years ago
(In reply to Chris Lonnen :lonnen from comment #28)
> Request from Crashkill this week to build out the minimum necessary bits to
> expose this in the public API first and then worry about the UI second.
Since the mware bit is done, all we need to do is add a model on the Django side and whitelist it so it'll appear on the API. Schalk, why don't I do this part?
Flags: needinfo?(schalk.neethling.bugs)
Comment 30•11 years ago
As mentioned to Rob on IRC, I am going to take a good ol' stab at this today, but if I find I am running in circles, or I just do not have the time, I will kick the model part back over the wall to Rob.
Flags: needinfo?(schalk.neethling.bugs)
Comment 31•11 years ago
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/863219a9e331bcbc4aba48f59e8426698bc56ef4
bug 915246 - add model for AduBySignature
https://github.com/mozilla/socorro/commit/5799a1a832cafac0556ef4659df28851f8b99eee
Merge pull request #1994 from rhelmer/bug915246-crash_adu_by_build_signature-model
bug 915246 - add model for AduBySignature
Comment 32•11 years ago
(In reply to Chris Lonnen :lonnen from comment #28)
> Request from Crashkill this week to build out the minimum necessary bits to
> expose this in the public API first and then worry about the UI second.
This is done and up on stage:
https://crash-stats.allizom.org/api/AduBySignature/?channel=nightly&signature=JS_GetFunctionDisplayId%28JSFunction*%29&end_date=2014-04-14&start_date=2014-04-14
However, I think this should store and expose a report_date; it's pretty confusing to figure out how to graph this without it.
Comment 33•11 years ago (Reporter)
Something is wrong with that data, yes. For one thing it has three entries, but these keys stay the same for each:
"build_date": "2014-04-10",
"os_name": "Windows",
"buildid": "20140410150427",
"adu_date": "2014-04-14",
"signature": "JS_GetFunctionDisplayId(JSFunction*)",
"channel": "nightly"
The only thing that is changing is adu_count and crash_count.
Comment 34•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #32)
> (In reply to Chris Lonnen :lonnen from comment #28)
> > Request from Crashkill this week to build out the minimum necessary bits to
> > expose this in the public API first and then worry about the UI second.
>
> This is done and up on stage:
> https://crash-stats.allizom.org/api/AduBySignature/
> ?channel=nightly&signature=JS_GetFunctionDisplayId%28JSFunction*%29&end_date=
> 2014-04-14&start_date=2014-04-14
>
> However I think this should be storing and exposing a report_date, it's
> pretty confusing figuring out how to graph this without that.
Actually thinking about this, I am not sure the query is doing the right thing.. I would think we'd get one row per adu_date/build_date/signature/os_name/channel but instead we have multiple.
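One way to check that expectation is to count rows per key. A minimal sketch, assuming the API rows have already been parsed into a list of dicts with the field names shown in comment 33 (how the response wraps that list is not shown here, so that step is left out):

```python
from collections import Counter

# Columns that, per the expectation above, should identify exactly one row.
KEY_FIELDS = ("adu_date", "build_date", "signature", "os_name", "channel")


def duplicate_keys(rows, key_fields=KEY_FIELDS):
    """Return {key_tuple: row_count} for every key that appears in more than one row."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}
```

An empty result means the data has the expected grain; any entry points at a key that is being split across rows by some column not in the key.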
Selena do you have time to help me take a look at this? The query is:
https://github.com/mozilla/socorro/blob/master/socorro/external/postgresql/raw_sql/procs/update_crash_adu_by_build_signature.sql
It should be based on the query in:
https://github.com/mozilla/socorro/blob/bug915246-crash_adu_by_build_signature-model/socorro/external/postgresql/raw_sql/procs/update_crash_adu_by_build_signature.sql
Flags: needinfo?(sdeckelmann)
Comment 35•11 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #33)
> Something is wrong with that data, yes. For one thing it has three entries,
> but these keys stay the same for each:
>
> "build_date": "2014-04-10",
> "os_name": "Windows",
> "buildid": "20140410150427",
> "adu_date": "2014-04-14",
> "signature": "JS_GetFunctionDisplayId(JSFunction*)",
> "channel": "nightly"
>
> The only thing that is changing is adu_count and crash_count.
Agreed
Comment 36•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #34)
> Selena do you have time to help me take a look at this? The query is:
> https://github.com/mozilla/socorro/blob/master/socorro/external/postgresql/
> raw_sql/procs/update_crash_adu_by_build_signature.sql
Ah, never mind: I just did a little debugging, and this query is pulling in all products, but there's no way to tell them apart in the matview. We should group by product_name (from product_versions) and store it in the matview, and the middleware and Django model need a way to specify the product.
Flags: needinfo?(sdeckelmann)
Updated•11 years ago
Target Milestone: 80 → 81
Updated•11 years ago
Target Milestone: 81 → 82
Updated•11 years ago
Target Milestone: 82 → 83
Comment 37•11 years ago
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/760c0932911d37d2d43cad81340ea248abed2aa2
bug 915246 - ensure that product_version_id matches in sigreports and restrict to 1 day
Comment 38•11 years ago
OK, I think I've eliminated all the sources of duplicate rows; here is a quick example. The AduBySignature API is on stage right now and migrations have run, so let me know if you see any issues with it. It will most likely ship today.
Flags: needinfo?(benjamin)
Comment 39•11 years ago (Reporter)
I ran this:
https://crash-stats.allizom.org/api/AduBySignature/?channel=nightly&product_name=Firefox&signature=js%3A%3Atypes%3A%3ATypeObject%3A%3Asweep(js%3A%3AFreeOp*)&start_date=2014-03-01
I don't understand the data. I'm getting multiple rows per buildid:
{
"build_date": "2014-03-20",
"os_name": "Windows",
"buildid": "20140320030203",
"adu_count": 7349,
"crash_count": 1,
"adu_date": "2014-03-20",
"signature": "js::types::TypeObject::sweep(js::FreeOp*)",
"product_name": "Firefox",
"channel": "nightly"
},
{
"build_date": "2014-03-20",
"os_name": "Windows",
"buildid": "20140320030203",
"adu_count": 30326,
"crash_count": 1,
"adu_date": "2014-03-21",
"signature": "js::types::TypeObject::sweep(js::FreeOp*)",
"product_name": "Firefox",
"channel": "nightly"
},
Am I supposed to be getting a row for each "adu_date and buildid pair"? If so I'd expect to get either 7 or 10 days of adu_dates, not just one or two.
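A small diagnostic for exactly this question, assuming the rows are dicts with the field names shown above: group rows by buildid and flag builds that have fewer distinct adu_dates than expected. The default of 7 days is the lower bound mentioned here; it is a parameter of this sketch, not a Socorro setting.

```python
from collections import defaultdict


def builds_with_short_coverage(rows, expected_days=7):
    """Return {buildid: distinct_adu_date_count} for builds below expected_days."""
    dates_by_build = defaultdict(set)
    for row in rows:
        dates_by_build[row["buildid"]].add(row["adu_date"])
    return {
        buildid: len(dates)
        for buildid, dates in dates_by_build.items()
        if len(dates) < expected_days
    }
```

Run against the two rows above, this would flag buildid 20140320030203 with only 2 distinct adu_dates.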
Flags: needinfo?(benjamin)
Comment 40•11 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #39)
> I ran this:
>
> https://crash-stats.allizom.org/api/AduBySignature/
> ?channel=nightly&product_name=Firefox&signature=js%3A%3Atypes%3A%3ATypeObject
> %3A%3Asweep(js%3A%3AFreeOp*)&start_date=2014-03-01
>
> I don't understand the data. I'm getting multiple rows per buildid:
>
> {
> "build_date": "2014-03-20",
> "os_name": "Windows",
> "buildid": "20140320030203",
> "adu_count": 7349,
> "crash_count": 1,
> "adu_date": "2014-03-20",
> "signature": "js::types::TypeObject::sweep(js::FreeOp*)",
> "product_name": "Firefox",
> "channel": "nightly"
> },
> {
> "build_date": "2014-03-20",
> "os_name": "Windows",
> "buildid": "20140320030203",
> "adu_count": 30326,
> "crash_count": 1,
> "adu_date": "2014-03-21",
> "signature": "js::types::TypeObject::sweep(js::FreeOp*)",
> "product_name": "Firefox",
> "channel": "nightly"
> },
>
> Am I supposed to be getting a row for each "adu_date and buildid pair"? If
> so I'd expect to get either 7 or 10 days of adu_dates, not just one or two.
Yes, that's what I'd expect from looking at the SP (stored procedure), unless I am misunderstanding the goal here. Note that I had to truncate and backfill the matview, so there's only 30 days of data (I could backfill further if you'd like).
I'll take a closer look at the data and see what we're missing here.
Updated•11 years ago
Target Milestone: 83 → 82
Updated•11 years ago
Target Milestone: 82 → 83
Updated•11 years ago
Target Milestone: 83 → 84
Comment 41•11 years ago
Quick update: we backed this out due to an unrelated migration error (production data has a NULL product_name in some cases, which we didn't see on stage).
However, we know from looking at the data already that something is wrong with the underlying query. Selena is investigating.
The code to expose this via the API is all ready to go; I'll reland once we get the query worked out.
Updated•11 years ago (Assignee)
Assignee: schalk.neethling.bugs → sdeckelmann
Comment 42•11 years ago (Assignee)
Comment 43•11 years ago
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/55c8a036fca910288bb9204bf20c5ef3a06ffce8
Fixes bug 915246 Corrects JOINs and adds a column for product_name
https://github.com/mozilla/socorro/commit/61ef8f08bba74e7b324c8de2b74088f4aceeec68
Migration for bug 915246
https://github.com/mozilla/socorro/commit/a3c187c28b223fedcd8d33d166541c76a72f9ba4
Merge pull request #2031 from selenamarie/bug915246-adu-by-signature-fixup
Fixes bug 915246 Corrects JOINs and adds a column for product_name
Updated•11 years ago
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 44•11 years ago
This has all landed and is working better, I think:
https://crash-stats.allizom.org/api/AduBySignature/?channel=nightly&signature=ScanTypeObject&product_name=Firefox
bsmedberg, what do you think?
Flags: needinfo?(benjamin)
Comment 45•11 years ago (Reporter)
As mentioned on IRC, I think this looks roughly correct, although it's filtering out builds that have no crashes. I'd prefer those to still be present if possible, since that clarifies in the final chart that we're not missing data but that a crash actually started/went away properly.
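The requested behavior amounts to starting from the builds that have ADU data and treating a missing crash count as zero (in SQL terms, a LEFT JOIN from the ADU side with COALESCE(crash_count, 0)). A minimal Python sketch of that idea with hypothetical dict inputs; the actual change is tracked in bug 1007379 and may look different:

```python
def crash_rate_including_zero_crash_builds(adu_by_build, crashes_by_build):
    """Return {buildid: crashes/ADU} for every build that has ADU data.

    adu_by_build:     {buildid: total ADU}       (hypothetical input shape)
    crashes_by_build: {buildid: total crashes};  builds with no crashes for
                      the signature are simply absent from this dict.
    """
    return {
        buildid: crashes_by_build.get(buildid, 0) / adu  # missing crashes count as 0
        for buildid, adu in adu_by_build.items()
        if adu > 0
    }
```

Builds with ADU but no crashes then show up as an explicit 0.0 point on the chart instead of a gap, which is what distinguishes "crash went away" from "data missing".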
Flags: needinfo?(benjamin)
Comment 46•11 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #45)
> As mentioned on IRC, I think this looks roughly correct, although it's
> filtering out builds that have no crashes. I'd prefer those to still be
> present if possible, since that clarifies in the final chart that we're not
> missing data but that a crash actually started/went away properly.
Thanks! I filed bug 1007379 to follow up on that.
Comment 47•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #46)
> (In reply to Benjamin Smedberg [:bsmedberg] from comment #45)
> > As mentioned on IRC, I think this looks roughly correct, although it's
> > filtering out builds that have no crashes. I'd prefer those to still be
> > present if possible, since that clarifies in the final chart that we're not
> > missing data but that a crash actually started/went away properly.
>
> Thanks! I filed bug 1007379 to follow up on that.
So I reckon I should still hold off a little on finishing up the UI bits?
Comment 48•10 years ago
Since I keep showing up here looking for it: bug 1019262 covers the front-end implementation.