Closed
Bug 629062
Opened 13 years ago
Closed 11 years ago
Detect explosive crashes
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
Future
People
(Reporter: laura, Assigned: shuhao)
References
Details
(Whiteboard: [Q42011wanted] [Q12012wanted][2.5.1])
Attachments
(3 files, 2 obsolete files)
From https://wiki.mozilla.org/Socorro:PRD_2.x#New_.2F_explosive_.2F_critical_crash_tracking

We need a detection process, and a set of criteria that will need tuning. My suggested initial criteria are:
- New crash in the top 100 for a product, with no associated bug
- Crash that has increased more than (insert frequency formula here) in a time period.

I hope chofmann will chime in with feedback on this. Once we have a process, we can turn this into a calculation that runs hourly/daily. Depending on the final complexity, this may run against PostgreSQL or HBase.

What I'd like is to get a prototype of this feature up and running. We can tune the calculation later.
Reporter | ||
Updated•13 years ago
|
Assignee: nobody → chofmann
Comment 1•13 years ago
|
||
The use cases where this would have been valuable in the past, and where it might save us in the future, can be found in bugs marked with [explosive] in the status whiteboard. Here is a query to find those: https://bugzilla.mozilla.org/buglist.cgi?status_whiteboard_type=allwordssubstr;query_format=advanced;status_whiteboard=explos These bugs are the best way to get test cases and tuning parameters: we can figure out the earliest points at which we could have predicted that these crashes were getting out of control, then use those values for future prediction. I'll try to dig some of those numbers out in the next few days.
Comment 2•13 years ago
|
||
Jotting down more notes for further refinement later... "Explosive across all releases" will catch some situations where website content or external factors like new plugin releases blow up on us. "Explosive for a particular release" will be more for catching regressions that we introduce into our own code. So one of the factors for the "explosion detector" will be which sets of releases (or all releases) detection runs against.
Comment 3•13 years ago
|
||
The detector should be capable of spotting the rise of entirely new crashes, but also tell us about significant volume increases on existing signatures. Bug https://bugzilla.mozilla.org/show_bug.cgi?id=528798 represents an interesting use case. Here we had an existing signature "\N" (the null signature) for which we saw a volume increase. Detection of volume increases is the first step, but we also get bonus points if the detector can tell us there is an abnormal increase in the frequency of keywords like "Zone Alarm" in the crash report comments.
Comment 4•13 years ago
|
||
All of the data required to detect these should already be present in the PostgreSQL database. It's just a question of adding interfaces/APIs to view that data.
Reporter | ||
Updated•13 years ago
|
Assignee: chofmann → laura
Reporter | ||
Comment 5•13 years ago
|
||
Scope for this bug is: - define "explosive" - develop query - wrap it in an API call
Reporter | ||
Comment 6•13 years ago
|
||
Since we still don't have a good definition of this, this bug is going to slip.
Target Milestone: 1.7.7 → 1.7.8
Comment 7•13 years ago
|
||
I'm currently working on an algorithm for that and have some experimental reports going to find out how well it can work, but I still need to verify with the rest of the CrashKill team that what I have there is what we really want from this. Current notes and proposals are at https://wiki.mozilla.org/CrashKill/Plan/Explosive but this is not finalized yet.
Reporter | ||
Updated•13 years ago
|
Assignee: laura → chris.lonnen
Comment 9•13 years ago
|
||
KaiRo has made finishing this algorithm a Q2 goal and stated that implementing it should be a Q3 goal, anyhow.
Reporter | ||
Updated•13 years ago
|
Whiteboard: Q3
Target Milestone: 2.0 → 2.1
Reporter | ||
Updated•13 years ago
|
Target Milestone: 2.1 → 2.2
Reporter | ||
Updated•13 years ago
|
Target Milestone: 2.2 → 2.3
Comment 10•13 years ago
|
||
KaiRo do you want to take this bug until you've settled on an appropriate algorithm?
Comment 11•13 years ago
|
||
I have the algorithm: we decided we'd like to have what I'm running as prototypes, which works well for detecting rising issues. I also now know that we have the numbers the algorithm works on, as they should match what we have for TCBS (esp. the new one with the daily numbers). I just need to spec it out here in a way that is reasonably understandable. Still, the code I'm running for my prototypes is in http://hg.mozilla.org/users/kairo_kairo.at/crash-report-tools/file/tip/get-explosives.php and the meat of the algorithm is at http://hg.mozilla.org/users/kairo_kairo.at/crash-report-tools/file/tip/get-explosives.php#l469 - but it's probably more helpful if I do some writeup in words as well.
Updated•13 years ago
|
Assignee: chris.lonnen → kairo
Reporter | ||
Updated•13 years ago
|
Target Milestone: 2.3 → 2.4
Reporter | ||
Updated•13 years ago
|
Whiteboard: Q3 → Q42011wanted
Comment 12•13 years ago
|
||
I'll try on a spec here, duplicating what I have done in my own reports. The final reports - with a somewhat crude UI - look like this:
https://crash-analysis.mozilla.com/rkaiser/2011-10-18/2011-10-18.firefox.7.explosiveness.html
https://crash-analysis.mozilla.com/rkaiser/2011-10-18/2011-10-18.firefox.nightly.components.html

The algorithms are as linked in comment #11 - http://hg.mozilla.org/users/kairo_kairo.at/crash-report-tools/file/tip/get-explosives.php - and also in https://wiki.mozilla.org/CrashKill/Plan/Explosive - but here's another description:

Two values for "explosiveness" (a "1-day" and a "3-day" value) are calculated for every signature and date (and product/version). An unresolved point is whether we should match other aggregations by doing this per-platform, or sum across platforms as my prototype reports do.

Both "explosiveness" values analyze statistically how strongly recent values (the analyzed day for "1-day", the average of that day and the two days leading up to it for "3-day") rise over longer-term values. In short, "1-day" is a factor of how far the recent value is outside the previous maximum, and "3-day" is a factor of how far the average of the recent 3 days is outside the standard deviation of the days before.

The base values taken for the analysis are the aggregated daily numbers of crashes per (1M) ADU for the respective signature for the 10 days up to the analyzed date (or fewer if 10 are not available, though there should be a minimum of 4-6). These are values we should already have in the (new) tcbs tables (and the ADU ones).

The "1-day" value is calculated by taking the difference between the maximum and the average of the (9) days leading up to the analyzed day, clamping that to a (configurable) minimum value, and then dividing the difference between the analyzed day's value and that previous average by the clamped difference.

The "3-day" value is calculated by taking the standard deviation of the (7) days before the 3 recent days, clamping that to a (configurable) minimum value, and then dividing the difference between the average of the recent 3 days' values and the average underlying the deviation by the clamped deviation.

Signatures are highlighted in the UI if one of those "explosiveness" values is over a configurable limit.

So, in the end, the configurable variables in the algorithm are the minima to clamp the dividers to (both preventing division by zero and cleaning out variations at low crash volume) and the limits for highlighting things in the UI. See "*** explosiveness tuning ***" in my script for the values I've used in prototyping.

I hope the description here plus the two linked sources give enough of a spec to work on this, and so I'm reassigning this back to Chris. If any more questions come up, feel free to ask me.
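For readers who prefer code to prose, the two calculations described above could be sketched roughly as follows. This is only an illustration in Python, not the prototype's PHP: the function names, the default clamp values of 1.0, the minimum of 4 data points, and the use of a population standard deviation are all assumptions made for the example and are not taken from get-explosives.php.

```python
from statistics import mean, pstdev

MIN_DAYS = 4  # assumed lower bound; the spec only says "a minimum of 4-6"


def one_day_explosiveness(rates, min_diff=1.0):
    """rates: daily crashes per 1M ADU, oldest first, analyzed day last.

    min_diff is a hypothetical tuning value, not the prototype's.
    """
    if len(rates) < MIN_DAYS:
        raise ValueError("not enough daily values")
    *history, today = rates
    baseline = mean(history)
    # Distance from the previous average to the previous maximum,
    # clamped so low-volume noise and division by zero are avoided.
    spread = max(max(history) - baseline, min_diff)
    return (today - baseline) / spread


def three_day_explosiveness(rates, min_dev=1.0):
    """Same input as above; compares the last 3 days with the days before.

    Uses the population standard deviation (an assumption; the spec
    does not say which variant the prototype uses).
    """
    if len(rates) < MIN_DAYS:
        raise ValueError("not enough daily values")
    history, recent = rates[:-3], rates[-3:]
    baseline = mean(history)
    deviation = max(pstdev(history), min_dev)
    return (mean(recent) - baseline) / deviation
```

With ten days of input, `history` holds the 9 (or 7) earlier days mentioned in the description; a signature would then be highlighted when either returned value exceeds whatever limit the UI is configured with.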
Assignee: kairo → chris.lonnen
Comment 13•13 years ago
|
||
I can start working on the UI aspects of this if that makes sense
Comment 14•13 years ago
|
||
Oh, I missed that this had a formula now. I'll see if I can translate Kairo's formula into code.
Reporter | ||
Updated•12 years ago
|
Whiteboard: Q42011wanted → [Q42011wanted] [Q12012wanted]
Reporter | ||
Updated•12 years ago
|
Target Milestone: 2.4 → 2.4.1
Updated•12 years ago
|
Component: Socorro → General
Product: Webtools → Socorro
Updated•12 years ago
|
Target Milestone: 2.4.1 → 2.4.2
Reporter | ||
Comment 15•12 years ago
|
||
Schalk: put this on your UI list
Assignee: chris.lonnen → sneethling
Comment 16•12 years ago
|
||
As per IRC discussion, this is a new report that will be accessible from the drop down, same as top changers and top crashers.
Comment 17•12 years ago
|
||
[:kairo] So in comment 12 you link to two separate HTML pages that cover different data. Both of these are really big, so I assume you do not want them combined on one page. We agreed to link to the explosiveness report from the drop down, and I am thinking that from there we can present the user with the options to switch between the different reports. I am also thinking that we might want to take a more on-demand approach, especially to https://crash-analysis.mozilla.com/rkaiser/2011-10-18/2011-10-18.firefox.nightly.components.html This one, https://crash-analysis.mozilla.com/rkaiser/2011-10-18/2011-10-18.firefox.7.explosiveness.html, should probably be the default and then one can link to the other from there.
Comment 18•12 years ago
|
||
(In reply to Schalk Neethling from comment #17)
> I am also thinking that we might want to take a more, on demand approach to
> especially
> https://crash-analysis.mozilla.com/rkaiser/2011-10-18/2011-10-18.firefox.nightly.components.html

Umm, that's not explosiveness, that's a components report and doesn't belong in this bug. I think I tried linking something like https://crash-analysis.mozilla.com/rkaiser/2012-02-08/2012-02-08.firefox.nightly.explosiveness.html there (just for a different date).

> This one,
> https://crash-analysis.mozilla.com/rkaiser/2011-10-18/2011-10-18.firefox.7.explosiveness.html,
> should probably be the default and then one can link to the other from there.

Yes, this is the one I envision to be there.
Comment 19•12 years ago
|
||
[:kairo] I assume all 15 columns are of equal importance or, would it be an option to display on first load only the first 4 columns and then allow the user to show and hide the additional 7?
Comment 20•12 years ago
|
||
[:kairo] Would it be possible to relocate the 'Total crashes' row at the bottom of the table? It would make adding sorting to the table a whole lot simpler.
Comment 21•12 years ago
|
||
It might be reasonable to hide the daily data that the explosiveness data is based on, as long as it's just one click to show it and, if possible, does not need a reload of the page. The totals are intentionally at the top, as they are a reference point for the other values - if the totals are explosive themselves, the per-signature values may be "biased" by that.
Comment 22•12 years ago
|
||
Comment 23•12 years ago
|
||
total crashes can be hidden and shown without the need for a page reload
Comment 24•12 years ago
|
||
For one thing, we should try to have crashes per million ADU instead of total crashes - we have the needed ADU data for that (which I don't always have in my custom reports), see e.g. the Firefox 7 report linked in comment #12 or the Firefox 12 one here: https://crash-analysis.mozilla.com/rkaiser/2012-02-12/2012-02-12.firefox.12.explosiveness.html Also, it probably would be good to de-emphasize the historic data somewhat if in any way possible (but otherwise, let's leave it). I'd like to see a mockup of something containing highlighted "explosive" crashes with one explosiveness number of 2 or higher.
Comment 25•12 years ago
|
||
'Also, it probably would be good to de-emphasize the historic data somewhat if in any way possible (but else, let's leave it).' [:kairo] Could you clarify a little more what you mean by historic data? Do you mean the data under the 'Data (total crashes / 1M ADU)' column? 'I'd like to see a mockup of something containing highlighted "explosive" crashes with one explosiveness number of 2 or higher.' [:kairo] Will send this to you ASAP
Status: NEW → ASSIGNED
Reporter | ||
Comment 26•12 years ago
|
||
Shouldn't the entry under Total Crashes/Explosiveness be blank? It's unclear to me what those numbers would mean. I'd also like to see a rank number for explosiveness, but Kairo might think that's irrelevant.
Comment 27•12 years ago
|
||
[:laura] I believe the rank number might be the one on the far left under TC#. 'Shouldn't the entry under Total Crashes/Explosiveness be blank? It's unclear to me what those numbers would mean.' [:laura] I assume here you are referring to the blue line. I am thinking this is the total of calculating all of the totals from the bottom up. [:kairo] would of course be able to answer these things better than me most likely.
Reporter | ||
Updated•12 years ago
|
Target Milestone: 2.4.2 → 2.4.3
Comment 28•12 years ago
|
||
(In reply to Schalk Neethling from comment #25)
> [:kairo] Could you clarify a little more what you mean by historic data? Do
> you mean the data under the 'Data (total crashes / 1M ADU)' column?

Yes.

(In reply to Laura Thomson :laura from comment #26)
> Shouldn't the entry under Total Crashes/Explosiveness be blank? It's
> unclear to me what those numbers would mean.

This means how "explosive" that complete set of crashes for this version is and gives a reference value for the rest of the lines.

> I'd also like to see a rank number for explosiveness, but Kairo might think
> that's irrelevant.

What would a rank number help there? explosiveness itself is already a pretty good indicator number, IMHO.
Updated•12 years ago
|
Target Milestone: 2.4.3 → 2.4.4
Comment 29•12 years ago
|
||
initial loading state
Attachment #596768 -
Attachment is obsolete: true
Attachment #596769 -
Attachment is obsolete: true
Comment 30•12 years ago
|
||
expanded view - no page reload needed
Comment 31•12 years ago
|
||
and again the collapsed view - no page reload needed
Comment 32•12 years ago
|
||
I like those screenshots - the numbers would look better if they were right-aligned, though. :)
Comment 33•12 years ago
|
||
[:kairo] Awesome. Yeah, I was toying with aligning the numbers either right or center. Will play around with it a bit.
Comment 34•12 years ago
|
||
Unfortunately, work on ESR is going to delay implementing this. Bumping to 2.5.1.
Whiteboard: [Q42011wanted] [Q12012wanted] → [Q42011wanted] [Q12012wanted][2.5.1]
Target Milestone: 2.4.4 → 2.5.1
Comment 35•12 years ago
|
||
UI work is targeted for completion in 2.5.1, backend is targeted for 2.5.2
Updated•12 years ago
|
Target Milestone: 2.5.1 → 2.5.2
Reporter | ||
Updated•12 years ago
|
Target Milestone: 2.5.2 → 3
Reporter | ||
Updated•12 years ago
|
Target Milestone: 3 → 4
Updated•12 years ago
|
Target Milestone: 4 → 5
Updated•12 years ago
|
Target Milestone: 5 → 6
Updated•12 years ago
|
Target Milestone: 6 → 7
Updated•12 years ago
|
Target Milestone: 7 → 8
Updated•12 years ago
|
Target Milestone: 8 → 9
Updated•12 years ago
|
Target Milestone: 9 → 10
Updated•12 years ago
|
Target Milestone: 10 → 11
Comment 36•12 years ago
|
||
The final version of the table for this is now deployed in v.8.0. Try "select * from explosiveness".
Comment 37•12 years ago
|
||
Actually, better example: to select just the explosiveness for a specific product and version:

    SELECT signature, oneday, threeday
    FROM explosiveness
    JOIN product_versions USING ( product_version_id )
    JOIN signatures USING ( signature_id )
    WHERE product_name = 'Firefox' AND version_string = '14.0a1'
    ORDER BY oneday DESC
    LIMIT 20;

Now, getting the data per day is a bit more complex. Table explosiveness has a column called "last_date". This is a DATE, and is the date of "yesterday" as far as explosiveness is concerned. Then it has columns day0, day1, ... day9, where day0 corresponds to the last_date, and day9 corresponds to last_date - 9 days. So:

    SELECT signature, oneday, threeday, last_date,
           day0, day1, day2, day3, day4, day5, day6, day7, day8, day9
    FROM explosiveness
    JOIN product_versions USING ( product_version_id )
    JOIN signatures USING ( signature_id )
    WHERE product_name = 'Firefox' AND version_string = '14.0a1'
    ORDER BY oneday DESC
    LIMIT 20;

... and then you need to compute the dates for the headers yourself. Note that some explosiveness charts will have less than 20 signatures available. Also, you can only select *one* product and *one* version. Kairo is aware of this.
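Computing the header dates from last_date is simple arithmetic; a minimal Python sketch might look like the following. The helper name day_column_dates is made up for illustration and is not part of Socorro; it only encodes the mapping stated above (day0 is last_date, day9 is last_date minus 9 days).

```python
from datetime import date, timedelta


def day_column_dates(last_date, days=10):
    """Map column names day0..day9 to calendar dates.

    Hypothetical helper: day0 is last_date itself and each later
    column goes one more day into the past.
    """
    return {f"day{i}": last_date - timedelta(days=i) for i in range(days)}


# Example: header dates for a row whose last_date is 2012-05-10.
headers = day_column_dates(date(2012, 5, 10))
```

The resulting dict can be used directly to label the day0..day9 columns returned by the second query.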
Updated•12 years ago
|
Target Milestone: 11 → 12
Comment 38•12 years ago
|
||
Josh, when I try to use any of those queries I get ERROR: permission denied for relation explosiveness
Comment 39•12 years ago
|
||
Kairo, Ooops, sorry, I forgot to grant user Analyst permissions on that table. Will fix.
Updated•12 years ago
|
Target Milestone: 12 → 14
Updated•12 years ago
|
Target Milestone: 14 → Future
Updated•11 years ago
|
Assignee: sneethling → nobody
Status: ASSIGNED → NEW
Assignee | ||
Comment 40•11 years ago
|
||
I've been working on a newer version of this as a part of my intern projects. See PR: https://github.com/mozilla/socorro/pull/1394 for the first iteration.
Assignee: nobody → shwu
Status: NEW → ASSIGNED
Comment 41•11 years ago
|
||
merged in https://github.com/mozilla/socorro/commit/e3305abd5b75e734f055d445533ff053d3589796
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED