Closed Bug 564018 Opened 14 years ago Closed 14 years ago

surface "also found in..." version information in top crash reports.

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: chofmann, Assigned: ryansnyder)

References

Details

Attachments

(2 files, 1 obsolete file)

I've been thinking more about the sequence of events in starting the research on a stack signature and one of the events early in the process of looking at data and getting bugs on file is to figure out which Product versions are associated with a signature and when the bug got introduced.

At the highest levels we are trying to figure out if this is a new bug or if it has been around for a while.  If it's a new bug it gets more attention since we want to stamp out regressions quickly.

One place to surface this info is in the top crash reports.  For example

http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6.4

shows signature, and some statistical and trend info, platform info, and module/addon correlation info.

we could also start showing a simple version of which other versions the crash is seen in, to move faster to determining if the problem is a regression.  Here is an example.

http://people.mozilla.com/~chofmann/crash-stats/20100504/topcrash-364-in-other-releases-20100504.html

The "Also found in release..." column would show the other versions where the signature is seen during the time span of the report (e.g. 1, 3, 7, 28 days)

scanning down the list you can see that any of the signatures showing up only in 3.6.4 and having no bugs on file are possible regressions that need some attention quickly, and you can weed these signatures out from other problems that have been previously looked at.

This is sort of a poor man's version of jst's new crash reports at
http://people.mozilla.com/~jst/new-crashes/Firefox/latest/3.6.4-3.6.html

On reflection this "in other version" information is probably more interesting earlier in the analysis cycle than the information we show for "platforms/os'es".  The OS info is needed at a slightly later stage when trying to figure out how to reproduce the crash.

If we were looking to keep the topcrash reports simple, we might replace the platform columns with the "also found in" column, or we could make both expand out with twisties to show more details.
see bug 557703 for changes that are happening to the platform/os info.
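The "also found in" column proposed above can be sketched roughly as follows. This is only an illustration in Python: the `also_found_in` helper, the `(signature, version)` input shape, and the sample signatures are all assumptions, not Socorro code.

```python
from collections import defaultdict


def also_found_in(reports, version):
    """Map each signature seen in `version` to the other versions it appears in.

    `reports` is an iterable of (signature, version) pairs covering the
    report's time span (hypothetical input shape).
    """
    versions_by_signature = defaultdict(set)
    for signature, seen_version in reports:
        versions_by_signature[signature].add(seen_version)
    return {
        signature: sorted(versions - {version})
        for signature, versions in versions_by_signature.items()
        if version in versions
    }


# Hypothetical sample data for one report window.
reports = [
    ("nsFoo::Bar", "3.6.4"),
    ("nsFoo::Bar", "3.6.3"),
    ("nsFoo::Bar", "3.5.9"),
    ("js::Baz", "3.6.4"),
    ("old::Crash", "3.6.3"),
]

result = also_found_in(reports, "3.6.4")
# Signatures seen only in the version under study are regression candidates.
possible_regressions = sorted(sig for sig, others in result.items() if not others)
```

A signature with an empty "also found in" list and no bug on file is the case that would deserve attention first.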
Target Milestone: --- → 2.0
Assignee: nobody → ryan
Target Milestone: 2.0 → 1.8
Make it sortable by count of releases.
yeah, and hover over "count" number to see the list of affected releases.

In this way we will see low numbers, then hover to see if it's all recent releases and needs priority attention, or if it's been around for a while.
Target Milestone: 1.8 → 1.7.4
This is in progress.  I attempted to insert version count and version numbers in the current Top Crashers by Signature pages, but queries were taking quadruple the time to execute and return data.  Since we already have issues with connection times on the Top Crashers page, I decided to separate version information out into its own page.

Here's a sample:
http://rsnyder.khan.mozilla.org/reporter/topcrasher/versioninfo/Firefox/4.0b7pre

This page pulls the top 100 crashing signatures for a product / version.  It also displays the number of versions this signature is found within, as well as a comma-delimited list of all of these versions.  It is automatically sorted by version count in ascending order.

This page will be accessed by clicking the dropdown menu in the navigation bar, and clicking "Top Crashers Version Info"

The PHP code is ready; I'll submit the PHP and Python code for review as soon as I finish writing the Python cron job and tests.
this is looking good.  

If I sort by rank, then sort by "number of versions"  I get a good list of possible regressions to investigate in the latest version.

it would be cool if we could hide the details of the "versions found within" under the "number of versions count"  so when I mouse over the count we could get the list of versions.

when I talked with ryan yesterday I also showed him http://people.mozilla.com/~chofmann/crash-stats/20100930/signatures-maybe-fixed-since4.0b6.txt

it would also be good to include this ranking comparison info in this report at some point, but it's going to take more work to figure out the best way to configure the set of "comparison releases".   maybe that will be an admin panel thing for the defaults,  then users might be able to reconfigure the comparison releases.

these are the kinds of comparisons that are useful.

current release to previous release.

 3.6.10 to 3.6.9  
 4.0.b7pre  4.0b6 3.6.10

    see if rankings on the latest release are higher/lower than previous releases
    or if any new crashes are showing up

previous release to current release.

 3.6.9 to 3.6.10
     see if bugs we fixed since previous release are absent/reduced as expected
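The two comparison directions listed above could be sketched like this. The function names and the ranked-list input shape are hypothetical and only illustrate the idea; this is not how Socorro implements it.

```python
def compare_rankings(current, previous):
    """Rows of (signature, current_rank, previous_rank, change) for `current`.

    Both arguments are lists of signatures ordered by rank (1 = top crasher).
    previous_rank and change are None for signatures that are absent from
    the previous release, i.e. possible new crashes.
    """
    previous_rank = {sig: rank for rank, sig in enumerate(previous, start=1)}
    rows = []
    for rank, sig in enumerate(current, start=1):
        old = previous_rank.get(sig)
        change = None if old is None else old - rank  # positive = moved up the list
        rows.append((sig, rank, old, change))
    return rows


def absent_from(current, previous):
    """Signatures ranked in `previous` that no longer appear in `current`."""
    current_set = set(current)
    return [sig for sig in previous if sig not in current_set]


# Hypothetical ranked lists for two releases.
top_3_6_10 = ["A", "B", "C"]
top_3_6_9 = ["B", "A", "D"]

rows = compare_rankings(top_3_6_10, top_3_6_9)
maybe_fixed = absent_from(top_3_6_10, top_3_6_9)
```

Here `rows` answers the "current release to previous release" question, and `maybe_fixed` answers the "previous release to current release" question.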
this report also needs to have the ability to filter plugin hangs and crashes.  I thought I filed another bug on that but can't seem to find it now.

that feature would work like, and look like, the "Crashes per User" chart.  

   Type:       Firefox Crashes [ X ] Plugin Hangs and Crashes [  ]  All  [  ]

default view would be only to show firefox crashes.

Firefox crashes should be determined when reports associated with the signature meet these filters

  process_type matches \N  *and*  hang_id matches \N

Plugin crashes and hangs are when

  process_type matches "plugin" *and* hang_id matches a_UUID_of_the_hang_pair

This is one of the keys to making the report successful.  Engineers aren't looking at the current topcrashes pages and are missing key firefox crashes because of all the plugin crashes and hang signatures in the list.  plugin crashes and hangs are not as actionable so we need to filter these out and/or view them separately
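A minimal sketch of the two filters described above, assuming that `\N` stands for a NULL value (as in a PostgreSQL dump) and that each report is available as a dict with `process_type` and `hang_id` keys. The helper names and the "other" bucket are assumptions added for illustration.

```python
import uuid


def is_uuid(value):
    """True if `value` is a string that parses as a UUID."""
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def classify_report(process_type, hang_id):
    """Apply the two rules from the comment; anything else is 'other'."""
    if process_type is None and hang_id is None:
        return "browser"
    if process_type == "plugin" and is_uuid(hang_id):
        return "plugin"
    return "other"


def filter_reports(reports, report_type="browser"):
    """Default view shows only browser crashes; 'all' disables filtering."""
    if report_type == "all":
        return list(reports)
    return [
        r for r in reports
        if classify_report(r["process_type"], r["hang_id"]) == report_type
    ]


# Hypothetical sample reports.
reports = [
    {"signature": "nsFoo::Bar", "process_type": None, "hang_id": None},
    {"signature": "hang | NPSWF32.dll", "process_type": "plugin",
     "hang_id": "0d2a3c4e-5b6f-4a7b-8c9d-0e1f2a3b4c5d"},
]

browser_only = filter_reports(reports)
plugin_only = filter_reports(reports, "plugin")
```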
Thanks Chofmann.

I'll spin off a ticket to create what you're looking for in Comment 5.  I don't want to add too much more to what I'm trying to accomplish in this bug.  See Bug 601633.

Hiding / Showing will be taken care of in Bug 562216 and I'm working on fitting that into this release.
Attached patch Patch 1 for 564018 (obsolete) — Splinter Review
Attached is the patch for Bug 564018.

This patch includes the creation of a new table, signature_versions, to map signatures to the versions in which they were found.  (Querying existing tables was increasing query times by 8+ seconds.)  This required adding a query to the Top Crashes by Signature cron in order to populate the signature_versions table with data from the previous day.
Attachment #480822 - Flags: review?(ozten.bugs)
Attachment #480822 - Flags: feedback?(robert)
Attachment #480822 - Flags: review?(lars)
Attachment #480822 - Flags: feedback?(laura)
(In reply to comment #8)
webapp-php/application/models/topcrashers.php
line 426 - first and third arguments have default values. All three or only the third one should have default values.

line 431 - SQL
I'll defer to you and others' feedback, but I think that the direction is to keep SQL out of the PHP layer, so we can swap out backends w/o major changes. (REST API A versus REST API B, which both serve up the same representation).

Did you consider enhancing the top crashers by signature web service call to include this new data in its output? This would be work in socorro/services/topCrashBySignatureTrends.py, instead of this PHP model class.

socorro/cron/topCrashesBySignature.py
line 222: Did you consider reusing window_start and window_end? Should the signature_versions table have the same level of granularity as the top_crashes_by_signature table?
Status: NEW → ASSIGNED
Target Milestone: 1.7.4 → 1.7.5
Comment on attachment 480822 [details] [diff] [review]
Patch 1 for 564018

Austin and I talked through a better solution for this over the phone.  I'll dig back into the code and post another patch for review tomorrow or Friday.
Attachment #480822 - Flags: review?(ozten.bugs)
Attachment #480822 - Flags: review?(lars)
Attachment #480822 - Flags: review-
Attachment #480822 - Flags: feedback?(robert)
Attachment #480822 - Flags: feedback?(laura)
you may want to consider just defining a view of the top_crashes_by_signature table like this: create view signature_versions as (select distinct signature, productdims_id from top_crashes_by_signature).  Using something like that eliminates the need for a lot of code to create and maintain another table.  The downside might be performance, but that should be tested before the idea is eliminated.
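The view suggested above can be tried out in isolation. The sketch below uses Python's built-in sqlite3 module purely for illustration (the thread does not say which database Socorro runs on, and performance there is the open question); the `count` column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE top_crashes_by_signature (
        signature TEXT,
        productdims_id INTEGER,
        count INTEGER  -- hypothetical extra column
    );
    CREATE VIEW signature_versions AS
        SELECT DISTINCT signature, productdims_id
        FROM top_crashes_by_signature;
    """
)
conn.executemany(
    "INSERT INTO top_crashes_by_signature VALUES (?, ?, ?)",
    [("sigA", 1, 10), ("sigA", 1, 7), ("sigA", 2, 3), ("sigB", 2, 5)],
)

# Number of distinct versions per signature, read through the view.
version_counts = dict(
    conn.execute(
        "SELECT signature, COUNT(*) FROM signature_versions GROUP BY signature"
    )
)
```

The view needs no code to maintain, but the DISTINCT is recomputed on every read, which is where the performance concern comes from.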
Thanks Lars.  Yes, Austin recommended going with a materialized view for this as well, and I agree that's the ideal solution in theory.  I definitely want to keep the codebase to a minimum.  However, I've been quite displeased with the materialized view query performance in the tests I've been running. Unfortunately I think the optimum solution from a performance standpoint will entail a script and an extra table.
Thanks guys for the feedback.  I worked with the materialized views a lot, but wasn't able to find a solution that was optimized enough to implement.

The solution in the patch is a bit heavier than I'd prefer, but it will ensure that the queries are efficient.  It includes a new table (signature_productdims) that will be populated by a nightly cron job (startSignatureProductdims.py).

This patch also includes the version count and a string of comma-delimited versions associated with each top crashing signature on the top crashers page, which will provide a solution for chofmann's original request.
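As a rough sketch of the approach described above (a lookup table filled nightly, then read for the version count and comma-delimited list), assuming a simplified schema. Only the table names signature_productdims and top_crashes_by_signature come from this bug; the `productdims` table layout, the `crash_date` column, and both function names are hypothetical, and sqlite3 is used only to keep the example self-contained.

```python
import sqlite3
from datetime import date, timedelta

SCHEMA = """
CREATE TABLE productdims (id INTEGER PRIMARY KEY, version TEXT);
CREATE TABLE top_crashes_by_signature (
    signature TEXT, productdims_id INTEGER, crash_date TEXT
);
CREATE TABLE signature_productdims (
    signature TEXT,
    productdims_id INTEGER,
    PRIMARY KEY (signature, productdims_id)
);
"""


def populate_signature_productdims(conn, today):
    """Nightly step: record the previous day's (signature, version) pairs."""
    yesterday = (today - timedelta(days=1)).isoformat()
    conn.execute(
        """
        INSERT OR IGNORE INTO signature_productdims (signature, productdims_id)
        SELECT DISTINCT signature, productdims_id
        FROM top_crashes_by_signature
        WHERE crash_date = ?
        """,
        (yesterday,),
    )


def version_info(conn, signature):
    """Return (version count, comma-delimited version list) for a signature."""
    rows = conn.execute(
        """
        SELECT p.version
        FROM signature_productdims sp
        JOIN productdims p ON p.id = sp.productdims_id
        WHERE sp.signature = ?
        ORDER BY p.id
        """,
        (signature,),
    ).fetchall()
    versions = [version for (version,) in rows]
    return len(versions), ", ".join(versions)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO productdims VALUES (?, ?)", [(1, "3.6.9"), (2, "3.6.10")]
)
conn.executemany(
    "INSERT INTO top_crashes_by_signature VALUES (?, ?, ?)",
    [("sigA", 1, "2010-10-10"), ("sigA", 2, "2010-10-10"), ("sigB", 2, "2010-10-11")],
)
populate_signature_productdims(conn, date(2010, 10, 11))
```

In this sketch `sigB` was first seen "today", so it has no rows in the lookup table until the next nightly run.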
Attachment #480822 - Attachment is obsolete: true
Attachment #482083 - Flags: review?
Attachment #482083 - Flags: feedback?(robert)
Attachment #482083 - Flags: review?(ozten.bugs)
Attachment #482083 - Flags: review?(lars)
Attachment #482083 - Flags: review?
Attached is a screenshot of the top crashers by version page that includes version count.  The page results may be re-ordered by version count.
Comment on attachment 482083 [details] [diff] [review]
Patch 3 for 564018

in the instructions on the upgrade page, make a recommendation as to when this cron job should be run.  Also make it clear somehow to the users that "today's" data never makes it into the signature_productdims table until the following day.
Attachment #482083 - Flags: review?(lars) → review+
Comment on attachment 482083 [details] [diff] [review]
Patch 3 for 564018

Nice work Ryan. We should eventually move that SQL out of webapp-php/application/models/topcrashers.php and into a Hoopsnake API.
Attachment #482083 - Flags: review?(ozten.bugs) → review+
Thanks Lars.  Thanks Austin.

I've updated the SocorroUpgrade page with release notes:
http://code.google.com/p/socorro/wiki/SocorroUpgrade

I've made a note to eventually move fetchTopcrasherVersions() out of PHP and into the middleware layer; it's in the SOLR API wiki doc which we'll be looking at in 1.9.

I'll commit as soon as Bug 603652 is in place.
Committing:

==

Adding         scripts/config/signatureProductdims.py.dist
Adding         scripts/startSignatureProductdims.py
Adding         socorro/cron/signatureProductdims.py
Sending        socorro/database/schema.py
Adding         socorro/unittest/cron/testSignatureProductdims.py
Sending        socorro/unittest/database/testSchema.py
Sending        webapp-php/application/controllers/topcrasher.php
Sending        webapp-php/application/models/topcrashers.php
Sending        webapp-php/application/views/common/list_topcrashers.php
Sending        webapp-php/js/socorro/topcrash.js
Transmitting file data ..........
Committed revision 2584.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 603707
OS: Mac OS X → All
Hardware: x86 → All
I see the "Ver" column on http://crash-stats.stage.mozilla.com/topcrasher/byversion/Firefox/3.6.9, etc. -- what else should QA be covering?

If someone helps me write a step-by-step test (or two/more), I'll make sure that gets into Litmus; I read through the bug, but other than the above, it wasn't precisely clear to me what was implemented/how I should test.  Thanks!
Flags: in-litmus?
Attachment #482083 - Flags: feedback?(robert)
Chofmann: do you have time to help test this on staging?  Thanks!
sure.  ran a few quick checks and spotted this.

load
http://crash-stats.stage.mozilla.com/topcrasher/byversion/Firefox/3.6.9/14/browser

then sort on the "Ver" column to get ascending order.  I see a bunch of signatures that show "-"

We shouldn't see that.  every crash in the list should have 1 or more versions (in this case 3.6.9 might be the only version where the crash is found).

Maybe this is just a problem in the sample of test data that we have on stage, but we should take a look at the logic to make sure it's right.
We have a cron job set up to run nightly to collect version information for each signature.  The Versions field will display a "-" whenever there is incomplete version data for a specific signature.  Once the cron job runs overnight, you'll see the proper number of versions in that field the next day.

Related - I checked in on Bug 603652 and found that the scripts/startSignatureProductdims.py script that is used to collect version information had not been enabled in cron on stage.  It is now enabled in cron on stage and is set to run once every night.
(In reply to comment #21)

Also in this link
http://crash-stats.stage.mozilla.com/topcrasher/byversion/Firefox/3.6.9/14/browser 
I noticed that the 'Ver' column has 1,2,3,4 etc. but when you click on the
signature and you get the table of all reports with that signature the
'Version' column has the correct version, 3.6.9. So what are the
1,2,3...numbers doing in the 'Ver' column?
@Vishal, the number in the Ver column indicates the number of versions in which this crash signature is found.  When you mouse over the number, you will see all of the versions in which this crash signature is found listed in a comma-delimited format.
ah, ok.  so "-" sounds like it's a good indicator that the crash is new since the last nightly version info check was run.  that's good information that we can spread the word on.  we could also add that as a note on the page later if people continue to not understand what it is.
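The display rule for the Ver column, as explained in the comments above, might look like the following. The function name and the returned dict shape are hypothetical; the real rendering lives in the PHP view and topcrash.js.

```python
def render_version_cell(versions):
    """Build the Ver cell: a count as text, with the version list as hover text.

    An empty list means the nightly cron job has not yet collected version
    data for the signature, so the cell shows "-".
    """
    if not versions:
        return {"text": "-", "title": ""}
    return {"text": str(len(versions)), "title": ", ".join(versions)}


# Hypothetical examples.
known = render_version_cell(["3.6.8", "3.6.9"])
pending = render_version_cell([])
```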

 You are not authorized to access bug #603652. but I think I get the gist of what is going on there.   what time will we run the version check cron job?
@Chofmann - Sounds good. I've cc'd you on bug 603652 where I've requested the cron job time.
Chofmann/Vishal -- would you consider this bug verified on staging?  Thanks!
yeah,  a good test case for this was over in the "browser only report filtering bug"
(In reply to comment #28)
> yeah,  a good test case for this was over in the "browser only report filtering
> bug"

Which bug is that?  I've tried looking for it, to no avail, and I'd like to get the testcase down; thanks!
Thanks, verified FIXED.
Status: RESOLVED → VERIFIED
Component: Socorro → General
Product: Webtools → Socorro