Closed Bug 577724 Opened 14 years ago Closed 13 years ago

Add custom field for crash signatures

Categories: bugzilla.mozilla.org :: Administration
Type: task; Priority: Not set; Severity: normal

Status: RESOLVED FIXED

People: Reporter: lars; Assignee: dkl

Attachments: 1 file

Currently in Bugzilla, developers must paste the signature of a crash into the title (short description) of a Bugzilla entry. This enables the Socorro system to make associations between the Bugzilla entry and actual crashes coming in from clients. While this works in most cases, sometimes crash signatures are longer than the allowable number of characters in the title. Sometimes multiple crash signatures should be associated with a Bugzilla entry, but there is insufficient room. Socorro goes to great lengths to parse bug titles every hour, searching for new or changed signatures within bugs. This is error-prone and occasionally untimely, especially as signatures are becoming longer and more complex.

We need to create a better way to serve the critical needs of the developers and QA:

1 - an easier interface for developers to insert lists of crash signatures in their own field. This would restore the title field to its original purpose of being a verbal description of a problem rather than a container of detailed information.

2 - a better interface for Socorro to exploit for discovery of bug/signature associations.
letting us omit the template garbage and parameter garbage from bug summaries would be a good start.
A custom field specifically for stack signatures might be appropriate.
(In reply to comment #2)
> A custom field specifically for stack signatures might be appropriate.

Either bugzilla.redhat.com or bugzilla.gnome.org has something like that (or both). I should look into what exactly they do.
interesting idea. any thoughts about backward compatibility/how this would work for older bugs? (need to think on how this affects bugzilla queries in general)
I'd imagine that once custom fields are added, they would be available to all existing bugs. We'd need to make a script to populate the custom field for all the old bugs that had signatures in the title. The Socorro association script could then stop its error-prone parsing of titles looking for signatures.

The Bugzilla search page would, of course, have to accommodate the new signature field(s).  I would think that it would make searching for a signature in Bugzilla a much more efficient task as far as the underlying database was concerned.
Summary: Improve handling of crash signatures → Add custom field for crash signatures
Ok, taking this bug. 

The things that need to happen are:

1. We need to settle on a name for the new field.
2. We need to decide how large the field needs to be to handle the largest possible size of the crash signatures, i.e. normal input field or textarea.
3. We then need to decide which products the new field needs to be visible for: all, or just a select few.
4. Once the new field is enabled, we need to create a script to migrate the signatures from the summary field into the new custom field so all are in the same place.
5. Searching will involve selecting the proper boolean field from the query.cgi page and filling in the search text.

Dave
Assignee: nobody → dkl
OS: Mac OS X → All
Hardware: x86 → All
Is a freeform text field like we have currently in the whiteboard easier than a list? Since many bugs have multiple signatures, it would be more logical to have a list field, but I imagine that's a lot more complicated in terms of database/search.

/me suggests the following values:

Field name: signatures
Max field size: 8k or larger if possible
Products: All "client software" and "components", Mozilla Labs, mozilla.org, Tech Evangelism

Let's deal with the script after the field is available and in BzAPI: then it should be fairly straightforward for anyone to do the migration in spare time.
Blocks: 634276
(In reply to comment #7)
> Let's deal with the script after the field is available and in BzAPI: then it
> should be fairly straightforward for anyone to do the migration in spare time.

Until then (once the field addition is live), all humans and machines will have to update their searches to search both fields for bugs?

I'm not opposed to that, as long as it doesn't last a long period of time, and certainly I'm all for unblocking something that will get us useful crash signatures in bugs and bug-crash relationships in Socorro; I just don't want the field migration to be forgotten and have to be searching two fields for crash info until the end of time….
depending on how well summary change tracking is surviving in gmail, this might result in bugs being considerably harder to find for gmail users, not that we care about those users.
(In reply to comment #9)
> depending on how well summary change tracking is surviving in gmail, this might
> result in bugs being considerably harder to find for gmail users, not that we
> care about those users.

I think not just gmail users. And on the Bugzilla query page, will this be added to the standard form, or will it require using the boolean chart?

this is stop energy, I know, but I'm quite skeptical that the current few (perhaps even rare) failures of socorro-bugzilla matching are worth the work effort and side effects of this solution. And IMO the crash sig is no less "key" than anything else one tends to put in the summary. It's not that I'm not open to change - I'm just not sure this is the best way forward.
We experience the problem of the summary field not being long enough for signatures currently very often, at least in the plugin bugs. Especially for bugs which are associated with multiple signatures, the current situation sucks a lot.

It also makes the bug summary needlessly complex. I don't think we *want* the crash signatures in the summary in general.

This change would improve my gmail-bugmail experience considerably, because the threading wouldn't be broken every time scoobidiver adds a new signature to the bug summary.

If I can make the executive decision here, I don't think the concerns about bugmail are valid, and we should continue to implement this change.
My concern as someone who operates intensely in both firefox and thunderbird crash bugs is not about adding the custom field per se. I pretty much have no objection to doing this if no major changes to bug summary are advocated and current practices in this area are maintained.

My concerns are about: several comments on how summary might/should be used without wider discussion; uncertainty about how "old bugs" will or won't change; that (whatever the solution) we agree on and document how summary should be used effectively, and that it be uniformly applied by all bugzilla users; and how bugzilla search is enhanced or degraded by such changes.


> I don't think we *want* the crash signatures in the summary in general.

This direction shouldn't be taken without wider discussion outside the bug. And current usage of summary doesn't bear out that people tend to use the summary in crash bugs for anything other than signature. (see below)


> We experience the problem of the summary field not being long enough for
signatures currently very often, at least in the plugin bugs.

You may be right about the bias in plugin bugs - of ~600 bugs in http://bit.ly/dMEv90 it looks like 20-30 (3-5%) ran out of room in summary? (just guessing - it's hard to say without someone citing examples)

But in the bugs I've touched (about 3,000) it is exceedingly rare. Still, I agree a solution is needed.


> This change would improve my gmail-bugmail experience considerably, because the
threading wouldn't be broken every time scoobidiver adds a new signature to the
bug summary.

Noted. But fundamentally 
a) that is a gmail problem, not a bugzilla problem. 
b) you should be seeing this a lot because summaries *should* be improved as the bug evolves (though I don't think it happens enough)
c) it must certainly be more of a problem in non-crash bugs, no? (crashes are just a small subset of our bugs)


Regarding populating the field:
Of open crash bugs - http://bit.ly/dGFEaP - will all have the new field populated?
Of closed bugs, which ones will have the custom field populated?
(FWIW I'm not in favor of changing a wide swath of old bugs)

And thus, how will changing some bugs and not others affect being able to effectively bugzilla search to find related bugs (or a specific bug), across all time frames and statuses?


===

Prior to comment 11 I started a small investigation of how bug summary is being used.  I've limited my examination to 45 days of firefox+core crash bugs that were closed fixed. (if anyone thinks this is a poor selection I'll be happy to consider examining some other population)

http://bit.ly/gWguVK - 143 fixed crash bugs 
http://bit.ly/hRGcRC - 120 fixed crash bugs with summary with mappable signature
Of these, only about 20-25 have anything substantial beyond signature.
And of these, roughly half (so about 7%) have anything remotely "technical" in the summary about the cause of the crash.

My takeaway - for bugs that have a signature, summary is NOT currently being used for anything other than the signature, even in cases where the signature is small (which is the norm). I'm certainly not against that, quite the contrary - as long as non-developers can find bugs they care about.
What was the final decision on this bug and the field(s) needed to be a satisfactory resolution?
We definitely need something to connect long signatures between Bugzilla and Socorro, and we need something to connect multiple signatures that currently overrun the summary field.

What's the status in getting this solved so we have a solution for our problems?
It sounds like the closest thing we have to a workable solution is bsmedberg's suggestion in comment 7. the side effect is that it is going to make searching for bugs by signature more problematic. to reduce the effects of this problem, could we do some things on the "backend"? this would be things like appending the new signature field to the title field when we send bugzilla mails, and automatically hooking the signature field into searches for title.

the basic problem here is that recording the signature in the bug title works just fine, but the bug title is just not long enough. the main thing to fix here is that we need to extend bug titles when we have signatures.
(In reply to comment #15)
>  to reduce the effects of this
> problem could we do some things on the "backend"?  this would be things like
> appending the new signature field with the title field when we send bugzilla  mails, 

like this idea. suggest it be a prerequisite 


> and automatically hook in the signature field into searches for title.

this would certainly help make transparent the searching of both existing bugs with sig in summary and new bugs with sig in "signature" field
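The "hook the signature field into title searches" idea amounts to an OR across two fields. A minimal Python sketch over hypothetical in-memory records (the `Bug` class and `find_by_signature` helper are illustrative assumptions, not Bugzilla code; `cf_crash_signature` is the field name chosen later in this bug):

```python
from dataclasses import dataclass


@dataclass
class Bug:
    # Hypothetical record holding only the two fields the search cares about.
    bug_id: int
    summary: str
    cf_crash_signature: str = ""


def find_by_signature(bugs, needle):
    """Return IDs of bugs whose summary OR signature field contains `needle`.

    Case-insensitive substring match, so old bugs (signature only in the
    summary) and new bugs (signature only in the new field) are both found.
    """
    needle = needle.lower()
    return [
        bug.bug_id
        for bug in bugs
        if needle in bug.summary.lower() or needle in bug.cf_crash_signature.lower()
    ]
```

A real backend would express this as an OR in the generated SQL rather than a Python loop; the point is only that one query covers both old and new bugs.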
We could extend bug title sizes to 8k, but that seems like a less nice solution than a custom field.

bsmedberg has, I think, the authority to make the decision to do this, and has done in comment 11.

(In reply to comment #6)
> The things that need to happen are:
> 
> 1. We need to settle on a name for the new field.

cf_signatures.

> 2. We need to decide how large the field needs to be to handle the largest
> possible size of the crash signatures, i.e. normal input field or textarea.

Single-line text custom fields are 255 characters, which is too small. Multi-line textarea custom fields are SQL MEDIUMTEXT, which can store up to 16MB. So we should use one of those.
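A quick check of those two limits, as a Python sketch. The 255-character figure is from the comment above and 2^24 - 1 bytes is MySQL's MEDIUMTEXT maximum; storing one signature per line is an assumption made here for illustration.

```python
SINGLE_LINE_LIMIT = 255         # characters: single-line free-text custom field
MEDIUMTEXT_LIMIT = 2 ** 24 - 1  # bytes: MySQL MEDIUMTEXT maximum (16,777,215)


def field_fit(signatures):
    """Report which custom-field type could hold these signatures.

    Assumes (hypothetically) one signature per line. A single-line field
    cannot contain a newline, so multiple signatures rule it out entirely.
    """
    value = "\n".join(signatures)
    return {
        "single_line": "\n" not in value and len(value) <= SINGLE_LINE_LIMIT,
        "large_text": len(value.encode("utf-8")) <= MEDIUMTEXT_LIMIT,
    }
```

Either a single long signature or a second signature is enough to push the value out of a single-line field, while the textarea has orders of magnitude of headroom.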

> 3. We then need to decide which products the new field need to be visible
> for such as all or just a select few.

See comment #7.

> 4. Once the new field is enabled, we need to create a script to migrate the
> signatures from the summary field into the new custom field so all are in
> the same place.

We should:

1) Decide on a suitable regexp for extracting the signatures from fields (kairo)
2) Publicise the fact that this field is going to appear (bsmedberg)
3) Write a migration script using the regexp and test-run it on stage (dkl/glob)
4) Ship the field and run the script (dkl/glob)
5) Occasionally for a few months afterwards run the script again, and educate those who put the data in the wrong place (dkl/glob)
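Re-running the script periodically (step 5) is only safe if the copy step is idempotent. A minimal Python sketch of that step, assuming the copy-not-remove approach the thread later settles on and a hypothetical one-signature-per-line layout for the new field; `merge_signatures` is an illustrative name, not the logic of the actual migration script.

```python
import re

# Signature blocks: "[@ ", then no square brackets, then " ]".
SIGNATURE_RE = re.compile(r"\[@ [^\[\]]+ \]")


def merge_signatures(summary, crash_signature):
    """Return the new value for the signature field.

    Copies every [@ ...] block from the summary into the field, one per
    line, keeps whatever is already there, and never modifies the summary.
    Running it again on its own output changes nothing.
    """
    lines = [line.strip() for line in crash_signature.splitlines() if line.strip()]
    for sig in SIGNATURE_RE.findall(summary):
        if sig not in lines:
            lines.append(sig)
    return "\n".join(lines)
```

Because existing entries are preserved and duplicates skipped, signatures someone added by hand between runs survive each re-run.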

The aim here is to avoid people having to search _both_ fields. Avoiding this means we avoid some of the issues which lead to the suggestions in comments 15 and 16.

> 5. Searching will involved selecting the proper boolean field from the
> query.cgi page and filling in the search text. 

We should wait and see if this proves sufficient in practice before doing some ugly hacks to make searching field A actually search field B.

Gerv
(In reply to comment #17)
> 1) Decide on a suitable regexp for extracting the signatures from fields
> (kairo)

Well, I'm far from a regexp guru, but we need to copy all /\[@ .+ \]/ blocks (or better /\[@ [^\[\]]+ \]/?) to that new signature field. Note that these blocks are not necessarily adjacent when multiple signatures are mentioned in the summary. We also cannot remove the blocks from the summary because they may be part of useful descriptions; people should remove them from the summary themselves.
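The difference between the two patterns shows up as soon as a summary holds two separated signatures. A Python sketch using the standard `re` module; `extract_signatures` is a hypothetical helper name.

```python
import re

# The tighter pattern: bracket characters are excluded inside the block, so a
# match cannot run from one "[@ " to a later " ]".
SIGNATURE_RE = re.compile(r"\[@ [^\[\]]+ \]")


def extract_signatures(summary):
    """Return each [@ ...] block in the summary, in order, without duplicates."""
    found = []
    for match in SIGNATURE_RE.findall(summary):
        if match not in found:
            found.append(match)
    return found


summary = "crash [@ nsFoo::Bar ] when closing tab [@ js_GC ]"
print(extract_signatures(summary))        # two separate signatures
print(re.findall(r"\[@ .+ \]", summary))  # greedy .+ yields one match spanning both
```

The greedy `.+` version swallows the text between the blocks, which is why the bracket-excluding character class is the safer choice for non-adjacent signatures.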
Sorry, I wasn't sufficiently clear. 

> 1) Decide on a suitable regexp for extracting the signatures from fields 
> (kairo)

should have been:

1) Decide on a suitable algorithm for accurately identifying bugs with crash signatures and then a regexp for extracting the signatures from the subject line (kairo)

Your regexp, if run against all bugs in Bugzilla, would have a load of false positives. How do we know which bugs have a signature? The "crash" keyword? Some other way?

Gerv
(In reply to comment #19)
> Your regexp, if run against all bugs in Bugzilla, would have a load of false
> positives. How do we know which bugs have a signature? The "crash" keyword?
> Some other way?
> 
> Gerv

Not as many false positives as you might think. But you make a great point. To expand on that ...

"To be absolutely positively sure" of high quality in the first pass of populating a crash-sig field, IMO we would want bugs to have 
a) crash or topcrash keyword, 
b) sev=critical, 
c) not have [notacrash] in whiteboard, 
d) have a properly formulated [@ ...]

Having cleaned up many crash bugs over the past two years, I could undertake the task of ensuring we have in place whatever level of quality is desired before making the first run to populate crash-sig.  

There are of course crash bugs that don't have sigs in the summary, and there will always be such bugs. But, as a side project, someone might want to undertake determining
 e) which of them can have useful sigs added to them, and
 f) whether any useful data about our processes can be gained from examining these bugs as to why they don't have sigs

A quick, preliminary pass at these items:
a) missing keyword - a few dozen in http://bit.ly/jik1Jp
b) not sev=crit - a percentage of http://bit.ly/kSBoGJ and http://bit.ly/jgGRmj (200-300 bugs)
c) I'm pretty sure very few or none exist - need examination
d) improper [@...] (there are several) -  at least 4 in http://bit.ly/mBQUDJ   at least 4 in http://bit.ly/mAfhzT


b) sev=crit may not be an absolute requirement, and I know there is a percentage of bugs that are not sev critical because some people seem to have an aversion to bothering to mark them as such. However, it does narrow the population of bugs to be examined a thousand-fold.

disclaimer: I don't presume that the above list is complete.
I don't think we need to overthink this: pick a regex which is mostly-good, and there's little harm in putting some slightly bad data into the crash sig field. The severity/keyword metadata isn't consistently applied.
It's not just about bad data in the crash sig field; if we are going to send a big signal to people to stop putting crash data in the title, then we want to _remove_ it from there as we add it to the crash data field. So false positives are bad.

Gerv
(In reply to comment #19)
> Your regexp, if run against all bugs in Bugzilla, would have a load of false
> positives.

Hmm, I haven't actually looked at it - unfortunately, the code of Socorro that does this is so different to regexp filtering that it can't just be compared.
I personally think I haven't seen the pattern that this regexp matches in other than crash bugs.

> How do we know which bugs have a signature? The "crash" keyword?
> Some other way?

I guess the "crash" keyword is the one thing that can mostly narrow it down, yes. Any bugs that don't have it should be manually corrected anyhow.

I mostly agree with Ben in comment #21 though - and I can't agree with comment #22, as removing it from the summary would in many cases make the summaries look very strange, sometimes completely crippled, and in a lot of cases just be "crash". Summaries need to be manually adjusted once Socorro has been changed to use the new field instead of them (also note that we can't count on the Bugzilla and Socorro changes being made at the same time).
(In reply to comment #23)
> > How do we know which bugs have a signature? The "crash" keyword?
> > Some other way?
> 
> I guess the "crash" keyword is the one thing that can mostly narrow it down,
> yes. Any bugs that don't have it should be manually corrected anyhow.
> 
> I mostly agree with Ben in comment #21 though - and I can't agree with
> comment #22 as removing it from the summary would in many cases make the
> summaries look very strange, sometimes completely crippled, in a lot of
> cases just be "crash". Summaries need to be manually adjusted once Socorro
> has been changed to not use them but the new field instead (also not that we
> can't count on the Bugzilla and Socorro changes to be made at the same time).

I wonder if there really is much educational value in removing them. And yes, there will be a substantial percentage of bugs that have little more than the word crash if the signature is removed.


(In reply to comment #21)
> The severity/keyword metadata isn't consistently applied.

on the contrary, it is consistently applied except for a couple of people who seem to consider it optional (~5% of crashes, of which js crashes are a substantial portion) and the "random quickly filed bug" (~3% of crashes). And it's in our specs. I'm not suggesting it be a criterion for populating the sig field. But as long as we are touching all these bugs, we may as well change all the sevs to critical.
:wsmwk mentioned that I'm one of the deadbeats that fail to update the severity field when filing crash bugs.  guilty as charged, and I'll try to do better at this, but I'll also offer some perspectives.

I know that I sometimes try to mine bugzilla for general statistics on crash bugs.  When I do this I generally get good results by just looking for "crash" or "top crash" in the keywords, so additional constraints of:

b) sev=critical, 
c) not have [notacrash] in whiteboard, 
d) have a properly formulated [@ ...]

aren't that important to me.   If they are to others, and/or others are mining bugzilla for stats on crash bugs they should speak up.

When gerv/bsmedberg mentioned that severity isn't adjusted beyond the default for a large pct. of bugs, I think they might be referring to the bug population in general. Yeah, it's nice to have uniform application of all the fields in bugzilla, but it really comes down to just needing the ability to get critical information back out of the system when we need it. I don't know of any release management or other related queries that depend on examining the severity fields to help make informed decisions. Occasionally, when trying to characterize a bug list, I might look at something like "and X pct. of these bugs or more might be critical..." but that's the extent of my use of the critical field.

probably the most often seen case of crash or topcrash in keywords with no "[@ signature" is where the crash is caught in a debugger, yet we aren't able to get breakpad to generate a report. That's a perfectly valid case where we won't have consistency applied.

I agree with the idea that we shouldn't overthink, or overwork, this too much. Let's just add the new field, and start adding new signatures to it going forward. Let's make searching for signatures apply to both the title and the new signature field.
OK: next step is over to dkl/glob to write a data copying script using the info above, to do the initial crashsig field population.

Gerv
Ok to summarize. We need in the following order:

1. Create a new text (large or small?) custom field called "Crash Signature" (cf_crash_signature) and enable for all or specific products.
2. Write a custom migration script to copy (not remove) the current text from the summary line into the new signature field using the following criteria:

a) crash or topcrash keyword, 
b) sev=critical, 
c) not have [notacrash] in whiteboard, 
d) have a properly formulated [@ ...]

Questions:

1. Should we remove crash/topcrash keywords after the migration to discourage their use?
2. Should we search on sev=critical or disregard that field as some have mentioned that it was not always used properly?
3. How to encourage people to use the new field without removal of the current text from the bug summaries?

dkl
Status: NEW → ASSIGNED
(In reply to comment #27)
> 3. How to encourage people to use the new field without removal of the
> current text from the bug summaries?

If we change Socorro to use this new field for matching crashes with bugs, people will use it. People already bend over backwards to match bug summaries to Socorro's scraping (cf. bug 614966).
(In reply to comment #27)
> 1. Should we remove crash/topcrash keywords after the migration to
> discourage their use?

IMHO, no. Also, this is completely orthogonal to creating and using the new field.

> 2. Should we search on sev=critical or disregard that field as some have
> mentioned that it was not always used properly?

For the "copy from sig to new field" operation, it's best to disregard it and only use a+c+d here. Even though crash bugs _should_ have sev=critical in addition to the crash keyword (and ideally a signature), that additional filter shouldn't change the set to go through too much.

> 3. How to encourage people to use the new field without removal of the
> current text from the bug summaries?

I agree with comment #28 - once Socorro uses that field instead of (or in addition to, but I prefer "instead" for simplicity) summaries, we'll all be happy to use the new field.
agree with Kairo on all points in comment 29
justdave, I need the following custom field added to BMO please:

Name: cf_crash_signature
Description: Crash Signature
Type: Large Text Box
Bug Entry: Yes
Bug Mail: No
Mandatory: No

Everyone else, should this field be visible to all products or specific ones and if the latter, what would be the product list?

dkl
Do we want to add this field, and test the migration script, on stage first?

Gerv
(In reply to comment #32)
> Do we want to add this field, and test the migration script, on stage first?

Yes. I assumed that justdave would do that first, but I should have been more specific. Please add it to stage first and I will start work on a migration script we can test with. I also have a copy of the sanitized db locally that I can test with as well.

dkl
(In reply to comment #27)
> 2. Write a custom migration script to copy (not remove) the current text
> from the summary line into the new signature field using the following
> criteria:
> 
> a) crash or topcrash keyword, 

 ... or has "crash" in the summary

> c) not have [notacrash] in whiteboard, 
> d) have a properly formulated [@ ...]

No b), you should ignore severity.
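Putting the agreed criteria together ((a) widened to include "crash" in the summary, (c), (d), and severity ignored), the selection step might look like this Python sketch. The `Bug` record and `should_migrate` are hypothetical, and case-insensitive matching of "crash" and "[notacrash]" is an assumption of the sketch.

```python
import re
from dataclasses import dataclass, field

SIGNATURE_RE = re.compile(r"\[@ [^\[\]]+ \]")


@dataclass
class Bug:
    # Hypothetical record; the names echo Bugzilla fields but this is not its schema.
    summary: str
    keywords: set = field(default_factory=set)
    whiteboard: str = ""
    severity: str = "normal"  # present only to show it is ignored


def should_migrate(bug):
    # (a) crash/topcrash keyword, or "crash" anywhere in the summary
    looks_like_crash = bool({"crash", "topcrash"} & bug.keywords) or "crash" in bug.summary.lower()
    # (c) not explicitly marked [notacrash] in the whiteboard
    not_excluded = "[notacrash]" not in bug.whiteboard.lower()
    # (d) at least one well-formed [@ ...] block in the summary
    has_signature = SIGNATURE_RE.search(bug.summary) is not None
    # (b) severity deliberately not consulted
    return looks_like_crash and not_excluded and has_signature
```

Dropping (b) only widens the set, which matches the view in this thread that a few extra candidates are cheaper than missing bugs whose severity was never set.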

> 1. Should we remove crash/topcrash keywords after the migration to
> discourage their use?

OMG no. "topcrash" in particular carries additional information. If you remove the crash keyword then lots of queries and charts will stop working (because unknown keywords is a fatal query error). Dunno how many active charts look for that keyword, but since you cannot edit chart queries (only add replacements) anyone looking at historical trends will suffer gaps and disjointed charts.

Use is inconsistent. There are 2300 open Fx bugs with crash keywords
https://bugzilla.mozilla.org/buglist.cgi?quicksearch=prod:Firefox,core,toolkit%20!crash%2Ctopcrash

There are an additional 900 bugs that have "crash" in the summary but no crash keyword (and aren't "[notacrash]"):
https://bugzilla.mozilla.org/buglist.cgi?quicksearch=prod:Firefox,core,toolkit%20sum%3Acrash%20-!crash%2Ctopcrash%20-sw%3A[notacrash

I think we just have to live with that.

> 3. How to encourage people to use the new field without removal of the
> current text from the bug summaries?

As Ted says, if it's used by crash-stats then people will use it.
(In reply to comment #31)
> Name: cf_crash_signature
> Description: Crash Signature

Will this be quicksearchable? (please!) If so can we get a short alias that will work in addition to cf_crash_signature ('sig','signature'?)? Since we'd eventually like signatures to move out of the summary can we have this field be part of the default quicksearch?

> Everyone else, should this field be visible to all products or specific ones
> and if the latter, what would be the product list?

In comment 7 Benjamin said 
> Products: All "client software" and "components", Mozilla Labs,
> mozilla.org, Tech Evangelism

Sounds good to me (not sure what in mozilla.org he wanted it for).

Define "visible"? Will everyone see a big ugly text field or will it be more like the summary and CC fields where you click a link to edit it?
(In reply to comment #34)
> I think we just have to live with that.

We surely will need to do some amount of manual adjustments when this all has happened. If we have a reasonably good algorithm for the automated filling of the field, that surely is good, but as always, the perfect is the enemy of the good here, esp. as we can manually adjust the edge cases later. Thanks for your look at summary vs. keyword - given that data, I agree that we should use those with "crash" in the summary as well.

(In reply to comment #35)
> can we have this field
> be part of the default quicksearch?

+1
Can we move forward with this?
Working on this now.

dkl
Blocks: 661396
This patch adds "sig" to the quicksearch map to allow use in quicksearch queries. Also it filters the new crash field to only show up in the following products:

"Add-on SDK"             
"Calendar"               
"Camino"                 
"Composer"               
"Fennec"                 
"Firefox"                
"Mozilla Localizations"  
"Mozilla Services"       
"Other Applications"     
"Penelope"               
"SeaMonkey"              
"Thunderbird"           
"Core"                   
"Directory"              
"JSS"                    
"MailNews Core"          
"NSPR"                   
"NSS"                    
"Plugins"                
"Rhino"                  
"Tamarin"                
"Testing"                
"Toolkit"                
"Mozilla Labs"          
"mozilla.org"            
"Tech Evangelism"

Thanks
dkl
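The committed change is Perl in the BMO extension; the following Python sketch only mirrors the idea of driving both parts (the "sig" quicksearch alias and the product filter) from data. `QUICKSEARCH_ALIASES`, `expand_quicksearch_term`, and `shows_crash_signature` are hypothetical names, and real quicksearch parsing handles far more (quoting, negation, comma-separated values).

```python
# Hypothetical mirror of the two pieces of configuration the patch adds.
QUICKSEARCH_ALIASES = {"sig": "cf_crash_signature"}

CRASH_SIGNATURE_PRODUCTS = frozenset({
    "Add-on SDK", "Calendar", "Camino", "Composer", "Fennec", "Firefox",
    "Mozilla Localizations", "Mozilla Services", "Other Applications",
    "Penelope", "SeaMonkey", "Thunderbird", "Core", "Directory", "JSS",
    "MailNews Core", "NSPR", "NSS", "Plugins", "Rhino", "Tamarin",
    "Testing", "Toolkit", "Mozilla Labs", "mozilla.org", "Tech Evangelism",
})


def expand_quicksearch_term(term):
    """Map 'sig:foo' to ('cf_crash_signature', 'foo'); other terms pass through."""
    prefix, sep, value = term.partition(":")  # split on the first colon only
    if sep and prefix in QUICKSEARCH_ALIASES:
        return QUICKSEARCH_ALIASES[prefix], value
    return None, term


def shows_crash_signature(product):
    """True if the Crash Signature field should be displayed for this product."""
    return product in CRASH_SIGNATURE_PRODUCTS
```

Splitting on the first colon only matters here because C++ signatures such as `nsFoo::Bar` contain colons themselves.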
Attachment #536757 - Flags: review?(glob)
Comment on attachment 536757 [details] [diff] [review]
Patch to add crash signature field support to BMO extension

Review of attachment 536757 [details] [diff] [review]:
-----------------------------------------------------------------

r=glob

we'll have to update addcustomfield.pl too as the current code only supports adding single-select fields.
Attachment #536757 - Flags: review?(glob) → review+
Code committed to support the new field:

Committing to: bzr+ssh://dlawrence%40mozilla.com@bzr.mozilla.org/bmo/4.0
modified extensions/BMO/Extension.pm
modified extensions/BMO/lib/Data.pm
added contrib/reorg-tools/migrate_crash_signatures.pl
Committed revision 7746.
Depends on: 661678
The field is available now for use. We will work with IT to schedule the migration of the old signatures for early this week.

dkl
Component: Bugzilla: Other b.m.o Issues → Administration
Product: mozilla.org → bugzilla.mozilla.org
QA Contact: general → administation
The migration script has been run on production. Please open a new bug if any issues are found.

dkl
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 664117
Blocks: 664124