Open Bug 280786 Opened 20 years ago Updated 9 years ago

bayesian analysis for auto-categorization of bugs

Categories

(Bugzilla :: Creating/Changing Bugs, enhancement)

People

(Reporter: asa, Unassigned)

Details

The easiest wins might be for dupe finding and invalid bug recognition (see bug
280750 for an example of a probably easy match) but it might also work well for
product and component adjustments or suggestions. 

Folks at Sun attached a patch for Mozilla Mail auto-categorization (not just the
binary "is junk" "is not junk" that our current filters have) at bug 168905
which may have some useful algorithms.
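
For illustration only, here is a minimal sketch of multi-category (not just
binary) naive Bayes classification applied to bug summaries. It is written in
Python and is not taken from the patch in bug 168905 or from any Bugzilla code.
The class name NaiveBayes, the helper tokenize, the category labels and the
training summaries are all made up for the example; a real classifier would be
trained on the existing bug database.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # category -> number of training reports
        self.word_counts = defaultdict(Counter)  # category -> token -> count
        self.vocab = set()                       # every token seen in training

    def train(self, text, category):
        self.doc_counts[category] += 1
        for tok in tokenize(text):
            self.word_counts[category][tok] += 1
            self.vocab.add(tok)

    def scores(self, text):
        """Return {category: log posterior, up to a shared constant}."""
        total_docs = sum(self.doc_counts.values())
        # Tokens never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        result = {}
        for cat, n_docs in self.doc_counts.items():
            counts = self.word_counts[cat]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                logp += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            result[cat] = logp
        return result

    def classify(self, text):
        s = self.scores(text)
        return max(s, key=s.get)


# Hypothetical training data: (summary, component) pairs invented for this sketch.
nb = NaiveBayes()
nb.train("table cell overlaps float when page is resized", "Layout")
nb.train("text wraps incorrectly inside nested table", "Layout")
nb.train("junk filter moves good messages to junk folder", "Mail")
nb.train("cannot send message smtp server timeout", "Mail")

suggestion = nb.classify("nested table renders with wrong cell width")
print(suggestion)
```

The same structure would presumably serve for "invalid" versus "valid"
recognition by training on resolution instead of component, though how well
that works on real reports is untested here.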
There are actually Perl modules on CPAN that will do this without too much trouble.

See AI::Categorizer (not AI::Categorize, which is an older version)
and AI::NaiveBayes1.
See Bug 22353 "Duplicate bug detection."
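
As a rough sketch of what dupe finding could look like, the snippet below ranks
existing bug summaries by how similar they are to a new one. Note that it uses
plain cosine similarity over word counts, which is a simpler stand-in and not
Bayesian analysis, and it is not the approach from bug 22353. The function
names, the 0.5 threshold, the bug numbers and the summaries are all
hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def likely_dupes(new_summary, existing, threshold=0.5):
    """Return (bug_id, score) pairs scoring at least threshold, best first."""
    new_vec = Counter(tokenize(new_summary))
    scored = [(bug_id, cosine(new_vec, Counter(tokenize(summary))))
              for bug_id, summary in existing.items()]
    return sorted((pair for pair in scored if pair[1] >= threshold),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical bug numbers and summaries, invented for this sketch.
existing_bugs = {
    1001: "crash when opening bookmarks sidebar",
    1002: "junk mail filter misses obvious spam",
}
candidates = likely_dupes("browser crash opening the bookmarks sidebar", existing_bugs)
print(candidates)
```

A list like this could be shown to the Reporter before the bug is filed, leaving
the decision about whether it really is a dupe to a person.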

Bayesian analysis for classification can do many things for us. 

What are we trying to achieve?
1. Picking the 'best' bugs for developers' attention?
   Identifying Layout bugs out of all the others?
   ...

2. Identifying Dups
   Identifying Garbage
   ...

3. Making the use of Bugzilla more pleasant for infrequent users
   Making Bugzilla easier for infrequent users
   ...

4. Deterring non-developers from submitting bugs
   Deterring non-developers from making empty comments
   ...

5. Pointing Reporters of non-bugs to resources that will help them

Personally (putting myself in the position of 4. above), I think that
the most useful thing would be an end-to-end assessment of how likely
the information that I have put together is to lead to an improvement
to Firefox. This would be the sum of the Bug being resolved FIXED
and improvements to documentation, the website, help system,
internationalisation and so forth. This could perhaps only be done
if each Bug were rated for its value when CLOSED or RESOLVED. The drawback
of doing this is that we may end up informing Reporters that their
bugs rated 0 out of 5, which is not the way to make friends and
influence people.

The biggest win for Bugzilla is to get Reporters, no matter how awful
their first attempts, interested in and enthusiastic about identifying
problems with Firefox and fixing them.

I think that the next most beneficial use is 1. above, to wit
picking out the (few) reports onto which our scarce resources of
time and attention should be directed.

Merely identifying bugs of low value or actual garbage (2. above) is
probably of less benefit, as such bugs may sit in Bugzilla as
UNCONFIRMED, neither garnering much attention nor causing much harm.
Since there seems to be particular interest in the early detection of
Dups, let's go for it. As you say, it is an easy win, and we need easy
wins.

You probably have your own ideas, some of which I suspect overlap
with mine, and some that I have not yet thought of.
Assignee: gerv → charting
Assignee: charting → create-and-change
Component: Reporting/Charting → Creating/Changing Bugs
Hardware: x86 → All