Closed Bug 25693 Opened 21 years ago Closed 20 years ago

request for a table for duplicates

Categories

(Bugzilla :: Bugzilla-General, enhancement, P3)

x86
Windows NT
enhancement

Tracking

VERIFIED FIXED
Bugzilla 2.12

People

(Reporter: cbegle, Assigned: Chris.Yeh)

Attachments

(3 files)

It would be nice if we could query for bugs which had the most duplicates to
make it easy to generate the "Most Frequently Filed bugs".

I think a table that had the bug number and a tally of the # of bugs that are 
duplicates of that bug would be fine (i.e. not have to manage a list of bugs that 
are duplicates of the bug)... If we want to see details of the duplicates, that 
info is stored in the comment.
Doing some kind of half-assed tables to create duplicates will only cause more
problems and confusion.

I completely agree that the way duplicates are handled by Bugzilla is woefully
inadequate, and that reports like this should at least be feasible.
Status: NEW → ASSIGNED
Summary: request for a table for duplicates → request for a table for duplicates
Bug 26053, "[RFE] Ability to limit search to bugs with DUPs", if implemented, 
may provide the basis for a simple, not quite perfect, but potentially
good enough solution for this problem.  If a field were added to each record
that simply counted how many bugs had been made a DUP of a given bug as
they come in, it would allow not only searching for bugs with DUPs, but,
with the addition of a column for the bug list ( "#DUP"?), ordering the list
by number of duplicates. 

There would be at least two inaccuracies. Sometimes bugs are marked as DUPs
incorrectly (mistaken resolution or typo), which would inflate the number
recorded over reality. Also, sometimes trees of DUPs form, with some of the
bugs marked as DUPs of the bug that will get fixed having DUPs themselves,
which would cause under-recording of the true number of DUPs in all.

But for the purposes of finding the most frequently reported bugs, these
inaccuracies may not matter greatly.

Any solution that would properly handle those cases would probably require
a complete revamp of the way DUPs are handled, and would probably be a real
pain to get to work for the existing records.
*** Bug 26053 has been marked as a duplicate of this bug. ***
I'm actually not scared of the work to handle existing records.  All I have to
do is write some code that searches for bugs that are closed with a resolution
of DUPLICATE, and find the last chunk of text in such bugs that says "*** This
bug has been marked as a duplicate of <something> ***".  Given that information,
it can go populate the database properly.
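
A rough sketch of that back-fill scan, written in Python rather than Bugzilla's Perl. It assumes the full comment text of a DUPLICATE-resolved bug is available as one string; the function name and the digits-only capture are illustrative assumptions, not the code that was eventually checked in:

```python
import re

# Marker wording as quoted in the comment above; capturing digits only is an assumption.
DUP_MARKER = re.compile(
    r"\*\*\* This bug has been marked as a duplicate of (\d+) \*\*\*"
)

def last_duplicate_target(comment_text):
    """Return the bug number from the last duplicate marker, or None if absent."""
    matches = DUP_MARKER.findall(comment_text)
    return int(matches[-1]) if matches else None

# Made-up comment text: the bug was duped twice, and the later marking should win.
text = (
    "*** This bug has been marked as a duplicate of 111 ***\n"
    "Reopening, wrong bug.\n"
    "*** This bug has been marked as a duplicate of 222 ***\n"
)
print(last_duplicate_target(text))
```

Taking the last match rather than the first is what lets a bug that was re-duped against a different target end up with its most recent resolution.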

I'm just not sure how to represent this all in the database.  I suppose I could
just add two fields: one is the bug number that a given bug is a duplicate of,
and one is the count you propose of how many duplicates a bug has.  Given those,
I can do reasonable sanity checking and stuff, and alleviate the problems of
typos and bugs that are REOPENED.

Hmm.
You are right that it would be easier to start from the "bug that is a DUP"
end than the "bug that has DUPs" end, and doing so would provide more 
potential benefits. FWIW, bug 26053 started out as a proposal for a 
quick-n-dirty way of identifying the bugs that have DUPs, but I'd rather 
see a solution that addresses more problems than that.

How much additional work would it be to flatten those trees of DUPs for
the purposes of getting a better count? Could this count be done without 
actually changing which bug a leaf bug is a DUP of from a branch bug to the 
root bug? Or should those bugs be changed to match the bug that the branch 
bug is made a DUP of, as that is done? And should this be a new bug report?
I think it would be possible to maintain the count as being the size of the
total tree, not just the number of direct duplicates.

I don't see anything worthy of another bug report...
To get the other half you'd need to have an "at most x bugs" capability (no bug
report that I'm aware of), as well as the ability to specify the columns in the
URL (bug #12284).

Instead, if you want to order this within product categories, then you're
looking at general summary reports (bug #12282).
tara@tequilarista.org is the new owner of Bugzilla and Bonsai.  (For details,
see my posting in netscape.public.mozilla.webtools,
news://news.mozilla.org/38F5D90D.F40E8C1A%40geocast.com .)
Assignee: terry → tara
Status: ASSIGNED → NEW
is bug 38850 a dup or a blocker of this bug?
Bug 38850 is one possible solution to the problem discussed here. It recommends 
simply storing the bug number a bug is a dupe of, as a field.

Then, all the intelligence could be put in the script which walks the database 
to generate the most-frequent-bugs list. 

In outline, it would search the entire thing, keeping a running total:

Bug 34567   3
Bug 34590   6
...

Then, it would go through every bug on that list and, if it itself were a dupe, 
move the number, i.e. if 34590 were marked a dupe of 34567, the above would 
become:

Bug 34567   9
...

At the end of this procedure, you'd have the info you need - which bugs were 
most duped, and how many times, even indirectly. And it hardly requires any 
Bugzilla changes AFAICS.
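
The roll-up described above could be sketched roughly as follows, in Python for illustration. The `dupe_of` mapping (duplicate bug number to the bug it was marked a duplicate of) and the function name are assumptions; every bug number other than 34567 and 34590 is made up:

```python
from collections import Counter

def rolled_up_counts(dupe_of):
    """Count duplicates per root bug, following chains of dupes to their end."""
    def root(bug):
        seen = set()
        # Walk up the chain; the 'seen' guard stops the walk if a loop exists.
        while bug in dupe_of and bug not in seen:
            seen.add(bug)
            bug = dupe_of[bug]
        return bug
    return Counter(root(bug) for bug in dupe_of)

# 34567 has three direct dupes, one of which (34590) has six dupes of its own.
dupe_of = {34590: 34567, 100: 34567, 101: 34567}
dupe_of.update({n: 34590 for n in range(201, 207)})
print(rolled_up_counts(dupe_of))
```

With this data every duplicate resolves to 34567, giving the total of 9 from the example. In a loop the walk simply stops where it first revisits a bug, so some answer comes out, though not a meaningful one.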

Gerv
Tara: What are the chances of this happening any time soon? If none, I'll go 
away and do most-frequent-bugs another way...

Gerv
*** Bug 38850 has been marked as a duplicate of this bug. ***
I'm about to attach a zip of a duplicates.cgi (the most-frequent-bugs list) and 
a diff of changes to a reasonably recent CVS version which implements 
much-improved duplicate handling in Bugzilla. 

It would be very cool if this patch could be tested on Landfill or evaluated by 
other people, for as-soon-as-possible addition to bugzilla.mozilla.org (as it 
would make my life, as maintainer of most-frequent-bugs at the moment, a lot 
easier) :-)

Gerv
Keywords: patch
Chris--can you get this on landfill, and at some point could you give me access
to landfill?

:)
Assignee: tara → cyeh

*** This bug has been marked as a duplicate of 38850 ***
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE
And verifying bug as true duplicate
Status: RESOLVED → VERIFIED
Since when do you get to verify your own change?  I thought the point of 
verifying was to get a second opinion.  And I disagree with this being marked a 
duplicate of bug 38850.  And according to your comments on 38850, so do you, so 
I'm confused as to why you did this...
*** Bug 38850 has been marked as a duplicate of this bug. ***
Based on this statement from bug 38850:

>I believe the duplicate should be marked the other way round. Bug 25693 has
>more information than this bug. I'll try to mark 25693 as the initial bug and
>this one (38850) as the duplicate

And the fact that I agree with the reasoning behind this statement, I am changing 
it back to match, and making this the original, and 38850 the duplicate.  
Reopening.
Status: VERIFIED → REOPENED
Resolution: DUPLICATE → ---
If anyone disagrees, please discuss before switching it back.  As someone who 
stands a good chance of helping with development in making this change, I want 
this bug open because this is the one that has the most information in it.  If 
you have a valid reason to close this in favor of a different duplicate, speak 
up, and I might be convinced, but otherwise please leave this one open.

Thanks, and apologies to everyone else reading this for the spam.
Status: REOPENED → ASSIGNED
This is all very well, but is there any chance anyone is going to get around to 
evaluating and checking in my PATCH for this problem? Then we can close both 
bugs ;-)

Gerv
Yeah, I left that out, this bug has a patch on it, too, which is a definite plus 
for keeping this one open. :)

I believe the hold up at the moment is waiting for the configuration changes on 
landfill so some others of us besides Chris can apply patches to landfill.  
Otherwise we're waiting on Chris for anything that needs trial time on landfill.

(Chris?)
landfill has been both moved and reconfigured to allow for outside people 
access. working with Gerv on getting the patch in. It's a large enough change 
that we'll want to have people hammer it and give it some bake time.

trying to install your patch onto landfill, and i ran into two things:
1) $regenerateshadow = 1 should be $::regenerateshadow = 1;
2) collectstats.pl makes calls to dbmopen and dbmclose, which barfs the script 
with the following error:
>Can't locate NDBM_File.pm in @INC (@INC contains: 
/usr/local/lib/perl5/5.6.0/i686-linux /usr/local/lib/perl5/5.6.0 
/usr/local/lib/perl5/site_perl/5.6.0/i686-linux 
/usr/local/lib/perl5/site_perl/5.6.0 /usr/local/lib/perl5/site_perl .) at (eval 
7) line 3.

Is this a perl module that I need to get from CPAN, or is my perl installation 
screwed up?

yes.
fixed it. a use DB_File; line was missing from the patch
okay, i got this working. I also had to add a use DB_File; to duplicates.cgi, 
and to fix up a typo in collectstats.pl where you wrote the file without a .db 
extension, yet were expecting to read in a file that had it. i added .db to the 
output file and it works.

you did test this before you gave me the patch, right? :)

it's up and running on landfill. please pound it and see if it behaves the way 
you expect it to. if you generate a lot of duplicates i'll have to re-run 
collectstats.pl by hand to get those to show up in a quick fashion.
> you did test this before you gave me the patch, right? :)

Er, yeah - but quite a while ago now...

I'll pound on it on landfill. Did the import work correctly?

Gerv
import? what am i importing where?
When you update (run checksetup.pl) it should have automatically picked up any 
dupes in the database. But there weren't any, I now discover ;-) So that bit 
will need testing in some way (I did test it, but only on a small scale.)

Gerv
How's it importing the existing duplicates?  If it's going by the "this bug has 
been marked a duplicate of xxxxx" in the text, you might want to ask for manual 
intervention if loops are detected (or are you already doing that? I haven't 
looked it over real good yet).  This bug right here is a real good example of 
one that'll trip it up, since it got closed as a duplicate of another bug that 
was already a dupe of this one, and then reversed again.
It detects the text on bugs currently resolved DUPLICATE, and adds the two bugs 
in a relationship in a hash - this means that only the last dupe marking counts, 
as earlier ones get overwritten. It doesn't need to do loop detection - as 
there's no "right answer" in that case, it just does "something". I remember 
checking the code against loops; I can't remember how it decides what to do, but 
it turns out it doesn't need to do anything special.
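
To illustrate the "last marking wins" behaviour, here is a minimal Python sketch. It assumes the duplicate markers have already been extracted per bug, in comment order; the tuple shape and function name are hypothetical and this is not the import code from the patch:

```python
def build_dupe_map(bugs):
    """bugs: iterable of (bug_id, resolution, targets_in_comment_order)."""
    dupe_of = {}
    for bug_id, resolution, targets in bugs:
        if resolution != "DUPLICATE":
            continue  # only bugs currently resolved DUPLICATE are considered
        for target in targets:
            dupe_of[bug_id] = target  # a later marking overwrites an earlier one
    return dupe_of

# Roughly this bug's history: 25693 was once duped to 38850, then reopened,
# while 38850 is currently a duplicate of 25693. Resolutions are illustrative.
bugs = [
    (25693, "ASSIGNED", [38850]),
    (38850, "DUPLICATE", [25693]),
]
print(build_dupe_map(bugs))
```

Because the reopened bug is skipped, only one direction of the pair is recorded and no loop arises in this particular case.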

Gerv
*** Bug 38857 has been marked as a duplicate of this bug. ***
adding myself and endico to this bug.
Blocks: 6999
Checked in Gervase's patch.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago → 20 years ago
Resolution: --- → FIXED
Blocks: DupeLoop
In search of accurate queries....  (sorry for the spam)
Target Milestone: --- → Bugzilla 2.12
reopening... the duplicate scan failed miserably on bugzilla.mozilla.org's 
database when Dawn tried to do a test update on a copy of the database.  The 
regexp that tests for the duplicate string in the bug text is apparently not 
strict enough, and is catching invalid dupes.  Bug 26913 is an example of a bug 
that tripped it up.
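
One way to tighten the check, sketched in Python with assumed names: require the marker to be the whole line, capture digits only, and reject targets that are not known bug numbers. The actual regexp in checksetup.pl is not shown in this bug, so this is only an illustration of the idea:

```python
import re

# Assumed strict form of the marker: nothing else on the line, digits only.
STRICT_MARKER = re.compile(
    r"\*\*\* This bug has been marked as a duplicate of (\d+) \*\*\*"
)

def parse_marker_line(line, known_bug_ids):
    """Return the target bug number, or None if the line is not a valid marker."""
    match = STRICT_MARKER.fullmatch(line.strip())
    if match is None:
        return None
    target = int(match.group(1))
    return target if target in known_bug_ids else None

known = {25693, 38850}
print(parse_marker_line("*** This bug has been marked as a duplicate of 38850 ***", known))
print(parse_marker_line("I think this bug has been marked as a duplicate of something", known))
```

A comment that merely quotes or paraphrases the marker text then yields `None` instead of a bogus target.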
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
DBD::mysql::db do failed: You have an error in your SQL syntax near 't feel like 
generating more spam just to be picky, making VERIFIED.
', '26913')' at line 1 at ./checksetup.pl line 2048.
The attached patch worked on my database, and also worked correctly on b.m.o's 
data.
at least, as correctly as it could. The highest numbered bug in my test db
(a backup from this morning) is 70984. I searched for dups and dups of, and
found quite a lot.  


mysql> select * from duplicates where dupe_of > 70984;
+---------+-------+
| dupe_of | dupe  |
+---------+-------+
|   71024 | 71040 |
|   71042 | 71043 |
|   71024 | 71061 |
|   71024 | 71072 |
|   71082 | 71083 |  all of these except the last entry have both the 
|   71024 | 71084 |  dup and dup_of field bogus. i can understand one field
|   71096 | 71104 |  being bad, but both?
|   71109 | 71111 |
|   71024 | 71120 |
|   91026 | 21027 |  dave fixed this bug
+---------+-------+
10 rows in set (0.01 sec)

mysql> select * from duplicates where dupe > 70984;
+---------+-------+
| dupe_of | dupe  |
+---------+-------+
|   64100 | 70985 |
|   68336 | 70986 |
|   70773 | 70997 |
|   50758 | 71001 |
|   70756 | 71007 |
|   67574 | 71017 |
|   70773 | 71039 |
|   71024 | 71040 |
|   71042 | 71043 |
|   71024 | 71061 |
|   71024 | 71072 |
|   49141 | 71073 |
|   71082 | 71083 |
|   71024 | 71084 |
|   70361 | 71092 |
|   70057 | 71093 |
|   60151 | 71095 |
|   70924 | 71103 |
|   71096 | 71104 |
|   59655 | 71109 |
|   71109 | 71111 |
|   71024 | 71120 |
|   43847 | 71149 |
+---------+-------+
23 rows in set (0.03 sec)
those are all accurate.

select max(bug_id) from bugs

instead of 

select count(*) from bugs    :)
oops, never mind. I computed the max bug number incorrectly. There was only
one incorrect value and Dave just fixed the bug. Someone had marked a bug
a dup of an invalid bug number.
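
For anyone repeating this check, here is a small Python sketch using an in-memory SQLite database as a stand-in for MySQL. The `bugs` and `duplicates` table and column names follow the queries above; the rows are made up. It shows why `count(*)` understates the highest bug number when IDs have gaps, and how to list rows whose `dupe_of` points at a nonexistent bug:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY);
    CREATE TABLE duplicates (dupe_of INTEGER, dupe INTEGER);
    INSERT INTO bugs VALUES (1), (2), (5), (9);      -- made-up IDs with gaps
    INSERT INTO duplicates VALUES (2, 5), (999, 9);  -- 999 is not a real bug
""")

# With gaps in the ID sequence, the row count is lower than the highest ID.
count, highest = conn.execute(
    "SELECT COUNT(*), MAX(bug_id) FROM bugs"
).fetchone()

# Rows whose target bug does not exist in the bugs table.
dangling = conn.execute("""
    SELECT d.dupe_of, d.dupe
    FROM duplicates d
    LEFT JOIN bugs b ON b.bug_id = d.dupe_of
    WHERE b.bug_id IS NULL
""").fetchall()

print(count, highest, dangling)
```

The join avoids relying on any computed maximum at all, which sidesteps the mistake described above.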
Since this worked on b.m.o and on mine, I'm thinking to just go ahead and check 
it in, but since Gerv knows this code better, I'll wait for his say-so.
Oh, come on, how was I supposed to know people would do pathologically evil 
things like bug 26913? ;-)

Looks good to me. Sorry I haven't got to this earlier; I've been, er, asleep :-)

Gerv
OK, it's checked in.
Status: REOPENED → RESOLVED
Closed: 20 years ago → 20 years ago
Resolution: --- → FIXED
When printing bug summaries, the summary text should first quote html
meta-characters.

The summary of bug 39992 begins with <SELECT> and a little dropdown
box is displayed instead of the summary.
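
The kind of quoting being asked for, sketched in Python with the standard library's `html.escape`. The helper name and the summary text after `<SELECT>` are made up, and Bugzilla's own Perl code would use its own quoting routine:

```python
from html import escape

def summary_cell(summary):
    """Wrap a bug summary in a table cell, quoting HTML meta-characters first."""
    return "<td>" + escape(summary) + "</td>"

print(summary_cell("<SELECT> in form breaks layout"))
# prints: <td>&lt;SELECT&gt; in form breaks layout</td>
```

The browser then renders the literal text `<SELECT>` rather than a dropdown box.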
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Jake's patch from bug 22041 fixed this bug too. marking fixed.
Status: REOPENED → RESOLVED
Closed: 20 years ago → 20 years ago
Resolution: --- → FIXED
Verified fixed. We have a table for duplicates now :-)
Status: RESOLVED → VERIFIED
Moving closed bugs to Bugzilla product
Component: Bugzilla → Bugzilla-General
Product: Webtools → Bugzilla
QA Contact: matty
Version: other → unspecified
QA Contact: matty_is_a_geek → default-qa