Closed Bug 25693 Opened 25 years ago Closed 24 years ago

request for a table for duplicates

Categories

(Bugzilla :: Bugzilla-General, enhancement, P3)

x86
Windows NT
enhancement

Tracking

()

VERIFIED FIXED
Bugzilla 2.12

People

(Reporter: cbegle, Assigned: Chris.Yeh)

References

Details

Attachments

(3 files)

It would be nice if we could query for bugs which had the most duplicates to make it easy to generate the "Most Frequently Filed bugs". I think a table that had the bug number and a tally of the number of bugs that are duplicates of that bug would be fine (i.e. not having to manage a list of bugs that are duplicates of the bug)... If we want to see details of the duplicates, that info is stored in the comment.
Doing some kind of half-assed tables to create duplicates will only cause more problems and confusion. I completely agree that the way duplicates are handled by Bugzilla is woefully inadequate, and that reports like this should at least be feasible.
Status: NEW → ASSIGNED
Summary: request for a table for duplicates → request for a table for duplicates
Bug 26053, "[RFE] Ability to limit search to bugs with DUPs", if implemented, may provide the basis for a simple, not quite perfect, but potentially good enough solution for this problem. If a field were added to each record that simply counted how many bugs had been made a DUP of a given bug as they come in, it would allow not only searching for bugs with DUPs, but, with the addition of a column for the bug list ( "#DUP"?), ordering the list by number of duplicates. There would be at least two inaccuracies. Sometimes bugs are marked as DUPs incorrectly (mistaken resolution or typo), which would inflate the number recorded over reality. Also, sometimes trees of DUPs form, with some of the bugs marked as DUPs of the bug that will get fixed having DUPs themselves, which would cause under-recording of the true number of DUPs in all. But for the purposes of finding the most frequently reported bugs, these inaccuracies may not matter greatly. Any solution that would properly handle those cases would probably require a complete revamp of the way DUPs are handled, and would probably be a real pain to get to work for the existing records.
*** Bug 26053 has been marked as a duplicate of this bug. ***
I'm actually not scared of the work to handle existing records. All I have to do is write some code that searches for bugs that are closed with a resolution of DUPLICATE, and find the last chunk of text in such bugs that says "*** This bug has been marked as a duplicate of <something> ***". Given that information, it can go populate the database properly. I'm just not sure how to represent this all in the database. I suppose I could just add two fields: one is the bug number that a given bug is a duplicate of, and one is the count you propose of how many duplicates a bug has. Given those, I can do reasonable sanity checking and stuff, and alleviate the problems of typos and bugs that are REOPENED. Hmm.
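The backfill step described above (find the last duplicate marker in a bug's text) can be sketched roughly as follows. This is an illustrative Python sketch, not the code that was eventually checked in; the function name and the sample text are hypothetical, and the marker format is assumed from the wording quoted in the comment.

```python
import re

# Assumed marker format, taken from the text quoted in the comment above.
DUPE_MARKER = re.compile(
    r"\*\*\* This bug has been marked as a duplicate of (\d+) \*\*\*"
)

def last_dupe_target(comment_text):
    """Return the bug number named by the last duplicate marker, or None."""
    matches = DUPE_MARKER.findall(comment_text)
    return int(matches[-1]) if matches else None

# Hypothetical comment history: marked as a dupe, reopened, marked again.
history = (
    "*** This bug has been marked as a duplicate of 111 ***\n"
    "Reopening, wrong bug.\n"
    "*** This bug has been marked as a duplicate of 222 ***\n"
)
print(last_dupe_target(history))
```

Taking the last match rather than the first is what lets a corrected typo or a re-resolution win over the earlier marking.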
You are right that it would be easier to start from the "bug that is a DUP" end than the "bug that has DUPs" end, and doing so would provide more potential benefits. FWIW, bug 26053 started out as a proposal for a quick-n-dirty way of identifying the bugs that have DUPs, but I'd rather see a solution that addresses more problems than that. How much additional work would it be to flatten those trees of DUPs for the purposes of getting a better count? Could this count be done without actually changing which bug a leaf bug is a DUP of from a branch bug to the root bug? Or should those bugs be changed to match the bug that the branch bug is made a DUP as that is done? And should this be a new bug report?
I think it would be possible to maintain the count as being the size of the total tree, not just the number of direct duplicates. I don't see anything worthy of another bug report...
To get the other half you'd need to have a "at most x bugs" capability (no bug report that I'm aware of), as well as the ability to specify the columns in the URL (bug #12284). Instead, if you want to order this within product categories, then you're looking at general summary reports (bug #12282).
tara@tequilarista.org is the new owner of Bugzilla and Bonsai. (For details, see my posting in netscape.public.mozilla.webtools, news://news.mozilla.org/38F5D90D.F40E8C1A%40geocast.com .)
Assignee: terry → tara
Status: ASSIGNED → NEW
is bug 38850 a dup or a blocker of this bug?
Bug 38850 is one possible solution to the problem discussed here. It recommends simply storing the bug number a bug is a dupe of, as a field. Then, all the intelligence could be put in the script which walks the database to generate the most-frequent-bugs list. In outline, it would search the entire thing, keeping a running total:
Bug 34567 3
Bug 34590 6
...
Then, it would go through every bug on that list and, if it itself were a dupe, move the number, i.e. if 34590 were marked a dupe of 34567, the above would become:
Bug 34567 9
...
At the end of this procedure, you'd have the info you need - which bugs were most duped, and how many times, even indirectly. And it hardly requires any Bugzilla changes AFAICS.
Gerv
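A minimal sketch of that roll-up, in Python for illustration. Instead of moving running totals from bug to bug, it resolves every duplicate to the root of its chain and counts there, which gives the same totals for the example above. The function name, the `dupe_of` mapping and all bug numbers other than 34567 and 34590 are hypothetical.

```python
from collections import Counter

def rolled_up_counts(dupe_of):
    """dupe_of maps a duplicate's bug id to the bug id it was marked a dupe of.

    Returns a Counter of root bug id -> number of direct and indirect dupes.
    """
    def root(bug):
        seen = {bug}
        # Follow the chain upwards; stop on a loop instead of spinning forever.
        while bug in dupe_of and dupe_of[bug] not in seen:
            bug = dupe_of[bug]
            seen.add(bug)
        return bug

    totals = Counter()
    for dupe in dupe_of:
        top = root(dupe)
        if top != dupe:
            totals[top] += 1
    return totals

# 34567 has 3 direct dupes (one of them is 34590); 34590 has 6 of its own.
dupe_of = {34590: 34567, 40001: 34567, 40002: 34567}
dupe_of.update({bug: 34590 for bug in range(40003, 40009)})
print(rolled_up_counts(dupe_of))
```

The `seen` set is only a guard: if two bugs end up marked as duplicates of each other, the walk still terminates and simply credits whichever bug it stopped on.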
Tara: What are the chances of this happening any time soon? If none, I'll go away and do most-frequent-bugs another way... Gerv
*** Bug 38850 has been marked as a duplicate of this bug. ***
I'm about to attach a zip of a duplicates.cgi (the most-frequent-bugs list) and a diff of changes to a reasonably recent CVS version which implements much-improved duplicate handling in Bugzilla. It would be very cool if this patch could be tested on Landfill or evaluated by other people, for as-soon-as-possible addition to bugzilla.mozilla.org (as it would make my life, as maintainer of most-frequent-bugs at the moment, a lot easier :-) Gerv
Keywords: patch
Attached file ZIP of patch
Chris--can you get this on landfill, and at some point could you give me access to landfill? :)
Assignee: tara → cyeh
*** This bug has been marked as a duplicate of 38850 ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
And verifying bug as a true duplicate
Status: RESOLVED → VERIFIED
Since when do you get to verify your own change? I thought the point of verifying was to get a second opinion. And I disagree with this being marked a duplicate of bug 38850. And according to your comments on 38850, so do you, so I'm confused as to why you did this...
*** Bug 38850 has been marked as a duplicate of this bug. ***
Based on this statement from bug 38850:
>I believe the duplicate should be marked the other way round. Bug 25693 has
>more information than this bug. I'll try to mark 25693 as the initial bug and
>this one (38850) as the duplicate
And the fact that I agree with the reasoning behind this statement, I am changing it back to match, and making this the original, and 38850 the duplicate. Reopening.
Status: VERIFIED → REOPENED
Resolution: DUPLICATE → ---
If anyone disagrees, please discuss before switching it back. As someone who stands a good chance of helping with development in making this change, I want this bug open because this is the one that has the most information in it. If you have a valid reason to close this in favor of a different duplicate, speak up, and I might be convinced, but otherwise please leave this one open. Thanks, and apologies to everyone else reading this for the spam.
Status: REOPENED → ASSIGNED
This is all very well, but is there any chance anyone is going to get around to evaluating and checking in my PATCH for this problem? Then we can close both bugs ;-) Gerv
Yeah, I left that out, this bug has a patch on it, too, which is a definite plus for keeping this one open. :) I believe the hold up at the moment is waiting for the configuration changes on landfill so some others of us besides Chris can apply patches to landfill. Otherwise we're waiting on Chris for anything that needs trial time on landfill. (Chris?)
landfill has been both moved and reconfigured to allow for outside people access. working with Gerv on getting the patch in. It's a large enough change that we'll want to have people hammer it and give it some bake time.
trying to install your patch onto landfill, and i ran into two things:
1) $regenerateshadow = 1 should be $::regenerateshadow = 1;
2) collectstats.pl makes calls to dbmopen and dbmclose, which barfs the script with the following error:
>Can't locate NDBM_File.pm in @INC (@INC contains: /usr/local/lib/perl5/5.6.0/i686-linux /usr/local/lib/perl5/5.6.0 /usr/local/lib/perl5/site_perl/5.6.0/i686-linux /usr/local/lib/perl5/site_perl/5.6.0 /usr/local/lib/perl5/site_perl .) at (eval 7) line 3.
Is this a perl module that I need to get from CPAN, or is my perl installation screwed up?
yes.
fixed it. a use DB_File; line was missing from the patch
okay, i got this working. I also had to add a use DB_File; to duplicates.cgi, and to fix up a typo in collectstats.pl where you wrote the file without a .db extension, yet were expecting to read in a file that had it. i added .db to the output file and it works. you did test this before you gave me the patch, right? :) it's up and running on landfill. please pound it and see if it behaves the way you expect it to. if you generate a lot of duplicates i'll have to re-run collectstats.pl by hand to get those to show up in a quick fashion.
> you did test this before you gave me the patch, right? :) Er, yeah - but quite a while ago now... I'll pound on it on landfill. Did the import work correctly? Gerv
import? what am i importing where?
When you update (run checksetup.pl) it should have automatically picked up any dupes in the database. But there weren't any, I now discover ;-) So that bit will need testing in some way (I did test it, but only on a small scale.) Gerv
How's it importing the existing duplicates? If it's going by the "this bug has been marked a duplicate of xxxxx" in the text, you might want to ask for manual intervention if loops are detected (or are you already doing that? I haven't looked it over real good yet). This bug right here is a real good example of one that'll trip it up, since it got closed as a duplicate of another bug that was already a dupe of this one, and then reversed again.
It detects the text on bugs currently resolved DUPLICATE, and adds the two bugs in a relationship in a hash - this means that only the last dupe marking counts, as earlier ones get overwritten. It doesn't need to do loop detection - as there's no "right answer" in that case, it just does "something". I remember checking the code against loops; I can't remember how it decides what to do, but it turns out it doesn't need to do anything special. Gerv
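The import behaviour described here (only bugs currently resolved DUPLICATE are considered, and a later marking overwrites an earlier one) might look something like the Python sketch below. It is an illustration under assumptions, not the checksetup.pl code: the `(bug_id, resolution, text)` tuple shape and the function name are hypothetical, and the sample texts only imitate the markers visible in this bug.

```python
import re

MARKER = re.compile(
    r"\*\*\* This bug has been marked as a duplicate of (\d+) \*\*\*"
)

def import_dupes(bugs):
    """bugs: iterable of (bug_id, resolution, comment_text) tuples (assumed shape)."""
    dupe_of = {}
    for bug_id, resolution, text in bugs:
        if resolution != "DUPLICATE":
            continue  # a reopened or fixed bug is skipped, whatever its history says
        for target in MARKER.findall(text):
            dupe_of[bug_id] = int(target)  # later markers overwrite earlier ones
    return dupe_of

bugs = [
    # This bug was once marked a dupe of 38850 but is no longer DUPLICATE.
    (25693, "FIXED", "*** This bug has been marked as a duplicate of 38850 ***"),
    (38850, "DUPLICATE", "*** This bug has been marked as a duplicate of 25693 ***"),
]
print(import_dupes(bugs))
```

Because the mapping is keyed by the duplicate's own id, each bug can contribute at most one relationship, so a mark-and-reverse history like the one on this bug leaves a single edge rather than a loop.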
*** Bug 38857 has been marked as a duplicate of this bug. ***
adding myself and endico to this bug.
Blocks: 6999
Checked in Gervase' patch.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Blocks: DupeLoop
In search of accurate queries.... (sorry for the spam)
Target Milestone: --- → Bugzilla 2.12
reopening... the duplicate scan failed miserably on bugzilla.mozilla.org's database when Dawn tried to do a test update on a copy of the database. The regexp that tests for the duplicate string in the bug text is apparently not strict enough, and is catching invalid dupes. Bug 26913 is an example of a bug that tripped it up.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
DBD::mysql::db do failed: You have an error in your SQL syntax near 't feel like generating more spam just to be picky, making VERIFIED. ', '26913')' at line 1 at ./checksetup.pl line 2048.
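The error above suggests that free text containing an apostrophe was captured as the duplicate target and then pasted into the SQL statement. The fix that was actually applied is in the attached patch and is not reproduced here; the following Python sketch only illustrates two general defences, a marker pattern that accepts digits only and placeholders instead of string interpolation. The loose pattern is a guess at the kind of regexp that could misfire, `sqlite3` stands in for MySQL, and the sample comment is hypothetical.

```python
import re
import sqlite3

# A deliberately loose pattern (an assumption, for illustration only):
# it captures whatever follows the phrase.
LOOSE = re.compile(r"marked as a duplicate of (.*)")

# A stricter pattern: the marker must fill a whole line and name digits only.
STRICT = re.compile(
    r"^\*\*\* This bug has been marked as a duplicate of (\d+) \*\*\*$",
    re.MULTILINE,
)

def record_dupe(conn, bug_id, text):
    """Store the last well-formed marker for bug_id; return True if one was found."""
    targets = STRICT.findall(text)
    if not targets:
        return False
    # Placeholders keep quote characters in any value out of the SQL text.
    conn.execute(
        "INSERT INTO duplicates (dupe_of, dupe) VALUES (?, ?)",
        (int(targets[-1]), bug_id),
    )
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE duplicates (dupe_of INTEGER, dupe INTEGER)")

# Hypothetical comment that merely talks about duplicates.
chatty = "This was marked as a duplicate of something, but I don't agree."
print(LOOSE.search(chatty).group(1))   # free text, apostrophe included
print(record_dupe(conn, 1, chatty))    # the strict pattern finds no marker
print(record_dupe(conn, 38850, "*** This bug has been marked as a duplicate of 25693 ***"))
print(conn.execute("SELECT dupe_of, dupe FROM duplicates").fetchall())
```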
The attached patch worked on my database, and also worked correctly on b.m.o's data.
at least, as correctly as it could. The highest numbered bug in my test db (a backup from this morning) is 70984. I searched for dups and dups of, and found quite a lot.
mysql> select * from duplicates where dupe_of > 70984;
+---------+-------+
| dupe_of | dupe  |
+---------+-------+
|   71024 | 71040 |
|   71042 | 71043 |
|   71024 | 71061 |
|   71024 | 71072 |
|   71082 | 71083 | all of these except the last entry have both the
|   71024 | 71084 | dup and dup_of field bogus. i can understand one field
|   71096 | 71104 | being bad, but both?
|   71109 | 71111 |
|   71024 | 71120 |
|   91026 | 21027 | dave fixed this bug
+---------+-------+
10 rows in set (0.01 sec)
mysql> select * from duplicates where dupe > 70984;
+---------+-------+
| dupe_of | dupe  |
+---------+-------+
|   64100 | 70985 |
|   68336 | 70986 |
|   70773 | 70997 |
|   50758 | 71001 |
|   70756 | 71007 |
|   67574 | 71017 |
|   70773 | 71039 |
|   71024 | 71040 |
|   71042 | 71043 |
|   71024 | 71061 |
|   71024 | 71072 |
|   49141 | 71073 |
|   71082 | 71083 |
|   71024 | 71084 |
|   70361 | 71092 |
|   70057 | 71093 |
|   60151 | 71095 |
|   70924 | 71103 |
|   71096 | 71104 |
|   59655 | 71109 |
|   71109 | 71111 |
|   71024 | 71120 |
|   43847 | 71149 |
+---------+-------+
23 rows in set (0.03 sec)
those are all accurate. select max(bug_id) from bugs instead of select count(*) from bugs :)
oops, never mind. I computed the max bug number incorrectly. There was only one incorrect value and Dave just fixed the bug. Someone had marked a bug a dup of an invalid bug number.
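The mix-up between the two queries is easy to reproduce. The Python sketch below uses an in-memory `sqlite3` table as a stand-in for the MySQL `bugs` table, with made-up bug ids; it assumes the mismatch came from ids that are not contiguous, which the thread does not actually confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY)")
# Made-up ids with gaps, as happens when ids are skipped or rows are absent.
conn.executemany("INSERT INTO bugs (bug_id) VALUES (?)", [(1,), (2,), (5,), (9,)])

row_count = conn.execute("SELECT COUNT(*) FROM bugs").fetchone()[0]
highest_id = conn.execute("SELECT MAX(bug_id) FROM bugs").fetchone()[0]
print(row_count, highest_id)
```

Whenever the ids have gaps, the row count is lower than the highest id, so valid bug numbers above the count look bogus.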
Since this worked on b.m.o and on mine, I'm thinking to just go ahead and check it in, but since Gerv knows this code better, I'll wait for his say-so.
Oh, come on, how was I supposed to know people would do pathologically evil things like bug 26913? ;-) Looks good to me. Sorry I haven't got to this earlier; I've been, er, asleep :-) Gerv
OK, it's checked in.
Status: REOPENED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
When printing bug summaries, the summary text should first quote html meta-characters. The summary of bug 39992 begins with <SELECT> and a little dropdown box is displayed instead of the summary.
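Quoting the meta-characters before output is a one-line step in most languages. As an illustration only (the actual fix came from the patch on bug 22041 and is not shown here), a Python sketch using the standard library's `html.escape`; the function name and the sample summary are hypothetical.

```python
from html import escape

def render_summary(summary):
    """Return the summary with HTML meta-characters quoted for safe display."""
    return escape(summary)

# Hypothetical summary that starts with a tag, like the one described above.
print(render_summary("<SELECT> widget draws incorrectly"))
```

With the escaping in place the browser shows the literal text `<SELECT>` instead of rendering a dropdown.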
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Jake's patch from bug 22041 fixed this bug too. marking fixed.
Status: REOPENED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Verified fixed. We have a table for duplicates now :-)
Status: RESOLVED → VERIFIED
Moving closed bugs to Bugzilla product
Component: Bugzilla → Bugzilla-General
Product: Webtools → Bugzilla
QA Contact: matty
Version: other → unspecified
QA Contact: matty_is_a_geek → default-qa