24957 - (dupesinsearch) include duplicates in search, but return only their originals (-> better search results, less duplicates)

Reporter

Description

•

26 years ago

It's sometimes very difficult to check the database for a particular bug before filing a new bug so that a duplicate isn't created. Sometimes the text of the bug is too technical to match up with the search, but bugs marked as duplicates of that bug may contain the text searched on. A search on RESOLVED and VERIFIED and DUPLICATE might find that text, but it will also return duplicates of fixed bugs and such. For these reasons, I think it'd be very beneficial to have a checkbox for including duplicates of the bugs in the search of summary and description fields. For example, searching for NEW or ASSIGNED or REOPENED bugs in the "bugzilla" component with description containing "duplicate", will select all those NEW or ASSIGNED or REOPENED bugs in the "bugzilla" component, and then search the description of those bugs as well as their duplicates for the text "duplicate". If the query matches against the duplicate, then the bug that it duplicates is returned. There may be a better way of achieving the same results, but something similar may help reduce the number of duplicates submitted.

Terry Weissman

Comment 1

•

26 years ago

Duplicate bugs have never been represented in the database very well.

Status: NEW → ASSIGNED

Terry Weissman

Comment 2

•

26 years ago

tara@tequilarista.org is the new owner of Bugzilla and Bonsai. (For details, see my posting in netscape.public.mozilla.webtools, news://news.mozilla.org/38F5D90D.F40E8C1A%40geocast.com .)

Assignee: terry → tara

Status: ASSIGNED → NEW

Matthew Tuck [:CodeMachine]

Comment 3

•

26 years ago

As far as I can see this should be possible by searching on all statuses with resolution of --- or DUPLICATE.

Chris Yeh

Comment 4

•

25 years ago

maybe what you want is to do something clever like "if this bug is marked as a duplicate, save the summary off into a separate searchable table" comment?

Matthew Tuck [:CodeMachine]

Updated

•

25 years ago

QA Contact: matty

Whiteboard: Future-Target

Stephan Niemz

Comment 5

•

25 years ago

moving to real milestones...

Target Milestone: --- → Future

Andreas Franke (gone)

Comment 6

•

24 years ago

-> Bugzilla product, Query component, reassigning.

Assignee: tara → endico

Component: Bugzilla → Query/Bug List

Product: Webtools → Bugzilla

Version: other → unspecified

Andreas Franke (gone)

Updated

•

24 years ago

Whiteboard: Future-Target → [relations:dupl]

Jesse Ruderman

Comment 7

•

24 years ago

See also bug 105295, query should show closed duplicate bugs that have the main/parent bug open.

Daniel Mota Leite

Comment 8

•

24 years ago

*** Bug 105295 has been marked as a duplicate of this bug. ***

Daniel Mota Leite

Comment 9

•

24 years ago

I'm the reporter of the 105295 dupe 8) I searched for dupes, but as I searched with the words "query" and "find", not "search", I didn't find it... a good example of why a solution for this is needed. Thanks to Jesse Ruderman for pointing this out. Below is the text of my submission:
The summary is usually limited to a few words, and there are many different ways of naming bugs. One of the main reasons people post dupes is that they can't find the open bug because they are using the "wrong" words, ones that aren't in the summary.
One option, turned on by default, would search the dupe bugs (all states?!) too, but instead of showing all dupe bugs, it would show only dupes whose main/parent bug is still open. So if one bug has many definitions, dupes will pop up, but they stop once the main bug and its dupes have all the word combinations in their summaries.
If reporters can't find the main bug because of the "wrong" words, they will find it in the closed "open" dupes, and the dupe will point to the main open bug.

Stefan Seifert

Comment 10

•

24 years ago

*** Bug 107982 has been marked as a duplicate of this bug. ***

Vadim Berezniker

Comment 11

•

24 years ago

NOTE: I'm not familiar with the Bugzilla code nor the database structure it uses. If the user does not explicitly select to search DUPLICATEs, or if we add a special checkbox that says something like "Find matches by resolving duplicate targets" that is on by default (Note: the text would have to be far less confusing than what I have =), then we do the following: (I'm assuming that Bugzilla has a separate relational table that relates bugs and their duplicates) We perform a JOIN between the bug number in the search results and the bug number in the duplicates table so that the result set has a column with a bug number of a duplicate. (I don't know how much of a performance hit the JOIN would be in MySQL) With bugs that are duplicates, we display the target bugs in their place. (Obviously we'd need to filter out multiple bugs that point to the same bug and only display one in their place.)
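The JOIN-and-replace idea in this comment can be sketched roughly as follows. This is only an illustration on a toy SQLite schema: the `duplicates(dupe, dupe_of)` table matches the names used later in this thread, while the `summary` column, the sample rows, and the helper `search_resolving_duplicates` are assumptions made for the example, not Bugzilla code.

```python
import sqlite3

def search_resolving_duplicates(conn, term):
    """Return ids of bugs whose summary matches `term`, with every
    matching duplicate replaced by the bug it is a duplicate of."""
    rows = conn.execute(
        """
        SELECT DISTINCT COALESCE(d.dupe_of, b.bug_id) AS target_id
        FROM bugs AS b
        LEFT JOIN duplicates AS d ON d.dupe = b.bug_id
        WHERE b.summary LIKE ?
        ORDER BY target_id
        """,
        (f"%{term}%",),
    ).fetchall()
    return [r[0] for r in rows]

# Hypothetical toy data: bugs 2 and 3 are duplicates of bug 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, summary TEXT);
    CREATE TABLE duplicates (dupe INTEGER PRIMARY KEY, dupe_of INTEGER);
    INSERT INTO bugs VALUES
        (1, 'include duplicates in search'),
        (2, 'query should find closed dupes'),
        (3, 'find bugs via their dupes'),
        (4, 'crash on startup');
    INSERT INTO duplicates VALUES (2, 1), (3, 1);
""")

# Both matches are duplicates of bug 1, so bug 1 is listed once.
print(search_resolving_duplicates(conn, "find"))
```

`COALESCE` swaps in the target bug only when a row in `duplicates` exists, and `DISTINCT` is the filter for several duplicates pointing at the same bug.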

Fergal Daly

Comment 12

•

24 years ago

Ironically I logged a dupe of this bug too. I think the problem is that Bugzilla equates bug reports with bugs and this is not the case. Bug reports should exist separately from bugs. For a given bug there may be many reports, so it's a one to many relationship. In fact it's a many to many relationship as a report may involve several different bugs, although this is usually because it's a badly written report, so I would say make it one to many and allow reports to be broken up if they need it. Having a one-one relationship leads to the whole DUPLICATE problem. DUPLICATE is not a valid resolution, the bug may or may not be resolved. You are actually using the resolution field to flag that this row has some sort of parent-child relationship with another row. This relationship should be made explicit in the schema, with a table for bugs and a table for reports and each report is linked to a bug and each bug is linked to 1 or more reports. Ideally, reports would come in and be assigned a report number, an expert can then assess each report and either create a new bug or simply attach the report to an existing bug (including a special type of bug called the "non-bug"). Reports should be mobile. If after some research it turns out that bug A is just bug B in disguise then you could drop bug A and assign all its reports to bug B. Not only do you have 1 less bug but you also have all the reported information available from a central point. Keyword searches are done by searching in the individual reports which then lead back to bugs. This eliminates DUPLICATE and now all the resolutions actually are resolutions. Under this scheme, bugs have a status, resolution, priority, owner etc. Reports have a reporter, a parent bug, platform, OS, build ID and by examining the linked reports, Bugzilla can figure out things like what platforms are affected by the bug.
This allows you to record multiple build IDs and OS versions against a single bug (one in each report) it also allows the possibility of other types of reports like success reports (confirmation that the bug does not affect a given platform). By combining bug reports and success reports, a bug can know what platforms are and what aren't affected by the bug. Success reports were previously entered as comments, now you can use the drop downs. Yes, I know this is heaps of work and I'm not sure how (or even if) you can migrate from one scheme to the other, I'm just throwing in my .02 euro.
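A minimal sketch of the bugs/reports split described in this comment, using SQLite. The table layout, the `affected` flag for success reports, and the helpers `bugs_matching` and `platforms` are all hypothetical names chosen for illustration; nothing here reflects an actual Bugzilla schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs (
        bug_id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        resolution TEXT
    );
    CREATE TABLE reports (
        report_id INTEGER PRIMARY KEY,
        bug_id INTEGER REFERENCES bugs(bug_id),  -- parent bug, NULL until triaged
        reporter TEXT,
        platform TEXT,
        affected INTEGER NOT NULL,  -- 1 = bug report, 0 = success report
        description TEXT
    );
    INSERT INTO bugs VALUES (1, 'NEW', NULL);
    INSERT INTO reports VALUES
        (10, 1, 'alice', 'Linux', 1, 'search misses bugs filed under other words'),
        (11, 1, 'bob', 'Windows', 1, 'query does not find dupes'),
        (12, 1, 'carol', 'Mac', 0, 'works fine here');
""")

def bugs_matching(conn, keyword):
    """Keyword search runs over reports and leads back to their bugs."""
    rows = conn.execute(
        "SELECT DISTINCT bug_id FROM reports "
        "WHERE bug_id IS NOT NULL AND description LIKE ? ORDER BY bug_id",
        (f"%{keyword}%",),
    ).fetchall()
    return [r[0] for r in rows]

def platforms(conn, bug_id, affected):
    """Platforms recorded for a bug in bug reports (True) or success reports (False)."""
    rows = conn.execute(
        "SELECT DISTINCT platform FROM reports "
        "WHERE bug_id = ? AND affected = ? ORDER BY platform",
        (bug_id, int(affected)),
    ).fetchall()
    return [r[0] for r in rows]

print(bugs_matching(conn, "query"))
print(platforms(conn, 1, True), platforms(conn, 1, False))
```

Because every report points at exactly one bug, the search needs no DUPLICATE handling at all: a match in any report resolves to its parent bug with a single lookup.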

Jouni Heikniemi

Comment 13

•

24 years ago

*** Bug 151964 has been marked as a duplicate of this bug. ***

John Levon

Comment 14

•

23 years ago

Is there some sort of hack to fix this on bmo ? By a rough estimate, this is causing around 20% of the dupes. That translates into thousands of the UNCO bugs sitting there.

Alias: dupesinsearch

Adam Katz

Comment 15

•

23 years ago

duplicates should be searched for by default! (the proposed checkbox should start checked.) this would significantly decrease the number of dupes. think of a dup as a pointer or symlink; it should be treated as an aid in searching and indexing. ...isn't much other use for them ;^) perhaps we need some better way of identifying where they point to on buglists, like having the buglist link point to the comment of the dup'ed bug (which should probably mirror the dup's summary and comment count). ...and why is the cool <a ... title=bugname> link-hover feature on the buglist?

Antonis Christofides

Comment 16

•

23 years ago

I started to work today on a patch for this. The essential idea is:
1) Add an 'include duplicates' checkbox under the 'Status' field of the query form.
2) If it's checked, then add the following to the query:
i) in the tables list, add: LEFT JOIN duplicates ON duplicates.dupe = bugs.bug_id LEFT JOIN bugs parentbug ON parentbug.bug_id = duplicates.dupe_of
ii) in the where clause, change (bugs.bug_status IN (...)) to (bugs.bug_status IN (...) OR parentbug.bug_status IN (...))
So the meaning for the user is, I believe, very simple and straightforward: if they select NEW, ASSIGNED, and REOPENED and check "include duplicates", it means "search for bugs that are NEW, ASSIGNED, REOPENED, or duplicates of bugs that are NEW, ASSIGNED, or REOPENED".
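To make the effect of the two extra joins concrete, here is a small sketch on a toy SQLite database. It is not the attached patch: the function `search`, the reduced `bugs` table, and the sample rows are assumptions for illustration, and the join alias is written as `parentbug` throughout.

```python
import sqlite3

OPEN = ("NEW", "ASSIGNED", "REOPENED")

def search(conn, statuses, include_duplicates):
    """Build the status query, optionally widened to duplicates of matching bugs."""
    marks = ",".join("?" * len(statuses))
    sql = "SELECT bugs.bug_id FROM bugs"
    where = f"bugs.bug_status IN ({marks})"
    params = list(statuses)
    if include_duplicates:
        sql += (
            " LEFT JOIN duplicates ON duplicates.dupe = bugs.bug_id"
            " LEFT JOIN bugs AS parentbug"
            " ON parentbug.bug_id = duplicates.dupe_of"
        )
        where = f"({where} OR parentbug.bug_status IN ({marks}))"
        params += list(statuses)
    sql += f" WHERE {where} ORDER BY bugs.bug_id"
    return [r[0] for r in conn.execute(sql, params)]

# Hypothetical data: bug 2 duplicates open bug 1, bug 4 duplicates resolved bug 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, bug_status TEXT);
    CREATE TABLE duplicates (dupe INTEGER PRIMARY KEY, dupe_of INTEGER);
    INSERT INTO bugs VALUES (1, 'NEW'), (2, 'RESOLVED'),
                            (3, 'RESOLVED'), (4, 'RESOLVED');
    INSERT INTO duplicates VALUES (2, 1), (4, 3);
""")

print(search(conn, OPEN, include_duplicates=False))  # only bug 1
print(search(conn, OPEN, include_duplicates=True))   # bug 1 and its duplicate, bug 2
```

With the option off, the generated SQL has no extra joins, which matches the claim that an unchecked box leaves the query essentially unchanged. Note that this form returns the duplicates themselves, not their originals.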

Adam Katz

Comment 17

•

23 years ago

OS -> All was Linux

OS: Linux → All

Antonis Christofides

Comment 18

•

23 years ago

Attached patch: Patch

Here it is. I have the following things to note about it: First, that's my first larger-than-a-line Bugzilla hack. I'm far from fully understanding Bugzilla::Search->init, Template::Toolkit, or from having a clear general picture of Bugzilla. So, although I believe the patch works, I understand that it may well be a bad hack. If it is, give me some clues so that I can improve it. Second, I can't test it adequately. On my test Bugzilla installation with six bugs :-), it seems to work alright. I don't know how such patches are tested in real conditions, but we may be confident that, if the new field is unchecked, Bugzilla will work OK; except that the joins and "where" expressions may be in a different order, the query will be exactly the same.

Antonis Christofides

Comment 19

•

23 years ago

Attached image: Screenshot of the query page

Gervase Markham [:gerv]

Comment 20

•

23 years ago

As far as I can tell from reading the description, this will only add immediate dupes to the search - dupes which are one or more levels removed won't get included. I have issues with the UI as well, and I think this will also negatively impact performance. You are also all thinking about the search page as a tool for bug filers and QA, and this is not its only use. So I'm not convinced that this is a good idea... Gerv

Michael Lefevre

Comment 21

•

23 years ago

I don't think database performance should be too much of a consideration, the goal should be making the database more effective (whatever that means). Reporters and QA people find it hard to find dupes because the interface sucks. My standard way of searching is to do a limited search for open bugs in the right component, and if that fails, then to search open and resolved bugs in all components using keywords in the summary. I'm doing the performance-hurting queries anyway because I know how. Reporters that don't know how will just fail to find anything (as they do now) and file dupes. but this seems a rather kludgy way of solving a specific instance of a general problem... we don't actually need to find the dupes, we just want to find the original bug based on summary/component info from the dupes. it would make sense for the database to facilitate that, so it's not necessary to trawl the whole database each time.

Adam Katz

Comment 22

•

23 years ago

screenshot says 'search for duplicates' ... as I see it, the main use of this will not be to find duplicates, but rather to find bugs those dups point to. that said, I think the text should be something like 'search duplicates' or 'include duplicates in search' I would also like to repeat my request that this be checked by default.

Antonis Christofides

Comment 23

•

23 years ago

Comment 20: That's right, it does not search duplicates recursively. Not only would this be difficult to implement and bad for performance, but, I believe, unnecessary as well. You have bug A with 15 duplicates, and one day it is discovered that A is a duplicate of B. I have a feeling that this happens late in the lifetime of B, when filing duplicates (which is what we are trying to avoid here) is not so important. But it might be good practice to change the 15 dups so that they are dups of B instead of A. (I have no opinion on the more general remarks.) Comment 22: The default for bugzilla.mozilla.org is the administrator's decision; the default query is specified in the bugzilla operating parameters page. What _is_ the developers' decision is the default for new Bugzilla installations, which I believe should be off; that's why I did not change defparams.pl.
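The "re-point the 15 dups at B" practice mentioned for comment 20 amounts to following each duplicate link to the end of its chain. A rough sketch, assuming the duplicate relation is available as a plain mapping from duplicate id to the id it duplicates (the names `resolve_original` and `flatten` are made up for this example):

```python
def resolve_original(dupe_of, bug_id):
    """Follow dupe_of links from bug_id until reaching a bug that is not a duplicate."""
    seen = {bug_id}
    while bug_id in dupe_of:
        bug_id = dupe_of[bug_id]
        if bug_id in seen:
            # Guard against A -> B -> A style loops in the data.
            raise ValueError("duplicate loop detected")
        seen.add(bug_id)
    return bug_id

def flatten(dupe_of):
    """Re-point every duplicate directly at its final original."""
    return {dupe: resolve_original(dupe_of, dupe) for dupe in dupe_of}

# Bugs 10 and 11 were duplicates of bug 1; later bug 1 is found to duplicate bug 2.
chains = {10: 1, 11: 1, 1: 2}
print(flatten(chains))
```

Once the mapping is flattened, a non-recursive one-level join finds the true original for every duplicate.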

Gervase Markham [:gerv]

Comment 24

•

23 years ago

> I don't think database performance should be too much of a consideration Believe me, in the real world, database performance is very much a consideration. Or are you happy with the speed of bugzilla.mozilla.org? :-| Gerv

Fergal Daly

Comment 25

•

23 years ago

The problem is the structure of the database. Right now Bugzilla thinks a bug and a bug report are the same thing. They're not. Some reports are not bugs at all and some bugs have many reports. If there was a table for bugs and table for reports and a 1 to many relationship, that'd remove the need for recursive queries. A new report comes in, someone who knows what they're doing attaches it to an existing bug or creates a new one for it. Obviously dups will still occur by accident but now you can just transfer all the reports from the dup to the original bug and close the dup. It's a big change but it seems to me that the dup issue is getting bigger all the time and distinguishing between bugs and reports of bugs corresponds better to the reality that bugs and bug reports are 2 very different things. For even more about this see comment number 12 earlier.

Vincent Deconinck

Comment 26

•

23 years ago

You're right, the two concepts should be separated. By the way, this would also solve bug 121805 ... But I guess migrating existing bugs into that new structure would be a little bit tricky.

Fergal Daly

Comment 27

•

23 years ago

Re #26. I think the migration would require careful work but is kinda straightforward. As things stand every (or almost every) field which is filled in when creating a new bug is really a bug report field - OS, version, description of symptoms, how to reproduce etc. All other fields belong to the bug, like status, votes, assigned, resolution. For migration:
1: Go through all the non-dup bugs and add a row in the BUGS table and the REPORTS table, filling the fields in the new tables with data from the old table. Link the row in the REPORTS table to the row in the BUGS table.
2: Go through all dup bugs and add a row in the REPORTS table for each one. Do not create an entry in the BUGS table. Find the "master bug" that this is a duplicate of and link the report to the master bug in the BUGS table.
When converting dups, you will have to throw away some information, basically anything that doesn't fit into the bug_reports table but if the fields have been chosen well then nothing important will be lost. For some info, like votes, it can be added to the master bug. This scheme also allows for new kinds of reports like success reports, analysis, extra symptoms etc, solving #121805 also.
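The two migration steps could look roughly like this on a toy SQLite database. Everything here is a sketch under assumptions: the `old_bugs`, `bugs`, and `reports` layouts and the `migrate` function are invented for the example, and duplicate chains are assumed to be one level deep (every master bug is itself a non-dup).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Old scheme: one row per filed bug, duplicates linked separately.
    CREATE TABLE old_bugs (bug_id INTEGER PRIMARY KEY, status TEXT,
                           op_sys TEXT, description TEXT, votes INTEGER);
    CREATE TABLE duplicates (dupe INTEGER PRIMARY KEY, dupe_of INTEGER);
    -- New scheme: bugs carry status and votes, reports carry what was filed.
    CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, status TEXT, votes INTEGER);
    CREATE TABLE reports (report_id INTEGER PRIMARY KEY, bug_id INTEGER,
                          op_sys TEXT, description TEXT);
    INSERT INTO old_bugs VALUES
        (1, 'NEW', 'Linux', 'search misses dupes', 2),
        (2, 'RESOLVED', 'Windows', 'cannot find the open bug', 3),
        (3, 'NEW', 'All', 'crash on startup', 0);
    INSERT INTO duplicates VALUES (2, 1);
""")

def migrate(conn):
    # Step 1: every non-duplicate bug becomes one bug row plus one report row.
    conn.execute("""
        INSERT INTO bugs (bug_id, status, votes)
        SELECT o.bug_id, o.status, o.votes FROM old_bugs AS o
        WHERE o.bug_id NOT IN (SELECT dupe FROM duplicates)
    """)
    conn.execute("""
        INSERT INTO reports (report_id, bug_id, op_sys, description)
        SELECT o.bug_id, o.bug_id, o.op_sys, o.description FROM old_bugs AS o
        WHERE o.bug_id NOT IN (SELECT dupe FROM duplicates)
    """)
    # Step 2: every duplicate becomes a report only, linked to its master bug.
    conn.execute("""
        INSERT INTO reports (report_id, bug_id, op_sys, description)
        SELECT o.bug_id, d.dupe_of, o.op_sys, o.description
        FROM old_bugs AS o JOIN duplicates AS d ON d.dupe = o.bug_id
    """)
    # Votes cast on duplicates are added to the master bug.
    conn.execute("""
        UPDATE bugs SET votes = votes + (
            SELECT COALESCE(SUM(o.votes), 0)
            FROM old_bugs AS o JOIN duplicates AS d ON d.dupe = o.bug_id
            WHERE d.dupe_of = bugs.bug_id)
    """)

migrate(conn)
print(conn.execute("SELECT * FROM bugs ORDER BY bug_id").fetchall())
print(conn.execute(
    "SELECT report_id, bug_id FROM reports ORDER BY report_id").fetchall())
```

The status and resolution of the duplicate row are the information that gets thrown away here, in line with the caveat in the comment.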

Antonis Christofides

Comment 28

•

23 years ago

Strictly speaking, the discussion about separating bugs from bug reports is off-topic. It is true that this bug, and bug 121805, and bug 145588, and the idea about separating reports from bugs, are four different attempts to solve the same problem: "Do something to reduce duplicates." I think I'll start a discussion on netscape.public.mozilla.webtools about this. I briefly searched for such a discussion and found http://groups.google.com/groups?hl=el&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=aa4c53a552207cba&seekm=an26vi%24dl0%241%40news5.svr.pol.co.uk#link1, which the people new to this bug may want to read. Meanwhile, I think it would be nice if we focused this discussion on including duplicates in search. Most of the on-topic discussion has been skepticism on performance impact. The thing is that it is impossible to know the performance impact just by looking at the query. Even if someone can tell that turning on the duplicates option will take three times more machine time, what does this tell us? That machine response time would become nine times longer? For all we know, the option might help people find what they're searching for with fewer queries, thus actually improving performance. So, if we agree on the user interface, if we agree that the feature may be worthwhile, and our only problem is performance, let's put it there, have it off by default, mark it experimental, discourage its use, and try it. See how much longer it takes, how much better the returned results are, and make some better-informed guesses as to its implications.

Fergal Daly

Comment 29

•

23 years ago

Re #28 - off-topicness. I disagree. Separating bugs and bug reports allows you to eliminate the whole concept of duplicates. With no dups, there's no need to include them in searches. The triagers now deal with reports. When the report is for a previously unknown problem they create a new bug which will have the report attached. When someone reports an already known problem, the triager just attaches the report to the existing bug. This second action is what used to lead to a dup but now just leads to an extra report attached to the bug. The only problem is if 2 existing bugs A and B later turn out to be different aspects of a single bug or if a triager creates bug B from a report when they should have attached it to bug A. Then we need to merge A and B together. The merging process would attach all the reports from bug B to bug A and delete bug B. Maybe some other data needs to be merged also, like votes and CCs. Also whoever is merging may have to choose a status and who to assign it to. This merging process should be fairly rare if bug reports are assigned correctly in the first place. Keyword searches are done on fields in the bug reports and since all the reports about bug A are attached to bug A there's no need to worry about dups. The performance should be no worse than before and now no one needs to even think about dups when doing their query.
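The merge of bug B into bug A described here can be sketched with plain in-memory objects. The `Bug` dataclass and the `merge` function are hypothetical, made up only to show which pieces of data move and which are chosen by the person merging.

```python
from dataclasses import dataclass, field

@dataclass
class Bug:
    bug_id: int
    status: str
    assignee: str
    votes: int = 0
    cc: set = field(default_factory=set)
    report_ids: list = field(default_factory=list)

def merge(bugs, keep_id, drop_id, status, assignee):
    """Attach all reports of bug `drop_id` to bug `keep_id` and delete the former.

    Votes and CCs are combined; status and assignee are chosen by whoever merges.
    """
    keep = bugs[keep_id]
    drop = bugs.pop(drop_id)
    keep.report_ids.extend(drop.report_ids)
    keep.votes += drop.votes
    keep.cc |= drop.cc
    keep.status, keep.assignee = status, assignee
    return keep

bugs = {
    1: Bug(1, "NEW", "alice", votes=2, cc={"x"}, report_ids=[10, 11]),
    2: Bug(2, "ASSIGNED", "bob", votes=3, cc={"x", "y"}, report_ids=[12]),
}
merged = merge(bugs, keep_id=1, drop_id=2, status="ASSIGNED", assignee="bob")
print(merged)
```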

Brant Gurganus

Updated

•

23 years ago

Attachment #108780 - Flags: review?

Myk Melez [:myk] [@mykmelez]

Comment 30

•

23 years ago

Re: searching duplicates recursively, that's what bug 68611 would fix (by storing the end of the duplicate chain in addition to the next dupe in the chain), at which point this fix could switch over to using that "end of chain ID" instead of the "dupe_of" ID. See also bug 204209 about simplifying the duplicates schema, since the current schema is unnecessarily complex. Re: performance, performance matters. This feature needs to be disablable for installations that can't afford it (f.e. b.m.o at the moment; I hope that b.m.o becomes an installation that can afford it, but that depends on us getting new machines, how fast they are, and how much our traffic grows). Re: the UI, I haven't had a chance to look at it closely yet, but in general I agree with Michael that the goal of this fix should be to find the original bugs to which the duplicates refer, not the duplicates themselves, and the UI should reflect that. Ideally the user shouldn't have to select an option at all, Bugzilla should just do the duplicates search and return the matching original bugs, but we would have to be careful to limit this to a known set of fields for which it doesn't matter if we find duplicates (i.e. if someone searches for open bugs assigned to a particular person, we don't want to find open bugs assigned to someone else but which have duplicates assigned to the person being searched for). Perhaps this means that we should limit this feature to searches where it's clear that the user is looking for a specific bug via non-specific criteria (i.e. key words) ala bug 145588. Re: the implementation, again, I haven't looked closely yet, but MySQL 4.0 provides UNIONs, which would probably make this much easier (especially for returning originals instead of duplicates) and more performant. 
We will probably start requiring MySQL 4.0+ soon after Bugzilla 2.18 ships (bug 204217), which is likely to happen in the next few months, so it's worthwhile looking into doing this with UNION, i.e.: (SELECT <columns> FROM bugs WHERE <conditions>) UNION (SELECT <columns> FROM bugs INNER JOIN duplicates ON bugs.bug_id = duplicates.dupe INNER JOIN bugs AS originals ON duplicates.dupe_of = originals.bug_id WHERE <conditions>) ORDER BY <order columns>;
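One possible way to fill in the UNION template above, run against SQLite for convenience. The concrete conditions are assumptions: here the text match is applied to the duplicate's summary and the status filter to the original, which is only one reading of `<conditions>`. The `summary` column, the sample data, and `search_originals` are likewise illustrative. SQLite writes the two arms of a compound SELECT without the surrounding parentheses used in the MySQL form.

```python
import sqlite3

def search_originals(conn, term):
    """Open bugs matching `term` directly, plus open originals of matching duplicates."""
    pattern = f"%{term}%"
    sql = """
        SELECT bugs.bug_id, bugs.summary FROM bugs
        WHERE bugs.bug_status IN ('NEW', 'ASSIGNED', 'REOPENED')
          AND bugs.summary LIKE ?
        UNION
        SELECT originals.bug_id, originals.summary FROM bugs
        INNER JOIN duplicates ON bugs.bug_id = duplicates.dupe
        INNER JOIN bugs AS originals ON duplicates.dupe_of = originals.bug_id
        WHERE originals.bug_status IN ('NEW', 'ASSIGNED', 'REOPENED')
          AND bugs.summary LIKE ?
        ORDER BY 1
    """
    return conn.execute(sql, (pattern, pattern)).fetchall()

# Hypothetical data: bugs 2 and 3 duplicate open bug 1; bug 5 duplicates resolved bug 4.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, bug_status TEXT, summary TEXT);
    CREATE TABLE duplicates (dupe INTEGER PRIMARY KEY, dupe_of INTEGER);
    INSERT INTO bugs VALUES
        (1, 'NEW', 'include duplicates in search'),
        (2, 'RESOLVED', 'query should find closed dupes'),
        (3, 'RESOLVED', 'search cannot find my bug'),
        (4, 'RESOLVED', 'find dialog crash'),
        (5, 'RESOLVED', 'crash in find dialog');
    INSERT INTO duplicates VALUES (2, 1), (3, 1), (5, 4);
""")

print(search_originals(conn, "find"))
```

`UNION` (as opposed to `UNION ALL`) removes repeated rows, so an original that matches directly and through several duplicates is still returned once.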

Vlad Dascalu

Updated

•

22 years ago

Hardware: PC → All

Jouni Heikniemi

Comment 31

•

22 years ago

Comment on attachment 108780 (Patch): Bitrotten, although I'm surprised how little.

Attachment #108780 - Flags: review? → review-

Dave Miller [:justdave]

Updated

•

22 years ago

Assignee: endico → nobody

Myk Melez [:myk] [@mykmelez]

Updated

•

22 years ago

Depends on: 204217

Myk Melez [:myk] [@mykmelez]

Comment 32

•

22 years ago

Attached patch: work in progress

Here's a work in progress that returns duplicates for fulltext searches, attempting to aggregate the relevances of the duplicate bugs so that bugs with more duplicates show up higher in the list. A test of this functionality is available on b.m.o by running a fulltext search and then changing "buglist.cgi" in the URL to "buglist-bm.cgi".

Aaron Peterson

Comment 33

•

22 years ago

I submitted a duplicate of this bug :O Basically, if we did a search of the duplicates, and then ranked the bugs that the duplicates pointed to by the number of duplicates matching the user's query, we'd have a nice system. Another way of dealing with this would be to take the summaries of duplicate bugs, and attach their extra words to the summary of the original bug. I don't know which one would be easier. But I'm overwhelmed by the number of duplicates and I'm only an end user!
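The ranking idea in this comment can be sketched in a few lines. This is a simple hit count, not the relevance aggregation of the work-in-progress patch; the function `rank_originals`, the in-memory dictionaries, and the choice to count a direct match on the original as one hit are all assumptions for illustration.

```python
from collections import Counter

def rank_originals(summaries, dupe_of, term):
    """Rank original bugs by how many of their summaries (own or duplicates') match."""
    term = term.lower()
    hits = Counter()
    for bug_id, summary in summaries.items():
        if term in summary.lower():
            # Credit the hit to the original if this bug is a duplicate.
            hits[dupe_of.get(bug_id, bug_id)] += 1
    # Most hits first; ties broken by lower bug id.
    return sorted(hits, key=lambda b: (-hits[b], b))

summaries = {
    1: "include duplicates in search",
    2: "search should find dupes",
    3: "search via dupes",
    4: "search page is slow",
    5: "slow search",
}
dupe_of = {2: 1, 3: 1, 5: 4}

print(rank_originals(summaries, dupe_of, "search"))
```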

Dave Miller [:justdave]

Comment 34

•

22 years ago

*** Bug 249372 has been marked as a duplicate of this bug. ***

Dan Mellem

Comment 35

•

21 years ago

This bug seems to be the closest match for my suggestion so I'll start here. I think the easiest way to include dups is to add a checkbox to the Quicksearch:
Enter a bug # or some search terms: __________________ [ Show ] [Help]
[ ] Include RESOLVED, VERIFIED, and CLOSED bugs.
This doesn't resolve the whole issue of tracking and backreferencing dups in a search, but it'd be an easy fix since it'd include all bugs (including WONTFIX, INVALID, DUPLICATE, etc.).

Frédéric Buclin

Comment 36

•

21 years ago

*** Bug 216360 has been marked as a duplicate of this bug. ***

Olav Vitters

Updated

•

20 years ago

QA Contact: mattyt-bugzilla → default-qa

victory <never@receive.bug.mails.i.hate.spammer>

Updated

•

19 years ago

Target Milestone: Future → ---

Serge Gautherie (:sgautherie)

Updated

•

17 years ago

Assignee: nobody → query-and-buglist

Priority: P3 → --

Thomas D. (:thomas8)

Comment 39

•

15 years ago

The benefit of this long-standing bug is immediately obvious:
-> easier to find just the right bug as we exploit the potential of duplicate bugs' data to find their originals (while not cluttering results with duplicates)
-> fewer duplicates will be filed
Therefore, I'd suggest changing the following flags:
Priority: P1 or P2
Target Milestone: something as near as possible

Summary: include duplicates in search → include duplicates in search, but return only their originals (-> better search results, less duplicates)

Patch (patch, 4.99 KB), 23 years ago, Antonis Christofides, jouni: review-
Screenshot of the query page (image/jpeg, 13.67 KB), 23 years ago, Antonis Christofides
work in progress (patch, 2.05 KB), 22 years ago, Myk Melez [:myk] [@mykmelez]