Open Bug 204209 Opened 21 years ago Updated 11 years ago

Duplicates should be stored in main bugs table

Categories

(Bugzilla :: Database, enhancement, P4)

2.19.3
enhancement

Tracking

()

REOPENED

People

(Reporter: myk, Unassigned)

References

(Blocks 1 open bug)

Details

Duplicates are currently stored in a separate "duplicates" table.  This would be
necessary if duplicates and their original bugs had a many to many relationship,
but they don't--each bug can only be a duplicate of one other bug--so a separate
table is unnecessary complexity.  Instead, we should be storing the IDs of
original bugs in a "dupe_of" or "original_id" column in the bugs table and drop
the duplicates table altogether.  Then, to get duplicates of a bug we can say:

SELECT * FROM bugs WHERE original_id = ###

and to get a list of bugs and their duplicates we can say:

SELECT bugs.bug_id, duplicates.bug_id FROM bugs JOIN bugs AS duplicates WHERE
bugs.bug_id = duplicates.original_id
how would this affect bug 68611?
From what I know of 68611, it would mean that the "end of the dupe chain" field
would be added to the bugs table instead of the duplicates table.  Otherwise
nothing would change.
The reason we didn't do this, I seem to remember, was that most bugs aren't
duplicates, and so the field will be null. In other words, the dupe in every
dupe relationship is just one bug, but not every bug is in a dupe relationship.
Also, the bugs table is pretty big already.

Gerv
NULLs are often 'free', if they're at the end of the column list.

I'm mainly concerned about things getting out of sync.
Even when not free, indexed integer fields are small and cheap.  How would this
cause duplicates to get more out of sync, or are you commenting on bug 68611?
oh, yeah, sorry, I got confused with bug 68611
Just vaguely remembering reading a comment on another bug or in the perl code 
about the bugs table already having the allowed maximum of 16 indexed fields...

would that throw a spanner in the works here?
I believe that was only true of old MySQL versions, which we no longer support.
But I could be wrong.

Gerv
Reassigning bugs that I'm not actively working on to the default component owner
in order to try to make some sanity out of my personal buglist.  This doesn't
mean the bug isn't being dealt with, just that I'm not the one doing it.  If you
are dealing with this bug, please assign it to yourself.
Assignee: justdave → general
QA Contact: mattyt-bugzilla → default-qa
So what do we do with this bug?? My vote to add a dupe_of field in the 'bugs'
table and drop the 'duplicates' table.
Yes, that's what we should do.
I thought that Gerv specifically wrote the duplicates table for the
duplicates.cgi page. The nice thing about the duplicates table is that it's
fixed-width, so searches on it are *really* fast.
Presumably duplicates searches on the bugs table are just as fast if they only
touch indexes.
OK, let's put them in the bugs table, then. :-) I agree. :-)
Severity: normal → enhancement
Summary: duplicates should be stored in main bugs table → Duplicates should be stored in main bugs table
Target Milestone: --- → Bugzilla 2.22
Version: unspecified → 2.19.3
Over on bug 281354 comment 10, Shane H. W. Travis was saying (about a sometimes-
NULL column to be added to a table):

> For DB normalization, though, you don't want this as another column on the 
> [main] table; you want a third table joining the two, with exactly two 
> columns; [main]_id (unique) and [linked]_id. This allows a one-to-n 
> relationship between the [main]_id and the [linked]_id, without ever storing 
> either duplicate information [...] or nulls (i.e. rows in 
> the [main] table with no corresponding comment).

I didn't understand why NULLs were to be avoided in that situation, but if 
there is a valid reason, I believe that would apply here also, and this bug is 
proposing to do a change in exactly the opposite direction.
(In reply to comment #15)
> Over on bug 281354 comment 10, Shane H. W. Travis was saying (about a
> sometimes-NULL column to be added to a table):
> 
> > For DB normalization, though, you don't want this as another column on the 
> > [main] table; you want a third table joining the two...

I don't know what Shane was refering to, but making this a column in the bugs
table conforms to the rules of normalization, as it is independent of all other
non-key columns in the table and fully dependent on the bug_id key column.

Perhaps Shane was talking about a sparse column, which it is sometimes useful to
store in a separate table for improved query performance.  In this case, though,
duplicates aren't very dense, but they aren't particularly sparse either, and
because they are represented by a single indexed integer column, we should be
able to achieve plenty good query performance in our queries on them.  We might
even get better performance in queries that will no longer require a join.
What's the relationship between bug 176002 and this bug? Has 176002 actually
been completed and this bug is about changing the table in the DB? 
There's no relationship between the two bugs.  Bug 176002 is about storing
historical data on duplicates in the database instead of the filesystem (where
we store it today).  This bug is about changing the way we store current data
about duplicates in the database from a separate table to a column in the bugs
table.
The trunk is now frozen to prepare Bugzilla 2.22. Enhancement bugs are retargetted to 2.24.
Target Milestone: Bugzilla 2.22 → Bugzilla 2.24
What to do with bug 68611 when this one gets fixed?
(In reply to comment #20)
> What to do with bug 68611 when this one gets fixed?

  That can still be stored in the bugs table -- it's still a one-to-one relationship.
This bug is retargetted to Bugzilla 3.2 for one of the following reasons:

- it has no assignee (except the default one)
- we don't expect someone to fix it in the next two weeks (i.e. before we freeze the trunk to prepare Bugzilla 3.0 RC1)
- it's not a blocker

If you are working on this bug and you think you will be able to submit a patch in the next two weeks, retarget this bug to 3.0.

If this bug is something you would like to see implemented in 3.0 but you are not a developer or you don't think you will be able to fix this bug yourself in the next two weeks, please *do not* retarget this bug.

If you think this bug should absolutely be fixed before we release 3.0, either ask on IRC or use the "blocking3.0 flag".
Target Milestone: Bugzilla 3.0 → Bugzilla 3.2
Blocks: 363867
Bugzilla 3.2 is now frozen. Only enhancements blocking 3.2 or specifically approved for 3.2 may be checked in to the 3.2 branch. If you would like to nominate your enhancement for Bugzilla 3.2, set the "blocking3.2" flag to "?". Then, either the target milestone will be changed back, or the blocking3.2 flag will be granted, if we will accept this enhancement for Bugzilla 3.2.

This particular bug has not been touched in over eight months, and thus is being retargeted to "---" instead of "Bugzilla 4.0". If you believe this is a mistake, feel free to retarget it to Bugzilla 4.0.
Target Milestone: Bugzilla 3.2 → ---
Assignee: general → LpSolit
Component: Bugzilla-General → Database
Target Milestone: --- → Bugzilla 4.0
I have actually decided to WONTFIX this bug. I do a lot of things, particularly in duplicates.cgi, where I really need the duplicates table to be an extremely small, fixed-size table. That is, even though it violates normal database practices to have it in a separate table, there are useful performance advantages for us in having it separate.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
Target Milestone: Bugzilla 4.0 → ---
Okay, as LpSolit pointed out on IRC, I haven't actually proven there's a perf problem, so we can keep this open until that's proven or disproven. It's possible that a covering index would solve the issue, so that could be OK.
Status: RESOLVED → REOPENED
Priority: -- → P4
Resolution: WONTFIX → ---
Assignee: LpSolit → database
You need to log in before you can comment on or make changes to this bug.