Open
Bug 281354
Opened 21 years ago
Updated 12 years ago
Create a bug_update table
Categories
(Bugzilla :: Bugzilla-General, enhancement)
Tracking
()
NEW
People
(Reporter: bugzilla-mozilla, Unassigned)
References
(Blocks 2 open bugs)
Details
I think this proposed enhancement is best described by a pseudo-patch to the
table creation part of checksetup.pl. This would also add a unique / primary
key to the longdescs and bugs_activity tables (solving Bug 225221), and split
comments and timetracking info into separate tables (allowing possibility of
fix for Bug 271913 - currently WONTFIXed merely because db schema does not
allow). Possibly also useful for bug 11368. It also avoids excessive
duplication of bug_when fields etc. (and could even be referred to for
delta_ts, creation_ts etc. on a bug?)
+$table{bug_updates} =
+ 'update_id mediumint not null auto_increment primary key,
+ bug_id mediumint not null,
+ who mediumint not null,
+ bug_when datetime not null, # ... or possibly timestamp?
+ index(bug_id),
+ index(who),
+ index(bug_when)';
$table{longdescs} =
- 'bug_id mediumint not null,
- who mediumint not null,
- bug_when datetime not null,
+ 'update_id mediumint not null,
- work_time decimal(5,2) not null default 0,
thetext mediumtext,
isprivate tinyint not null default 0,
- index(bug_id),
- index(who),
- index(bug_when),
+ unique(update_id),
fulltext (thetext)';
+$table{work_time} =
+ 'update_id mediumint not null,
+ work_time decimal(5,2) not null default 0,
+ unique(update_id)';
$table{bugs_activity} =
- 'bug_id mediumint not null,
+ 'update_id mediumint not null,
attach_id mediumint null, # perhaps this could move too?
- who mediumint not null,
- bug_when datetime not null,
fieldid mediumint not null,
added tinytext,
removed tinytext,
- index (bug_id),
- index (bug_when),
+ unique(update_id,fieldid),
index (fieldid)';
# similar change to attachements added as an afterthought...
$table{attachments} =
'attach_id mediumint not null auto_increment primary key,
- 'bug_id mediumint not null,
- creation_ts datetime not null,
+ update_id mediumint not null,
description mediumtext not null,
mimetype mediumtext not null,
ispatch tinyint,
filename varchar(100) not null,
thedata longblob not null,
- submitter_id mediumint not null,
isobsolete tinyint not null default 0,
isprivate tinyint not null default 0,
- index(bug_id),
- index(creation_ts)';
+ unique(update_id)';
# possible changes in line with this to bugs table?
$table{bugs} =
...
- creation_ts datetime not null,
- delta_ts timestamp not null,
+ first_update_id mediumint not null,
+ last_update_id mediumint not null,
...
- reporter mediumint not null, # -> who from first_update_id ???
...
- lastdiffed datetime not null, # if this is always equal to a bug_when
+ diff_update_id mediumint not null,
...
- index (creation_ts),
- index (delta_ts),
+ unique(first_update_id),
+ unique(last_update_id),
Possibly also stuff with flag request / fulfilment could refer to the relevant
update too...
Entering as UNCONFIRMED until someone more familiar with database design
confirms whether:
a) this is a good change from a database POV (e.g. would it have perf
advantages or perf hits?)
b) the (probably fairly wide-ranging) changes to code to use this layout are
ever going to be worth it... or even possible!
Comment 1•21 years ago
|
||
I'd definitely have to think about this a lot, and look at the code, to
determine whether or not it was a good idea.
We already have the bugs_activity table -- all you've done is move four fields
from that table to the bug_updates table. Instead, we should use the
bugs_activity table as it is, now, because it represents the same thing that
that "bug_update" table would represent.
Perhaps the good part of the idea is to add an "update_id" to the bugs_activity
table, and then reference this from the other tables in JOINs. That certainly
would make certain logic easier for BugMail.
Severity: normal → enhancement
| Reporter | ||
Comment 2•21 years ago
|
||
(In reply to comment #1)
> We already have the bugs_activity table -- all you've done is move four fields
> from that table to the bug_updates table.
Under the current schema, if you change 4 fields at once, the bugs_activity
table has the bug_id, who, bug_when fields all replicated across all 4 rows,
and these correspond with the bug_id, who, bug_when of a comment, etc... with
show_activity.cgi then having to examine them to determine which to show as
part of the same update.
The idea is that there is a single copy of the bug_id, who, bug_when triple
referred to from all these different places. show_activity.cgi etc. would then
use update_id to group rows by the same update.
I suggested a unique(update_id,fieldid) in bugs_activity on the assumption that
a field cannot change twice in the same update, but perhaps this needs another
field (attach_id ?) to guarantee uniqueness (e.g. more than one attachment can
be marked obsolete simultaneously), or should be replaced simply with an index
(update_id).
If this is determined to be a Good Thing, then perhaps it might be possible to
do this in stages, such that stage 1 adds the update_id and bug_updates table
(with bug_id,who,bug_when continuing to be redundantly stored), stage 2
migrates existing code to get this data from the bug_update table, and finally
stage 3 removes the redundant data from the longdescs, bugs_activity tables.
Comment 3•21 years ago
|
||
Could this perhaps be broken down into a smaller target of update? Perhaps
targetting solely the separation of time from comments (longdescs). This would
make it easier to have a bug time listing (and modification perhaps).
Comment 4•21 years ago
|
||
There's already an open bug to give longdescs a primary key; even if we go
through with this, that part of it has already been discussed and that bug
already has an owner. You may want to coordinate efforts with him.
Depends on: 225221
| Reporter | ||
Comment 5•21 years ago
|
||
(In reply to comment #4)
> There's already an open bug to give longdescs a primary key;
I did mention bug 225221 in comment 0... (although made no comment on the other
bug yet. I'll go and do that shortly - had just thought I'd see whether it this
got immediately shot down in flames first!)
I think it's more of a mutual "see also" rather than a dependency though - this
enhancement would solve the original problem in bug 225221 as it adds a UNIQUE
key that can be used as though it was PK. A genuine PK for longdescs could be
made redundant by this enhancement, as it may lead to each row having two
unique identifiers.
| Reporter | ||
Comment 6•21 years ago
|
||
One thing that could be worth checking is if the following query returns any
results on b.m.o, and if so whether it is enough to worry about (e.g. beyond
possibility of manual fix-up):
SELECT bug_id,bug_when,who,count(*)
FROM longdescs
GROUP BY bug_id,bug_when,who
HAVING count(*)>1;
It would be nice to be able to rely on the triple (bug_id,bug_when,who) to act
as a unique identifier for a comment (and to pull out all bugs_activity
associated with that comment). In principle this could be violated e.g. by a
user marking bug B and bug C as duplicates of bug A simultaneously.
| Reporter | ||
Comment 7•21 years ago
|
||
As above, a "phase 1" would simply be to add the additional columns in order to
allow other code to migrate towards using the new table. In the short term this
would duplicate the (bug_when, bug_id, who) data, but once code has been
migrated to use the new table, these columns could be dropped from their
original tables.
It may be that THIS bug just becomes the "phase 1", and that other code changes
to actually make use of this new table / split work_time from longdescs, etc go
into separate RFEs.
(Pseudo-code for / description of) changes for suggested first stage:
SELECT DISTINCT bug_id, who, bug_when FROM bugs_activity
ORDER BY bug_when, bug_id, who;
SELECT bug_id, who, bug_when FROM longdescs ORDER BY bug_when, bug_id, who;
Merge results from both queries (should be no need to examine attachments table
directly since an attachments creation_ts should always correspond with a
comment containing the magic "created an attachment" string).
CREATE TABLE bug_updates ( [ as described in comment 0 ] );
insert the (bug_id, who, bug_when) triples into this new table, in the correct
order (so that update_id order matches bug_when order).
add update_id column to longdescs, attachments and bugs_activity, and populate
based on matching the appropriate 3 fields.
Other changes necessary for phase 1:
[ in any place that new rows are inserted to these tables, insert a
corresponding row into bug_updates, or add a reference to one created earlier
during the same (bug_id, form post) ]
e.g. during process_bug.cgi, a row could be inserted into bug_updates just
after these two lines:
1352 SendSQL("select now()");
1353 $timestamp = FetchOneColumn();
... and then the newly created update_id should be used everywhere that
$timestamp is used in the following code for that bug.
Similarly in LogDependencyActivity a new row must be inserted, differing in
bug_id from the row originally added.
Doubtless other locations exist that insert data into the longdescs,
bugs_activity and/or attachments tables. All would need to be adjusted, but in
a way that avoids adding two bug_updates rows for the same bug from the same
form post.
[ probably add sanity checks to make sure that where there is a reference to an
update_id in a table that the columns corresponding to bug_id, who, bug_when
also have the correct values ]
Comment 8•21 years ago
|
||
I think something like this could be useful for custom fields, also, now that I
think of it. Because of database normalization, we want to have bug data in many
different tables, but we want to keep track of a bug update as an atomic unit.
Here's the way we'd want to go with it:
1) Resolve bug 225221.
2) Add an update_id field to bugs_activity.
3) Figure out what useful functionality-enhancing improvements we want, and how
a bug_update table could help us get there. Then we have a solid example in mind
as to where we want to go, and can plan from there. It may turn out that the
update_id field in bugs_activity could be enough...
Blocks: bz-majorarch
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Changes to database schema for longdescs and bugs_activity - primary key for a bug change/update → Create a bug_update table
| Reporter | ||
Comment 9•21 years ago
|
||
I had been just about to remove the dep on 225221, because as I had originally
intended, this would have been orthogonal to it.
However, I just thought of a slight change to the proposed schema that would
have big advantages for bulk changes, and would make use of a comment_id added
in bug 225221...
Rather than adding 'update_id mediumint not null' with a UNIQUE key to
longdescs (making comment_id redundant), instead add 'comment_id medium null'
to the bug_update table.
In cases where there is NO comment for a bug_update, then this would be null...
but the big plus is that in a bulk change, the same row in longdescs can (in
principle) be referenced by multiple bugs.
In this case, after all of the changes proposed here, all that would remain in
the longdescs table is comment_id, is_private and thetext. This would imply
that if a comment is private on one bug, it must be private on all of them...
This would seem reasonable for a bulk update, but it may be prudent to remove
is_private, leaving the longdescs as simply comment_id and thetext, which would
allow a new comment to merely add a reference to an existing comment_id. (We
don't want someone maliciously copy-pasting a comment somewhere else and then
marking it private after all!)
This should significantly reduce the number of rows in longdescs. Any thoughts?
Comment 10•21 years ago
|
||
(In reply to comment #9)
> but the big plus is that in a bulk change, the same row in longdescs can (in
> principle) be referenced by multiple bugs.
If you do some searching, I recall seeing a bug already open with this as its
initial description; the author bemoaned the fact that mass changes stored the
same comment many times.
For DB normalization, though, you don't want this as another column on the
bug_update table; you want a third table joining the two, with exactly two
columns; bug_update_id (unique) and comment_id. This allows a one-to-n
relationship between the update_id and the comment_id, without ever storing
either duplicate information (i.e. the same update_id many times in the
longdescs table, with the comment repeated many times) or nulls (i.e. rows in
the bug_update table with no corresponding comment).
> it may be prudent to remove
> is_private, leaving the longdescs as simply comment_id and thetext, which
> would allow a new comment to merely add a reference to an existing
> comment_id.
Not sure that you'll ever be able to create a UI to allow a user to simply copy
an existing comment by pasting in a reference without it being rather arcane; I
see this functionality as being useful almost exclusively for mass updates.
I can see arguments on both sides of the coin as far as keeping the isprivate
boolean attached to the comment itself. One side is that comments should either
be private or not; the privacy of a comment is not bug-dependent. The other
side is that maybe a user WANTS it to be bug-dependent, especially if they have
groups set up so that only certain people can see certain bugs... but one could
then say that they shouldn't have done a mass-update. (Also, allowing privacy
on a per-comment-per-bug basis would mean another table.)
Either way, if you DO take isprivate out of longdescs, you'd definitely need to
implement a check to see if the comment applies to more than one bug; whether
you warn the user or prevent it from happening at all is a decision you can
make later. At the beginning, I'd leave it in and simply prevent them from
changing it; add the later functionality in an enhancement.
> This should significantly reduce the number of rows in longdescs.
I dunno if it would be *significant* -- and unless you're somehow going to do
it retroactively, it would only affect future changes -- but I still see it
as 'worth doing'.
The more I think about it, the more I like the idea of one atomic bug_update_id
that stores all the unchanging information (who did it, when it was done, and
what bug it was done to) in a single place. It's a *big* change, though, and
touches a lot of things; you'd really have to have some drive to push through a
change of this magnitude, even broken up into smaller pieces.
| Reporter | ||
Comment 11•21 years ago
|
||
(In reply to comment #10)
> For DB normalization, though, you don't want this as another column on the
> bug_update table; you want a third table joining the two, with exactly two
> columns; bug_update_id (unique) and comment_id.
I remember talk of doing the opposite at one point - with a suggestion to move
the duplicates table into additional columns in the bugs table...
Presumably there's a reason why an additional table is preferable to an
additional column for which some values are NULL (corresponding with the
minority (?) of occasions where a bug_update is done without a comment). Would
the same apply to all other columns on other tables that are permitted to be
NULL?
> Not sure that you'll ever be able to create a UI to allow a user to simply
> copy an existing comment by pasting in a reference without it being rather
> arcane;
That would be where the system pro-actively searches for an identical comment.
e.g. the automatically generated comment for a DUPE (where no user comment was
made), a user that uses a stock comment frequently, or simply a "likely to be
repeated" comment such as "OK" or "Yes".
Similarly for a mass update that is in fact split across several queries (with
the same comment), or done a batch at a time.
> One side is that comments should either
> be private or not; the privacy of a comment is not bug-dependent.
Almost certainly true if it came from the same mass-update. Perhaps not if
different users made the same comment to different bugs.
> (Also, allowing privacy
> on a per-comment-per-bug basis would mean another table.)
In principle, couldn't it slip in to the table you suggested adding above, or
does that literally need to be 2 columns for optimisation reasons?
> > This should significantly reduce the number of rows in longdescs.
> I dunno if it would be *significant*
Depends on your "significance" threshold! Especially when looking back at
historic bugs, there seem to me to be a "signficant" number of mass changes.
> and unless you're somehow going to do
> it retroactively, it would only affect future changes
Certainly this would take quite a long time on a huge DB.
> you'd really have to have some drive to push through a
> change of this magnitude, even broken up into smaller pieces.
Probably worth hunting around for bugs that this would enable / make easier...
I mentioned a couple in comment 0.
Comment 12•20 years ago
|
||
I'd like to do this as the backend of Bugzilla::ChangeSet. At the least, the bugs_activity table can become enhanced.
Updated•19 years ago
|
Target Milestone: Bugzilla 3.0 → ---
Updated•15 years ago
|
Assignee: mkanat → general
You need to log in
before you can comment on or make changes to this bug.
Description
•