Open Bug 281354 Opened 21 years ago Updated 12 years ago

Create a bug_update table

Categories

(Bugzilla :: Bugzilla-General, enhancement)

2.19.2
enhancement
Not set
normal

Tracking

()

People

(Reporter: bugzilla-mozilla, Unassigned)

References

(Blocks 2 open bugs)

Details

I think this proposed enhancement is best described by a pseudo-patch to the table creation part of checksetup.pl. This would also add a unique / primary key to the longdescs and bugs_activity tables (solving Bug 225221), and split comments and timetracking info into separate tables (allowing possibility of fix for Bug 271913 - currently WONTFIXed merely because db schema does not allow). Possibly also useful for bug 11368. It also avoids excessive duplication of bug_when fields etc. (and could even be referred to for delta_ts, creation_ts etc. on a bug?) +$table{bug_updates} = + 'update_id mediumint not null auto_increment primary key, + bug_id mediumint not null, + who mediumint not null, + bug_when datetime not null, # ... or possibly timestamp? + index(bug_id), + index(who), + index(bug_when)'; $table{longdescs} = - 'bug_id mediumint not null, - who mediumint not null, - bug_when datetime not null, + 'update_id mediumint not null, - work_time decimal(5,2) not null default 0, thetext mediumtext, isprivate tinyint not null default 0, - index(bug_id), - index(who), - index(bug_when), + unique(update_id), fulltext (thetext)'; +$table{work_time} = + 'update_id mediumint not null, + work_time decimal(5,2) not null default 0, + unique(update_id)'; $table{bugs_activity} = - 'bug_id mediumint not null, + 'update_id mediumint not null, attach_id mediumint null, # perhaps this could move too? - who mediumint not null, - bug_when datetime not null, fieldid mediumint not null, added tinytext, removed tinytext, - index (bug_id), - index (bug_when), + unique(update_id,fieldid), index (fieldid)'; # similar change to attachements added as an afterthought... $table{attachments} = 'attach_id mediumint not null auto_increment primary key, - 'bug_id mediumint not null, - creation_ts datetime not null, + update_id mediumint not null, description mediumtext not null, mimetype mediumtext not null, ispatch tinyint, filename varchar(100) not null, thedata longblob not null, - submitter_id mediumint not null, isobsolete tinyint not null default 0, isprivate tinyint not null default 0, - index(bug_id), - index(creation_ts)'; + unique(update_id)'; # possible changes in line with this to bugs table? $table{bugs} = ... - creation_ts datetime not null, - delta_ts timestamp not null, + first_update_id mediumint not null, + last_update_id mediumint not null, ... - reporter mediumint not null, # -> who from first_update_id ??? ... - lastdiffed datetime not null, # if this is always equal to a bug_when + diff_update_id mediumint not null, ... - index (creation_ts), - index (delta_ts), + unique(first_update_id), + unique(last_update_id), Possibly also stuff with flag request / fulfilment could refer to the relevant update too... Entering as UNCONFIRMED until someone more familiar with database design confirms whether: a) this is a good change from a database POV (e.g. would it have perf advantages or perf hits?) b) the (probably fairly wide-ranging) changes to code to use this layout are ever going to be worth it... or even possible!
I'd definitely have to think about this a lot, and look at the code, to determine whether or not it was a good idea. We already have the bugs_activity table -- all you've done is move four fields from that table to the bug_updates table. Instead, we should use the bugs_activity table as it is, now, because it represents the same thing that that "bug_update" table would represent. Perhaps the good part of the idea is to add an "update_id" to the bugs_activity table, and then reference this from the other tables in JOINs. That certainly would make certain logic easier for BugMail.
Severity: normal → enhancement
(In reply to comment #1) > We already have the bugs_activity table -- all you've done is move four fields > from that table to the bug_updates table. Under the current schema, if you change 4 fields at once, the bugs_activity table has the bug_id, who, bug_when fields all replicated across all 4 rows, and these correspond with the bug_id, who, bug_when of a comment, etc... with show_activity.cgi then having to examine them to determine which to show as part of the same update. The idea is that there is a single copy of the bug_id, who, bug_when triple referred to from all these different places. show_activity.cgi etc. would then use update_id to group rows by the same update. I suggested a unique(update_id,fieldid) in bugs_activity on the assumption that a field cannot change twice in the same update, but perhaps this needs another field (attach_id ?) to guarantee uniqueness (e.g. more than one attachment can be marked obsolete simultaneously), or should be replaced simply with an index (update_id). If this is determined to be a Good Thing, then perhaps it might be possible to do this in stages, such that stage 1 adds the update_id and bug_updates table (with bug_id,who,bug_when continuing to be redundantly stored), stage 2 migrates existing code to get this data from the bug_update table, and finally stage 3 removes the redundant data from the longdescs, bugs_activity tables.
Could this perhaps be broken down into a smaller target of update? Perhaps targetting solely the separation of time from comments (longdescs). This would make it easier to have a bug time listing (and modification perhaps).
There's already an open bug to give longdescs a primary key; even if we go through with this, that part of it has already been discussed and that bug already has an owner. You may want to coordinate efforts with him.
Depends on: 225221
(In reply to comment #4) > There's already an open bug to give longdescs a primary key; I did mention bug 225221 in comment 0... (although made no comment on the other bug yet. I'll go and do that shortly - had just thought I'd see whether it this got immediately shot down in flames first!) I think it's more of a mutual "see also" rather than a dependency though - this enhancement would solve the original problem in bug 225221 as it adds a UNIQUE key that can be used as though it was PK. A genuine PK for longdescs could be made redundant by this enhancement, as it may lead to each row having two unique identifiers.
One thing that could be worth checking is if the following query returns any results on b.m.o, and if so whether it is enough to worry about (e.g. beyond possibility of manual fix-up): SELECT bug_id,bug_when,who,count(*) FROM longdescs GROUP BY bug_id,bug_when,who HAVING count(*)>1; It would be nice to be able to rely on the triple (bug_id,bug_when,who) to act as a unique identifier for a comment (and to pull out all bugs_activity associated with that comment). In principle this could be violated e.g. by a user marking bug B and bug C as duplicates of bug A simultaneously.
As above, a "phase 1" would simply be to add the additional columns in order to allow other code to migrate towards using the new table. In the short term this would duplicate the (bug_when, bug_id, who) data, but once code has been migrated to use the new table, these columns could be dropped from their original tables. It may be that THIS bug just becomes the "phase 1", and that other code changes to actually make use of this new table / split work_time from longdescs, etc go into separate RFEs. (Pseudo-code for / description of) changes for suggested first stage: SELECT DISTINCT bug_id, who, bug_when FROM bugs_activity ORDER BY bug_when, bug_id, who; SELECT bug_id, who, bug_when FROM longdescs ORDER BY bug_when, bug_id, who; Merge results from both queries (should be no need to examine attachments table directly since an attachments creation_ts should always correspond with a comment containing the magic "created an attachment" string). CREATE TABLE bug_updates ( [ as described in comment 0 ] ); insert the (bug_id, who, bug_when) triples into this new table, in the correct order (so that update_id order matches bug_when order). add update_id column to longdescs, attachments and bugs_activity, and populate based on matching the appropriate 3 fields. Other changes necessary for phase 1: [ in any place that new rows are inserted to these tables, insert a corresponding row into bug_updates, or add a reference to one created earlier during the same (bug_id, form post) ] e.g. during process_bug.cgi, a row could be inserted into bug_updates just after these two lines: 1352 SendSQL("select now()"); 1353 $timestamp = FetchOneColumn(); ... and then the newly created update_id should be used everywhere that $timestamp is used in the following code for that bug. Similarly in LogDependencyActivity a new row must be inserted, differing in bug_id from the row originally added. Doubtless other locations exist that insert data into the longdescs, bugs_activity and/or attachments tables. All would need to be adjusted, but in a way that avoids adding two bug_updates rows for the same bug from the same form post. [ probably add sanity checks to make sure that where there is a reference to an update_id in a table that the columns corresponding to bug_id, who, bug_when also have the correct values ]
I think something like this could be useful for custom fields, also, now that I think of it. Because of database normalization, we want to have bug data in many different tables, but we want to keep track of a bug update as an atomic unit. Here's the way we'd want to go with it: 1) Resolve bug 225221. 2) Add an update_id field to bugs_activity. 3) Figure out what useful functionality-enhancing improvements we want, and how a bug_update table could help us get there. Then we have a solid example in mind as to where we want to go, and can plan from there. It may turn out that the update_id field in bugs_activity could be enough...
Blocks: bz-majorarch
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Changes to database schema for longdescs and bugs_activity - primary key for a bug change/update → Create a bug_update table
I had been just about to remove the dep on 225221, because as I had originally intended, this would have been orthogonal to it. However, I just thought of a slight change to the proposed schema that would have big advantages for bulk changes, and would make use of a comment_id added in bug 225221... Rather than adding 'update_id mediumint not null' with a UNIQUE key to longdescs (making comment_id redundant), instead add 'comment_id medium null' to the bug_update table. In cases where there is NO comment for a bug_update, then this would be null... but the big plus is that in a bulk change, the same row in longdescs can (in principle) be referenced by multiple bugs. In this case, after all of the changes proposed here, all that would remain in the longdescs table is comment_id, is_private and thetext. This would imply that if a comment is private on one bug, it must be private on all of them... This would seem reasonable for a bulk update, but it may be prudent to remove is_private, leaving the longdescs as simply comment_id and thetext, which would allow a new comment to merely add a reference to an existing comment_id. (We don't want someone maliciously copy-pasting a comment somewhere else and then marking it private after all!) This should significantly reduce the number of rows in longdescs. Any thoughts?
(In reply to comment #9) > but the big plus is that in a bulk change, the same row in longdescs can (in > principle) be referenced by multiple bugs. If you do some searching, I recall seeing a bug already open with this as its initial description; the author bemoaned the fact that mass changes stored the same comment many times. For DB normalization, though, you don't want this as another column on the bug_update table; you want a third table joining the two, with exactly two columns; bug_update_id (unique) and comment_id. This allows a one-to-n relationship between the update_id and the comment_id, without ever storing either duplicate information (i.e. the same update_id many times in the longdescs table, with the comment repeated many times) or nulls (i.e. rows in the bug_update table with no corresponding comment). > it may be prudent to remove > is_private, leaving the longdescs as simply comment_id and thetext, which > would allow a new comment to merely add a reference to an existing > comment_id. Not sure that you'll ever be able to create a UI to allow a user to simply copy an existing comment by pasting in a reference without it being rather arcane; I see this functionality as being useful almost exclusively for mass updates. I can see arguments on both sides of the coin as far as keeping the isprivate boolean attached to the comment itself. One side is that comments should either be private or not; the privacy of a comment is not bug-dependent. The other side is that maybe a user WANTS it to be bug-dependent, especially if they have groups set up so that only certain people can see certain bugs... but one could then say that they shouldn't have done a mass-update. (Also, allowing privacy on a per-comment-per-bug basis would mean another table.) Either way, if you DO take isprivate out of longdescs, you'd definitely need to implement a check to see if the comment applies to more than one bug; whether you warn the user or prevent it from happening at all is a decision you can make later. At the beginning, I'd leave it in and simply prevent them from changing it; add the later functionality in an enhancement. > This should significantly reduce the number of rows in longdescs. I dunno if it would be *significant* -- and unless you're somehow going to do it retroactively, it would only affect future changes -- but I still see it as 'worth doing'. The more I think about it, the more I like the idea of one atomic bug_update_id that stores all the unchanging information (who did it, when it was done, and what bug it was done to) in a single place. It's a *big* change, though, and touches a lot of things; you'd really have to have some drive to push through a change of this magnitude, even broken up into smaller pieces.
(In reply to comment #10) > For DB normalization, though, you don't want this as another column on the > bug_update table; you want a third table joining the two, with exactly two > columns; bug_update_id (unique) and comment_id. I remember talk of doing the opposite at one point - with a suggestion to move the duplicates table into additional columns in the bugs table... Presumably there's a reason why an additional table is preferable to an additional column for which some values are NULL (corresponding with the minority (?) of occasions where a bug_update is done without a comment). Would the same apply to all other columns on other tables that are permitted to be NULL? > Not sure that you'll ever be able to create a UI to allow a user to simply > copy an existing comment by pasting in a reference without it being rather > arcane; That would be where the system pro-actively searches for an identical comment. e.g. the automatically generated comment for a DUPE (where no user comment was made), a user that uses a stock comment frequently, or simply a "likely to be repeated" comment such as "OK" or "Yes". Similarly for a mass update that is in fact split across several queries (with the same comment), or done a batch at a time. > One side is that comments should either > be private or not; the privacy of a comment is not bug-dependent. Almost certainly true if it came from the same mass-update. Perhaps not if different users made the same comment to different bugs. > (Also, allowing privacy > on a per-comment-per-bug basis would mean another table.) In principle, couldn't it slip in to the table you suggested adding above, or does that literally need to be 2 columns for optimisation reasons? > > This should significantly reduce the number of rows in longdescs. > I dunno if it would be *significant* Depends on your "significance" threshold! Especially when looking back at historic bugs, there seem to me to be a "signficant" number of mass changes. > and unless you're somehow going to do > it retroactively, it would only affect future changes Certainly this would take quite a long time on a huge DB. > you'd really have to have some drive to push through a > change of this magnitude, even broken up into smaller pieces. Probably worth hunting around for bugs that this would enable / make easier... I mentioned a couple in comment 0.
I'd like to do this as the backend of Bugzilla::ChangeSet. At the least, the bugs_activity table can become enhanced.
Assignee: general → mkanat
Blocks: 301447
Target Milestone: --- → Bugzilla 2.24
Target Milestone: Bugzilla 3.0 → ---
Assignee: mkanat → general
You need to log in before you can comment on or make changes to this bug.