Closed Bug 225221 Opened 21 years ago Closed 19 years ago

longdescs table needs a primary key

Categories

(Bugzilla :: Bugzilla-General, defect, P1)

Tracking

()

RESOLVED FIXED
Bugzilla 3.0

People

(Reporter: chicks, Assigned: altlist)

References

(Blocks 3 open bugs)

Details

Attachments

(2 files, 3 obsolete files)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 Previously discussed developers@bugzilla.org on mailing list. My currently preferred solution is pretty simple: alter table longdescs add cid int(10) not null auto_increment primary key; Reproducible: Always Steps to Reproduce:
That would cause horrendously unreadable comment numbers. I dare not even guess what sort numbers would we be going on this particular installation, but I'd suffer seriously if I had to enter "comment 125192519" to reference a bug comment... Bugzilla has enough too big numbers already ;-) CCing people who took part in the mailing list discussion.
Assignee: justdave → chicks
Just adding a primary key doesn't mean the UI has to use it as the sole way to access a record. The UI could still permit a comment to be referred to as comment 1 and even generart a URL using the #c1 anchor while the DB maintains a primary key.
The ideas in comment 1 and comment 2 are good. There's no reason we can't include anchors for the human readable value #c1, #c2, etc. and the more interesting for automation unique primary key - #c=1234, #c=34567 and so on. As for what "appears" for the user, while most folks would be utterly uninterested in knowing the big number PK value, it could be made a popup for those who do want to know.
Target Milestone: --- → Bugzilla 2.18
Remind us again why it needs a primary key? Does it make searches faster? If there is some sort of internal comment number, it should not be exposed anywhere in the UI. It would be just too confusing. Gerv
Why do we need this? -------------------- - The database gods said every table shall have a primary key or it isn't a real table. This is wisdom I've found to be worth heeding. - The database gods were expecting things like I experienced when I tried to refine my bugzilla-based billing system. It's a pain to refer to something if you don't have some way of knowing that you're referring to one thing. I wanted a table that linked a given chunk of time expended to a given invoice. It should be a simple two-column table, but when I looked at longdescs it had no PK. - MysqlTool is my favorite if somewhat abandoned web GUI for tinkering with MySQL databases. It is handy for tweaking records in longdescs to move comments back in time for instance. Yet when you try to do this it doesn't give you the option because there's no unique key for it to refer to to be confidant it's updating one record at a time. I suspect any effort to make comments editable would suffer without some primary key. Bug 540 for instance. - bbaetz made a comment on the mailing list about this being desirable for making it easier to display comment #'s. I'm not sure exactly clear what performance bottleneck he is referring to honestly. However, anything that's significantly interacting with the longdescs table should find a performance increase with the availability of the primary key. Instead of queries needing three fields to have a hope of a unique identifier, there would be only field.
Blocks: 540
The lack of a repeatable key (as opposed to enumerating comments at run-time) makes processing things like comment privacy a pain. A better reason, though, is that Windows ODBC programs like M$ACCESS get confused when there is no primary key.
By 'comment number' I was thinking of a per-bug comment, so the pk is (bug_id, comment_id). This avoids the daylight saving problem, too.
Status: UNCONFIRMED → NEW
Ever confirmed: true
To bbaetz: (1) Per bug comment ids aren't necessary to solve the dst sort order problem. Unique ids across all of the longdescs records will sort just as well as per bug ids. (2) Unique ids across the entire table are necessary to solve various problems already mentioned. (3) Per bug ids are much harder to implement and maintain. It's much easier to have FK's be single fields under nearly all circumstances. Do you still feel a two-field PK is the best way? If so, can you explain why better? P.S.: I can't find the bug you're referring to as for dst causing comments sort order issues.
Status: NEW → ASSIGNED
Attached patch simple approach to fixing (obsolete) — Splinter Review
I was hoping to get more consensus before submitting a patch, but here's an initial attempt. Comments welcomed.
> Comments welcomed. Can you use cvs diff -u (Unified) for making the diff? It's much easier to read. > $dbh->do("ALTER TABLE longdescs ADD comment_id mediumint not null auto_increment primary key;"); Can you split that around 72 chars? :) If you upload a new patch with those, request review on it from a Bugzilla Reviewer ( http://bugzilla.org/who_we_are.html ). If you want a fast reply, kiko, jouni, gerv are suitable names. :)
Followed jocuri's suggestions.
Attachment #135697 - Attachment is obsolete: true
Comment on attachment 135731 [details] [diff] [review] simple fix with cleanups I think this takes care of it.
Attachment #135731 - Flags: review?(gerv)
Comment on attachment 135731 [details] [diff] [review] simple fix with cleanups Christopher, the DST issue Bradley mentioned is bug 176979. I agree with you that having a primary key and ordering by it is enough to order comments correctly when DST strikes, but let's ask bbaetz to comment on the change.
Attachment #135731 - Flags: review?(bbaetz)
Blocks: 176979
Comment on attachment 135731 [details] [diff] [review] simple fix with cleanups The conversion code isn't sufficient - you need to set the id on all of the existing bits, and do it in order (ie order by bug_when, and assign them one by one, then update the current value for the auto_increment field) That allows us to select the comments ordering by this number, and so on.
Considering that generally comments are already going to be in the database in order this seems like a significant waste of effort. I'll do patch 3 tomorrow.
Whiteboard: sheesh
>Considering that generally comments are already going to be in the database in >order this seems like a significant waste of effort. In what cases (apart from the DST issue) could the longdescs currently be out of order in the db? Point being, if we already show some old comments in wrong order, I don't think we're going to have to fix it now if it makes the conversion code significantly bulkier and more complex.
I doubt there are going to be that many cases when longdescs is naturally out of order - DST issue or not. Based on running bugzilla on MySQL 3.23 and 4.0 it seems that records are added in order to the end. Deletions and updates seem to be the only way someone can "easily" get longdescs and out of order. Since longdescs has lacked a primary key and thus has been effectively unmodifiable with standard tools so I doubt almost any people will have data that is naturally out of order. Regardless of what exists preconverstion, worrying about the effects of such modifications and pedantically getting the PK's in useful order during conversion to PK doesn't address the bigger long term issue of what happens when people start modifying things. If I change the date in the database of a comment does it make more sense for it still be to comment 10 or should it now be comment 5 (or whatever)? Should I reorder the PK's again then?
2 examples of where things can normally happen "out of order": (1) Timestamping comments correctly when adding nightly cvs summaries to relevant bugs. (2) Using an e-mails date when doing an e-mail interface into bugzilla.
To me, it sounds very dubious whether it's a good idea to insert emailed/cvs updated comments above already existing comments anyway. It would potentially make discussion _very_ hard to follow. I'd expect things to be shown in the order in which they're entered.
Indeed; and the (related) reason longdescs has, up to this point, been hard to edit is because the Bugzilla team has often taken the attitude that history is history, and you shouldn't alter it. If adding a PK to longdescs allows people to edit it, is that something we actually want to encourage? Gerv
jouni: Thast hwy we need this. If I email in a response,and its delayed by 2-3 days for some reason, we don't want to renumber the comments. gerv: Hidden comments is one reason - we are able to change those. Christopher: "Based on running bugzilla on MySQL 3.23 and 4.0 it seems that records are added in order to the end" You mean as in a select without an ORDER BY? don't do that - the database system can do whatever it wants in that case, including a different order each time you run the query. I still think that (bug_id, comment_id) is enough for uniqueness. The only issue would be if we start suporting more complex hidden sementics for bugs, and put the restriction into a different table - we'd need a two column FK in that case.
Blocks: 113096
I am utterly discouraged by this process. What I'm trying to do here is little different than the way attachments are currently handled. Using a single field simple unique identifier for that is fine and people see the numbers going up into infinity without anybody being disturbed. Whether people can accept that there are valid reasons why comments might arrive "late", there are good reasons. We use bugzilla for tracking all of our time. Putting the time on the day the work was performed instead of the data entry day makes people much, much happier and saner. Forcing a comment number into the database is going to make that harder with little gain. Bradley has desires for this that go far beyond what I was ever trying to solve. Considering that what I've done solves the current problem in a nice clean way and lays the necessary basis for these other things it really torques me that this had gotten bogged down. I strongly feel Bradley is wrong technically and managerially. Pushing all of this onto somebody trying to get their first simple patch through seems a very effective way to keep the database structure dorked. checksetup.pl will continue to grow and grow without check. Simple database fixes will just die because it's not worth the battle. It's sad.
I'm sorry to hear your disappointment. It's a sad day when the issues with this sort of development process steal from the desire to contribute code. Everyone of us encounters the bug they want to solve simply, but everybody else wants a perfect solution. It's not the most motivating of all things to happen on your first bug. I think we all agree that comment numbers have to be per bug in the UI. The question is, whether or not the uniqueness per bug should be implemented in the database or not. And while I understand your frustration, I wouldn't blame Bradley for neither db-wise or managerial bad stance here. Let's face it: Bug-wise comment ids are significantly harder to implement than what you're suggesting. That does not indicate your solution is necessarily right. If the needs call for a complex solution, then a simple one won't usually do. From a code management perspective, it would be bad to check in code that fails to achieve what we want out of this. Now we just have to find out whether or not there will be a significant need to handle comment references on the db level. I know a couple of situations where gathering comment content by its relative number is necessary; one of these is automatically pulling data from references like "bug 225221 comment 4". Another important one is bug 113096. Joel, Bbaetz and everybody else: Do you have further ideas where we might need this comment number in the db?
What I don't get is: - Comment numbers need to be per-bug - Comment numbers need to not change How can you possibly allow the insertion of out-of-order comments without violating these two constraints? Number the comments in increments of ten so you can insert others later? Gerv
Comment on attachment 135731 [details] [diff] [review] simple fix with cleanups From bbaetz' comment 14.
Attachment #135731 - Flags: review?(bbaetz) → review-
Attachment #135731 - Flags: review?(gerv)
Target Milestone: Bugzilla 2.18 → Bugzilla 2.20
Alright, I'm going to give this going a whirl again. Here's my fresh start. (A) I believe all tables in bugzilla should have a primary key. This is a necessity for being able to uniquely refer to rows. (B) I believe a single column key is easier to work with. We use a single column key in attachments which is exposed to users and I don't think any harm has been done, so using a single column key that's not exposed to the user in any way should cause even less harm. (C) I'm not promising that fixing this problem is going to fix any other problem. This is particularly relevant for bug 176979 which I regret promising to fix. If we can agree on how to fix this bug (which I feel is primarily a debate between multi-column PK's and single column PK's) then I'll be happy to move onto fixing bug 176979 by working on a patch to make the single-column PK be in order. I don't appreciate folks who have expanded the scope of this bug to include fixing these other bugs (which I haven't been assigned).
Flags: blocking2.20?
(In reply to comment #24) > What I don't get is: > > - Comment numbers need to be per-bug > - Comment numbers need to not change > > How can you possibly allow the insertion of out-of-order comments without > violating these two constraints? Number the comments in increments of ten so > you can insert others later? Comment numbers can go up only incremented by 1, but the real choice is whether to show the comments in comment# order or date order.
> Comment numbers can go up only incremented by 1, but the real choice is whether > to show the comments in comment# order or date order. The thing is, having the two orders be not the same isn't good. It would be really confusing to have: Additional Comment #1 From Jouni Heikniemi 2003-11-10 07:27 PDT [reply] Additional Comment #17 From Gervase Markham 2003-11-11 14:33 PDT [reply] Additional Comment #9 From Christopher Hicks 2003-11-12 19:25 PDT [reply] Additional Comment #2 From Dave Miller 2003-11-14 09:21 PDT [reply] but it would be even more confusing to have: Additional Comment #1 From Jouni Heikniemi 2003-11-10 07:27 PDT [reply] Additional Comment #2 From Dave Miller 2003-11-14 09:21 PDT [reply] Additional Comment #9 From Christopher Hicks 2003-11-12 19:25 PDT [reply] Additional Comment #17 From Gervase Markham 2003-11-11 14:33 PDT [reply] Gerv
BTW, I should say that I happily agree with your A), B) and C), and I'm not trying to frustrate you here. I don't have a particular objection to a primary key for longdescs if we can show that one is vaguely useful (and you seem to strongly think it is.) But I'm very concerned about allowing the insertion of out-of-order comments. I can't see any way to permit this without ending up with something that sucks from a usability perspective. So if that's the only reason for a longdescs primary key, then I don't see the need. If there are other reasons, cool, let's enumerate them and do it, and have the out-of-order discussion in another bug. Gerv
(In reply to comment #28) > The thing is, having the two orders be not the same isn't good. Comment 18 explains the need, what else would you propose to do to meet those needs?
(In reply to comment #29) > BTW, I should say that I happily agree with your A), B) and C), and I'm not > trying to frustrate you here. That's good. > I don't have a particular objection to a primary key for longdescs if we can > show that one is vaguely useful (and you seem to strongly think it is.) Its truly difficult for me to believe that noone with any database design knowledge hasn't pointed this out long, long ago. Read any of CJ Dates books and every other chapter will mention the fact in passing. Its some of the most basic common database design knowledge out there! http://www.standardreporting.net/survival/view.aspx?_@id=53484 > But I'm very concerned about allowing the insertion of out-of-order comments. Reality is that email and PDAs are asynchronous devices. > I can't see any way to permit this without ending up with something that sucks > from a usability perspective. We're doing this now and it sucks much less than the alternative. The alternative was to have everything asynchronous say in the note "originated on X date". That was ugly, confusing, and required repeated explanations to clients. > So if that's the only reason for a longdescs > primary key, then I don't see the need. Have you ever read anything on database design? > If there are other reasons, cool, let's > enumerate them and do it, and have the out-of-order discussion in another bug. All tables need primary keys. Its necessary so that you can uniquely refer to a given row. This is a basic precept of relational database design. Every table needs a primary key. How bugzilla got this far without PK's is a really fascinating question though.
(In reply to comment #31) > All tables need primary keys. Its necessary so that you can uniquely refer to a > given row. This is a basic precept of relational database design. Every table > needs a primary key. How bugzilla got this far without PK's is a really > fascinating question though. I agree with Chris here, for the record. The specifics here may make that difficult, but I do think it's certainly a goal to have a PK applied to longdescs. I recently added variable comment tags locally, and replicating the data over into related tables was horrible. I wish this bug was solved back then.
OK, then. Let's do the PK thing, and if Chris wants to try and get something that does out-of-order comments into Bugzilla in another bug, we can have a fight about it there :-) bbaetz is our database design guru, and we normally get him to sign off on schema changes. If you can convince him of what you want to do, it's cool with me. Comment 21 was his last thought... Gerv
(In reply to comment #33) > OK, then. Let's do the PK thing, and if Chris wants to try and get something > that does out-of-order comments into Bugzilla in another bug, we can have a > fight about it there :-) Bug 264057 it is. Feel free to assign it to me. > bbaetz is our database design guru, and we normally get him to sign off on > schema changes. If you can convince him of what you want to do, it's cool with > me. Comment 21 was his last thought... So I'm back to being nowhere on this. Lovely.
I'd work with comment 21, just ignoring the last paragraph, since we've agreed that a primary key on longdescs is indeed convenient (I'm joining on it to other tables in some custom patches I have flying around).
(In reply to comment #35) > I'd work with comment 21, just ignoring the last paragraph, since we've agreed > that a primary key on longdescs is indeed convenient How can I ignore it? Gerv says bbaetz is the database guru and he's not weighing in any further. I disagree with his last paragraph and without some agreement on how to fix this I'm dead in the water. Is someone official willing to say "a single value PK is fine" so I can proceed? > (I'm joining on it to other > tables in some custom patches I have flying around). I do an outer join on longdescs to a "billed" table to determine what's not billed. I can do that with a one or two part key, but a one part key still seems much cleaner for all involved.
Both Gerv and I have said it. I'm second-place in reviews so I hold some authority at least I hope :-)
(In reply to comment #37) > Both Gerv and I have said it. I'm second-place in reviews so I hold some > authority at least I hope :-) Since Gerv seemed to be deferring to bbaetz I wasn't clear. I'll produce a new patch for this against a current version.
(In reply to comment #33) > bbaetz is our database design guru, and we normally get him to sign off on > schema changes. If you can convince him of what you want to do, it's cool with > me. Comment 21 was his last thought... Based on the rest of the comments I've seen here, I'll sign off on it. :) Do we have an updated patch yet? I won't block release for this, but I'll certainly take it if it gets done before then.
Flags: blocking2.20? → blocking2.20-
Blocks: 281354
Bug 281354, which I recently filed proposes an alternative mechanism by which a single-column UNIQUE key would be generated for the longdescs table (which I think MySql treats as a PK in absence of a "real" PK). Comments on here about the desirability of properly increasing ID values for historical data would also apply to that. A simple comment_id as proposed here would do no harm, but would become redundant if both were implemented.
Just ran across a comment in https://bugzilla.mozilla.org/show_bug.cgi?id=52557#c187 saying: "This patch also includes a patch for bug 225221. I was too lazy to remove that patch from this one." Haven't looked at his solution there.
(In reply to comment #41) > https://bugzilla.mozilla.org/show_bug.cgi?id=52557#c187 > Haven't looked at his solution there. It's not a different solution - he had applied attached to this bug.
I would agree that a single-column PK is more flexible, even if a dual-column PK seems to make more sense based on the architecture that we have now. I think the reason that bbaetz was thinking in terms of dual-column PK is all of the "bug X comment Y" links that exist in the world. That is, Bugzilla already has a sort of history where comments aren't alterable. However, having a single-column PK works for Attachments, you're right. And it would probably make threaded comments easier, and also would probably make solving bug 540 easier. Of course, we'd still have to always display the correct bug-relative comment number to the end-user (and those numbers can't change -- think of all the #c34 or something like that links in the world). So why not make a single-column PK for this bug, and then on another bug we can add comment_id? I'll do the review for the single-column PK, if you will write it.
Blocks: bz-majorarch
The r- for the current patch was based on comment 14, which only applies if we need to be able to do an "ORDER BY comment_id" instead of "ORDER BY bug_when". This seems short-sighted, since the logical solution to bug 540 would involve a new row to longdescs that replaces the existing row. This would (by definition) have a higher comment_id than any comments submitted in-between, but should still appear in the original comment's position. Bug 281354 would provide an update_id that was linked to a particular bug change, and this could be used (amongst other things) for ordering comments. This could also allow extensions that would make a comment_id completely meaningless except for referencing a comment's text (bug 281354 comment 9). IMHO, this bug should merely provide a PK to the longdescs table, AND NOTHING ELSE. i.e. the existing patch would be all that is required for the version to which it applies. I expect that being over a year old it is rather bitrotten though.
Sorry for being out of touch and not cleaning this up for submission yet. I got Max's note and decided to punt until 2.20.
Target Milestone: Bugzilla 2.20 → Bugzilla 2.22
The lack of a primary key for this table is really irritating me. Could you please submit a patch asap? We should unfreeze in the next few days and I would be really happy to see thix bug being fixed. Adding bug 152899 in the blocking list, in order to know when to hack AppendComment to return the comment's ID.
Blocks: 152899
Whiteboard: sheesh
Target Milestone: Bugzilla 2.22 → Bugzilla 2.24
This is now blocking enough bugs and causing enough trouble that I think it's become a major issue.
Severity: normal → major
Priority: -- → P1
I would vote to have comment_id represent the comment # for that specific bug. Main reason is that one can nicely record changes to "private" comments in the Bug Activity page. Moreover, the Bug.pm and template routines could then use the sql tables to get the comment #, rather than auto-calculate this on the fly.
Blocks: 318063
Attached patch cvs tip patch (obsolete) — Splinter Review
(In reply to comment #44) > IMHO, this bug should merely provide a PK to the longdescs table, AND NOTHING > ELSE. i.e. the existing patch would be all that is required for the version to > which it applies. I thought about it some more and agree with comment #44. A simple PK will do for now and can use this patch for some enhancements I'm working on.
Attachment #205346 - Flags: review?(LpSolit)
Attached patch cvs tip patch v2 (obsolete) — Splinter Review
My bad, patch was created incorrectly.
Attachment #205346 - Attachment is obsolete: true
Attachment #205349 - Flags: review?(LpSolit)
Attachment #205346 - Flags: review?(LpSolit)
Comment on attachment 205349 [details] [diff] [review] cvs tip patch v2 Sure, I agree. We can add a comment_num field later, if we want a per-bug comment_id.
Attachment #205349 - Flags: review?(LpSolit) → review+
Comment on attachment 205349 [details] [diff] [review] cvs tip patch v2 >+# 2005-12-07 -- Bug 225221 >+$dbh->bz_add_column('longdescs', 'comment_id', >+ {TYPE => 'MEDIUMSERIAL', NOTNULL => 1, PRIMARYKEY => 1}); >+ Oh, except, wait -- you put this in the wrong place. The new Dev Guide explains where this is *supposed* to go. Look for the last occurence of "--TABLE--".
Attachment #205349 - Flags: review+ → review-
Assignee: chicks → altlst
Status: ASSIGNED → NEW
Attached patch cvs tip patch v3Splinter Review
Ok, moved the checksetup code in the right place. Thanks.
Attachment #205349 - Attachment is obsolete: true
Attachment #205401 - Flags: review?(mkanat)
Comment on attachment 205401 [details] [diff] [review] cvs tip patch v3 Yes, good. :-) I'm tired of this bug being open. :-)
Attachment #205401 - Flags: review?(mkanat) → review+
Status: NEW → ASSIGNED
Flags: approval?
Flags: approval? → approval+
Checking in checksetup.pl; /cvsroot/mozilla/webtools/bugzilla/checksetup.pl,v <-- checksetup.pl new revision: 1.471; previous revision: 1.470 done Checking in Bugzilla/DB/Schema.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v <-- Schema.pm new revision: 1.47; previous revision: 1.46 done
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Blocks: 328063
This might have caused bug 328063 regression.
Keywords: relnote
(In reply to comment #51) > (From update of attachment 205349 [details] [diff] [review] [edit]) > Sure, I agree. > > We can add a comment_num field later, if we want a per-bug comment_id. I need a per-bug comment_id to produce a Version Description Document in Scmbug by querying the Bugzilla database. I would need to list for a specific bug the comments that were added between a date range. So I would need to report something like: Bug 1, comment 65: Bug 1, comment 68: etc. Without per-bug comment ids, this will be very difficult to do.
test
test
Okay, this doesn't *actually* need a relnote, now that I think of it. I was just so excited that it was fixed that I added the relnote keyword.
Keywords: relnote
QA Contact: matty_is_a_geek → default-qa
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: