Open Bug 277796 Opened 20 years ago Updated 11 years ago

Profiles.userid should be profiles.id

Categories

(Bugzilla :: Database, enhancement)

enhancement
Not set
normal

Tracking

()

People

(Reporter: shane.h.w.travis, Unassigned)

Details

In a database full of inconsistencies, this column is in a class of its own. 

1) The majority of tables with an 'id' field in the database these days now use 
simply that as a column name. To wit: classifications.id, components.id, 
flags.id, flagtypes.id, groups.id, keyworddefs.id, products.id, 
series_categories.id, whine_events.id, whine_queries.id, whine_schedules.id

2) The remaining tables with 'id' in a column name are: attachments.attach_id, 
bugs.bug_id, fielddefs.fieldid, profiles.userid, quips.quipid, series.series_id.

Note that even in category two, profiles stands out as an anomaly. All others 
have some variation of 'table.table_id'... not so 'profiles', which 
uses 'userid' as a column name in complete defiance of even the minimal 
conventions of the rest of the schema. I propose to make a patch to change it, 
and to take one step at fixing up the schema to be consistent throughout. This 
bug is about the discussion of that patch - how it should be done, and even 
*whether* it should be done.

First of all... should it be done. I strongly believe that it should, because I 
believe that it is a Good Thing for the database to be consistent throughout. 
It will, however, break existing customizations... and unlike the discussion on 
bug 155628, I don't think that there's any 'aliasing' that would work to get 
around that. In my opinion, however the current hodge-podge is a sign of poor 
guidance and reflects poorly on the tool as a whole. 

Secondly, if this is going to be a first step in making the database 
consistent, it needs to be decided what that consistency is going to *be*. I'm 
ambivalent on whether the overall design should include columns named foo.id or 
foo.foo_id, but the fact that both exist bothers me. I personally like 
foo.foo_id better, but not so much that I'd fight for it if the decision was 
otherwise. Looking at the most recent additions to the database, they are 
almost all of the type foo.id (except for quips, which somehow snuck past with 
foo.fooid), and they outnumber the foo.foo_id category by 11 to 6, so it should 
be less work to change everything (eventually) to foo.id than foo.foo_id. Of 
course we can all speak up and argue one way or the other, but at the end of 
the day, if we're going to go ahead with this, then there is going to need to 
be a Command-Level Decision on How Things Should Be from here on in, because 
then we won't be making any *new* mistakes.

I'm not going to start this today, or even this week. I'd like to get it into 
2.22, but even that may be too optomistic. Nonetheless, the time for discussion 
on the issue is sooner rather than later, so that when the time DOES come that 
it can be done and checked in, there are no more objections.
Per DB normalization standards that I've been taught, fields should be named
dbname_fieldname.  This prevents any collision of field names.  So, in this
case, it would be named profiles.profiles_id.  Better yet, the table should be
renamed to profile and then it could be called profile.profile_id.
Nice comments, Shane - I agree with you wholeheartedly.  A tremendous amount of
clarity, however, can be gained by normalizing the field names.  That way, you
know which ID you're comparing in the select.  While a little ease in coding is
gained by having "id" as the field name, a lot more clarity is gained by having
to do a little more typing.
(In reply to comment #1)
> case, it would be named profiles.profiles_id.  Better yet, the table should be
> renamed to profile and then it could be called profile.profile_id.

I disagree. Table names are all plural and I think it is fine like that. Keep
plural for table names and singular for columns. This renders the code even
easier to read IMO (=> profiles.profile_id, bugs.bug_id, quips.quip_id, etc...).
(In reply to comment #3)
> (In reply to comment #1)
> > case, it would be named profiles.profiles_id.  Better yet, the table should be
> > renamed to profile and then it could be called profile.profile_id.
> 
> I disagree. Table names are all plural and I think it is fine like that. Keep
> plural for table names and singular for columns. This renders the code even
> easier to read IMO (=> profiles.profile_id, bugs.bug_id, quips.quip_id, etc...).

Sounds great to me as long as we agree on a standard that is implemented DB-wide.

I'd like to weigh in for the camp that wants to see tables.id instead of
tables.table_id.  The reason for this is several:

(1) If you've see a .id column you know several things about it naturally: (a)
its a primary key, (b) its almost certainly autogenerated by the database, (c)
its not null, etc.  A .blah_id column could be a foreign key or a primary key,
so it lacks the clarity of a .id column without further context such as the
position in the table or the table name.

(2) Unless you've aliased table names as part of the query you have "bugs.id =
table.bugid" which makes it very clear that the .id is a bug ID.  Even with
aliases you'd have "b.id = c.bugid" which since you've defined b and c shouldn't
be confusing in context.

(3) Any tools for manipulating the database can assume that the .id column is a
primary key.

So basically, it seems to me to be a handy convention that rarely impacts
clarity and in fact more clarity is provided when you can tell which column in a
link is the PK and which is the FK.

Regardless of which direction the consensus ultimately lies, having it be
consistant is the most important thing and I applaud Sean's effort to improve
the consistancy of the bugzilla database schema.
Explicitly adding a few more people to the cc: list whom I know think/care 
about this sort of thing. Yeah, I know it's like herding cats to get everyone 
to agree, but I'm hoping that we can generate *enough* agreement that Dave has 
a fairly large consensus when he decides on the direction for the future.

'Generating a standard that is implemented DB-wide' is what this bug is all 
about, ultimately, so if you've got input on DB schema... please feel free to 
add your two cents worth. Logic and justification is better than just 'I want 
this' ... but something is better than nothing.
(In reply to comment #5)
> I applaud Sean's effort to improve
> the consistancy of the bugzilla database schema.

Oh sure, I do all the skullsweat, and some guy named Sean gets all the credit. 

;-)
(In reply to comment #7)
> (In reply to comment #5)
> > I applaud Sean's effort to improve
> > the consistancy of the bugzilla database schema.
> 
> Oh sure, I do all the skullsweat, and some guy named Sean gets all the credit. 
> 
> ;-)

Its just my sort of luck that I would show my ghastly ineptitude at remembering
names in the midst of discussing naming conventions.  So I'll applaud Shane for
being so poignant in many regards.  :)
This is the kind of thing that I would really love to see happen. It's also the
kind of thing that I think we should keep on the backburner and get it in the
queue when we decide we're ready to call it 3.0.  I'm willing to make major
breaking changes like this when we do a full version number bump.  Such as
fixing all of the primary key columns at once to be consistant all the way
through isntead of having two different rules (either id or table_id).

We have a number of major things coming down the pike soon that'll make it worth
doing that (hack-free packaging capability for distros, multiple database
support, etc).  I have a feeling the cleanest way to make checksetup.pl work
with multiple databases is to start from scratch and pretend it's the first ever
installation when we do 3.0.  We can stash the old checksetup.pl (the database
upgrade part) in a subdirectory of utilities somewhere for people to use when
they upgrade from a version older than 3.0 with MySQL, stick pgsetup.pl along
side it for people upgrading from RH's branch.  Both can update from 2.x to 3.0
and then tell them to run the new Makefile.PL or whatever we call it.  :)
(In reply to comment #9)
> This is the kind of thing that I would really love to see happen.

Good to hear.

> It's also the kind of thing that 
> I think we should keep on the backburner and get it in the
> queue when we decide we're ready to call it 3.0.

Not so good to hear, even though I feel some sympathy for your position and can 
mostly understand your logic.

I feel that the problems with this approach, however, are as follows;

#1: Delay. I'm the one bringing this issue up because I'm willing to work on 
it. I am certain that I'll still be interested in improving Bugzilla tomorrow, 
and that I'll have time to do so. I'm fairly certain that I'll have the time 
and interest to do so when the 2.22 branch opens. the further this gets 
from 'today', however, the hazier this picture gets. Perhaps if I'm not 
around/willing/able to do it then, someone else will pop up with the same 
enthusiasm... but perhaps not.

#2: Size. One cannot swallow a watermelon whole; it is necessary to take bites. 
Even in my short time here, I've seen good ideas snuffed out because they 
didn't do EVERYTHING that needed doing, which is what you seem to be proposing; 
we can't fix *this* table now because it doesn't fix ALL tables now. It seems 
to me that fixing just this one table and doing it right is going to be a big 
enough change already; requiring that whoever do it also take on the burden of 
fixing every other table before they can proceed seems to be destined to kill 
the effort... again.

#3: Paradox. You say that you're willing to allow sweeping, localization-
breaking changes when we uprev to 3.0, but (I'm guessing) there's no reason to 
uprev to 3.0 unless we have some sweeping, localization-breaking changes that 
need to go in. Chicken and egg situation.


If there is going to be a 3.0, then a decision needs to be made at some point 
that 2.xx will be *the last* 2.xx version. From there until the next release -- 
which will be 3.0 -- changes can be as radical and schema-breaking and 
localization-breaking as it needs to be to GET IT DONE RIGHT; rename the 
tables, rename the fields, reorder the directories, fix all the variables, etc. 
That decision will create a lot of work and a pressing need for a lot of 
attention in a LOT of areas; you've already made some mention of the size of 
the issue in your previous comment. I would place money down now that it cannot 
be done on the currently projected six-month release schedule (although I'd be 
pleased to be proven wrong). 

If that's really your stand -- that this bug cannot be resolved until 3.0 when 
every other table is fixed at the same time -- then please set the milestone 
appropriately. Regardless, I think that it's still worth the discussion on how 
it should be done on that day (by whoever is still around at the time) so that 
there is at least *one* less thing that needs to be ironed out and resolved. 
3.0 is going to be busy enough that anything that can be decided ahead of time 
should be.
Tables should be named singularly. It makes a lot of logic dealing with
perl-generated SQL a LOT easier (see bug 17453, where we already had this
discussion, and the patch proves it).

table.id is better -- table.table_id will not avoid collisions, it will just
make them more confusing. We would have, for example, profile_id in multiple
tables. In some places it would be a foreign key, in some places it would be a
primary key. We already have a bug_id in multiple tables, I believe.

(In reply to comment #9)
>I have a feeling the cleanest way to make checksetup.pl work
> with multiple databases is to start from scratch and pretend it's the first ever
> installation when we do 3.0. 

  Actually, I've actually been looking at it, and making checksetup cross-DB
compatible won't be impossible at all. See bug 277617, the blockers on bug
248175, and the discussion starting at bug 237862 comment 55. 

  In any case, I'd rather not throw away 5000 lines of working code. :-)
(In reply to comment #9)

While I respect the logic at making this a 3.0 type change I disagree.  While
this change will certainly need some extensive shaking out, waiting to do so in
a "this is a big release" sort of cycle seems the worst time to get over the
hump of that shaking out.

However, given the barnacled mass that checksetup.pl has become dumping it at a
3.0 release would make a tremendous amount of sense.  Until a 3.0 cycle, keep
accumulating barnacles in checksetup.pl to fix the various things that need to
be fixed incrementally.  Until the obvious things are fixed and cleaned up its
going to be hard to get down to serious discussions about what's wrong or right
about the database.

The only thing that I would want to be a blocker for progress on database schema
cleanup is a mostly complete Perl API that could be used to easily construct a
test suite on top of.  With a test suite that had adequate coverage these
changes could be made and vetted thoroughly before victimizing anyone to
actually test it by hand.  (This would also make it easy to create a
statiscically adjustable random Bug creator for testing purposes.)
(In reply to comment #12)
> However, given the barnacled mass that checksetup.pl has become dumping it at a
> 3.0 release would make a tremendous amount of sense. 

  Actually, I think that checksetup can be refactored quite nicely. But that's
probably a discussion for somewhere else than this bug. :-) (But see my above
comment.)
(In reply to comment #13)
> (In reply to comment #12)
> > However, given the barnacled mass that checksetup.pl has become dumping it at a
> > 3.0 release would make a tremendous amount of sense. 
> 
>   Actually, I think that checksetup can be refactored quite nicely. But that's
> probably a discussion for somewhere else than this bug. :-) (But see my above
> comment.)

I wonder if Dave's idea to rewrite checksetup.pl doesn't make total sense. 
After all, we're starting at a vastly different rev. and a 2.xx -> 3.0 script
can be provided without naming it checksetup.pl.  It seems to me that 3.0 ought
to have its own (new) checksetup.pl for the simple reason that admins
maintaining an existing DB shouldn't need the 2.xx checks/upgrades and people
putting 3.0 over top of 2.xx should know that an upgrade program is provided
that should only be run once.  Heck, we might even want to have checksetup.pl
call the external upgrade program if a file like .bugzilla_version_id doesn't
exist or if the version isn't right, or if a command line parameter is specified
to force a DB check is given.  The .buzilla_version_id file should have comments
in it not to hand-modify the file.

Also, I wonder if functions shouldn't be created for each version upgrade so
that readers can see what DB changes were made when, and more importantly, to
insure that the changes happen in proper order.  It would also make it easier
for customizers to see where to put their code.

Maybe I'm just rambling, but these are just a few thoughts to help us get there...

Back on to the topic at hand, even though it is very convenient to have just
.id, if our foreign keys are named tablename_id, what benefit does it serve?  I
still believe we should name the primary key tablename_id to keep it consistent.
 Regardless, what ever the group decides, I'm willing to go along with.
Putting it off until 3.0 doesn't mean putting it off forever.  The criteria for
what I want to be able to call it 3.0 is stuff that's within reach right now. 
Most of these changes I've been talking about have had a lot of activity on them
lately, and are showing signs of actually landing soon.  It would be good to
take advantage of it.  (That, and we've been 2.x for almost 7 years, which is a
long time for a software project that gains features as fast as we do.
QA Contact: mattyt-bugzilla → default-qa
Assignee: shane.h.w.travis → general
Severity: normal → enhancement
Summary: Profiles.userid should be either profiles.profile_id or profiles.id → Profiles.userid should be profiles.id
Assignee: general → database
Component: Bugzilla-General → Database
You need to log in before you can comment on or make changes to this bug.