Open Bug 535821 Opened 15 years ago Updated 12 years ago

All tables should have a Primary Key

Categories

(Bugzilla :: Database, defect)

defect
Not set
normal

Tracking

()

ASSIGNED

People

(Reporter: mockodin, Assigned: mockodin)

References

(Blocks 1 open bug)

Details

(Whiteboard: [wanted-bmo])

To prevent table scans all tables should include a primary key. For MSSQL this reduces table lock pauses when scanning large tables, such as bugs_activity. 

PKeyless tables or tables with no clustered index (pkey is always clustered), incur heavy performance penalties.
MySQL uses the first unique index as a PK if there is no PK. And for tables that lack one, InnoDB (and PostgreSQL, I believe) have an internal row pointer to be the clustering index.

If MS-SQL doesn't do the same, I'd suggest that the driver be modified to do something about it, probably in bz_setup_database.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
Rather than marking WONTFIX this should, if not being rolled in to the primary Bugzilla DB schema be marked MSSQL specific. I can take care of making MSSQL create its own primary key if one does not otherwise exist. Though I have to ask, what is the downside of having a primary key? None that I can think of, least none that isn't common and easy to deal with. At least with MSSQL the internal row pointer you refer to is exactly what I wish to avoid. Looking at a cost estimate on query without one the row scan caused by the lack of accounts usually for 50-50% of the query cost compared to other tables in a join that have the cluster/primary indexes.
Summary: All tables should have a Primary Key → [MSSQL] All tables should have a Primary Key
should read 50-80%
(In reply to comment #2)
> Rather than marking WONTFIX this should, if not being rolled in to the primary
> Bugzilla DB schema be marked MSSQL specific.

  Okay.

> Though I have to ask, what is the downside of having a primary key?

  Purposeless data that most databases don't need.

  If you think that there are some tables that, in Bugzilla's architecture, would benefit from (in a logical sense) having a Primary Key, than that's something else to discuss entirely.

> Looking at a
> cost estimate on query without one the row scan caused by the lack of accounts
> usually for 50-50% of the query cost compared to other tables in a join that
> have the cluster/primary indexes.

  That's a "feature" that I suspect is unique to MS-SQL, though you're welcome to test on the other DBs that we support.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Severity: normal → enhancement
(In reply to comment #4)
>   Purposeless data that most databases don't need.

Perhaps, though not in mssql's case apparently :-)

>   If you think that there are some tables that, in Bugzilla's architecture,
> would benefit from (in a logical sense) having a Primary Key, than that's
> something else to discuss entirely.

I'd rather take the cheap way out and apply to all tables that have no key, this prevents the need to changes in the future. The foot print of a identity column is fairly small and space is cheap to begin with.

My currently implemented fix for mssql overrides _adjust_schema doing two things:

1. Allows overriding of all data types regardless of precision , for example
     VARCHAR(64)  => NVARCHAR(64)
Using the default db_specific overide would require explicitly specifying 'VARCHAR(64)' => NVARCHAR(64)' instead of just 'VARCHAR' => 'NVARCHAR'

2. Prior to converting datatypes I prescan the tables field defs for a primary key if none is found then a new pk is pushed in to the field defs for the table
${table}_id this is done before the data conversion so that if this ever gets placed in the default _schema_adjust it should be good to go.
Assignee: database → mockodin
Status: REOPENED → ASSIGNED
Blocks: bz-mssql
We have also run into similar issue, where we are trying to make Bugzilla part of MySQL Replication, and we noticed there are many tables without a Primary Key.

I.e.

We are trying to do the following:

1.  We have MySQL (5.5) Database setup with master and a slave database on another host.
2.  We also have Bugzilla application configured (Version 3.0.4) on the MySQL instance.

One issue which we have encountered with MySQL Replication is on the Slave host, the CPU continues to spin when it encounters a large INSERT/DELETE issued on a table without a Primary Key.

3.  For tables installed by Bugzilla application, we noted there are many tables without a Primary Key.
4.  But we would like the Bugzilla database to be part of a replication.

5.  However we do not want to manually add Primary Keys to the tables created by the application as we do not know what impact it will have on its functionality.


1.  Has anyone implemented above successfully? 
2.  Or have any suggestions/thoughts on possible workarounds?

Any feedback will be very helpful.
Whiteboard: [wanted-bmo]
this isn't a ms-sql specific issue as it impacts my-sql replication.
Severity: enhancement → normal
Summary: [MSSQL] All tables should have a Primary Key → All tables should have a Primary Key
Is there any update to this Bug?   Or any temporary resolution?
We are trying to use Bugzilla (Version 3.0.4) with MySQL Replication, does anyone know whether if new version of Bugzilla addresses this issue?
As far as the new version I don't know, likely suspect it hasn't. For MSSQL yes I'd implemented it, though I haven't maintained the instance at my office in a over a year but it had been running under MSSQL successfully with this issue is resolved. I would note that despite my MSSQL code changes never being rolled into the main release at the point I created them. They worked. My lack of continued effort was directly related to the lack of interest by the pervious lead developers in-a-tangible-support-with-out-pulling-teeth kind of way. I have since moved on.

I can look around and see if I have a copy of the code I had used. That said its not difficult to implement. Take a look at Schema.pm any table not listing a primary could have a serial field added and set as primary/clustered index. Or with extra effort analyze the fields in the existing tables some of them can have a clustered index implemented on a combination of fields. If you follow the later you'll still find most tables need the former solution.
(In reply to Michael Thomas (Mockodin) from comment #10)
> 
> I can look around and see if I have a copy of the code I had used. That said
> its not difficult to implement. Take a look at Schema.pm any table not
> listing a primary could have a serial field added and set as
> primary/clustered index. 

Michael,  thank you for the update.

1.  Are you referring to the copy of code for schema.pm file?  Can you provide the copy of the code if possible for us to at least take a look at?  Thanks
Depends on: 764457
Depends on: 764466
You need to log in before you can comment on or make changes to this bug.