Bugzilla uses "mediumtext" as a DB data type when it's not necessary

RESOLVED FIXED in Bugzilla 3.2

Status

()

RESOLVED FIXED
17 years ago
5 years ago

People

(Reporter: bbaetz, Assigned: xiaoou.wu)

Tracking

(Blocks: 2 bugs)

2.17
Bugzilla 3.2
Dependency tree / graph
Bug Flags:
approval +

Details

Attachments

(1 attachment, 8 obsolete attachments)

(Reporter)

Description

17 years ago
mediumtext may be up to 2^24-1 characters, or just under 16 megs (justdaves
comment in bug 96431 about this is incorrect)

The following fields may be up to 16 megs in length:
  - The description of an attachment
  - The mimetype for an attachment
  - The filename for an attachment
  (These are all longer than the size for the attachment, see below)
  - The description for an attachment status
  - the short_desc for a bug (bug 96431)
  - the status whiteboard for a bug
  - the keyword cache for a bug
  - text for a comment (this is probably ok)
  - description for a component.
  - description for a product
  - disabledtext for a user
  - email flags for a user (covered under bug 73665)
  - stored named query
  - description for fielddefs
  - description for a keyword
  - command for a shadowlog (ignore this)

- An attachment is longblob, or 4 Gigs.

The mysql client/server protocol limits packet sizes to 16 megs, and the default
is 1 meg. This makes longblob totally useless, and mediumtext semi-useless for
most things (the actual attachment/comments being an exception, since those have
separate limits)

We need to have limits checked in the code for all this stuff, and then change
the schema.
What are the recommended MySQL types for text strings of various lengths?

Gerv
(Reporter)

Comment 2

16 years ago
VARCHAR(n) where n is the maximum size of the string
Also, Sybase won't let you ORDER BY its equivalent to mediumtext, because you
can't index blobs (and its closest equivalent to mediumtext is a blob type),
which means I had to make schema changes to get ordering to work properly on
group descriptions.
Blocks: 173130

Comment 4

16 years ago
This is what I'm thinking as of right now for new field lenghts using varchar():
    #                Table           Field               New Max Length
    my @fields = ( ['attachments',  'description',       128],
                   ['attachments',  'mimetype',          64],
                   ['bugs',         'short_desc',        255],
                   ['bugs',         'status_whiteboard', 255],
                   ['bugs',         'keywords',          255],
                   ['components',   'descritpion',       255],
                   ['products',     'description',       255],
                   ['profiles',     'disabledtext',      512],
                   ['namedqueries', 'query',             255],
                   ['fielddefs',    'description',       32],
                   ['keyworddefs',  'description',       255] );
 
Does that seem sensible to everyone?
Assignee: justdave → jake
what about groups.description?

Comment 6

16 years ago
I was just using Bradley's list which is limited to those specifically using
mediumtext.  Should we fix all 'text' types?
(Reporter)

Comment 7

16 years ago
mysql won't let you do ORDER BY either, really - it just pretends with the first
1024 bytes (changable via a server config option)

Why have you used 255, but then 512? We should check the fencpost for an extra
byte on teh various dbs, and then use that. 
For that matter, why use "255" instead of "250"?  Seems like the logical
requirements of the fields should dictate their size, not the technical
capabilities of the database.
reassigning to default owner/QA... Jake's Army Reserve unit has been deployed.
Assignee: jake → justdave
Reassigning bugs that I'm not actively working on to the default component owner
in order to try to make some sanity out of my personal buglist.  This doesn't
mean the bug isn't being dealt with, just that I'm not the one doing it.  If you
are dealing with this bug, please assign it to yourself.
Assignee: justdave → general
QA Contact: mattyt-bugzilla → default-qa

Updated

14 years ago
OS: Linux → All
Hardware: PC → All
Target Milestone: --- → Bugzilla 2.22

Comment 12

14 years ago
FWIW, I'd like to split this up into multiple bugs, one for each field that
needs to be changed. Or at least, one for each table containing fields that need
to be changed.

Updated

13 years ago
Blocks: 310718

Updated

13 years ago
Target Milestone: Bugzilla 2.22 → Bugzilla 2.24

Comment 13

12 years ago
This bug is retargetted to Bugzilla 3.2 for one of the following reasons:

- it has no assignee (except the default one)
- we don't expect someone to fix it in the next two weeks (i.e. before we freeze the trunk to prepare Bugzilla 3.0 RC1)
- it's not a blocker

If you are working on this bug and you think you will be able to submit a patch in the next two weeks, retarget this bug to 3.0.

If this bug is something you would like to see implemented in 3.0 but you are not a developer or you don't think you will be able to fix this bug yourself in the next two weeks, please *do not* retarget this bug.

If you think this bug should absolutely be fixed before we release 3.0, either ask on IRC or use the "blocking3.0 flag".
Target Milestone: Bugzilla 3.0 → Bugzilla 3.2
(Assignee)

Comment 14

12 years ago
Is it possible for us to break the MEDIUMTEXT into two types, one's length > 4000, and the other's length < 4000 ?

So we can use varchar(4000) instead of LOBs(which need a special way to do with in Oracle) in some case when the fields not really need a max length of 65535.



Oracle doens't support:
1. Using a CLOB column in a SELECT... DISTINCT or SELECT...
UNIQUE statement.

For example:

line #446 in  Bugzilla::User  " $query = 'SELECT DISTINCT id, name, description FROM groups';"


2. Comparing CLOB with String.
For example:

line #647 in Bugzilla::Field

$dbh->selectrow_array('SELECT id, name FROM fielddefs
                               WHERE description = ?',
                             undef, $field_description);


3 inserting stuffs as normal varchar.





Comment 15

12 years ago
(In reply to comment #14)
> Is it possible for us to break the MEDIUMTEXT into two types, one's length >
> 4000, and the other's length < 4000 ?

  Yeah, I think this is a good idea. I think we should have only two text types, one called TEXT and the other called LONGTEXT. TEXT can be 4000 characters, and LONGTEXT can be many MB.

  Instead of TINYTEXT, we should probably just use varchar(255) directly. We don't gain much in Bugzilla from using TINYTEXT.

  We'll have to fix the schema generally, for this. Also, for converting fields to the "new" TEXT types, we should file one bug at a time, for each field. (Unless there are several that it makes sense to group together.)

  Also, we should up the SCHEMA_VERSION to 2.1 after we do this, because we'll have to convert old schemas when we load them. (That's done by deserialize_schema.)
(Assignee)

Comment 16

11 years ago
Created attachment 287478 [details] [diff] [review]
Bugzilla::DB::Schema patch

Change type TINYTEXT to varchar(255), MEDIUMTEXT to TEXT ( or LONGTEXT for those require a length>4000).
Attachment #287478 - Flags: review?(mkanat)

Updated

11 years ago
Attachment #107319 - Attachment is obsolete: true

Updated

11 years ago
Assignee: general → xiaoou.wu
Component: Bugzilla-General → Database
Summary: mediumtext abuse → Bugzilla uses "mediumtext" as a DB data type when it's not necessary

Comment 17

11 years ago
Comment on attachment 287478 [details] [diff] [review]
Bugzilla::DB::Schema patch

  I think MEDIUMTEXT is actually an OK name for the 4000-character string. So we should keep that.

  But anything longer than 4000 characters should be converted to LONGTEXT.

  This means that you have to add the LONGTEXT type to the Pg and MySQL drivers.

  Every change will require a modification to Bugzilla::Install::DB. Before changing the type you'll have to check that nothing in the DB currently exceeds the length, for the fields that you actually change the length of. (For example, if you change a MEDIUMTEXT to a varchar(255).)

  If you make a change in the underlying data types for MySQL or Pg, then you may have to add some code to bz_setup_database in the Pg and MySQL drivers, as required.

>@@ -372,8 +372,8 @@
>                              PRIMARYKEY => 1},
>             bug_id       => {TYPE => 'INT3', NOTNULL => 1},
>             creation_ts  => {TYPE => 'DATETIME', NOTNULL => 1},
>-            description  => {TYPE => 'MEDIUMTEXT', NOTNULL => 1},
>-            mimetype     => {TYPE => 'MEDIUMTEXT', NOTNULL => 1},
>+            description  => {TYPE => 'TEXT', NOTNULL => 1},

  This should just be varchar(255). There's no reason to have an attachment description longer than 255 characters.

>+            mimetype     => {TYPE => 'TEXT', NOTNULL => 1},

  This should be varchar(128). I can't imagine a mimetype longer than that.

>@@ -525,7 +525,7 @@
>                             DEFAULT => FIELD_TYPE_UNKNOWN},
>             custom      => {TYPE => 'BOOLEAN', NOTNULL => 1,
>                             DEFAULT => 'FALSE'},
>-            description => {TYPE => 'MEDIUMTEXT', NOTNULL => 1},
>+            description => {TYPE => 'TEXT', NOTNULL => 1},

  That can be varchar(128).

>@@ -779,7 +779,7 @@
>                                             COLUMN => 'userid',
>                                             DELETE => 'CASCADE'}},
>             name         => {TYPE => 'varchar(64)', NOTNULL => 1},
>-            query        => {TYPE => 'MEDIUMTEXT', NOTNULL => 1},
>+            query        => {TYPE => 'TEXT', NOTNULL => 1},

  There *is* a chance of that being longer than 4000 characters. So I guess that will have to be LONGTEXT.

>@@ -1065,7 +1065,7 @@
>             name        => {TYPE => 'varchar(64)', NOTNULL => 1},
>             frequency   => {TYPE => 'INT2', NOTNULL => 1},
>             last_viewed => {TYPE => 'DATETIME'},
>-            query       => {TYPE => 'MEDIUMTEXT', NOTNULL => 1},
>+            query       => {TYPE => 'TEXT', NOTNULL => 1},

  Same there.

>@@ -1155,7 +1155,7 @@
>                                             COLUMN => 'userid',
>                                             DELETE => 'CASCADE'}},
>             subject      => {TYPE => 'varchar(128)'},
>-            body         => {TYPE => 'MEDIUMTEXT'},
>+            body         => {TYPE => 'TEXT'},

  And I think that should be LONGTEXT too.
Attachment #287478 - Flags: review?(mkanat) → review-

Comment 18

11 years ago
Comment on attachment 287478 [details] [diff] [review]
Bugzilla::DB::Schema patch

Oh, and you also need to update the POD in Schema.pm about the data types, to note that MEDIUMTEXT is a maximum of 4000 characters on the most limited DB, and LONGTEXT is up to 16MB.
(Assignee)

Comment 19

11 years ago
Created attachment 287954 [details] [diff] [review]
mediumtext patch
Attachment #287954 - Flags: review?(mkanat)
(Assignee)

Updated

11 years ago
Attachment #287954 - Attachment is obsolete: true
Attachment #287954 - Flags: review?(mkanat)
(Assignee)

Comment 20

11 years ago
Created attachment 287957 [details] [diff] [review]
mediumtext patch
Attachment #287957 - Flags: review?(mkanat)

Comment 21

11 years ago
Comment on attachment 287957 [details] [diff] [review]
mediumtext patch

>@@ -1748,7 +1749,7 @@
>-    push(@statements, "ALTER TABLE $table ADD COLUMN $column " .
>+    push(@statements, "ALTER TABLE $table". ADD_COLUMN ." $column " .

 This has nothing to do with this patch.

  Now, you have to add code to Bugzilla::Install::DB that modifies all of these fields, and makes sure that their content isn't longer than their new limit. If the content is longer than the new limit, you are allowed to die and inform the administrator.
Attachment #287957 - Flags: review?(mkanat) → review-

Updated

11 years ago
Blocks: 310717
(Assignee)

Comment 22

11 years ago
Created attachment 290812 [details] [diff] [review]
mediumtext.diff
Attachment #290812 - Flags: review?(mkanat)

Comment 23

11 years ago
Comment on attachment 290812 [details] [diff] [review]
mediumtext.diff

>+    # 2007-11-29 xiaoou.wu@oracle.com - Bug 153129
>+    $dbh->bz_alter_column('bugs', 'bug_file_loc',
>+        { TYPE => 'MEDIUMTEXT'}, '');
> [snip]

  Okay, I want this all to go into a subroutine at the bottom. It can be called _change_text_types.

  *All* fields *must* be checked *before* being modified. So, the easiest thing to do is to make a hash of all the new fields and call _check_content_length_before_upgrade. The length needs to be the length that's used by the DB, not some number you type in inside of Bugzilla::Install::DB.

  Finally, you don't need the fourth argument to bz_alter_column. You only need that when you're setting something NOT NULL that used to be NULL.

>+sub _check_length_of_content {
>+    my ($table_name, $field_name, $length) = @_;
>+    my $dbh = Bugzilla->dbh;
>+    my $content_lengths = $dbh->selectcol_arrayref("SELECT LENGTH($field_name)
>+                                                    FROM $table_name");
>+    foreach my $content_length ( @$content_lengths ){
>+        if ($content_length > $length) {
>+            die("The content of field <$field_name> in table <$table_name>\n" 
>+                . "is longer than its new limit, <$length> characters.\n"
>+                . "Please make sure that it would not happen, or the upgrade"
>+                . " will never go on.");

  You need to show the content that is longer. Show every single string that's too long. If they're longer than 80 characters, cut them off with a "..." *in the middle*, so both the beginning and end show.>--- 

../cvs/bugzilla/checksetup.pl	2007-10-19 14:46:10.000000000 +0800
>+Bugzilla::Install::DB::check_content_length_before_upgrade();

  No, this should just be a part of Install::DB. I don't want checksetup to check this every time it runs, so wrap it in an "if" like other schema changes in Bugzilla::Install::DB. If you read the file, how we do this will be very obvious.
Attachment #290812 - Flags: review?(mkanat) → review-
(Assignee)

Comment 24

11 years ago
(In reply to comment #23)
How can I get the size of specific type like TINYTEXT, MEDIUMTEXT, LONGTEXT ?  And I think we can just only check those fields that are changed shorter, do you think so?

Comment 25

11 years ago
(In reply to comment #24)
> (In reply to comment #23)
> How can I get the size of specific type like TINYTEXT, MEDIUMTEXT, LONGTEXT ? 

  You'll have to figure that out.

> And I think we can just only check those fields that are changed shorter, do
> you think so?

  Yes, we only need to check the ones that we change to a shorter type. Since in PostgreSQL they're all the same size, we only need to check their sizes in MySQL. So you could just check their sizes in Bugzilla::DB::MySQL::bz_setup_database, if you want. Only do it if their type is the old type, though.
Status: NEW → ASSIGNED
(Assignee)

Comment 26

11 years ago
Created attachment 291368 [details] [diff] [review]
mediumtext patch
Attachment #291368 - Flags: review?(mkanat)

Comment 27

11 years ago
Comment on attachment 291368 [details] [diff] [review]
mediumtext patch

>diff -ru ../cvs/bugzilla/Bugzilla/Install/DB.pm Bugzilla/Install/DB.pm
>@@ -522,6 +522,10 @@
>     Bugzilla::Hook::process('install-update_db');
> 
>     $dbh->bz_setup_foreign_keys();
>+    # Check length of the field content before upgrade.
>+    # 2007-11-29 xiaoou.wu@oracle.com - Bug 153129
>+    check_content_length_before_upgrade();

  No, add it above the --TABLE-- part, just like the very large, obvious comment says to.

  Also, call it change_text_types() instead.

>@@ -2898,6 +2902,72 @@
>+sub check_content_length_before_upgrade {
>+    my $dbh = Bugzilla->dbh; 
>+    return if $dbh->isa('Bugzilla::DB::Oracle'); 

  No. Instead, do:

  return if $dbh->bz_column_info('longdescs', 'thetext')->{TYPE} eq 'MEDIUMTEXT';

>diff -ru ../cvs/bugzilla/Bugzilla/Constants.pm Bugzilla/Constants.pm
>+    MAX_TINYTEXT_SIZE
>+    MAX_MEDIUMTEXT_SIZE
>+    MAX_LONGTEXT_SIZE

  These constants should be DB-specific instead. They can have defaults in Bugzilla::DB, and then they can be overridden in DB drivers as necessary. For example, MySQL has a very long MEDIUMTEXT length.
Attachment #291368 - Flags: review?(mkanat) → review-

Updated

11 years ago
Attachment #287957 - Attachment is obsolete: true

Updated

11 years ago
Attachment #290812 - Attachment is obsolete: true

Updated

11 years ago
Attachment #287478 - Attachment is obsolete: true
(Assignee)

Comment 28

11 years ago
Created attachment 291577 [details] [diff] [review]
mediumtext patch
Attachment #291368 - Attachment is obsolete: true
Attachment #291577 - Flags: review?(mkanat)

Comment 29

11 years ago
Comment on attachment 291577 [details] [diff] [review]
mediumtext patch

>diff -ru ../cvs/bugzilla/Bugzilla/Install/DB.pm Bugzilla/Install/DB.pm
>@@ -514,6 +514,9 @@
> 
>     # 2007-08-21 wurblzap@gmail.com - Bug 365378
>     _make_lang_setting_dynamic();
>+    
>+    # 2007-11-29 xiaoou.wu@oracle.com - Bug 153129
>+    change_text_types();
> 
>     ################################################################
>     # New --TABLE-- changes should go *** A B O V E *** this point #
>@@ -2898,6 +2901,72 @@
>     }
> }
> 
>+sub change_text_types {
>+    my $dbh = Bugzilla->dbh; 
>+    return if $dbh->bz_column_info('longdescs', 'thetext')->{TYPE} eq 
>+              'MEDIUMTEXT';

  Oh, my mistake. I realized that you change this above, so this will always be true, at this point. You'll have to pick something else, like checking the type of attachments.description or something.

>+WARNING: Some of your fields have content longer than its new limit $max_length
> [snip]

  This string needs to be localizeable, so it should be using install_string from Bugzilla::Install::Util.

>+            }else {

  Nit: } else {

>diff -ru ../cvs/bugzilla/Bugzilla/DB.pm Bugzilla/DB.pm
>+use constant MAX_TINYTEXT_SIZE => 255;

  I just realized, we only need this one, right? So that's my mistake, you can just put this directly as a number into Bugzilla::Install::DB, like you had it originally, and you don't need to check against the type.
Attachment #291577 - Flags: review?(mkanat) → review-
(Assignee)

Comment 30

11 years ago
(> >+sub change_text_types {
> >+    my $dbh = Bugzilla->dbh; 
> >+    return if $dbh->bz_column_info('longdescs', 'thetext')->{TYPE} eq 
> >+              'MEDIUMTEXT';
>   Oh, my mistake. I realized that you change this above, so this will always be
> true, at this point. You'll have to pick something else, like checking the type
> of attachments.description or something.

Do you want it do nothing to current MySQL or Pg? if not, I think it should
be something like this:
return if $dbh->bz_column_info('attachments', 'description')->{TYPE} eq
              'TINYTEXT';
or just check if it is not Oracle. 
(Assignee)

Comment 31

11 years ago
Created attachment 292515 [details] [diff] [review]
mediumtext patch
Attachment #291577 - Attachment is obsolete: true
Attachment #292515 - Flags: review?(mkanat)

Comment 32

11 years ago
Created attachment 292527 [details] [diff] [review]
v7 (or so)

Okay, I took your patch, modified it slightly, and this is what I'm going to check in. Thank you.
Attachment #292515 - Attachment is obsolete: true
Attachment #292527 - Flags: review+
Attachment #292515 - Flags: review?(mkanat)

Comment 33

11 years ago
Checking in Bugzilla/DB/Schema.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v  <--  Schema.pm
new revision: 1.91; previous revision: 1.90
done
Checking in Bugzilla/DB/Schema/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Mysql.pm,v  <--  Mysql.pm
new revision: 1.18; previous revision: 1.17
done
Checking in Bugzilla/DB/Schema/Pg.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Pg.pm,v  <--  Pg.pm
new revision: 1.15; previous revision: 1.14
done
Checking in Bugzilla/Install/DB.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Install/DB.pm,v  <--  DB.pm
new revision: 1.45; previous revision: 1.44
done
Checking in template/en/default/setup/strings.txt.pl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/setup/strings.txt.pl,v  <--  strings.txt.pl
new revision: 1.5; previous revision: 1.4
done
Status: ASSIGNED → RESOLVED
Last Resolved: 11 years ago
Flags: approval+
Resolution: --- → FIXED

Comment 34

11 years ago
Comment on attachment 292527 [details] [diff] [review]
v7 (or so)

>Index: Bugzilla/DB/Schema.pm
>@@ -469,7 +469,7 @@
>-            description      => {TYPE => 'TEXT'},
>+            description      => {TYPE => 'MEDIUMTEXT'},



>Index: Bugzilla/Install/DB.pm

>+sub change_text_types {

>+    $dbh->bz_alter_column('flagtypes', 'description',
>+        { TYPE => 'MEDIUMTEXT', NOTNULL => 1 });


This patch is broken: new installations have NOT NULL => 0 while upgrades get NOT NULL => 1. This makes tinderbox to fail. I will file a bug about it.

Updated

11 years ago
Blocks: 409463

Updated

6 years ago
Blocks: 824701
You need to log in before you can comment on or make changes to this bug.