Closed Bug 902504 Opened 12 years ago Closed 12 years ago

Clean test suite run fails on mysql 5.6

Categories

(Webtools Graveyard :: Elmo, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: peterbe, Assigned: peterbe)

Details

Attachments

(4 files, 5 obsolete files)

When using MySQL 5.6 it's impossible to run:: FORCE_DB=true python manage.py test --noinput You get the traceback in attachment.
Asking for feedback here because I'm stumped. Without this patch I can't run the test suite because otherwise I get errors such as:: FATAL ERROR - The following SQL query failed: CREATE INDEX `l10nstats_unchangedinfile_6e62a245` ON `l10nstats_unchangedinfile` (`file`); The error was: (1071, 'Specified key was too long; max key length is 767 bytes') What are the limitations about indexes on varchars? Also, we're running it with UTF8 so, is VARCHAR(400) actually 800 which is greater than 767? Characters vs. bytes is still hard for me.
Attachment #786995 - Flags: feedback?(scabral)
Attachment #786995 - Flags: feedback?(l10n)
InnoDB has the limit that a max key length is 767 bytes (MyISAM doesn't have that limit). UTF8 actually is either 3 or 4 bytes, depending on the character, so VARCHAR(400) has a max possible value of 1200 bytes.
FWIW, it's the index value that has that limit in InnoDB, not the data value. So you can have a varchar(400) field in UTF8 no problem, the data size can be that large. It's making an index that's a problem.
Considering that we (Mozilla and IT as a whole) are going to upgrade to MySQL 5.6 at some point in the nearish future; what else can we do? I'm now entirely blocked on being able to run the tests on my system. What say you Axel? :sheeri, What's your opinion about having indexes on large fields?
This is weird. The indexes are fine, actually. They're not perfect, but OK. ./manage.py syncdb --migrate work, for example. That's the code path that ./manage.py test ends up in for the test db creation, with one difference, it's specifying 'load_initial_data': False in options in south's migrate command. Not that I understand how that's making a difference.
Comment on attachment 786995 [details] [diff] [review] Now I can run the whole test suite in mysql 5.6 Review of attachment 786995 [details] [diff] [review]: ----------------------------------------------------------------- As I mentioned in the bug, I don't think this is necessary. MySQL docs say that short indexes are only bad if the entries have a unique constraint. Also things work fine when trying to create the db outside of the test code path.
Attachment #786995 - Flags: feedback?(l10n) → feedback-
I have no problem running the test suite locally. $ mysql --version mysql Ver 14.14 Distrib 5.5.32, for debian-linux-gnu (x86_64) using readline 6.2 $ ./manage.py test /home/adrian/mozilla/workspace/elmo/vendor/src/funfactory/funfactory/manage.py:47: UserWarning: You're using an old-style Playdoh layout with a top level __init__.py and apps directories. This is error prone and fights the Zen of Python. See http://playdoh.readthedocs.org/en/latest/getting-started/upgrading.html warnings.warn("You're using an old-style Playdoh layout with a top " /home/adrian/mozilla/workspace/elmo/vendor-local/lib/python/pytz/__init__.py:35: UserWarning: Module pytz was already imported from /home/adrian/mozilla/workspace/elmo/vendor-local/lib/python/pytz/__init__.pyc, but /home/adrian/mozilla/workspace/elmo/env/lib/python2.7/site-packages is being added to sys.path from pkg_resources import resource_stream nosetests --verbosity 1 Creating test database for alias 'default'... ................................................................................................................................................................................... ---------------------------------------------------------------------- Ran 179 tests in 38.492s OK This was achieved from a non-existing test database. The same happens if I force the database creation: $ FORCE_DB=1 ./manage.py test /home/adrian/mozilla/workspace/elmo/vendor/src/funfactory/funfactory/manage.py:47: UserWarning: You're using an old-style Playdoh layout with a top level __init__.py and apps directories. This is error prone and fights the Zen of Python. See http://playdoh.readthedocs.org/en/latest/getting-started/upgrading.html warnings.warn("You're using an old-style Playdoh layout with a top " /home/adrian/mozilla/workspace/elmo/vendor-local/lib/python/pytz/__init__.py:35: UserWarning: Module pytz was already imported from /home/adrian/mozilla/workspace/elmo/vendor-local/lib/python/pytz/__init__.pyc, but /home/adrian/mozilla/workspace/elmo/env/lib/python2.7/site-packages is being added to sys.path from pkg_resources import resource_stream nosetests --verbosity 1 Creating test database for alias 'default'... Got an error creating the test database: (1007, "Can't create database 'test_elmo'; database exists") Type 'yes' if you would like to try deleting the test database 'test_elmo', or 'no' to cancel: yes Destroying old test database 'default'... ................................................................................................................................................................................... ---------------------------------------------------------------------- Ran 179 tests in 54.305s OK
Attached file unable-to-migrate.txt
So, it's not just a problem with tests. Off of "develop" branch, I can't create the tables. I created a brand new database called "elmo_tmp", updated settings/local.py accordingly and ran the syncdb && migrate. Failed.
I think that what's really suspect is that when I run `FORCE_DB=1 ./manage.py test --nonput` it eventually barfs on one of the index creations. BUT, 50 of the 75 tables are created. Just not with the indexes. mysql> explain mbdb_file; +-------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | path | varchar(400) | NO | | NULL | | +-------+--------------+------+-----+---------+----------------+ 2 rows in set (0.00 sec) mysql> CREATE INDEX `mbdb_file_6a8a34e2` ON `mbdb_file` (`path`); ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes I guess, this is the first index creation attempt once the table has been created and this error is breaking things. :sheeri, can you confirm that it's possible or not possible to create indexes on columns even though they don't have a uniqueness constraint on them?
Indexes can exist on columns that are non-unique (and they should! they're useful!). If you want indexes on those columns, you need a prefix index, like CREATE INDEX `mbdb_file_6a8a34e2` ON `mbdb_file` (`path`(190)); That will create an index on the first 190 characters of "path". (I chose 190, because 190 * 4 = 760, so it's close but doesn't exceed the 767 limit. If you are mostly using 3-byte UTF8 characters, you can use 255 characters....which is, I believe, why they picked 767 as the limit in the first place :D). But if you want to play it safe and it's enough characters for you, 190 chars is what you want.
That command works. Considering that the above SQL I posted is taken from out of the Django ORM (using the South library) I really don't know how we can inject that extra limit on the length through python.
It looks like django doesn't actually give you that ability. How frustrating. Everyone just says "oh just hash the long text and index the hash". What a pain. However, in MySQL 5.6, according to https://dev.mysql.com/doc/refman/5.6/en/innodb-restrictions.html: "When the innodb_large_prefix configuration option is enabled, this length limit is raised to 3072 bytes, for InnoDB tables that use the DYNAMIC and COMPRESSED row formats." innodb_large_prefix is off by default but is dynamic (see https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_large_prefix) - there are some other settings we have to make sure are in place. If this is for l10n, I believe that's on its own server, so these changes aren't a problem.
I wonder if we're hitting https://kb.askmonty.org/en/mariadb-versus-mysql-compatibility/#incompatibilities-between-mariadb-52-and-mysql-51: New SQL_MODE value was added: IGNORE_BAD_TABLE_OPTIONS. If it is not set, using a table, field, or index attribute (option) that is not supported by the chosen storage engine will cause an error. This change might cause warnings in the error log about incorrectly defined tables from the mysql database, fix that with mysql_upgrade.
The language of that is kind of confusing, but the SQL_MODE overrides the error for a "bad table option". We did not set that on the server side, and I'd be surprised if it was set on the client side (by you).
Attachment #786995 - Flags: review+
Attachment #786995 - Flags: feedback?(scabral)
I got back to this, and I managed to hack django to work, but it's not pretty. I'm also wondering if the indexes actually work well to begin with, given that most file have a common starting pattern. Like just files starting with 'layout/' are about 20k. I get the feeling that in this particular case, doing an index on the actual hash might actually be what we want. It might also make the stunts we need to do about trailing spaces in filenames have less impact still. And yes, I did watch Sheeri's indexing talk today. You can tell by the way I spelt indices. ;-)
<3 re: the indexing talk. Hope it was helpful! (I hate it too, I took Latin in high school, but my college job in a library trained me into using "indexes" instead of "indices") If you're only doing exact matches, then an index of a hash will work well.
Comment on attachment 786995 [details] [diff] [review] Now I can run the whole test suite in mysql 5.6 Review of attachment 786995 [details] [diff] [review]: ----------------------------------------------------------------- OK, based on the status quo, this is good to go. I'd like to have a follow-up to create the hash columns and index and use those. Also, we need a patch for life, still, which has a mock File object that mirrors mbdb. Also, should the migration for life have the conditional if mbdb isn't on? Probably, as we may at some point actually drop that.
Attachment #786995 - Flags: feedback- → feedback+
:Pike, so are you OK with me merging this patch? Sheeri gave it the r+ and you the f+ but I'm not sure if your f+ is on the r+ so to say.
> Also, we need a patch for life, still, which has a mock File object that > mirrors mbdb. > > Also, should the migration for life have the conditional if mbdb isn't on? > Probably, as we may at some point actually drop that. At least the first issue needs to be addressed. Probably the latter as well.
Attached patch bug902504-2.diff (obsolete) — Splinter Review
Attachment #786995 - Attachment is obsolete: true
Attachment #801651 - Flags: review?(l10n)
Attached patch bug902504-3.diff (obsolete) — Splinter Review
oops. last diff was done too quickly. wrong commit.
Attachment #801651 - Attachment is obsolete: true
Attachment #801651 - Flags: review?(l10n)
Attachment #801801 - Flags: review?(l10n)
Comment on attachment 801801 [details] [diff] [review] bug902504-3.diff Review of attachment 801801 [details] [diff] [review]: ----------------------------------------------------------------- Sadly, the life copy of the model migration needs more work. r-. ::: apps/life/migrations/0001_initial.py @@ +29,5 @@ > + ('id', self.gf('django.db.models.fields.AutoField')(primary_key=True)), > + ('path', self.gf('django.db.models.fields.CharField')(max_length=400)), > + )) > + db.send_create_signal('life', ['File']) > + This needs to be in a conditional, and the table is mbdb_file, even if constructed through the life/models.py.
Attachment #801801 - Flags: review?(l10n) → review-
Attached patch bug902504-4.diff (obsolete) — Splinter Review
Attachment #801801 - Attachment is obsolete: true
Attachment #802560 - Flags: review?(l10n)
Comment on attachment 802560 [details] [diff] [review] bug902504-4.diff Review of attachment 802560 [details] [diff] [review]: ----------------------------------------------------------------- This didn't address the review comments in full, r-. ::: apps/life/migrations/0001_initial.py @@ +26,5 @@ > db.send_create_signal('life', ['Branch']) > > + if 'mbdb' not in settings.INSTALLED_APPS: > + # Adding model 'File' > + db.create_table('life_file', ( no. @@ +30,5 @@ > + db.create_table('life_file', ( > + ('id', self.gf('django.db.models.fields.AutoField')(primary_key=True)), > + ('path', self.gf('django.db.models.fields.CharField')(max_length=400)), > + )) > + db.send_create_signal('life', ['File']) probably neither. Don't know the signals, really.
Attachment #802560 - Flags: review?(l10n) → review-
Sorry for my poor understanding. I really don't understand what you mean then. We already have `db.create_table('mbdb_file' ...` in apps/mbdb/migrations/0001_initial.py Sorry, but can you try to explain it in some other way that this tired swede can understand?
Comment on attachment 802560 [details] [diff] [review] bug902504-4.diff Review of attachment 802560 [details] [diff] [review]: ----------------------------------------------------------------- ::: apps/life/migrations/0001_initial.py @@ +26,5 @@ > db.send_create_signal('life', ['Branch']) > > + if 'mbdb' not in settings.INSTALLED_APPS: > + # Adding model 'File' > + db.create_table('life_file', ( the table name is mbdb_file, see models.py... @@ +30,5 @@ > + db.create_table('life_file', ( > + ('id', self.gf('django.db.models.fields.AutoField')(primary_key=True)), > + ('path', self.gf('django.db.models.fields.CharField')(max_length=400)), > + )) > + db.send_create_signal('life', ['File']) I suspect the sent signal is db.send_create_signal('mbdb', ['File']) ::: apps/life/models.py @@ +64,2 @@ > class Meta: > db_table = 'mbdb_file' We explicitly specify the table name here, and the rest of life migration etc needs to be consistent.
Attached patch bug902504-5.diff (obsolete) — Splinter Review
That's what I had! ...in an earlier patch. At least that's what I think I had. Perhaps I had that in my head but made the wrong diff or something. Anyway, that explains a lot of the confusion. Again, sorry for not comprehending.
Attachment #802560 - Attachment is obsolete: true
Attachment #803824 - Flags: review?(l10n)
Sorry, I slowly learn about South, and this conditional model is whack. I just change my mind as we go, too. Let's remove that mock model in life/models.py, instead of trying to make it work. If we drop mbdb, we should properly migrate that model and data into life, I think. For reference, I just went ahead and made south help us out with what had been the right life initial migration. Just corrupted my settings/base.py to not include a whole lot, and then had south generate the initial migration. That gives the right drops, and the file models, but those need to be ifdef'ed still. But in the end, as you can see, I just reconsidered while typing this comment, I think our models should reflect what we're doing, and not try to be generic and reusable. I expect that's gonna help us when it comes down to actually migrating off of mbdb.
Comment on attachment 803824 [details] [diff] [review] bug902504-5.diff Review of attachment 803824 [details] [diff] [review]: ----------------------------------------------------------------- So sorry, but based on my thinking in the previous comment and change of mind ... ::: apps/life/migrations/0001_initial.py @@ +224,4 @@ > } > } > > + complete_apps = ['life'] let's not touch the life migration at all ... ::: apps/life/models.py @@ +64,4 @@ > class Meta: > db_table = 'mbdb_file' > > + path = models.CharField(max_length=400) .. and remove the mock here completely.
Attachment #803824 - Flags: review?(l10n) → review-
Attached patch bug902504-6.diffSplinter Review
I hope this is the right one now.
Attachment #803824 - Attachment is obsolete: true
Attachment #804495 - Flags: review?(l10n)
Comment on attachment 804495 [details] [diff] [review] bug902504-6.diff Review of attachment 804495 [details] [diff] [review]: ----------------------------------------------------------------- r=me with a nit, there are still no-op changes to life's migration file. Thanks for the patience. ::: apps/life/migrations/0001_initial.py @@ +2,4 @@ > import datetime > from south.db import db > from south.v2 import SchemaMigration > +from django.conf import settings We actually don't want this anymore. @@ +216,4 @@ > } > } > > + complete_apps = ['life'] ... and the newline isn't important enough to touch the file at all.
Attachment #804495 - Flags: review?(l10n) → review+
Commit pushed to develop at https://github.com/mozilla/elmo https://github.com/mozilla/elmo/commit/88ecb9f1a5482479ec266ec54476c22ccc99f627 fixes bug 902504 - Removed indexes on charfields that are max_length=400, r=Pike
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.3
Assignee: nobody → peterbe
Product: Webtools → Webtools Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: