Closed Bug 733273 Opened 13 years ago Closed 12 years ago

Figure out Balrog DB upgrade story

Categories

(Release Engineering :: General, defect, P3)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: bhearsum)

References

Details

Attachments

(1 file, 1 obsolete file)

We have https://github.com/mozilla/balrog/blob/master/scripts/manage-db.py to create tables, and need to figure out how to manage schema changes. There is some talk about that on bug 685862. Examples from other projects: tuxedo - uses http://south.aeracode.org/, but that's django specific http://readthedocs.org/docs/sqlalchemy-migrate/
Blocks: 733274
Summary: Figure out db upgrade story → Figure out Balrog DB upgrade story
found in triage.
Component: Release Engineering → Release Engineering: Automation
QA Contact: release → catlee
Oops, I meant to assign this to me awhile ago.
Assignee: nobody → bhearsum
I talked with Sheeri a bit about tips/pitfalls around DB upgrades in production: [10:43:57] <bhearsum> so, i'm working on a db-backed webapp that's going to have to deal with schema migrations at some point, i was wondering if y'all had any tips, tricks, pitfalls, etc. [10:44:03] <bhearsum> it's mysql-backed, if that matters [10:45:10] <sheeri> make it so that replication can continue if the schemas are different (so use default values and full inserts, e.g. INSERT INTO foo (col1, col2,…) VALUES (val1,val2…),(valA, valB..)…. ) [10:45:43] <sheeri> then you don't have to worry about things, because we can take a machine out of the load balancer, run the schema change by hand, and put it back in, and repeat with the other machines. [10:45:54] <sheeri> That's the simplest thing, because it works in all cases. [10:46:45] <bhearsum> hm, full insert is an insert with the columns named as opposed to one like 'insert into foo values(...)'? [10:50:34] <bhearsum> and how do schema changes work in production normally? like, what's the process for it? [10:50:54] <bhearsum> i've got a script right now that does db management (right now it only knows how to create tables, not upgrade them) [10:51:40] <sheeri> bhearsum yes on the full insert w/ columns [10:52:03] <sheeri> honestly I think usually ppl let the django mechanism or whatever handle it, and if it's too big they ask for db help to do it one at a time. [10:52:25] <bhearsum> ah, so the node(s) with write access just do it? [10:52:35] <bhearsum> eg, the first request with the new code will upgrade on the fly [10:54:08] <sheeri> bhearsum I think they do it when they do a code upgrade. [10:54:17] <bhearsum> oooh [10:54:18] <sheeri> honestly if nobody asks us to help, we don't know about it [10:54:29] <sheeri> so for all I know it happens 1000 times a day, or very rarely [10:54:50] <bhearsum> i'm surprised that that doesn't cause problems with the read-only heads or something [10:55:23] <bhearsum> ah :) [10:55:42] <bhearsum> i guess the key is to make sure the new schema is backwards compatible...but that's more of an app-level concern [11:05:21] <bhearsum> thanks for your time sheeri [11:05:28] <sheeri> np. [11:05:43] <sheeri> yeah, backwards compatible will make sure no matter how the changes are done, things will work. [11:06:19] <bhearsum> if it's not backwards compatible it seems like any time you don't get atomit code updates (everywhere) + db upgrade, you can hit issues [11:06:30] <sheeri> ayup! [11:06:46] <sheeri> and if a schema change changes a large table and locks it for too long, we can always do it manually. [11:07:37] <bhearsum> ahhh [11:08:18] <bhearsum> i'm wondering if temporarily making the whole database read-only might be better in the case of this app - there's periods of the day where it will typically not be written too much, and the clients all know how to retry [11:09:09] <sheeri> bhearsum except that when doing a schema change, an exclusive/write lock is gotten, so reads can't happen during that time either [11:09:19] <bhearsum> oh, really? [11:09:20] <bhearsum> wow [11:09:44] <bhearsum> will requests hang for N seconds and then fail? or fail instantly? or...? [11:13:53] <sheeri> bhearsum they'll wait for the lock to be released, and eventually time out [11:14:18] <bhearsum> does the db do the timeout, or is that all client side? [11:14:41] <sheeri> I think both do timeouts [11:14:48] <bhearsum> ah, ok [11:15:23] <bhearsum> either way i guess the client needs to be able to cope [11:15:36] <bhearsum> where client = client script, sqlalchemy based in this case [11:16:51] <sheeri> *nod*
I think this patch is fairly well documented, but here's a high level overview: * Replace manual table creation (metadata.create_all()) with creation through Migrate. * Stop using BigInteger on sqlite, because it doesn't actually work there * Create two initial versions of the schema. The first is everything except the 'comment' field of the rules table. The second adds that field in. I did this to make sure I actually tested a real upgrade ;). * Add a test that ensures the latest version of the schema in the Migrate repo matches the in-app model. * Re-work manage-db.py a bit; support creation/upgrading through it * Add sqlalchemy-mirgate to the requirements, import it into the vendor library.
Attachment #664994 - Flags: feedback?(nthomas)
Attachment #664994 - Flags: feedback?(catlee)
Comment on attachment 664994 [details] [diff] [review] db migration with sqlalchemy-migrate Switching to r? because I don't know of anything else I need to add to these, other than addressing review comments.
Attachment #664994 - Flags: review?(nthomas)
Attachment #664994 - Flags: review?(catlee)
Attachment #664994 - Flags: feedback?(nthomas)
Attachment #664994 - Flags: feedback?(catlee)
Comment on attachment 664994 [details] [diff] [review] db migration with sqlalchemy-migrate Review of attachment 664994 [details] [diff] [review]: ----------------------------------------------------------------- how will upgrades be deployed? do they happen manually by running the manage-db.py script? ::: auslib/db.py @@ +983,5 @@ > + # methods perform similar wrapping functions to what is done by the API > + # functions, but without disposing of the engine. > + schema = migrate.versioning.schema.ControlledSchema(self.engine, self.migrate_repo) > + changeset = schema.changeset(version) > + for version, change in changeset: is there any concern that using 'version' here shadows the 'version' argument to the function? ::: auslib/migrate/versions/002_foo.py @@ +10,5 @@ > + > +def downgrade(migrate_engine): > + metadata = MetaData(bind=migrate_engine) > + Table('rules', metadata, autoload=True).c.comment.drop() > + Table('rules_history', metadata, autoload=True).c.comment.drop() 002_comment_column.py or something more descriptive would be better than 002_foo.py. Alternatively, 002.py is ok as well, with comments/docstrings explaining what the schema differences are. ::: auslib/test/test_db.py @@ +30,5 @@ > def setUp(self): > + self.dburi = 'sqlite:///%s' % self.getTempfile() > + > + def getTempfile(self): > + return NamedTemporaryFile().name I think this is unsafe. NamedTemporaryFile deletes the file when the object is garbage collected. Better to use tempfile.mkstemp or use a static filename inside a temporary directory for testing. ::: scripts/manage-db.py @@ +26,5 @@ > > if not options.db: > + parser.error("db is required") > + if len(args) != 1: > + parser.error("need a singleaction to perform") nit: "single action" @@ +33,4 @@ > > db = AUSDatabase(options.db) > + if action == 'create': > + db.create(int(options.version)) you can set type="int" in parser.add_option to enforce this, and avoid ValueErrors here. ::: vendor/lib/python/migrate/tests/__init__.py @@ +11,5 @@ > +class TestVersionDefined(TestCase): > + def test_version(self): > + """Test for migrate.__version__""" > + self.assertTrue(isinstance(migrate.__version__, basestring)) > + self.assertTrue(len(migrate.__version__) > 0) do we need to include the tests in our vendor lib?
Attachment #664994 - Flags: review?(catlee) → review+
(In reply to Chris AtLee [:catlee] from comment #8) > Comment on attachment 664994 [details] [diff] [review] > db migration with sqlalchemy-migrate > > Review of attachment 664994 [details] [diff] [review]: > ----------------------------------------------------------------- > > how will upgrades be deployed? do they happen manually by running the > manage-db.py script? Yeah, manually deployed. The idea here is that as long as the upgrades are backwards compatible, we'll deploy do the upgrade and then deploy the new code. If the upgrades aren't backwards compatible we'll have to disable writes in production, do the upgrade + deployment, then re-enable writes. For changes that are going to break read-only operations we'll have to do them in 2 stages (stage 1 will support the old + new way, stage 2 will remove the old way). In dev, I plan to run manage-db.py myself. For production, where I assume we won't have access, IT will be doing it. > ::: vendor/lib/python/migrate/tests/__init__.py > @@ +11,5 @@ > > +class TestVersionDefined(TestCase): > > + def test_version(self): > > + """Test for migrate.__version__""" > > + self.assertTrue(isinstance(migrate.__version__, basestring)) > > + self.assertTrue(len(migrate.__version__) > 0) > > do we need to include the tests in our vendor lib? No! I didn't notice this. I'll remove them.
Comment on attachment 664994 [details] [diff] [review] db migration with sqlalchemy-migrate No additional comments, sorry for the delay.
Attachment #664994 - Flags: review?(nthomas) → review+
Attachment #666953 - Flags: review?(catlee)
Comment on attachment 666953 [details] [diff] [review] address catlee's review comments Review of attachment 666953 [details] [diff] [review]: ----------------------------------------------------------------- ::: auslib/test/test_db.py @@ +36,5 @@ > + for t in tmpfiles: > + os.remove(t) > + > + def getTempfile(self): > + t = mkstemp()[1] you should close the open file descriptor too. fd, t = mkstemp() os.close(fd)
Attachment #666953 - Flags: review?(catlee) → review+
(In reply to Chris AtLee [:catlee] from comment #12) > > + for t in tmpfiles: > > + os.remove(t) > > + > > + def getTempfile(self): > > + t = mkstemp()[1] > > you should close the open file descriptor too. > > fd, t = mkstemp() > os.close(fd) I addressed this in https://github.com/bhearsum/balrog/commit/ae61137c93bf3e500e3ef34d23a114a7bc66c441 I also uncovered a hidden dependency after cleaning up some local Python packages -- migrate depends on Tempita. https://github.com/bhearsum/balrog/commit/2af0aa3ac38b07c33d37458404a16fec583e5f1c put it in the vendor lib (ignore the code changes there - I forgot to commit my merge before doing it).
Attachment #664994 - Attachment is obsolete: true
Comment on attachment 666953 [details] [diff] [review] address catlee's review comments Landed in https://github.com/mozilla/balrog/commit/75aa18146091ad28a518c288b6e108a667469a6f; Jenkins run is at https://jenkins.mozilla.org/job/Balrog/105/. When I went to bootstrap the dev DB, I uncovered another hidden dependency on "decorator". That one got fixed in https://github.com/mozilla/balrog/commit/30b4e2e6b58c5889b6d7883af69bff2d30a565ad. I filed bug 797813 to avoid this in the future. To get the dev DB bootstrapped I did the following on aus1.dev.seamicro.phx1.mozilla.com in my own Balrog repo clone: PYTHONPATH=vendor/lib/python python >>> from sqlalchemy import create_engine >>> engine = create_engine('mysql://aus4_dev:xxxxxxxx@dev-zeus-rw.db.phx1.mozilla.com/aus4_dev') >>> import migrate.versioning.schema >>> migrate.versioning.schema.ControlledSchema.create(engine, 'auslib/migrate', 2) Afterwards, the migrate_version table looked as follows: mysql> select * from migrate_version; +---------------+-----------------+---------+ | repository_id | repository_path | version | +---------------+-----------------+---------+ | balrog | auslib/migrate | 2 | +---------------+-----------------+---------+ And after the webheads updated themselves, both aus4-dev and aus4-admin-dev appear to be functioning as usual.
Attachment #666953 - Flags: checked-in+
It took me a couple of attempts to get decorator installed correctly (module in the root of the vendor lib vs. module in its own directory), but everything is looking good in production now. Apologies for the bustage fixes :(.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: