Closed Bug 678163 Opened 13 years ago Closed 13 years ago

put AUS3 database interactions into a library

Categories

(Release Engineering :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

References

Details

(Whiteboard: [aus3])

Attachments

(2 files, 4 obsolete files)

Attached patch factor out DB stuff to a library (obsolete) — Splinter Review
While looking into prototyping bug 672918 I realised that it was going to be tough to do with the database interactions mixed in with the business logic of finding an update - I'd either have to use the AUS class, which is mostly unrelated to a web API, or duplicate some of the database code it currently has.

Attached is one way we could do this. The patch uses SQLAlchemy for database interactions to get us database independence.

One thing that's a bit weird is the version/channel matching logic being in the UpdatePaths class, because that logic is not _strictly_ a database interaction. I'd left it in AUS originally but also found it weird that UpdatePaths did some matching, and then AUS.getMatchingRules did additional matching. I could be convinced either way here if someone feels strongly.
Attachment #552339 - Flags: feedback?(nrthomas)
Attachment #552339 - Flags: feedback?(catlee)
Nick, I did some quick timings with test-rules.py: In the current state of things (DB interactions in AUS w/ plain sqlite driver), test-rules.py consistently took 1.5s to run over 10 runs (~231 tests/second). With this patch, it consistently took 2.9s to run over 10 runs (~125 tests/second).

I'm not sure if that will scale linearly, but 125 tests/second should mean at least 125 requests per second (because a test does more work than a request, such as reading snippets from disk), which seems like more than enough, especially when we're scaling across multiple web heads + caching.

Of course, the above is completely unscientific.
Comment on attachment 552339 [details] [diff] [review]
factor out DB stuff to a library

Looks good, a bunch of suggestions and I'd be curious where the slow down comes from, but not in a spend-more-than-a-few-minutes kind of way.

>diff --git a/AUS.py b/AUS.py
> class AUS3:
>     def __init__(self, dbname=None):
...
>+        self.releases = self.db.getReleases()
>+        self.updatePaths = self.db.getUpdatePaths()

This looks to me like it's returning data rather than a table object, due to the other functions with the same names which run a query.

matchesRegex() is still present in AUS.py, with a copy in db.py. 

convertRow/Rule/Release could probably go over to db.py, and be made generic if possible - ie getReleases() could return a dict with the JSON already parsed. TestReleases would need updating.

>diff --git a/auslib/__init__.py b/auslib/__init__.py
>+class AUSTable(object):
>+    def select(self, columns=None, where=None, order_by=None):

You could use this for UpdatePaths.getOrderedRules I think.

>+class Releases(AUSTable):
>+    def __init__(self, metadata):
>+        self.table = Table('releases', metadata,
>+            Column('name', String),

I was tempted to say add an index on this, but let's hold off on that sort of thing for now.

>diff --git a/test/test_db.py b/test/test_db.py

Yay, tests!

>+class TestAUSTable(unittest.TestCase):
Could throw in a test of the mirroring and select() here.

>+class TestUpdatePathsSimple(unittest.TestCase, TestUpdatePaths):
>+    def setUp(self):
>+        self.db = AUSDatabase('sqlite:///:memory:')
>+        self.paths = self.db.getUpdatePaths()
>+        self.paths.t.insert().execute(id=1, priority=100, version='3.5', buildTarget='d', throttle=100)

Could have some mappings here to avoid testGetOrderedRules() being vulnerable to variable sorting on other columns.
Attachment #552339 - Flags: feedback?(nrthomas) → feedback+
I did a quick run of the tests under cProfile. The hottest functions were not SQLAlchemy ones (JSON's raw_decode was the worst, followed by sqlite3's execute(), then create_snippet, then listdir()). However, the difference in total function calls was significant: 114095 for the original, 852873 for the SQLAlchemy version - which leads me to suspect that most of that time is overhead.
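For reference, that kind of measurement can be reproduced with cProfile/pstats from the standard library; the profiled function below is just a stand-in for the real test suite:

```python
import cProfile
import io
import pstats

def busy(n):
    # Stand-in for running the test suite; any callable can be profiled.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
busy(100000)
profiler.disable()

# Sort by cumulative time to surface the hottest functions; the stats
# header also reports the total number of function calls, which is the
# figure compared above (114095 vs. 852873).
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
header = next(line for line in stream.getvalue().splitlines()
              if 'function calls' in line)
print(header.strip())
```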
Comment on attachment 552339 [details] [diff] [review]
factor out DB stuff to a library

I've got a newer version of this that I'm working on.
Attachment #552339 - Attachment is obsolete: true
Attachment #552339 - Flags: feedback?(catlee)
(In reply to Nick Thomas [:nthomas] from comment #2)
> >diff --git a/AUS.py b/AUS.py
> > class AUS3:
> >     def __init__(self, dbname=None):
> ...
> >+        self.releases = self.db.getReleases()
> >+        self.updatePaths = self.db.getUpdatePaths()
> 
> This looks to me like it's returning data rather than a table object, due to
> the other functions with the same names which run a query.

I changed this to use the @property decorator - so we can access it through self.db.releases, but can't assign to that.
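A minimal sketch of that pattern (class and attribute names follow the patch; the body is illustrative):

```python
class AUSDatabase(object):
    def __init__(self):
        # Stand-in for the real Releases table object.
        self._releases = object()

    @property
    def releases(self):
        # Readable as db.releases, but with no setter defined,
        # assignment raises AttributeError.
        return self._releases

db = AUSDatabase()
db.releases  # reads fine
try:
    db.releases = 'something else'
except AttributeError:
    print("can't assign to db.releases")
```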

> matchesRegex() is still present in AUS.py, with a copy in db.py. 

Not needed anymore, so I dropped it.

> convertRow/Rule/Release could probably go over to db.py, and be made generic if
> possible - ie getReleases() could return a dict with the JSON already parsed.
> TestReleases would need updating.

Done, though I couldn't figure out how to make it more generic.
> >diff --git a/auslib/__init__.py b/auslib/__init__.py
> >+class AUSTable(object):
> >+    def select(self, columns=None, where=None, order_by=None):
> 
> You could use this for UpdatePaths.getOrderedRules I think.

Yeah, AUSTable.select was sort of half baked in this patch. I think this new version is a lot better - full explanation of the design is below.

> >+class Releases(AUSTable):
> >+    def __init__(self, metadata):
> >+        self.table = Table('releases', metadata,
> >+            Column('name', String),
> 
> I was tempted to say add an index on this, but let's hold off on that sort of
> thing for now.

Agreed.

> >+class TestAUSTable(unittest.TestCase):
> Could throw in a test of the mirroring and select() here.

Done.

> >+class TestUpdatePathsSimple(unittest.TestCase, TestUpdatePaths):
> >+    def setUp(self):
> >+        self.db = AUSDatabase('sqlite:///:memory:')
> >+        self.paths = self.db.getUpdatePaths()
> >+        self.paths.t.insert().execute(id=1, priority=100, version='3.5', buildTarget='d', throttle=100)
> 
> Could have some mappings here to avoid testGetOrderedRules() being
> vulnerable to variable sorting on other columns.

Done.

Other changes:
* AUSTable:
** Adjusted to support any where clauses instead of hardcoding ==.
** Support LIMIT
** Use new convertRow decorator to convert rows to writeable dictionaries.
** I don't think using fetchall() is a great idea for large datasets. I tried implementing select() as a generator (iterating w/ fetchone() and yielding each), but we have at least one use in AUS that makes that difficult. If this becomes an issue, we can re-approach it.
* Updated UpdatePaths/Releases to use explicit args instead of **kwargs. I think this is a better experience for consumers, because you can't call it with args that will try to select or limit by non-existent columns.

The rest is just test adjustments/additions.
Attachment #553215 - Flags: feedback?(nrthomas)
Attachment #553215 - Flags: feedback?(catlee)
I think this is now at a point where the base is pretty complete and well tested, and it has most of the features we're going to need. New things since comment #5:
* Implement update(), delete(), and insert() on AUSTable. All of these automatically add history entries to the tables described below.
* Implement history tables. There's a tiny bit of magic involved with this, because I didn't want to have to declare all of these tables by hand. Specifically, AUSTable's constructor creates a new instance of History, which receives another AUSTable instance and creates a Table with all of the same columns as the one it received, plus a change_id, username, and timestamp.
* Implement Permissions table. This small table is all we need to support the ACLs described here: https://wiki.mozilla.org/User:Bhearsum/AUS3/Administration#Authentication.2FAccounts. Note that authentication is to be handled by the web server, so we only need to associate pre-authenticated accounts with permissions.
* Add AUSDatabase.reset() to make it easier to use in testing.
* Tests for all of the above.
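The history-table mirroring described above can be sketched roughly like this in SQLAlchemy (column names follow the quoted patch: change_id, changed_by, timestamp; the real constructor wiring differs):

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String, DateTime,
                        create_engine)

metadata = MetaData()

rules = Table('rules', metadata,
    Column('rule_id', Integer, primary_key=True, autoincrement=True),
    Column('product', String),
)

def makeHistoryTable(baseTable, metadata):
    # Mirror every column of the base table (dropping primary-key settings)
    # and prepend the bookkeeping columns.
    mirrored = [Column(c.name, c.type) for c in baseTable.columns]
    return Table('%s_history' % baseTable.name, metadata,
        Column('change_id', Integer, primary_key=True, autoincrement=True),
        Column('changed_by', String),
        Column('timestamp', DateTime),
        *mirrored
    )

rules_history = makeHistoryTable(rules, metadata)
metadata.create_all(create_engine('sqlite:///:memory:'))
print([c.name for c in rules_history.columns])
```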
Attachment #553215 - Attachment is obsolete: true
Attachment #553215 - Flags: feedback?(nrthomas)
Attachment #553215 - Flags: feedback?(catlee)
Attachment #553903 - Flags: review?(nrthomas)
Attachment #553903 - Flags: review?(catlee)
Comment on attachment 553903 [details] [diff] [review]
v3, much more complete sqlalchemy backend

Review of attachment 553903 [details] [diff] [review]:
-----------------------------------------------------------------

Looking good!

Overall I like it. A few things here and there to address, but nothing keeping you from landing this and continuing along these lines.

::: AUS.py
@@ +47,5 @@
> +            osVersion=updateQuery['osVersion'],
> +            distribution=updateQuery['distribution'],
> +            distVersion=updateQuery['distVersion'],
> +            headerArchitecture=updateQuery['headerArchitecture']
> +        )

How about making getRulesMatchingQuery take an updateQuery object, so we don't have all this parameter copying?

::: auslib/db.py
@@ +8,5 @@
> +
> +import logging
> +log = logging.getLogger(__name__)
> +
> +def convertRows(fn):

a docstring or comment here would be great

@@ +74,5 @@
> +                for row in self.select(where=where):
> +                    # Tack on history table information to the row
> +                    row['changed_by'] = changed_by
> +                    row['timestamp'] = datetime.now()
> +                    # XXX: would be nice if we could use self.history.insert()

why can't you? does this also apply to the code in AUSTable.insert()?

@@ +112,5 @@
> +    def __init__(self, metadata, baseTable):
> +        self.table = Table('%s_history' % baseTable.t.name, metadata,
> +            Column('change_id', Integer, primary_key=True, autoincrement=True),
> +            Column('changed_by', String),
> +            Column('timestamp', DateTime)

be veeeeeerrrrry careful with DateTime columns - their interaction with timezones is weird. maybe a regular timestamp would be better?

@@ +124,5 @@
> +
> +class UpdatePaths(AUSTable):
> +    def __init__(self, metadata):
> +        self.table = Table('update_paths', metadata,
> +            Column('rule_id', Integer, primary_key=True, autoincrement=True),

Your naming is a bit inconsistent here. These should be rules or update paths, but not both. You've got some methods below referring to 'rules' too.

@@ +140,5 @@
> +            Column('distribution', String),
> +            Column('distVersion', String),
> +            Column('headerArchitecture', String),
> +            Column('comment', String)
> +        )

I believe SQLAlchemy's default is to have nullable=True for columns unless you say otherwise. Is NULL valid for all these columns?

@@ +145,5 @@
> +        AUSTable.__init__(self)
> +
> +    def matchesRegex(self, foo, bar):
> +        # Expand wildcards and use ^/$ to make sure we don't succeed on partial
> +        # matches.

Can you give an example? Is this for matching wildcards in the rules table?

@@ +156,5 @@
> +    def _versionMatchesRule(self, ruleVersion, queryVersion):
> +        """Decides whether a version from the rules matches an incoming version.
> +           If the ruleVersion is null, we match any queryVersion. If it's not
> +           null, we must either match exactly, or match a potential wildcard."""
> +        if ruleVersion == None:

"if ruleVersion is None" is faster, and more pythonic.

@@ +166,5 @@
> +        """Decides whether a channel from the rules matchs an incoming one.
> +           If the ruleChannel is null, we match any queryChannel. We also match
> +           if the channels match exactly, or match after wildcards in ruleChannel
> +           are resolved. Channels may have a fallback specified, too, so we must
> +           check if the fallback version of the queryChannel matches the ruleChannel."""

What if there is no fallbackChannel? It's not clear what's supposed to happen in that case.

@@ +186,5 @@
> +        matchingRules = []
> +        rules = self.select(
> +            where=[
> +                (self.throttle > 0) &
> +                ((self.product==product) | (self.product==None)) &

this should be left as self.product == None so sqlalchemy can do its magic

@@ +195,5 @@
> +                ((self.distribution==distribution) | (self.distribution==None)) &
> +                ((self.distVersion==distVersion) | (self.distVersion==None)) &
> +                ((self.headerArchitecture==headerArchitecture) | (self.headerArchitecture==None))
> +            ]
> +        )

Maybe guard the block below with log.isEnabledFor(logging.DEBUG) (http://docs.python.org/library/logging.html#logging.Logger.isEnabledFor) so it's not run if debugging isn't enabled.

@@ +223,5 @@
> +            Column('name', String),
> +            Column('product', String),
> +            Column('version', String),
> +            Column('data', Text)
> +        )

pretty sure you don't want all these to be nullable.

also, you want an index on at least 'name', right? is 'name' unique?

@@ +239,5 @@
> +        for row in rows:
> +            row['data'] = json.loads(row['data'])
> +        return rows
> +
> +class Permissions(AUSTable):

some docs here about how permissions are represented in the DB would be great, what possible permissions there are, etc.

::: auslib/test/test_db.py
@@ +34,5 @@
> +
> +    def testHistoryUponInsert(self):
> +        self.test.insert(changed_by='george', id=4, foo=0)
> +        ret = self.test.history.t.select().execute().fetchone()
> +        # XXX: How do we make make timestamps knowable ahead of time?

you can use mock to control what time.time() / datetime.now() returns, or you can refactor any methods that use time to use a common function that you can then control in your tests.

something like...
with mock.patch('time.time') as t:
    t.return_value = 5
    # time.time() now will return 5

@@ +93,5 @@
> +        ret = self.test.update(changed_by='bob', where=[self.test.id==1], what=dict(foo=123))
> +        self.assertEquals(ret.rowcount, 1)
> +        self.assertEquals(self.test.t.select(self.test.id==1).execute().fetchall()[0], (1, 123))
> +
> +class TestUpdatePaths(object):

call this a MixIn to make it clear it shouldn't be used on its own
Attachment #553903 - Flags: review?(catlee) → review+
(In reply to Chris AtLee [:catlee] from comment #7)
> ::: AUS.py
> @@ +47,5 @@
> > +            osVersion=updateQuery['osVersion'],
> > +            distribution=updateQuery['distribution'],
> > +            distVersion=updateQuery['distVersion'],
> > +            headerArchitecture=updateQuery['headerArchitecture']
> > +        )
> 
> How about making getRulesMatchingQuery take an updateQuery object, so we
> don't have all this parameter copying?

Done.

> ::: auslib/db.py
> @@ +8,5 @@
> > +
> > +import logging
> > +log = logging.getLogger(__name__)
> > +
> > +def convertRows(fn):
> 
> a docstring or comment here would be great

I added a comment, and reversed the inner/outer method names -- I think rowsToDicts is more obvious to use as a decorator.
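A sketch of what that decorator might look like after the rename (the stand-in query returns tuples of pairs; real rows would come from SQLAlchemy):

```python
def rowsToDicts(fn):
    """Decorator: convert the read-only row objects returned by fn into
       plain, writeable dictionaries."""
    def convertRows(*args, **kwargs):
        return [dict(row) for row in fn(*args, **kwargs)]
    return convertRows

@rowsToDicts
def select():
    # Stand-in for a table query; each row is an iterable of (column, value).
    return [(('name', 'Firefox-6.0-build1'), ('product', 'Firefox'))]

rows = select()
rows[0]['product'] = 'Thunderbird'  # plain dicts are writeable
```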

> @@ +74,5 @@
> > +                for row in self.select(where=where):
> > +                    # Tack on history table information to the row
> > +                    row['changed_by'] = changed_by
> > +                    row['timestamp'] = datetime.now()
> > +                    # XXX: would be nice if we could use self.history.insert()
> 
> why can't you? does this also apply to the code in AUSTable.insert()?

In the patch you reviewed, I couldn't because conn.execute() takes Query objects, and .insert() and friends return ResultProxys. I refactored a bit to separate out query generation from execution of queries to make this a bit better. We still can't use plain insert(), but we don't have to dig all the way down to self.history.table now.
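A rough sketch of that separation, with illustrative names (the real patch's signatures may differ):

```python
from sqlalchemy import MetaData, Table, Column, String, create_engine

engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
releases = Table('releases', metadata,
    Column('name', String, primary_key=True),
    Column('product', String),
)
metadata.create_all(engine)

class AUSTable(object):
    """Sketch: query generation is split from execution so that the history
       write can share one connection/transaction with the main-table write."""
    def __init__(self, table, engine):
        self.t = table
        self.engine = engine

    def _insertStatement(self, **columns):
        # Builds the statement only; nothing hits the database yet.
        return self.t.insert().values(**columns)

    def insert(self, changed_by, **columns):
        # Both the main-table insert and (in the real code) the history
        # insert would execute inside this one transaction, so a row can
        # never land without its history entry.
        with self.engine.begin() as conn:
            return conn.execute(self._insertStatement(**columns))

table = AUSTable(releases, engine)
table.insert(changed_by='bhearsum', name='Firefox-6.0-build1', product='Firefox')
```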

> @@ +112,5 @@
> > +    def __init__(self, metadata, baseTable):
> > +        self.table = Table('%s_history' % baseTable.t.name, metadata,
> > +            Column('change_id', Integer, primary_key=True, autoincrement=True),
> > +            Column('changed_by', String),
> > +            Column('timestamp', DateTime)
> 
> be veeeeeerrrrry careful with DateTime columns - their interaction with
> timezones is weird. maybe a regular timestamp would be better?

Makes sense - are you suggesting storing an epoch time in a Float, or something different?

> @@ +124,5 @@
> > +
> > +class UpdatePaths(AUSTable):
> > +    def __init__(self, metadata):
> > +        self.table = Table('update_paths', metadata,
> > +            Column('rule_id', Integer, primary_key=True, autoincrement=True),
> 
> Your naming is a bit inconsistent here. These should be rules or update
> paths, but not both. You've got some methods below referring to 'rules' too.

I've been carrying that over from previous work. Nick, do you have any preference here?

> @@ +140,5 @@
> > +            Column('distribution', String),
> > +            Column('distVersion', String),
> > +            Column('headerArchitecture', String),
> > +            Column('comment', String)
> > +        )
> 
> I believe SQLAlchemy's default is to have nullable=True for columns unless
> you say otherwise. Is NULL valid for all these columns?

Mostly, yes. update_type probably should be nullable=False. Many of the others probably won't ever be NULL by convention, but there's no technical reason they can't be. It might be worthwhile making product and maybe version nullable=False to try to guard against human error; not sure how I feel about that yet...

> @@ +145,5 @@
> > +        AUSTable.__init__(self)
> > +
> > +    def matchesRegex(self, foo, bar):
> > +        # Expand wildcards and use ^/$ to make sure we don't succeed on partial
> > +        # matches.
> 
> Can you give an example? Is this for matching wildcards in the rules table?

Yeah, for versions and channels. I added to this comment.

> @@ +156,5 @@
> > +    def _versionMatchesRule(self, ruleVersion, queryVersion):
> > +        """Decides whether a version from the rules matches an incoming version.
> > +           If the ruleVersion is null, we match any queryVersion. If it's not
> > +           null, we must either match exactly, or match a potential wildcard."""
> > +        if ruleVersion == None:
> 
> "if ruleVersion is None" is faster, and more pythonic.

Changed.

> @@ +166,5 @@
> > +        """Decides whether a channel from the rules matchs an incoming one.
> > +           If the ruleChannel is null, we match any queryChannel. We also match
> > +           if the channels match exactly, or match after wildcards in ruleChannel
> > +           are resolved. Channels may have a fallback specified, too, so we must
> > +           check if the fallback version of the queryChannel matches the ruleChannel."""
> 
> What if there is no fallbackChannel? It's not clear what's supposed to
> happen in that case.

fallbackChannel is always going to be set. In most cases it will be the same as the query channel. I updated the getRulesMatchingQuery docstring to reflect this.

> @@ +195,5 @@
> > +                ((self.distribution==distribution) | (self.distribution==None)) &
> > +                ((self.distVersion==distVersion) | (self.distVersion==None)) &
> > +                ((self.headerArchitecture==headerArchitecture) | (self.headerArchitecture==None))
> > +            ]
> > +        )
> 
> Maybe guard the block below with log.isEnabledFor(logging.DEBUG)
> (http://docs.python.org/library/logging.html#logging.Logger.isEnabledFor) so
> it's not run if debugging isn't enabled.

I assume you're talking about the part where I loop over the rules just to log them - I added this guard there.
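The guard looks roughly like this (the function and rule shapes are illustrative):

```python
import logging

log = logging.getLogger(__name__)

def logMatchingRules(rules):
    # Skip the whole loop (and any string formatting it would do)
    # when DEBUG output is disabled.
    if log.isEnabledFor(logging.DEBUG):
        for rule in rules:
            log.debug("Matching rule: %s", rule)

logMatchingRules([{'rule_id': 1, 'throttle': 100}])
```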

> @@ +223,5 @@
> > +            Column('name', String),
> > +            Column('product', String),
> > +            Column('version', String),
> > +            Column('data', Text)
> > +        )
> 
> pretty sure you don't want all these to be nullable.

Yeah, agreed. We probably want name to be a key field, too.

> also, you want an index on at least 'name', right?

Haven't thought about indexes at all yet.
 
> @@ +239,5 @@
> > +        for row in rows:
> > +            row['data'] = json.loads(row['data'])
> > +        return rows
> > +
> > +class Permissions(AUSTable):
> 
> some docs here about how permissions are represented in the DB would be
> great, what possible permissions there are, etc.

I'm in the process of fleshing this out; I'll have it sorted out when I post the next patch.

> ::: auslib/test/test_db.py
> @@ +34,5 @@
> > +
> > +    def testHistoryUponInsert(self):
> > +        self.test.insert(changed_by='george', id=4, foo=0)
> > +        ret = self.test.history.t.select().execute().fetchone()
> > +        # XXX: How do we make make timestamps knowable ahead of time?
> 
> you can use mock to control what time.time() / datetime.now() returns, or
> you can refactor any methods that use time to use a common function that you
> can then control in your tests.
> 
> something like...
> with mock.patch('time.time') as t:
>     t.return_value = 5
>     # time.time() now will return 5

Awesome, thanks for the tip!

> @@ +93,5 @@
> > +        ret = self.test.update(changed_by='bob', where=[self.test.id==1], what=dict(foo=123))
> > +        self.assertEquals(ret.rowcount, 1)
> > +        self.assertEquals(self.test.t.select(self.test.id==1).execute().fetchall()[0], (1, 123))
> > +
> > +class TestUpdatePaths(object):
> 
> call this a MixIn to make it clear it shouldn't be used on its own

Done.
Swapping from updatePaths to rules would be fine. I'll take a closer look at the patch on Tuesday.
Comment on attachment 553903 [details] [diff] [review]
v3, much more complete sqlalchemy backend

Great progress! feedback+ because the code has already moved on, but I'm happy if we land soon and iterate in the main repo.

>diff --git a/AUS.py b/AUS.py
>     def __init__(self, dbname=None):
>         if dbname == None:
>             dbname = "update.db"

Needs to be 'sqlite:///update.db' now.

>         # evaluate wildcard parameters
>+        matchingRules = []
>+        for rule in rules:
>+            matchingRules.append(rule)

Already done in updatePaths.getRulesMatchingQuery, so this lot looks like it can go.

>diff --git a/auslib/db.py b/auslib/db.py
>+class AUSTable(object):
>+    def insert(self, changed_by, **columns):

This guy still needs some work. When I tested with updatePaths it gave me a history row where the rule_id was empty. Probably need to commit in the opposite order, and hope the return value on the main table provides the info you need.

>+    def update(self, changed_by, where, what):
>+    def delete(self, changed_by, where):

I find it a bit strange to attach a current timestamp to the old content in these two functions, which might make debugging fun, but perhaps that makes it easier to do rollback later. Implementation will be the test.

For all three modifying functions, where is the appropriate place to add checks? Eg don't insert over the top of an existing row, row must exist to update or delete.

>+    def getReleases(self, name=None, product=None, version=None, limit=None):
...
>+        rows = AUSTable.select(self, where=where, limit=limit)

Can this be |rows = self.select(where=where, limit=limit)| ?

>+class Permissions(AUSTable):
>+    def __init__(self, metadata):
>+        self.table = Table('permissions', metadata,
>+            Column('permission', String, primary_key=True),
>+            Column('username', String, primary_key=True),
>+            Column('options', String)
>+        )
>+        AUSTable.__init__(self)

Do we need a unique row id to make history work on the permissions table ? updatePaths has rule_id, Releases has name, but we'd have to use a compound key here, right ?

>diff --git a/auslib/test/test_db.py b/auslib/test/test_db.py
>+    def testHistoryTableHasAllColumns(self):
>+        columns = [c.name for c in self.test.history.t.get_children()]
>+        self.assertIn('id', columns)
>+        self.assertIn('foo', columns)
>+        self.assertIn('changed_by', columns)
>+        self.assertIn('timestamp', columns)

Should be testing for change_id here too. 

>+    def testHistoryUponInsert(self):
>+        self.test.insert(changed_by='george', id=4, foo=0)
>+        ret = self.test.history.t.select().execute().fetchone()
...
>+        self.assertEquals(ret, [1, 'george', None, None, None])

The second last None is what I was talking about with insert() above. Should be 4 according to the 'Logging & Rollback' section of your wiki page.

>+    def testRevokePermission(self):
>+        self.permissions.revokePermission(changed_by='bill', username='bob', permission='editproduct', options='fake')
>+        query = self.permissions.t.select().where(self.permissions.username=='cathy')

s/cathy/bob/. This'll be passing because there are no cathy rows in the newly set up db.

>diff --git a/test-rules.py b/test-rules.py
>         if options.dumprules:
>             log.info("Rules are \n(id, priority, mapping, throttle, product, version, channel, buildTarget, buildID, locale, osVersion, distribution, distVersion, UA arch):")
>-            for rule in AUS.getRules():
>+            for rule in AUS.rules.getRules():

Won't work right now, but will after the s/updatePaths/rules. :-)
Attachment #553903 - Flags: review?(nrthomas) → review+
Comment on attachment 553903 [details] [diff] [review]
v3, much more complete sqlalchemy backend

Thanks for the feedback! I went ahead and landed, it shouldn't take too long to address all the review comments.
Attachment #553903 - Flags: checked-in+
(In reply to Nick Thomas [:nthomas] from comment #10)
> >diff --git a/auslib/db.py b/auslib/db.py
> >+class AUSTable(object):
> >+    def insert(self, changed_by, **columns):
> 
> This guy still needs some work. When I tested with updatePaths it gave me a
> history row where the rule_id was empty. Probably need to commit in the
> opposite order, and hope the return value on the main table provides the info
> you need.

Good catch, I'll fix this up.

> >+    def update(self, changed_by, where, what):
> >+    def delete(self, changed_by, where):
> 
> I find it a bit strange to attach a current timestamp to the old content in
> these two functions, which might make debugging fun, but perhaps that makes it
> easier to do rollback later. Implementation will be the test.

Hmm, the way I've been looking at it is that the timestamp in the history table indicates that the data associated with it was in effect up until that time. I made up a contrived example, and I think I see what you mean (please excuse the verbosity):

Action #1: Insert Firefox 6.0, ben @ 10am August 25
Releases:
(name)              (product) (version) (data)
Firefox-6.0-build1  Firefox   6.0       abc

Releases_history:
(change_id) (changed_by) (timestamp)  (name)             (product) (version) (data)
1           ben          10am Aug 25  Firefox-6.0-build1 NULL      NULL      NULL

Action #2: Update Firefox 6.0 data, ben @ 12pm August 25
Releases:
(name)              (product) (version) (data)
Firefox-6.0-build1  Firefox   6.0       123

Releases_history:
(change_id) (changed_by) (timestamp)  (name)             (product) (version) (data)
1           ben          10am Aug 25  Firefox-6.0-build1 NULL      NULL      NULL
2           ben          12pm Aug 25  Firefox-6.0-build1 Firefox   6.0       abc

Action #3: Delete Firefox 6.0, ben @ 3pm August 25
Releases:
(name)              (product) (version) (data)

Releases_history:
(change_id) (changed_by) (timestamp)  (name)             (product) (version) (data)
1           ben          10am Aug 25  Firefox-6.0-build1 NULL      NULL      NULL
2           ben          12pm Aug 25  Firefox-6.0-build1 Firefox   6.0       abc
3           ben          3pm Aug 25   Firefox-6.0-build1 Firefox   6.0       123

Intuitively, I can see how one would expect the data to have taken effect at that time....

What if we added a NULL row as well as a data row in insert()? Doing that in the above example would result in a history table that looks like this:
(change_id) (changed_by) (timestamp)   (name)             (product) (version) (data)
1           ben          9:59am Aug 25 Firefox-6.0-build1 NULL      NULL      NULL
2           ben          10am Aug 25   Firefox-6.0-build1 Firefox   6.0       abc
3           ben          12pm Aug 25   Firefox-6.0-build1 Firefox   6.0       123
4           ben          3pm Aug 25    Firefox-6.0-build1 NULL      NULL      NULL

We might be able to even do it without the initial NULL row if we're OK assuming that the first non-NULL change for a thing is an insert(). (This is _probably_ a safe assumption, but if we ever manipulate tables without using the admin interface, or something else that obeys history, it could end up being incorrect.)
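That scheme can be sketched over plain dictionaries, independent of the database layer (the names and the one-second offset are illustrative):

```python
from datetime import datetime, timedelta

# Data columns tracked in the (hypothetical) Releases history; the real
# table also carries change_id/changed_by/timestamp, mirrored below.
COLUMNS = ('product', 'version', 'data')
history = []

def recordChange(changed_by, name, data_columns, timestamp):
    history.append(dict(change_id=len(history) + 1, changed_by=changed_by,
                        timestamp=timestamp, name=name, **data_columns))

def insert(changed_by, name, **columns):
    now = datetime.now()
    # A NULL row one second before the data row marks "didn't exist yet",
    # so the first non-NULL change is unambiguously the insert.
    recordChange(changed_by, name, dict.fromkeys(COLUMNS, None),
                 now - timedelta(seconds=1))
    recordChange(changed_by, name, columns, now)

def delete(changed_by, name):
    # Deletion is recorded as a NULL row at the time of removal.
    recordChange(changed_by, name, dict.fromkeys(COLUMNS, None), datetime.now())

insert('ben', 'Firefox-6.0-build1', product='Firefox', version='6.0', data='abc')
delete('ben', 'Firefox-6.0-build1')
```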


> For all three modifying functions, where is the appropriate place to add
> checks? Eg don't insert over the top of an existing row, row must exist to
> update or delete.

I think SQLAlchemy's behaviour is good enough for us here. If I try to insert two records into the Releases table with the same name I get:
IntegrityError: (IntegrityError) column name is not unique u'INSERT INTO releases (name, product, version, data) VALUES (?, ?, ?, ?)' ('c', 'c', 'c', '{"three": 3}')

I can wrap that in an exception of our own, if you want.

update() and delete() still succeed even if they don't end up modifying any rows; you can check the rowcount attribute of the return value to figure that out. Do you want different behaviour here?
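Wrapping that in an application-level exception could look like this (AlreadyExistsError is a hypothetical name):

```python
from sqlalchemy import MetaData, Table, Column, String, create_engine
from sqlalchemy.exc import IntegrityError

class AlreadyExistsError(Exception):
    """Hypothetical application-level error for duplicate inserts."""

engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
releases = Table('releases', metadata,
    Column('name', String, primary_key=True),
)
metadata.create_all(engine)

def insertRelease(name):
    try:
        with engine.begin() as conn:
            conn.execute(releases.insert().values(name=name))
    except IntegrityError as e:
        # Re-raise as a friendlier, backend-independent type.
        raise AlreadyExistsError("release %r already exists" % name) from e

insertRelease('Firefox-6.0-build1')
```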

> 
> >+class Permissions(AUSTable):
> >+    def __init__(self, metadata):
> >+        self.table = Table('permissions', metadata,
> >+            Column('permission', String, primary_key=True),
> >+            Column('username', String, primary_key=True),
> >+            Column('options', String)
> >+        )
> >+        AUSTable.__init__(self)
> 
> Do we need a unique row id to make history work on the permissions table ?
> updatePaths has rule_id, Releases has name, but we'd have to use a compound
> key here, right ?

I think the compound key will be OK. The History object is going to have to keep track of key columns in order to fix the insert() bug you mentioned, it shouldn't be an issue to track compound ones.
Oh, the other big thing I still need to address is the transactions: they don't actually work the way I thought they would. Specifically, modifying a row in one transaction doesn't prevent it from being overwritten by a simultaneous query -- it simply hangs that query until the transaction completes.

We may need to use row level or table level locking to work around this. http://www.agiledata.org/essays/concurrencyControl.html describes some techniques for dealing with this. Given our requirements, pessimistic locking is probably ideal.
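One pessimistic-locking sketch is SELECT ... FOR UPDATE on the row being modified. Note that SQLite ignores the FOR UPDATE hint, so this only takes effect on backends like MySQL or PostgreSQL (the table shape here is illustrative):

```python
from sqlalchemy import MetaData, Table, Column, Integer, create_engine, select

engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
rules = Table('rules', metadata,
    Column('rule_id', Integer, primary_key=True),
    Column('throttle', Integer),
)
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(rules.insert().values(rule_id=1, throttle=100))

# Read the row with a lock held for the rest of the transaction, so a
# concurrent writer blocks instead of silently clobbering our change.
# MySQL/PostgreSQL render SELECT ... FOR UPDATE; SQLite ignores the hint.
with engine.begin() as conn:
    row = conn.execute(
        select(rules).where(rules.c.rule_id == 1).with_for_update()
    ).fetchone()
    conn.execute(
        rules.update()
             .where(rules.c.rule_id == 1)
             .values(throttle=row.throttle - 50)
    )
```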
(In reply to Ben Hearsum [:bhearsum] from comment #12)
> What if we added a NULL row as well as a data row in insert()? Doing that in
> the above example would result in a history table that looks like this:
> (change_id) (changed_by) (timestamp)   (name)             (product) (version) (data)
> 1           ben          9:59am Aug 25 Firefox-6.0-build1 NULL      NULL      NULL
> 2           ben          10am Aug 25   Firefox-6.0-build1 Firefox   6.0       abc
> 3           ben          12pm Aug 25   Firefox-6.0-build1 Firefox   6.0       123
> 4           ben          3pm Aug 25    Firefox-6.0-build1 NULL      NULL      NULL

Looks good. Much easier to read, and hopefully leads to less code for rollback/rollforward. We should do something like 1 second instead of 1 minute though :-)

> > For all three modifying functions, where is the appropriate place to add
> > checks? Eg don't insert over the top of an existing row, row must exist to
> > update or delete.
> 
> I think SQLAlchemys behaviour is good enough for us here. ...
> I can wrap that in an exception of our own, if you want.

Having exception handling in the API/UI code will be fine I think.
This should be the last of my grab-bag-of-a-bunch-of-changes patches. It addresses all of the review comments, as well as some other things. Here's a summary:
- Fix default db names / dburi construction
- Rename UpdatePaths to Rules
- Remove AUS3.getMatchingRules, because it's fully redundant
- Add tons of comments/docstrings
- Rename convertRows to rowToDict, so that the decorator name is more obvious
- Add AUSTransaction, and use it in insert/delete/update to make sure changes are never made without an entry in the History table
- Add self.primary_key to AUSTable, for ease of use
- Split query functions for logical separation / easier testing
- Make history table columns nullable=False
- Move history table query generation into History class
- Make Rules.update_type nullable=False
- Make getMatchingRules take an updateQuery object instead of named parameters
- Add some query/helper functions to Permissions (discovered while working on bug 672918)
- Use history table structure described at the top of comment #14

And finally, I laid the groundwork for detecting update collisions by adding versioning support to AUSTable. insert/update have been modified to properly adjust it, but neither update nor delete checks its value prior to making changes. In order to finish that up, they'll all need to take an 'old_data_version' parameter and check it against data_version at the start (and maybe the end) of the transaction, and bail if they don't match. I'm probably going to have to limit them to only operate on a single row at a time to do that, which will put a small burden on the Controllers/Views, but shouldn't be a big deal.
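The old_data_version check described there might look something like this sketch (plain sqlite3 instead of SQLAlchemy; the helper and exception names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules (rule_id INTEGER PRIMARY KEY, priority INTEGER, "
    "data_version INTEGER NOT NULL)")
conn.execute("INSERT INTO rules VALUES (1, 100, 1)")

class OutdatedDataError(Exception):
    """Raised when the caller's copy of a row is stale."""

def update_rule(rule_id, what, old_data_version):
    # Bail if someone else changed the row since the caller last read it.
    row = conn.execute(
        "SELECT data_version FROM rules WHERE rule_id = ?", (rule_id,)).fetchone()
    if row is None or row[0] != old_data_version:
        raise OutdatedDataError("rule %s changed underneath you" % rule_id)
    conn.execute(
        "UPDATE rules SET priority = ?, data_version = data_version + 1 "
        "WHERE rule_id = ?",
        (what["priority"], rule_id))

update_rule(1, {"priority": 50}, old_data_version=1)      # succeeds, bumps to 2
try:
    update_rule(1, {"priority": 25}, old_data_version=1)  # stale copy, rejected
    stale_rejected = False
except OutdatedDataError:
    stale_rejected = True
```

In real code the read and the write would need to happen in one transaction (or be collapsed into a single qualified statement) to close the race between them.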
Attachment #558915 - Flags: review?(nrthomas)
Attachment #558915 - Flags: review?(catlee)
Whiteboard: [aus3]
Attachment #558915 - Flags: review?(catlee) → review+
Comment on attachment 558915 [details] [diff] [review]
address review comments, other enhancements

Looks good, r+ and you can fix/rebut my comments as you see fit.

>diff --git a/auslib/db.py b/auslib/db.py
>+    """Base class for all AUS Tables. By default, all tables have a history
>+       table created for them, too, which mirrors their own structure and adds
>+       a record of who made a changed, and when the change happened.

typo nit - changed -> change.
       
>+    def _prepareInsert(self, trans, changed_by, **columns):
...
>+        ret = trans.execute(self._insertStatement(**data))
>+        if self.history:
>+            for q in self.history.forInsert(ret.inserted_primary_key, data, changed_by):
>+                trans.execute(q)

insert() modifies the main table then does the history, while update() & delete() do the history first. Any particular reason for that ? If not lets make them consistent.

>+    def _prepareDelete(self, trans, where, changed_by, old_data_version=None):
>+        """Prepare a DELETE statament for commit. If this table has history enabled,
>+           a row will be created in that table representing the new state of the
>+           row(s) being deleted (NULL). If versioning is enabled

nit - comment trails off into nothingness ...

Probably worth removing the old_data_version argument until we have full support for it.

>+    def _updateStatement(self, where, what):
...
>+        data = what.copy()
>+        query = self.t.update(values=what)

data doesn't seem to be used. Did you mean to or is it a leftover ?

>+    def _prepareUpdate(self, trans, where, what, changed_by):
>+        """Prepare an UPDATE statement for commit. If this table has vesrioning enabled,

typo nit - vesrioning -> versioning.

>+           data_version will be increased by 1. If this table has history enabled, a
>+           row will be added to that table represent the new state of the data.

data_version seems to act like a kind of modification count for the row, to help when you have to wait for a lock to write and the data changed in the meantime. We also have a data_version in the json blobs for the releases, where it describes what types of data the blob will contain (ie bug 459972 would bump the value). Would be great if we could use different names for the two roles - maybe easiest to switch to schema in the blobs and AUS.py.

>+            modified += ret.rowcount

Is modified left over from debugging ? Doesn't seem to be used anywhere.

>+    def update(self, where, what, changed_by=None):
>+        """Perform an UPDATE statement on thi stable. See AUSTable._updateStatement for

typo nit - thi -> this.

>     def canEditUsers(self, username):
>         where=[
>             (self.username==username) &
>-            ((self.permission=='editusers') | (self.permission=='admin'))
>+            ((self.permission=='editusers') | (self.permission=='admin') |
>+             (self.permission=='/users/:id/permissions/:permission')
>+            )

I couldn't work out if there was any functional difference between these three ways to be allow user modification. Perhaps there will be in future ?

>     def grantPermission(self, changed_by, username, permission, options=None):
...
>+        if permission not in self.allPermissions.keys():
>+            raise ValueError('Unknown permission "%s"' % permission)
>+        if options:
>+            for opt in options:
>+                if opt not in self.allPermissions[permission]:
>+                    raise ValueError('Unknown option "%s" for permission "%s"' % (opt, permission))

These checks could be factored out and used by updatePermission() too.

>diff --git a/auslib/test/test_db.py b/auslib/test/test_db.py
>+class TestReleases(unittest.TestCase, MemoryDatabaseMixin):
>     def setUp(self):
...
>+        self.releases.t.insert().execute(name='c', product='c', version='c', data=json.dumps(dict(three=3)), data_version=1)
>+        self.releases.t.update(self.releases.name=='d').execute(product='d')

What is this update for ? Doesn't look it's going to match on anything.
Attachment #558915 - Flags: review?(nrthomas) → review+
(In reply to Nick Thomas [:nthomas] from comment #16)
> >+    def _prepareInsert(self, trans, changed_by, **columns):
> ...
> >+        ret = trans.execute(self._insertStatement(**data))
> >+        if self.history:
> >+            for q in self.history.forInsert(ret.inserted_primary_key, data, changed_by):
> >+                trans.execute(q)
> 
> insert() modifies the main table then does the history, while update() &
> delete() do the history first. Any particular reason for that ? If not lets
> make them consistent.

Good catch; before we changed the history table structure this difference was necessary. It's not anymore, so I've made them consistent.

> >+           data_version will be increased by 1. If this table has history enabled, a
> >+           row will be added to that table represent the new state of the data.
> 
> data_version seems to act like a kind of modification count for the row, to
> help when you have to wait for a lock to write and the data changed in the
> meantime. We also have a data_version in the json blobs for the releases,
> where it describes what types of data the blob will contain (ie bug
> 459972 would bump the value). Would be great if we could use different names
> for the two roles - maybe easiest to switch to schema in the blobs and
> AUS.py.

Good point. I've done as you suggest and modified the blob one - schema seems like it makes more sense than data_version there anyways.

> >+            modified += ret.rowcount
> 
> Is modified left over from debugging ? Doesn't seem to be used anywhere.

Half-finished changes to the function, actually :(. Fixed up in this patch.

> >     def canEditUsers(self, username):
> >         where=[
> >             (self.username==username) &
> >-            ((self.permission=='editusers') | (self.permission=='admin'))
> >+            ((self.permission=='editusers') | (self.permission=='admin') |
> >+             (self.permission=='/users/:id/permissions/:permission')
> >+            )
> 
> I couldn't work out if there was any functional difference between these
> three ways to be allow user modification. Perhaps there will be in future ?

editusers is exactly the same as /users/:id/permissions/:permission. In fact, I dropped it from other parts of the code (eg, not listed in allPermissions), but missed this instance. Sorry, the previous patch apparently wasn't well self-reviewed.

> >diff --git a/auslib/test/test_db.py b/auslib/test/test_db.py
> >+class TestReleases(unittest.TestCase, MemoryDatabaseMixin):
> >     def setUp(self):
> ...
> >+        self.releases.t.insert().execute(name='c', product='c', version='c', data=json.dumps(dict(three=3)), data_version=1)
> >+        self.releases.t.update(self.releases.name=='d').execute(product='d')
> 
> What is this update for ? Doesn't look it's going to match on anything.

Bad copy/paste.


In addition to addressing the review comments, I finished implementing versioning checking for delete/update. I think the code and the comments should explain it well enough, but let me know if there's anything that needs further explanation.
Attachment #558915 - Attachment is obsolete: true
Attachment #564619 - Flags: review?(nrthomas)
Attachment #564619 - Flags: review?(catlee)
Comment on attachment 564619 [details] [diff] [review]
finish implementing version checking on delete/update, address review comments

Review of attachment 564619 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Make sure to fix the old version checks on deletes and updates, as well as the precision on the history timestamp column.

::: auslib/db.py
@@ +72,5 @@
> +            e = TransactionError(e.message)
> +            raise TransactionError, e, tb
> +
> +    def rollback(self):
> +        self.trans.rollback()

Do you know about context managers? If you provide __enter__ and __exit__ methods to AUSTransaction, you should be able to write code like:

with AUSTransaction(self.getEngine().connect()) as trans:
    ret = self._prepareInsert(trans, changed_by, **columns)
    trans.commit()
    return ret
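A minimal sketch of the __enter__/__exit__ pair being suggested, with a plain sqlite3 connection standing in for the SQLAlchemy one (class name and schema invented):

```python
import sqlite3

class AUSTransactionSketch:
    """Commit on clean exit, roll back if the block raised."""
    def __init__(self, conn):
        self.conn = conn

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        if exc_type is None:
            self.conn.commit()
        else:
            self.conn.rollback()
        return False  # never swallow the exception

    def execute(self, statement, params=()):
        return self.conn.execute(statement, params)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permissions (permission TEXT, username TEXT)")
conn.commit()

with AUSTransactionSketch(conn) as trans:
    trans.execute("INSERT INTO permissions VALUES (?, ?)", ("admin", "ben"))

try:
    with AUSTransactionSketch(conn) as trans:
        trans.execute("INSERT INTO permissions VALUES (?, ?)", ("admin", "nick"))
        raise RuntimeError("boom")  # simulate a failure mid-transaction
except RuntimeError:
    pass

count = conn.execute("SELECT COUNT(*) FROM permissions").fetchone()[0]
# count is 1: the first insert committed, the second rolled back
```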

@@ +250,5 @@
> +        row = self._returnRowOrRaise(where=where, columns=self.primary_key)
> +        if self.versioned:
> +            self._assertOldDataVersionMatches(row, old_data_version)
> +
> +        ret = trans.execute(self._deleteStatement(where))

Checking the data_version should be done as an additional where clause on your delete statement rather than a distinct operation. Updates should be qualified with this check as well.
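Sketched with plain sqlite3 (the real statements are built with SQLAlchemy; the helper name is invented), the qualified delete looks like:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules (rule_id INTEGER PRIMARY KEY, mapping TEXT, "
    "data_version INTEGER NOT NULL)")
conn.execute("INSERT INTO rules VALUES (1, 'Firefox-6.0-build1', 1)")

def delete_rule(rule_id, old_data_version):
    # The version check rides along in the WHERE clause, so test-and-delete
    # is a single atomic statement rather than a read followed by a write.
    cur = conn.execute(
        "DELETE FROM rules WHERE rule_id = ? AND data_version = ?",
        (rule_id, old_data_version))
    if cur.rowcount != 1:
        raise RuntimeError("delete collided with a concurrent change")

try:
    delete_rule(1, old_data_version=5)  # stale version matches nothing
    collided = False
except RuntimeError:
    collided = True

delete_rule(1, old_data_version=1)      # current version, succeeds
remaining = conn.execute("SELECT COUNT(*) FROM rules").fetchone()[0]
```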

@@ +368,5 @@
>      def __init__(self, metadata, baseTable):
>          self.table = Table('%s_history' % baseTable.t.name, metadata,
>              Column('change_id', Integer, primary_key=True, autoincrement=True),
> +            Column('changed_by', String, nullable=False),
> +            Column('timestamp', Float, nullable=False)

Think this should be a Double...mysql floats are 32-bits I think, so we'd lose precision.
Attachment #564619 - Flags: review?(catlee) → review+
Comment on attachment 564619 [details] [diff] [review]
finish implementing version checking on delete/update, address review comments

>diff --git a/generate-json.py b/generate-json.py
>-    relData = {"name": options.name, "data_version": DATA_VERSION, "platforms": {},
>+    relData = {"name": options.name, "schema_version": DATA_VERSION, "platforms": {},

Nit - lets make the constant SCHEMA_VERSION to keep it all consistent. Otherwise the interdiff looks good.
Attachment #564619 - Flags: review?(nrthomas) → review+
I tried various ways of storing the timestamp as a Float, Double, and Numeric and none of them seemed to get returned properly. Multiplying gets us the precision we need easily enough.
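That is, something along these lines (a sketch; in the patch the real method lives on the History class):

```python
import time

def get_timestamp():
    # Milliseconds since the epoch as a plain integer: portable across
    # database backends, with no Float/Double precision worries.
    return int(time.time() * 1000)

t = get_timestamp()
seconds = t / 1000.0  # recover seconds when needed
```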
Attachment #564619 - Attachment is obsolete: true
Attachment #566197 - Flags: review?(nrthomas)
Attachment #566197 - Flags: review?(catlee)
Comment on attachment 566197 [details] [diff] [review]
add context management to AUSTransaction; make timestamps precise to the millisecond; add data_version to update/delete clauses; fix last reference to DATA_VERSION

Review of attachment 566197 [details] [diff] [review]:
-----------------------------------------------------------------

tiny fix to getTimestamp() - other than that looks great!

::: auslib/db.py
@@ +387,5 @@
> +        AUSTable.__init__(self, history=False, versioned=False)
> +
> +    def getTimestamp(self):
> +        t = int(time.time() * 1000)
> +        log.debug("History.getTimestamp: returning %d" % t)

LIES! You should return t below.
Attachment #566197 - Flags: review?(catlee) → review+
(In reply to Chris AtLee [:catlee] from comment #21)
> > +        AUSTable.__init__(self, history=False, versioned=False)
> > +
> > +    def getTimestamp(self):
> > +        t = int(time.time() * 1000)
> > +        log.debug("History.getTimestamp: returning %d" % t)
> 
> LIES! You should return t below.

Whoops, will do.
Attachment #566197 - Flags: review?(nrthomas) → review+
Comment on attachment 566197 [details] [diff] [review]
add context management to AUSTransaction; make timestamps precise to the millisecond; add data_version to update/delete clauses; fix last reference to DATA_VERSION

(In reply to Chris AtLee [:catlee] from comment #21)
> > +        AUSTable.__init__(self, history=False, versioned=False)
> > +
> > +    def getTimestamp(self):
> > +        t = int(time.time() * 1000)
> > +        log.debug("History.getTimestamp: returning %d" % t)
> 
> LIES! You should return t below.

Landed, with this fixed.
Attachment #566197 - Flags: checked-in+
I'm going to call this bug FIXED -- with the latest patch landed we have all of the core pieces I can think of implemented. Any other core work can go in more specific bugs, and additional methods on the table classes (Releases, Rules, etc.) can be implemented at the same time as the web pieces that require them.

Thanks for going through all the churn here, Chris and Nick.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering