121069 - (bz-transactions) Implement database transactions across Bugzilla

Reporter

Description

23 years ago

Spun off of bug #98304.

Some databases support the notion of transactions, where an operation is considered to be atomic. Apparently PostgreSQL has good support for them; MySQL, I believe, supports them, but not well.

Transactions have two main benefits that I see:

- While a transaction containing a write operation is in progress, all other transactions will see the database either as it was at the start or as it was at the end. Failures of any sort will not result in a "corrupted" database where half a transaction was applied. Bug #104589 makes this problem serious, but even if we fix that, transactions are still a good idea for robustness purposes. Computers can crash and software can have bugs.
- Read-only operations should never be denied. They can always use the last copy of the database from before any other transactions began, which is guaranteed to be consistent by the definition of transactions. This means the shadow database is unnecessary as a performance improvement.

I don't see that introducing transactions into the codebase would be very difficult. The API should be rather simple. You need the calls BeginTransaction and CommitTransaction. A shorthand call, PerformTransaction, that performs a list of SQL statements would also be good when you know up front what you want to do.

Essentially, I think we should be able to add these calls in a bit at a time. If the database in use does not support transactions, the calls will essentially be no-ops and there is no loss over what we had before. I don't provide a RollbackTransaction call for this reason: not all databases will support it and we must not expect them to.

I also think BeginTransaction and CommitTransaction provide another benefit: they are a nice solution to bug #104589, users closing the window shouldn't terminate scripts. A script should NEVER terminate during a transaction. We can implement this notion of "transaction" even in databases that don't support transactions. Indeed, I'm not sure we need to prevent script termination outside of transactions at all.

There is also the issue of locking. Write locks are probably still useful in databases with transactions. Without them, a transaction can fail because it clashes in the typical "multiple writers" situation, and the transaction would need to be reapplied. We need to decide whether the complexity of retrying is better than the possibility of deadlocks that comes with locking. The former could be hidden away in a call such as PerformTransaction, but would need to be considered for CommitTransaction.

Read locks are unnecessary for transactions in PostgreSQL, since it will maintain an old copy of the data for the reader. So if a traditional read lock was done on such a database, it would block writers when there is no particular need to. I'm not sure whether PostgreSQL would block writers or just ignore the locks. If the former, we need to make sure read locks are only done in non-transaction mode.

So the question here is essentially: is locking something that BeginTransaction worries about? If read locks would block writers, it needs to be, as it can decide whether to do the read locks. If we want to do retrying sometimes instead of write locks, the same applies. The locks could be passed as a parameter to BeginTransaction and it would decide what to do. Unfortunately, it isn't feasible for BeginTransaction to automatically compute what to lock at the start, since it doesn't know what is going to happen in the transaction. PerformTransaction is not constrained in that way and so perhaps could.
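As a rough illustration of the API shape described above, here is a minimal sketch in Python using the standard `sqlite3` module as a stand-in database. The class name `TransactionAPI`, its method names, and the `supports_transactions` flag are all hypothetical and are not Bugzilla code; the point is only that the begin and commit calls become no-ops on a backend without transactions, and that the `in_transaction` flag could be what a TERM handler consults.

```python
import sqlite3


class TransactionAPI:
    """Hypothetical wrapper; the names are illustrative, not Bugzilla's."""

    def __init__(self, conn, supports_transactions=True):
        self.conn = conn
        self.supports_transactions = supports_transactions
        # A signal handler could check this flag to defer termination.
        self.in_transaction = False

    def begin_transaction(self):
        self.in_transaction = True
        if self.supports_transactions:
            self.conn.execute("BEGIN")

    def commit_transaction(self):
        if self.supports_transactions:
            self.conn.execute("COMMIT")
        self.in_transaction = False

    def perform_transaction(self, statements):
        """Run a list of (sql, params) pairs as one logical transaction."""
        self.begin_transaction()
        for sql, params in statements:
            self.conn.execute(sql, params)
        self.commit_transaction()


# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/COMMIT statements above are the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE bugs (id INTEGER PRIMARY KEY, summary TEXT)")

api = TransactionAPI(conn)
api.perform_transaction([
    ("INSERT INTO bugs (summary) VALUES (?)", ("first",)),
    ("INSERT INTO bugs (summary) VALUES (?)", ("second",)),
])
bug_count = conn.execute("SELECT COUNT(*) FROM bugs").fetchone()[0]
```

Constructing the wrapper with `supports_transactions=False` runs the same statements without BEGIN/COMMIT, which matches the "no loss over what we had before" behaviour for databases that lack transactions.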

Bradley Baetz (:bbaetz)

Comment 1

23 years ago

> Indeed, I'm not sure we need to prevent script termination
> outside of transactions at all.

Only if all dbs support transactions. Since we currently only support mysql, and you need a different, semi-experimental table type, I'd rather not rely on that. OTOH, that TERM bug is likely to be pushed out from 2.16 unless someone has a patch. It's possible that we could merge in the idea of checking whether the user has exited by doing so at the end of a transaction.

> Read locks are unnecessary for transactions in PostgreSQL, since it will
> maintain an old copy of the data for the reader.

This is untrue. See my comments in the pgsql bug, giving examples where this is not the case.

It is possible to avoid some read locks. For example, in mysql, tables used in a query must either be all locked, or none locked. This is not the case in postgresql.

Some of the write locks can be turned into SELECT ... FOR UPDATE OF <tablename>, so that we have row level locking. The token stuff can probably lose most, if not all, table level locking that way.

After 2.16, and the templatisation which moves towards relationSet, plus getting rid of bug_form.pl, and so on, I want to audit our locking. Currently, it is theoretically possible for show_bug.cgi, for example, to show a different topic in the <title> than on the page, if a process_bug happened between the two queries. Moving to Bug.pm for things like this, where everything is selected at once (and in postgres, an individual SELECT is always self-consistent), would remove this problem.

Other differences include mysql reordering tables in the LOCK statement; elsewhere the app has to do so itself when locking tables. Also, if you want different types of locks, they must be in different statements in postgresql, although 7.2 can lock multiple tables with the same type of lock in that case.

Other comments:

We do not want to do retrying.

I foresee some kind of LockTables() call, which takes an array of {tablename, locktype} pairs, in the order they are to be locked. locktype would not only be read/write, but would degrade to READ/WRITE under mysql. See http://developer.postgresql.org/docs/postgres/sql-lock.html. Currently some of our write locking relies on SELECTs blocking, but that is too heavy for lots of cases. I'd want to distinguish with:

READ - share
WRITE - share row exclusive
EXCLUSIVE - exclusive mode, same as WRITE for mysql.

I think we can get away w/o ACCESS EXCLUSIVE if we use select for update. mysql 3.23 understands that syntax, and since I believe we're upping the requirement post-2.16, we can just add those calls to the SQL directly. However, mysql doesn't support the OF <tablename> syntax, to only lock a subset of the tables used in the query, so maybe we will need a wrapper function. We may want a "READ if !($db->{'supports_SelectForUpdate'})" level, too.

It may be possible to avoid exclusive locks entirely. We'll have to try it to find out.

(BTW, I'm aiming for a DB::mysql and DB::postgres type thing, which would have functions and variables. postgres allows ON DELETE CASCADE, so the code which deletes groups from everywhere could be if(!$db->{'supportsCascade'}). Or something like that.)
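To make the LockTables() idea concrete, here is a minimal Python sketch that only generates the SQL text and does not talk to a database. The function name `lock_tables_sql` and the dialect strings are hypothetical; the mode mapping follows the READ/WRITE/EXCLUSIVE list in the comment above, and the statement forms are the standard `LOCK TABLE ... IN ... MODE` (PostgreSQL) and `LOCK TABLES ...` (MySQL) syntax.

```python
# Lock levels from the comment above, mapped per database.
PG_MODES = {
    "READ": "SHARE",
    "WRITE": "SHARE ROW EXCLUSIVE",
    "EXCLUSIVE": "EXCLUSIVE",
}
# Under MySQL everything degrades to plain READ/WRITE.
MYSQL_MODES = {"READ": "READ", "WRITE": "WRITE", "EXCLUSIVE": "WRITE"}


def lock_tables_sql(dialect, locks):
    """Return the SQL statements for an ordered list of (table, locktype).

    The caller's order is preserved, since the application is responsible
    for a consistent lock order when the database does not reorder tables.
    """
    if dialect == "postgres":
        # One statement per table, so different lock types can be mixed.
        return [f"LOCK TABLE {table} IN {PG_MODES[mode]} MODE"
                for table, mode in locks]
    if dialect == "mysql":
        # MySQL takes all locks in a single statement.
        parts = ", ".join(f"{table} {MYSQL_MODES[mode]}"
                          for table, mode in locks)
        return [f"LOCK TABLES {parts}"]
    raise ValueError(f"unknown dialect: {dialect}")


wanted = [("bugs", "WRITE"), ("profiles", "READ")]
pg_sql = lock_tables_sql("postgres", wanted)
mysql_sql = lock_tables_sql("mysql", wanted)
```

Keeping the mapping in per-database tables is one way the proposed DB::mysql and DB::postgres split could carry the difference without the callers knowing about it.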

Dave Miller [:justdave]

Comment 2

23 years ago

You have to prevent others from reading if you are going to write anything that depends on the data you read that might change as a result of what you write. For example, on databases where we don't have autoincrement and we don't have a sequence counter, if you're going to get the next bug ID you have to lock everyone else out of reading from the bugs table until you determine what the next available bug ID is and snag it for yourself, otherwise another process might try to grab the same ID.

Bradley Baetz (:bbaetz)

Comment 3

23 years ago

Right, that was my comment in the pgsql bug.

This brings up another issue (aside from "why are we having technical discussions in bugzilla again"...). In postgres, sequence values are always allocated, and never rolled back (to avoid blocking a transaction while waiting for a commit). This means that it is possible for holes to appear, either if you roll back (which we're not going to do), or if there's an error in a later statement in the same transaction.

Since the number of INSERTs we do to the bugs or attachments tables (which are the only important ones which use ids) is small, does anyone see a problem with either locking the bugs table and using the MAX trick, or using a separate table which is locked? The latter is probably better, but may be a pain to implement in a cross-db fashion.

Anyone know what mysql does for that?

Matthew Tuck [:CodeMachine]

Reporter

Comment 4

23 years ago

>> Indeed, I'm not sure we need to prevent script termination
>> outside of transactions at all.
> Only if all dbs support transactions.

Like I said, for MySQL we can still suppress TERM within a logical transaction even if we can't use a database transaction.

>> Read locks are unnecessary for transactions in PostgreSQL, since it will
>> maintain an old copy of the data for the reader.
> This is untrue. See my comments in the pgsql bug, giving examples where this
> is not the case.

I'm not sure what you mean here.

> Currently, it is theoretically possible for show_bug.cgi, for example,
> to show a different topic in the <title> than on the page, if a
> process_bug happened between the two queries.

Isn't this what read transactions fix?

> you have to lock everyone else out of reading from the bugs table until you
> determine what the next available bug ID is and snag it for yourself,
> otherwise another process might try to grab the same ID.

OK, I believe two clashing transactions of this type would cause one to need to be retried, which is where you might want to write lock the number. But I think it would be better to pass this to something like PerformTransaction, and then you don't need to lock since it can handle retrying transparently.

Matthew Tuck [:CodeMachine]

Reporter

Comment 5

23 years ago

I think a separate table for seqnums is good. I'm not sure how a separate table is not cross-db compatible; it would seem the opposite to me. It also doesn't require locking the whole record in MySQL, so it wouldn't block bug updates, for example.
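A separate sequence-number table could look roughly like the following Python sketch, with `sqlite3` standing in for the real database. The table name `seqnums`, its columns, and the helper `next_id` are hypothetical. `BEGIN IMMEDIATE` is SQLite's way of taking the write lock up front; on MySQL or PostgreSQL the equivalent would be a table lock or row lock on the sequence row, which is an assumption about how this would be ported rather than something the thread specifies.

```python
import sqlite3

# Autocommit mode, so BEGIN/COMMIT below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE seqnums (name TEXT PRIMARY KEY, next_id INTEGER NOT NULL)"
)
conn.execute("INSERT INTO seqnums (name, next_id) VALUES ('bugs', 1)")


def next_id(conn, name):
    """Reserve and return the next id for the named sequence."""
    # Take the write lock before reading, so two callers can never
    # read the same value. Only the seqnums table is involved, so
    # readers and writers of the bugs table are not blocked.
    conn.execute("BEGIN IMMEDIATE")
    row = conn.execute(
        "SELECT next_id FROM seqnums WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "UPDATE seqnums SET next_id = next_id + 1 WHERE name = ?", (name,)
    )
    conn.execute("COMMIT")
    return row[0]


first = next_id(conn, "bugs")
second = next_id(conn, "bugs")
```

Because the increment is committed on its own, an error later in the caller's work leaves a hole in the numbering, which is the same trade-off described for postgres sequences.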

Bradley Baetz (:bbaetz)

Comment 6

23 years ago

>> This is untrue. See my comments in the pgsql bug, giving examples where this
>> is not the case.
> I'm not sure what you mean here.

http://bugzilla.mozilla.org/show_bug.cgi?id=98304#c19 and followups.

>> Currently, it is theoretically possible for show_bug.cgi, for example,
>> to show a different topic in the <title> than on the page, if a
>> process_bug happened between the two queries.
> Isn't this what read transactions fix?

Yes, but we don't actually do that for show_bug, and since that has to read from the main db, I don't think we want to. That point was more about "we need to audit locking" than to do with transactions directly, though.

> OK, I believe two clashing transactions of this type would cause one to need to
> be retried, which is where you might want to write lock the number, but I think
> it would be better to pass this to something like PerformTransaction and then
> you don't need to lock since it can handle retrying transparently.

We want to avoid retrying. You'd have to pass a sub in (and then we'd use serialised transactions, not read committed), and we'd just waste processing time which a simple lock could avoid. There may be cases where this is needed; I can't think of any offhand.

Bradley Baetz (:bbaetz)

Comment 7

23 years ago

postgresql actually does use a separate table. The serial datatype is a shortcut for an int field with a default value of the next value from a sequence. See http://developer.postgresql.org/docs/postgres/datatype.html#DATATYPE-SERIAL

I don't know what mysql does with auto_increment in transactions.

Matthew Tuck [:CodeMachine]

Reporter

Comment 8

23 years ago

> http://bugzilla.mozilla.org/show_bug.cgi?id=98304#c19 and followups.

OK, I suppose if you do row-level locking this is true. I was thinking at the table level, where it would be a write lock instead.

>> Isn't this what read transactions fix?
> Yes, but we don't actually do that for show_bug

Why on Earth not? That's the point of transactions.

> We want to avoid retrying. You'd have to pass a sub in,

I think it depends how you get seqnums. I was thinking about a record in a separate table, where you could use PerformTransaction, which would take a list of SQL statements. I think we could do that. Can you say in SQL to add 1 to a number? If you used the MAX technique, you probably couldn't do it that way.

> and we'd just waste processing time which a simple lock could avoid.

I suppose this is true, but I was mainly thinking of situations where this could actually save time. If the situation you are locking against is rare, it would be quicker on average to retry.
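The retry side of this trade-off can be sketched as a helper that takes the transaction body as a callable ("pass a sub in") and reruns it when the database reports a conflict. This is a minimal Python sketch under stated assumptions: the name `perform_with_retry` is hypothetical, `sqlite3.OperationalError` stands in for whatever error the real driver raises on a clash, and the conflict here is simulated rather than produced by two real writers.

```python
import sqlite3


def perform_with_retry(action, max_attempts=3):
    """Call action() until it succeeds or max_attempts is reached.

    Each failed attempt is wasted work, so this only pays off when
    clashes are rare compared with the cost of always locking.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except sqlite3.OperationalError:
            # Stand-in for a serialization or lock-contention failure.
            if attempt == max_attempts:
                raise


# Simulated transaction body that clashes twice before succeeding.
calls = {"count": 0}


def flaky_action():
    calls["count"] += 1
    if calls["count"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "committed"


result = perform_with_retry(flaky_action)
```

In this run the body executes three times for one successful commit, which is the "wasted processing time" cost; with rare clashes the common path is a single call with no lock taken.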

Bradley Baetz (:bbaetz)

Comment 9

23 years ago

>>> Isn't this what read transactions fix?
>> Yes, but we don't actually do that for show_bug
> Why on Earth not? That's the point of transactions.

Well, we'd want a serialized transaction then. See http://developer.postgresql.org/docs/postgres/transaction-iso.html and the following two pages. (Actually, we want repeatable read, but postgres doesn't support that.) Since show_bug doesn't change anything, this would be OK. I do want to stress that that example had nothing to do with transactions, though, and was more to the point that currently there is no read lock, even for mysql. Of course, adding that lock would mean locking all the other tables, so...

This implies that BeginTransaction would have to take a flag variable.

>> We want to avoid retrying. You'd have to pass a sub in,
> I think it depends how you get seqnums.

For seqnums we don't; in general we would (e.g. process_bug, which does lots of stuff).

>> and we'd just waste processing time which a simple lock could avoid.
> I suppose this is true, but I was mainly thinking of situations where this
> could actually save time. If the situation you are locking against is rare, it
> would be quicker on average to retry.

Hmm. Good point. I'll have to think about that.

Matthew Tuck [:CodeMachine]

Reporter

Comment 10

23 years ago

> This implies that BeginTransaction would have to take a flag variable.

Do you mean for the transaction type? I forgot to mention that; yes, that could possibly be one parameter to BeginTransaction. I believe there are four different types and PostgreSQL supports two of them, although I'm not sure what they are.

To me everything should use the strongest type. I suppose the other types are for better performance. I would suggest using the strongest type by default but allowing it to be overridden where we're really sure it is unnecessary. Although even then someone might modify the transaction so that it is necessary, so we need to be careful of that, even though it's a "bug" that bites us no more than having no transactions. PerformTransaction could, I imagine, work out what type is needed up front.

If we took transaction type as a parameter (or created different begin subs), would we support just the types that PostgreSQL supports, or all the types in case other databases support them? I understand you could always choose the next strongest if a type isn't available, and PostgreSQL supports the strongest.
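The "choose the next strongest" rule can be written down in a few lines. The sketch below uses the four isolation levels defined by the SQL standard, ordered weakest to strongest; the function name `effective_level` is hypothetical, and the supported set for PostgreSQL reflects the two levels mentioned in this discussion (read committed and serializable) rather than a statement about any particular release.

```python
# The four SQL-standard isolation levels, weakest first.
ANSI_LEVELS = [
    "READ UNCOMMITTED",
    "READ COMMITTED",
    "REPEATABLE READ",
    "SERIALIZABLE",
]


def effective_level(requested, supported):
    """Return the weakest supported level at least as strong as requested."""
    start = ANSI_LEVELS.index(requested)
    for level in ANSI_LEVELS[start:]:
        if level in supported:
            return level
    raise ValueError(f"no supported level is as strong as {requested}")


# Assumption taken from the thread: two of the four levels are available.
pg_supported = {"READ COMMITTED", "SERIALIZABLE"}

upgraded = effective_level("REPEATABLE READ", pg_supported)
unchanged = effective_level("READ COMMITTED", pg_supported)
```

Upgrading is always safe for correctness because a stronger level forbids everything the weaker one forbids; the cost is only in performance, and in more serialization failures at the strongest level.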

Bradley Baetz (:bbaetz)

Comment 11

23 years ago

Well, there are 4 types according to ANSI sql. postgres supports 2, and the mysql docs mention all 4. I don't know if all 4 are only supported in 4.0, or if 3.23+dbd tables support them all. Yeah, going up to the next highest is possible. However, any further discussion of this is going to have to wait until we've tried it out. I think we're getting ahead of ourselves.

> PerformTransaction could, I imagine, work out what type is needed up front.

Ug. No, it should be explicit. Maybe with a default, but bugzilla shouldn't try to scan sql to work out what we want.

Matthew Tuck [:CodeMachine]

Reporter

Comment 12

23 years ago

>> PerformTransaction could, I imagine, work out what type is needed up front.
> Ug. No, it should be explicit. Maybe with a default, but bugzilla shouldn't
> try to scan sql to work out what we want.

I assume your concerns here are about performance. We could definitely set the default to 'auto'. The question would therefore be: which is the greater performance drain, the computation of the transaction mode, or the cost of having too strict a transaction? Neither is very obvious, and the latter is possibly difficult to measure entirely when you consider there might be some sort of cache effects. And the above tradeoff could be different for different databases.

Matthew Tuck [:CodeMachine]

Reporter

Comment 13

23 years ago

Some more thoughts:

To fully support transactionless installations, as a part of the API we still need to allow specification of the locks that get done when transactions aren't on. So essentially all the locking and transaction logic is hidden in the transaction API.

Now, bbaetz's sequential number example before raises a distinction here: the difference between an essential lock and a transactionless lock. In the first type, the lock must get done regardless of transactions. This type is quite rare, I believe. The second type only needs to be done when you're not using transactions, as MVCC will take care of your problems.

So, do we allow specification of the difference between these lock types? If we pass something similar to our current system of "table1 READ, table2 WRITE", it might be a little hard to parse. We could instead pass this as a hash of table to type. But given the rareness of the essential lock situation, perhaps a better solution in these situations is to use SendSQL directly and not use the transaction API.

Allow me to contradict my earlier statement that we can't in general determine the transaction type we want. If we do pass locks for transactionless installations, we can possibly compute the transaction type using this data. I'd need to learn more about transactions to know how accurate this process could be. We can certainly easily discover whether a transaction is read-only, write-only or read-write. If this data was a hash, processing of this information could be relatively quick.

The reason I am so interested in automatic computation is that I would prefer to avoid the class of bugs where the transaction type is too weak. I see this as being too easy to do. It's possible we could move this checking into the testing suite, but this would not be easy.
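For illustration, both the string form and the hash form of the lock specification are easy to handle, and the read-only / write-only / read-write classification falls out of the hash directly. This is a minimal Python sketch; the function names `parse_lock_spec` and `classify_transaction` are hypothetical, and mapping the resulting class to an actual isolation level is deliberately left out, since the thread does not settle how that should work.

```python
def parse_lock_spec(spec):
    """Turn 'table1 READ, table2 WRITE' into {'table1': 'READ', ...}."""
    locks = {}
    for item in spec.split(","):
        table, lock_type = item.split()
        locks[table] = lock_type.upper()
    return locks


def classify_transaction(locks):
    """Classify a table-to-locktype mapping by the kinds of access it needs."""
    kinds = set(locks.values())
    reads = "READ" in kinds
    writes = "WRITE" in kinds
    if reads and writes:
        return "read-write"
    if writes:
        return "write-only"
    if reads:
        return "read-only"
    return "empty"


parsed = parse_lock_spec("table1 READ, table2 WRITE")
mixed = classify_transaction(parsed)
readers = classify_transaction({"bugs": "READ", "profiles": "READ"})
```

An "essential" lock could be represented by a third value in the same mapping, although the alternative suggested above, bypassing the API with SendSQL for those rare cases, keeps the mapping simpler.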

Matthew Tuck [:CodeMachine]

Reporter

Comment 14

23 years ago

I think full transaction support should mean:

- We have all the necessary documentation and code to support transactions on both MySQL and PgSQL.
- On all databases, regardless of whether they support transactions, an administrator should be able to turn off transactions.

Matthew Tuck [:CodeMachine]

Reporter

Updated

23 years ago

Priority: -- → P2

Target Milestone: --- → Bugzilla 2.18