Bug 1141426 (Open), Opened 9 years ago, Updated 9 years ago

Bugzilla does not handle PostgreSQL serialization errors

Categories: Bugzilla :: Database, defect
Priority: Not set
Severity: normal

People: Reporter: mtyson (Unassigned)

Attachments: 1 file

Attached file Reproducer
Note: this bug only happens when using Bugzilla with a Postgres database.  See bug 514778 and bug 1108821 for concrete examples of this happening.

Bugzilla runs its database transactions at the REPEATABLE READ isolation level.  A side effect of that isolation level is that under Postgres, transactions may be terminated if two or more transactions attempt to update the same row at the same time.

To understand this bug it's important to understand how Postgres handles transaction isolation.  For background reading, see the transaction isolation section of the Postgres manual, particularly section 13.2.2:

http://www.postgresql.org/docs/9.1/interactive/transaction-iso.html

The key problem is captured by this passage on REPEATABLE READ transaction isolation from the Postgres manual:

> Applications using this level must be prepared to retry transactions due to serialization failures.

Postgres handles locking at the row level, so any concurrent updates to the same row will trigger this problem.  For example, an email job updating the lastdiffed column can cause a bug update to fail if a user or an RPC script updates a bug while a mail job for a previous update to the same bug is being processed.

The first transaction to commit will succeed, and any other transactions manipulating the same row(s) will be terminated with the following error message:

> ERROR:  could not serialize access due to concurrent update

To replicate or simulate this problem, see the attachment for instructions.

(Note: under MySQL, the second transaction will succeed, possibly clobbering the first.)
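For illustration, here's a minimal standalone sketch of the failure, separate from the attached reproducer.  The DSN, credentials, and scratch table are placeholders for a throwaway database, not Bugzilla's schema:

  use strict;
  use warnings;
  use DBI;

  # Two connections simulate two concurrent clients.  The DSN and
  # credentials are placeholders for a throwaway database.
  my @dsn = ('dbi:Pg:dbname=scratch', 'user', 'pass',
             { AutoCommit => 0, RaiseError => 1, PrintError => 0 });
  my $dbh1 = DBI->connect(@dsn);
  my $dbh2 = DBI->connect(@dsn);

  for my $dbh ($dbh1, $dbh2) {
      $dbh->do('SET SESSION CHARACTERISTICS AS TRANSACTION'
             . ' ISOLATION LEVEL REPEATABLE READ');
      $dbh->commit;
  }

  $dbh1->do('CREATE TABLE t (id int PRIMARY KEY, lastdiffed timestamp)');
  $dbh1->do('INSERT INTO t VALUES (1, now())');
  $dbh1->commit;

  # Session 2 takes its snapshot with its first statement.
  $dbh2->selectrow_array('SELECT lastdiffed FROM t WHERE id = 1');

  # Session 1 updates the same row and commits first.
  $dbh1->do('UPDATE t SET lastdiffed = now() WHERE id = 1');
  $dbh1->commit;

  # Session 2 now updates a row that changed after its snapshot:
  # under REPEATABLE READ this dies with SQLSTATE 40001,
  # "could not serialize access due to concurrent update".
  eval { $dbh2->do('UPDATE t SET lastdiffed = now() WHERE id = 1') };
  print 'state=' . $dbh2->state . " error=$@" if $@;
  $dbh2->rollback;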

So, the question is, what is the best way to handle this error?  Currently the user is just presented with a stack trace displaying the serialization error.

IMO it isn't safe to always retry or replay the transaction, as the underlying data has changed (the lost-update problem).

Job queue items such as email spooling can simply be put back into the queue and retried later.
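As a sketch of that shape (run_job() and requeue() are hypothetical stand-ins for whatever the real job runner provides; the SQLSTATE check is the point here):

  use strict;
  use warnings;

  # Hypothetical shape only: $dbh is a DBI handle with AutoCommit => 0
  # and RaiseError => 1; run_job() and requeue() stand in for the real
  # job-queue API.
  sub process_job {
      my ($dbh, $job) = @_;
      eval {
          run_job($dbh, $job);   # does its work inside the transaction
          $dbh->commit;
      };
      if (my $err = $@) {
          my $state = $dbh->state;   # capture before rollback resets it
          $dbh->rollback;
          if ($state eq '40001') {
              # Serialization failure: safe to retry later, as the job
              # will recompute its work against the new data.
              requeue($job);
          }
          else {
              die $err;   # anything else is a real failure
          }
      }
  }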

The web UI and RPC are another problem.  We could present the user with a mid-air-collision-style warning, except we may not be able to identify what has changed.  Sometimes the changes are not really colliding as such (e.g. the lastdiffed example above), and quite likely it would appear to the user that nothing has changed.

My current plan is to have Bugzilla throw the user an error stating that the operation failed due to a concurrent update, and ask them to reload the bug and resubmit.  I'm open to other ideas though.
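A sketch of what that could look like around a commit (ThrowUserError is Bugzilla's real Bugzilla::Error function, but the 'concurrent_update' error tag and its template would be new additions):

  use strict;
  use warnings;
  use Bugzilla::Error;

  # Hypothetical wrapper: if the commit dies with SQLSTATE 40001, turn
  # the raw stack trace into a user-facing error asking the user to
  # reload the bug and resubmit.
  sub commit_or_explain {
      my ($dbh) = @_;
      eval { $dbh->commit };
      if (my $err = $@) {
          my $state = $dbh->state;   # capture before rollback resets it
          $dbh->rollback;
          ThrowUserError('concurrent_update') if $state eq '40001';
          die $err;
      }
  }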
OS: Linux → All
Hardware: x86_64 → All
Summary: Bugzilla does not handle postgres serialization errors. → Bugzilla does not handle PostgreSQL serialization errors
So, just an update on this.

I've patched this problem to some extent within the BRC codebase (see the linked BRC bugs).

In some places I've solved it by reducing the transaction isolation level to READ COMMITTED:

Bugzilla::Search::Recent (BRC 1276192)
Bugzilla::Auth::Login::Cookie (BRC 1235135)

These transactions were causing serialization errors because timestamp updates were getting overwritten by concurrent transactions.  Since this data isn't really that important, reducing the transaction isolation was the simplest fix.
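The shape of that fix, as a sketch (the update is a placeholder in the spirit of the Cookie case; note SET TRANSACTION must be the first statement of the transaction):

  use strict;
  use warnings;

  # Sketch: drop just this transaction to READ COMMITTED so a concurrent
  # overwrite of an unimportant timestamp can no longer abort us.
  # Assumes $dbh is a DBI handle with AutoCommit => 0.
  sub touch_login_timestamp {
      my ($dbh, $user_id) = @_;
      # Must be the first statement of the new transaction.
      $dbh->do('SET TRANSACTION ISOLATION LEVEL READ COMMITTED');
      $dbh->do('UPDATE logincookies SET lastused = NOW()
                 WHERE userid = ?', undef, $user_id);
      $dbh->commit;
  }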

The last one was a bit more interesting.  It will only happen if you have multiple web servers committing transactions that update flags (BRC 1276196).

When a flag is set, a mail for the flag change is created and saved into the mail_staging table as part of the transaction.  After the transaction is committed, Bugzilla will attempt to send the staged mail; this is done outside of a transaction.

If two web servers do this at the same time, two things can happen:
1) A duplicate of the flag mail can be sent.
2) A serialization error when both web heads attempt to delete the sent mail.

I've resolved both issues by using a transaction and a row-level lock with NOWAIT.  It's pretty hacky, but it works for now.  The first transaction will lock the mails and process them; the second will fail to grab the lock and abort processing of the table.
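In outline, that looks something like this (a sketch, not the BRC patch itself; the mail_staging columns are assumed to be (id, message), send_staged_mail() is a hypothetical stand-in, and 55P03 is Postgres's lock_not_available SQLSTATE):

  use strict;
  use warnings;

  # Sketch of the NOWAIT dance over mail_staging.  Assumes $dbh is a
  # DBI handle with AutoCommit => 0 and RaiseError => 1.
  sub flush_mail_staging {
      my ($dbh) = @_;
      my $rows = eval {
          # The first web head gets the row locks; a second head fails
          # immediately with SQLSTATE 55P03 instead of blocking.
          $dbh->selectall_arrayref(
              'SELECT id, message FROM mail_staging FOR UPDATE NOWAIT');
      };
      if (my $err = $@) {
          my $state = $dbh->state;       # capture before rollback resets it
          $dbh->rollback;
          return if $state eq '55P03';   # someone else has the lock
          die $err;
      }
      for my $row (@$rows) {
          my ($id, $message) = @$row;
          send_staged_mail($message);    # hypothetical stand-in
          $dbh->do('DELETE FROM mail_staging WHERE id = ?', undef, $id);
      }
      $dbh->commit;                      # releases the locks
  }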

There is a chance of a delayed mail in this sequence of events.  I think this is unlikely to be a problem, as the next committed transaction will flush out the table, and that happens fairly often on BRC.

This method is a Postgres-specific fix and not suitable for sending upstream.  If upstream wanted this fix, it would need to be solved another way.

If upstream is interested in these fixes, please let me know and we can work out how best to solve them.