Closed Bug 764214 Opened 12 years ago Closed 12 years ago

Refactor SyncStorage backend API for better atomicity

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rfkelly, Assigned: rfkelly)

References

Details

(Whiteboard: [qa?])

Attachments

(3 files, 3 obsolete files)

The Sync2.0 and AITC protocols rely on X-Last-Modified and X-If-Unmodified-Since for concurrency and conflict management.  For this to work without race conditions, the checking of these headers needs to be pushed down into the storage backend so that it can be done "atomically" from the point-of-view of the store.

Attached is a proposed modification of the SyncStorage backend interface done with this in mind.

As an example of the change, take the set_item() method which is used to handle PUT /storage/collection/item requests.  To handle X-If-Unmodified-Since with the current backend API, there has to be some higher-level logic in the controller that checks it like this:


    last_modified = backend.get_item(userid, collection, item)["modified"]
    if last_modified > request.headers["X-If-Unmodified-Since"]:
        raise HTTPModified

    new_modified = backend.set_item(userid, collection, item, data)
    return HTTPNoContent(headers={"X-Last-Modified": new_modified})


This is racy, since a concurrent process might have overwritten the data in between the check and the write.

With the proposed change, we push that check down into the backend so it can be done atomically:


    if_unmodified = request.headers["X-If-Unmodified-Since"]
    try:
        res = backend.set_item(userid, collection, item, data, if_unmodified)
    except ModifiedError:
        raise HTTPModified
    else:
        return HTTPNoContent(headers={"X-Last-Modified": res["last_modified"]})


Thoughts?
Attachment #632466 - Flags: feedback?(telliott)
Whiteboard: [qa?]
Comment on attachment 632466 [details]
proposed interface definition for storage backends

Is it your preference to do this rather than locking (which would eliminate race conditions)?

This mostly looks good, though I have a couple concerns:

1) As defined here, most implementations of the API will still have a race condition. The window may be smaller, because you've pushed this logic closer to the actual write, but the base logic remains the same and many systems don't have the ability to do what you want synchronously. Is your intent to simply mandate the requirement and consider non-conforming libraries to be buggy? 

2) By pushing this logic down a level, you're requiring reimplementation for all clients, and internal to a lot more functions than before. The advantage of having it in the controller is that all of the fiddly bits only have to be written once. Admittedly, a race-conditiony once, but there's value in not having a repeated code pattern all over the place.

3) (Ignore this, it's totally semantic, but...) requiring that set_items return a QuotaError is a little problematic insofar as many storage engine wouldn't be quota-aware. For example, we don't want our mysql library checking quota - it's handled by the memcache layer. Technically, the way to do that would be to pass a quota in (if any) and act on that, but we're rapidly spiraling down into showing a whole lot of metainformation at the interface. Maybe it's just the two.

So, in short, there's nothing here that's not valid for pursuing further, but I'm not sure it's going to give you what you're hoping to get out of it.
Attachment #632466 - Flags: feedback?(telliott) → feedback+
(In reply to Toby Elliott [:telliott] from comment #1)
> Comment on attachment 632466 [details]
> proposed interface definition for storage backends
> 
> Is it your preference to do this rather than locking (which would eliminate
> race conditions)?

Do you mean client-driven session-level locking?  

The other alternative I'm considering is an explicit lock/unlock or session api as part of the backend interface.  The controller code could do something like this:

   with storage.read_lock():

      last_modified = backend.get_item(userid, collection, item)["modified"]
      if last_modified > request.headers["X-If-Unmodified-Since"]:
          raise HTTPModified

      new_modified = backend.set_item(userid, collection, item, data)
      return HTTPNoContent(headers={"X-Last-Modified": new_modified})

This would map nicely to a backend that used explicit locking, not necessarily to e.g. memcached where you would want to use its atomic compare-and-set operations to achieve atomicity.

> This mostly looks good, though I have a couple concerns:
> 
> 1) As defined here, most implementations of the API will still have a race
> condition. The window may be smaller, because you've pushed this logic
> closer to the actual write, but the base logic remains the same and many
> systems don't have the ability to do what you want synchronously. Is your
> intent to simply mandate the requirement and consider non-conforming
> libraries to be buggy?

The SQL backend could use row-level locking to maintain these semantics, and the memcached backend could use atomic compare-and-swap.  A hypothetical cassandra backend would need to add in something like zookeeper for locking, or just accept the potential for a race.

Basically, I'm trying to push the responsibility for this down into the backend, so we can implement it in whatever way makes the most sense given the abilities and limitations of each backend.

> 2) By pushing this logic down a level, you're requiring reimplementation for
> all clients, and internal to a lot more functions than before. The advantage
> of having it in the controller is that all of the fiddly bits only have to
> be written once. Admittedly, a race-conditiony once, but there's value in
> not having a repeated code pattern all over the place.

True.  A session/locking driven approach would fix this, but I'm not sure it would be a net positive overall.
(In reply to Ryan Kelly [:rfkelly] from comment #2)

> The other alternative I'm considering is an explicit lock/unlock or session
> api as part of the backend interface.  The controller code could do
> something like this:
> 
>    with storage.read_lock():
> 

This is the model I was thinking of.
Based on our conversations about this, here is an alternate proposal for your consideration.  It exposes an explicit "session" object on which locks can be acquired as needed.

Using this API, a PUT /storage/collection/item request would look like:

    with backend.start_session(userid) as session:
        session.lock_for_write(collection)
        last_modified = session.get_item(collection, item)["modified"]
        if last_modified > request.headers["X-If-Unmodified-Since"]:
            raise HTTPModified
        new_modified = session.set_item(collection, item, data)
        return HTTPNoContent(headers={"X-Last-Modified": new_modified})

I think having an explicit session object will also help with some other fiddly corner-cases in our API, e.g. the double-read problem described in Bug 764246.  It will also reduce the code-duplication from your point (2) above.

Thoughts?
Attachment #635209 - Flags: feedback?(telliott)
You may also notice that the QuotaError stuff is gone, replaced with a single method that gets the total size for a user.  You're quite right that this is better implemented at a higher level - the only point in pushing it down into the backend API would be if some backends had a native quota implementation that was more efficient than generic code in the controller.  I can't imagine any such backend being developed.
What if we just added "get_lock" and "get_write_lock" methods to the storage class and didn't worry about the session object implementation (we could use one, but it wouldn't be required)

Then we'd have something like 

with backend.get_write_lock(userid):
OK, I am attaching an updated interface definition based on our discussions this past week.  I have also basically completed a migration of the SQL and Memcached backends to this interface, which I will open as independent bugs.
Attachment #632466 - Attachment is obsolete: true
Attachment #635209 - Attachment is obsolete: true
Attachment #635209 - Flags: feedback?(telliott)
Attachment #638329 - Flags: review?(telliott)
Blocks: 770159
Blocks: 770162
Comment on attachment 638329 [details]
proposed interface definition for storage backends

Do we need an unlock function? The definition of lock_for_read specifies how to unlock, so it seems redundant.

Obviously, I can do

with storage.lock_for_read(10, 20):
  do my things

and that's intuitive, but is there also a more procedural interface?

storage.lock_for_read(10, 20)
do my things
storage.unlock(10, 20)

Are there advantages to having both? I appreciate the desire to be flexible, but unless there are specific use cases for choosing one over the other, it may not be worth the choice-confusion.
(In reply to Toby Elliott [:telliott] from comment #8)
> Comment on attachment 638329 [details]
> proposed interface definition for storage backends
> 
> Do we need an unlock function? The definition of lock_for_read specifies how
> to unlock, so it seems redundant.
> [...snip...]
> Are there advantages to having both? I appreciate the desire to be flexible,
> but unless there are specific use cases for choosing one over the other, it
> may not be worth the choice-confusion.

The unlock() method is mostly there so it can be called by the context-manager helper object.  I don't have any use-cases in mind for these methods outside a "with" stagement. So I think we can remove it from the public API in the interests of "Only One Obvious Way To Do It".  Updated file attached.
Attachment #638329 - Attachment is obsolete: true
Attachment #638329 - Flags: review?(telliott)
Attachment #639506 - Flags: review?(telliott)
Comment on attachment 639506 [details]
proposed interface definition for storage backends

Looks good. The explanatory comment in the header is a big plus, too.
Attachment #639506 - Flags: review?(telliott) → review+
Attaching the updates to controller.py for review as a separate patch.  The main points of change are:

  * a convert_storage_errors decorator to turn e.g. CollectionNotFound exceptions into a HTTPNotFound response.  This removes a lot of conditional error-checking throughout the code.

  * with_read_lock and with_write_lock decorators to take locks for the duration of a method call.  They are applied to methods that deal with BSOs, but not to methods like get_collection_timestamps() that only deal with summary information, so that we pay no locking overhead in the common case where there is nothing to sync.

  * quota-checking and recalculation is now the responsibility of the controller.

  * the value of request.server_time is no longer passed into the backend.  Instead, the backend generates its own timestamps and reports them back to the controller.
Attachment #640138 - Flags: review?(telliott)
Attaching the updated testcases as a separate patch, for easier review.  Key changes:

  * use X-Last-Modified rather than X-Timestamp for most of the timestamp-checking tests

  * added syncstorage/tests/test_storage.py with some generic storage-backend testcases, and refactored test_sql.py and test_memcachesql.py to use it.

  * added a test to ensure that read/write locking works correctly.

  * and lots of tweaks for the new API, e.g. replacing assertEquals(None) with assertRaises(CollectionNotFound)
Attachment #640139 - Flags: review?(telliott)
Attachment #640138 - Flags: review?(telliott) → review+
Attachment #640139 - Flags: review?(telliott) → review+
https://github.com/mozilla-services/server-syncstorage/commit/a437a5d712955bc22ebb13c7d8b632aa7a632168
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: