Closed Bug 734990 Opened 12 years ago Closed 12 years ago

Allow X-If-Modified-Since for all GET requests

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: rfkelly, Assigned: rfkelly)

References

Details

(Whiteboard: [qa?])

Attachments

(3 files)

Following up on rnewman's email to services-dev, it seems sensible to support X-If-Modified-Since as a header for *all* GET requests.  Requests to /info/* will 304 if no collections have been modified, requests to /storage/collection will 304 if that collection has not been modified, and requests to /storage/collection/id will 304 if that specific object has not been modified.

Suggested docs patch attached, thoughts?
Attachment #605062 - Flags: feedback?(rnewman)
Attachment #605062 - Flags: feedback?(gps)
Comment on attachment 605062 [details] [diff] [review]
patch documenting X-If-Modified-Since on all GET requests

Review of attachment 605062 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me. f+ as it is.

Do we also want to document If-Modified-Since in addition to X-If-Modified-Since? It seems kinda silly to accept X-If-Modified-Since but not If-Modified-Since.

Also, are we OK with using mtime as the resource validator? In theory, content validators (strong validators in HTTP spec parlance - see RFC 2616 13.3.3) are much better. Do we have any desire to use If-None-Match as our conditional request mechanism? I haven't fully thought through the impact on the client and server in terms of performance, etc. Just throwing the idea out there.
Attachment #605062 - Flags: feedback?(gps) → feedback+
Supporting X-If-Modified-Since on all GET calls requires us to store an additional piece of metadata and potentially do two calls to memcache (one for that, one for the full blob if it has updated) or always introspect the JSON blob of timestamps. This is unlikely to be too bad, but it does present a cost to the server. That cost may be offset by decreased data transmission, and it does have the nice bonus of letting us log 304 rates on this stuff, but it should be a consideration.
I'm thumbs down on supporting If-Modified-Since. While it seems a bit weird on the surface, it requires us to start parsing dates with way less precision than we're using. We don't want to encourage that.

I'd prefer not to use If-None-Match for much the same reason - there's not a lot of upside to supporting both, and it can lead to splitting. In this case, though, there are bigger downsides - if we lose a user's etags, we have no way to regenerate it from the database, whereas max timestamps can be. That means we couldn't store etags in memcache and are looking at infra changes.
(In reply to Toby Elliott [:telliott] from comment #3)
> if we lose a user's etags, we
> have no way to regenerate it from the database, whereas max timestamps can
> be. That means we couldn't store etags in memcache and are looking at infra
> changes.

I think you are wrong about etags. An etag is merely an opaque string and can be computed any way we want (e.g. SHA1s of the entity body). It is purely an optimization that many HTTP servers use non-isomorphic etags (e.g. the filesystem integer identifier for a file in the case of Apache HTTPd).

While it certainly isn't a traditional usage, I don't believe there is anything forbidding us from having the client compute etags itself and issue them in If-None-Match requests to a server that knows what to do with them.
(In reply to Gregory Szorc [:gps] from comment #4)
> (In reply to Toby Elliott [:telliott] from comment #3)
> > if we lose a user's etags, we
> > have no way to regenerate it from the database, whereas max timestamps can
> > be. That means we couldn't store etags in memcache and are looking at infra
> > changes.
> 
> I think you are wrong about etags. An etag is merely an opaque string and
> can be computed any way we want (e.g. SHA1s of the entity body). It is
> purely an optimization that many HTTP servers use non-isomorphic etags (e.g.
> the filesystem integer identifier for a file in the case of Apache HTTPd).
>
> While it certainly isn't a traditional usage, I don't believe there is
> anything forbidding us from having the client compute etags itself and issue
> them in If-None-Match requests to a server that knows what to do with them.

These are both problematic, though the observation that we could use hashed entity bodies does relieve some of the problem on the server. 

The core problem is that we do updates in batches. That means that on both sides (clients computing and server recomputing), there's going to be some unnecessary mismatches. 

It could be done - you'd need server to control everything and just have the client cache responses - but I'm not yet seeing the benefits of changing over to it. Collision avoidance?
I'm usually pro etags, but I don't think they're a good fit in this case.  The protocol already depends so heavily on mtime providing proper change detection that X-If-Modified-Since is the most obvious fit.  And if there are cases where an etag would behave differently to an mtime then we've probably got bigger consistency problems.

Since it costs us nothing to allow this in the spec and just ignore the header in the server if it performs badly, I'm going to go ahead and commit this to the docs repos.
To be clear, "I'm going to go ahead and commit *X-If-Modified-Since* support to the docs repo"
Comment on attachment 605062 [details] [diff] [review]
patch documenting X-If-Modified-Since on all GET requests

Review of attachment 605062 [details] [diff] [review]:
-----------------------------------------------------------------

One sharp edge: the spec doesn't define what happens if:

  GET info/collections
   - save X-Weave-Modified
  DELETE storage/foobar
  GET info/collections with X-I-U-S

That is, none of the *remaining* collections have been modified since the specified mtime, but the representation returned in response to a request for info/collections is *very much* different, because a whole collection is missing! (The same applies to `DELETE storage`, as we do in a wipe. That's a really important one!)

That is: X-I-U-S on an info/* resource applies to the resource, not the extant living collections. Deleting or creating a collection should bump the implicit info/* timestamp.

(This means that one obvious implementation, as a check against collection timestamps with deletion mapping directly to a SQL DELETE and dropping a value from memcache, isn't sufficient.)

It would be good for the docs to clarify this.
Attachment #605062 - Flags: feedback?(rnewman) → feedback+
Indeed. s/modified/modified or deleted/g.
Whiteboard: [qa?]
Assignee: nobody → rkelly
Blocks: 727761
No longer blocks: 720964
OK, we finally have a patch to implement this.  I've added the notion of "storage timestamp" to the storage backends, which is the last time at which any modification was done inside any collection.

As implemented, this does *not* behave correctly in the face of deletes because the underlying storage engines do not track deletions.  That's coming in a separate patch and operates at a level below the controller code that I'm modifying here.
Attachment #614667 - Flags: review?(telliott)
Blocks: 745072
Whoops, I left out the changes to MemcachedSQLStorage that allow the overall storage timestamp to be cached.  Revised patch attached.
Attachment #614691 - Flags: review?(telliott)
Comment on attachment 614667 [details] [diff] [review]
patch to support x-if-[un]modified-since on all requests

oops, forgot to cancel review on the older patch; done now
Attachment #614667 - Flags: review?(telliott)
Comment on attachment 614691 [details] [diff] [review]
patch to support x-if-[un]modified-since on all requests

I'm going to r+, but I think you should strongly consider renaming one of stamp/stamps. That naming is just inviting really horrible bugs in the future.
Attachment #614691 - Flags: review?(telliott) → review+
https://github.com/mozilla-services/server-syncstorage/commit/3d1ce83fa8845e47d1e6bf429f8f152c444eb48d
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Code verified and passed functional tests of Sync 2.0 and AITC 1.0 for a 4/19/2012 deploy to qa1.
Status: RESOLVED → VERIFIED
Product: Cloud Services → Cloud Services Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: