Closed Bug 720964 Opened 12 years ago Closed 12 years ago

[meta] define and document Sync 2.0 protocol

Categories

(Cloud Services :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rfkelly, Assigned: rfkelly)

References

Details

(Keywords: meta, Whiteboard: [qa+])

Attachments

(1 file, 1 obsolete file)

      No description provided.
Component: Server: Sync → General
OS: Linux → All
QA Contact: sync-server → general
Hardware: x86 → All
Attached file file describing the sync2.0 protocol (obsolete) —
I've attached the current draft, and uploaded a built snapshot of the docs for viewing at http://people.mozilla.org/~rkelly/moz-services-docs/storage/apis-2.0.html

Summary of changes:

* The term "Weave" is no longer used anywhere in the protocol:
    * "Weave Basic Objects" have been renamed "Basic Storage Objects".
    * The "Weave" prefix has been removed from all custom headers.

* Authentication can now be performed using any HTTP Access Authentication
  method accepted by both client and server.  Mozilla-hosted services will
  accept only Sagrada Token Server authentication.

* URLs no longer contain a username component; the current user is taken from
  the authentication info and there is no way to refer to the stored data for
  another user.

* The WBO fields "parentid" and "predecessorid" have been removed, along with
  the corresponding query parameters on all requests.

* Timestamps are now reported in integer milliseconds rather than decimal seconds.

* The **GET /info/quota** request now returns an object with keys named "usage"
  and "quota", rather than just a list of numbers.

* The query parameters for **DELETE /storage/collection** have been removed.
  The only operations now supported are "delete these specific ids" and
  "delete the whole collection".

* The **POST /storage/collection** request now accepts application/newlines
  input in addition to application/json.

* The **POST /storage/collection** request now explicitly allows the server
  to process objects as they are received, and to error out partway through
  consuming the objects.

* The **application/whoisi** output format has been removed.

* The *X-If-Modified-Since* header has been added.

* The *X-Weave-Records* header has been renamed to *X-Num-Records*.

* The *X-Weave-Alert* header has been removed.

* The following response codes are explicitly mentioned: 304, 405, 412, 413.

* The collection name "addons" is now a default Mozilla collection.
I've added some more details on HTTP status codes per rnewman's requests in Bug 686584
Attachment #592962 - Attachment is obsolete: true
Very solid. We're down to the nitpicks from this end:

Can't call it Storage, since there are multiple storage types coming. I suggest SyncStorage, since it is sorage designed for Syncronization, but am open to other names.

Pathname description is confusing if you dont know why it needs to be defined. "the prefix associated with the service on the box. Allows for multiple different service entrypoints on the same machine"

Drop the decimal in version?

Shold we specify that quota numbers may be approximate in info/quota? Gives us a little breathing room

**ids**: returns the ids for objects in the collection that are in the provided comma-separated list. - I had a hard time parsing this. I think it's clearer if you say "a comma separated list of ids that restricts the returned values to only items in that set" or something similar. Maybe its just the second reference to "ids" that's confusing me.

PUT (and other timestamp responses): Is the timestamp response json-encoded? Heck, do we even need this if we're returning a header?

"If the request is aborted mid-flight, there is no way to determine which items were stored successfully and which were not." If we're only processing, say, 25 items, could we not return those 25 as success and the rest as "Too many records" or something? What's the expectation that triggers this? Do we need to differ between payloads that are too large in volume and ones that have more rows than we're willing to process?

Do we need to talk about the correct client behavior when they get a 401, retrieve a new token, then get another 401? Also, current 401 description implies that node change is the normal reason for a 401, whereas token expiration is going to be a zillion times more common.
> Can't call it Storage, since there are multiple storage types coming. I suggest SyncStorage,
> since it is storage designed for Synchronization, but am open to other names.

To be clear, do you also mean "can't use /storage/ as part of the service URL"?
+1 for SyncStorage, it gets right to the point.

> Drop the decimal in version?

-0 on purely aesthetic reasons.  I think having the decimal is a nice reminder that we intend to keep refining this protocol.

> PUT (and other timestamp responses): Is the timestamp response json-encoded? Heck,
> do we even need this if we're returning a header?

It's the raw number, formatted as a string and served as application/json.  I would be in favour of "204 No Content" with the X-Weave-Timestamp header.

> "If the request is aborted mid-flight, there is no way to determine which items were stored
> successfully and which were not." If we're only processing, say, 25 items, could we not 
> return those 25 as success and the rest as "Too many records" or something? What's the 
> expectation that triggers this?

It was a quick attempt to give us the necessary wiggle room for Bug 693812.
(In reply to Ryan Kelly [:rfkelly] from comment #4)
> > Can't call it Storage, since there are multiple storage types coming. I suggest SyncStorage,
> > since it is storage designed for Synchronization, but am open to other names.
> 
> To be clear, do you also mean "can't use /storage/ as part of the service
> URL"?
> +1 for SyncStorage, it gets right to the point.

No, I mean you can't title it "Storage API v2.0" :)



> > Drop the decimal in version?
> 
> -0 on purely aesthetic reasons.  I think having the decimal is a nice
> reminder that we intend to keep refining this protocol.
> 

OK, just want to make sure we thought about it.



> It's the raw number, formatted as a string and served as application/json. 
> I would be in favour of "204 No Content" with the X-Weave-Timestamp header.

WFM


> > "If the request is aborted mid-flight, there is no way to determine which items were stored
> > successfully and which were not." If we're only processing, say, 25 items, could we not 
> > return those 25 as success and the rest as "Too many records" or something? What's the 
> > expectation that triggers this?
> 
> It was a quick attempt to give us the necessary wiggle room for Bug 693812.

Hmm, at that point, we should choose one way or the other. Either we support streamed requests, or we support rejecting after a certain point. Trying to support both seems like a recipe for pain.
> Should we specify that quota numbers may be approximate in info/quota? Gives us a little
> breathing room

By contrast, is the intention that the info returned by /info/collection_usage be accurate?  Should we call that out specifically as well?
So, historically, we have returned accurate numbers for /info/collection_usage. However, that's with the understood contract with the client that it only calls it about 4 menus deep. It's *expensive*

We can obviously make it cheap if real accuracy isn't required. I'd say stick to current state for now (info/quota vague, info/collection_usage accurate), but we may rethink quotas in the future in ways that affects this behavior
Going by the current implementation, I've had to add this restriction on object ids:

"""BSO ids may contain any alphanumeric character as well as the period, underscore, hyphen, tilde, hash and question mark."""

Is there any particular justification for this set of characters that's worth noting in the docs?  Should we consider tightening or loosening this restriction?
It is quite an interesting character set, isn't it? My ideal vote would be strict alphanumeric for no particular reason other than it is common, easier to validate (often a built-in test function for that), and doesn't contain commonly reserved characters.

But, we are effectively base64 today. While new data will be uploaded for the new protocol version, existing IDs using a now invalid character set on existing clients would need to be updated. This may not be cheap and has a non-0 cost to implement. So, I'll modify my vote to be for the base64 character set. Common and cheap.
> So, I'll modify my vote to be for the base64 character set. Common and cheap.

urlsafe I presume?

It's also interesting that the current doc has example IDs like "{GXS58IDC}12" which don't actually meet the restrictions.  In the current code such IDs will work fine when accessed through GET/POST on the collection itself, but are not addressable through GET/PUT on the URL that includes their ID.
(In reply to Ryan Kelly [:rfkelly] from comment #10)
> > So, I'll modify my vote to be for the base64 character set. Common and cheap.
> 
> urlsafe I presume?

Base64 URL-safe, no line terminators, 12-chars is our standard for GUIDs.
… and as gps intimated, migrating existing client data to use a different standard would be difficult or impossible (e.g., changing GUID validation routines in the storage layer and Fennec…).

Any superset of the identifiers we currently use is fine by me.
Depends on: 734990
Depends on: 733629
Depends on: 735102
Depends on: 735103
Whiteboard: [qa+]
Depends on: 735458
Depends on: 736132
No longer depends on: 734990
Blocks: 727761
Summary: define and document Sync 2.0 protocol → [meta] define and document Sync 2.0 protocol
Keywords: meta
No longer depends on: 735102
Depends on: 739845
Depends on: 744187
Depends on: 768655
Depends on: 768663
Depends on: 774920
Depends on: 775395
Depends on: 775798
Depends on: 776768
Depends on: 735039
Depends on: 784592
No longer depends on: 768655
Depends on: 784599
Depends on: 787870
Depends on: 787871
No longer depends on: 768663
Depends on: 786523
No longer depends on: 786523
Depends on: 785045
Depends on: 793384
Depends on: 812091
Depends on: 814244
No longer depends on: 814244
With the error-reporting bugs closed out, I think we can finally stick a fork in this one.  Plenty of code-level improvements to continue with, but the protocol itself should be done.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: