Closed Bug 959034 Opened 10 years ago Closed 5 years ago

Support "409 Conflict" errors from sync1.5 storage servers

Categories

(Firefox :: Sync, defect, P3)

defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox29 --- wontfix
firefox30 - wontfix
firefox31 - affected

People

(Reporter: rfkelly, Unassigned)

References

Details

(Whiteboard: [qa?])

Per the proposed sync1.5 protocol docs in Bug 951986, the sync storage servers may return a "409 Conflict" error when two clients attempt to update a collection simultaneously.  This is designed to prevent data loss that can occur when records from different clients get assigned the same timestamp.

Desktop sync needs to be prepared to handle this error.  It should be safe to initially treat it similarly to a "503 Service Unavailable" i.e. just abort the sync and start it again later.  More nuanced handling might abort syncing that datatype but allow other datatypes to proceed.
Whiteboard: [qa?]
Given this is designed to prevent data loss, let's try and target 30.
Flags: firefox-backlog?
OS: Windows 7 → All
Hardware: x86_64 → All
Flags: firefox-backlog? → firefox-backlog+
:rfkelly
How does this differ from bug 959032
> How does this differ from bug 959032

s/Desktop/Android/g

The client teams requested separate bugs for tracking purposes.
We really should do a better job of making platform specific titles... :-)
Gavin - can we get an assignee here if this is to make FF30?  We're going to build on Beta 6/7 this week, starting to narrow down on risk & major changes.
Flags: needinfo?(gavin.sharp)
I suspect this is too late to jam into 30. Let's target 31, and I will find an assignee.
Flags: needinfo?(gavin.sharp)
Trying to prioritize this, and there's very little information to go on here.

How does desktop sync currently handle this 409 failure?
Do we know how often that failure occurs in practice?
Is this actually causing data loss for users?
Flags: needinfo?(rfkelly)
Flags: needinfo?(ckarlof)
In reverse order:

> Is this actually causing data loss for users?

No.  The server is currently returning the 409 Conflict in cases where it might otherwise do weird things due to concurrent writes.  This bug is simply to ensure that client code is aware of, and responds appropriately to, this error code.

> Do we know how often that failure occurs in practice?

Very rarely; the server will block for some time before returning a 409.  I'll see about getting some concrete numbers here.

> How does desktop sync currently handle this 409 failure?

I'm not sure. From conversations with Richard I believe it is (or at least *was* at some stage) treated as an unexpected protocol error, so it would cause the sync to abort and possibly show an error bar.  This is safe but ugly.
Flags: needinfo?(rfkelly)
(In reply to Ryan Kelly [:rfkelly] from comment #8)

> > How does desktop sync currently handle this 409 failure?
> 
> I'm not sure. From conversations with Richard I believe it is (or at least
> *was* at some stage) treated as an unexpected protocol error, so it would
> cause the sync to abort and possibly show an error bar.  This is safe but
> ugly.

As far as I can tell from reading policies.js and record.js/resource.js, 409 will cause an error bar -- it's not a network error, and it's not a specially handled 400/401/404.

(From the client's perspective, 409 is an out-of-protocol response, which implies that something has been misconfigured, and thus the user should be notified so they can fix it.)

It will definitely cause the conflicting engine to fail, but other engines should attempt to sync, unless the 409 is returned during key/meta fetches.
> unless the 409 is returned during key/meta fetches.

It's only returned for writes, never for reads.  So apart from the error bar the current handling sounds pretty good.
I think rfkelly and rnewman have this covered.
Flags: needinfo?(ckarlof)
Sounds like not something we need to explicitly track, then.
Blocks: 1051739
Blocks: 1062120
Priority: -- → P2
No longer blocks: 905997
Priority: P2 → P3
Ryan, is this obsoleted by the batch API?
Flags: needinfo?(rfkelly)
> Ryan, is this obsoleted by the batch API?

No, it's still possible for servers to return this error even with the batch API.  But I wouldn't be too sad if we WONTFIXed this, since the status quo seems to be treating us OK; the only benefit is it might allow the error to fail a single engine rather than fail the entire sync, which
Flags: needinfo?(rfkelly)
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox

This has come back onto the radar for the "durable sync" effort. IIUC, the current situation on desktop is:

  • A sync will be aborted if an engine fails with a 401 error (which tends to mean a node reassignment) or there's a server error with a retry-after header.

  • We "somewhat" expect a 412 when posting a batch, although that's just informational - there's no special handling of that case, so 409 would be treated identically - and as above, that means this engine will fail but other engines will continue.

This echoes what Richard wrote in comment 9, although error bars are now dead. So I think this is WONTFIX.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.