Sync servers configured for hard-eol responses cause Fx to display a "server error" rather than the EOL messaging.



Cloud Services
Firefox Sync: Backend
4 years ago
3 years ago


(Reporter: markh, Unassigned)


Firefox Tracking Flags

(Not tracked)


(Whiteboard: [qa+])



4 years ago
See bug 1014411 comment 6 - Ryan set up a "hard-eol" server which returns a 513 on every request.  Sadly, this causes the login process to fail (because info/collections sees a 513), and even though the EOL handling code does see the message, that login error effectively "trumps" the EOL messaging - with the end result being an infobar saying there is a server error, while the EOL infobar is never shown.

(A quick experiment with having info/collections return an empty JSON blob shows the next failure is meta/global getting a 513, and an empty JSON blob for that response causes the client to get upset in various creative ways.)

This could probably be fixed client-side, but that still leaves earlier Fx versions screwed.  rnewman, any ideas/thoughts?

Log below (with timestamps removed):

Sync.Service    INFO    Logging in user vusnj3h2crfvpzuam7mbtmdyyrx5rm5n
Sync.Service    DEBUG   Caching URLs under storage user base:
Sync.Resource   DEBUG   mesg: GET fail 513
Sync.Resource   DEBUG   GET fail 513
Sync.Status     DEBUG   Status.login: success.login => error.login.reason.server
Sync.Status     DEBUG   Status.service: success.status_ok => error.login.failed
Sync.ErrorHandler       ERROR   X-Weave-Alert: hard-eol: SYNC HAS SUNK
Sync.Service    TRACE   Event: weave:service:login:error
Sync.SyncScheduler      TRACE   Handling weave:service:login:error
Sync.SyncScheduler      DEBUG   Clearing sync triggers and the global score.
Sync.SyncScheduler      TRACE   _checkSync returned "".
Sync.SyncScheduler      DEBUG   Next sync in 86400000 ms.
Sync.ErrorHandler       TRACE   Handling weave:service:login:error
Sync.ErrorHandler       DEBUG   Flushing file log.
Sync.ErrorHandler       TRACE   Beginning stream copy to error-1401345291948.txt: 1401345291949
Sync.Service    DEBUG   Exception: Login failed: error.login.reason.server No traceback available
Sync.Service    DEBUG   Not syncing: login returned false.
Sync.ErrorHandler       TRACE   Notifying weave:ui:login:error. Status.login is error.login.reason.server. Status.sync is success.sync
Whiteboard: [qa+]
The server-side solution to this appears to be to only send hard-eol on some requests, but not on others.  Which we can do, but it could mean leaving actual functioning (albeit read-only) sync nodes in place, which sounds expensive.

Ideally, we'd like to redirect all sync-1.1 traffic to a simple static "EOLinator" service that just returns the appropriate error responses.  So: can we return some fake static data in /info/collections, /storage/meta/global and so on that will help coerce the clients into the correct state?

(I'd like to fix it server-side if it's simple enough, so that FF28 holdouts get good messaging)
Flags: needinfo?(rnewman)
After some experimentation, the real trick here seems to be /crypto/keys.  I can serve a fake /meta/global to the client, but as soon as it detects that something is not quite right, it heads into its "wipe the server" routine and tries to upload a new set of keys.  It won't proceed with the actual sync until it can verify that the keys were correctly uploaded, which precludes serving static data for this item.

I successfully activated the "Your Firefox Sync service is no longer available" message by accepting writes to /meta/global and /crypto/keys, but having all the other URLs return the 513 error.
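For illustration, the behaviour described above - accept writes to /meta/global and /crypto/keys, and answer everything else with a 513 plus an alert header - could be sketched as a tiny routing function.  This is a hypothetical sketch only, not the actual EOLinator code: the alert payload shape, the fake "0" timestamp, and the function name are assumptions.

```python
import json

# Assumed alert payload; the log above only shows the rendered
# "hard-eol: SYNC HAS SUNK" form, not the exact wire format.
EOL_ALERT = json.dumps({"code": "hard-eol", "message": "SYNC HAS SUNK"})

# The two paths the client must be allowed to write to (per comment 2)
# before it will display the EOL message.
WRITABLE = ("/storage/meta/global", "/storage/crypto/keys")

def eolinator_response(method, path):
    """Return (status, headers, body) for a decommissioned sync-1.1 node."""
    if method == "PUT" and path.endswith(WRITABLE):
        # Pretend the write succeeded so the client finishes its
        # "wipe the server" routine; "0" stands in for a timestamp.
        return 200, {}, "0"
    # Every other request gets the hard-eol error plus the alert header.
    return 513, {"X-Weave-Alert": EOL_ALERT}, ""
```

A static service wrapping this function would avoid keeping real (even read-only) sync nodes alive, which is the expense comment 1 is worried about.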
Flags: needinfo?(rnewman)

Comment 3

4 years ago
So it sounds like services are able to help us work around this client problem.  Fixing the problem on the client sounds worthwhile at face value, but in practice we can't fix it for all affected versions - so having, say, Fx 32 and up handle this correctly when none of the earlier versions do doesn't achieve much.

Should we close this (or at least have it not block bug 1014406)?

Comment 4

3 years ago
This doesn't block the migration itself; rather, it blocks our strategy for decommissioning once the migration has been in place for a while.
Blocks: 1008066
No longer blocks: 1014406
Given how far we are into the migration, I'm betting this is WONTFIX
Last Resolved: 3 years ago
Resolution: --- → WONTFIX