Closed Bug 683396 Opened 13 years ago Closed 13 years ago

Correctly identify server maintenance at login, too

Categories

(Firefox :: Sync, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla9

People

(Reporter: philikon, Assigned: emtwo)

References

Details

(Whiteboard: [fixed in services])

Attachments

(2 files, 4 obsolete files)

Bug 578195 makes us identify and display or muffle server maintenance errors, but only for syncs. We don't actually identify server errors correctly on login which means right after startup, you'd see "Server incorrectly configured".
I *think* the solution is to make sure that we call ErrorHandler.checkServerError() for every non-200 response. For machines that start up during maintenance, they will obviously fail on the very first fetch of info/collection in verifyLogin(), but really, it could happen at any point along the login() codepath where we fetch a resource.

In a lot of those places we already call ErrorHandler.checkServerError() and then set some Status.sync/Status.login value. That's not a good order to do things in since ErrorHandler.checkServerError()'s whole point is to set those flags. So at a minimum, we should reverse those lines.
(In reply to Philipp von Weitershausen [:philikon] from comment #1)
> I *think* the solution is to make sure that we call
> ErrorHandler.checkServerError() for every non-200 response.

... and every exception we catch while fetching a request (which is what happens for network errors).
Assignee: nobody → msamuel
Still needs tests for other login scenarios but is mostly done
Attachment #557370 - Flags: feedback?(philipp)
Comment on attachment 557370 [details] [diff] [review]
WIP: correctly identify server maintenance during login

This patch makes me cry. Not sure whether it's for joy or sorrow, though. Just one nit and a note about test coverage:

>@@ -75,16 +82,18 @@ function sync_httpd_setup() {
>     "/1.1/johndoe/storage/meta/global": upd("meta", global.handler()),
>     "/1.1/johndoe/info/collections": collectionsHelper.handler,
>     "/1.1/johndoe/storage/crypto/keys":
>       upd("crypto", (new ServerWBO("keys")).handler()),
>     "/1.1/johndoe/storage/clients": upd("clients", clientsColl.handler()),
> 
>     "/1.1/janedoe/storage/meta/global": handler_401,
>     "/1.1/janedoe/info/collections": handler_401,
>+
>+    "/api/1.1/johnsmith/info/collections": service_unavailable,

I suggest s/api/maintenance/ or something more descriptive.

It would be good if we could also get tests for info/collections working fine, but crypto/keys and meta/global returning maintenance errors, both in the empty account (first sync, so we're uploading) and the existing account case. Given your time constraints, we could defer those to a follow-up, though, and queue that bug at the end of your todo list :)

One thing you definitely want to do in this patch is make sure the Status.login = SERVER_MAINTENANCE case that was added to ErrorHandler.shouldReportError() is tested in test_shouldReportError().
Attachment #557370 - Flags: feedback?(philipp) → feedback+
(In reply to Philipp von Weitershausen [:philikon] from comment #4)
> Comment on attachment 557370 [details] [diff] [review]
> WIP: correctly identify server maintenance during login
> 
> This patch makes me cry. Not sure whether it's for joy or sorrow

aww, what could possibly be so sorrowful about the patch? :P

> It would be good if we could also get tests for info/collections working
> fine, but crypto/keys and meta/global returning maintenance errors, both in
> the empty account (first sync, so we're uploading) and the existing account
> case. Given your time constraints, we could defer those to a follow-up,
> though, and queue that bug at the end of your todo list :)

Will try.

> One thing you definitely want to do in this patch is make sure the
> Status.login = SERVER_MAINTENANCE case that was added to
> ErrorHandler.shouldReportError() is tested in test_shouldReportError().

Yes. You did find something I almost missed! :O
added more tests
Attachment #557370 - Attachment is obsolete: true
Attachment #557645 - Flags: review?(philipp)
Attachment #557645 - Attachment is patch: true
Comment on attachment 557645 [details] [diff] [review]
V1: correctly identify server maintenance during login

>+
>+    "/maintenance/1.1/johnsmith/info/collections": service_unavailable,
>+
>+    "/maintenance/1.1/janesmith/storage/meta/global": service_unavailable,
>+    "/maintenance/1.1/janesmith/info/collections": collectionsHelper.handler,
>+
>+    "/maintenance/1.1/foo/storage/meta/global": upd("meta", global.handler()),
>+    "/maintenance/1.1/foo/info/collections": collectionsHelper.handler,
>+    "/maintenance/1.1/foo/storage/crypto/keys": service_unavailable,
>   });

This is awesome, thanks for writing tests for these. These cover at least the "existing account" login path which will cover us 99% of the time. We should still have tests for the "new/empty account" login path where we upload keys and meta/global. So we might get a successful 404 on the GET for those, but the PUT might return a 503 + Retry-After. Edge case, I know, but we've had other key-related edge cases before where maintenance lead to inconsistencies across clients.


>+add_test(function testA_login_server_maintenance_error() {
>+add_test(function testB_login_server_maintenance_error() {
>+add_test(function testC_login_server_maintenance_error() {
etc...

Please give these more descriptive names. Best is to mention which resource produced the maintenance failure (IOW, one of info_collections, meta_global, crypto_keys).

r=me with that!
Attachment #557645 - Flags: review?(philipp) → review+
(In reply to Philipp von Weitershausen [:philikon] from comment #7)
> We should still have tests for the "new/empty account" login path where we
> upload keys and meta/global. So we might get a successful 404 on the GET for
> those, but the PUT might return a 503 + Retry-After. Edge case, I know, but
> we've had other key-related edge cases before where maintenance lead to
> inconsistencies across clients.

I meant to add: it's not really worth it holding this bug for those tests, so please just file a follow-up bug for writing them.
Blocks: 684798
Attachment #557645 - Attachment is obsolete: true
http://hg.mozilla.org/services/services-central/rev/29bba7fb6ddc
Status: NEW → ASSIGNED
Whiteboard: [fixed in services]
STR:
1) When a server node goes down for maintenance (sends 503 + Retry-After) prior to being logged in, Sync should *not* display an error message on a scheduled sync.
2) You will get a nice "server is down for maintenance" message if a sync is invoked by clicking the Sync Now button. (It works exactly like the muffling/displaying of network errors in bug 659067)
(In reply to Marina Samuel [:marina] from comment #11)
> STR:
> 1) When a server node goes down for maintenance (sends 503 + Retry-After)
> prior to being logged in, Sync should *not* display an error message on a
> scheduled sync.
> 2) You will get a nice "server is down for maintenance" message if a sync is
> invoked by clicking the Sync Now button. (It works exactly like the
> muffling/displaying of network errors in bug 659067)

To clarify, "prior to being logged in" means first starting the browser.
Attached patch Proposed test failure fix (obsolete) — Splinter Review
Maybe this will help?
Attachment #558678 - Flags: feedback?(philipp)
Comment on attachment 558678 [details] [diff] [review]
Proposed test failure fix

This is a good idea, but I don't see how this would help unless nsIObserverService actually unwinds nested notifications somehow or we deferred weave:ui:sync:* ourselves. I actually think that's a good idea anyway, so let's drive the UI on the next tick in addition to this patch.
Attachment #558678 - Flags: feedback?(philipp)
Attached patch V2: Proposed test failure fix (obsolete) — Splinter Review
* notifying the UI on nextTick
Attachment #558678 - Attachment is obsolete: true
Attachment #558690 - Flags: feedback?(philipp)
Comment on attachment 558690 [details] [diff] [review]
V2: Proposed test failure fix

>+  notifyOnNextTick: function notifyOnNextTick(observer) {
>+    Utils.nextTick(function() Svc.Obs.notify(observer), Svc.Obs);
>+  },

s/observer/topic/ please and the 2nd argument to nextTick isn't necessary here, you can just omit it.
Attachment #558690 - Flags: feedback?(philipp) → review+
Comment on attachment 558690 [details] [diff] [review]
V2: Proposed test failure fix

Please refactor 

+    Service.startOver();
+    Status.resetSync();
+    Status.resetBackoff();
+
+    server.stop(run_next_test);

into something analogous to test_syncengine_sync.js's

  function cleanAndGo(server) {
    Svc.Prefs.resetBranch("");
    Records.clearCache();
    server.stop(run_next_test);
  }

Avoids having to correct multiple locations.
addressing comment 17 & comment 18

Found a tiny problem (& fixed) in test_login_syncAndReportErrors_prolonged_network_error after doing comment 18 :O yay
Attachment #558690 - Attachment is obsolete: true
Attachment #558696 - Attachment is patch: true
Attachment #558696 - Flags: feedback?(philipp)
(In reply to Marina Samuel [:marina] from comment #19)

> Found a tiny problem (& fixed) in
> test_login_syncAndReportErrors_prolonged_network_error after doing comment
> 18 :O yay

*makes zen monk face*

;)
Attachment #558696 - Flags: feedback?(philipp) → review+
Yeah I totally botched up the rebase of that patch... Landed fix: http://hg.mozilla.org/services/services-central/rev/474cf6d733df
Tests were hanging on opt builds, it turned out to be a test fixture problem that was simply not showing up on debug builds (presumably because they're a bit slower so the test fixtures weren't racing the actual code):

http://hg.mozilla.org/services/services-central/rev/752ab20f2ff8
No longer blocks: 684798
Depends on: 684798
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: