Closed Bug 511549 Opened 15 years ago Closed 15 years ago

make detailedStatus much smarter about errors and backoff

Categories

(Firefox :: Sync, defect, P1)

x86
macOS
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: mconnor, Assigned: mconnor)

References

()

Details

Attachments

(1 file, 4 obsolete files)

We probably want to preserve this data separately. bug 481733 comment 8 mentions this.
Target Milestone: --- → 0.7
Depends on: 481733
Attached patch scope creep ftw? (wip) (obsolete) — Splinter Review
Attached patch more (obsolete) — Splinter Review
Attachment #396976 - Attachment is obsolete: true
to-do:

* make the engines report something actually checkable for statusCode, not just the exception (probably should just log the exception and keep to defined status codes for the status record), either here or in bug 512906
* add strings for missing bits
* make sync failed notifications smarter
Priority: -- → P1
Priority: P1 → P2
Priority: P2 → P1
Bug morphing to handle more under this bug, basically WEP 106
Summary: add a .login property to detailedStatus → make detailedStatus much smarter about errors and backoff
Attached patch more (obsolete) — Splinter Review
Attachment #396977 - Attachment is obsolete: true
Blocks: 504125
Blocks: 504135
Attached patch non-faily diff (obsolete) — Splinter Review
Attachment #399779 - Attachment is obsolete: true
Comment on attachment 400620 [details] [diff] [review]
non-faily diff

This should be ready for review now.
Attachment #400620 - Flags: review?(edilee)
Comment on attachment 400620 [details] [diff] [review]
non-faily diff

After this lands, I think I'll move status in to its own Status object under modules/status.js as a singleton that engines can import as well and service.js would import into the Weave object. Also probably clean up set* into setters..

>+++ b/source/chrome/content/sync.js
>@@ -184,47 +184,68 @@ WeaveWindow.prototype = {
>   _onSyncEnd: function WeaveWin__onSyncEnd(status) {
..
>+    if (!status) { 
>+      if (!Weave.Service.status.backoff) {
>+        // this is the first or second failure, and 
>+        let error = Weave.Utils.getErrorString(Weave.Service.status.sync);
>+        let description = this._stringBundle
>+                              .getFormattedString("error.sync.description", [error]);
>+        let tryAgainButton =
>+          new Weave.NotificationButton(
>+            this._stringBundle.getString("error.sync.tryAgainButton.label"),
>+            this._stringBundle.getString("error.sync.tryAgainButton.accesskey"),
>+            function() { gWeaveWin.doSync(); return true; }
>+          );
>+        let notification =
>+          new Weave.Notification(
>+            title,
>+            description,
>+            null,
>+            Weave.Notifications.PRIORITY_INFO,
>+            [tryAgainButton]
>+          );
>+        Weave.Notifications.replaceTitle(notification);
>+      }
>+      else {
>+        let error = Weave.Utils.getErrorString(Weave.Service.status.sync);
>+        let description = this._stringBundle
>+                              .getString("error.sync.description", [error]);
>+
>+        let notification =
>+          new Weave.Notification(
>+            title,
>+            description,
>+            null,
>+            Weave.Notifications.PRIORITY_INFO,
>+            null
>+          );
>+        Weave.Notifications.replaceTitle(notification); 
>+      }
Factor out the if/else branch so error/description are set before (with getFormattedString) and initialize button to null, then conditionally set button to try again for !backoff. Then share the code for creating the new notification (without newlines for each arg).

>     this._updateLastSyncItem();
>+    
accidental tab?

>+++ b/source/modules/constants.js.in
>@@ -36,18 +36,19 @@
>+                          'SYNC_SUCCEEDED', 'LOGIN_SUCCEEDED',
You're missing a bunch of other constants that you added. (I've been using " instead of '... but I suppose somebody was maybe using ' for more constant-like-strings?)

>@@ -84,46 +85,62 @@ const ONE_MEGABYTE = 1024 * ONE_KILOBYTE
>+const STATUS_OK =                         "success.status_ok";
>+const SYNC_FAILED =                       "error.sync.failed";
>+const LOGIN_FAILED =                      "error.login.failed";
>+const SYNC_FAILED_PARTIAL =               "error.sync.failed_partial";
>+const STATUS_DISABLED =                   "service.disabled";
...
These don't have the corresponding errors.properties file changes for localized messages.

>+++ b/source/modules/engines.js
>@@ -356,19 +356,21 @@ SyncEngine.prototype = {
>-      let resp = res.put(meta);
>-      if (!resp.success)
>+      let resp = res.put(meta.serialize());
>+      if (!resp.success) {
>+        this._log.debug("Metarecord upload fail:" + resp);
>         throw resp;
>+      }
You can attach another property to resp.. in this case ENGINE_META_UPLOAD_FAIL for now.

>+++ b/source/modules/resource.js
>@@ -247,16 +247,22 @@ Resource.prototype = {
>       }
>+      else
>+        this._log.debug(action + " fail: " + status + " " + this._data);
What is this else-ing? This is adding another else block to if (success) { .. } else { .. } And both already print out the status and _data body if necessary.

>+      // this is a server-side safety valve to allow slowing down clients without hurting performance
>+      if (headers["X-Weave-Backoff"])
>+        Observers.notify("weave:service:backoff:interval", parseInt(headers["X-Weave-Backoff"], 10))
Why not also do the 503 Retry-After check here? I suppose the only issue is if we happen to be talking to a different server and it gives a 503.. (nit: semicolon)

>+++ b/source/modules/service.js
>@@ -117,31 +117,54 @@ Utils.lazy(Weave, 'Service', WeaveSvc);
>  * Service status query system.  See constants defined in constants.js.
>  */
> 
> function StatusRecord() {
>   this._init();
> }
> StatusRecord.prototype = {
>   _init: function() {
>-    this.server = [];
>+    this.service = null;
>     this.sync = null;
>+    this.login = null;
>     this.engines = {};
>+    this.backoff = false;
>+    this.backoffInterval = 0;
_resetBackoff() for the last two?

>+  _resetBackoff: function () {
>+    this.backoff = false;
>+    this.backoffInterval = 0;
Should this include minimumNextSync? Right now it's undefined by default.. see later

>+      case "weave:service:backoff:interval":
>+        let interval = data + Math.random() * data * 0.25; // required backoff + up to 25%
>+        this.status.backoffInterval = interval;
>+        this.status.minimumNextSync = Date.now() + data;
So status.backoff can be false but have a backoffInterval and minimumNextSync. status.backoff is only set to true if the server response 5xx.. basically a client initiated backoff, so maybe name it that because now there's 2 types of backoffs where the server one is implicitly indicated by minimumNextSync. But then again, is there a need to differentiate the two?

>@@ -851,24 +873,26 @@ WeaveSvc.prototype = {
>         this._log.warn("Couldn't download keys from server, aborting sync");
>-        this._setSyncFailure(KEYS_DOWNLOAD_FAIL);
>+        this._handleServerError(PubKeys.response.status);
>+        this._handleServerError(PrivKeys.response.status);
>+        this.status.setSyncStatus(KEYS_DOWNLOAD_FAIL);
What's handleServerError? There's checkServerError but that takes a response object.

>@@ -916,35 +940,40 @@ WeaveSvc.prototype = {
>+    else if (this.status.minimumNextSync > Date.now())
>+      reason = kSyncBackoffNotMet;
Not sure if you intended this or not, but minimumNextSync is undefined until X-Weave-Backoff. undefined is coerced into NaN which will make the(/any) comparison false.

>   _checkSyncStatus: function WeaveSvc__checkSyncStatus() {
..
>-
>+      
>+      this.status.setServiceStatus(STATUS_DISABLED);
There's no function setServiceStatus.. perhaps you just wanted this.status.service = STATUS_DISABLED; nit: extra tab on blank line

>@@ -1162,19 +1162,20 @@ WeaveSvc.prototype = {
>       engine.sync();
>       return true;
>     }
>     catch(e) {
>       // maybe a 401, cluster update needed?
>       if (e.status == 401 && this._updateCluster())
>         return this._syncEngine(engine);
> 
>+      this._checkServerError(e);
>+      this.status.setEngineStatus(engine.name, e);
"e" is the response String object, so probably not what you want to set for the engine status.

>@@ -1196,16 +1197,40 @@ WeaveSvc.prototype = {
>+  _checkServerError: function WeaveSvc__checkServerError(resp) {
>+    if (Utils.checkStatus(resp.status, null, [500, [502, 504]])) {
>+      this.status.backoff = true;
So here we're saying the client has decided to back off because the server is acting strange?
>+      if (resp.status == 503 && resp.headers["Retry-After"])
>+        Observers.notify("weave:service:backoff:interval", parseInt(resp.headers["Retry-After"], 10));
But treat it as if the server gave a X-Weave-Backoff? Just trying to figure out what difference it makes for server vs client initiated backoffs.
Attachment #400620 - Flags: review?(edilee) → review-
(In reply to comment #8)
> (From update of attachment 400620 [details] [diff] [review])
> After this lands, I think I'll move status in to its own Status object under
> modules/status.js as a singleton that engines can import as well and service.js
> would import into the Weave object. Also probably clean up set* into setters..

Yeah, was going to use setters, but the engine status one breaks the pattern, and it's easier to have set* than mix those up, IMO.  Being a standalone object is right, I just didn't want to scope creep infinitely in one patch.

> >+++ b/source/chrome/content/sync.js

> Factor out the if/else branch so error/description are set before (with
> getFormattedString) and initialize button to null, then conditionally set
> button to try again for !backoff. Then share the code for creating the new
> notification (without newlines for each arg).

done.

> >     this._updateLastSyncItem();
> >+    
> accidental tab?

or too-helpful editor.

> >+++ b/source/modules/constants.js.in
> >@@ -36,18 +36,19 @@
> >+                          'SYNC_SUCCEEDED', 'LOGIN_SUCCEEDED',
> You're missing a bunch of other constants that you added. (I've been using "
> instead of '... but I suppose somebody was maybe using ' for more
> constant-like-strings?)

most of these are ', so... I just copied that.  Added the missing bits.

> >@@ -84,46 +85,62 @@ const ONE_MEGABYTE = 1024 * ONE_KILOBYTE
> >+const STATUS_OK =                         "success.status_ok";
> >+const SYNC_FAILED =                       "error.sync.failed";
> >+const LOGIN_FAILED =                      "error.login.failed";
> >+const SYNC_FAILED_PARTIAL =               "error.sync.failed_partial";
> >+const STATUS_DISABLED =                   "service.disabled";
> ...
> These don't have the corresponding errors.properties file changes for localized
> messages.

Right, mostly because we never use status.service anywhere in the UI.  Trying to add strings only as needed, but setting values so that if/when we do, it's minimal changes.

> >+++ b/source/modules/engines.js
> >@@ -356,19 +356,21 @@ SyncEngine.prototype = {
> >-      let resp = res.put(meta);
> >-      if (!resp.success)
> >+      let resp = res.put(meta.serialize());
> >+      if (!resp.success) {
> >+        this._log.debug("Metarecord upload fail:" + resp);
> >         throw resp;
> >+      }
> You can attach another property to resp.. in this case ENGINE_META_UPLOAD_FAIL
> for now.

Mmm, right. Too used to dealing with XPCOM objects.  Fixed.

> >+++ b/source/modules/resource.js
> >@@ -247,16 +247,22 @@ Resource.prototype = {
> >       }
> >+      else
> >+        this._log.debug(action + " fail: " + status + " " + this._data);
> What is this else-ing? This is adding another else block to if (success) { .. }
> else { .. } And both already print out the status and _data body if necessary.

I don't remember this, bad merge, perhaps? Removed, in any case.

> >+      // this is a server-side safety valve to allow slowing down clients without hurting performance
> >+      if (headers["X-Weave-Backoff"])
> >+        Observers.notify("weave:service:backoff:interval", parseInt(headers["X-Weave-Backoff"], 10))
> Why not also do the 503 Retry-After check here? I suppose the only issue is if
> we happen to be talking to a different server and it gives a 503.. (nit:
> semicolon)

Mostly because I wanted to handle that with the other HTTP errors instead of splitting it.

> >+++ b/source/modules/service.js
> > StatusRecord.prototype = {
> >   _init: function() {

> >+    this.backoff = false;
> >+    this.backoffInterval = 0;
> _resetBackoff() for the last two?

Sure, fixed.

> >+  _resetBackoff: function () {
> >+    this.backoff = false;
> >+    this.backoffInterval = 0;
> Should this include minimumNextSync? Right now it's undefined by default.. see
> later

yes, fixed.

> >+      case "weave:service:backoff:interval":
> >+        let interval = data + Math.random() * data * 0.25; // required backoff + up to 25%
> >+        this.status.backoffInterval = interval;
> >+        this.status.minimumNextSync = Date.now() + data;
> So status.backoff can be false but have a backoffInterval and minimumNextSync.
> status.backoff is only set to true if the server response 5xx.. basically a
> client initiated backoff, so maybe name it that because now there's 2 types of
> backoffs where the server one is implicitly indicated by minimumNextSync. But
> then again, is there a need to differentiate the two?

Actually, not really, we should set backoff to true here.

> >@@ -851,24 +873,26 @@ WeaveSvc.prototype = {
> >         this._log.warn("Couldn't download keys from server, aborting sync");
> >-        this._setSyncFailure(KEYS_DOWNLOAD_FAIL);
> >+        this._handleServerError(PubKeys.response.status);
> >+        this._handleServerError(PrivKeys.response.status);
> >+        this.status.setSyncStatus(KEYS_DOWNLOAD_FAIL);
> What's handleServerError? There's checkServerError but that takes a response
> object.

Oops, missed that, I refactored to take the response object and missed that, good catch.

> >@@ -1162,19 +1162,20 @@ WeaveSvc.prototype = {
> >       engine.sync();
> >       return true;
> >     }
> >     catch(e) {
> >       // maybe a 401, cluster update needed?
> >       if (e.status == 401 && this._updateCluster())
> >         return this._syncEngine(engine);
> > 
> >+      this._checkServerError(e);
> >+      this.status.setEngineStatus(engine.name, e);
> "e" is the response String object, so probably not what you want to set for the
> engine status.

yeah, now using e.failureCode || ENGINE_UNKNOWN_FAIL

> >@@ -1196,16 +1197,40 @@ WeaveSvc.prototype = {
> >+  _checkServerError: function WeaveSvc__checkServerError(resp) {
> >+    if (Utils.checkStatus(resp.status, null, [500, [502, 504]])) {
> >+      this.status.backoff = true;
> So here we're saying the client has decided to back off because the server is
> acting strange?

Correct.
Attachment #400620 - Attachment is obsolete: true
Attachment #400918 - Flags: review?(edilee)
Comment on attachment 400918 [details] [diff] [review]
with comments addressed

r=Mardak

>+++ b/source/chrome/content/sync.js
>@@ -184,33 +184,35 @@ WeaveWindow.prototype = {
>+    let error = Weave.Utils.getErrorString(Weave.Service.status.sync);
>+    let description = this._stringBundle
>+                          .getString("error.sync.description", [error]);
This could be placed inside the first if. And getString should be getFormattedString.
>+    if (!status) { 
>+      let priority = Weave.Notifications.PRIORITY_INFO;
>+      let button = null;
>+      if (Weave.Service.status.backoff) {
>+        priority = Weave.Notifications.PRIORITY_INFO;
both are INFO right now and "try again" added for taken

>+++ b/source/modules/service.js
>@@ -412,17 +428,23 @@ WeaveSvc.prototype = {
>+      case "weave:service:backoff:interval":
>+        let interval = data + Math.random() * data * 0.25; // required backoff + up to 25%
>+        this.status.backoff = true;
>+        this.status.backoffInterval = interval;
>+        this.status.minimumNextSync = Date.now() + data;
Just making sure this is what you want.. X-Weave-Backoff will make the client stop syncing any remaining engines because backoff = true.
Attachment #400918 - Flags: review?(edilee) → review+
http://hg.mozilla.org/labs/weave/rev/23c2dce2e956
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Blocks: 512906
flagging in-testsuite? for unit tests?
Flags: in-testsuite?
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: