Suboptimal handling of 503s during /meta/global fetch

RESOLVED FIXED

Status

()

defect
P2
normal
RESOLVED FIXED
5 years ago
9 months ago

People

(Reporter: rfkelly, Assigned: eoger)

Tracking

unspecified
Points:
---
Dependency tree / graph
Bug Flags:
firefox-backlog +

Firefox Tracking Flags

(firefox50 fixed)

Details

(Whiteboard: [qa+])

Attachments

(1 attachment)

Reporter

Description

5 years ago
Currently running Aurora: 35.0a2 (2014-10-21)

There was a db problem on my sync node and I got the following error log, along with an "unknown error" yellow bar.  Two things in the log strike me as suboptimal.

First, a 503 on GET of /meta/global is treated like a 404 rather than network error:

    * GET fail 503 https://sync-131-us-west-2.sync.services.mozilla.com/1.5/9079281/storage/meta/global
    * No meta/global record on the server. Creating one.

Second, a 503 on PUT of /meta/global produces the "unknown error" yellow bar, so it's probably not checking Retry-After and triggering the right retry logic:

    * PUT fail 503 https://sync-131-us-west-2.sync.services.mozilla.com/1.5/9079281/storage/meta/global
    * Unable to upload new meta/global. Failing remote setup.
Whiteboard: [qa+]
Reporter

Comment 1

4 years ago
> the following error log

Which I don't seem to have attached.  I feel like this has been dealt with in subsequent bugs, :markh may remember more explicitly
(In reply to Ryan Kelly [:rfkelly] from comment #0)

> First, a 503 on GET of /meta/global is treated like a 404 rather than
> network error:
> 
>     * GET fail 503
> https://sync-131-us-west-2.sync.services.mozilla.com/1.5/9079281/storage/
> meta/global
>     * No meta/global record on the server. Creating one.

It looks like this is still the case - service.js, in _remoteSetup(), has a special case for 401, but all other responses are treated as above.

> Second, a 503 on PUT of /meta/global produces the "unknown error" yellow
> bar, so it's probably not checking Retry-After and triggering the right
> retry logic:

The code appears to be checking the retry-after header (but only that header, not the x-weave-* variations etc) on 503 in that case - but only on that put (ie, the earlier 503 on get did not check for retry-after). I actually can't see how the error bar would have come up in that case - but the error bar is gone now anyway.

I think this should remain open to fix the handling of the failed get - treating everything that's not 401 the same seems wrong.
Flags: firefox-backlog+
OS: Windows 7 → All
Priority: -- → P2
Hardware: x86_64 → All
Comment on attachment 8752437 [details]
Bug 1087636 - Better handling of server errors in GET /meta/global in Sync.

https://reviewboard.mozilla.org/r/52619/#review49614

This function is complex - I think we need new tests to exercise the changes here (ie, if the existing tests still all pass, clearly they aren't covering enough!)

::: services/sync/modules/service.js:1087
(Diff revision 1)
>          newMeta.isNew = true;
>  
>          this.recordManager.set(this.metaURL, newMeta);
>          let uploadRes = newMeta.upload(this.resource(this.metaURL));
>          if (!uploadRes.success) {
>            this._log.warn("Unable to upload new meta/global. Failing remote setup.");

Let's add uploadRes.status to the log message here (checkServerError doesn't always log it)

::: services/sync/modules/service.js:1116
(Diff revision 1)
>          STORAGE_VERSION > parseFloat(remoteVersion)) {
>  
>        this._log.info("One of: no meta, no meta storageVersion, or no meta syncID. Fresh start needed.");
>  
>        // abort the server wipe if the GET status was anything other than 404 or 200
> -      let status = this.recordManager.response.status;
> +      status = this.recordManager.response.status;

ISTM this should no longer be necessary (ie, that status is guaranteed to be the same value it was before)? Maybe changing the new |let status| above to be |const status|?

(What I'm concerned about is that this.recordManager.response is effectively a "global" and it's difficult to guarantee it hasn't changed via a side-effect of calling another function etc, and the comment here certainly implies it is intended to reflect the original get. It *looks* like it should still be the same object, but it's hard to ensure this as the world changes. This entire function sucks and is hard to rationalize about)

::: services/sync/modules/service.js:1118
(Diff revision 1)
>        this._log.info("One of: no meta, no meta storageVersion, or no meta syncID. Fresh start needed.");
>  
>        // abort the server wipe if the GET status was anything other than 404 or 200
> -      let status = this.recordManager.response.status;
> +      status = this.recordManager.response.status;
>        if (status != 200 && status != 404) {
>          this.status.sync = METARECORD_DOWNLOAD_FAIL;

ISTM that you've basically prevented this block from being entered?
Attachment #8752437 - Flags: review?(markh)
Assignee

Comment 5

3 years ago
Comment on attachment 8752437 [details]
Bug 1087636 - Better handling of server errors in GET /meta/global in Sync.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52619/diff/1-2/
Attachment #8752437 - Flags: review?(markh)
Comment on attachment 8752437 [details]
Bug 1087636 - Better handling of server errors in GET /meta/global in Sync.

https://reviewboard.mozilla.org/r/52619/#review51142

::: services/sync/modules/service.js:1063
(Diff revision 2)
>    // anything (e.g. key setup).
>    _remoteSetup: function _remoteSetup(infoResponse) {
>      let reset = false;
>  
>      this._log.debug("Fetching global metadata record");
>      let meta = this.recordManager.get(this.metaURL);

Looking at this more, we can't reference .response.status here - this.recordManager.get() might return a cached version - but only the *record* is cached, not the response - so there's a chance this.recordManager.response.status will be from a completely unrelated request here?

It also looks like the existing code has the same problem down in the "// abort the server wipe if" block, but the reference a few lines down from here is OK as it ensures it really did make the request.

::: services/sync/modules/service.js:1083
(Diff revision 2)
>  
>        // ... fetch the current record from the server, and COPY THE FLAGS.
>        let newMeta = this.recordManager.get(this.metaURL);
> +      status = this.recordManager.response.status;
>  
> -      // If we got a 401, we do not want to create a new meta/global - we
> +      if (status === 404) {

https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style says to avoid "strict" equality and I don't see a reason you actually need them here.

::: services/sync/modules/service.js:1100
(Diff revision 2)
> -          this._log.warn("Unable to upload new meta/global. Failing remote setup.");
> +          this._log.warn("Unable to upload new meta/global (" + uploadRes.status + "). Failing remote setup.");
>            this.errorHandler.checkServerError(uploadRes);
>            return false;
>          }
> +      } else if (status === 200 && newMeta) {
> +        newMeta.isNew   = newMeta.isNew;

This looks wrong?

::: services/sync/modules/service.js:1109
(Diff revision 2)
>        }
>  
>        // Switch in the new meta object and record the new time.
>        meta              = newMeta;
>        this.metaModified = infoResponse.obj.meta;
> +    } else if (status !== 200 && status !== 404) {

The 404 check looks wrong here - I'd have thought that's impossible given the changes above. Also, there should either be a trailing "else", or a comment saying why there isn't one - ie, it *looks* like maybe this should be written:

} else if (status != 200) {
  logError = true;
} // else status is 200 and we got the current meta.


but this is really quite difficult to rationalize about.

::: services/sync/modules/service.js
(Diff revision 2)
>          STORAGE_VERSION > parseFloat(remoteVersion)) {
>  
>        this._log.info("One of: no meta, no meta storageVersion, or no meta syncID. Fresh start needed.");
>  
> -      // abort the server wipe if the GET status was anything other than 404 or 200
> -      let status = this.recordManager.response.status;

see comment above: ie, it's not clear to me that the existing check for status is valid - but also not clear that it's invalid - I'm not clear how this code finds itself in that state yet.
Attachment #8752437 - Flags: review?(markh)
(In reply to Mark Hammond [:markh] from comment #6)
> Comment on attachment 8752437 [details]
> MozReview Request: Bug 1087636 - Better handling of server errors in GET
> /meta/global in Sync. r=markh
> 
> https://reviewboard.mozilla.org/r/52619/#review51142

Aargh - reviewboard lost the "summary" comment I added. In short, I think this existing code is way too complex and possibly broken, so I think we need to restructure/rewrite parts/lots of it before we can actually fix it. Let's chat more about this.
Assignee

Comment 8

3 years ago
Comment on attachment 8752437 [details]
Bug 1087636 - Better handling of server errors in GET /meta/global in Sync.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52619/diff/2-3/
Attachment #8752437 - Attachment description: MozReview Request: Bug 1087636 - Better handling of server errors in GET /meta/global in Sync. r=markh → Bug 1087636 - Better handling of server errors in GET /meta/global in Sync.
Attachment #8752437 - Flags: review?(markh)
Assignee: nobody → edouard.oger
Blocks: 1257960
Status: NEW → ASSIGNED
Assignee

Comment 9

3 years ago
Comment on attachment 8752437 [details]
Bug 1087636 - Better handling of server errors in GET /meta/global in Sync.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52619/diff/3-4/
Comment on attachment 8752437 [details]
Bug 1087636 - Better handling of server errors in GET /meta/global in Sync.

https://reviewboard.mozilla.org/r/52619/#review58514

This looks great, thanks!

::: services/sync/tests/unit/test_service_sync_remoteSetup.js:66
(Diff revision 4)
>      "/1.1/johndoe/storage/crypto/keys": upd("crypto", keysWBO.handler()),
>      "/1.1/johndoe/storage/crypto": upd("crypto", cryptoColl.handler()),
>      "/1.1/johndoe/storage/clients": upd("clients", clients.handler()),
> -    "/1.1/johndoe/storage/meta/global": upd("meta", wasCalledHandler(meta_global)),
> -    "/1.1/johndoe/storage/meta": upd("meta", wasCalledHandler(metaColl)),
> -    "/1.1/johndoe/info/collections": collectionsHelper.handler
> +    "/1.1/johndoe/storage/meta": upd("meta", wasCalledHandler(metaColl))
> +  };
> +  handlers[GLOBAL_PATH] = upd("meta", wasCalledHandler(meta_global));

Can't we just have these directly in the |handlers| assignment?
Attachment #8752437 - Flags: review?(markh) → review+
Assignee

Comment 11

3 years ago
Comment on attachment 8752437 [details]
Bug 1087636 - Better handling of server errors in GET /meta/global in Sync.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52619/diff/4-5/
Assignee

Updated

3 years ago
Keywords: checkin-needed

Comment 12

3 years ago
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/fx-team/rev/652fa69526ff
Better handling of server errors in GET /meta/global in Sync. r=markh
Keywords: checkin-needed

Comment 13

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/652fa69526ff
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.