Closed Bug 1087636 Opened 11 years ago Closed 9 years ago

Suboptimal handling of 503s during /meta/global fetch

Categories

(Firefox :: Sync, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox50 --- fixed

People

(Reporter: rfkelly, Assigned: eoger)

References

Details

(Whiteboard: [qa+])

Attachments

(1 file)

Currently running Aurora: 35.0a2 (2014-10-21) There was a db problem on my sync node and I got the following error log, along with an "unknown error" yellow bar. Two things in the log strike me as suboptimal. First, a 503 on GET of /meta/global is treated like a 404 rather than network error: * GET fail 503 https://sync-131-us-west-2.sync.services.mozilla.com/1.5/9079281/storage/meta/global * No meta/global record on the server. Creating one. Second, a 503 on PUT of /meta/global produces the "unknown error" yellow bar, so it's probably not checking Retry-After and triggering the right retry logic: * PUT fail 503 https://sync-131-us-west-2.sync.services.mozilla.com/1.5/9079281/storage/meta/global * Unable to upload new meta/global. Failing remote setup.
Whiteboard: [qa+]
> the following error log Which I don't seem to have attached. I feel like this has been dealt with in subsequent bugs, :markh may remember more explicitly
(In reply to Ryan Kelly [:rfkelly] from comment #0) > First, a 503 on GET of /meta/global is treated like a 404 rather than > network error: > > * GET fail 503 > https://sync-131-us-west-2.sync.services.mozilla.com/1.5/9079281/storage/ > meta/global > * No meta/global record on the server. Creating one. It looks like this is still the case - service.js, in _remoteSetup(), has a special case for 401, but all other responses are treated as above. > Second, a 503 on PUT of /meta/global produces the "unknown error" yellow > bar, so it's probably not checking Retry-After and triggering the right > retry logic: The code appears to be checking the retry-after header (but only that header, not the x-weave-* variations etc) on 503 in that case - but only on that put (ie, the earlier 503 on get did not check for retry-after). I actually can't see how the error bar would have come up in that case - but the error bar is gone now anyway. I think this should remain open to fix the handling of the failed get - treating everything that's not 401 the same seems wrong.
Flags: firefox-backlog+
OS: Windows 7 → All
Priority: -- → P2
Hardware: x86_64 → All
Comment on attachment 8752437 [details] Bug 1087636 - Better handling of server errors in GET /meta/global in Sync. https://reviewboard.mozilla.org/r/52619/#review49614 This function is complex - I think we need new tests to exercise the changes here (ie, if the existing tests still all pass, clearly they aren't covering enough!) ::: services/sync/modules/service.js:1087 (Diff revision 1) > newMeta.isNew = true; > > this.recordManager.set(this.metaURL, newMeta); > let uploadRes = newMeta.upload(this.resource(this.metaURL)); > if (!uploadRes.success) { > this._log.warn("Unable to upload new meta/global. Failing remote setup."); Let's add uploadRes.status to the log message here (checkServerError doesn't always log it) ::: services/sync/modules/service.js:1116 (Diff revision 1) > STORAGE_VERSION > parseFloat(remoteVersion)) { > > this._log.info("One of: no meta, no meta storageVersion, or no meta syncID. Fresh start needed."); > > // abort the server wipe if the GET status was anything other than 404 or 200 > - let status = this.recordManager.response.status; > + status = this.recordManager.response.status; ISTM this should no longer be necessary (ie, that status is guaranteed to be the same value it was before)? Maybe changing the new |let status| above to be |const status|? (What I'm concerned about is that this.recordManager.response is effectively a "global" and it's difficult to guarantee it hasn't changed via a side-effect of calling another function etc, and the comment here certainly implies it is intended to reflect the original get. It *looks* like it should still be the same object, but it's hard to ensure this as the world changes. This entire function sucks and is hard to rationalize about) ::: services/sync/modules/service.js:1118 (Diff revision 1) > this._log.info("One of: no meta, no meta storageVersion, or no meta syncID. Fresh start needed."); > > // abort the server wipe if the GET status was anything other than 404 or 200 > - let status = this.recordManager.response.status; > + status = this.recordManager.response.status; > if (status != 200 && status != 404) { > this.status.sync = METARECORD_DOWNLOAD_FAIL; ISTM that you've basically prevented this block from being entered?
Attachment #8752437 - Flags: review?(markh)
Comment on attachment 8752437 [details] Bug 1087636 - Better handling of server errors in GET /meta/global in Sync. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52619/diff/1-2/
Attachment #8752437 - Flags: review?(markh)
Comment on attachment 8752437 [details] Bug 1087636 - Better handling of server errors in GET /meta/global in Sync. https://reviewboard.mozilla.org/r/52619/#review51142 ::: services/sync/modules/service.js:1063 (Diff revision 2) > // anything (e.g. key setup). > _remoteSetup: function _remoteSetup(infoResponse) { > let reset = false; > > this._log.debug("Fetching global metadata record"); > let meta = this.recordManager.get(this.metaURL); Looking at this more, we can't reference .response.status here - this.recordManager.get() might return a cached version - but only the *record* is cached, not the response - so there's a chance this.recordManager.response.status will be from a completely unrelated request here? It also looks like the existing code has the same problem down in the "// abort the server wipe if" block, but the reference a few lines down from here is OK as it ensures it really did make the request. ::: services/sync/modules/service.js:1083 (Diff revision 2) > > // ... fetch the current record from the server, and COPY THE FLAGS. > let newMeta = this.recordManager.get(this.metaURL); > + status = this.recordManager.response.status; > > - // If we got a 401, we do not want to create a new meta/global - we > + if (status === 404) { https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style says to avoid "strict" equality and I don't see a reason you actually need them here. ::: services/sync/modules/service.js:1100 (Diff revision 2) > - this._log.warn("Unable to upload new meta/global. Failing remote setup."); > + this._log.warn("Unable to upload new meta/global (" + uploadRes.status + "). Failing remote setup."); > this.errorHandler.checkServerError(uploadRes); > return false; > } > + } else if (status === 200 && newMeta) { > + newMeta.isNew = newMeta.isNew; This looks wrong? ::: services/sync/modules/service.js:1109 (Diff revision 2) > } > > // Switch in the new meta object and record the new time. > meta = newMeta; > this.metaModified = infoResponse.obj.meta; > + } else if (status !== 200 && status !== 404) { The 404 check looks wrong here - I'd have thought that's impossible given the changes above. Also, there should either be a trailing "else", or a comment saying why there isn't one - ie, it *looks* like maybe this should be written: } else if (status != 200) { logError = true; } // else status is 200 and we got the current meta. but this is really quite difficult to rationalize about. ::: services/sync/modules/service.js (Diff revision 2) > STORAGE_VERSION > parseFloat(remoteVersion)) { > > this._log.info("One of: no meta, no meta storageVersion, or no meta syncID. Fresh start needed."); > > - // abort the server wipe if the GET status was anything other than 404 or 200 > - let status = this.recordManager.response.status; see comment above: ie, it's not clear to me that the existing check for status is valid - but also not clear that it's invalid - I'm not clear how this code finds itself in that state yet.
Attachment #8752437 - Flags: review?(markh)
(In reply to Mark Hammond [:markh] from comment #6) > Comment on attachment 8752437 [details] > MozReview Request: Bug 1087636 - Better handling of server errors in GET > /meta/global in Sync. r=markh > > https://reviewboard.mozilla.org/r/52619/#review51142 Aargh - reviewboard lost the "summary" comment I added. In short, I think this existing code is way too complex and possibly broken, so I think we need to restructure/rewrite parts/lots of it before we can actually fix it. Let's chat more about this.
Comment on attachment 8752437 [details] Bug 1087636 - Better handling of server errors in GET /meta/global in Sync. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52619/diff/2-3/
Attachment #8752437 - Attachment description: MozReview Request: Bug 1087636 - Better handling of server errors in GET /meta/global in Sync. r=markh → Bug 1087636 - Better handling of server errors in GET /meta/global in Sync.
Attachment #8752437 - Flags: review?(markh)
Assignee: nobody → edouard.oger
Blocks: 1257960
Status: NEW → ASSIGNED
Comment on attachment 8752437 [details] Bug 1087636 - Better handling of server errors in GET /meta/global in Sync. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52619/diff/3-4/
Comment on attachment 8752437 [details] Bug 1087636 - Better handling of server errors in GET /meta/global in Sync. https://reviewboard.mozilla.org/r/52619/#review58514 This looks great, thanks! ::: services/sync/tests/unit/test_service_sync_remoteSetup.js:66 (Diff revision 4) > "/1.1/johndoe/storage/crypto/keys": upd("crypto", keysWBO.handler()), > "/1.1/johndoe/storage/crypto": upd("crypto", cryptoColl.handler()), > "/1.1/johndoe/storage/clients": upd("clients", clients.handler()), > - "/1.1/johndoe/storage/meta/global": upd("meta", wasCalledHandler(meta_global)), > - "/1.1/johndoe/storage/meta": upd("meta", wasCalledHandler(metaColl)), > - "/1.1/johndoe/info/collections": collectionsHelper.handler > + "/1.1/johndoe/storage/meta": upd("meta", wasCalledHandler(metaColl)) > + }; > + handlers[GLOBAL_PATH] = upd("meta", wasCalledHandler(meta_global)); Can't we just have these directly in the |handlers| assignment?
Attachment #8752437 - Flags: review?(markh) → review+
Comment on attachment 8752437 [details] Bug 1087636 - Better handling of server errors in GET /meta/global in Sync. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52619/diff/4-5/
Keywords: checkin-needed
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/fx-team/rev/652fa69526ff Better handling of server errors in GET /meta/global in Sync. r=markh
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: