Bug 692714 - Need to handle node reassignment on all storage requests
Opened 13 years ago • Closed 13 years ago
Categories: Firefox :: Sync (defect)
Status: RESOLVED FIXED
Target Milestone: mozilla10
People: Reporter: tchung; Assigned: rnewman
Keywords: regression
Whiteboard: [fixed in services]
Attachments (7 files, 8 obsolete files)
- 835 bytes, text/rtf
- 43.66 KB, text/rtf
- 25.44 KB, patch (philikon: review+)
- 6.75 KB, patch
- 9.91 KB, patch
- 429.85 KB, application/zip
- 12.54 KB, patch (philikon: review+)
I was just surfing around on the Mac Nightly, and a 401 suddenly popped up with a yellow bar, saying incorrect name or password. Clicking the preferences button popped up the username reset window. I cancelled the dialog, and the yellow bar went away.
Attaching success and error logs.
Repro:
1) install Mac nightly : Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:10.0a1) Gecko/20111005 Firefox/10.0a1
2) log into sync on phx
3) sit for a while; a 401 error pops up with a yellow bar.
Expected:
- no 401 error, no yellow bar notification
Comment 1 • 13 years ago (Reporter)
Comment 2 • 13 years ago
So yeah, in bug 664865 we removed a very generic handling of 401 responses that would have us detect a node reassignment on all requests, in order to make it more reliable mid-sync. The problem is that the requests leading up to a sync (info/collections, etc.) aren't covered by this and will, in the worst case, produce the yellow error bar telling users their username and password are wrong.
Blocks: 664865
Summary: 401 migration error thrown → Need to handle node reassignment on all storage requests
Comment 3 • 13 years ago
While I haven't done a complete analysis of which requests are missing treatment for node reassignment, the evidence from Tony's logs is pretty clear. Bug 664865 landed in Firefox 7, so I'm marking everything since then as affected.
status-firefox10: --- → affected
status-firefox7: --- → affected
status-firefox8: --- → affected
status-firefox9: --- → affected
Comment 4 • 13 years ago (Assignee)
*dons dunce hat*
I wonder if I'll be awake enough to work on this tonight...
OS: Mac OS X → All
Hardware: x86 → All
Updated • 13 years ago
Keywords: regression
Comment 5 • 13 years ago (Assignee)
Looking into this now.
Assignee: nobody → rnewman
Status: NEW → ASSIGNED
Comment 6 • 13 years ago (Assignee)
Wrote some tests. Looks like the client requests a node reassignment after a 401 on, e.g., the initial info/collections request:
Sync.Service DEBUG User-Agent: XPCShell/1 FxSync/1.13.0.20100621.
Sync.Service INFO Starting sync at 2011-10-09 19:42:48
Sync.Service DEBUG In sync: should login.
Sync.Service INFO Logging in user johndoe
Sync.Test.Server DEBUG SyncServer: Handling request: GET /1.1/johndoe/info/collections
Sync.Resource DEBUG mesg: GET fail 401 http://localhost:8080/1.1/johndoe/info/collections
Sync.Resource DEBUG GET fail 401 http://localhost:8080/1.1/johndoe/info/collections
Sync.Service WARN 401: login failed.
Sync.Service DEBUG Finding cluster for user johndoe
Client made a request for a node reassignment. << TEST CODE
Sync.Resource DEBUG mesg: GET success 200 http://localhost:8080/user/1.0/johndoe/node/weave
Sync.Resource DEBUG GET success 200 http://localhost:8080/user/1.0/johndoe/node/weave
Sync.Service DEBUG Cluster value = null
Sync.SyncScheduler DEBUG Clearing sync triggers.
Sync.Service DEBUG Exception: Login failed: error.login.reason.account No traceback available
Sync.Service DEBUG Not syncing: login returned false.
Also tested 401s for storage requests generally and for engine storage requests alone. All of them result in the clusterURL pref being reset or node/weave being fetched.
Will continue investigating.
Comment 7 • 13 years ago (Assignee)
Haven't run TPS; waiting for build.
Attachment #566624 - Flags: review?(philipp)
Comment 8 • 13 years ago (Assignee)
Comment on attachment 566624 [details] [diff] [review]
Proposed patch. v1
Adding a couple things.
Attachment #566624 - Flags: review?(philipp)
Comment 9 • 13 years ago (Assignee)
Today's a travel day, so on and off I'll be doing the following:
* Running TPS
* Manually checking behavior in error scenarios (such as bad stored password)
* Porting tests to use old test infrastructure for landing on Aurora and Beta
Until then, here's a patch that passes 108 tests on s-c.
Attachment #566624 - Attachment is obsolete: true
Attachment #566863 - Flags: review?(philipp)
Comment 10 • 13 years ago (Assignee)
TPS output:
Test Summary
TEST-PASS | test_sync.js |
TEST-PASS | test_prefs.js |
TEST-PASS | test_tabs.js |
TEST-PASS | test_passwords.js |
TEST-PASS | test_history.js |
TEST-PASS | test_formdata.js |
TEST-PASS | test_bug530717.js |
TEST-PASS | test_bug531489.js |
TEST-PASS | test_bug538298.js |
TEST-PASS | test_bug556509.js |
TEST-PASS | test_bug562515.js |
TEST-PASS | test_bug563989.js |
TEST-PASS | test_bug535326.js |
TEST-PASS | test_bug501528.js |
TEST-PASS | test_bug575423.js |
TEST-PASS | test_bug546807.js |
TEST-PASS | test_history_collision.js |
TEST-PASS | test_privbrw_formdata.js |
TEST-PASS | test_privbrw_passwords.js |
TEST-PASS | test_privbrw_tabs.js |
TEST-PASS | test_bookmarks_in_same_named_folder.js |
TEST-PASS | test_client_wipe.js |
TEST-PASS | test_special_tabs.js |
TEST-UNEXPECTED-FAIL | test_mozmill_sanity.js | test did not complete
Comment 11 • 13 years ago
Comment on attachment 566863 [details] [diff] [review]
Proposed patch. v2
> case 401:
> Weave.Service.logout();
> Status.login = LOGIN_FAILED_LOGIN_REJECTED;
Can you explain (best in a comment) why this is still needed here? I personally have doubts.
>+ this._log.info("Got 401 response; resetting clusterURL.");
>+ Svc.Prefs.reset("clusterURL");
>+ this._log.info("Attempting to schedule another sync.");
>+ SyncScheduler.scheduleNextSync(0);
> break;
>+/**
>+ * Make a test request to `url`, then watch the result of two syncs
>+ * to ensure that a node request was made.
>+ */
>+function syncAndExpectNodeReassignment(server, firstNotification, secondNotification, url) {
>+ function onwards() {
>+ // Set backoff to avoid repeated syncs.
>+ Status.backoffInterval = 9999999999;
>+ let nodeFetched = false;
>+ function onFirstSync() {
>+ _("First sync completed.");
>+ Svc.Obs.remove(firstNotification, onFirstSync);
>+ Svc.Obs.add(secondNotification, onSecondSync);
>+ do_check_eq(Service.clusterURL, "");
>+
>+ // Track whether we fetched node/weave. We want to wait for the second
>+ // sync to finish so that we're cleaned up for the next test, so don't
>+ // run_next_test in the node handler.
>+ nodeFetched = false;
>+
>+ // Verify that the client requests a node reassignment.
>+ // Install a node handler to watch for these requests.
>+ installNodeHandler(server, function () {
>+ nodeFetched = true;
>+ });
>+ }
>+ function onSecondSync() {
>+ _("Second sync completed.");
>+ Svc.Obs.remove(secondNotification, onSecondSync);
>+ do_check_true(nodeFetched);
>+ server.stop(run_next_test);
You forgot to Service.startOver(); here. Also, to decouple that from ErrorHandler's and SyncScheduler's own sync:finish or sync:error notifications, best to wrap it in a nextTick.
I would also like to add negative tests for weave:ui:sync:error and weave:ui:login:error, along the lines of:
function onUIError() {
do_throw("Poor user shouldn't see a thing!");
}
Svc.Obs.add("weave:ui:login:error", onUIError);
Svc.Obs.add("weave:ui:sync:error", onUIError);
Could even make this a global observer if we're confident that every test in here will not show UI messages. Otherwise you want to unregister them of course :)
r- mostly for that last point: I want to ensure the user doesn't get to see *anything*
Attachment #566863 - Flags: review?(philipp) → review-
Comment 12 • 13 years ago (Assignee)
(In reply to Philipp von Weitershausen [:philikon] from comment #11)
> You forgot to Service.startOver(); here.
One of the tests relies on the user still being logged in from the previous test. Starting over would cause it to pass without the fix.
I should probably make that clear, perhaps by adjusting that test to independently set up its environment.
> I would also like to add negative tests for weave:ui:sync:error and
> weave:ui:login:error, along the lines of:
Good idea.
Comment 13 • 13 years ago
(In reply to Richard Newman [:rnewman] from comment #12)
> I should probably make that clear, perhaps by adjusting that test to
> independently set up its environment.
I would prefer the latter.
Comment 14 • 13 years ago (Assignee)
This is turning out to be slightly more involved than I first expected. Will resume work in the morning.
Comment 15 • 13 years ago (Assignee)
Might as well get this into the review queue while I finish up the rest.
Along with
Bug 694728 - Sync: Log Status transitions
this makes the behavior of Sync in these error situations much more obvious from the log (albeit preferably in Trace mode).
Attachment #567256 - Flags: review?(philipp)
Comment 16 • 13 years ago (Assignee)
This passes tests, and applies quite cleanly on Aurora. (Actual Aurora patch to come.)
Tests use the old HTTP test harness, which is why they're ugly and awful. Part 2 improves matters.
TPS passes. I have done some hand verification of edge cases, but would like to do more, considering the possible impact.
My current patch queue:
0 A bug-694728-status: Bug 694728 - Sync: Log Status transitions.
1 A bug-694657: Bug 694657 - Sync: test JS server 404 for GET on missing WBO.
2 A bug-692714-part-0-cleanup: Bug 692714 - Part 0: cleanup and logging.
3 A bug-692714-part-1: Bug 692714 - Part 1: handle node reassignment on all storage requests.
4 A bug-692714-part-2: Bug 692714 - Part 2: port tests to SyncServer for non-Aurora landing.
Not requesting review yet (because I want to do more testing), but feel free to review it if you want, Philipp.
Attachment #566863 - Attachment is obsolete: true
Comment 17 • 13 years ago (Assignee)
This switches to using SyncServer, which makes the tests much cleaner (and with fewer spurious errors in the log). Depends on Bug 694744, which fixes a stoopid typo in the server code.
Comment 18 • 13 years ago (Assignee)
Attachment #567256 - Attachment is obsolete: true
Attachment #567256 - Flags: review?(philipp)
Attachment #567674 - Flags: review?(philipp)
Comment 19 • 13 years ago (Assignee)
This version adds a backoff (with pref-based persistence of cross-sync 401 knowledge) and a test for same.
Manual testing with a faked 401 on /storage/ suggests that each instant sync will cause another (silent) failure, with timed retries occurring at the backoff duration.
"Sync Now" causes an Unknown Error, but improving that is outside the scope of this bug.
Tests are in the old style (see next patch), so this should apply quite cleanly on Aurora.
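In outline, the 401 handling plus backoff described here looks roughly like the following (a condensed sketch assembled from the patch excerpts quoted in the reviews below; exactly where the lastSyncReassigned bookkeeping happens is abbreviated and is an assumption on my part):

case 401:
  Weave.Service.logout();
  this._log.info("Got 401 response; resetting clusterURL.");
  Svc.Prefs.reset("clusterURL");

  let delay = 0;
  if (Svc.Prefs.get("lastSyncReassigned")) {
    // Second consecutive mid-sync 401: login itself worked, so this is
    // probably an intermittent server problem. Give it room to recover.
    this._log.warn("Last sync also failed for 401. Delaying next sync.");
    delay = MINIMUM_BACKOFF_INTERVAL;
  } else {
    // Note the 401 so that a repeat on the next sync triggers the backoff.
    // (Assumption: the real patch may record this elsewhere.)
    Svc.Prefs.set("lastSyncReassigned", true);
  }

  this._log.info("Attempting to schedule another sync.");
  SyncScheduler.scheduleNextSync(delay);
  break;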
Attachment #567263 - Attachment is obsolete: true
Attachment #567675 - Flags: review?(philipp)
Comment 20 • 13 years ago (Assignee)
Because SyncServer.
Attachment #567267 - Attachment is obsolete: true
Attachment #567676 - Flags: review?(philipp)
Updated • 13 years ago (Assignee)
Attachment #567675 - Attachment description: Part 1: handle node reassignment on all storage requests. v1 → Part 1: handle node reassignment on all storage requests. v2
Comment 21 • 13 years ago (Assignee)
These patches touch the Service/ErrorHandler/Resource/SyncScheduler/backoff/Status mess, and attempt to ameliorate some edge cases that have not previously been considered. As such, I wouldn't be in the least bit surprised if I've broken or missed something, or at least generated a few follow-up bugs.
Assuming this passes review, I'd like to land it on s-c and get a week of banging on it to ensure coverage of all the important cases. We can request Aurora approval if it seems solid.
Comment 22 • 13 years ago (Assignee)
I've done a bunch of manual verification of this. Incorrect password still shows up in the correct spot, and overall it seems to be acting fine.
Comment 23 • 13 years ago
Comment on attachment 567674 [details] [diff] [review]
Part 0: cleanup and logging. v2
>@@ -134,16 +134,17 @@ let SyncScheduler = {
>
> // reset backoff info, if the server tells us to continue backing off,
> // we'll handle that later
> Status.resetBackoff();
>
> this.globalScore = 0;
> break;
> case "weave:service:sync:finish":
>+ this._log.trace("SyncScheduler handling weave:service:sync:finish.");
this._log will already print "Sync.Scheduler". Also, you have the topic variable. So why don't we put
this._log.trace("Handling " + topic);
at the very top of observe()?
>@@ -548,32 +554,35 @@ let ErrorHandler = {
> this.notifyOnNextTick("weave:ui:login:error");
> } else {
> this.notifyOnNextTick("weave:ui:clear-error");
> }
>
> this.dontIgnoreErrors = false;
> break;
> case "weave:service:sync:error":
>+ this._log.debug("ErrorHandler handling weave:service:sync:error.");
Ditto for ErrorHandler, but make it trace too!
> case "weave:service:sync:finish":
>+ this._log.debug("ErrorHandler handling weave:service:sync:finish.");
>+ this._log.debug("Status.service is " + Status.service);
Make this trace please!
>@@ -588,17 +597,17 @@ let ErrorHandler = {
> break;
> }
> },
>
> notifyOnNextTick: function notifyOnNextTick(topic) {
> Utils.nextTick(function() {
> this._log.trace("Notifying " + topic +
> ". Status.login is " + Status.login +
>- ". Status.sync is " + Status.sync);
>+ ". Status.sync is " + Status.sync + ".");
Unnecessary churn and minus points for inconsistency with your own log statement for Status.service above!
>diff --git a/services/sync/modules/service.js b/services/sync/modules/service.js
>--- a/services/sync/modules/service.js
>+++ b/services/sync/modules/service.js
>@@ -508,28 +508,31 @@ WeaveSvc.prototype = {
> // gets cluster from central LDAP server and returns it, or null on error
> _findCluster: function _findCluster() {
> this._log.debug("Finding cluster for user " + this.username);
>
> let fail;
> let res = new Resource(this.userAPI + this.username + "/node/weave");
> try {
> let node = res.get();
>+ this._log.trace("_findCluster got HTTP status " + node.status);
The logging from resource isn't enough?
>@@ -729,19 +732,19 @@ WeaveSvc.prototype = {
> } catch (ex) {
> this._log.debug("Fetching passphrase threw " + ex +
> "; assuming master password locked.");
> Status.login = MASTER_PASSWORD_LOCKED;
> return false;
> }
>
> try {
>- // Make sure we have a cluster to verify against
>- // this is a little weird, if we don't get a node we pretend
>- // to succeed, since that probably means we just don't have storage
>+ // Make sure we have a cluster to verify against.
>+ // This is a little weird, if we don't get a node we pretend
>+ // to succeed, since that probably means we just don't have storage.
If you're getting in there, please rewrap to 80 chars :)
r+ with those changes.
Attachment #567674 - Flags: review?(philipp) → review+
Comment 24 • 13 years ago (Assignee)
(In reply to Philipp von Weitershausen [:philikon] from comment #23)
> >+ // to succeed, since that probably means we just don't have storage.
>
> If you're getting in there, please rewrap to 80 chars :)
That's 76 chars. Otherwise, all done, thanks!
Comment 25 • 13 years ago
Comment on attachment 567675 [details] [diff] [review]
Part 1: handle node reassignment on all storage requests. v2
>diff --git a/services/sync/modules/policies.js b/services/sync/modules/policies.js
>--- a/services/sync/modules/policies.js
>+++ b/services/sync/modules/policies.js
>@@ -151,16 +151,23 @@ let SyncScheduler = {
>
> let sync_interval;
> this._syncErrors = 0;
> if (Status.sync == NO_SYNC_NODE_FOUND) {
> this._log.trace("Scheduling a sync at interval NO_SYNC_NODE_FOUND.");
> sync_interval = NO_SYNC_NODE_INTERVAL;
> }
> this.scheduleNextSync(sync_interval);
>+
>+ if (Status.sync == SYNC_SUCCEEDED) {
>+ // Great. Let's clear our mid-sync 401 note.
>+ this._log.trace("Clearing lastSyncReassigned.");
>+ Svc.Prefs.reset("lastSyncReassigned");
>+ }
Why are we doing this in SyncScheduler and not in ErrorHandler? This is where the pref gets set so it should also be where it gets reset. Encapsulation FTW! (r- for this)
>@@ -734,16 +741,25 @@ let ErrorHandler = {
>
> let lastSync = Svc.Prefs.get("lastSync");
> if (lastSync && ((Date.now() - Date.parse(lastSync)) >
> Svc.Prefs.get("errorhandler.networkFailureReportTimeout") * 1000)) {
> Status.sync = PROLONGED_SYNC_FAILURE;
> this._log.trace("shouldReportError: true (prolonged sync failure).");
> return true;
> }
>+
>+ // We got a 401 mid-sync. Wait for the next sync before actually handling
>+ // an error. This assumes that we'll get a 401 again on a login fetch in
>+ // order to report the error.
>+ if (Weave.Service.clusterURL == "") {
if (!Weave.Service.clusterURL) ?
> case 401:
> Weave.Service.logout();
>- Status.login = LOGIN_FAILED_LOGIN_REJECTED;
>+ this._log.info("Got 401 response; resetting clusterURL.");
>+ Svc.Prefs.reset("clusterURL");
>+
>+ let delay = 0;
>+ if (Svc.Prefs.get("lastSyncReassigned")) {
>+ this._log.warn("Last sync also failed for 401. Delaying next sync.");
>+ delay = MINIMUM_BACKOFF_INTERVAL;
Add a comment here along the lines of
// We're getting another 401 mid-sync when the last sync already did that. It can't
// be a wrong password since login worked properly. Probably an intermittent problem
// on the server, so let's give it some space to recover.
>diff --git a/services/sync/modules/service.js b/services/sync/modules/service.js
>--- a/services/sync/modules/service.js
>+++ b/services/sync/modules/service.js
>@@ -917,16 +917,17 @@ WeaveSvc.prototype = {
> CollectionKeys.clear();
>
> /* Login and sync. This also generates new keys. */
> this.sync();
> return true;
> }))(),
>
> startOver: function() {
>+ this._log.info("Invoking Service.startOver.");
trace!
I skimmed over the tests, they look good (and familiar from the WIP). Will take a closer look when you upload a new version of this patch :)
Attachment #567675 - Flags: review?(philipp) → review-
Comment 26 • 13 years ago
Comment on attachment 567676 [details] [diff] [review]
Part 2: port tests to SyncServer for non-Aurora landing. v2
>@@ -66,40 +64,21 @@ function installNodeHandler(server, next
> function handleNodeRequest(req, resp) {
> _("Client made a request for a node reassignment.");
> resp.setStatusLine(req.httpVersion, 200, "OK");
> resp.setHeader("Content-Type", "text/plain");
> resp.bodyOutputStream.write(newNodeBody, newNodeBody.length);
> Utils.nextTick(next);
> }
> let nodePath = "/user/1.0/johndoe/node/weave";
>- server.registerPathHandler(nodePath, handleNodeRequest);
>+ server.server.registerPathHandler(nodePath, handleNodeRequest);
server.server? Meh... :)
Attachment #567676 - Flags: review?(philipp) → review+
Comment 27 • 13 years ago (Assignee)
Review comments addressed. Tests pass. Part 2 applies cleanly.
Attachment #567675 - Attachment is obsolete: true
Attachment #567939 - Flags: review?(philipp)
Comment 28 • 13 years ago (Assignee)
Updated part 0.
Attachment #567674 - Attachment is obsolete: true
Comment 29 • 13 years ago (Assignee)
Updated Part 2.
Attachment #567676 - Attachment is obsolete: true
Comment 30 • 13 years ago
Comment on attachment 567939 [details] [diff] [review]
Part 1: handle node reassignment on all storage requests. v3
>@@ -151,16 +152,17 @@ let SyncScheduler = {
>
> let sync_interval;
> this._syncErrors = 0;
> if (Status.sync == NO_SYNC_NODE_FOUND) {
> this._log.trace("Scheduling a sync at interval NO_SYNC_NODE_FOUND.");
> sync_interval = NO_SYNC_NODE_INTERVAL;
> }
> this.scheduleNextSync(sync_interval);
>+
> break;
Spurious line break, pls remove :)
>+ Svc.Obs.add("weave:service:sync:error", onSyncError);
>+ function onSyncError() {
>+ _("Got weave:service:sync:error in first sync.");
>+ // Wait for the automatic next sync.
>+ Svc.Obs.remove("weave:service:sync:error", onSyncError);
>+ Svc.Obs.add("weave:service:login:error", onLoginError);
>+ }
>+
>+ function onLoginError() {
>+ _("Got weave:service:login:error in second sync.");
>+ Svc.Obs.remove("weave:service:login:error", onLoginError);
>+
>+ do_check_eq(Status.login, LOGIN_FAILED_LOGIN_REJECTED);
>+ do_check_false(Service.isLoggedIn);
>+
>+ // Clean up.
>+ Utils.nextTick(function () {
>+ Service.startOver();
>+ server.stop(run_next_test);
>+ });
>+ }
>+
I'm almost inclined to prefer that onLoginError were nested inside onSyncError to indicate succession. Or do what you did in test_node_reassignment.js and call them onFirstLoginError and onSecondSyncError or something? Don't feel strongly about it...
>+ function onSecondSync() {
>+ _("Second sync completed.");
>+ Svc.Obs.remove(secondNotification, onSecondSync);
>+ SyncScheduler.clearSyncTriggers();
>+
>+ // Make absolutely sure that any event listeners are done with their work
>+ // before we proceed.
>+ nthTick(4, function () {
So how are you coming up with the number 4? What's really involved behind the scenes? If it's a timer, you want to wait 150ms and then do one or two next ticks. See the waitForZeroTimer() helper in test_syncscheduler.js. Feel free to move it to head_helpers.js and reuse it :)
Looks good otherwise! r=me
Attachment #567939 - Flags: review?(philipp) → review+
Comment 31 • 13 years ago (Assignee)
https://hg.mozilla.org/services/services-central/rev/9982ff744731
https://hg.mozilla.org/services/services-central/rev/a8acc35de8a5
https://hg.mozilla.org/services/services-central/rev/a90cea9fee21
Whiteboard: [fixed in services]
Comment 32 • 13 years ago (Assignee)
Will request approval for Aurora after this has gone through QA.
Background for QA:
This set of patches addresses handling of 401 responses during a sync, in particular 401 responses that indicate a node reassignment, and especially those received mid-sync (as would happen if a user is reassigned whilst they are syncing).
The desired behavior in this case is as follows:
* Abort the current sync
* Do not show an error bar (unless this was a Sync Now, or the usual prolonged error handling kicks in)
* Log out and clear our cluster URL
* Schedule another sync
* Allow the login logic on the next sync to pick up a node reassignment or detect a bad password.
A subsequent mid-sync failure, as would happen in the case of a pernicious server issue, will result in a minimum backoff.
So, please verify:
* Syncing normally continues to work.
* An ordinary node reassignment outside of a sync results in the usual client behavior: silently picking up the new server and doing a first sync.
* A node reassignment within a sync (easily simulated by having Zeus return a 401 for /storage/) results in a silent immediate abort and retry, followed on repeated errors by repeated retries every MINIMUM_BACKOFF_INTERVAL.
* "Instant Sync" syncs during this backoff time will proceed as normal (that is, triggering a sync, failing, and not reporting an error).
* A bad password is reported as expected, with no subsequent attempt scheduled until the password is changed.
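For reference, the xpcshell tests in these patches simulate the same scenario against the JS test server along roughly these lines (a sketch assembled from the test code quoted in the reviews above; the johndoe paths and newNodeBody are the fixtures used there, and the particular storage path chosen for the 401 is illustrative):

// Make a storage request fail with 401 to provoke the mid-sync reassignment path.
server.server.registerPathHandler("/1.1/johndoe/storage/meta/global",
  function (req, resp) {
    resp.setStatusLine(req.httpVersion, 401, "Unauthorized");
  });

// Hand out the new cluster URL when the client asks for a node reassignment.
let newNodeBody = "http://localhost:8080/";
server.server.registerPathHandler("/user/1.0/johndoe/node/weave",
  function (req, resp) {
    resp.setStatusLine(req.httpVersion, 200, "OK");
    resp.setHeader("Content-Type", "text/plain");
    resp.bodyOutputStream.write(newNodeBody, newNodeBody.length);
  });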
Comment 33 • 13 years ago
Updated • 13 years ago (Assignee)
Attachment #568490 - Attachment mime type: text/plain → application/zip
Comment 34 • 13 years ago (Assignee)
13:17:01 <@rnewman> so checkSyncStatus is scheduling another sync, then my 401 handling code gets to run
13:17:39 <@rnewman> you can see in the log: we schedule one sync, then schedule an immediate sync, then we finish up and schedule a sync for 90 seconds
13:17:56 <@rnewman> well, the last schedule is an interval change
13:17:58 <@rnewman> but still
13:18:04 <@rnewman> there's way too much logic firing here
13:19:02 <@rnewman> next line in the next log: 1319139800308 Sync.SyncScheduler TRACE There's already a sync scheduled in 88692 ms.
13:19:14 <@rnewman> so we have kicked off a sync and still have one scheduled
13:19:17 < tracy> 2 syncs per second were firing, 'til I had atoll shot off the 401
13:19:23 <@atoll> yeah
13:19:56 <@rnewman> yes, on each failure it's scheduling another sync, exactly as it was when I tested
13:20:06 <@rnewman> but something is also causing an immediate sync to be scheduled
13:20:22 <@rnewman> presumably because of some small discrepancy introduced in event handling order
13:20:35 <@rnewman> so despite the backoff I added, the loop still occurs
13:21:43 < tracy> immediateInterval was last before I shut browser off. this test was triggered by that timer being past, thus syncing on startup.
13:21:54 < tracy> if that helps at all.
13:22:46 < tracy> all of the ever half second syncs are setting to activeInterval.
Comment 35 • 13 years ago (Assignee)
13:30:44 <@rnewman> OK, I found the bug
13:30:51 <@atoll> with 10 minutes to spare!
13:30:57 <@rnewman> at the end of the *first* 401:
13:31:01 <@rnewman> 1319139799468 Sync.ErrorHandler TRACE Clearing lastSyncReassigned.
13:31:12 <@rnewman> so we're making a note that next time we need to back off...
13:31:19 <@rnewman> and then we clear the note before the next sync!
13:31:46 <@rnewman> this is occurring because we got sync:finish, but Status.sync == SYNC_SUCCEEDED
13:31:49 <@rnewman> which is wrong
13:32:46 <@rnewman> I don't see Status.sync *ever* being adjusted in these logs
13:32:50 <@rnewman> Status.service is
13:33:50 <@rnewman> so I believe that if we can get the status to be correct by
the time finish is called, or we alter the check, things
will start working as they did when I tested this myself
Comment 36 • 13 years ago
Backed out on s-c: https://hg.mozilla.org/services/services-central/rev/b957b1741f68
Whiteboard: [fixed in services]
Comment 37 • 13 years ago (Assignee)
I spent a little quality time tonight with some bourbon and Vim.
I don't know at what point it regressed, but even in my tests it's plain that lastSyncReassigned was being reset at the end of the bad sync, because we were only checking one Status flag. Status needs to die.
When I added a login observer to check lastSyncReassigned was still set (given that we expect it to be unset at the end of a completely successful sync), my test failed. Hurrah!
This is a delta patch for purposes of examination. It includes the following:
* We no longer simply check Status.sync; we verify that both Status.sync and Status.service are set to their respective success values before we clear the "back off, I got a 401" pref.
* I added a test for 401 looping on a particular engine, not just all of storage, in case that case ever differs.
* I added the login observer to the node reassignment tests to verify that the pref sticks around.
Status needs to die.
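Concretely, the check from the first bullet becomes something like this (a sketch only, assuming the SYNC_SUCCEEDED and STATUS_OK success constants from constants.js and that the clearing lives in ErrorHandler's weave:service:sync:finish handling, per the review in comment 25):

case "weave:service:sync:finish":
  // Only forget the mid-sync 401 after a genuinely clean sync: sync:finish
  // can arrive with Status.sync reading SYNC_SUCCEEDED even when the sync
  // was aborted, so require both flags to report success.
  if (Status.sync == SYNC_SUCCEEDED && Status.service == STATUS_OK) {
    this._log.trace("Clearing lastSyncReassigned.");
    Svc.Prefs.reset("lastSyncReassigned");
  }
  break;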
I welcome feedback from interested parties, when they're done turning petroleum and rubber into noise!
Attachment #568603 - Flags: feedback?
Updated • 13 years ago (Assignee)
Attachment #568603 - Flags: feedback? → feedback?(philipp)
Comment 38 • 13 years ago
Comment on attachment 568603 [details] [diff] [review]
Possible incremental remediation. v1
Status is a total jerk.
Attachment #568603 - Flags: feedback?(philipp) → review+
Comment 39 • 13 years ago (Assignee)
Relanded and fixed:
https://hg.mozilla.org/services/services-central/rev/9807ab756d1d
https://hg.mozilla.org/services/services-central/rev/949055398e11
https://hg.mozilla.org/services/services-central/rev/373e670ec4b7
STR in Comment 32 as before.
Whiteboard: [fixed in services]
Comment 40 • 13 years ago
https://hg.mozilla.org/mozilla-central/rev/9807ab756d1d
https://hg.mozilla.org/mozilla-central/rev/949055398e11
https://hg.mozilla.org/mozilla-central/rev/373e670ec4b7
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Updated • 13 years ago
status-firefox10: affected → ---
Updated • 6 years ago
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox