Closed Bug 742485 Opened 13 years ago Closed 13 years ago

Node assignment sends invalid "null" cluster response when no node is available

Categories

(Cloud Services :: Server: Registration, defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jbonacci, Assigned: telliott)

References

Details

(Whiteboard: [qa+])

Attachments

(1 file)

atoll and I tested this against stage re: No node found

Disable node assignments in Stage
Set up a new profile/account
Wait for first sync or force a Sync Now
Error bar appears about "NS_ERROR_UNKNOWN_HOST"
Cluster assignment ends up being https://null/

Sync log looks like this:

...etc...
1333567974789 Sync.Service INFO Account created: BLAH
1333567974803 Sync.Status DEBUG Status.service: service.client_not_configured => success.status_ok
1333567974803 Sync.AddonsReconciler INFO Registering as Add-on Manager listener.
1333567974803 Sync.AddonsReconciler DEBUG Adding change listener.
1333567974868 Sync.Service DEBUG User-Agent: Firefox/11.0 FxSync/1.14.0.20120312181643.
1333567974868 Sync.Service INFO Starting sync at 2012-04-04 12:32:54
1333567974868 Sync.Service DEBUG In sync: should login.
1333567974869 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok
1333567974869 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok
1333567974869 Sync.Service INFO Logging in user vonxzneli6fhmoby3opjh3xujp6j2o7x
1333567974869 Sync.Service DEBUG Finding cluster for user vonxzneli6fhmoby3opjh3xujp6j2o7x
1333567974990 Sync.Resource DEBUG mesg: GET success 200 https://stage-auth.services.mozilla.com/user/1.0/BLAH/node/weave
1333567974990 Sync.Resource DEBUG GET success 200 https://stage-auth.services.mozilla.com/user/1.0/BLAH/node/weave
1333567974991 Sync.Service DEBUG Cluster value = https://null/
1333567974991 Sync.Service DEBUG Setting cluster to https://null/
1333567974991 Sync.Service DEBUG Caching URLs under storage user base: https://null/1.1/BLAH/
1333567975085 Sync.Service DEBUG verifyLogin failed: NS_ERROR_UNKNOWN_HOST JS Stack trace: Res_get()@resource.js:483 < ()@service.js:749 < WrappedNotify()@util.js:148 < verifyLogin()@service.js:717 < ()@service.js:1006 < WrappedNotify()@util.js:148 < WrappedLock()@util.js:103 < WrappedCatch()@util.js:77 < WeaveSvc_login()@service.js:980 < ()@service.js:1272 < WrappedCatch()@util.js:77 < sync()@service.js:1268
1333567975085 Sync.Status DEBUG Status.login: error.login.reason.no_username => error.login.reason.network
1333567975085 Sync.Status DEBUG Status.service: success.status_ok => error.login.failed
1333567975085 Sync.Status DEBUG Status.login: error.login.reason.network => error.login.reason.network
1333567975085 Sync.Status DEBUG Status.service: error.login.failed => error.login.failed
1333567975086 Sync.Tracker.Clients DEBUG client.name preference changed
1333567975086 Sync.Tracker.Clients WARN Attempted to add undefined ID to tracker
1333567975087 Sync.SyncScheduler DEBUG Clearing sync triggers and the global score.
1333567975087 Sync.SyncScheduler DEBUG Next sync in 86400000 ms.
It takes about 60 seconds to test this. I ran UPDATE weave.available_nodes SET available=0; (for Sync 1.1, at least) in staging. That immediately broke my Aurora 2012-04-04 UI: the progress bar is stuck at 0% and, if I do Sync Now, a yellow error bar complains about NS_ERROR_UNKNOWN_HOST, probably because one of the Sync URL preferences in about:config is, in its entirety, "user/". (And jbonacci got the cluster URL "https://null/", because the client is failing to handle the server's "no nodes available" null response.) - R.
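The handling gap described above can be sketched as a small validation step applied to the node-assignment response before it is used as a cluster URL. This is only an illustration in Python, not the Sync client's actual code (the client is JavaScript); the function name parse_node_response and the example host are hypothetical.

```python
from urllib.parse import urlparse


def parse_node_response(body):
    """Return a usable cluster URL, or None when no node was assigned.

    Hypothetical helper: treats an empty body, the literal text "null",
    and any URL whose host is "null" as "no node available".
    """
    if body is None:
        return None
    text = body.strip()
    if not text or text == "null":
        return None
    parsed = urlparse(text)
    # Only http(s) URLs with a real host name are accepted.
    if parsed.scheme not in ("http", "https"):
        return None
    host = parsed.hostname
    if not host or host == "null":
        return None
    # Normalize to a trailing slash so paths can be appended safely.
    return text if text.endswith("/") else text + "/"
```

With a check like this, both the plain "null" body and the malformed "https://null/" body would be reported as "no node assigned" instead of surfacing later as NS_ERROR_UNKNOWN_HOST.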
Priority: -- → P1
Summary: Sync returns "null" cluster when no node is available → Sync client mis-handles "null" cluster response when no node is available
Depends on: 674280
If you "Sync Now" and an error is encountered, there is supposed to be an error bar telling you why Sync didn't work. The bug is the crummy error message. I think it's also fair to notify the user, even during an automatic sync, if they were not assigned a node.
Yea, atoll saw that error bar. I did not get one, which is rather strange. But I got the accompanying error message in the sync log.
There is/was great UI for this, back in the day. rnewman found
12:00 <rnewman> browser-syncui.js:157, onSyncDelay
which may be of some relevance in relinking it in modern UI.
We have a test for this at test_service_cluster.js:90. Test output:

A 'null' response won't make a difference either.
Sync.Identity INFO Username changed. Removing stored credentials.
Sync.Identity INFO Basic password has no value. Removing.
Sync.Identity INFO Sync Key has no value. Deleting.
Sync.Service DEBUG Finding cluster for user jimdoe
Sync.Resource DEBUG No authenticator found.
Sync.Resource DEBUG mesg: GET success 200 http://localhost:8080/user/1.0/jimdoe/node/weave
Sync.Resource DEBUG GET success 200 http://localhost:8080/user/1.0/jimdoe/node/weave
Sync.Service DEBUG Cluster value = null
TEST-PASS | /Users/gps/src/services-central/obj-ff-dbg/_tests/xpcshell/services/sync/tests/unit/test_service_cluster.js | [test_setCluster : 92] false == false
TEST-PASS | /Users/gps/src/services-central/obj-ff-dbg/_tests/xpcshell/services/sync/tests/unit/test_service_cluster.js | [test_setCluster : 93] http://weave.user.node/ == http://weave.user.node/

I don't suppose it is possible that the server is sending "https://null/" as the response body? I can easily add some logging and test on stage...
The HTTP response body sent by the server is "https://null/". I added some additional logging to service.js and captured it on stage:

1333648724641 Sync.Service INFO Find cluster response code: 200
1333648724641 Sync.Service INFO Find cluster response body: 'https://null/'
1333648724641 Sync.Service DEBUG Cluster value = https://null/

Not sure if production is affected. If so, that would be very bad. Also, I'm not sure I got the right Bugzilla component.
Component: Firefox Sync: Backend → Server: Registration
QA Contact: sync-backend → reg-server
Summary: Sync client mis-handles "null" cluster response when no node is available → Node assignment sends invalid "null" cluster response when no node is available
Had to rewrite an entire test suite to find it, but eventually tracked it down. There's a small hole in the communication between node and snode. This patch is 2 lines to fix it, then a whole lot of new testing :P
Assignee: nobody → telliott
Attachment #612673 - Flags: review?(rkelly)
:telliott thanks for the digging. Services QA will add this to our "must test" list for AITC client.
Whiteboard: [qa+]
I remember reading it on IRC, but can somebody please document the production (non-?)impact of this bug as a bug comment?
The early assessment is correct. If we (as the only people using the node/snode split) ever run out of nodes, the user will get a https://null/ cluster URL. This has never happened and would pretty much require a DoS to occur, but it would be client-affecting, and that's why we'll fix it.
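As a rough illustration of how a "no node" answer could end up as https://null/, the sketch below contrasts a front end that wraps whatever the backend returns in a URL with one that checks for the "no node" marker first. This is an assumption made for illustration only: the thread does not show the patch, and the function names, the plain-text reply format, and the example host are all hypothetical.

```python
# Hypothetical marker the backend sends when it has no node to hand out.
NO_NODE = "null"


def naive_node_response(backend_reply):
    """Wrap the backend reply in a URL without checking it (the failure mode)."""
    return "https://%s/" % backend_reply.strip()


def checked_node_response(backend_reply):
    """Pass the 'no node' marker through untouched; build a URL only for real hosts."""
    host = (backend_reply or "").strip()
    if not host or host == NO_NODE:
        return NO_NODE
    return "https://%s/" % host


if __name__ == "__main__":
    print(naive_node_response("null"))                 # https://null/
    print(checked_node_response("null"))               # null
    print(checked_node_response("node1.example.com"))  # https://node1.example.com/
```

The point of the checked variant is that the client only ever sees either a well-formed node URL or the plain "null" body that the existing client test already covers.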
Attachment #612673 - Flags: review?(rkelly) → review+
Fixed in http://hg.mozilla.org/services/server-node-assignment/rev/d3b1cda1f77f. Will open a bug to do a production push.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 744051
Verified as part of the Production push for core and node.
Status: RESOLVED → VERIFIED