Closed Bug 950476 Opened 11 years ago Closed 11 years ago

Sync not connecting to phx-sync586.services.mozilla.com

Categories

(Cloud Services :: Operations: Miscellaneous, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: jlaz)

I was away from my desktop computer for 3 weeks. When I came back today, I got a Sync error bar telling me Sync has not been able to complete for 14 days. Looking at about:sync-log:

1387127803973 Sync.Service INFO Starting sync at 2013-12-15 09:16:43
1387127803973 Sync.Service DEBUG In sync: should login.
1387127803974 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok
1387127803974 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok
1387127803974 Sync.Service INFO Logging in user sy62mhxnajvc4viq3dvhywgohsqft7ch
1387127803974 Sync.Service DEBUG Caching URLs under storage user base: https://phx-sync586.services.mozilla.com/1.1/sy62mhxnajvc4viq3dvhywgohsqft7ch/
1387127807045 Sync.SyncScheduler DEBUG Next sync in 3600000 ms.
1387127809691 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok
1387127815303 Sync.Tracker.History DEBUG Saving changed IDs to history
1387127822700 Sync.Service DEBUG verifyLogin failed: NS_ERROR_UNKNOWN_HOST JS Stack trace: Res_get@resource.js:413 < verifyLogin@service.js:683 < onNotify@service.js:926 < WrappedNotify@util.js:142 < WrappedLock@util.js:97 < WrappedCatch@util.js:71 < login@service.js:937 < @service.js:1174 < WrappedCatch@util.js:71 < sync@service.js:1170
1387127822700 Sync.Status DEBUG Status.login: success.login => error.login.reason.network
1387127822700 Sync.Status DEBUG Status.service: success.status_ok => error.login.failed
1387127822700 Sync.Status DEBUG Status.login: error.login.reason.network => error.login.reason.network
1387127822700 Sync.Status DEBUG Status.service: error.login.failed => error.login.failed
1387127822700 Sync.SyncScheduler DEBUG Clearing sync triggers and the global score.
1387127822700 Sync.SyncScheduler DEBUG Next sync in 3600000 ms.

A manual DNS lookup confirms that phx-sync586.services.mozilla.com is failing to resolve. My guess is this host was decommissioned and users were swung over to a new host. But how long were the redirects up? Perhaps not long enough for my client, idle for 3 weeks, to pick up the redirect? Or perhaps this is a legitimate client bug where DNS failure doesn't properly cause the client to go back to the login server? I've been away from the Sync code base for so long I can't remember. Filing against Operations because DNS failure seems wrong to me.
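For anyone wanting to repeat the manual DNS check from a script, here is a minimal sketch in Python using only the standard library. The function name `resolves` and its injectable `lookup` parameter are illustrative assumptions, not part of any Sync tooling.

```python
import socket


def resolves(hostname, lookup=socket.getaddrinfo):
    """Return True if hostname resolves to at least one address.

    lookup is injectable so the check can be exercised without a network;
    by default it is the standard library's socket.getaddrinfo.
    """
    try:
        return len(lookup(hostname, 443)) > 0
    except socket.gaierror:
        # Name resolution failed. This is the script-level analogue of the
        # NS_ERROR_UNKNOWN_HOST that the client logged.
        return False
```

Calling `resolves("phx-sync586.services.mozilla.com")` returns False while the name is missing from DNS and True once it resolves again.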
Related: Bug 674280 and Bug 716816 are client work to recover from this kind of dead-end. But operationally: we should never, ever remove a server URL from DNS unless we are 100% sure that all of the users on that server have picked up a node reassignment on all of their devices. Once we stop advertising an address, clients will never be able to get a 401, and so they'll never be able to recover.
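To make the dead-end concrete: a client that only re-fetches its node on a 401 can never recover once the hostname stops resolving, because no response arrives at all. One way a client could avoid this is to treat a DNS failure like a 401 and ask for its node assignment again. The sketch below is hypothetical and is not the Sync client's code; `sync_once`, `fetch_node`, `do_sync` and both exception classes are invented for illustration.

```python
class UnknownHost(Exception):
    """Stand-in for a DNS failure such as NS_ERROR_UNKNOWN_HOST."""


class Unauthorized(Exception):
    """Stand-in for an HTTP 401 from the storage node."""


def sync_once(cached_node, fetch_node, do_sync):
    """Sync against cached_node; on a 401 or DNS failure, re-fetch and retry once.

    fetch_node() is assumed to return the user's current node URL.
    do_sync(node) performs the sync and may raise Unauthorized or UnknownHost.
    Returns the node URL that was finally used.
    """
    try:
        do_sync(cached_node)
        return cached_node
    except (Unauthorized, UnknownHost):
        # The cached node is stale or gone: discard it and ask again.
        new_node = fetch_node()
        if new_node == cached_node:
            # Same node handed back, so there is nothing to recover to.
            raise
        do_sync(new_node)
        return new_node
```

The single retry is deliberate: if the freshly assigned node also fails, the error propagates to the caller, which can fall back to its normal sync schedule instead of looping.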
See Also: → 674280
Depends on: 950585
Following up from our IRC discussion: we've disabled DNSSEC for *.services.mozilla.com, which seems to have been causing many of the resolver-related issues we've been seeing. Most of the DNS propagation has completed by now, but let us know whether you are able to Sync successfully.
Flags: needinfo?(gps)
My desktop machine is now syncing properly.

1387245716529 Sync.Service DEBUG Caching URLs under storage user base: https://phx-sync586.services.mozilla.com/1.1/sy62mhxnajvc4viq3dvhywgohsqft7ch/
1387245716782 Sync.Resource DEBUG mesg: GET success 200 https://phx-sync586.services.mozilla.com/1.1/sy62mhxnajvc4viq3dvhywgohsqft7ch/info/collections
1387245716782 Sync.Resource DEBUG GET success 200 https://phx-sync586.services.mozilla.com/1.1/sy62mhxnajvc4viq3dvhywgohsqft7ch/info/collections
Flags: needinfo?(gps)
Assignee: nobody → jlaz
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED