Closed Bug 950476 Opened 11 years ago Closed 10 years ago

Sync not connecting to phx-sync586.services.mozilla.com

Categories

(Cloud Services :: Operations: Miscellaneous, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: jlaz)

I was away from my desktop computer for 3 weeks. When I came back today, I got a Sync error bar telling me Sync has not been able to complete for 14 days. Looking at about:sync-log:

1387127803973	Sync.Service	INFO	Starting sync at 2013-12-15 09:16:43
1387127803973	Sync.Service	DEBUG	In sync: should login.
1387127803974	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1387127803974	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1387127803974	Sync.Service	INFO	Logging in user sy62mhxnajvc4viq3dvhywgohsqft7ch
1387127803974	Sync.Service	DEBUG	Caching URLs under storage user base: https://phx-sync586.services.mozilla.com/1.1/sy62mhxnajvc4viq3dvhywgohsqft7ch/
1387127807045	Sync.SyncScheduler	DEBUG	Next sync in 3600000 ms.
1387127809691	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1387127815303	Sync.Tracker.History	DEBUG	Saving changed IDs to history
1387127822700	Sync.Service	DEBUG	verifyLogin failed: NS_ERROR_UNKNOWN_HOST JS Stack trace: Res_get@resource.js:413 < verifyLogin@service.js:683 < onNotify@service.js:926 < WrappedNotify@util.js:142 < WrappedLock@util.js:97 < WrappedCatch@util.js:71 < login@service.js:937 < @service.js:1174 < WrappedCatch@util.js:71 < sync@service.js:1170
1387127822700	Sync.Status	DEBUG	Status.login: success.login => error.login.reason.network
1387127822700	Sync.Status	DEBUG	Status.service: success.status_ok => error.login.failed
1387127822700	Sync.Status	DEBUG	Status.login: error.login.reason.network => error.login.reason.network
1387127822700	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1387127822700	Sync.SyncScheduler	DEBUG	Clearing sync triggers and the global score.
1387127822700	Sync.SyncScheduler	DEBUG	Next sync in 3600000 ms.
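The first field on each log line is a Unix timestamp in milliseconds, so the time between the login attempt and the failure can be read straight off the log. A minimal sketch, assuming whitespace-separated fields as shown above; `parse_line` is a hypothetical helper, not part of any Sync tooling:

```python
from datetime import datetime, timezone


def parse_line(line):
    """Split a Sync log line into (timestamp_ms, logger, level, message)."""
    # maxsplit=3 keeps the free-form message (which contains spaces) intact.
    ts, logger, level, message = line.split(None, 3)
    return int(ts), logger, level, message


login = parse_line(
    "1387127803974 Sync.Service INFO "
    "Logging in user sy62mhxnajvc4viq3dvhywgohsqft7ch"
)
failure = parse_line(
    "1387127822700 Sync.Status DEBUG "
    "Status.login: success.login => error.login.reason.network"
)

# Milliseconds between starting the login and recording the network error.
elapsed_ms = failure[0] - login[0]

# Whole-second UTC time of the login attempt.
started_utc = datetime.fromtimestamp(login[0] // 1000, tz=timezone.utc)

print(elapsed_ms)               # 18726
print(started_utc.isoformat())  # 2013-12-15T17:16:43+00:00
```

The login attempt took roughly 18.7 seconds to fail. The UTC time is eight hours ahead of the 09:16:43 in the "Starting sync" line, which is consistent with the log message using a UTC-8 local clock (an inference, not something the log states).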

Confirming via manual DNS lookup, phx-sync586.services.mozilla.com is failing to resolve.
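
The manual lookup can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the `resolves` helper and its injectable `resolver` parameter are illustrative assumptions, not part of any Sync tooling.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves to at least one address.

    `resolver` is injectable so the check can be exercised without
    network access; it defaults to the system resolver.
    """
    try:
        return len(resolver(hostname, 443)) > 0
    except socket.gaierror:
        # Name resolution failed. This is the condition the client log
        # reports as NS_ERROR_UNKNOWN_HOST.
        return False
```

Calling `resolves("phx-sync586.services.mozilla.com")` performs a real lookup against the system resolver and returns `False` when the name cannot be resolved.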

My guess is that this host was decommissioned and its users were swung over to a new host. But how long were the redirects up? Perhaps not long enough for my client, idle for 3 weeks, to pick one up? Or perhaps this is a legitimate client bug where a DNS failure doesn't cause the client to go back to the login server? I've been away from the Sync code base for so long that I can't remember.

Filing against Operations because DNS failure seems wrong to me.
Related: Bug 674280 and Bug 716816 are client work to recover from this kind of dead-end.

But operationally: we should never, ever remove a server URL from DNS unless we are 100% sure that all of the users on that server have picked up a node reassignment on all of their devices.

Once we stop advertising an address, clients will never be able to get a 401, and so they'll never be able to recover.
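
To make the dead end concrete, here is a hypothetical sketch of the client-side decision. It is not the actual Firefox Sync code, and it does not claim to describe what Bug 674280 or Bug 716816 implement; the names `Outcome`, `Action`, `next_action`, and `recover_on_dns_failure` are invented for illustration. It models a client that drops its cached node URL only on a 401, and shows how an unknown-host failure leaves it retrying the same dead URL unless DNS failure is also treated as a reason to ask for a node again.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"                      # storage node answered successfully
    UNAUTHORIZED = "401"           # node says the user no longer lives here
    UNKNOWN_HOST = "unknown_host"  # DNS lookup for the cached node failed


class Action(Enum):
    SYNC = "sync"                   # proceed with the sync
    REFETCH_NODE = "refetch_node"   # drop the cached URL, ask for a node
    RETRY_SAME_NODE = "retry_same"  # keep the cached URL, try again later


def next_action(outcome, recover_on_dns_failure=False):
    """Decide what a client does after contacting its cached storage node.

    With recover_on_dns_failure=False (the behaviour described in this
    bug), only a 401 triggers node reassignment.
    """
    if outcome is Outcome.OK:
        return Action.SYNC
    if outcome is Outcome.UNAUTHORIZED:
        return Action.REFETCH_NODE
    if outcome is Outcome.UNKNOWN_HOST and recover_on_dns_failure:
        # Assumed recovery path: treat a vanished hostname like a 401.
        return Action.REFETCH_NODE
    return Action.RETRY_SAME_NODE
```

With the default, `next_action(Outcome.UNKNOWN_HOST)` returns `Action.RETRY_SAME_NODE` on every scheduled sync, which matches the repeated "Next sync in 3600000 ms" lines in the log: the client never reaches a server that could hand it a 401.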
See Also: → 674280
Depends on: 950585
Following up on our IRC discussion: we've disabled DNSSEC for *.services.mozilla.com, which seems to have been causing many of the resolver-related issues that we've been seeing. Much of the DNS propagation has been completed by now, but let us know whether you are able to Sync successfully.
Flags: needinfo?(gps)
My desktop machine is now syncing properly.

1387245716529	Sync.Service	DEBUG	Caching URLs under storage user base: https://phx-sync586.services.mozilla.com/1.1/sy62mhxnajvc4viq3dvhywgohsqft7ch/
1387245716782	Sync.Resource	DEBUG	mesg: GET success 200 https://phx-sync586.services.mozilla.com/1.1/sy62mhxnajvc4viq3dvhywgohsqft7ch/info/collections
1387245716782	Sync.Resource	DEBUG	GET success 200 https://phx-sync586.services.mozilla.com/1.1/sy62mhxnajvc4viq3dvhywgohsqft7ch/info/collections
Flags: needinfo?(gps)
Assignee: nobody → jlaz
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED