Closed Bug 1570491 Opened 6 years ago Closed 6 years ago

Firefox Sync stopped working: GET fail 503 info/collections

Categories

(Firefox :: Firefox Accounts, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: joystus, Unassigned)

References

Details

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0

Steps to reproduce:

Try to sync account (bookmarks, addons, settings).

Actual results:

Fails, with following log (similar on multiple machines connected to the same account):

1564608078180 Sync.LogManager DEBUG Flushing file log
1564608078180 FirefoxAccounts TRACE not checking freshness of profile as it remains recent
1564608078195 Sync.LogManager DEBUG Log cleanup threshold time: 1563744078195
1564608078213 Sync.LogManager DEBUG Done deleting files.
1564608081363 Sync.RemoteTabs INFO Generating tab list with filter
1564608081363 Sync.RemoteTabs INFO Final tab list has 0 clients with 0 tabs.
1564608212826 FirefoxAccounts INFO Polling device commands.
1564608212826 Sync.Service DEBUG User-Agent: Firefox/68.0.1 (Windows NT 10.0; Win64; x64) FxSync/1.70.0.20190717172542.desktop
1564608212826 Sync.Service INFO Starting sync at 2019-07-31 23:23:32 in browser session hCmBNbumh4Be
1564608212827 Sync.Service DEBUG In sync: should login.
1564608212827 Sync.Service INFO User logged in successfully - verifying login.
1564608212829 Sync.BrowserIDManager DEBUG unlockAndVerifyAuthState already has (or can fetch) sync keys
1564608212829 Sync.Status DEBUG Status.login: error.sync.reason.serverMaintenance => success.status_ok
1564608212829 Sync.Status DEBUG Status.service: error.login.failed => error.login.failed
1564608212829 Sync.Service DEBUG Fetching unlocked auth state returned success.status_ok
1564608212830 Services.Common.RESTRequest TRACE local time, offset: 1564608212830, -1376
1564608212834 Services.Common.RESTRequest DEBUG GET request to https://api.accounts.firefox.com/v1/account/device/commands?index=0
1564608212835 Services.Common.RESTRequest TRACE HTTP Header authorization: ***** (suppressed)
1564608212835 Services.Common.RESTRequest TRACE HTTP Header content-type: text/plain
1564608212835 Services.Common.RESTRequest TRACE HTTP Header accept-language: pl,en,en-us,da
1564608213473 Sync.Resource DEBUG GET fail 503 https://sync-578-us-west-2.sync.services.mozilla.com/1.5/90689953/info/collections
1564608213473 Sync.Resource WARN GET request to https://sync-578-us-west-2.sync.services.mozilla.com/1.5/90689953/info/collections failed with status 503
1564608213474 Sync.Status DEBUG Status.login: success.status_ok => error.login.reason.server
1564608213474 Sync.Status DEBUG Status.service: error.login.failed => error.login.failed
1564608213474 Sync.ErrorHandler DEBUG Got Retry-After: 3600
1564608213474 Sync.Status DEBUG Status.login: error.login.reason.server => error.sync.reason.serverMaintenance
1564608213474 Sync.Status DEBUG Status.service: error.login.failed => error.login.failed
1564608213474 Sync.SyncScheduler DEBUG Got backoff notification: 3600000ms
1564608213475 Sync.SyncScheduler DEBUG Fuzzed minimum next sync: 1564611813474
1564608213475 Sync.ErrorHandler ERROR Sync encountered a login error
1564608213475 Sync.SyncScheduler DEBUG Clearing sync triggers and the global score.
1564608213476 Sync.SyncScheduler DEBUG Next sync in 4121483.8759988826 ms. (why=schedule)
1564608213479 Sync.Service DEBUG Exception calling WrappedLock: Error: Login failed: error.sync.reason.serverMaintenance(resource://services-sync/service.js:869:15) JS Stack trace: onNotify@service.js:869:15
1564608213481 Sync.Service DEBUG Not syncing: login returned false.

Component: Untriaged → Firefox Accounts

Hey Bob,
Is there any issue with sync-578-us-west-2?

Flags: needinfo?(bobm)

There is another report on errors on the same sync node in Bug 1570653

There was a perfect storm of issues that combined to cause this problem and for us to not catch it. This server rebooted and got into a partially operating state. There was a configuration management race condition that caused TokuDB not to load when the database was started, which caused all database operations to fail. Memcached was running so those requests were working. At the same time, the metrics collection was in a bad state and not reporting the 5xx errors which would normally signal this problem. The Sync heartbeat request does not check for database operations so we didn't catch it there either.

I'll file a bug to fix the race condition that's breaking TokuDB. Changing our metrics collection to Telegraf is already on my plate, so I'll file a bug for that. We should also consider making the heartbeat query more robust.

Flags: needinfo?(bobm)
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.