Closed
Bug 1045691
Opened 10 years ago
Closed 10 years ago
Firefox sync fails with "unknown error" graybar message and 502s
Categories
(Cloud Services :: Operations: Miscellaneous, task)
Cloud Services
Operations: Miscellaneous
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: michal, Assigned: bobm)
References
Details
(Whiteboard: [qa+])
Attachments
(6 files)
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release) Build ID: 20140716183446 Steps to reproduce: Since 28/7/2014 my two Firefox installations started to spit "unknown errors" while trying to sync data. One is FF31 on Windows 7 32bit, other one FF31 on Gentoo Linux. This issue has been confirmed with two other users, who reported simillar issue on #sync IRC channel. Actual results: 502 bad gateway error is seen in attached sync-log. The problem is persistent. Happens every time a sync attempt is initiated, on both platforms. Expected results: Successfull sync shoud occur. ;-)
Comment 1•10 years ago
|
||
There were some server-side issues on a couple of our storage nodes, which should now be resolved. Can you please confirm whether sync is working now for you?
Flags: needinfo?(michal)
Reporter | ||
Comment 2•10 years ago
|
||
Hello, I don't see any more 502 bad gateway error in sync-log, but a new one is beeing shown. Don't know, if it is relevant to this issue. Log attached.
Flags: needinfo?(michal)
Comment 3•10 years ago
|
||
I see: "history failed: Error: App. Quitting" This usually indicates that Firefox was shut down mid-sync, and things should pick up again when you restart firefox. If this happens sporadically then it shouldn't be a problem, but if you see these errors from pretty much every run of firefox, it may indicate a deeper issue. I will tentatively close this as resolved since the 502/grey-bar is gone; please don't hesitate the file a new bug if the above errors persist.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 4•10 years ago
|
||
After letting Firefox more time to sync with all accumulated changes, problem seems to be solved indeed. Thank you.
Reporter | ||
Comment 5•10 years ago
|
||
Well, looks like someone is playing with me. :-) 502 bad gateway error just got back with graybar unknown error message. :-( I'm attaching sync-log again.
Status: RESOLVED → VERIFIED
Reporter | ||
Comment 6•10 years ago
|
||
Updated•10 years ago
|
Status: VERIFIED → REOPENED
Ever confirmed: true
Resolution: FIXED → ---
Updated•10 years ago
|
Whiteboard: [qa+]
Updated•10 years ago
|
Summary: Firefox sync fails with "unknown error" graybar message → Firefox sync fails with "unknown error" graybar message and 502s
Comment 7•10 years ago
|
||
I have this exact same issue. Log attached.
Comment 8•10 years ago
|
||
I'm running Disconnet.Me, Ad Block Plus, and HTTPS everywhere if those would affect sync connectivity
Comment 9•10 years ago
|
||
me too 1406830720223 Sync.Collection DEBUG mesg: POST fail 502 https://sync-39-us-west-2.sync.services.mozilla.com/1.5/1798505/storage/history 1406830720223 Sync.Collection DEBUG POST fail 502 https://sync-39-us-west-2.sync.services.mozilla.com/1.5/1798505/storage/history 1406830720224 Sync.Engine.History DEBUG Uploading records failed: <html> <head><title>502 Bad Gateway</title></head> <body bgcolor="white"> <center><h1>502 Bad Gateway</h1></center> <hr><center>nginx/1.6.0</center> </body> </html> 1406830720244 Sync.Status DEBUG Status for engine history: error.engine.reason.record_upload_fail 1406830720244 Sync.Status DEBUG Status.service: success.status_ok => error.sync.failed_partial 1406830720244 Sync.ErrorHandler DEBUG history failed: <html> <head><title>502 Bad Gateway</title></head> <body bgcolor="white"> <center><h1>502 Bad Gateway</h1></center> <hr><center>nginx/1.6.0</center> </body> </html> I don't get this with CLI: https://github.com/edmoz/fxa-sync-client
Comment 10•10 years ago
|
||
Looks like ongoing performance issues on sync-39-us-west-2.sync.services.mozilla.com, escalating for Bob's attention. Bob, does this machine have the increased --graceful-timeout value? If not, it may be worth rolling out even without the other config changes.
Flags: needinfo?(bobm)
Comment 11•10 years ago
|
||
I have been having these issues for a couple of days now across multiple platforms. I do not get a 502, but instead a 504 from sync-40-us-west-2.sync.services.mozilla.com 1406876833967 Sync.Resource DEBUG mesg: GET fail 504 https://sync-40-us-west-2.sync.services.mozilla.com/1.5/<snip>/info/collections 1406876833967 Sync.Resource DEBUG GET fail 504 https://sync-40-us-west-2.sync.services.mozilla.com/1.5/<snip>/info/collections
Comment 12•10 years ago
|
||
I randomly get both errors 504 and 502, which lead to that graybar appearing randomly. You can see the errors here: https://gist.github.com/gonzalezsieira/c28dbd8a8c151f2f66c9
Comment 13•10 years ago
|
||
I have this problem too. I think it is worth mentioning this issue on the status page since I had to go to #sync on IRC to learn about this ticket:
>15:39 <BYK> anyone there?
>15:39 <BYK> Getting 504 for https://sync-40-us-west-2.sync.services.mozilla.com/1.5/1871300/info/collections request for the last 2 days now
>15:46 <@rnewman> BYK: known problem, bug 1045691 is probably the best place to track
>15:46 <firebot> https://bugzil.la/1045691 — REOPENED — Firefox sync fails with "unknown error" graybar message and 502s
>15:47 <BYK> rnewman: oh, thanks!
>15:47 <BYK> the status page should indicate this IMO
>15:47 <BYK> any plans for that?
>15:47 <@rnewman> I think it's an AWS issue that's only affecting one node; it's up to services ops to decide whether to post on one of the status pages
>15:47 <@rnewman> might be done on a different one to the one you're looking at
>15:49 <BYK> rnewman: alright, will mention this on the ticket
Also it's a bit weird to see that I'm trying to connect to US West Coast servers since I'm based in Turkey.
Assignee | ||
Comment 14•10 years ago
|
||
(In reply to Ryan Kelly [:rfkelly] from comment #10) > Looks like ongoing performance issues on > sync-39-us-west-2.sync.services.mozilla.com, escalating for Bob's attention. > Bob, does this machine have the increased --graceful-timeout value? If not, > it may be worth rolling out even without the other config changes. It does. All machines have a --graceful-timeout setting now. Also, ~20% of the users on that node have been migrated. Operations is discussing how to revamp the status page for Sync. I filed a bug 1047498 to track that issue.
Flags: needinfo?(bobm)
Comment 15•10 years ago
|
||
Apologies if I'm not submitting this correctly.
Comment 16•10 years ago
|
||
Sync log showing the intermittent nature of the problem - initial GET and POST both return 200, but subsequent post returns 504.
Comment 17•10 years ago
|
||
Still having problems 1407187537416 Sync.Collection DEBUG mesg: POST success 200 https://sync-40-us-west-2.sync.services.mozilla.com/1.5/4693169/storage/forms 1407187537416 Sync.Collection DEBUG POST success 200 https://sync-40-us-west-2.sync.services.mozilla.com/1.5/4693169/storage/forms 1407187537538 Sync.Engine.Forms INFO Uploading 100 - 200 out of 3596 records 1407187537538 Sync.BrowserIDManager DEBUG _ensureValidToken already has one 1407187537541 Sync.Collection DEBUG POST Length: 30093 1407187547388 Sync.Collection DEBUG mesg: POST fail 502 https://sync-40-us-west-2.sync.services.mozilla.com/1.5/4693169/storage/forms 1407187547388 Sync.Collection DEBUG POST fail 502 https://sync-40-us-west-2.sync.services.mozilla.com/1.5/4693169/storage/forms 1407187547389 Sync.Engine.Forms DEBUG Uploading records failed: <html> <head><title>502 Bad Gateway</title></head> <body bgcolor="white"> <center><h1>502 Bad Gateway</h1></center> <hr><center>nginx/1.6.0</center> </body> </html> 1407187547455 Sync.Status DEBUG Status for engine forms: error.engine.reason.record_upload_fail 1407187547455 Sync.Status DEBUG Status.service: success.status_ok => error.sync.failed_partial 1407187547455 Sync.ErrorHandler DEBUG forms failed: <html> <head><title>502 Bad Gateway</title></head> <body bgcolor="white"> <center><h1>502 Bad Gateway</h1></center> <hr><center>nginx/1.6.0</center> </body> </html> No traceback available 1407187547455 Sync.Synchronizer INFO Aborting sync for failure in forms 1407187547455 Sync.Synchronizer INFO Sync completed at 2014-08-04 23:25:47 after 178.54 secs. 1407187547456 Sync.Declined DEBUG Handling remote declined: [] 1407187547456 Sync.Declined DEBUG Handling local declined: [] 1407187547456 Sync.Declined DEBUG Declined changed? false 1407187547456 Sync.Service INFO No change to declined engines. Not reuploading meta/global. 1407187547457 Sync.SyncScheduler DEBUG Starting client-initiated backoff. Next sync in 4573852 ms. 1407187547457 Sync.SyncScheduler DEBUG Next sync in 4573852 ms. 1407187547457 Sync.ErrorHandler DEBUG Some engines did not sync correctly.
Comment 18•10 years ago
|
||
I'm seeing this problem too, on an OSX machine and and 2 Win7 machines. Node is sync-40-us-west-2 same as Michal. 1407188151148 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188151149 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188151149 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188151149 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188151149 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188151149 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188151149 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188151150 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188151150 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188151150 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188151151 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188241148 Sync.Service DEBUG User-Agent: Firefox/31.0 FxSync/1.33.0.20140716183446. 1407188241149 Sync.Service INFO Starting sync at 2014-08-05 07:37:21 1407188241160 Sync.SyncScheduler DEBUG Clearing sync triggers and the global score. 1407188241161 Sync.Status INFO Resetting Status. 1407188241161 Sync.Status DEBUG Status.service: success.status_ok => success.status_ok 1407188241161 Sync.BrowserIDManager DEBUG _ensureValidToken already has one 1407188301738 Sync.Resource DEBUG mesg: GET fail 504 https://sync-40-us-west-2.sync.services.mozilla.com/1.5/1843843/info/collections 1407188301738 Sync.Resource DEBUG GET fail 504 https://sync-40-us-west-2.sync.services.mozilla.com/1.5/1843843/info/collections 1407188301739 Sync.SyncScheduler DEBUG Starting client-initiated backoff. Next sync in 1225409 ms. 1407188301739 Sync.SyncScheduler DEBUG Next sync in 1225409 ms.
Comment 19•10 years ago
|
||
At this point I consider sync non-functional and in outage.
Comment 20•10 years ago
|
||
:michal Ops is working on it. Usually the first thing is to move sync accounts off of the problematic nodes: 39 and 40. After service is restored, we should investigate route cause. I got moved to sync-78 and I'm not longer having any issues.
Updated•10 years ago
|
Component: General → Operations
Updated•10 years ago
|
Assignee: nobody → bobm
Status: REOPENED → ASSIGNED
Comment 21•10 years ago
|
||
Related to the cause but more about the sync UI - can we do something about sync error bar popping up constantly? It's annoying and obscures webpage and having to to close it again and again...
Comment 22•10 years ago
|
||
Still having problems here. Seems like the 502 error has dissapeared from the logs, but the graybar with the error is still appearing. Error trace is the following: 1407255017642 Sync.Collection DEBUG SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data (resource://services-sync/record.js:77:4) JS Stack trace: deserialize@record.js:77:5 < Collection.prototype.recordHandler/this._onProgress@record.js:624:9 < Channel_onDataAvail@resource.js:542:7 < waitForSyncCallback@async.js:99:7 < Res__request@resource.js:389:7 < Res_get@resource.js:413:5 < SyncEngine.prototype._processIncoming@engines.js:1069:11 < SyncEngine.prototype._sync@engines.js:1481:7 < WrappedNotify@util.js:141:15 < Engine.prototype.sync@engines.js:655:5 < _syncEngine@enginesync.js:199:7 < sync@enginesync.js:149:13 < onNotify@service.js:1254:7 < WrappedNotify@util.js:141:15 < WrappedLock@util.js:96:9 < _lockedSync@service.js:1248:1 < sync/<@service.js:1239:7 < WrappedCatch@util.js:70:9 < sync@service.js:1227:5 1407255017643 Sync.Status DEBUG Status for engine tabs: error.engine.reason.unknown_fail 1407255017643 Sync.Status DEBUG Status.service: success.status_ok => error.sync.failed_partial 1407255017644 Sync.ErrorHandler DEBUG tabs failed: Error: NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS (resource://services-sync/resource.js:393:10) JS Stack trace: Res_get@resource.js:413:5 < SyncEngine.prototype._processIncoming@engines.js:1069:11 < SyncEngine.prototype._sync@engines.js:1481:7 < WrappedNotify@util.js:141:15 < Engine.prototype.sync@engines.js:655:5 < _syncEngine@enginesync.js:199:7 < sync@enginesync.js:149:13 < onNotify@service.js:1254:7 < WrappedNotify@util.js:141:15 < WrappedLock@util.js:96:9 < _lockedSync@service.js:1248:1 < sync/<@service.js:1239:7 < WrappedCatch@util.js:70:9 < sync@service.js:1227:5
Comment 23•10 years ago
|
||
I'm having this issue on multiple computers, both OS X and Windows. Been happening for about a week.
Comment 24•10 years ago
|
||
Well, I did not say anything. In the subsequent logs I received both 504 and 502 errors, too.
Comment 25•10 years ago
|
||
It would appear from the continuing reports, that the problem has been resolved for sync-39-us-west-2 but persists (or has appeared) on sync-40-us-west-2. Can we get the same love they got? :)
Comment 26•10 years ago
|
||
1407256751578 Sync.Engine.History INFO Uploading 100 - 200 out of 200 records 1407256751578 Sync.BrowserIDManager DEBUG _ensureValidToken already has one 1407256751583 Sync.Collection DEBUG POST Length: 55730 1407256812817 Sync.Collection DEBUG mesg: POST fail 504 https://sync-40-us-west-2.sync.services.mozilla.com/1.5/1849639/storage/history 1407256812818 Sync.Collection DEBUG POST fail 504 https://sync-40-us-west-2.sync.services.mozilla.com/1.5/1849639/storage/history 1407256812818 Sync.Engine.History DEBUG Uploading records failed: <html> <head><title>504 Gateway Time-out</title></head> <body bgcolor="white"> <center><h1>504 Gateway Time-out</h1></center> <hr><center>nginx/1.6.0</center> </body> </html> 1407256812824 Sync.Status DEBUG Status for engine history: error.engine.reason.record_upload_fail 1407256812824 Sync.Status DEBUG Status.service: success.status_ok => error.sync.failed_partial 1407256812824 Sync.ErrorHandler DEBUG history failed: <html> Yup, same here, 504 https://sync-40-us-west-2.sync.services.mozilla.com.
Assignee | ||
Comment 27•10 years ago
|
||
We'll be rolling out some node maintenance today (I'm looking at you sync-40). As well as making some operational improvements, that should alleviate the 502 and 504 errors in general, later this week.
Updated•10 years ago
|
Mentor: rnewman
Updated•10 years ago
|
Mentor: rnewman
Comment 31•10 years ago
|
||
As of this morning, the sync errors seem to have been eliminated. Am I being overly optimistic?
Comment 32•10 years ago
|
||
In the morning I got a complete sync (the first in the week), but an hour ago I received both 502 and 504 errors again.
Comment 33•10 years ago
|
||
I don't see warning either, but sync fails every now and then. Yesterday it failed every time, today only five times, so there's some progress. I can also see history and settings synchronized. Time whehn synchronization failed in CEST: 12:08:28 AM 12:27:42 AM 1:04:29 AM 4:57:15 PM 5:35:14 PM
Comment 34•10 years ago
|
||
I'm cautiously optimistic - last error logged was at 1:55pm CDT (1855Z) yesterday, and curiously that was a 401, not a 502 or 504. No sign of any popups or logged errors since then. Thanks!
Assignee | ||
Comment 35•10 years ago
|
||
The server side issues have been resolved. There may be occaisional load related errors, however the servers have been configured to convert these errors into a response that the Sync client should deal with gracefully (not cause a popup or error bar), and scale back traffic if necessary. Please note, a 401 isn't the sign of a failed Sync, it just indicates that the client should fetch a new authentication token, and perhaps new Sync node. For more information see: https://docs.services.mozilla.com/storage/apis-1.5.html Operations would like to close this bug. Anyone still seeing an error bar?
Reporter | ||
Comment 36•10 years ago
|
||
Looks good to me. Thanks.
Comment 37•10 years ago
|
||
Looking good from here. Thanks for your efforts.
Comment 38•10 years ago
|
||
:bobm thumbsup mate, resolution looks good to me.
Updated•10 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Comment 40•10 years ago
|
||
Fixed, thank you.
You need to log in
before you can comment on or make changes to this bug.
Description
•