Sync not working on Firefox 29 (it's blue and continuously rolling) after update to the new Firefox Sync

RESOLVED WORKSFORME

Status

()

defect
P2
critical
RESOLVED WORKSFORME
5 years ago
a year ago

People

(Reporter: b5b5b5b5, Assigned: markh)

Tracking

29 Branch
x86_64
All
Points:
---

Firefox Tracking Flags

(firefox29 wontfix, firefox30+ fixed, firefox31+ fixed, firefox32+ fixed)

Details

(Whiteboard: [qa+] )

Attachments

(10 attachments, 1 obsolete attachment)

Reporter

Description

5 years ago
After update to the new ver. of Firefox's Sync in FF29 (and account created and verified!!!), the Sync is continuously rolling, doesn't stop even after a many hours of running.

First report describing this problem I've found: https://support.mozilla.org/en-US/questions/997761
Seems to be platform independent (Windows, Linux).
Can you upload your sync log.  You can find here:
1. in Firefox go to
about:sync-log
2. copy the text file on the day that you start Firefox Account Sync and add it as attachment to this bug.  

For other debugging tips look here:
https://wiki.mozilla.org/User_Services/Sync/Debug_Sync
Reporter

Comment 2

5 years ago
Attaching the logs (in zip) from two stations of mine:
1. 01-main_station-sync-log directory has logs from the station, where registration with two sync accounts were made.
2. 02-client_joining-sync-log directory has logs from another client station joining the sync account.
here are some of the errors:
client A:
>> some sync 1.1 reminants:
1399365405549	Sync.ErrorHandler	DEBUG	Flushing file log.
1399365405550	Sync.BrowserIDManager	ERROR	Failed to fetch a token for authentication: TokenServerClientError({"message":"assertion argument is not valid."})
1399365405554	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.network due to previous failure
1399365405554	Sync.Status	DEBUG	Status.login: error.login.reason.network => error.login.reason.network
1399365405554	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1399365405556	Sync.ErrorHandler	DEBUG	Log cleanup threshold time: 1398501405556
1399365405556	Sync.ErrorHandler	DEBUG	No logs to clean up.
1399365405909	Sync.Resource	DEBUG	mesg: DELETE fail 401 https://sync-7-us-west-2.sync.services.mozilla.com/1.5/512680/storage/tabs/Wu9ejxrR4yv0
1399365405909	Sync.Resource	DEBUG	DELETE fail 401 https://sync-7-us-west-2.sync.services.mozilla.com/1.5/512680/storage/tabs/Wu9ejxrR4yv0
1399365405909	Sync.Service	DEBUG	Finished deleting client data.
1399365405909	Sync.Service	INFO	Service.startOver dropping sync key and logging out.
1399365405909	Sync.Status	DEBUG	Status.login: error.login.reason.network => error.login.reason.no_recoverykey
1399365405909	Sync.Status	DEBUG	Status.service: error.login.failed => service.client_not_configured
1399365405909	Sync.Service	INFO	Logging out
1399365405911	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.network due to previous failure
1399365405911	Sync.Status	DEBUG	Status.login: error.login.reason.no_recoverykey => error.login.reason.network
1399365405911	Sync.Status	DEBUG	Status.service: service.client_not_configured => error.login.failed

I also see a lot of this:
1399364830730	Sync.Service	DEBUG	User-Agent: Firefox/29.0 FxSync/1.31.0.20140430131441.
1399364830730	Sync.Service	INFO	Starting sync at 2014-05-06 10:27:10
1399364830730	Sync.Service	DEBUG	Exception: Could not acquire lock. Label: "service.js: sync". No traceback available
1399364830730	Sync.Service	INFO	Cannot start sync: already syncing?
1399364831674	Sync.Tracker.Bookmarks	DEBUG	Saving changed IDs to bookmarks
1399364832357	Sync.SyncScheduler	DEBUG	Global Score threshold hit, triggering sync.
1399364832398	Sync.Service	DEBUG	User-Agent: Firefox/29.0 FxSync/1.31.0.20140430131441.

client B
1399384520282	Sync.Service	DEBUG	Exception: Could not acquire lock. Label: "service.js: sync". No traceback available
1399384520282	Sync.Service	INFO	Cannot start sync: already syncing?
1399384521181	Sync.Tracker.History	DEBUG	Saving changed IDs to history
1399385174319	Sync.Status	DEBUG	Status for engine bookmarks: error.engine.reason.unknown_fail
1399385174319	Sync.Status	DEBUG	Status.service: success.status_ok => error.sync.failed_partial
1399385174319	Sync.ErrorHandler	DEBUG	bookmarks failed: App. Quitting Stack trace: checkAppReady/onQuitApplication/Async.checkAppReady()@resource://services-common/async.js:123 < waitForSyncCallback()@resource://services-common/async.js:98 < makeSpinningCallback/callback.wait()@resource://services-common/async.js:141 < _syncStart()@resource://gre/modules/services-sync/engines/bookmarks.js:376
Status: UNCONFIRMED → NEW
Ever confirmed: true
/cc :rnewman for sync log expertise
This seems like a strange sequence of actions:

"""
1399365405550	Sync.BrowserIDManager	ERROR	Failed to fetch a token for authentication: TokenServerClientError({"message":"assertion argument is not valid."})
1399365405554	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.network due to previous failure
1399365405554	Sync.Status	DEBUG	Status.login: error.login.reason.network => error.login.reason.network
1399365405554	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1399365405556	Sync.ErrorHandler	DEBUG	Log cleanup threshold time: 1398501405556
1399365405556	Sync.ErrorHandler	DEBUG	No logs to clean up.
1399365405909	Sync.Resource	DEBUG	mesg: DELETE fail 401 https://sync-7-us-west-2.sync.services.mozilla.com/1.5/512680/storage/tabs/Wu9ejxrR4yv0
1399365405909	Sync.Resource	DEBUG	DELETE fail 401 https://sync-7-us-west-2.sync.services.mozilla.com/1.5/512680/storage/tabs/Wu9ejxrR4yv0
1399365405909	Sync.Service	DEBUG	Finished deleting client data.
"""

A failure to fetch a token, followed by a delete request?  (that fails with an auth error, perhaps because of the token that it failed to fetch?)
This unexpected, from the 02-client_joining-sync-log, first file:

"""
1399381642598   Sync.Status     DEBUG   Status.service: success.status_ok => success.status_ok
1399381642599   Sync.BrowserIDManager   INFO    Fetching assertion and token from: https://token.services.mozilla.com/1.0/sync/1.5
1399381642602   Sync.BrowserIDManager   DEBUG   Getting an assertion
1399381642635   Sync.BrowserIDManager   DEBUG   Getting a token
1399381642635   Sync.BrowserIDManager   ERROR   Non-authentication error in _fetchTokenForUser: assertion argument is not valid.
1399381642635   Sync.Status     DEBUG   Status.login: success.login => error.login.reason.network
1399381642635   Sync.Status     DEBUG   Status.service: success.status_ok => error.login.failed
"""

"assertion argument is not valid" is a new one on me.
/cc mark for perspective on this error mesage, the client seems to stay stuck in this "assertion argument is not valid" state for several attempts to sync.
Assignee

Comment 10

5 years ago
I believe "assertion argument is not valid" is coming from the token server (which obviously doesn't mean the client isn't doing something wrong.)  Chris, any ideas (eg, are there likely to be useful logs on the server?)
Also of note, the first device seems to have been previously attached to an old-sync account, the first error log shows (failing) attempts to sync to https://phx-sync570.services.mozilla.com.

OP, can you please tell us more about the profile on this first machine?  Was it previously connected to sync?  Did you have to do anything in particular to get it talking to the new sync system, or just follow the "setup sync" menu and prompts?
AFAICT it's from the getTokenFromBrowserIDAssertion function, before it ever hits the wire to the tokenserver:

    """
    getTokenFromBrowserIDAssertion:
      function getTokenFromBrowserIDAssertion(url, assertion, cb, addHeaders={}) {
      if (!url) {
        throw new TokenServerClientError("url argument is not valid.");
      }
      if (!assertion) {
        throw new TokenServerClientError("assertion argument is not valid.");
      }
    """

Both clients seem to be hitting this error state.
Also, I don't see logging information from fxa in the traces here, it goes from "Getting an assertion" to "Getting a token" with no log lines in between.  Do we have to tweak some additional settings to enable fxa-related debug output?  If so then we should do an updated "how to file a good sync bug" guide with this info.
Flags: needinfo?(mhammond)
Assignee

Comment 14

5 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #13)
> Also, I don't see logging information from fxa in the traces here, it goes
> from "Getting an assertion" to "Getting a token" with no log lines in
> between.  Do we have to tweak some additional settings to enable fxa-related
> debug output?  If so then we should do an updated "how to file a good sync
> bug" guide with this info.

I think we need to tweak the logging code itself to ensure the FxA stuff does go to the sync logs by default - IIUC, there is nothing the user can do to arrange that at the moment.  I opened bug 1006943.

(In reply to Ryan Kelly [:rfkelly] from comment #12)
> AFAICT it's from the getTokenFromBrowserIDAssertion function, before it ever
> hits the wire to the tokenserver:

Doh, yes.  The 2 obvious cases that return null there are no user being signed in, or the user not being verified - but we shouldn't be able to get to that code in both those cases (unless we logout during sync - see below!).  A request to /certificate/sign might also return a null certificate, but I'm not sure if that's possible.

So - looking at the logs in detail:  error-1399365404947.txt:
...
1399361287295	Sync.BrowserIDManager	DEBUG	Getting a token
1399361289507	Sync.BrowserIDManager	DEBUG	Successfully got a sync token
1399361289566	Sync.BulkKeyBundle	INFO	BulkKeyBundle being created for undefined
... so we seem to have fetched the token again (the previous log also showed success doing this) and requests are working with it:
1399361293278	Sync.BrowserIDManager	DEBUG	_ensureValidToken already has one
1399361293482	Sync.Resource	DEBUG	mesg: GET success 200 https://sync-7-us-west-2.sync.services.mozilla.com/1.5/512680/info/collections

... and after that, it looks as though sync itself might be seeing incoming changes as tripping the Global Score threshold and causing sync to try again - that seems sub-optimal, but I'm not sure it is a real problem.

1399364809310	Sync.Tracker.History	DEBUG	Saving changed IDs to history
1399364830730	Sync.SyncScheduler	DEBUG	Global Score threshold hit, triggering sync.
1399364830730	Sync.Service	DEBUG	User-Agent: Firefox/29.0 FxSync/1.31.0.20140430131441.
1399364830730	Sync.Service	INFO	Starting sync at 2014-05-06 10:27:10
1399364830730	Sync.Service	DEBUG	Exception: Could not acquire lock. Label: "service.js: sync". No traceback available

... this repeats lots of times, until:

1399365404937	Sync.BrowserIDManager	DEBUG	observed fxaccounts:onlogout
1399365404937	Sync.AddonsReconciler	DEBUG	Removing change listener.
1399365404937	Sync.AddonsReconciler	DEBUG	Stopping listening and removing AddonManager listeners.
1399365404937	Sync.Tracker.History	INFO	Removing Places observer.
1399365404938	Sync.Status	INFO	Resetting Status.
1399365404938	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1399365404939	Sync.BrowserIDManager	INFO	Fetching assertion and token from: https://token.services.mozilla.com/1.0/sync/1.5
1399365404944	Sync.BrowserIDManager	DEBUG	Getting an assertion
1399365404945	Sync.BrowserIDManager	DEBUG	Getting a token
1399365404946	Sync.BrowserIDManager	ERROR	Non-authentication error in _fetchTokenForUser: assertion argument is not valid.

note the "onlogout" - so it looks as though the user is logging our *during* a sync, which we do not guard against and would explain this.

OP: do you remember explicitly logging out?
Flags: needinfo?(mhammond) → needinfo?(b5b5b5b5)
Reporter

Comment 15

5 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #11)
> Also of note, the first device seems to have been previously attached to an
> old-sync account, the first error log shows (failing) attempts to sync to
> https://phx-sync570.services.mozilla.com.
> 
> OP, can you please tell us more about the profile on this first machine? 
> Was it previously connected to sync?  Did you have to do anything in
> particular to get it talking to the new sync system, or just follow the
> "setup sync" menu and prompts?

Yes, it was previously connected to the previous sync version.
I mentioned this in the very first post "After update to the new ver. of Firefox's Sync in FF29 (and account created and verified!!!)" when I opened this bug.
What have I done was (as I did found in Mozilla's HOWTO on the web, about how to update to the new sync):
- unlink (disconnect) the previous (old) sync version.
- afterwards, simply opened the new "setup sync" menu and prompts, have registered my email, confirmed it, and then it started spinning continuously.
- nothing special has been made, simply followed the menu and prompts of the new sync registration.




(In reply to Mark Hammond [:markh] from comment #14)
> 1399365404937	Sync.BrowserIDManager	DEBUG	observed fxaccounts:onlogout
> 1399365404937	Sync.AddonsReconciler	DEBUG	Removing change listener.
> 1399365404937	Sync.AddonsReconciler	DEBUG	Stopping listening and removing
> AddonManager listeners.
> 1399365404937	Sync.Tracker.History	INFO	Removing Places observer.
> 1399365404938	Sync.Status	INFO	Resetting Status.
> 1399365404938	Sync.Status	DEBUG	Status.service: success.status_ok =>
> success.status_ok
> 1399365404939	Sync.BrowserIDManager	INFO	Fetching assertion and token from:
> https://token.services.mozilla.com/1.0/sync/1.5
> 1399365404944	Sync.BrowserIDManager	DEBUG	Getting an assertion
> 1399365404945	Sync.BrowserIDManager	DEBUG	Getting a token
> 1399365404946	Sync.BrowserIDManager	ERROR	Non-authentication error in
> _fetchTokenForUser: assertion argument is not valid.
> 
> note the "onlogout" - so it looks as though the user is logging our *during*
> a sync, which we do not guard against and would explain this.
> 
> OP: do you remember explicitly logging out?

Well, I've been doing some "testing".
Have tried two different email accounts for registration with the new sync, to see whether anything unusual will happen.
Have also made few DISCONNECT and JOIN (via "Already have an account? Sign in") it again to existing accounts of mine for the same testing purposes.
So, this is the only way it could have been logged out, via the disconnect button.
Flags: needinfo?(b5b5b5b5)
Reporter

Comment 16

5 years ago
(In reply to 8472 from comment #15)
> (In reply to Mark Hammond [:markh] from comment #14)
> > 1399365404937	Sync.BrowserIDManager	DEBUG	observed fxaccounts:onlogout
> > 1399365404937	Sync.AddonsReconciler	DEBUG	Removing change listener.
> > 1399365404937	Sync.AddonsReconciler	DEBUG	Stopping listening and removing
> > AddonManager listeners.
> > 1399365404937	Sync.Tracker.History	INFO	Removing Places observer.
> > 1399365404938	Sync.Status	INFO	Resetting Status.
> > 1399365404938	Sync.Status	DEBUG	Status.service: success.status_ok =>
> > success.status_ok
> > 1399365404939	Sync.BrowserIDManager	INFO	Fetching assertion and token from:
> > https://token.services.mozilla.com/1.0/sync/1.5
> > 1399365404944	Sync.BrowserIDManager	DEBUG	Getting an assertion
> > 1399365404945	Sync.BrowserIDManager	DEBUG	Getting a token
> > 1399365404946	Sync.BrowserIDManager	ERROR	Non-authentication error in
> > _fetchTokenForUser: assertion argument is not valid.
> > 
> > note the "onlogout" - so it looks as though the user is logging our *during*
> > a sync, which we do not guard against and would explain this.
> > 
> > OP: do you remember explicitly logging out?
> 
> Well, I've been doing some "testing".
> Have tried two different email accounts for registration with the new sync,
> to see whether anything unusual will happen.
> Have also made few DISCONNECT and JOIN (via "Already have an account? Sign
> in") it again to existing accounts of mine for the same testing purposes.
> So, this is the only way it could have been logged out, via the disconnect
> button.


Here you have attached one more recent sync-log.
What I've made in order to get it:
- disconnected previously joined session & continuously spinning sync wheel (note, that no sync-log entry has been made at all).
- restarted browser (note, that no sync-log entry has been made at all).
- joined existing sync account again, the sync wheel started continuously spinning again (note, that no sync-log entry has been made at all).
- only after I restarted the browser again, the sync-log finally wrote down something, created three new txt files I'm providing here now.
Flags: firefox-backlog+
:8472 by default sync logs are not created unless you have errors.  You must set this to True in about:config
 services.sync.log.appender.file.logOnSuccess

FYI, I launched Firefox today and my sync spinner was blue for about 60 sec.  But no errors were reported.
Reporter

Comment 18

5 years ago
(In reply to Edwin Wong [:edwong] from comment #17)
> :8472 by default sync logs are not created unless you have errors.  You must
> set this to True in about:config
>  services.sync.log.appender.file.logOnSuccess
> 
> FYI, I launched Firefox today and my sync spinner was blue for about 60 sec.
> But no errors were reported.

Ok, I was merely curious, that nothing else has been logged.
Your explanation is enough.


Well, I don't know how else to help you to emulate this problem.
But if you check the two support.mozzila.org URLs mentioned previously, at least 35 people confirmed this problem too (those which voted via the button "I have this problem, too").
Meaning, there could be much, much more of affected users, than just those which added their "vote" to having problem.

If you say, that you got no error recorded, perhaps try what I've noticed and mentioned before.
Restart your browser.
It happened to me today, when I checked it again, that no errors were written in the logs, though the sync-wheel has been spinning forever, not until I restarted the browser. Right after it went down, new log files appeared in the directory instantly.

What else can I say about my sync env. for the emulating purposes:
- I have a quite lot of bookmarks (don't have exact numbers, but it might be 100-200, perhaps more, really don't know).
- I also have lot of passwords stored in there.
- and of course, history and add-ons to be synchronized as well.
Don't know if this will help you anyhow.
> If you say, that you got no error recorded, perhaps try what I've noticed and mentioned before.
> Restart your browser.
> It happened to me today, when I checked it again, that no errors were written in the logs, though the
> sync-wheel has been spinning forever, not until I restarted the browser. Right after it went down,
> new log files appeared in the directory instantly.

Yep, I believe the log file is only flushed at the end of a sync.  So if it's hanging forever, it won't write until you force it to finish by e.g. restarting the browser.  This also explains the "app is quitting" error messages in some of the logs, if they're being flushed by a restart.
Assignee

Comment 20

5 years ago
(In reply to 8472 from comment #16)
> Here you have attached one more recent sync-log.

Thanks!  However, these logs don't show anything problematic - they do have some noise due to shutting down during a sync, but it looks like there is alot of stuff to sync, and I see no evidence that these syncs wouldn't have completed.
Reporter

Comment 21

5 years ago
(In reply to Mark Hammond [:markh] from comment #20)
> (In reply to 8472 from comment #16)
> > Here you have attached one more recent sync-log.
> 
> Thanks!  However, these logs don't show anything problematic - they do have
> some noise due to shutting down during a sync, but it looks like there is
> alot of stuff to sync, and I see no evidence that these syncs wouldn't have
> completed.

Well, these were a fresh logs from a client I disconnected, and joined again.
I kept it running for about hour (the sync-wheel has been spinning all the time after i rejoined it), and only then restarted it.
So, if it does not show anything worth in my logs, I don't know, it's your product guys, you should know better why it ain't logging any errors while sync-wheel is spinning forewer.
Perhaps the date & time I mentioned of logs creation (20140507 11:37 CET) might help you too, to check on the server side, whether anything has happened in there.
Assignee

Comment 22

5 years ago
Closer examination of the logs shows that the "assertion argument is not valid." message doesn't seem a real problem - when sync does a startOver after a "disconnect", we attempt to fetch a token so we can delete the client record, and this is what fails - we expect this to fail occasionally and handle it fine (although the log messages are unfortunate).  I'm guessing the other users who see this error are also doing a "disconnect" in an attempt to stop it synching.

The real problem seems to be that sync is hanging - error-1399361271582.txt shows a complete sync which demonstrates an outrageous amount of time between sync starting and the user shutting down before it completed.  I'll try and analyze the logs a little more tomorrow to see if I can determine what engine is causing this and why.

(As a side-note, I wonder if users would have noticed the same issue in the old sync, or if it the new UI in 29 making it more obvious - which might imply this issue exists before 29.)
Mark, what do you make of this snippet from the most recent log?

"""
1399451898470   Sync.Engine.Bookmarks   DEBUG   Engine syncIDs: gUu6k4Rqowzl,W8uK6EYQE8DY
1399451898470   Sync.Engine.Bookmarks   DEBUG   Resetting bookmarks last sync time
1399451898471   Sync.Engine.Bookmarks   DEBUG   First sync, uploading all items
1399451898856   Sync.Tracker.Clients    DEBUG   Saving changed IDs to clients
1399451900879   Sync.Engine.Bookmarks   INFO    6508 outgoing items pre-reconciliation
1399451901880   Sync.Tracker.Bookmarks  DEBUG   Saving changed IDs to bookmarks
1399452307093   Sync.SyncScheduler      DEBUG   Next sync in 90000 ms.
1399452397093   Sync.Service    DEBUG   User-Agent: Firefox/29.0 FxSync/1.31.0.20140430131441.
1399452397093   Sync.Service    INFO    Starting sync at 2014-05-07 10:46:37
1399452397093   Sync.Service    DEBUG   Exception: Could not acquire lock. Label: "service.js: sync". No traceback available
1399452397094   Sync.Service    INFO    Cannot start sync: already syncing?
"""

It says it has 6508 outgoing items for bookmarks, but doesn't appear to actually send them, it just schedules a new sync for 90s.  Which does indeed start 90s later, and promptly fails because there's already a sync in progress.

Bob, can you please check for nginx logs (approximate unixtimestamp of 1399451898) with the following URL prefix? 

  https://sync-7-us-west-2.sync.services.mozilla.com/1.5/531316/storage/bookmarks

There are no client-side logs of such a request, but maybe we have a record of a failed request on the server.

:8472 , as a workaround you could try disabling bookmarks syncing (in the sync menu dialog, there are checkboxes for the different datatypes) and see if the other types will then sync successfully.  This might at least get you some sort of syncing happening, as well as narrow down whether the issue is indeed in the bookmarks sync engine.
Flags: needinfo?(bobm)
Reporter

Comment 24

5 years ago
(In reply to Mark Hammond [:markh] from comment #22)
> (As a side-note, I wonder if users would have noticed the same issue in the
> old sync, or if it the new UI in 29 making it more obvious - which might
> imply this issue exists before 29.)

Before updating to new version of sync once it has been released in 29, I was running the new FF29 for about a week with the old version of sync, and there were no problems with it, worked fine.
Only after I updated to new sync in the 29, then these problems started.
Reporter

Comment 25

5 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #23)
> It says it has 6508 outgoing items for bookmarks, but doesn't appear to
> actually send them, it just schedules a new sync for 90s.  Which does indeed
> start 90s later, and promptly fails because there's already a sync in
> progress.
> 

Oh my, 6508 items in my bookmarks?
Never realized I could have that much of them. Does it count the History items as a bookmarks too? Because there I do have quite a lot of them, which could make this nr. so high.


(In reply to Ryan Kelly [:rfkelly] from comment #23)
> :8472 , as a workaround you could try disabling bookmarks syncing (in the
> sync menu dialog, there are checkboxes for the different datatypes) and see
> if the other types will then sync successfully.  This might at least get you
> some sort of syncing happening, as well as narrow down whether the issue is
> indeed in the bookmarks sync engine.


Have tried your suggestion (no need to explain where do I find that checkboxes, I'm no newbie).
Have tested it with two different clients.
1st client - a laptop, had profile full of data from previous old sync. Most of the log data I provided you before were also from this one.
	- After I disabled sync of the bookmarks, I have initiated a sync, and after a minute or two of spinning, it ended.
	- Because this one client had already lot of data in the profile from previous old sync, I can't tell, whether sync is working at all.
	- Have also restarted the browser, while bookmarks sync were disabled, and nothing has been logged.
	- Afterwards, I've tried to reenable bookmarks sync again, and immeadiately after I checked that option on, it started spinning again, and is still spinning (would be now more than 1hour since it started).
	- No bookmarks have been synchronized so far.


2nd client - a virtual machine (VM), for testing purposes like this one.
	- Had absolutely no FF profile before, clean start.
	a)
		- After I joined it to my sync account, I've disabled sync of Tabs and Bookmarks immediately.
		- Sync wheel was spinning for a minute or two, but has nothing synchronized.
		- I have restarted the browser, and only then it started sync of add-ons, history, passwords, and so on.
		- It took a while, since all non-bookmarks data were loaded, but at least they were.
		- So, there is at least this problem, that synchronization of non-bookmarks data needed a restart of the browser, to realize, that it has to start downloading the sync data for the local profile.

	b)
		- Once non-bookmarks data were synchronized, I enabled the sync of bookmarks on the VM.
		- Started synchronizing immediately after I checked the option on, and the wheel is still spinning forewer.
		- No bookmarks have been synchronized so far - since it was a empty FF profile at the beginning, the bookmarks here are completely empty (I also removed the default bookmars of your company at the beginning).
(In reply to Ryan Kelly [:rfkelly] from comment #23)

> 1399451898856   Sync.Tracker.Clients    DEBUG   Saving changed IDs to clients
> 1399451900879   Sync.Engine.Bookmarks   INFO    6508 outgoing items
> pre-reconciliation
> 1399451901880   Sync.Tracker.Bookmarks  DEBUG   Saving changed IDs to
> bookmarks
> 1399452307093   Sync.SyncScheduler      DEBUG   Next sync in 90000 ms.

My guess is something vaguely like Bug 932759 -- we're starting the incoming part of the sync and stalling hard.

"Saving changed IDs" occurs on a timer, so it doesn't count.

The next sync is also fired on a timer, so that doesn't count either.

The "6508 outgoing items" is the last log item in syncStartup:

      this._syncStartup();
      Observers.notify("weave:engine:sync:status", "process-incoming");
      this._processIncoming();

Nothing non-trace will be printed before a bunch of other work happens, including network fetching, and -- for bookmarks -- computing the GUID map, which involves a lot of Places stuff that could throw or misbehave.

A bookmarks Trace log would help.
(In reply to Mark Hammond [:markh] from comment #22)

> (As a side-note, I wonder if users would have noticed the same issue in the
> old sync, or if it the new UI in 29 making it more obvious - which might
> imply this issue exists before 29.)

We've seen this occasionally. E.g., Bug 665872.
Reporter

Comment 28

5 years ago
(In reply to Richard Newman [:rnewman] from comment #26)
> A bookmarks Trace log would help.

I presume, that this was meant for me?
So, I've set the "services.sync.log.logger.engine.bookmarks" for Trace.
Reenabled bookmarks sync, waited a while, and restarted the browser, in order to get the logs.
Here attached you have those bookmarks trace logs.

Comment 29

5 years ago
Hi,

Sorry to interrupt. A user on the SUMO forums is having the same issue. https://support.mozilla.org/en-US/questions/999480

Should that user create a new bug?
(In reply to Ryan Kelly [:rfkelly] from comment #23) 
> Bob, can you please check for nginx logs (approximate unixtimestamp of
> 1399451898) with the following URL prefix? 
> 
>  
> https://sync-7-us-west-2.sync.services.mozilla.com/1.5/531316/storage/
> bookmarks
This is the closest entry to that timestamp.  
A.B.C.D [2014-05-07T08:55:35+00:00] "POST /1.5/531316/storage/bookmarks HTTP/1.1" 200 78 2727 "Firefox/29.0 FxSync/1.31.0.20140430131441.desktop" 0.588 0.588 "200" 

There are many more successful requests by this user, but I believe those aren't as interesting.  Here are the most recent errors:
A.B.C.D [2014-05-06T09:23:57+00:00] "GET /1.5/531316/storage/meta/global HTTP/1.1" 404 1 710 "Firefox/29.0 FxSync/1.31.0.20140430131441.desktop" 0.008 0.008 "404" 
A.B.C.D [2014-05-06T13:07:23+00:00] "DELETE /1.5/531316/storage/clients/XXXXXXXXX HTTP/1.1" 401 1 404 "Firefox/29.0 FxSync/1.31.0.20140430131441.desktop" 0.003 0.003 "401" 
A.B.C.D [2014-05-06T13:07:23+00:00] "DELETE /1.5/531316/storage/tabs/XXXXXXXXX HTTP/1.1" 401 1 401 "Firefox/29.0 FxSync/1.31.0.20140430131441.desktop" 0.003 0.003 "401" 
A.B.C.D [2014-05-06T13:11:09+00:00] "DELETE /1.5/531316/storage/tabs/XXXXXXXXX HTTP/1.1" 404 1 719 "Firefox/29.0 FxSync/1.31.0.20140430131441.desktop" 0.006 0.006 "404" 
A.B.C.D [2014-05-06T13:30:44+00:00] "DELETE /1.5/531316/storage/tabs/XXXXXXXXX HTTP/1.1" 404 1 719 "Firefox/29.0 FxSync/1.31.0.20140430131441.desktop" 0.006 0.006 "404"

A.B.C.D [2014-05-07T08:37:13+00:00] "DELETE /1.5/531316/storage/tabs/XXXXXXXXX HTTP/1.1" 404 1 719 "Firefox/29.0 FxSync/1.31.0.20140430131441.desktop" 0.005 0.005 "404" 
A.B.C.D [2014-05-07T15:27:30+00:00] "POST /1.5/531316/storage/history HTTP/1.1" 503 62 7209 "Firefox/29.0 FxSync/1.31.0.20140430131441.desktop" 0.007 0.007 "503" 
A.B.C.D [2014-05-07T16:42:49+00:00] "POST /1.5/531316/storage/history HTTP/1.1" 503 62 4515 "Firefox/29.0 FxSync/1.31.0.20140430131441.desktop" 0.007 0.007 "503" 
A.B.C.D [2014-05-07T17:19:44+00:00] "POST /1.5/531316/storage/history HTTP/1.1" 503 62 74708 "Firefox/29.0 FxSync/1.31.0.20140430131441.desktop" 0.396 0.012 "503"

The 503s are from MySQL pool exhaustion.  The client handles those gracefully, correct?
1399570150022   Sync.Tracker.Bookmarks  TRACE   Adding changed ID: b0PvfNs9P2lf, 0
1399570150022   Sync.Tracker.Bookmarks  TRACE   Adding changed ID: DEGJb5ogpVCF, 0
1399570150022   Sync.Tracker.Bookmarks  TRACE   Adding changed ID: LpFGRZBW7kyn, 0
1399570150023   Sync.Engine.Bookmarks   TRACE   Event: weave:engine:sync:error
1399570150023   Sync.Status     DEBUG   Status for engine bookmarks: error.engine.reason.unknown_fail
1399570150023   Sync.Status     DEBUG   Status.service: success.status_ok => error.sync.failed_partial
1399570150024   Sync.ErrorHandler       DEBUG   bookmarks failed: App. Quitting Stack trace: checkAppReady/onQuitApplication/Async.checkAppRead

... on that log, you quit Firefox before it was finished syncing -- half a second after it started. Give it a bit more time!


1399570253882   Sync.Resource   DEBUG   mesg: GET fail 503 https://sync-7-us-west-2.sync.services.mozilla.com/1.5/531316/storage/meta/global
1399570253882   Sync.Resource   DEBUG   GET fail 503 https://sync-7-us-west-2.sync.services.mozilla.com/1.5/531316/storage/meta/global
1399570253882   Sync.Service    DEBUG   Weave Version: 1.31.0 Local Storage: 5 Remote Storage:
1399570253882   Sync.Service    INFO    One of: no meta, no meta storageVersion, or no meta syncID. Fresh start needed.
1399570253882   Sync.Status     DEBUG   Status.sync: success.sync => error.sync.reason.metarecord_download_fail
1399570253882   Sync.Status     DEBUG   Status.service: success.status_ok => error.sync.failed
1399570253882   Sync.ErrorHandler       DEBUG   Got Retry-After: 1851
1399570253882   Sync.Status     DEBUG   Status.login: success.login => error.sync.reason.serverMaintenance

... this was server maintenance.
Reporter

Comment 32

5 years ago
(In reply to Richard Newman [:rnewman] from comment #31)
> 1399570150022   Sync.Tracker.Bookmarks  TRACE   Adding changed ID:
> b0PvfNs9P2lf, 0
> 1399570150022   Sync.Tracker.Bookmarks  TRACE   Adding changed ID:
> DEGJb5ogpVCF, 0
> 1399570150022   Sync.Tracker.Bookmarks  TRACE   Adding changed ID:
> LpFGRZBW7kyn, 0
> 1399570150023   Sync.Engine.Bookmarks   TRACE   Event:
> weave:engine:sync:error
> 1399570150023   Sync.Status     DEBUG   Status for engine bookmarks:
> error.engine.reason.unknown_fail
> 1399570150023   Sync.Status     DEBUG   Status.service: success.status_ok =>
> error.sync.failed_partial
> 1399570150024   Sync.ErrorHandler       DEBUG   bookmarks failed: App.
> Quitting Stack trace: checkAppReady/onQuitApplication/Async.checkAppRead
> 
> ... on that log, you quit Firefox before it was finished syncing -- half a
> second after it started. Give it a bit more time!


No.
Do you really think I'm that stupid?
I was away from the computer for more than 5 minutes.
And only then, after another few minutes I restarted the browser.
(In reply to 8472 from comment #32)

> Do you really think I'm that stupid?
> I was away from the computer for more than 5 minutes.
> And only then, after another few minutes I restarted the browser.

Those numbers on the left are the timestamps of when the log entry was written, in milliseconds. As you can see, Sync was still working at

1399570150022

right up until it found that the browser was quitting, two milliseconds later.
Reporter

Comment 34

5 years ago
(In reply to Richard Newman [:rnewman] from comment #33)
> (In reply to 8472 from comment #32)
> 
> > Do you really think I'm that stupid?
> > I was away from the computer for more than 5 minutes.
> > And only then, after another few minutes I restarted the browser.
> 
> Those numbers on the left are the timestamps of when the log entry was
> written, in milliseconds. As you can see, Sync was still working at
> 
> 1399570150022
> 
> right up until it found that the browser was quitting, two milliseconds
> later.


Well, then the trace/logging is somehow not working properly.
Because as I told you, I was more than 5 minutes away from the computer after I started that previous sync.
And once I got back, the sync wheel was still spinning, without any interruption from my side.
Also the log file from that time confirms the start and the end of it's sync:
{
> 1399569723685	Sync.Service	INFO	Starting sync at 2014-05-08 19:22:03 (CET)
...
> 1399570150044	Sync.Synchronizer	INFO	Sync completed at 2014-05-08 19:29:10 (CET) after 254.92 secs.
}



Here you have another bookmarks sync trace log.
This one has been running for aprox. 1h25min:
{
> 1399574100274	Sync.Service	INFO	Starting sync at 2014-05-08 20:35:00 (CET)
...
> 1399579284892	Sync.Synchronizer	INFO	Sync completed at 2014-05-08 22:01:24 (CET) after 5184.62 secs.
}
As you can see, I started the sync at 20:35:00 (CET), and left it running for 1h25min, again, without any interruption from my side.
Please note, that there are newly many events in the log like "Exception: Could not acquire lock. Label: "service.js: sync". No traceback available ; Cannot start sync: already syncing?" in between, which ended only shortly before I initiated restart of browser at 22:01:16 (CET).
So that logging is coming from syncCleanup (which ends up being called after quit begins), which means we're stalling some time after syncStartup and before the bookmarks sync finishes.

Presumably that's inside processIncoming, because there's no other log output, and quite probably the network fetch.

Back to Mark so I can get back to my review queue :D
Assignee

Comment 36

5 years ago
I'm relatively confident the issue is that the bookmarks backup is failing, and the sync code isn't handling that failure.  By simulating an error immediately after the bookmarks backup I can reproduce the problem and the logs in that case are consistent with the uploaded logs.

This patch fixes that - it will not fix the underlying problem causing the backups to fail (and I have no idea what might be causing that - disk-space?) but will log the error and allow sync to continue.

It's not 100% clear that we should just ignore this backup failure, but I'm even less clear we should abort bookmark syncing due to it.  rnewman seems to lean towards ignoring the error, so that's what this patch does (it logs it, but still ignores it)

Getting this into 30 ASAP would make sense, then hopefully we can (a) get confirmation it fixes the problem and (b) get logs that can tell us details of the failure so we can decide how to move further forward.
Assignee: nobody → mhammond
Status: NEW → ASSIGNED
Attachment #8419846 - Flags: review?(rnewman)
Assignee

Comment 37

5 years ago
I moved where the new log messages are generated - the old patch would log "doing backups" and "did backups" even if they weren't actually done.  Now they are logged only when it actually happens.
Attachment #8419846 - Attachment is obsolete: true
Attachment #8419846 - Flags: review?(rnewman)
Attachment #8419848 - Flags: review?(rnewman)
Comment on attachment 8419848 [details] [diff] [review]
0001-Bug-1006360-prevent-failures-backing-up-bookmarks-fr.patch

Review of attachment 8419848 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/sync/modules/engines/bookmarks.js
@@ +355,3 @@
>        }
> +    }.bind(this)).then(
> +      () => cb(),

.then(cb, ex => {
  ...

@@ +355,5 @@
>        }
> +    }.bind(this)).then(
> +      () => cb(),
> +      ex => {
> +        // failure to create a backup is somewhat bad, but probably not bad

Nit: "Failure"

@@ +359,5 @@
> +        // failure to create a backup is somewhat bad, but probably not bad
> +        // enough to prevent syncing of bookmarks - so just log the error and
> +        // continue.
> +        this._log.warn("Got exception \"" + Utils.exceptionStr(ex) +
> +                       "\" backing up bookmarks, but continuiing with sync.");

s/ii/i

Consider filing a follow-up to report something in telemetry, FHR, *something*. We don't really know how widespread this kind of database corruption is, but I'll bet it's more common than we think.

(I heard from one user that their bookmarks had been screwed up ever since they used a Yahoo-branded Firefox one time.)

@@ +360,5 @@
> +        // enough to prevent syncing of bookmarks - so just log the error and
> +        // continue.
> +        this._log.warn("Got exception \"" + Utils.exceptionStr(ex) +
> +                       "\" backing up bookmarks, but continuiing with sync.");
> +        cb()

Nit: trailing semicolon.
Attachment #8419848 - Flags: review?(rnewman) → review+
Assignee

Comment 39

5 years ago
(In reply to Richard Newman [:rnewman] from comment #38)

> Consider filing a follow-up to report something in telemetry, FHR,
> *something*. We don't really know how widespread this kind of database
> corruption is, but I'll bet it's more common than we think.

Good idea - but I'm not sure what the bug would say yet, so let's wait and see if we can get logs once this patch is in place and have a better idea of the underlying problem.

Seeing this patch is somewhat speculative, I've flagged as leave-open so we can verify the work-around is effective and decide what followups are needed.

https://hg.mozilla.org/integration/fx-team/rev/34b0645a3ad0
Keywords: leave-open
Assignee

Comment 40

5 years ago
Comment on attachment 8419848 [details] [diff] [review]
0001-Bug-1006360-prevent-failures-backing-up-bookmarks-fr.patch

Landed on fx-team, but speculatively requesting approval on the assumption it will stick and make it to central, so we can ask users to try beta early next week.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Unknown
User impact if declined: Sync may never complete and bookmarks will not sync for an unknown subset of users.
Testing completed (on m-c, etc.): Has test coverage, manual testing.
Risk to taking this patch (and alternatives if risky): Low risk, limited to sync bookmarks
String or IDL/UUID changes made by this patch: None
Attachment #8419848 - Flags: approval-mozilla-beta?
Attachment #8419848 - Flags: approval-mozilla-aurora?
Attachment #8419848 - Flags: approval-mozilla-beta?
Attachment #8419848 - Flags: approval-mozilla-beta+
Attachment #8419848 - Flags: approval-mozilla-aurora?
Attachment #8419848 - Flags: approval-mozilla-aurora+
Duplicate of this bug: 1007852
Whiteboard: [qa+]
:bobm has already fulfilled the needinfo here
Flags: needinfo?(bobm)
Assignee

Comment 46

5 years ago
The work-around has landed - it is certainly in the most recent nightly and I *think* it is in the most recent Aurora.  If anyone who experiences this problem is able to try Nightly or Aurora, and is able to upload their logs after trying it would be appreciated.  Note that I'm hoping sync will complete, so the logs I'm interested in that case will be "success" logs - but if it still fails, the failure logs would be great too.
Reporter

Comment 47

5 years ago
(In reply to Mark Hammond [:markh] from comment #46)
> The work-around has landed - it is certainly in the most recent nightly and
> I *think* it is in the most recent Aurora.  If anyone who experiences this
> problem is able to try Nightly or Aurora, and is able to upload their logs
> after trying it would be appreciated.  Note that I'm hoping sync will
> complete, so the logs I'm interested in that case will be "success" logs -
> but if it still fails, the failure logs would be great too.


Have tried to get the recent versions:
- Aurora (https://www.mozilla.org/en-US/firefox/channel/#aurora), has a version 31.0a2 (2014-05-14).
- Nightly (https://nightly.mozilla.org/), has version 32.0a1 (2014-05-14).
Tested with both. But mainly with Aurora.
Just for sure, I've also enabled services.sync.log.logger.engine.bookmarks = Trace again.
I also created a new empty profile, to see how it works from the scratch.
Because previously with FF29 the bookmarks weren't uploaded, I had to "restore" them again into the profile, and sync it afterwards in Nightly (had not trace loggeed a single word), clearing the profile to new - empty one afterwards.
Testing on Aurora with the new - empty profile, once I logged to my sync profile, initial sync started, and the bookmarks I uplodaded previously in Nightly were downloaded very fast.
But the rest, especially the sync of add-ons needed a restart of browser to download these.


Your recent solution seem to be working for the bookmarks so far.
Only if perhaps the other sync stuff could also get synchronized after the first initial sync, which needed the mentioned restart of browser to realize it. Dunno, perhaps it's a feature rather than a bug?

Also, attached you'll find the new sync (trace) log (3MB large).
It is the only log I got, from the initial sync + restart of browser was required to be written down.
No other logs were created whatsoever, no matter how many times I restarted afterwards, this is the only one it provided.
Assignee

Comment 48

5 years ago
Thanks for the update!

(In reply to 8472 from comment #47)
> Created attachment 8422989 [details]
> error-1400141513252.txt.zip
 
> Have tried to get the recent versions:

The logs you uploaded confirm that it has the patch I was hoping to see, which is great!

> I also created a new empty profile, to see how it works from the scratch.
> Because previously with FF29 the bookmarks weren't uploaded, I had to
> "restore" them again into the profile, and sync it afterwards in Nightly
> (had not trace loggeed a single word), clearing the profile to new - empty
> one afterwards.

Unfortunately, that probably worked-around the initial problem I was trying to reproduce - we suspect some corruption of the database storing the bookmarks, and the process of restoring them would also have worked-around that problem.  I can verify this in the logs from the lines:

 1400141271120	Sync.Service	INFO	Starting sync at 2014-05-15 10:07:51
...
 1400141291175	Sync.Engine.Bookmarks	DEBUG	Bookmarks backup starting.
 1400141291323	Sync.Engine.Bookmarks	DEBUG	Bookmarks backup done.

If the bookmarks were corrupt and had the same problem as previous tries, I was expecting to see a report that the backups failed - but they worked fine here, and bookmarks sync completed without error:

 1400141342764	Sync.Engine.Bookmarks	TRACE	Finishing up sync
 1400141342889	Sync.Engine.Bookmarks	TRACE	Event: weave:engine:sync:finish

But that's great - bookmarks *did* complete for you.  However, history never completed in this case :(  Of interest from the logs:

 1400141347075	Sync.Tracker.History	DEBUG	Saving changed IDs to history
 1400141463824	Sync.Collection	DEBUG	mesg: GET success 200 https://.../storage/history?full=1&limit=5000
 1400141463825	Sync.Collection	DEBUG	GET success 200 https://.../storage/history?full=1&limit=5000
 1400141466664	Sync.Collection	DEBUG	mesg: GET success 200 https://.../storage/history?sort=index&limit=5000
 1400141466665	Sync.Collection	DEBUG	GET success 200 https://.../storage/history?sort=index&limit=5000
 1400141468659	Sync.Engine.History	ERROR	null

 1400141472428	Sync.Collection	DEBUG	mesg: GET success 200 https://.../storage/history?full=1&ids=5cQAkTRzVe8-,...
 1400141472428	Sync.Collection	DEBUG	GET success 200 https://.../history?full=1&ids=5cQAkTRzVe8-,...
1400141475801	Sync.Engine.History	ERROR	null

repeated a few times - although with different "ids=" in each case.  The "ERROR null" is somewhat strange - but even though this repeats, it's not clear there is a real problem - progress seems to be being made, until:

 1400141513181	Sync.Engine.History	WARN	Got exception [Exception... "App. Quitting"...applyIncomingBatch()...
...
 1400141513229	Sync.Synchronizer	INFO	Sync completed at 2014-05-15 10:11:53 after 224.56 secs.

So it *looks* to me that the app was quit 3 minutes after the sync started.  History was synching for ~2 minutes at this point, and it may well have completed - although I still suspect some degree of corruption in the places database is causing this to go slow and report those strange errors.

It would be very helpful if you can try again, but give the sync at least 20 minutes to complete - just so we can see if it does :)  Also, if it isn't too inconvenient, it would still be great to see the results from your initial profile which previously hung on bookmarks.

> Also, attached you'll find the new sync (trace) log (3MB large).
> It is the only log I got, from the initial sync + restart of browser was
> required to be written down.
> No other logs were created whatsoever, no matter how many times I restarted
> afterwards, this is the only one it provided.

This might just mean that sync never started in those other runs.  It might be helpful to force a sync (Tools->Sync Now) - note that on Windows, you can press and release the "Alt" key to show the menu bar so the Tools menu can be selected.

Thanks for you help in tracking this down!
Reporter

Comment 49

5 years ago
(In reply to Mark Hammond [:markh] from comment #48)
> 
> > I also created a new empty profile, to see how it works from the scratch.
> > Because previously with FF29 the bookmarks weren't uploaded, I had to
> > "restore" them again into the profile, and sync it afterwards in Nightly
> > (had not trace loggeed a single word), clearing the profile to new - empty
> > one afterwards.
> 
> Unfortunately, that probably worked-around the initial problem I was trying
> to reproduce - we suspect some corruption of the database storing the
> bookmarks, and the process of restoring them would also have worked-around
> that problem.  I can verify this in the logs from the lines:
...
> Also, if it isn't too inconvenient, it would still be great to see the results from your
> initial profile which previously hung on bookmarks.

Not quite true.
Before, while testing still on the affected FF29, I also created a completely new - empty profile.
Once I started that new - empty profile with FF29 before, I joined my sync account. And no bookmarks restore was on this one!
So, though my profile was new & empty, no restore was made (therefore, I doubt it there was any chance for corruption of the database, right?), the previous testing & sync of FF29 still caused the sync wheel to be spinning forever.
...
Also, many of the sync logs I provided you before were created on that initial profile "which previously hung on bookmarks".



(In reply to Mark Hammond [:markh] from comment #48)
> It would be very helpful if you can try again, but give the sync at least 20
> minutes to complete - just so we can see if it does :)

Well, I just tried again to create another completely new - empty profile.
Have enabled the Trace logging.
And have joined my sync account.
After a while, add-ons were loaded.
Once I enabled bookmarks to be synchronized as well, sync wheel was spinning only for a while, but had not synchronized a single bookmark.
Also, despite of Trace logging was enabled, it had not wrote down a single word, none at all!
Tried both, Aurora and Nightly (the same builds from yesterday), no difference.
Have tried it three times, mostly with same results as described.
Only the third attempt produced some small sync logs (attached), finally.



(In reply to Mark Hammond [:markh] from comment #48)
> > Also, attached you'll find the new sync (trace) log (3MB large).
> > It is the only log I got, from the initial sync + restart of browser was
> > required to be written down.
> > No other logs were created whatsoever, no matter how many times I restarted
> > afterwards, this is the only one it provided.
> 
> This might just mean that sync never started in those other runs.  It might
> be helpful to force a sync (Tools->Sync Now) - note that on Windows, you can
> press and release the "Alt" key to show the menu bar so the Tools menu can
> be selected.

I know how to force a sync.
And I made that too, without any effect.

Comment 50

5 years ago
I hope my sync log can be of use..
I synced today also but the last log visible is the one from yesterday that I have attached.

Thnx to everybody for all the hard work.
Assignee

Comment 51

5 years ago
(In reply to 8472 from comment #49)
> Not quite true.
> Before, while testing still on the affected FF29, I also created a
> completely new - empty profile.
> Once I started that new - empty profile with FF29 before, I joined my sync
> account. And no bookmarks restore was on this one!
> So, though my profile was new & empty, no restore was made (therefore, I
> doubt it there was any chance for corruption of the database, right?), the
> previous testing & sync of FF29 still caused the sync wheel to be spinning
> forever.

hmm - I was previously focused on the logs that showed many bookmarks, and the sync of those bookmarks not completing.  From what you say above, it seems impossible that this happened with your clean profile, as there would have been very few bookmarks in that profile.

> ...
> Also, many of the sync logs I provided you before were created on that
> initial profile "which previously hung on bookmarks".

Unfortunately there are many logs without any context.  It would be very helpful for me if you could point out which of those logs is from a new profile (with no bookmarks and presumably no history or anything else) that failed to complete.  A new log that demonstrates this would also be good if finding the previous log is difficult.

> Tried both, Aurora and Nightly (the same builds from yesterday), no
> difference.
> Have tried it three times, mostly with same results as described.
> Only the third attempt produced some small sync logs (attached), finally.

Unfortunately, these logs only show server maintenance - ie, the sync servers asked your firefox to try again later, which it did.

Note that every time you "disconnect" your account from sync, all preferences are reset - including ones related to logging.  So it sounds to me like the preference to log on success might have been reset, and thus, the only logs are for errors - which in this case includes that "try again later" state.

> I know how to force a sync.
> And I made that too, without any effect.

See above, it sounds like maybe the prefs were reset - if sync is setup to log on success, every manual sync attempt should generate a log (or of sync isn't completing, I'd expect to see the log created at shutdown time and should be visible next time you restart)
Assignee

Comment 52

5 years ago
(In reply to David from comment #50)
> Created attachment 8424007 [details]
> sync-log after working update
> 
> I hope my sync log can be of use..
> I synced today also but the last log visible is the one from yesterday that
> I have attached.
> 
> Thnx to everybody for all the hard work.

Thanks for the update - does sync seem to be working for you?  This log seems to show that sync was working correctly, but Firefox was shutdown before it completed.  ie:

1400180186094	Sync.Engine.Forms	INFO	Uploading 700 - 800 out of 2484 records
1400180186094	Sync.BrowserIDManager	DEBUG	_ensureValidToken already has one
1400180186095	Sync.Collection	DEBUG	POST Length: 30405
1400180186673	Sync.Status	DEBUG	Status for engine forms: error.engine.reason.unknown_fail
1400180186673	Sync.Status	DEBUG	Status.service: success.status_ok => error.sync.failed_partial
1400180186673	Sync.ErrorHandler	DEBUG	forms failed: App. Quitting JS Stack trace: 

this shows Sync was ~1/3 of the way through syncing form data when the app shut down.  Earlier lines show that Sync was making regular progress through the form records - and Sync was only running for a little over 3 seconds.  So I see nothing here that implies Sync wouldn't have finished.
(In reply to Mark Hammond [:markh] from comment #52)

> Thanks for the update - does sync seem to be working for you?  This log
> seems to show that sync was working correctly, but Firefox was shutdown
> before it completed.  ie:
...
> So I see nothing here that implies Sync wouldn't have finished.

Since about Comment 35, I no longer trust that kind of log. There's some evidence that this is really a hang -- a callback never being called -- and all that crap gets performed and flushed as a result of the quit, not merely coincidentally being interrupted by the quit.

The app quitting check is in the event loop spinner; it's the equivalent of Java's InterruptedException.
Assignee

Comment 54

5 years ago
(In reply to Richard Newman [:rnewman] (pto -> May 19) from comment #53)
> (In reply to Mark Hammond [:markh] from comment #52)
> 
> > Thanks for the update - does sync seem to be working for you?  This log
> > seems to show that sync was working correctly, but Firefox was shutdown
> > before it completed.  ie:
> ...
> > So I see nothing here that implies Sync wouldn't have finished.
> 
> Since about Comment 35, I no longer trust that kind of log. There's some
> evidence that this is really a hang -- a callback never being called -- and
> all that crap gets performed and flushed as a result of the quit, not merely
> coincidentally being interrupted by the quit.

That's certainly worth exploring - but in this case I see no evidence of that.  I see timestamps uniformly incrementing before finishing with the "app quiting" exception and the message:

1400180186678	Sync.Synchronizer	INFO	Sync completed at 2014-05-15 20:56:26 after 3.42 secs.

Is there anything in attachment 8424007 [details] specifically that makes you skeptical?
Reporter

Comment 55

5 years ago
(In reply to Mark Hammond [:markh] from comment #51)
> (In reply to 8472 from comment #49)
> > Not quite true.
> > Before, while testing still on the affected FF29, I also created a
> > completely new - empty profile.
> > Once I started that new - empty profile with FF29 before, I joined my sync
> > account. And no bookmarks restore was on this one!
> > So, though my profile was new & empty, no restore was made (therefore, I
> > doubt it there was any chance for corruption of the database, right?), the
> > previous testing & sync of FF29 still caused the sync wheel to be spinning
> > forever.
> 
> hmm - I was previously focused on the logs that showed many bookmarks, and
> the sync of those bookmarks not completing.  From what you say above, it
> seems impossible that this happened with your clean profile, as there would
> have been very few bookmarks in that profile.

Impossible? With SW bugs, everything is possible.
Believe what you want. I don't have no intention in misleading you. Why would I want to do that?
But I tell you what I think could have (but not why) happened.
After switch to the new sync of FF29, everything, including the bookmars should've been uploaded to my new sync account.
Because of this bug, the bookmarks most probably weren't uploaded at all.
Once I started a new empty profile in my VM test machine, there were at least the very few default bookmarks as you say.
After I joined the new empty profile to my sync account, again, because of this bug probably no bookmarks were uploaded, nothing would be available for download, nor were the default bookmars uploaded, again probably due to the bug.
However, the sync wheel was spinning in the FF29 forever as reported, though nothing has been exchanged in between of server and local bookmarks.



(In reply to Mark Hammond [:markh] from comment #51)
> (In reply to 8472 from comment #49)
> > Also, many of the sync logs I provided you before were created on that
> > initial profile "which previously hung on bookmarks".
> 
> Unfortunately there are many logs without any context.  It would be very
> helpful for me if you could point out which of those logs is from a new
> profile (with no bookmarks and presumably no history or anything else) that
> failed to complete.
I'm not exactly sure, but also that file error-1399579284893.zip (Comment #34) could be one such.




Currently, I'm somehow surprised and **** of what just happened with it to me.
I was always testing it with my real production data, on 1 up to 3 computers (one of them is a VM where I tested those new empty profiles).
About 2 hours ago I started that VM (still with that FF Aurora 31.0a2 (2014-05-14)), hoping, that I will get finally some more trace logs for you.
That VM had currently only addons, history, passwords synchronized.
Once I enabled the bookmarks to be synchronized on the VM, it started spinning shortly.
I presume, that the setting change of bookmars to be synchronized was sent to the server.
Afterwards, my other work computer with FF29 & the prod. data "realized" that change of bookmarks sync setting was set to enabled, and started spinning.
I thought, it will be spinning forever as did before on the FF29 as had always happend until today (unless I stopped it manually by disabling that option again), and as is actually this bug report about.
What was my surprise, when it stopped after some time, and the VM sync FF Aurora 31.0a2 (2014-05-14) started spinning and when completed, all my production bookmarks from the work computer (with the affected FF29), even the recent fresh one, showed up on the other VM instance (though, the sort order of them was screwed up side down ...)?!?!?!?! And vice versa, the few defaults from that new empty profile were copied to my production work computer (with FF29) too!!!!
I don't understand it at all!
I have no idea what happened, why so suddenly!
I've been reproducing these same steps many times before, ever since you announced a "possible" fix in Comment #46, and when I reported some success with testing it (tested at the VM of mine) in Comment #47 / Comment #48, there were no recent/fresh data at that time in the VM, the restore I made there was from few days old backup of my bookmarks, and those restored data were not synchronized to my production work computer with FF29 at that time (more recent/fresh data were preserved in the FF29), but the sync wheel was spinning forever once it realized it has been enabled elsewhere.
But now, it suddenly did synchronized with the affected FF29, in both directions, though, screwing up the sort order of my data.
Thus now I restored my bookmarks in the prod. work comp. with FF29 (to get rid of all that mess it got from the VM), the sync wheel started spinning again, and it doesn't seem it would like want to stop! So, now the bug is back in action in the FF29.


I have the feeling, that every time I try something with that new sync (even though I'm reproducing the same steps I made many times before), I get different results!!!
This is very difficult to emulate this bug.
One time it decides itself, that some trace logs will be written down, another time I'm unable to get the logs no matter how much and how long I try (and the bookmarks data weren't synchronized).
It consumes very much time of mine, with no useful testing results for you.
And I'm getting tired (& ****) of it, of how randomly does it behave!!!
No wonder, that you guys get that impression of impossible things are happening at us when testing it.
################################################################
################################################################
One thing would have make the testing much easier. To see my own bookmarks data on the server, their status, items listed, like the Xmarks add-on option allows. Then a man could at least compare, of whether there is anything in there, or not at all. If yes, what actuall data are there, and so on...
################################################################
################################################################

Also, since there is already more than 75 users confirming the problem https://support.mozilla.org/en-US/questions/997761 & https://support.mozilla.org/en-US/questions/998492 (and many more would be out there which have not reported it), I believe, that perhaps you could also try to search for some affected users within your own company.
There must be surely somebody in your own company with these same problems as we're reporting, which could even share their FF user profiles with you for testing purposes.
This could perhaps make the testing for you easier.
Because after what happened today (what I described above), I'm now very sceptical, about any further testing of this in it's current "random result" state (meaning, trying to prevent any further damage to my data).
Had another user run into this on SUMO -- logs showing App Quitting during early bookmarks sync.

Upgrading to 30 fixed the problem.

He had about 1600 bookmarks.
Assignee

Comment 57

5 years ago
(In reply to Richard Newman [:rnewman] from comment #56)
> Had another user run into this on SUMO -- logs showing App Quitting during
> early bookmarks sync.
> 
> Upgrading to 30 fixed the problem.
> 
> He had about 1600 bookmarks.

Are the logs for the successful sync available? I'd be really interested to know if the fix we made here caused success, and if so, what the logs said about the bookmark backup failing.
I don't think he had logOnSuccess set.

Comment 59

5 years ago
This has happened to me in 29 and 30. I'll attach a log in a sec.

The failure is preceded by a large period of time where the system is hibernated/sleeping. I typically hibernate my home PC when leaving for work and resume it at some point that evening. I have yet to see the new sync service complete successfully when resuming from hibernation. I did not have a visible way to monitor the activity of the old sync service so can't be sure if the problem existed previously.

I can successfully sync by restarting Firefox after resuming from hibernation.
Comment on attachment 8439573 [details]
Log of sync failure after resuming from system hibernation (win7)

Scary log:

1402597725970	FirefoxAccounts	DEBUG	(Response) /certificate/sign: code: null - Status text: null
1402597725970	Sync.RESTResponse	DEBUG	Caught exception processing response headers:Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIHttpChannel.visitResponseHeaders] Stack trace: RESTResponse.prototype.headers()@resource://services-common/rest.js:670 < this.HawkClient.prototype._maybeNotifyBackoff()


Relevant code:

      log.debug("(Response) " + path + ": code: " + status +
                " - Status text: " + restResponse.statusText);
      if (logPII) {
        log.debug("Response text: " + restResponse.body);
      }

      // All responses may have backoff headers, which are a server-side safety
      // valve to allow slowing down clients without hurting performance.
      self._maybeNotifyBackoff(restResponse, "x-weave-backoff");
      self._maybeNotifyBackoff(restResponse, "x-backoff");

      if (error) {
        // When things really blow up, reconstruct an error object that follows
        // the general format of the server on error responses.
        return deferred.reject(self._constructError(restResponse, error));
      }

And:

  _maybeNotifyBackoff: function (response, headerName) {
    if (!this.observerPrefix || !response.headers) {
      return;
    }
    let headerVal = response.headers[headerName];


My guess is that `response.headers` is throwing, rather than returning null, so we don't get to that error clause.

Waddaya reckon, Mark?
Flags: needinfo?(mhammond)
Assignee

Comment 62

5 years ago
(In reply to Richard Newman [:rnewman] from comment #61)

>   _maybeNotifyBackoff: function (response, headerName) {
>     if (!this.observerPrefix || !response.headers) {
>       return;
>     }

That check in the first line for '!response.headers' should protect us from that.  However, that only landed on central days ago - so 30 and earlier will go directly to:

>     let headerVal = response.headers[headerName];

which will cause this.  I opened bug 1024811.

Worse, those "scary" messages in the log are simply telling us that response.headers is returning null.  What we *don't* see is the actual exception caused when we try and index into it - ie, the *real* exception isn't being reported at all and is causing the hang.  So I opened bug 1024812.

Thanks!
> 
> 
> My guess is that `response.headers` is throwing, rather than returning null,
> so we don't get to that error clause.
> 
> Waddaya reckon, Mark?
Flags: needinfo?(mhammond)

Comment 63

5 years ago
I've been using 31 beta for a few weeks and now the stable release. I've not seen the never ending sync problem at all, even after resuming from hibernation. Thanks!
Reporter

Comment 64

5 years ago
Have just checked it with FF31, meaning reenabled sync. of bookmarks again, tested it at least with two existing FF profiles of mine (which were failing before), and a one/third fresh/new FF profile (complete new sync), and the problem seems to be gone.
Thank you folks.
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WORKSFORME
Flags: firefox-backlog+
Reporter

Comment 65

5 years ago
Reopening, sorry.
Yesterday it appeared to be working, today I have some doubts after I came home from work, and some bookmarks I saved (sync was fully completed before I left) in work were missing at home.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Reporter

Comment 66

5 years ago
So, have made some further testing, and it really ain't working as it happened by accident at yesterday.
Though the sync wheel now at least ain't spinning forever, a change on one pc/browser doesn't reflect on another one.
It also takes a quite long (sevelal minutes) to complete one sync round.


Furthermore I have a feeling, that the sync of history is broken now too.
I'm not sure exactly when it started, but I think it could've started sometime around the FF31 was released and updated.
Both - FF sync and the Xmarks aren't syncing the history correctly, if at all.
Any idea about this new one? So far I haven't found anything reported.
> Both - FF sync and the Xmarks aren't syncing the history correctly, if at all.

Are you using FF sync and Xmarks at the same time on the same browser?
Assignee

Comment 68

5 years ago
Thanks for continuing to help us improve sync and I'm confident there are a number of outstanding issues we need to fix.  However, I think it is best to leave this bug as the "continually rolling" issue (which I think we have addressed) and have different bugs targeted at additional issues - eg, specifically about history not syncing correctly.  If you are confident there is an issue with history sync and Xmarks, please open a new bug specifically for that.  Feel free to CC me on any new sync-related bugs you open.

(If it is any consolation, I had issues yesterday syncing history that, as Murphy would have it, I can't reproduce - so I *am* sure there are history sync issues, I just can't put my finger on them)
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → WORKSFORME
Reporter

Comment 69

5 years ago
(In reply to Chris Karlof [:ckarlof] from comment #67)
> > Both - FF sync and the Xmarks aren't syncing the history correctly, if at all.
> 
> Are you using FF sync and Xmarks at the same time on the same browser?

Don't worry, I disable functionalities of one or another when testing it, so that it doesn't get in conflict with the another. I don't leave them enabled both at the same time.
And I was forced to start using Xmarks for bookmarks only because of this bug.


(In reply to Mark Hammond [:markh] from comment #68)
> Thanks for continuing to help us improve sync and I'm confident there are a
> number of outstanding issues we need to fix.  However, I think it is best to
> leave this bug as the "continually rolling" issue (which I think we have
> addressed) and have different bugs targeted at additional issues - eg,
> specifically about history not syncing correctly.  If you are confident
> there is an issue with history sync and Xmarks, please open a new bug
> specifically for that.  Feel free to CC me on any new sync-related bugs you
> open.
> 
> (If it is any consolation, I had issues yesterday syncing history that, as
> Murphy would have it, I can't reproduce - so I *am* sure there are history
> sync issues, I just can't put my finger on them)

Why did you resolved this bug after I reopened it?
The problem still persist. It was only one day when I though it was working, but at the next day the problem with bookmarks sync was back again.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Assignee

Comment 70

5 years ago
(In reply to 8472 from comment #69)
> Why did you resolved this bug after I reopened it?

You quoted the reason I gave.

As you said yourself:
(In reply to 8472 from comment #66)
> Though the sync wheel now at least ain't spinning forever, a change on one
> pc/browser doesn't reflect on another one.
> It also takes a quite long (sevelal minutes) to complete one sync round.

The bug as originally reported said: "After update to the new ver. of Firefox's Sync in FF29 (and account created and verified!!!), the Sync is continuously rolling, doesn't stop even after a many hours of running."

That problem is no longer being experienced by you, so we are working under the assumption that the problem as described is fixed.  As you mention, other people were seeing *this* problem.

It might help if you re-read comment 68 - marking this bug as closed doesn't reflect that we don't take the other problems you see seriously, it is just a reflection that the problem described in this bug - and only this problem - can no longer be reproduced.
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → WORKSFORME
Reporter

Comment 71

5 years ago
(In reply to Mark Hammond [:markh] from comment #70)
> (In reply to 8472 from comment #69)
> > Why did you resolved this bug after I reopened it?
> 
> You quoted the reason I gave.
> 
> As you said yourself:
> (In reply to 8472 from comment #66)
> > Though the sync wheel now at least ain't spinning forever, a change on one
> > pc/browser doesn't reflect on another one.
> > It also takes a quite long (sevelal minutes) to complete one sync round.
> 
> The bug as originally reported said: "After update to the new ver. of
> Firefox's Sync in FF29 (and account created and verified!!!), the Sync is
> continuously rolling, doesn't stop even after a many hours of running."
> 
> That problem is no longer being experienced by you, so we are working under
> the assumption that the problem as described is fixed.  As you mention,
> other people were seeing *this* problem.
> 
> It might help if you re-read comment 68 - marking this bug as closed doesn't
> reflect that we don't take the other problems you see seriously, it is just
> a reflection that the problem described in this bug - and only this problem
> - can no longer be reproduced.

ok, understood & thx
regarding the other bugs, I'm currently in no mood for searching through the list for a possible existing bug report for the other problems I've detected, perhaps later.
Assignee

Comment 72

5 years ago
(In reply to 8472 from comment #71)
> ok, understood & thx

Thanks for understanding!

> regarding the other bugs, I'm currently in no mood for searching through the
> list for a possible existing bug report for the other problems I've
> detected, perhaps later.

You need not be worried about finding duplicates - we can find them later - duplicates for an issue is better than no issue at all :)  However, to be frank, bugs like this are going to be difficult to action without a fair amount of detail (eg, logs etc), and I do understand that takes time.  When you do find time to take this further, I'll be happy to help as much as I can, and thanks for your contributions so far!
Reporter

Comment 73

5 years ago
(In reply to Mark Hammond [:markh] from comment #72)
> (In reply to 8472 from comment #71)
> > regarding the other bugs, I'm currently in no mood for searching through the
> > list for a possible existing bug report for the other problems I've
> > detected, perhaps later.
> 
> You need not be worried about finding duplicates - we can find them later -
> duplicates for an issue is better than no issue at all :)  However, to be
> frank, bugs like this are going to be difficult to action without a fair
> amount of detail (eg, logs etc), and I do understand that takes time.  When
> you do find time to take this further, I'll be happy to help as much as I
> can, and thanks for your contributions so far!

Sure I understand that bugs like this one could be difficult.
Regarding the logs, now this would be even harder, because as I've been testing it  recently & monitoring for logs of course, even though the services.sync.log.appender.file.logOnError is set to true, there are no logs at all on the next sync problems I've mentioned.
Assignee

Comment 74

5 years ago
If you set services.sync.log.appender.file.logOnSuccess to true, you will also get logs from successful runs.  If you also set services.sync.log.logger.engine.history and services.sync.log.logger.engine.bookmarks to "Trace", you should get relatively detailed logs.

In the ideal world, we'd see these logs from the very first sync done by 2 devices *and* with a brand-new Firefox account - then we should see a record of everything "outgoing" from the first configured device and everything "incoming" to the second.  We'd correlate that with the items known to be missing on the second device and see where to look next.

If you do get such logs, please attach them to a new bug :)
I had a very similar problem: bookmarks not synchronizing between my desktop (Linux Mint 17 AMD64 MATE, Firefox 39) and laptop (Ubuntu 14.04 AMD64 Unity, Firefox 39).  When I made a bookmark on my desktop PC, it would not sync to my laptop.  I fixed this by going to my laptop and unlinking the device from Edit > Preferences > Sync > Unlink.  I thought to do this because the desktop Sync preferences page said my email address, whereas my laptop Sync preferences page didn't have the @gmail.com part of the address.  I then logged in again on my laptop, and voila!  Everything works perfectly.

I had previously tried the Places Maintenance add-on on both computers to no effect.  The sync log in about:sync-log wasn't showing any errors, except for a several month old "can't connect to server" type error, which I believe occurred while I was actually disconnected from the Internet.

I hope this helps someone!
I should note that these computers had been syncing to and from each other for over a year before it stopped working.  I suspect a Firefox or Sync upgrade was the problem, or maybe one day I just answered a dialog box the wrong way.  Who knows?
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.