Closed Bug 443385 Opened 16 years ago Closed 16 years ago

Deadlock when scheduled sync fails to complete and release lock

Categories

(Cloud Services :: General, defect)

PowerPC
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: debbiep-mozilla, Unassigned)

References

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9) Gecko/2008061004 Firefox/3.0
Build Identifier: 0.2.0

Reproducing from forum post https://labs.mozilla.com/forum/index.php/topic,935.msg4070.html#msg4070

Scheduled sync sometimes halts partway through.  In the case here, the scheduled sync either hangs or dies between fetching the private key and the public key.

2008-07-02 21:20:30     Service.Main    INFO    Running scheduled sync
2008-07-02 21:20:30     Service.DAV     DEBUG   LOCK request for lock
2008-07-02 21:20:31     Service.DAV     DEBUG   GET request for meta/version
2008-07-02 21:20:36     Service.Main    TRACE   Retrieving keypair from server
2008-07-02 21:20:36     Service.DAV     DEBUG   GET request for private/privkey
2008-07-02 21:21:30     Service.Main    INFO    Running scheduled sync
2008-07-02 21:21:30     Async.Generator ERROR   Exception: Could not acquire lock (lock already held)

After this, the lock is never released, so no further scheduling happens.  This is visible to the user as a sync-on-quit window that never finishes and never goes away.


Reproducible: Sometimes

Steps to Reproduce:
This doesn't happen reliably.  I have seen it happen twice in two continuous days of use.  Probably it can be made to happen more when server load is high or network latency is poor.  I seem to only need to leave it running for an hour or two and it happens.

Both times it happened on a Mac OS X 10.5.3 Core Duo laptop.  This error hasn't happened on my Windows XP 64 machine at work, which is connected through a different ISP.

Actual Results:  
Synchronization ceases.  When quitting Firefox, the small window that pops up informing me of the final sync remains with the spinning wheel icon, longer than I have been patient to wait for it (several minutes).  The log continues to show ad infinitum:

2008-07-02 21:55:30     Service.Main    INFO    Running scheduled sync
2008-07-02 21:55:30     Async.Generator ERROR   Exception: Could not acquire lock (lock already held)


Expected Results:  
Expected no deadlock to occur.  Expected final sync to be done in tens of seconds.

The laptop does go to sleep per power-saving settings.  I don't know if this is related.
The computer in question is the only one connected to services.mozilla.org.  The others are switched off.
I'm seeing this as well. After using weave for some time, sooner or later I'll get a lock already held error, and from this point on, Weave will stop working. 

If I quit Firefox, I get a sync window that never goes away, while the log file says "waiting for current action to finish".
I have the same problem of Marcus, here.
In the logs I see an infinite list of 

2008-07-04 12:03:05	Service.Main	INFO	Running scheduled sync
2008-07-04 12:03:05	Async.Generator	ERROR	Exception: Could not acquire lock (lock already held)
2008-07-04 12:04:05	Service.Main	INFO	Running scheduled sync
2008-07-04 12:05:03	Async.Generator	ERROR	Exception: Could not acquire lock (lock already held)
2008-07-04 12:05:05	Service.Main	INFO	Running scheduled sync
2008-07-04 12:05:05	Async.Generator	ERROR	Exception: Could not acquire lock (lock already held)


Seems to me that when a sync takes some time (network problems, or such) and the timer for the scheduled sync fires, the sync is made whether there is something in progress or not.
the subsequent syncs fail because cannot get the lock, and apparently the original sync op is screwed by the new ones and never completes.
Having the same problems, I posted info in the duplicate bug 443401.
I have a good idea what the problem is, I'll work on a fix + update tomorrow.
Just to echo the original reporters problem:  I'm having the same exact issue on both clients (FF3.0 release, one client is XP the other client is Vista).

The verbose log shows:

2008-07-07 11:04:08	Service.DAV	DEBUG	UNLOCK request for lock
2008-07-07 11:04:54	Service.Main	INFO	Running scheduled sync
2008-07-07 11:04:54	Async.Generator	ERROR	Exception: Could not acquire lock (lock already held)
2008-07-07 11:04:54	Async.Generator	DEBUG	Stack trace:
No traceback available.
This exception was raised by an asynchronous coroutine.
Initial async stack trace:
unknown (async) :: WeaveLockWrapper-3150
module:wrap.js:92 :: WeaveNotifyWrapper
module:service.js:271 :: WeaveSvc__onSchedule
module:util.js:399 :: innerBind
module:util.js:439 :: EL_notify
2008-07-07 11:05:02	Service.DAV	WARN	_makeRequest: got status 400
2008-07-07 11:05:02	Async.Generator	WARN	Async method 'DC_unlock-3147' may have yielded without an outstanding callback.
2008-07-07 11:05:54	Service.Main	INFO	Running scheduled sync
2008-07-07 11:05:54	Async.Generator	ERROR	Exception: Could not acquire lock (lock already held)
2008-07-07 11:05:54	Async.Generator	DEBUG	Stack trace:
No traceback available.
This exception was raised by an asynchronous coroutine.
Initial async stack trace:
unknown (async) :: WeaveLockWrapper-3152
module:wrap.js:92 :: WeaveNotifyWrapper
module:service.js:271 :: WeaveSvc__onSchedule
module:util.js:399 :: innerBind
module:util.js:439 :: EL_notify
2008-07-07 11:06:54	Service.Main	INFO	Running scheduled sync
2008-07-07 11:06:54	Async.Generator	ERROR	Exception: Could not acquire lock (lock already held)
2008-07-07 11:06:54	Async.Generator	DEBUG	Stack trace:
No traceback available.
This exception was raised by an asynchronous coroutine.
Initial async stack trace:
unknown (async) :: WeaveLockWrapper-3154
module:wrap.js:92 :: WeaveNotifyWrapper
module:service.js:271 :: WeaveSvc__onSchedule
module:util.js:399 :: innerBind
module:util.js:439 :: EL_notify
<and then the scheduled sync continue (and fail) indefinitely because it appears that it thinks that the lock is always set>

Note that the triggering condition appears to be "Service.DAV	WARN _makeRequest: got status 400"

The only remedy to this problem (which seems to happen quite frequently for me at least) is to close down the browser and cancel out of the closing sync, and restart it.  That will sometimes clear the problem condition.
Fix checked in changeset 7a4e6dabedeb.
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
I've tried building from tip, and triggering the bug. After a while I get the following exception:

DAVLocks.default is undefined (file 'file:///Users/pazu/Library/Application%20Support/Firefox/Profiles/b3whpukl.default/extensions/%7B340c2bbc-ce74-4362-90b5-7c26312808ef%7D/modules/dav.js', line 267)

Since the lock is unconditionally deleted on dav.js#426, the UNLOCK function calledcan't access DAVLocks['default'].token and fails.

Sorry if I'm jumping the gun, by the way. Maybe this wasn't supposed to be tested yet.
Here's a patch that seems to address the problem mentioned in my previous comment. I just moved the line that deletes  DAVLocks['default'] after the unlock call.
You're right.  I fixed this in a slightly different way before I saw your patch.  Version 41816235a157 fixes the problem by moving the delete below the UNLOCK call like you do, but it also adds a try/catch.
Where would updated Weave that has this fixed be to download it?  Do we just download it again from the main page of Weave?
The fix is committed to our source repository, but we have not yet made a release that includes it.  We will make an update in the next couple of days.
Component: Weave → General
Product: Mozilla Labs → Weave
Target Milestone: -- → ---
Component: Weave → General
Product: Mozilla Labs → Weave
Target Milestone: -- → ---
QA Contact: weave → general
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: