Closed Bug 1052266 Opened 5 years ago Closed 5 years ago

Potential deadlock detected: CacheEntry, CacheFile.mLock, CacheStorageService at CacheEntry

Categories

(Core :: Networking: Cache, defect, major)

defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla35
Tracking Status
firefox32 --- unaffected
firefox33 --- unaffected
firefox34 --- fixed
firefox35 --- fixed
b2g-v2.0 --- unaffected
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: kinetik, Assigned: michal)

References

Details

Attachments

(1 file, 1 obsolete file)

My console contains the following potential deadlock message (repeated) when playing back video on YouTube:

Potential deadlock detected:
Cyclical dependency starts at
Mutex : CacheEntry
Next dependency:
Mutex : CacheFile.mLock
Next dependency:
Mutex : CacheStorageService (currently acquired)
Cycle completed at
Mutex : CacheEntry
Deadlock may happen for some other execution

...with the following stack:

#0  mozilla::BlockingResourceBase::CheckAcquire (this=0x7fe953652040) at xpcom/glue/BlockingResourceBase.cpp:125
#1  mozilla::OffTheBooksMutex::Lock (this=0x7fe953652040) at xpcom/glue/BlockingResourceBase.cpp:222
#2  mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock (this=0x7fffcfa5f290, aLock=..., _notifier=...) at ../../dist/include/mozilla/Mutex.h:164
#3  mozilla::net::CacheEntry::DoomAlreadyRemoved (this=0x7fe953652000) at netwerk/cache2/CacheEntry.cpp:1495
#4  mozilla::net::CacheStorageService::AddStorageEntry (this=0x7fe9620d0200, aContextKey=..., aURI=0x7fe953651f00, aIdExtension=..., aWriteToDisk=true, aCreateIfNotExist=true, aReplace=true, aResult=0x7fffcfa5f480) at netwerk/cache2/CacheStorageService.cpp:1408
#5  mozilla::net::CacheEntry::ReopenTruncated (this=0x7fe953652000, aMemoryOnly=false, aCallback=0x0) at netwerk/cache2/CacheEntry.cpp:482
#6  mozilla::net::CacheEntry::Recreate (this=0x7fe953652000, aMemoryOnly=false, _retval=0x7fe957779c60) at netwerk/cache2/CacheEntry.cpp:1306
#7  mozilla::net::CacheEntryHandle::Recreate (this=0x7fe9533cf100, aMemoryOnly=false, _retval=0x7fe957779c60) at netwerk/cache2/CacheEntry.h:361
#8  mozilla::net::nsHttpChannel::InitCacheEntry (this=0x7fe957779800) at netwerk/protocol/http/nsHttpChannel.cpp:3718
#9  mozilla::net::nsHttpChannel::ContinueProcessNormal (this=0x7fe957779800, rv=NS_OK) at netwerk/protocol/http/nsHttpChannel.cpp:1480
#10 mozilla::net::nsHttpChannel::ProcessNormal (this=0x7fe957779800) at netwerk/protocol/http/nsHttpChannel.cpp:1447
#11 mozilla::net::nsHttpChannel::ProcessResponse (this=0x7fe957779800) at netwerk/protocol/http/nsHttpChannel.cpp:1276
#12 mozilla::net::nsHttpChannel::OnStartRequest (this=0x7fe957779800, request=0x7fe952917ab0, ctxt=0x0) at netwerk/protocol/http/nsHttpChannel.cpp:4939
#13 nsInputStreamPump::OnStateStart (this=0x7fe952917ab0) at netwerk/base/src/nsInputStreamPump.cpp:521
#14 nsInputStreamPump::OnInputStreamReady (this=0x7fe952917ab0, stream=0x7fe94c3e4f38) at netwerk/base/src/nsInputStreamPump.cpp:433
#15 nsInputStreamReadyEvent::Run (this=0x7fe9453d5800) at xpcom/io/nsStreamUtils.cpp:88
#16 nsThread::ProcessNextEvent (this=0x7fe987e39e20, aMayWait=false, aResult=0x7fffcfa5f9df) at xpcom/threads/nsThread.cpp:770
Same, just trying to load a gUM testing page with pinned gmail/twitter tabs. This is a debug build on Windows from m-c. It's asserting all the time which makes debugging (other stuff) very problematic.

I've got two different stack traces hitting the assertion:

###!!! ERROR: Potential deadlock detected:
=== Cyclical dependency starts at
--- Mutex : CacheEntry calling context
  [stack trace unavailable]

--- Next dependency:
--- Mutex : CacheFile.mLock calling context
  [stack trace unavailable]

--- Next dependency:
--- Mutex : CacheStorageService (currently acquired)
 calling context
  [stack trace unavailable]

=== Cycle completed at
--- Mutex : CacheEntry calling context
  [stack trace unavailable]

Deadlock may happen for some other execution

[2536] ###!!! ASSERTION: Potential deadlock detected:
Cyclical dependency starts at
Mutex : CacheEntry
Next dependency:
Mutex : CacheFile.mLock
Next dependency:
Mutex : CacheStorageService (currently acquired)
Cycle completed at
Mutex : CacheEntry
Deadlock may happen for some other execution

Stack trace 1:

>	xul.dll!mozilla::BlockingResourceBase::CheckAcquire()  Line 108	C++
 	xul.dll!mozilla::OffTheBooksMutex::Lock()  Line 223	C++
 	xul.dll!mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock<mozilla::Mutex>(mozilla::Mutex & aLock={...}, const mozilla::detail::GuardObjectNotifier & _notifier={...})  Line 165	C++
 	xul.dll!mozilla::net::CacheEntry::DoomAlreadyRemoved()  Line 1495 + 0x18 bytes	C++
 	xul.dll!mozilla::net::CacheStorageService::AddStorageEntry(const nsACString_internal & aContextKey={...}, nsIURI * aURI=0x1fd2db80, const nsACString_internal & aIdExtension={...}, bool aWriteToDisk=false, bool aCreateIfNotExist=true, bool aReplace=true, mozilla::net::CacheEntryHandle * * aResult=0x0020450c)  Line 1410	C++
 	xul.dll!mozilla::net::CacheEntry::ReopenTruncated(bool aMemoryOnly=true, nsICacheEntryOpenCallback * aCallback=0x00000000)  Line 482 + 0x65 bytes	C++
 	xul.dll!mozilla::net::CacheEntry::Recreate(bool aMemoryOnly=true, nsICacheEntry * * _retval=0x07346760)  Line 1306 + 0x13 bytes	C++
 	xul.dll!mozilla::net::CacheEntryHandle::Recreate(bool aMemoryOnly=true, nsICacheEntry * * _retval=0x07346760)  Line 361 + 0x29 bytes	C++
 	xul.dll!mozilla::net::nsHttpChannel::InitCacheEntry()  Line 3718 + 0x3b bytes	C++
 	xul.dll!mozilla::net::nsHttpChannel::ContinueProcessNormal(tag_nsresult rv=NS_OK)  Line 1480 + 0x8 bytes	C++
 	xul.dll!mozilla::net::nsHttpChannel::ProcessNormal()  Line 1448	C++
 	xul.dll!mozilla::net::nsHttpChannel::ProcessResponse()  Line 1276 + 0x8 bytes	C++
 	xul.dll!mozilla::net::nsHttpChannel::OnStartRequest(nsIRequest * request=0x16a1cc10, nsISupports * ctxt=0x00000000)  Line 4939 + 0xe bytes	C++
 	xul.dll!nsInputStreamPump::OnStateStart()  Line 521 + 0x2c bytes	C++
 	xul.dll!nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream * stream=0x0d2a3f90)  Line 433 + 0xb bytes	C++
 	xul.dll!nsInputStreamReadyEvent::Run()  Line 90	C++
 	xul.dll!nsThread::ProcessNextEvent(bool aMayWait=true, bool * aResult=0x00204898)  Line 770 + 0x19 bytes	C++
 	xul.dll!NS_InvokeByIndex(nsISupports * that=0x00a22380, unsigned int methodIndex=8, unsigned int paramCount=2, nsXPTCVariant * params=0x00204888)  Line 71	C++
 	xul.dll!CallMethodHelper::Invoke()  Line 2369 + 0x1c bytes	C++
 	xul.dll!CallMethodHelper::Call()  Line 1730 + 0x8 bytes	C++
 	xul.dll!XPCWrappedNative::CallMethod(XPCCallContext & ccx={...}, XPCWrappedNative::CallMode mode=CALL_METHOD)  Line 1697 + 0x16 bytes	C++
 	xul.dll!XPC_WN_CallMethod(JSContext * cx=0x03c0e100, unsigned int argc=1, JS::Value * vp=0x00204a2c)  Line 1288 + 0xe bytes	C++

Stack trace 2:

>	xul.dll!mozilla::BlockingResourceBase::CheckAcquire()  Line 108	C++
 	xul.dll!mozilla::OffTheBooksMutex::Lock()  Line 223	C++
 	xul.dll!mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock<mozilla::Mutex>(mozilla::Mutex & aLock={...}, const mozilla::detail::GuardObjectNotifier & _notifier={...})  Line 165	C++
 	xul.dll!mozilla::net::CacheEntry::DoomAlreadyRemoved()  Line 1495 + 0x18 bytes	C++
 	xul.dll!mozilla::net::CacheStorageService::CacheFileDoomed(nsILoadContextInfo * aLoadContextInfo=0x1bc30ba0, const nsACString_internal & aIdExtension={...}, const nsACString_internal & aURISpec={...})  Line 1763	C++
 	xul.dll!mozilla::net::CacheFileIOManager::DoomFileInternal(mozilla::net::CacheFileHandle * aHandle=0x064efc00)  Line 2123	C++
 	xul.dll!mozilla::net::CacheFileIOManager::DoomFileByKeyInternal(const unsigned char [20]* aHash=0x0349f8a8, bool aFailIfAlreadyDoomed=true)  Line 2181 + 0x11 bytes	C++
 	xul.dll!mozilla::net::CacheFileIOManager::OverLimitEvictionInternal()  Line 2693 + 0xe bytes	C++
 	xul.dll!nsRunnableMethodImpl<enum tag_nsresult (__thiscall mozilla::net::CacheFileIOManager::*)(void),void,1>::Run()  Line 393	C++
 	xul.dll!mozilla::net::CacheIOThread::LoopOneLevel(unsigned int aLevel=8)  Line 292	C++
 	xul.dll!mozilla::net::CacheIOThread::ThreadFunc()  Line 223	C++
 	xul.dll!mozilla::net::CacheIOThread::ThreadFunc(void * aClosure=0x05cfaca0)  Line 168	C++
 	nss3.dll!_PR_NativeRunThread(void * arg=0x00a11de0)  Line 397 + 0xf bytes	C
 	nss3.dll!pr_root(void * arg=0x00a11de0)  Line 90 + 0xf bytes	C
 	msvcr100.dll!_callthreadstartex()  Line 314 + 0x6 bytes	C
 	msvcr100.dll!_threadstartex(void * ptd=0x002be090)  Line 292 + 0x5 bytes	C
Severity: normal → major
OS: Linux → All
I think this comes from bug 1035411.  We know there is a code that calls back on the cache from readsegments hanler called from cache input stream.  This way we call locks in opposite orders.  The warning is legitimate.
Blocks: 1035411
That bug is marked fixed, though? The warnings here are on newer m-c.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #3)
> That bug is marked fixed, though? The warnings here are on newer m-c.

That bug is a revert of unlock over reader handler call, that unlock was there because I was encountering potential deadlocks.  Since bug 1035411 we keep the lock closed because there was a serious problem with buffer lifetime (freed outside the lock).  Now we want to find again the code that calls on the cache from the readsegments handler and fix it.  --enable-trace-malloc builds will show stacks during the warning.
--enable-trace-malloc no longer works on Windows, see for example bug 1022449.
I reproduced this now on Linux in a build with --enable-trace-malloc and debugging info, but still no stacks:

###!!! ERROR: Potential deadlock detected:
=== Cyclical dependency starts at
--- Mutex : CacheEntry calling context
  [stack trace unavailable]

--- Next dependency:
--- Mutex : CacheFile.mLock calling context
  [stack trace unavailable]

--- Next dependency:
--- Mutex : CacheStorageService (currently acquired)
 calling context
  [stack trace unavailable]

=== Cycle completed at
--- Mutex : CacheEntry calling context
  [stack trace unavailable]

Deadlock may happen for some other execution

[21883] ###!!! ASSERTION: Potential deadlock detected:
Cyclical dependency starts at
Mutex : CacheEntry
Next dependency:
Mutex : CacheFile.mLock
Next dependency:
Mutex : CacheStorageService (currently acquired)
Cycle completed at
Mutex : CacheEntry
Deadlock may happen for some other execution

: 'Error', file /home/morbo/hg/mozilla-central/xpcom/glue/BlockingResourceBase.cpp, line 323
What needs to happen to fix this?  Having a large number of 30+ line errors dumped to the console is making development harder than necessary, aside from the fact these errors almost certainly indicate a real bug.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #6)
> I reproduced this now on Linux in a build with --enable-trace-malloc and
> debugging info, but still no stacks:

It looks like the code was rewritten (bug 1049068, bug 1050445) to no longer use trace-malloc, and instead you need to cmment out the line:

#define MOZ_CALLSTACK_DISABLED

at the top of xpcom/glue/BlockingResourceBase.h.

(I haven't tried this at all, though.)
(In reply to Matthew Gregan [:kinetik] from comment #7)
> What needs to happen to fix this?

Do you have a simple STR? I'm actively working on this but I can't reproduce it easily. I hit this warning few times when I tried to restore a very large session from my default firefox profile, but startup of this session with debug build takes several hours and FF often crashes before I hit this warning. So far I wasn't able to reproduce it with MOZ_CALLSTACK_DISABLED commented out since FF sooner or later crashes in the stackwalking code that collects the stack traces.

Anyway, I've created a patch that asserts when CacheStorageService's lock is acquired under the CacheFile's lock and the code didn't assert while I hit the deadlock warning so this invalidates the theory in comment #2.
Assignee: nobody → michal.novotny
Status: NEW → ASSIGNED
Michal, it might be CacheEntry lock instead of CacheStorageService lock, just worth mentioning it.
I finally got the deadlock detector dump with the stack traces and it showed up that there is a typo and the lock CacheFile.mLock is in fact CacheIndex.mLock. But besides this, I find the deadlock detector output useless. I had to go through the ordering information manually in gdb to understand the cycle. The potential cycle (which we never really complete) is formed by these lock acquisitions:

1) CacheIndex -> CacheStorageService
2) CacheEntry -> CacheIndex
3) CacheStorageService -> CacheEntry

To break the potential cycle we need to remove one of them but I'm not sure which one. The best candidate is (1) which was added by bug 1020416, but we cannot release the the index lock before calling CacheStorageService. We could maybe protect CacheStorageService::mForcedValidEntries with a new separate lock different from CacheStorageService::mLock. Honza, what's your opinion?
Flags: needinfo?(honzab.moz)
(In reply to Michal Novotny (:michal) from comment #11)
> I finally got the deadlock detector dump with the stack traces and it showed
> up that there is a typo and the lock CacheFile.mLock is in fact
> CacheIndex.mLock. But besides this, I find the deadlock detector output
> useless. I had to go through the ordering information manually in gdb to
> understand the cycle. The potential cycle (which we never really complete)
> is formed by these lock acquisitions:
> 
> 1) CacheIndex -> CacheStorageService
> 2) CacheEntry -> CacheIndex
> 3) CacheStorageService -> CacheEntry
> 
> To break the potential cycle we need to remove one of them but I'm not sure
> which one. The best candidate is (1) which was added by bug 1020416, but we
> cannot release the the index lock before calling CacheStorageService. We
> could maybe protect CacheStorageService::mForcedValidEntries with a new
> separate lock different from CacheStorageService::mLock. Honza, what's your
> opinion?

Great find.  I'm not against having a new lock for mForcedValidEntries, if that doesn't bring a similar problem again (just make the cycle chain larger).  CacheStorageService is a singleton and it won't change IMO.  So, adding one more lock is not a big waste.
Flags: needinfo?(honzab.moz)
Attached patch fix (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=ca55248fcd02
Attachment #8486375 - Flags: review?(honzab.moz)
Comment on attachment 8486375 [details] [diff] [review]
fix

Review of attachment 8486375 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: netwerk/cache2/CacheStorageService.cpp
@@ +971,5 @@
>        LOG(("  still referenced, not removing"));
>        return false;
>      }
>  
>      if (!aEntry->IsUsingDisk() && IsForcedValidEntryInternal(entryKey)) {

rather use IsForcedValidEntry and don't have the lock2 at all.
Attachment #8486375 - Flags: review?(honzab.moz) → review+
Comment on attachment 8487154 [details] [diff] [review]
patch v2

Approval Request Comment
[Feature/regressing bug #]: 1020416
[User impact if declined]: potential deadlock
[Describe test coverage new/current, TBPL]: none
[Risks and why]: low
[String/UUID change made/needed]: none
Attachment #8487154 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/643b371e51df
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Comment on attachment 8487154 [details] [diff] [review]
patch v2

Aurora+
Attachment #8487154 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.