Better CacheEntry purge prevention: browser_bug666317.js | application crashed [@ nsHttpChannelAuthProvider::Release()] with cache2=on

RESOLVED FIXED in mozilla29

Status

()

Core
Networking: Cache
--
critical
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: mayhemer, Assigned: mayhemer)

Tracking

Trunk
mozilla29
x86_64
Windows 7
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

https://hg.mozilla.org/projects/gum/rev/875d13e28b98:

https://tbpl.mozilla.org/php/getParsedLog.php?id=30912328&tree=Gum
https://tbpl.mozilla.org/php/getParsedLog.php?id=30914194&tree=Gum

13:08:42     INFO -  Operating system: Windows NT
13:08:42     INFO -                    6.1.7601 Service Pack 1
13:08:42     INFO -  CPU: x86
13:08:42     INFO -       GenuineIntel family 6 model 30 stepping 5
13:08:42     INFO -       8 CPUs
13:08:42     INFO -  Crash reason:  EXCEPTION_BREAKPOINT
13:08:42     INFO -  Crash address: 0x65d06508
13:08:42     INFO -  Thread 36 (crashed)
13:08:42     INFO -   0  xul.dll!nsHttpChannelAuthProvider::Release() [nsHttpChannelAuthProvider.cpp:875d13e28b98 : 1338 + 0x26]
13:08:42     INFO -      eip = 0x65d06508   esp = 0x0ec4fb34   ebp = 0x0ec4fb40   ebx = 0x0062ecb0
13:08:42     INFO -      esi = 0x1c6ba6a8   edi = 0x63f45ff0   eax = 0x00000000   ecx = 0x6447bfed
13:08:42     INFO -      edx = 0x6825e4d8   efl = 0x00000212
13:08:42     INFO -      Found by: given as instruction pointer in context
13:08:42     INFO -   1  xul.dll!nsCOMPtr<nsIHttpChannelAuthProvider>::~nsCOMPtr<nsIHttpChannelAuthProvider>() [nsCOMPtr.h:875d13e28b98 : 514 + 0x5]
13:08:42     INFO -      eip = 0x65d07a9e   esp = 0x0ec4fb48   ebp = 0x0ec4fb64
13:08:42     INFO -      Found by: call frame info
13:08:42     INFO -   2  xul.dll!mozilla::net::nsHttpChannel::~nsHttpChannel() [nsHttpChannel.cpp:875d13e28b98 : 216 + 0x51]
13:08:42     INFO -      eip = 0x65d23475   esp = 0x0ec4fb54   ebp = 0x0ec4fb64
13:08:42     INFO -      Found by: call frame info
13:08:42     INFO -   3  xul.dll!mozilla::net::nsHttpChannel::`vector deleting destructor'(unsigned int) + 0xa
13:08:42     INFO -      eip = 0x65d241f6   esp = 0x0ec4fb60   ebp = 0x0ec4fb64
13:08:42     INFO -      Found by: call frame info
13:08:42     INFO -   4  xul.dll!nsHashPropertyBag::Release() [nsHashPropertyBag.cpp:875d13e28b98 : 28 + 0x6c]
13:08:42     INFO -      eip = 0x65bab2d2   esp = 0x0ec4fb6c   ebp = 0x0ec4fb7c
13:08:42     INFO -      Found by: call frame info
13:08:42     INFO -   5  xul.dll!mozilla::net::HttpBaseChannel::Release() [HttpBaseChannel.cpp:875d13e28b98 : 159 + 0xb]
13:08:42     INFO -      eip = 0x65ce62df   esp = 0x0ec4fb84   ebp = 0x0ec4fb8c
13:08:42     INFO -      Found by: call frame info
13:08:42     INFO -   6  xul.dll!mozilla::net::nsHttpChannel::Release() [nsHttpChannel.cpp:875d13e28b98 : 4298 + 0xb]
13:08:42     INFO -      eip = 0x65d057ec   esp = 0x0ec4fb94   ebp = 0x0ec4fb9c
13:08:42     INFO -      Found by: call frame info
13:08:42     INFO -   7  xul.dll!nsCOMPtr<nsICacheEntryOpenCallback>::~nsCOMPtr<nsICacheEntryOpenCallback>() [nsCOMPtr.h:875d13e28b98 : 514 + 0x5]
13:08:42     INFO -      eip = 0x65caf1e5   esp = 0x0ec4fba4   ebp = 0x0ec4fbb8
13:08:42     INFO -      Found by: call frame info
13:08:42     INFO -   8  xul.dll!nsTArray_Impl<mozilla::net::CacheEntry::Callback,nsTArrayInfallibleAllocator>::DestructRange(unsigned int,unsigned int) [nsTArray.h:875d13e28b98 : 1573 + 0x6]
13:08:42     INFO -      eip = 0x65cb67da   esp = 0x0ec4fbb0   ebp = 0x0ec4fbb8
13:08:42     INFO -      Found by: call frame info
13:08:42     INFO -   9  xul.dll!nsTArray_Impl<mozilla::net::CacheEntry::Callback,nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned int,unsigned int) [nsTArray.h:875d13e28b98 : 1290 + 0x8]
We must release the callback on the same thread where AsyncOpen with it has been called.
Created attachment 8338956 [details] [diff] [review]
v1

This was a tricky one!  I realized that just releasing the callback where needed is not a 100% solution here.  And more problems appeared when finding the right and a simple as possible solution.

First, we must not purge an entry when it's in "progress of being accessed".  This access progress is bound to the fact, we want just a single instance of a CacheEntry related to a single storage/URI combination.  It means, we must not allow purging between call to CacheStorageService::AddStorageEntry (responsible for creating or finding an entry for a URI/storage and keeping such entry a single instance only) and invoking the callback it self.  Besides, when the entry is still referenced by some consumer, we must prevent it as well.  Simple reference counting doesn't work here, we store CacheEntry object in more then one hashtable, and the number of internal references is hard to track.

So, I decided to have a new counter (beside the standard ref counter) that counts consumers that keep the entry + counts the number of pending open operations.

It works like this:

Process of opening an entry: we call AddStorageEntry, that returns a handle that first increases the handle-count under the service lock, the handle gets to the storage that calls AsyncOpen on the entry.  The entry creates Callback record (put to the queue) that increases the handle-count again.  Then, we give a new handle to the entry to the consumer via OnAvail callback.  The consumer keeps the handle until it no longer needs it.  The handle-count is all this time > 0.

                       
cache service lock:     |-----|
opening reference:           |-----|
callback reference:               |-----|
consumer reference:                    |--...--|
total 'handle-count':  0000001111122111221...111000....

When we want to purge an entry, we check the handle-count value again under the service lock.  When non-null, we don't purge.  If null, we remove the entry.  When a new consumer wants it right away or a different thread it must wait for the service lock acquirement.  After that, the entry will be reloaded and single again.


The patch:

- renames and de-nests CacheEntry::Handle to a lonely class CacheEntryHandle
- introduced mHandleCount atomic counter on CacheEntry 
- CacheEntryHandle 'refs' this counter
- CacheEntry::Callback does so as well
- CacheEntryHandle is passed to OnCacheEntryAvailable also when only open for reading or as a ready entry to keep info on being referenced by all consumers
- CacheStorageService::AddStorageEntry returns CacheEntryHandle as well, needed to ++mHandleCount under the lock while manipulating/searching the entry hash table

This all ensures that a cache entry is not purged when still in use.  Thus, we will never have more then one entry to access the same content.

https://tbpl.mozilla.org/?tree=Try&rev=24df36c150ce
Attachment #8338956 - Flags: review?(michal.novotny)
Status: NEW → ASSIGNED
Summary: browser_bug666317.js | application crashed [@ nsHttpChannelAuthProvider::Release()] with cache2=on → Better CacheEntry purge prevention: browser_bug666317.js | application crashed [@ nsHttpChannelAuthProvider::Release()] with cache2=on
Comment on attachment 8338956 [details] [diff] [review]
v1

Review of attachment 8338956 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheEntry.cpp
@@ +808,5 @@
> +bool CacheEntry::IsReferenced() const
> +{
> +  CacheStorageService::Self()->Lock().AssertCurrentThreadOwns();
> +
> +  // No need to lock, since:

This comment is misleading since this code is actually executed under the lock.

@@ +810,5 @@
> +  CacheStorageService::Self()->Lock().AssertCurrentThreadOwns();
> +
> +  // No need to lock, since:
> +  // 1. increasing this counter from 0 to non-null happens only under
> +  //    the the service lock

It would be good to have AssertCurrentThreadOwns() at all places where the counter is increased.
Attachment #8338956 - Flags: review?(michal.novotny) → review+
(In reply to Michal Novotny (:michal) from comment #4)
> Comment on attachment 8338956 [details] [diff] [review]
> v1
> 
> Review of attachment 8338956 [details] [diff] [review]:

Thanks.

> This comment is misleading since this code is actually executed under the
> lock.

Yep.

> It would be good to have AssertCurrentThreadOwns() at all places where the
> counter is increased.

This is only needed in a place where this can get from 0 to 1 (which is at CacheStorageService::AddStorageEntry) since we only need to check for non-nullness of the counter.  Other places are not needed to sync, that is the magic of this mechanism :)  Note that the counter it self is atomic.

I'll add an assertion checking there is one of service lock being held or mRefCnt > 0.
Created attachment 8348954 [details] [diff] [review]
v1->v1.1 idiff

- added the assertion suggested in comment 5
- caught instantly a small issue with Recreate: we were throwing the handle away too soon and created later a new one, out of the lock
- changed to return the handle that was created in the service under the lock all the way up to caller of recreate()
Attachment #8338956 - Attachment is obsolete: true
Attachment #8348954 - Flags: review?(michal.novotny)
Created attachment 8348966 [details] [diff] [review]
v1.2

Forget the 1-1.1 idiff.  This is even better + fixes the bad comment.  I've added even more asserts to all other places we increase the handler counter to check it's already non-null.
Attachment #8348954 - Attachment is obsolete: true
Attachment #8348954 - Flags: review?(michal.novotny)
Attachment #8348966 - Flags: review?(michal.novotny)
Attachment #8348966 - Flags: review?(michal.novotny) → review+
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Needs a followup, I mistakenly landed the 1.1 version.

Try for cache2=on: https://tbpl.mozilla.org/?tree=Try&rev=cdeb2d7640b3
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.