Closed Bug 723577 Opened 12 years ago Closed 12 years ago

Make reading/writing most info in a cache entry not block on the rest of the cache

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: u408661, Assigned: u408661)

References

Details

Attachments

(1 file, 1 obsolete file)

We need to make sure we don't block on the cache i/o thread when we want to read data from a cache entry that we already have in memory.
Blocks: 722201
Summary: Make reading most info from a cache entry not block on the rest of the cache → Make reading/writing most info in a cache entry not block on the rest of the cache
Depends on: 723582
Attached patch patch (obsolete) — Splinter Review
Here's the first shot at a patch for this. It all looks good on try (https://tbpl.mozilla.org/?tree=Try&rev=76107749f267).
Attachment #599341 - Flags: review?(michal.novotny)
No longer depends on: 723582
Blocks: 723582
Per Josh's request, the general strategy for this bug is to make locking in the cache somewhat finer-grained than it currently is. Right now, we have one big lock that controls access to pretty much anything in the cache, including things that are completely independent of one another. For example, reading a single metadata item from a disk cache entry (when the metadata has already been read into a buffer in memory) can be blocked by the reading of the data for a completely unrelated cache entry! This patch gets most of the low-hanging fruit, moving things that are specific to either an nsCacheEntryDescriptor or an nsCacheEntry to a lock specific to the particular instance of the class that is being operated on. We still have the Big Cache Lock for other things, but this (anecdotally, at least) appears to be a good start.
Comment on attachment 599341 [details] [diff] [review]
patch

> -    PRUint32 LastFetched()                             { return mLastFetched; }
> +    PRUint32 LastFetched()                             { nsCacheEntryAutoLock lock(this); return mLastFetched; }

I think (correct me if I'm wrong) that this operation is atomic on all supported platforms and it isn't necessary to protect the access to the variable. So the only effect of the lock is that we don't return the value in the middle of some longer method (e.g. nsCacheEntry::RequestAccess()), but I think that we don't care about this in case of getters. I.e. we don't want to block nsCacheEntry::IsValid() on one thread while some other thread executes nsCacheEntry::RequestAccess() even if the valid flag might change.

This patch protects every method of the cache entry by a lock, but we IMO need to protect several accesses by a single lock at some places. For example in nsCacheService::DoomEntry_Internal(), all the operations on the entry should be protected by a single lock so that it cannot change its status in the meantime.
Attachment #599341 - Flags: review?(michal.novotny)
Attached patch patchSplinter Review
OK, let's give this one a shot. It adds some larger-scale locking for a few things, and uses a reentrant monitor for locking to avoid proliferation of "Foo_Locked" methods (I believe there was another, more technical, reason behind that change as well, but it's been a while since I made that change, so I can't remember for certain). Try run at https://tbpl.mozilla.org/?tree=Try&rev=5839d177d837 looks just fine to me.
Attachment #599341 - Attachment is obsolete: true
Attachment #611941 - Flags: review?(michal.novotny)
Michal - ping on the r? this is still on the snappy projects radar.
Comment on attachment 611941 [details] [diff] [review]
patch

>  void
> -nsCacheService::ReleaseObject_Locked(nsISupports * obj,
> -                                     nsIEventTarget * target)
> +nsCacheService::ReleaseObject(nsISupports * obj, nsIEventTarget * target)
>  {
> -    gService->mLock.AssertCurrentThreadOwns();
> -
>      bool isCur;
>      if (!target || (NS_SUCCEEDED(target->IsOnCurrentThread(&isCur)) && isCur)) {
> -        gService->mDoomedObjects.AppendElement(obj);
> +        obj->Release();
>      } else {
>          NS_ProxyRelease(target, obj);
>      }
>  }

I'm still looking at the other changes, so this isn't a final review. Anyway, this change is definitely wrong. The object must be released outside the cache lock, otherwise there is a risk that the cache service will be used by the caller again and the cache lock will be reentered.

By the way, the current code is also buggy since in very rare cases we can call nsCacheService::ReleaseObject_Locked() outside the cache lock (the first "delete request;" in nsCacheService::OpenCacheEntry()).
> This patch protects every method of the cache entry by a lock,
> but we IMO need to protect several accesses by a single lock
> at some places.

This is what I think too.

In particular, why does nsCacheEntryDescriptor have a seperate lock from nsCacheEntry? Don't the operations that hold the nsCacheEntryDescriptor lock need to hold the nsCacheEntry lock instead/too? Currently, each method of nsCacheEntryDescriptor can rely on the fact that each cache entry will not change at all during the execution of the method (at least, for the parts that are currently holding the cache service lock).

For example, consider this code:

nsresult
nsCacheEntryDescriptor::RequestDataSizeChange(PRInt32 deltaSize)
{
    nsCacheServiceAutoLock lock;
    if (!mCacheEntry)  return NS_ERROR_NOT_AVAILABLE;

    nsresult  rv;
    rv = nsCacheService::OnDataSizeChange(mCacheEntry, deltaSize);
    if (NS_SUCCEEDED(rv)) {
        // XXX review for signed/unsigned math errors
/*1*/   PRUint32  newDataSize = mCacheEntry->DataSize() + deltaSize;
/*2*/   mCacheEntry->SetDataSize(newDataSize);
        mCacheEntry->TouchData();
    }
    return rv;
}

Doesn't mCacheEntry need to be locked across mCacheEntry->DataSize() mCacheEntry->SetDataSize(newDataSize), so that SetDataSize() or other similar operations do not occur between the two lines?
Sorry, that was a bad example. Here's one from the patch:

NS_IMETHODIMP nsCacheEntryDescriptor::GetDataSize(PRUint32 *result)
 {
+    nsCacheEntryDescriptorAutoLock lock(this);
     NS_ENSURE_ARG_POINTER(result);
-    nsCacheServiceAutoLock lock;
     if (!mCacheEntry)  return NS_ERROR_NOT_AVAILABLE;
 
/*1*/ const char* val = mCacheEntry->GetMetaDataElement("uncompressed-len");
     if (!val) {
/*2*/    *result = mCacheEntry->DataSize();
     } else {
         *result = atol(val);
     }
 
     return NS_OK;
 }

It isn't clear why the cache entry (vs just the descriptor) does not need to be locked across line 1 and line 2.

NS_IMETHODIMP
 nsCacheEntryDescriptor::SetExpirationTime(PRUint32 expirationTime)
 {
-    nsCacheServiceAutoLock lock;
+    nsCacheEntryDescriptorAutoLock lock(this);
     if (!mCacheEntry)  return NS_ERROR_NOT_AVAILABLE;
 
/*1*/mCacheEntry->SetExpirationTime(expirationTime);
/*2*/mCacheEntry->MarkEntryDirty();
     return NS_OK;
 }

It seems to me that lines 1 & 2 should be done atomically (the cache entry lock should be held over both calls).

I did not look at the other parts of the patch in detail, but this is my general concern.

I did not investigate the possibility of deadlock as the result of acquiring the nsCacheEntryDescriptor lock, nsCacheEntry lock, and/or cache service lock out of order, because the required lock acquisition order isn't documented.
Closing this as WONTFIX since this won't get into fx14 as our original plan had... planned, now that m-c is approval required. Since the plan is to get bug 722034 in for fx15 (which makes the changes proposed in this bug moot), there's no point in spending development effort on this right now. If bug 722034 seems like more work than initially thought (and we DON'T think we'll get it in for fx15), we'll reopen this so we can get the short-term gain again.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Attachment #611941 - Flags: review?(michal.novotny)
You need to log in before you can comment on or make changes to this bug.