Last Comment Bug 513008 - Eliminate synchronous reads from cache (make cache reads asynchronous)
: Eliminate synchronous reads from cache (make cache reads asynchronous)
Status: RESOLVED FIXED
: dev-doc-complete
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: Trunk
: All All
: -- normal with 4 votes (vote)
: mozilla2.0b5
Assigned To: Michal Novotny (:michal)
:
:
Mentors:
Depends on: 556698 578939 589296 591046 594882
Blocks: 695399
  Show dependency treegraph
 
Reported: 2009-08-27 10:02 PDT by Vladimir Vukicevic [:vlad] [:vladv]
Modified: 2011-10-18 11:38 PDT (History)
39 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
betaN+


Attachments
patch v1 (26.48 KB, patch)
2010-01-06 07:34 PST, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review
patch v2 - WIP (27.72 KB, patch)
2010-03-29 08:20 PDT, Michal Novotny (:michal)
jduell.mcbugs: review+
Details | Diff | Splinter Review
patch v3 (28.89 KB, patch)
2010-04-07 19:11 PDT, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review
patch v4 (38.81 KB, patch)
2010-04-12 17:31 PDT, Michal Novotny (:michal)
jduell.mcbugs: review+
Details | Diff | Splinter Review
patch v5 (38.71 KB, patch)
2010-05-18 03:13 PDT, Michal Novotny (:michal)
shaver: superreview+
Details | Diff | Splinter Review
patch v6 (38.46 KB, patch)
2010-06-21 08:38 PDT, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review
patch v7 (44.54 KB, patch)
2010-06-29 02:05 PDT, Michal Novotny (:michal)
jduell.mcbugs: review+
Details | Diff | Splinter Review
patch v8 (39.93 KB, patch)
2010-07-21 07:21 PDT, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review
patch v10 (46.50 KB, patch)
2010-08-18 07:49 PDT, Michal Novotny (:michal)
no flags Details | Diff | Splinter Review
difference between v8 and v10 (7.17 KB, patch)
2010-08-19 11:13 PDT, Michal Novotny (:michal)
sdwilsh: review+
Details | Diff | Splinter Review

Description Vladimir Vukicevic [:vlad] [:vladv] 2009-08-27 10:02:12 PDT
Right now, the cache does reads and, more importantly, writes synchronously in its channel implementation.  While this does happen on an IO thread, for writes in particular, that sync write means that the write will block any future reads from the socket (and thus handing off data to the parsers) until we finish that writing.  This is really very very bad for mobile perf.

This is intended as a stopgap until we do bug 512849.
Comment 1 Michal Novotny (:michal) 2010-01-06 07:34:17 PST
Created attachment 420327 [details] [diff] [review]
patch v1

This patch tries to eliminate only blocking reads since the writes will be fixed in bug #513074. This version is not yet ready for review, but I would like to hear some opinions. Also can someone suggest a way to verify a performance gain?

It works so that there is a nsIInputStreamAsyncWrapper that converts any blocking input stream into async input stream. Reads are done on a background thread. Anyone who wants to read from the cache asynchronously will use this wrapper initialized with input stream obtained from cache entry descriptor (the patch currently does this only in nsHttpChannel).

What needs to be changed yet:
 - use one global thread or threadpool
 - implement nsISeekableStream?
Comment 2 Shawn Wilsher :sdwilsh 2010-01-08 15:08:54 PST
(In reply to comment #1)
> This patch tries to eliminate only blocking reads since the writes will be
> fixed in bug #513074. This version is not yet ready for review, but I would
> like to hear some opinions. Also can someone suggest a way to verify a
> performance gain?
It may be difficult to do that, unless you have a really slow hard drive.  A lot of this depends on a number of factors (disk fragmentation, disk cache, file system, disk load, etc), so getting good consistent numbers can be hard.

> It works so that there is a nsIInputStreamAsyncWrapper that converts any
> blocking input stream into async input stream. Reads are done on a background
> thread. Anyone who wants to read from the cache asynchronously will use this
> wrapper initialized with input stream obtained from cache entry descriptor (the
> patch currently does this only in nsHttpChannel).
What about using NS_AsyncCopy instead of creating a whole new stream wrapper?
Comment 3 Johnny Stenback (:jst, jst@mozilla.com) 2010-01-08 15:36:27 PST
Stuart et al, this could use some testing on some slow mobile devices where we know disk IO on the main thread is a big deal. The patch is not ready to land, but should apply and show whether it's worth spending time on getting this ready to land and so on.
Comment 4 Michal Novotny (:michal) 2010-01-10 18:00:31 PST
(In reply to comment #2)
> It may be difficult to do that, unless you have a really slow hard drive.  A
> lot of this depends on a number of factors (disk fragmentation, disk cache,
> file system, disk load, etc), so getting good consistent numbers can be hard.

I can imagine a testing environment, e.g. profile directory on filesystem located on remote nbd or iscsi, that is accessed via some throttled bandwidth (e.g. OpenVPN). But how should look the testing page and what exactly should I measure?

> What about using NS_AsyncCopy instead of creating a whole new stream wrapper?

Where exactly would you use the NS_AsyncCopy? The nsIInputStreamAsyncWrapper is really easy to use, see the tiny change in nsHttpChannel.cpp. Also NS_AsyncCopy copies always all content but I think that it is useful to have the possibility to read asynchronously only specified amount of data.
Comment 5 Shawn Wilsher :sdwilsh 2010-03-01 12:44:34 PST
(In reply to comment #4)
> Where exactly would you use the NS_AsyncCopy? The nsIInputStreamAsyncWrapper is
> really easy to use, see the tiny change in nsHttpChannel.cpp. Also NS_AsyncCopy
> copies always all content but I think that it is useful to have the possibility
> to read asynchronously only specified amount of data.
In what situations do we need to do that?  Having concrete data here would be good, otherwise I'm not sure the extra code is worthwhile.
Comment 6 Michal Novotny (:michal) 2010-03-17 19:17:08 PDT
(In reply to comment #5)
> In what situations do we need to do that?  Having concrete data here would be
> good, otherwise I'm not sure the extra code is worthwhile.

We don't need it right now. The bad is that while looking at where exactly to put the NS_AsyncCopy I found out that reading from the cache is already done on separate thread using NS_AsyncCopy. nsCacheEntryDescriptor::nsInputStreamWrapper is blocking therefore a transport that copies data asynchronously is created in nsInputStreamPump::AsyncRead.

In GDB I caught only this PR_Read on the main thread:

PR_Read
nsDiskCacheBlockFile::ReadBlocks
nsDiskCacheMap::ReadDiskCacheEntry
nsDiskCacheDevice::FindEntry
nsCacheService::SearchCacheDevices
nsCacheService::ActivateEntry
nsCacheService::ProcessRequest
nsCacheService::OpenCacheEntry

Could someone with DTrace check where are all the reads on the main thread called from?
Comment 7 Shawn Wilsher :sdwilsh 2010-03-17 21:23:35 PDT
(In reply to comment #6)
> Could someone with DTrace check where are all the reads on the main thread
> called from?
From http://files.shawnwilsher.com/2010/3/9/data.log:
operation:read (/Users/sdwilsh/Library/Caches/Firefox/Profiles/bsw4j032.io/Cache/_CACHE_001_)
              libSystem.B.dylib`read$UNIX2003+0xa
              XUL`nsDiskCacheBlockFile::ReadBlocks(void*, int, int, int*)+0xf0
              XUL`nsDiskCacheMap::ReadDiskCacheEntry(nsDiskCacheRecord*)+0x1d6
              XUL`nsDiskCacheDevice::FindEntry(nsCString*, int*)+0xc4
              XUL`nsCacheService::SearchCacheDevices(nsCString*, int, int*)+0xc7
              XUL`nsCacheService::ActivateEntry(nsCacheRequest*, nsCacheEntry**)+0xbd
              XUL`nsCacheService::ProcessRequest(nsCacheRequest*, int, nsICacheEntryDescriptor**)+0x55
              XUL`nsCacheService::OpenCacheEntry(nsCacheSession*, nsACString_internal const&, int, int, nsICacheListener*, nsICacheEntryDescriptor**)+0xc6
              XUL`nsCacheSession::OpenCacheEntry(nsACString_internal const&, int, int, nsICacheEntryDescriptor**)+0x35
              XUL`nsHttpChannel::OpenCacheEntry(int, int*)+0x4fb
              XUL`nsHttpChannel::Connect(int)+0x21e
              XUL`nsHttpChannel::AsyncOpen(nsIStreamListener*, nsISupports*)+0x360
Comment 8 Michal Novotny (:michal) 2010-03-23 18:19:58 PDT
Hmm, it isn't hard to do AsyncOpenCacheEntry() on a background thread but it isn't so easy with OpenCacheEntry() since it should succeed/fail synchronously. My idea is that we could post methods that do IO operations to the background thread and call NS_ProcessNextEvent() while waiting for the result. During that time we can't hold the cache lock on the main thread due to possible deadlock. I don't think we can safely release the lock anywhere in nsCacheService::ProcessRequest() so we should process whole this method on the background thread and implement the posting and event queue spinning in nsCacheService::OpenCacheEntry() (now I'm just talking about OpenCacheEntry() and ignoring that ProcessRequest() is called also from ProcessPendingRequests() which is called from DoomEntry_Internal(), CloseDescriptor() and ValidateEntry()). The problem is that in nsCacheService::OpenCacheEntry() we don't know if ProcessRequest() would do some IO and we must always post it to the background thread. So it is a question if there would be some performance gain.

We could also post only AsyncOpenCacheEntry() to the background thread and get rid of using OpenCacheEntry() in nsHttpChannel. This could be done probably easily in nsHttpChannel::OpenCacheEntry() but not in nsHttpChannel::OpenOfflineCacheEntryForWriting() and nsHttpChannel::MaybeInvalidateCacheEntryForSubsequentGet().

Any thoughts are appreciated.
Comment 9 Shawn Wilsher :sdwilsh 2010-03-23 19:11:35 PDT
(In reply to comment #8)
> Hmm, it isn't hard to do AsyncOpenCacheEntry() on a background thread but it
> isn't so easy with OpenCacheEntry() since it should succeed/fail synchronously.
Why does it need to fail synchronously?

> My idea is that we could post methods that do IO operations to the background
> thread and call NS_ProcessNextEvent() while waiting for the result. During that
> time we can't hold the cache lock on the main thread due to possible deadlock.
> I don't think we can safely release the lock anywhere in
> nsCacheService::ProcessRequest() so we should process whole this method on the
> background thread and implement the posting and event queue spinning in
> nsCacheService::OpenCacheEntry() (now I'm just talking about OpenCacheEntry()
> and ignoring that ProcessRequest() is called also from ProcessPendingRequests()
> which is called from DoomEntry_Internal(), CloseDescriptor() and
> ValidateEntry()). The problem is that in nsCacheService::OpenCacheEntry() we
> don't know if ProcessRequest() would do some IO and we must always post it to
> the background thread. So it is a question if there would be some performance
> gain.
It's not exactly safe to spin the event loop, so we should really try to avoid doing that.

> We could also post only AsyncOpenCacheEntry() to the background thread and get
> rid of using OpenCacheEntry() in nsHttpChannel. This could be done probably
> easily in nsHttpChannel::OpenCacheEntry() but not in
> nsHttpChannel::OpenOfflineCacheEntryForWriting() and
> nsHttpChannel::MaybeInvalidateCacheEntryForSubsequentGet().
Why can't we in the latter two?
Comment 10 Michal Novotny (:michal) 2010-03-24 09:39:55 PDT
(In reply to comment #9)
> Why does it need to fail synchronously?

Because it is how OpenCacheEntry() works.

> It's not exactly safe to spin the event loop, so we should really try to avoid
> doing that.

OK, so we need to use AsyncOpenCacheEntry() anywhere if possible.

> > rid of using OpenCacheEntry() in nsHttpChannel. This could be done probably
> > easily in nsHttpChannel::OpenCacheEntry() but not in
> > nsHttpChannel::OpenOfflineCacheEntryForWriting() and
> > nsHttpChannel::MaybeInvalidateCacheEntryForSubsequentGet().
> Why can't we in the latter two?

It seems to me that it will require quite a big change in the code, but I could be wrong. Anyway we should do it since we can't change behavior of OpenCacheEntry().
Comment 11 Shawn Wilsher :sdwilsh 2010-03-26 11:11:05 PDT
(In reply to comment #10)
> It seems to me that it will require quite a big change in the code, but I could
> be wrong. Anyway we should do it since we can't change behavior of
> OpenCacheEntry().
Agreed.  Do you have an estimate on when you might have a patch that does all this?
Comment 12 Michal Novotny (:michal) 2010-03-29 08:20:16 PDT
Created attachment 435632 [details] [diff] [review]
patch v2 - WIP

This patch eliminates now only calls to OpenCacheEntry in nsCacheService::OpenCacheEntry(). For now, please review only changes in nsHttpChannel and ignore changes in the cache part.
Comment 13 Jason Duell [:jduell] (needinfo me) 2010-04-01 22:47:58 PDT
Comment on attachment 435632 [details] [diff] [review]
patch v2 - WIP

Sorry to take so long to review this--I wasn't familiar with this code, and it's fairly deep spaghetti :)   

I think the patch looks good.  Despite adding more logic, it actually makes the code slightly easier to follow.  There are some nits, though, and at least one error that needs fixing.

I assume we've got decent test coverage of app caching in mochitests?  This code definitely needs to be extensively tested.  And I think biesi or bz should sr it.


>diff --git a/netwerk/protocol/http/src/nsHttpChannel.cpp 
>+        if (mLoadFlags & LOAD_BYPASS_LOCAL_CACHE_IF_BUSY) {
>+            // we must use synchronous open when LOAD_BYPASS_LOCAL_CACHE_IF_BUSY
>+            // was specified

shorter:
  // must use synchronous open for LOAD_BYPASS_LOCAL_CACHE_IF_BUSY


>+            rv = session->OpenCacheEntry(cacheKey,
>+                                         nsICache::ACCESS_READ, PR_FALSE,
>+                                         getter_AddRefs(mCacheEntry));
>+            if (NS_SUCCEEDED(rv)) {
>+                mCacheEntry->GetAccessGranted(&mCacheAccess);
>+                LOG(("nsHttpChannel::OpenCacheEntry [this=%p grantedAccess=%d]", this, mCacheAccess));
>+                return NS_OK;

I'm pretty sure you should set mLoadedFromApplicationCache = PR_TRUE here before returning--the code in trunk does it for the equivalent logic, and you do it for the async case, but you're missing it here.


>+            rv = session->AsyncOpenCacheEntry(cacheKey,
>+                                              nsICache::ACCESS_READ,
>+                                              this);
>+
>+            if (NS_SUCCEEDED(rv) || rv == NS_ERROR_CACHE_WAIT_FOR_VALIDATION) {
>+                *delayed = PR_TRUE;
>+                return NS_OK;
>             }

AFAICT you don't need to check for NS_ERROR_CACHE_WAIT_FOR_VALIDATION here--nsCacheSession::AsyncOpenCacheEntry converts it to NS_OK.


>+nsresult
>+nsHttpChannel::OnOfflineCacheEntryAvailable(nsICacheEntryDescriptor *aEntry,
>+                                            nsCacheAccessMode aAccess,
>+                                            nsresult aStatus,
>+                                            PRBool *aDelayed)
>+{

So this is called with aDelayed != null only if a sync request failed?  I'd like to see a comment explaining that (and an assert).  Otherwise it takes a while to figure out what's going on here.

More broadly: the whole "delayed" chain starts as a local variabe in Connect(), and is winding up running down quite a chain of function calls, where it ends up also meaning "this function is called synchronously".  Would it be better to just make it a member variable, say "mAsyncCacheOpen", and then have the function parameters be "bool sync"?  I think this would make the logic much clearer when you look at the function signatures.


>+    if (NS_SUCCEEDED(aStatus)) {
>+        // We successfully opened an offline cache session and the entry,
>+        // now indiciate we load from the offline cache.

edit comment: s/now indiciate we/so indicate we will/


>+    if (mCanceled && NS_FAILED(mStatus)) {
>+        LOG(("channel was canceled [this=%p status=%x]\n", this, mStatus));
>+        return mStatus;
>+    }
>+
>+    if (NS_SUCCEEDED(aStatus))
>+        // Called from OnCacheAvailable, advance to the next state
>+        return Connect(PR_FALSE);

Do you really want to drop down to Connect again if mCanceled, but mStatus is OK? (Is that possible?)

Is there any reason why the "return Connect" above shouldn't be part of the first if(NS_SUCCEEDED) block (the one with the indiciate typo)?

Also, in comment about "Called from OnCacheAvailable": s/OnCacheAvailable/onCacheEntryAvailable/


>+    if (!mCacheForOfflineUse && !mFallbackChannel) {

Would be nice to have a comment explaining what's going on here--i.e.  namespace search of app cache?


>+        if (aDelayed)
>+            NS_ENSURE_SUCCESS(rv, rv);
>+        else
>+            return Connect(PR_FALSE);

Either the "return Connect" here is wrong, or the next two are.  See the next few comments.

>+
>+        PRUint32 namespaceType = 0;
>+        if (!namespaceEntry ||
>+            NS_FAILED(namespaceEntry->GetItemType(&namespaceType)) ||
>+            (namespaceType &
>+             (nsIApplicationCacheNamespace::NAMESPACE_FALLBACK |
>+              nsIApplicationCacheNamespace::NAMESPACE_OPPORTUNISTIC |
>+              nsIApplicationCacheNamespace::NAMESPACE_BYPASS)) == 0) {
>+            // When loading from an application cache, only items
>+            // on the whitelist or matching a
>+            // fallback/opportunistic namespace should hit the
>+            // network...
>+            mLoadFlags |= LOAD_ONLY_FROM_CACHE;
>+
>+            // ... and if there were an application cache entry,
>+            // we would have found it earlier.
>+            if (aDelayed)
>+                return NS_ERROR_CACHE_KEY_NOT_FOUND;
>+            else
>+                return Connect(PR_FALSE);

We already called "return Connect" above if !aDelayed, so I don't see how we can get to the 2nd return Connect here.


>+        if (namespaceType &
>+            nsIApplicationCacheNamespace::NAMESPACE_FALLBACK) {
>+            rv = namespaceEntry->GetData(mFallbackKey);
>+            if (aDelayed)
>+                NS_ENSURE_SUCCESS(rv, rv);
>+            else
>+                return Connect(PR_FALSE);

Ditto for this 3rd "return Connect" as well.


>+    if (mLoadFlags & LOAD_BYPASS_LOCAL_CACHE_IF_BUSY) {
>+        if (!delayed) {
>+            // This is unexpected state. Delayed is valid iff called from
>+            // Connect(). We were called from OnCacheEntryAvailable() and that
>+            // shouldn't happen when LOAD_BYPASS_LOCAL_CACHE_IF_BUSY flag was
>+            // specifed unless somebody altered mLoadFlags between
>+            // OpenCacheEntry() and OnCacheEntryAvailable().

Shorter version:

            // Unexpected state: delayed==null iff we were called from
            // OnCacheEntryAvailable(), so LOAD_BYPASS_LOCAL_CACHE_IF_BUSY
            // shouldn't be set. Unless somebody altered mLoadFlags between
            // OpenCacheEntry() and OnCacheEntryAvailable()...

>+            NS_WARNING(
>+                "OpenNormalCacheEntry() called from OnCacheEntryAvailable() when"
>+                "LOAD_BYPASS_LOCAL_CACHE_IF_BUSY was specified");

Move "when" to next line to avoid 80+ character line


>+        // we must use synchronous open when LOAD_BYPASS_LOCAL_CACHE_IF_BUSY
>+        // was specified

Shorter:
  // must use synchronous open for LOAD_BYPASS_LOCAL_CACHE_IF_BUSY

>+    else {
>+        mOnCacheEntryAvailableCallback = &nsHttpChannel::OnNormalCacheEntryAvailable;
>         rv = session->AsyncOpenCacheEntry(cacheKey, accessRequested, this);
>-        if (NS_FAILED(rv)) return rv;
>-        // we'll have to wait for the cache entry
>-        *delayed = PR_TRUE;
>-    }
>-    else if (NS_SUCCEEDED(rv)) {
>-        mCacheEntry->GetAccessGranted(&mCacheAccess);
>-        LOG(("nsHttpChannel::OpenCacheEntry [this=%p grantedAccess=%d]", this, mCacheAccess));
>-    }
>+        if (NS_SUCCEEDED(rv) || rv == NS_ERROR_CACHE_WAIT_FOR_VALIDATION) {

Again, I think the check for NS_ERROR_CACHE_WAIT_FOR_VALIDATION isn't needed.


>+    if (!delayed)
>+        // Called from OnCacheAvailable, advance to the next state
>+        rv = Connect(PR_FALSE);

In comment about "Called from OnCacheAvailable": s/OnCacheAvailable/onCacheEntryAvailable

 
>+nsresult
>+nsHttpChannel::OnNormalCacheEntryAvailable(nsICacheEntryDescriptor *aEntry,
>+                                           nsCacheAccessMode aAccess,
>+                                           nsresult aStatus,
>+                                           PRBool *aDelayed)
>+{
>+    NS_ASSERTION(!aDelayed, "aDelayed should be null");
>+
>+    if (NS_SUCCEEDED(aStatus)) {
>+        mCacheEntry = aEntry;
>+        mCacheAccess = aAccess;
>+    }
>+
>+    if (mCanceled && NS_FAILED(mStatus)) {
>+        LOG(("channel was canceled [this=%p status=%x]\n", this, mStatus));
>+        return mStatus;
>+    }
>+
>+    if ((mLoadFlags & LOAD_ONLY_FROM_CACHE) && NS_FAILED(aStatus))
>+        // if this channel is only allowed to pull from the cache, then
>+        // we must fail if we were unable to open a cache entry.
>+        return NS_ERROR_DOCUMENT_NOT_CACHED;

I'd prefer

    if (NS_FAILED(aStatus) {
        if (mCanceled) {
            LOG(("channel was canceled [this=%p status=%x]\n", this, mStatus));
            return mStatus;
        } 
        if ((mLoadFlags & LOAD_ONLY_FROM_CACHE) && NS_FAILED(aStatus)) {
            // if this channel is only allowed to pull from the cache, then
            // we must fail if we were unable to open a cache entry.
            return NS_ERROR_DOCUMENT_NOT_CACHED;
        }
    }

Good work!
Comment 14 Brendan Eich [:brendan] 2010-04-01 23:16:17 PDT
(In reply to comment #13)
> >+{
> >+    NS_ASSERTION(!aDelayed, "aDelayed should be null");
> >+
> >+    if (NS_SUCCEEDED(aStatus)) {
> >+        mCacheEntry = aEntry;
> >+        mCacheAccess = aAccess;
> >+    }
> >+
> >+    if (mCanceled && NS_FAILED(mStatus)) {

Note mStatus test here.

> >+        LOG(("channel was canceled [this=%p status=%x]\n", this, mStatus));
> >+        return mStatus;
> >+    }
> >+
> >+    if ((mLoadFlags & LOAD_ONLY_FROM_CACHE) && NS_FAILED(aStatus))

Note aStatus test here.

> >+        // if this channel is only allowed to pull from the cache, then
> >+        // we must fail if we were unable to open a cache entry.
> >+        return NS_ERROR_DOCUMENT_NOT_CACHED;
> 
> I'd prefer
> 
>     if (NS_FAILED(aStatus) {
>         if (mCanceled) {
>             LOG(("channel was canceled [this=%p status=%x]\n", this, mStatus));
>             return mStatus;
>         } 
>         if ((mLoadFlags & LOAD_ONLY_FROM_CACHE) && NS_FAILED(aStatus)) {
>             // if this channel is only allowed to pull from the cache, then
>             // we must fail if we were unable to open a cache entry.
>             return NS_ERROR_DOCUMENT_NOT_CACHED;
>         }
>     }

Since mStatus is not necessarily aStatus, this is not equivalent -- or so it seems from my drive-by point of view. Hope this helps,

/be
Comment 15 Jason Duell [:jduell] (needinfo me) 2010-04-02 11:33:37 PDT
nice catch--I didn't notice the aStatus/mStatus difference.
Comment 16 Johnny Stenback (:jst, jst@mozilla.com) 2010-04-02 15:32:04 PDT
Let's get this done for 1.9.3.
Comment 17 Michal Novotny (:michal) 2010-04-07 19:11:32 PDT
Created attachment 437740 [details] [diff] [review]
patch v3

(In reply to comment #13)
> (From update of attachment 435632 [details] [diff] [review])
> Sorry to take so long to review this--I wasn't familiar with this code, and
> it's fairly deep spaghetti :)

Actually that was one of the quickest reviews I got :)


> I assume we've got decent test coverage of app caching in mochitests?  This
> code definitely needs to be extensively tested.

I need to check if the tests exist. If not, I'll create them.


> >+nsresult
> >+nsHttpChannel::OnOfflineCacheEntryAvailable(nsICacheEntryDescriptor *aEntry,
> >+                                            nsCacheAccessMode aAccess,
> >+                                            nsresult aStatus,
> >+                                            PRBool *aDelayed)
> >+{
> 
> So this is called with aDelayed != null only if a sync request failed?  I'd
> like to see a comment explaining that (and an assert).  Otherwise it takes a
> while to figure out what's going on here.
> 
> More broadly: the whole "delayed" chain starts as a local variabe in Connect(),
> and is winding up running down quite a chain of function calls, where it ends
> up also meaning "this function is called synchronously".  Would it be better to
> just make it a member variable, say "mAsyncCacheOpen", and then have the
> function parameters be "bool sync"?  I think this would make the logic much
> clearer when you look at the function signatures.

aDelayed is always non-null when called from Connect(), so in this case when sync or async request failed in OpenCacheEntry(). What should be the condition for the assertion? Anyway I've changed it to suggested mAsyncCacheOpen and aSync.


> >+    if (mCanceled && NS_FAILED(mStatus)) {
> >+        LOG(("channel was canceled [this=%p status=%x]\n", this, mStatus));
> >+        return mStatus;
> >+    }
> >+
> >+    if (NS_SUCCEEDED(aStatus))
> >+        // Called from OnCacheAvailable, advance to the next state
> >+        return Connect(PR_FALSE);
> 
> Do you really want to drop down to Connect again if mCanceled, but mStatus is
> OK? (Is that possible?)

In theory it could happen if somebody calls Cancel() with a success value (though we don't do this internally). I'm not sure that this is correct, but AFAICS this is the same behavior as in the old code.


> Is there any reason why the "return Connect" above shouldn't be part of the
> first if(NS_SUCCEEDED) block (the one with the indiciate typo)?

Because of the mCanceled check.


> >+    if (!mCacheForOfflineUse && !mFallbackChannel) {
> 
> Would be nice to have a comment explaining what's going on here--i.e. 
> namespace search of app cache?

I've just took the old code and tried to keep the logic with async opening. I'm afraid I don't know the appCache part enough to describe it exactly.


> Either the "return Connect" here is wrong, or the next two are.  See the next
> few comments.
> 
...
> 
> Ditto for this 3rd "return Connect" as well.

You're right! These 2 statements were wrong.


> > I'd prefer
> > 
> >     if (NS_FAILED(aStatus) {
> >         if (mCanceled) {
> >             LOG(("channel was canceled [this=%p status=%x]\n", this, mStatus));
> >             return mStatus;
> >         } 
> >         if ((mLoadFlags & LOAD_ONLY_FROM_CACHE) && NS_FAILED(aStatus)) {
> >             // if this channel is only allowed to pull from the cache, then
> >             // we must fail if we were unable to open a cache entry.
> >             return NS_ERROR_DOCUMENT_NOT_CACHED;
> >         }
> >     }
>
> nice catch--I didn't notice the aStatus/mStatus difference.

I agree that the names are easy to confuse. I've kept my code but I've renamed aStatus to aResult in OnOfflineCacheEntryAvailable() and OnNormalCacheEntryAvailable().
Comment 18 Michal Novotny (:michal) 2010-04-12 17:31:07 PDT
Created attachment 438628 [details] [diff] [review]
patch v4

Cache IO thread was added to nsCacheService and mCacheWriteThread was removed from nsHttpHandler.
Comment 19 Shawn Wilsher :sdwilsh 2010-05-03 12:18:41 PDT
Jason - do you know when you might be able to get to this review?
Comment 20 Jason Duell [:jduell] (needinfo me) 2010-05-14 01:05:02 PDT
Comment on attachment 438628 [details] [diff] [review]
patch v4

Looks good.  Only a couple of things.

You mentioned that you'd either find or write tests for this code.  There's no new test in the patch--did you find any existing tests?   I'm giving you a +r so this can move to the sr stage, but we'll need test coverage for as much of the many codepaths this touches before we can check this in. 


>+nsresult
>+nsHttpChannel::OnOfflineCacheEntryAvailable(nsICacheEntryDescriptor *aEntry,
>+                                            nsCacheAccessMode aAccess,
>+                                            nsresult aResult,
>+                                            PRBool aSync)

'aResult' is a terrible name.  "result" is generally understood to be an "out" parameter.  I understand you want to distinguish it from "mStatus":  I suggest you call it something like "aEntryStatus".

I don't really like 'aSync' either--it seems possible to parse it as meaning 'async', when it really means 'sync.'  But I can live with it.


>+        // Check for namespace match.
>+        nsCOMPtr<nsIApplicationCacheNamespace> namespaceEntry;
>+        rv = mApplicationCache->GetMatchingNamespace
>+            (cacheKey, getter_AddRefs(namespaceEntry));
>+        NS_ENSURE_SUCCESS(rv, aSync ? rv : Connect(PR_FALSE));

So you're now using the NS_ENSURE_SUCCESS line above, when you used to have

            if (aDelayed)
                NS_ENSURE_SUCCESS(rv, rv);
            else
                return Connect(PR_FALSE);      

This is not equivalent.  Previously you would always "return Connect", but now you'll only return if Connect does not return a success code.  Is that OK?  I'm guessing it's not.

>+        if (namespaceType &
>+            nsIApplicationCacheNamespace::NAMESPACE_FALLBACK) {
>+            rv = namespaceEntry->GetData(mFallbackKey);
>+            NS_ENSURE_SUCCESS(rv, aSync ? rv : Connect(PR_FALSE));
>+        }

Same issue with NS_ENSURE_SUCCESS call.


>-    if (!gHttpHandler->mCacheWriteThread ||
>-         NS_FAILED(rv) ||
>-         policy == nsICache::STORE_ON_DISK_AS_FILE) {
>-        LOG(("nsHttpChannel::InstallCacheListener sync tee %p\n", tee.get()));
>+    if (NS_FAILED(rv) || policy == nsICache::STORE_ON_DISK_AS_FILE) {
>+        LOG(("nsHttpChannel::InstallCacheListener sync tee %p rv=%x policy=%d",
>+            tee.get(), rv, policy));
>         rv = tee->Init(mListener, out, nsnull);
>     } else {
>-        LOG(("nsHttpChannel::InstallCacheListener async tee %p\n",
>+        nsCOMPtr<nsICacheService> serv =
>+            do_GetService(NS_CACHESERVICE_CONTRACTID, &rv);
>+        NS_ENSURE_SUCCESS(rv, rv);
>+
>+        nsCOMPtr<nsIEventTarget> cacheIOTarget;
>+        rv = serv->GetCacheIOTarget(getter_AddRefs(cacheIOTarget));
>+
>+        if (NS_FAILED(rv) || !cacheIOTarget) {
>+            LOG(("nsHttpChannel::InstallCacheListener sync tee %p because of "
>+                "missing cache IO target)", tee.get()));
>+            rv = tee->Init(mListener, out, nsnull);
>+        } else {
>+            LOG(("nsHttpChannel::InstallCacheListener async tee %p",
>                 tee.get()));
>-        rv = tee->InitAsync(mListener, gHttpHandler->mCacheWriteThread, out, nsnull);
>-    }
>-   
>+            rv = tee->InitAsync(mListener, cacheIOTarget, out, nsnull);
>+        }
>+    }
>+

So you've replaced

    if (missing cache thread || other_conditions)
        setup sync tee
    else
        setup async tee

with 

    if (other_conditions)
        setup sync tee
    else
        if (missing cache thread)
            setup sync tee
        else 
            setup async tee

Any reason why you couldn't keep the same, simpler branch structure?  It looks like you'd just need to move the check for nsICacheService's cacheIOTarget before the first if, right?

If there's a performance reason (i.e. common case is other_conditions == true), or if you want to distinguish these cases in the LOG calls (as you now do), then keep it.  But all things equal, it's nice to have shorter, simpler code.
Comment 21 Jason Duell [:jduell] (needinfo me) 2010-05-14 01:08:25 PDT
biesi, do you know how well our existing xpcshell/mochitests test our caching logic?   You're the most likely candidate to sr this, by the way, so if you can take a look at whether all the codepaths touched by this are already tested, or whether some need additional tests written, that would help move things along.
Comment 22 Shawn Wilsher :sdwilsh 2010-05-14 09:46:23 PDT
(In reply to comment #20)
> I don't really like 'aSync' either--it seems possible to parse it as meaning
> 'async', when it really means 'sync.'  But I can live with it.
aIsSync?
Comment 23 Johnny Stenback (:jst, jst@mozilla.com) 2010-05-14 11:06:36 PDT
(In reply to comment #20)
[...]
> >+        nsCOMPtr<nsIApplicationCacheNamespace> namespaceEntry;
> >+        rv = mApplicationCache->GetMatchingNamespace
> >+            (cacheKey, getter_AddRefs(namespaceEntry));
> >+        NS_ENSURE_SUCCESS(rv, aSync ? rv : Connect(PR_FALSE));
> 
> So you're now using the NS_ENSURE_SUCCESS line above, when you used to have
> 
>             if (aDelayed)
>                 NS_ENSURE_SUCCESS(rv, rv);
>             else
>                 return Connect(PR_FALSE);      
> 
> This is not equivalent.  Previously you would always "return Connect", but now
> you'll only return if Connect does not return a success code.  Is that OK?  I'm
> guessing it's not.

Random comment here w/o actually understanding what this code really does. I agree that the old and new are not equivalent, but the new code does always return if (!aDelayed). NS_ENSURE_SUCCESS() always returns if the first code is an error code, so the new code returns rv if aSync is true, or the return value of Connect() if it's false, whether Connect() returns success or error.
Comment 24 Jason Duell [:jduell] (needinfo me) 2010-05-14 14:18:49 PDT
Actually, I think jst's comment is wrong, too.  The new code returns iff GetMatchingNamespace fails.  If it succeeds, we used to return Connect if aSync, but we won't any more.

I also don't understand exactly what's going on here.  It seems weird to get the cacheKey, and then, even if it succeeds, call Connect unconditionally (the previous patch's logic) without keeping the cacheKey around (why bother looking it up then?).  So maybe the change in the new version is intentional.  Michal?
Comment 25 Michal Novotny (:michal) 2010-05-17 17:41:19 PDT
(In reply to comment #24)
> I also don't understand exactly what's going on here.  It seems weird to get
> the cacheKey, and then, even if it succeeds, call Connect unconditionally (the
> previous patch's logic) without keeping the cacheKey around (why bother looking
> it up then?).  So maybe the change in the new version is intentional.  Michal?

Yes, the change is intentional. You've pointed out the bad 'return Connect(...)' in comment #13 and I agreed that it was wrong. The original code was:

// Check for namespace match.
nsCOMPtr<nsIApplicationCacheNamespace> namespaceEntry;
rv = mApplicationCache->GetMatchingNamespace
    (cacheKey, getter_AddRefs(namespaceEntry));
NS_ENSURE_SUCCESS(rv, rv);

This code was called always synchronously from Connect, so if GetMatchingNamespace failed then returning an error code led to continuing without the cache entry.

With this patch the code can be called from Connect or from OnCacheEntryAvailable. In case it is called from Connect (aIsSync is true) we want to do the same as before - just throw an error. In case it is called from OnCacheEntryAvailable we need to call Connect(PR_FALSE) to start proceeding the request without the cache entry. Anyway in both cases we want to return from OnOfflineCacheEntryAvailable when GetMatchingNamespace fails.
Comment 26 Jason Duell [:jduell] (needinfo me) 2010-05-17 21:59:10 PDT
Comment on attachment 438628 [details] [diff] [review]
patch v4

OK, this is ready for sr.

Biesi, it would be great to know which of the codepaths this touches are tested by our existing xpcshell/mochi tests, and which we might want to write new tests for.
Comment 27 Michal Novotny (:michal) 2010-05-18 03:13:53 PDT
Created attachment 445931 [details] [diff] [review]
patch v5

(In reply to comment #20)
> You mentioned that you'd either find or write tests for this code.  There's no
> new test in the patch--did you find any existing tests?   I'm giving you a +r
> so this can move to the sr stage, but we'll need test coverage for as much of
> the many codepaths this touches before we can check this in. 

I haven't looked at this yet. I suppose that some new tests will be needed. I'll create a separate patch...


> 'aResult' is a terrible name.  "result" is generally understood to be an "out"
> parameter.  I understand you want to distinguish it from "mStatus":  I suggest
> you call it something like "aEntryStatus".

Fixed.


> I don't really like 'aSync' either--it seems possible to parse it as meaning
> 'async', when it really means 'sync.'  But I can live with it.

I've renamed it to aIsSync.


> So you've replaced
> 
>     if (missing cache thread || other_conditions)
>         setup sync tee
>     else
>         setup async tee
> 
> with 
> 
>     if (other_conditions)
>         setup sync tee
>     else
>         if (missing cache thread)
>             setup sync tee
>         else 
>             setup async tee
> 
> Any reason why you couldn't keep the same, simpler branch structure?  It looks
> like you'd just need to move the check for nsICacheService's cacheIOTarget
> before the first if, right?
> 
> If there's a performance reason (i.e. common case is other_conditions == true),
> or if you want to distinguish these cases in the LOG calls (as you now do),
> then keep it.  But all things equal, it's nice to have shorter, simpler code.

Yes, that was an attempt at performance optimization, but I've changed it because other_condition is false probably in most cases.
Comment 28 Shawn Wilsher :sdwilsh 2010-06-08 13:18:35 PDT
Can somebody else possibly do this sr if biesi cannot get to it soon?  We have ten days until the beta freeze, and this is blocking b1.
Comment 29 :Ehsan Akhgari 2010-06-10 08:26:49 PDT
Comment on attachment 445931 [details] [diff] [review]
patch v5

bz is the only one besides biesi who can sr netwerk patches.
Comment 30 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-06-10 08:36:07 PDT
Any sr can sr.  Does this affect interfaces?  Want to make sure we know what we're sring for.
Comment 31 :Ehsan Akhgari 2010-06-10 08:44:58 PDT
(In reply to comment #30)
> Any sr can sr.

Huh.  I think this page disagrees: <http://www.mozilla.org/hacking/reviewers.html>

> Does this affect interfaces?  Want to make sure we know what
> we're sring for.

AFAICT, the only thing in this patch which needs sr is the addition of the cacheIOTarget attribute to nsICacheService.
Comment 32 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-06-10 08:51:47 PDT
"Super-review does not require domain expertise (module owners and peers supply that, usually), so the areas below are not pigeon-holes -- you can solicit a super-review from any reviewer on the list, but using area as a guide will get quicker results in the typical case."

sr coming right up.
Comment 33 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-06-10 08:54:35 PDT
Comment on attachment 445931 [details] [diff] [review]
patch v5

cacheIOTarget would like a better comment, perhaps describing what events are dispatched (or a URL to MDC on which it's explained). "which is used for I/O operations" is unusefully vague.

Let's get the patch in, but make sure to document that piece better on MDC at the least.

sr=shaver
Comment 34 Christian :Biesinger (don't email me, ping me on IRC) 2010-06-11 08:49:24 PDT
Comment on attachment 445931 [details] [diff] [review]
patch v5

see comments below.

+++ b/netwerk/cache/src/nsCacheService.cpp
 #endif
-
 nsCacheService *   nsCacheService::gService = nsnull;

why'd you remove this empty line?

+    nsresult rv = NS_NewThread(getter_AddRefs(mCacheIOThread));
+    if (NS_FAILED(rv)) {
+        NS_WARNING("Can't create cache IO thread");

why not return rv; after this? this shouldn't happen normally.

+NS_IMETHODIMP nsCacheService::GetCacheIOTarget(nsIEventTarget * *aCacheIOTarget)
+{
+    NS_ENSURE_ARG_POINTER(aCacheIOTarget);

no need to nullcheck out params


+++ b/netwerk/protocol/http/src/nsHttpChannel.cpp

Why did you make mAsyncCacheOpen a member variable?

        NS_ENSURE_SUCCESS(rv, aIsSync ? rv : Connect(PR_FALSE));

This is kinda hard to read, why not:
  if (NS_FAILED(rv) && !aIsSync)
    return Connect(PR_FALSE);
  NS_ENSURE_SUCCESS(rv, rv);




+++ b/netwerk/protocol/http/src/nsHttpChannel.h

+    typedef nsresult (nsHttpChannel:: *nsOnCacheEntryAvailableCallback)(nsICacheEntryDescriptor *, nsCacheAccessMode, nsresult, PRBool);

80 columns please
Comment 35 Jason Duell [:jduell] (needinfo me) 2010-06-11 16:10:03 PDT
> Why did you make mAsyncCacheOpen a member variable?

See comment 13.  I might be wrong, but it seemed better than passing an extra arg through a long set of function calls.
Comment 36 Johnny Stenback (:jst, jst@mozilla.com) 2010-06-18 15:33:40 PDT
Michal, can you provide an updated patch here for someone to land ASAP, we'd love to have this in beta1, which is just a few days away now...
Comment 37 Michal Novotny (:michal) 2010-06-21 08:38:55 PDT
Created attachment 452729 [details] [diff] [review]
patch v6

(In reply to comment #34)
> +++ b/netwerk/cache/src/nsCacheService.cpp
>  #endif
> -
>  nsCacheService *   nsCacheService::gService = nsnull;
> 
> why'd you remove this empty line?

I've probably added some code that I removed later... Fixed.

> +    nsresult rv = NS_NewThread(getter_AddRefs(mCacheIOThread));
> +    if (NS_FAILED(rv)) {
> +        NS_WARNING("Can't create cache IO thread");
> 
> why not return rv; after this? this shouldn't happen normally.

Fixed.

> +NS_IMETHODIMP nsCacheService::GetCacheIOTarget(nsIEventTarget *
> *aCacheIOTarget)
> +{
> +    NS_ENSURE_ARG_POINTER(aCacheIOTarget);
> 
> no need to nullcheck out params

Fixed.

> +++ b/netwerk/protocol/http/src/nsHttpChannel.cpp
> 
> Why did you make mAsyncCacheOpen a member variable?

The first patch added a new meaning to the parameter delayed. It was null when the call was from OnCacheEntryAvailable. I've split it into aIsSync and mAsyncCacheOpen. As Jason already wrote, this seems to be a little easier to read.
 
>         NS_ENSURE_SUCCESS(rv, aIsSync ? rv : Connect(PR_FALSE));
> 
> This is kinda hard to read, why not:
>   if (NS_FAILED(rv) && !aIsSync)
>     return Connect(PR_FALSE);
>   NS_ENSURE_SUCCESS(rv, rv);

Fixed

> +++ b/netwerk/protocol/http/src/nsHttpChannel.h
> 
> +    typedef nsresult (nsHttpChannel::
> *nsOnCacheEntryAvailableCallback)(nsICacheEntryDescriptor *, nsCacheAccessMode,
> nsresult, PRBool);
> 
> 80 columns please

Fixed
Comment 38 Mike Beltzner [:beltzner, not reading bugmail] 2010-06-21 16:16:57 PDT
--> beta2+ but jst said he might land it tonight
Comment 39 Shawn Wilsher :sdwilsh 2010-06-21 16:29:32 PDT
If this is ready to land, I will land it tomorrow morning.  Just let me know if I need to do that.
Comment 40 Johnny Stenback (:jst, jst@mozilla.com) 2010-06-21 16:31:05 PDT
sdwilsh, if I don't manage to land this tonight then please do land tomorrow morning if you have a chance to!
Comment 41 :Ehsan Akhgari 2010-06-21 23:26:55 PDT
This was landed <http://hg.mozilla.org/mozilla-central/rev/b560649a36f1> and then backed out <http://hg.mozilla.org/mozilla-central/rev/e6fcd886df50> due to test failures.
Comment 42 Michal Novotny (:michal) 2010-06-29 02:05:06 PDT
Created attachment 454810 [details] [diff] [review]
patch v7

This patch fixes following test failures:

1) mochitest /tests/dom/tests/mochitest/ajax/offline/test_bypass.html

The same check for (mLoadFlags & LOAD_ONLY_FROM_CACHE) as after OpenCacheEntry() is needed in case Connect() is called again. Otherwise we would hit the net in case we shouldn't...


2) xpcshelltest content/unit/test_error_codes.js

Devices are created on the cache IO thread because of lazy initialization. The assertion occurs because StorageService is initialized off the main thread (see backtrace below). I've disabled lazy initialization for offline device. Does anybody have a better idea? Calling simply do_GetService("@mozilla.org/storage/service;1") in nsCacheService::Init() would also fix the problem but this would be IMHO ugly hack.

#3  0x0021a014 in mozalloc_abort (msg=
    0xf5dfe5ec "###!!! ASSERTION: Using observer service off the main thread!: 'Error', file /opt/moz/hg/src/xpcom/ds/nsObserverService.cpp, line 128")
    at /opt/moz/hg/src/memory/mozalloc/mozalloc_abort.cpp:75
#4  0x0229aaee in Abort (aMsg=0xf5dfe5ec "###!!! ASSERTION: Using observer service off the main thread!: 'Error', file /opt/moz/hg/src/xpcom/ds/nsObserverService.cpp, line 128")
    at /opt/moz/hg/src/xpcom/base/nsDebugImpl.cpp:379
#5  0x0229aab1 in NS_DebugBreak_P (aSeverity=1, aStr=0x2929290 "Using observer service off the main thread!", aExpr=0x2929287 "Error", aFile=
    0x2929188 "/opt/moz/hg/src/xpcom/ds/nsObserverService.cpp", aLine=128) at /opt/moz/hg/src/xpcom/base/nsDebugImpl.cpp:366
#6  0x0223ee7c in nsObserverService::AddObserver (this=0x8a30ce8, anObserver=0xf5200be4, aTopic=0x286de48 "xpcom-shutdown", ownsWeak=0)
    at /opt/moz/hg/src/xpcom/ds/nsObserverService.cpp:128
#7  0x01f19143 in mozilla::storage::Service::initialize (this=0xf5200be0) at /opt/moz/hg/src/storage/src/mozStorageService.cpp:189
#8  0x01f18cec in mozilla::storage::Service::getSingleton () at /opt/moz/hg/src/storage/src/mozStorageService.cpp:114
#9  0x01f18512 in mozilla::storage::ServiceConstructor (aOuter=0x0, aIID=..., aResult=0xf5dfecc0) at /opt/moz/hg/src/storage/build/mozStorageModule.cpp:53
#10 0x0222b110 in nsGenericFactory::CreateInstance (this=0xf5200bc8, aOuter=0x0, aIID=..., aResult=0xf5dfecc0) at nsGenericFactory.cpp:80
#11 0x022805d8 in nsComponentManagerImpl::CreateInstanceByContractID (this=0x893ef88, aContractID=0x257bde0 "@mozilla.org/storage/service;1", aDelegate=0x0, aIID=..., aResult=
    0xf5dfecc0) at /opt/moz/hg/src/xpcom/components/nsComponentManager.cpp:1725
#12 0x02281961 in nsComponentManagerImpl::GetServiceByContractID (this=0x893ef88, aContractID=0x257bde0 "@mozilla.org/storage/service;1", aIID=..., result=0xf5dfedac)
    at /opt/moz/hg/src/xpcom/components/nsComponentManager.cpp:2310
#13 0x0221e536 in CallGetService (aContractID=0x257bde0 "@mozilla.org/storage/service;1", aIID=..., aResult=0xf5dfedac) at nsComponentManagerUtils.cpp:94
#14 0x0221ea6a in nsGetServiceByContractIDWithError::operator() (this=0xf5dfeedc, aIID=..., aInstancePtr=0xf5dfedac) at nsComponentManagerUtils.cpp:288
#15 0x0103b726 in nsCOMPtr<mozIStorageService>::assign_from_gs_contractid_with_error (this=0xf5dfeec0, gs=..., aIID=...) at ../../dist/include/nsCOMPtr.h:1239
#16 0x0103a759 in nsCOMPtr<mozIStorageService>::nsCOMPtr (this=0xf5dfeec0, gs=...) at ../../dist/include/nsCOMPtr.h:612
#17 0x01031837 in nsOfflineCacheDevice::Init (this=0xf5200780) at /opt/moz/hg/src/netwerk/cache/nsDiskCacheDeviceSQL.cpp:980
#18 0x0101a73a in nsCacheService::CreateOfflineDevice (this=0x8fd51f8) at /opt/moz/hg/src/netwerk/cache/nsCacheService.cpp:1067 


3) xpcshelltest browser/components/places/tests/test_txnGUIDs.js

Cache IO thread is shutdown outside the cache service lock to prevent a deadlock.
Comment 43 Shawn Wilsher :sdwilsh 2010-06-29 09:49:59 PDT
(In reply to comment #42)
> Devices are created on the cache IO thread because of lazy initialization. The
> assertion occurs because StorageService is initialized off the main thread (see
> backtrace below). I've disabled lazy initialization for offline device. Does
> anybody have a better idea? Calling simply
> do_GetService("@mozilla.org/storage/service;1") in nsCacheService::Init() would
> also fix the problem but this would be IMHO ugly hack.
File a bug on storage.  This isn't supposed to be illegal, and it looks like I regressed it.  I can get that fixed for you pretty quickly.
Comment 44 Jason Duell [:jduell] (needinfo me) 2010-07-20 17:13:42 PDT
Comment on attachment 454810 [details] [diff] [review]
patch v7

Looks good.   Patch has bitrotted slightly.  

It sounds like sdwilsh thinks the lazy initialization for offline device should be kept--did we file/fix a bug on that?
Comment 45 Michal Novotny (:michal) 2010-07-21 07:18:22 PDT
> It sounds like sdwilsh thinks the lazy initialization for offline device should
> be kept--did we file/fix a bug on that?

Yes, the bug number is #578939.
Comment 46 Michal Novotny (:michal) 2010-07-21 07:21:52 PDT
Created attachment 459002 [details] [diff] [review]
patch v8

Lazy initialization of the offline device is kept in this patch.
Comment 47 Johnny Stenback (:jst, jst@mozilla.com) 2010-07-29 16:30:27 PDT
Pushed to mozilla-central now that bug 578939 is fixed.

http://hg.mozilla.org/mozilla-central/rev/1e8961782824
Comment 48 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2010-07-29 20:21:35 PDT
http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=1ac135c8a484

Backed out due to
s: talos-r3-fed64-015
TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/mozilla-central-fedora64-debug-u-xpcshell/build/xpcshell/tests/content/unit/test_error_codes.js | test failed (with xpcshell return code: -6), see following log:
Bug 560197 - [xpcshell-tests] Intermittent timeout after test_csputils.js, when running test_error_codes.js PROCESS-CRASH | /home/cltbld/talos-slave/mozilla-central-fedora64-debug-u-xpcshell/build/xpcshell/tests/content/unit/test_error_codes.js | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/mozilla-central-fedora64-debug-u-xpcshell/build/xpcshell/tests/test_necko/unit/test_fallback_no-cache-entry_passing.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/mozilla-central-fedora64-debug-u-xpcshell/build/xpcshell/tests/test_necko/unit/head_channels.js | Could not get contentLength - See following stack:
TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/mozilla-central-fedora64-debug-u-xpcshell/build/xpcshell/tests/test_necko/unit/head_channels.js | Error in onStartRequest: 2147500036 - See following stack:
TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/mozilla-central-fedora64-debug-u-xpcshell/build/xpcshell/tests/test_necko/unit/head_channels.js | Failed to load URL: 804b0046 - See following stack:
TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/mozilla-central-fedora64-debug-u-xpcshell/build/xpcshell/tests/test_necko/unit/head_channels.js | Error in onStopRequest: 2147500036 - See following stack:

all over Xo and Xd.
Comment 49 Michal Novotny (:michal) 2010-07-30 06:44:38 PDT
Bug 578939 doesn't seem to fix the problem with storage initialization. I'll have a look at it during the weekend.
Comment 50 Michal Novotny (:michal) 2010-08-04 17:35:35 PDT
There are 2 failing tests:

1) test_fallback_no-cache-entry_passing.js
I'm still investigating this.

2) test_error_codes.js
There is an assertion because nsOfflineCacheDevice is destroyed on a different thread than it was created and it isn't declared as threadsafe. If I do so (I think that the code is threadsafe) then I got assertion from mozstorage that the database must be closed on the same thread as it was opened (see below). Shawn, do you think that this can be fixed in the mozstorage code?

###!!! ASSERTION: Must close the database on the thread that you opened it with!: 'Error', file /opt/moz/hg/src/storage/src/mozStorageConnection.cpp, line 543

(gdb) bt 20
#0  0x00a96832 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00254a81 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x0025634a in abort () at abort.c:92
#3  0x0075e014 in mozalloc_abort (msg=
    0xffa9488c "###!!! ASSERTION: Must close the database on the thread that you opened it with!: 'Error', file /opt/moz/hg/src/storage/src/mozStorageConnection.cpp, line 543") at /opt/moz/hg/src/memory/mozalloc/mozalloc_abort.cpp:75
#4  0x0248115e in Abort (aMsg=
    0xffa9488c "###!!! ASSERTION: Must close the database on the thread that you opened it with!: 'Error', file /opt/moz/hg/src/storage/src/mozStorageConnection.cpp, line 543") at /opt/moz/hg/src/xpcom/base/nsDebugImpl.cpp:379
#5  0x02481121 in NS_DebugBreak_P (aSeverity=1, aStr=0x2ac0eec "Must close the database on the thread that you opened it with!", aExpr=0x2ac0ee6 "Error", 
    aFile=0x2ac0a50 "/opt/moz/hg/src/storage/src/mozStorageConnection.cpp", aLine=543) at /opt/moz/hg/src/xpcom/base/nsDebugImpl.cpp:366
#6  0x02065e95 in mozilla::storage::Connection::setClosedState (this=0xf5e00c38) at /opt/moz/hg/src/storage/src/mozStorageConnection.cpp:543
#7  0x02066341 in mozilla::storage::Connection::Close (this=0xf5e00c38) at /opt/moz/hg/src/storage/src/mozStorageConnection.cpp:630
#8  0x020651ee in mozilla::storage::Connection::~Connection (this=0xf5e00c38, __in_chrg=<value optimized out>)
    at /opt/moz/hg/src/storage/src/mozStorageConnection.cpp:326
#9  0x020653b1 in mozilla::storage::Connection::Release (this=0xf5e00c38) at /opt/moz/hg/src/storage/src/mozStorageConnection.cpp:329
#10 0x0206474d in nsRefPtr<mozilla::storage::Connection>::~nsRefPtr (this=0xf5e02b7c, __in_chrg=<value optimized out>)
    at ../../dist/include/nsAutoPtr.h:969
#11 0x0206d453 in mozilla::storage::StorageBaseStatementInternal::~StorageBaseStatementInternal (this=0xf5e02b78, __in_chrg=<value optimized out>)
    at /opt/moz/hg/src/storage/src/StorageBaseStatementInternal.h:75
#12 0x0206a638 in mozilla::storage::Statement::~Statement (this=0xf5e02b70, __in_chrg=<value optimized out>)
    at /opt/moz/hg/src/storage/src/mozStorageStatement.cpp:282
#13 0x0206a7a7 in mozilla::storage::Statement::Release (this=0xf5e02b70) at /opt/moz/hg/src/storage/src/mozStorageStatement.cpp:288
#14 0x010f6ca4 in nsCOMPtr<mozIStorageStatement>::~nsCOMPtr (this=0xf5e00820, __in_chrg=<value optimized out>) at ../../dist/include/nsCOMPtr.h:533
#15 0x010ecee4 in nsOfflineCacheDevice::~nsOfflineCacheDevice (this=0xf5e00808, __in_chrg=<value optimized out>)
    at /opt/moz/hg/src/netwerk/cache/nsDiskCacheDeviceSQL.cpp:787
#16 0x010ecf49 in nsOfflineCacheDevice::~nsOfflineCacheDevice (this=0xf5e00808, __in_chrg=<value optimized out>)
    at /opt/moz/hg/src/netwerk/cache/nsDiskCacheDeviceSQL.cpp:787
#17 0x010ecaa4 in nsOfflineCacheDevice::Release (this=0xf5e00808) at /opt/moz/hg/src/netwerk/cache/nsDiskCacheDeviceSQL.cpp:775
#18 0x010d6834 in nsCacheService::Shutdown (this=0x9f30eb0) at /opt/moz/hg/src/netwerk/cache/nsCacheService.cpp:778
#19 0x010d5118 in nsCacheProfilePrefObserver::Observe (this=0x9f326e0, subject=0x9cb3f14, topic=0x2b9023c "xpcom-shutdown", data_unicode=0x0)
    at /opt/moz/hg/src/netwerk/cache/nsCacheService.cpp:262
Comment 51 Shawn Wilsher :sdwilsh 2010-08-04 17:51:34 PDT
(In reply to comment #50)
> Shawn, do you think that this can be fixed in the mozstorage code?
> 
> ###!!! ASSERTION: Must close the database on the thread that you opened it
> with!: 'Error', file /opt/moz/hg/src/storage/src/mozStorageConnection.cpp, line
> 543
No, that's an assertion that we have because we need that invariant to hold true.
Comment 52 Michal Novotny (:michal) 2010-08-18 07:49:03 PDT
Created attachment 467015 [details] [diff] [review]
patch v10

- nsOfflineCacheDevice releases DB on the same thread as it was created
- fixed handling of fallback channel in nsHttpChannel::OnCacheEntryAvailable

This patch seems to pass try server. There were few failures that seem to be unrelated to this patch and I couldn't reproduce them.
Comment 53 Michal Novotny (:michal) 2010-08-19 11:13:50 PDT
Created attachment 467461 [details] [diff] [review]
difference between v8 and v10

Shawn, could you review the database closing in the new patch?
Comment 54 Dão Gottwald [:dao] 2010-08-20 05:57:44 PDT
This is still waiting for review, right? Removing checkin-needed.
Comment 55 Shawn Wilsher :sdwilsh 2010-08-20 13:39:19 PDT
Comment on attachment 467461 [details] [diff] [review]
difference between v8 and v10

Your indentation appears to be weird (could be the interdiff), but otherwise looks fine.  r=sdwilsh
Comment 56 Johnny Stenback (:jst, jst@mozilla.com) 2010-08-21 11:41:16 PDT
Landed as:

http://hg.mozilla.org/mozilla-central/rev/d0b284052d29
Comment 57 Johnny Stenback (:jst, jst@mozilla.com) 2010-08-21 21:54:31 PDT
Looks like this stuck this time! Marking FIXED.
Comment 58 Mounir Lamouri (:mounir) 2010-08-22 15:11:11 PDT
I backed out the patch because it seems that it was the cause of a talos regression (tp4). On Linux-x86-65 opt, results without the patch are around 327 and with the patch around 337.

http://hg.mozilla.org/mozilla-central/rev/643758385088
Comment 59 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2010-08-23 07:54:54 PDT
(In reply to comment #58)
> I backed out the patch because it seems that it was the cause of a talos
> regression (tp4). On Linux-x86-65 opt, results without the patch are around 327
> and with the patch around 337.
> 
> http://hg.mozilla.org/mozilla-central/rev/643758385088

Did we see Talos numbers go back down after backing this out?
Comment 60 Mounir Lamouri (:mounir) 2010-08-23 13:46:54 PDT
(In reply to comment #59)
> (In reply to comment #58)
> > I backed out the patch because it seems that it was the cause of a talos
> > regression (tp4). On Linux-x86-65 opt, results without the patch are around 327
> > and with the patch around 337.
> > 
> > http://hg.mozilla.org/mozilla-central/rev/643758385088
> 
> Did we see Talos numbers go back down after backing this out?

Talos tp4 on Linux 64bits is back to 327. However, there were no notifications in tree-management but I have no idea how these notifications should behave...
Comment 61 Johnny Stenback (:jst, jst@mozilla.com) 2010-08-23 15:59:39 PDT
So I've been looking more at the data from the talos runs that we did while this patch was in the tree and here's what I see:

Windows: flat
Linux: ~3% regression
Mac: ~8% gain

Given that I think we're worse off w/o this patch so I'm voting for landing this again. Michal says he will do some investigation with callgrind here to see if we're missing anyting obvious, but I think we should land that independent of what Michal finds (which he won't have time to do until later in the week).

If anyone disagrees please speak up, if not I'll add this to the landing queue later today and get this back in the tree.

Here's some graph links for reference, this change landed at 11 am on the 21st and got backed out at 8pm on the 22nd.

Windows: http://graphs.mozilla.org/graph.html#tests=[[38,1,351],[38,1,352],[38,1,353],[38,1,354],[38,1,355],[38,1,356],[38,1,357],[38,1,358],[38,1,359],[38,1,360],[38,1,361],[38,1,362],[38,1,363],[38,1,364],[38,1,365],[38,1,366],[38,1,367],[38,1,368],[38,1,369],[38,1,370],[38,1,483],[38,1,484],[38,1,485],[38,1,486],[38,1,487],[38,1,488],[38,1,489],[38,1,490],[38,1,491],[38,1,492],[38,1,493],[38,1,494],[38,1,495],[38,1,496],[38,1,497],[38,1,498],[38,1,499],[38,1,500],[38,1,501],[38,1,648],[38,1,649],[38,1,650],[38,1,651],[38,1,652],[38,1,653],[38,1,654],[38,1,655],[38,1,656],[38,1,657],[38,1,883],[38,1,884],[38,1,885]]&sel=1282129268,1282573826

Linux: http://graphs.mozilla.org/graph.html#tests=[[38,1,431],[38,1,432],[38,1,433],[38,1,434],[38,1,435],[38,1,436],[38,1,437],[38,1,438],[38,1,439],[38,1,440],[38,1,441],[38,1,442],[38,1,443],[38,1,444],[38,1,445],[38,1,446],[38,1,447],[38,1,448],[38,1,449],[38,1,450],[38,1,688],[38,1,689],[38,1,690],[38,1,691],[38,1,692],[38,1,693],[38,1,694],[38,1,695],[38,1,696],[38,1,697],[38,1,698],[38,1,699],[38,1,700],[38,1,701],[38,1,702],[38,1,703],[38,1,704],[38,1,705],[38,1,706],[38,1,707],[38,1,708],[38,1,709],[38,1,710],[38,1,711],[38,1,712],[38,1,713],[38,1,714],[38,1,715],[38,1,716],[38,1,717],[38,1,895],[38,1,896],[38,1,897],[38,1,898],[38,1,899]]&sel=1282129268,1282573826

Mac: http://graphs.mozilla.org/graph.html#tests=[[38,1,530],[38,1,531],[38,1,532],[38,1,533],[38,1,534],[38,1,535],[38,1,536],[38,1,537],[38,1,538],[38,1,539],[38,1,540],[38,1,576],[38,1,577],[38,1,578],[38,1,579],[38,1,580],[38,1,581],[38,1,582],[38,1,718],[38,1,719],[38,1,720],[38,1,721],[38,1,722],[38,1,723],[38,1,724],[38,1,725],[38,1,726],[38,1,727],[38,1,728],[38,1,729],[38,1,730],[38,1,731],[38,1,732],[38,1,733],[38,1,734],[38,1,735],[38,1,736],[38,1,737],[38,1,738],[38,1,739],[38,1,740],[38,1,741],[38,1,742],[38,1,743],[38,1,744],[38,1,745],[38,1,746],[38,1,747],[38,1,900],[38,1,901],[38,1,902],[38,1,903],[38,1,904]]&sel=1282129268,1282573826
Comment 62 Brian Carpenter [:geeknik] 2010-08-23 16:03:57 PDT
I'm of the opinion that it is best to land something like this regardless of the regressions as long as it doesn't break anything else and work on finding/fixing the regression after it has landed. I think the positives outweigh the negatives here. =)
Comment 63 Mounir Lamouri (:mounir) 2010-08-23 18:08:01 PDT
Relanded:
http://hg.mozilla.org/mozilla-central/rev/72d2863f43c7
Comment 64 Mounir Lamouri (:mounir) 2010-08-23 23:24:11 PDT
Seems like everything's green.
Comment 65 Honza Bambas (:mayhemer) 2010-08-27 08:36:10 PDT
Because this patch is suspected to cause following top crash:

http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&signature=nsHttpChannel%3A%3AContinueProcessResponse%28unsigned%20int%29&version=Firefox%3A4.0b5pre

I would like to take it for few days off the trunk and see of this crash disappears.  Everybody agree?
Comment 66 Benjamin Smedberg [:bsmedberg] 2010-08-31 07:42:02 PDT
Please back this out, it caused the very common test_twostreams failure also, and there's definitely logic errors there which need to be fixed.
Comment 67 Michal Novotny (:michal) 2010-08-31 08:03:52 PDT
I'd prefer to back out only the HTTP part of the patch. If there is any problem it should be there.
Comment 68 Honza Bambas (:mayhemer) 2010-09-01 07:05:57 PDT
(In reply to comment #67)
> I'd prefer to back out only the HTTP part of the patch. If there is any problem
> it should be there.

Good idea, probably no need to back the rest out.
Comment 69 dwitte@gmail.com 2010-09-02 13:49:41 PDT
Not sure if this is related, but any idea what this crash is about? Seen on a try run recently, after this patch landed.

http://tinderbox.mozilla.org/showlog.cgi?tree=MozillaTry&errorparser=unittest&logfile=1283410607.1283430259.8701.gz&buildtime=1283410607&buildname=WINNT%205.2%20tryserver%20leak%20test%20build&fulltext=1
Comment 70 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-09-10 17:32:18 PDT
So...did anyone back out this problematic patch?  "Prefer to back out only X" is nice and all, but the responsible thing to do is back out the thing that broke, so it doesn't interfere with others and then land parts of it if they test correctly.
Comment 71 Johnny Stenback (:jst, jst@mozilla.com) 2010-09-10 23:56:13 PDT
Yes, this did get backed out in bug 589296, but yes, only partially. Last I spoke with Honza he still wanted to check a few things before confirming in bug 589296 that the crash really is gone (looks good so far AFAIK). This also lead to bug 594882, which now has the likely reason for the crash that was introduced with this patch.
Comment 72 Honza Bambas (:mayhemer) 2010-09-11 05:42:14 PDT
Johnny is correct.  The crash is gone after backing out the nsHttpChannel part of this bug's patch.

Fix + re-land will be tracked in bug 594882.
Comment 73 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-09-11 18:36:48 PDT
The backout appears to be
http://hg.mozilla.org/mozilla-central/rev/a93c9118f16d
Comment 74 Johnny Stenback (:jst, jst@mozilla.com) 2010-09-13 20:43:55 PDT
Backout landed on GECKO20b5_20100831_RELBRANCH

http://hg.mozilla.org/mozilla-central/rev/11e16ebb2d38
Comment 75 Eric Shepherd [:sheppy] 2010-11-18 07:20:28 PST
Other than the addition of the cacheIOTarget attribute on nsICacheService, is there anything here that needs noting for developers? After poring over the code for a while, there doesn't really seem to be anything here to document.
Comment 76 Michal Novotny (:michal) 2010-11-23 07:32:43 PST
(In reply to comment #75)
> Other than the addition of the cacheIOTarget attribute on nsICacheService, is
> there anything here that needs noting for developers?

Maybe somewhere could be noted that using asyncOpenCacheEntry() is preferred to openCacheEntry() since it doesn't generate IO on the calling thread.

Note You need to log in before you can comment on or make changes to this bug.