Closed Bug 103851 Opened 23 years ago Closed 21 years ago

move Cache deletion to another thread so it doesn't block the UI

Categories

(Core :: Networking: Cache, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.4alpha

People

(Reporter: johann.petrak, Assigned: gordon)

References

Details

(Keywords: helpwanted)

Attachments

(5 files)

Using 2001100321/linux but have seen this for a long time now on linux: 
After Mozilla crashes or gets killed, starting it new will take a very long
time. There is a lot of disk activity before the mozilla window finally appears.
Sometimes there is a lot of disk activity when mozilla crashes and it takes a 
long time until it finishes termination, but this is not because a coredump
is being written. Maybe it has to do with the cache?
Reporter: 
How many RAM do you have ?
If mozilla crashed the OS will clear all the memory.
Maybe the OS clears the swap file ?
The system has 512MB RAM. I am pretty sure it isnt the swap file but rather the
cache because of this: I am running mozilla on a different machine than my
display, the swap is on the remote machine, but the cache is on my local machine.
When I start mozilla after a crash or kill, I can hear the disk activity, so
it has to be either the cache or my .mozilla, which is also stored locally.
Note that usually this activity appears when starting again after the killing
or crash, at that time there is no memory to clear. I just wanted to mention 
that *sometimes* this does occur when a crash occurs too (if it does occur,
again I can hear the disk activity so it cannot be the swap file).
The simplest way to try this out is: use Ctrl-C to terminate mozilla
in the terminal where it was started, then restart it again.
*** Bug 109828 has been marked as a duplicate of this bug. ***
Mozilla clears the whole cache after a crash.
Can you try it with a lower cache setting ?
-> Cache (?)
Assignee: asa → gordon
Status: UNCONFIRMED → NEW
Component: Browser-General → Networking: Cache
Ever confirmed: true
QA Contact: doronr → tever
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.7
I'm sure the delay is due to the clearing of the disk cache.  We detect a crash
has occurred, leaving the disk cache in an unknown state, and we need to delete
it before we can start over.

We don't have any immediate plans to do anything about this, but I'll leave this
bug open for now to discuss it in the performance meeting and gather feedback.
I'll raise this issue at the performance meeting next week, but we won't be 
landing a fix for this by 0.9.7.  Retargeting.
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Target Milestone: mozilla0.9.8 → mozilla1.0
Blocks: 122597
I'd like to try moving the deletion of the cache directory to another thread,
and possibly even delay it until after startup, so we don't block the UI or
severely impact startup time.
Priority: -- → P1
Keywords: nsbeta1
per adt, not critical for nsbeta1. hence minus.
Keywords: nsbeta1nsbeta1-
*** Bug 135063 has been marked as a duplicate of this bug. ***
This happens on OS/2 as well.  The larger your Disk Cache, the longer it takes
to start up.  In my dual Athlon 1.2GHz system, a 50000KB cache takes about a
minute to clear.

I don't understand why this happens.  None of the other web browsers do this. 
If Mozilla wants to clear the whole cache, then why doesn't it just delete the
whole cache directory or something?  On my system, deleting 50MB of files takes
a few seconds if you do it the right way.

One idea is to create a new cache directory upon startup, and then use a
background thread to delete the old one.
Gordon,
I really think this issue needs to be re-prioritized.  Having to wait several
minutes for the browser to start on a 2Ghz/1GB-RAM computer seems very wrong.
I have found that I can get mozilla up quicker by doing two or three rounds of
starting mozilla and then killing the processes. Typically by the third launch,
mozilla starts.  This seems to work far quicker than waiting for mozilla to try
and find its way up.
It definitely seems that if mozilla is clearing the whole cache, this could be
done much quicker.
Definitely agree with your comments about at least delaying this til after startup.
(this bug still present in 0.9.x-rc1)
gordon the code is in nsDiskCacheDevice::InitializeCacheDirectory(), right?

ok. this bug is for moving deletion to another thread.
but while we're moving it, can someone from xpcom-file (hi dougt) tell us if 
cache is using the correct apis to delete a tree?
Summary: After crashing or getting killed the restart is very slow → move Cache deletion to another thread so it doesn't block the UI
Target Milestone: mozilla1.0 → ---
Since bug 81724 was fixed, we're using a lot less files in the cache. As a
side-effect, clearing the cache (manually, or automatically after a crash) is
now much faster. It used to be a big problem on the Mac (bug 123157), but the
speed is now so much improved that it's not a problem for me anymore. Clearing a
20MB cache on a Mac HFS-disk is now done in 10 seconds or so.
i don't believe locking up the user's machine for 10 seconds w/out telling them
why is acceptible. we need a UI.
i agree that this bug is neither invalid nor fixed. for whatever reason, it
seems to hit hardest on laptops rather than desktops from my experience. on a G3
PowerBook, Moz (or Moz-based) churns after any crash for much longer than 10
seconds. this continues even with 1.1. strangely, i don't notice any ill effects
on a G4 tower. but a crash on the laptop, which runs smoothly in all other
scenarios, makes me want to switch browsers instead of launching Moz again.
*** Bug 123157 has been marked as a duplicate of this bug. ***
Mike, how about if we don't lock up the UI.  I think that would be even better
than trying to explain why you can't use your computer.

Not sure I can get this in by Tuesday night, but I'll try.
FWIW, this bug also affects Chimera 0.5 in a major way. again, it seems to
happen most often on a PowerBook rather than for PowerMac towers.

might i suggest a change of platform, also, since one of my bugs was duped to
this, and i have encountered it on a variety of Macs?
OS: Linux → All
Hardware: PC → All
Target Milestone: --- → mozilla1.3alpha
This bug is completely solved in the trunk, and has become almost invalid since
Mozilla 1.2a. On an iBook 300 MHz (with a very slow hard disk) I've never seen a
hang of larger than 10 seconds, it's often 1 or 2 seconds. Very insignificant
compared to the rest of the startup. We might still need an indication why we
hang, but that's bug 122597.

But Chimera is still based on Mozilla 1.0.1 - that's why the bug is still
visible there, especially on slow hard disks (like those in portables). When (if
?) Chimera will be merged with the trunk again, it's automatically solved. 
well, if there's a patch checked in that fixes it, i'd say resolve it as fixed.
i'm not sure if "has become almost invalid since Mozilla 1.2a" means that this
is fixed by code or if you're just saying that a fast computer with decent hard
drive is unaffected. if the latter is the case, it should stay open until it's
actually fixed.
I will try recheck on the original system which I rarely use at the moment.

This problem was especially visible in configurations where the cache is on a
slow disk or on a disk mounted over the network. The symptoms seem to indicate
that each file in the cache gets deleted or invalidated individually instead of
simply 
removing all cashe files in one step.

If this is the case, the problam will also depend on the size of the cache set
in the preferences bythe user - a huge cache size will probably lead to very
long processing here.

Re comment #20: how has it been solved? What exactly has been changed? In which
bug is this documented?
I do not believe that bug 81724 will solve this problem, since the cache will at
some point grow to limit size. Note that this problem will not be visible if you
frequently clear the cache or do things that make Mozilla clear the cache or if
you have a small cache size setting from the start.
Target Milestone: mozilla1.3alpha → mozilla1.4alpha
Here's my first cut at moving the cache deletion to another thread.
Attachment #115452 - Flags: superreview?(darin)
Attachment #115452 - Flags: review?(bryner)
Context diff pretty please?
Comment on attachment 115452 [details] [diff] [review]
patch 1 - move cache deletion to separate thread.

function params should be aFoo instead of foo

>  nsDiskCacheBlockFile::Close(PRBool flush)
>-    nsresult rv  = FlushBitMap();
>+    nsresult rv;
>+    if (flush)
>+        rv  = FlushBitMap();
rv is uninitialized unless flush *bad*	 
>     if (NS_SUCCEEDED(rv) && (err != PR_SUCCESS))

>+nsDiskCacheDevice::Shutdown_Private(PRBool  flush)
>+{
>+        if (Initialized()) {
>         // check cache limits in case we need to evict.
afaik your patch wasn't -w, so you need to reindent this block

>+    nsCOMArray<nsIFile> * deleteList = new nsCOMArray<nsIFile>;
that's scary.. oh well

+#if 1
#ifdef DEBUG_<something>
...
+	     printf("## deleting: %s\n", path.get());
...
+#endif
Attachment #115452 - Flags: superreview?(darin)
Attachment #115452 - Flags: review?(bryner)
Attachment #115452 - Flags: review-
timeless: aFoo vs. foo in necko implementation code doesn't matter, especially
not when the function is so small.  no one needs the "a" prefix to remind them
of what they can easily see in the function signature.
Still testing edge cases, but looking for feedback.
Comment on attachment 115593 [details] [diff] [review]
patch 2 - fixed a couple bugs, added diff -u

>Index: nsDiskCacheBlockFile.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/netwerk/cache/src/nsDiskCacheBlockFile.cpp,v
>retrieving revision 1.5
>diff -u -r1.5 nsDiskCacheBlockFile.cpp
>--- nsDiskCacheBlockFile.cpp	9 Oct 2002 03:00:09 -0000	1.5
>+++ nsDiskCacheBlockFile.cpp	26 Feb 2003 03:20:30 -0000
>@@ -103,11 +103,14 @@
>  *  Close
>  *****************************************************************************/
> nsresult
>-nsDiskCacheBlockFile::Close()
>+nsDiskCacheBlockFile::Close(PRBool flush)
> {
>     if (!mFD)  return NS_OK;
>-    
>-    nsresult rv  = FlushBitMap();
>+    nsresult rv;
>+
>+    if (flush)
>+        rv  = FlushBitMap();
>+

As timeless noted, won't rv be uninitialized here if flush is false?

The patch looks good to me in general.	I'm not really familiar with the disk
cache code so I could have missed something.  I'd advise cleaning up the patch
a bit (#if 0 code, and maybe resolve some of those XXX comments) before
landing.
Comment on attachment 115593 [details] [diff] [review]
patch 2 - fixed a couple bugs, added diff -u

>Index: nsDiskCacheBinding.cpp

PR_STATIC_CALLBACK(PLDHashOperator)

>+PLDHashOperator PR_CALLBACK
>+ActiveBinding(PLDHashTable *    table,

>+            ((PRBool *)arg) = PR_TRUE;

you meant to dereference this pointer first, right?


>Index: nsDiskCacheBlockFile.cpp

>     // write the bit map and flush the file
>+    // XXX except we would take a severe performance hit
>     // XXX rv = FlushBitMap();

if we leave this unimplemented, what is the cost/impact?


>Index: nsDiskCacheDevice.cpp

>+// "Cache.Trash directory is a sibling of the "Cache" directory

  "Cache.Trash"


more comments later.. i haven't reviewed much beyond this point.
Attachment #115593 - Flags: review-
I'm still waiting for additional comments from Darin.
Comment on attachment 115703 [details] [diff] [review]
patch 3 - incorporating comments received so far

>Index: nsDiskCacheBinding.cpp

>-                NS_ASSERTION(binding->mGeneration != p->mGeneration,  "generations collide!");
>+                NS_WARNING("### disk cache: generations collide!");

speaking from my own experience it's much nicer to be able to jump into
the debugger when one of these is hit.	i'd use NS_ERROR instead for this
reason.  only use NS_WARNING when it's not something you'd want to fix!


>Index: nsDiskCacheDevice.cpp

> nsresult
> nsDiskCacheDevice::Create(nsCacheDevice **result)
> {
>+    // not implemented because nsCacheService has to set attributes
>+    // on the device before Init() can be called.
>+    NS_NOTREACHED("Create");
>+    return NS_ERROR_NOT_IMPLEMENTED;
> }

maybe you should just remove this function then.


>+    NS_ASSERTION(mCacheMap == nsnull, "### leaking mCacheMap");

more assertion nits ;-)

### prefix really isn't necessary.  you'll get plenty of output
when an assertion is hit without this.


>+        nsCOMArray<nsIFile> * trashList;

too bad nsCOMArray doesn't have a shallow copy method.	then you
wouldn't be forced to heap allocate the nsCOMArray container object.


>+nsDiskCacheDevice::ClearDiskCache()
>+{
...
>+    nsCOMArray<nsIFile> * deleteList = new nsCOMArray<nsIFile>;
>+    if (!deleteList)  return NS_ERROR_OUT_OF_MEMORY;
>+
>+    rv = Shutdown_Private(PR_FALSE);  // false = don't bother flushing
>+    if (NS_FAILED(rv))  return rv;

failure case causes |deleteList| to leak.  more below...


>+#ifdef DEBUG
>+    printf("## nsDiskCacheDevice: now running DeleteDirectory()\n");
>+#endif

maybe now is a good time to start using PR_LOG in the cache?  you might want
to copy the LOG(args) macro from, for example, nsInputStreamPump.cpp


>+nsDiskCacheDevice::DeleteFiles(nsCOMArray<nsIFile> * fileList)
>+{
...
>+    if (!thread)  return NS_ERROR_UNEXPECTED;
>+    return NS_OK;
>+}

in the error case who deletes |fileList| ??  the caller of this
function certainly seems to fire and forget.


>+nsDiskCacheDevice::ListTrashContents(nsCOMArray<nsIFile> ** result)
>+{
...
>+    nsCOMArray<nsIFile> * array = new nsCOMArray<nsIFile>;
>+    if (!array)  return NS_ERROR_OUT_OF_MEMORY;

we really need a smart ptr class :(


>+    if (NS_FAILED(rv) || !dirEntries)  goto error_exit; // !dirEntries returns NS_OK
>+
>+    PRBool  more;

some compilers will give a build error if you place a goto statement that
steps over the declaration of a stack variable.  maybe MSVC or certain
versions of GCC?


>+nsDiskCacheDevice::MoveCacheToTrash(nsIFile ** result)
> {
...
>+    if (result)
>+        NS_ADDREF(*result = uniqueDir);

can you just assume |result| != NULL?  maybe add a debug only assertion??


otherwise, this looks great.  r=darin if you fix the leaks.
Attachment #115703 - Flags: review-
Attachment #115955 - Flags: superreview?(darin)
Attachment #115955 - Flags: review?(bryner)
Comment on attachment 115955 [details] [diff] [review]
patch 4 - changes for Darin's latest comments

>Index: nsCache.cpp

>+//#include "nsXPIDLString.h"
>+//#include "nsPrintfCString.h"

please just delete these lines before checking in.

sr=darin
Attachment #115955 - Flags: superreview?(darin) → superreview+
Comment on attachment 115955 [details] [diff] [review]
patch 4 - changes for Darin's latest comments

>@@ -580,11 +561,14 @@
>     if (!binding->mDoomed) {
>         // so it can't be seen by FindEntry() ever again.
>         nsresult rv = mCacheMap->DoomRecord(&binding->mRecord);
>-        NS_ASSERTION(NS_SUCCEEDED(rv), "DoomRecord failed.");
>+        if (NS_FAILED(rv)) {
>+            NS_WARNING("DoomRecord failed.");
>+        }
>         binding->mDoomed = PR_TRUE; // record in no longer in cache map
>     }

Could you use NS_WARN_IF_FALSE(NS_SUCCEEDED(rv))?  That way, you don't bother
checking rv in a non-debug build.

>@@ -785,8 +770,17 @@
> nsresult
> nsDiskCacheDevice::EvictEntries(const char * clientID)
> {
>+    nsresult  rv;
>+
>+    if (clientID == nsnull) {
>+        // we're clearing the entire disk cache
>+        
>+        rv = ClearDiskCache();
>+        if (NS_SUCCEEDED(rv))  return NS_OK;
>+    }

Do you really want to fall through here if NS_SUCCEEDED(rv) is false?



>+
>+// function passed to PR_CreateThread
>+void
>+DoDeleteFileList(void *arg)
>+{
>+    nsCOMArray<nsIFile> * fileList = NS_STATIC_CAST(nsCOMArray<nsIFile> *, arg);
>+    nsresult              rv;
>+
>+    // iterate over items in fileList, recursively deleting each
>+    PRInt32  count = fileList->Count();
>+    for (PRInt32 i=0; i<count; i++) {
>+        nsIFile * item = fileList->ObjectAt(i);
>+        CACHE_LOG_PATH(PR_LOG_ALWAYS, "deleting: %s\n", item);
>+
>+        rv = item->Remove(PR_TRUE);
>+        if (NS_FAILED(rv)) {
>+            NS_WARNING("failure cleaning up cache");
>+        }

Same as above (NS_WARN_IF_FALSE).
Brian, just using NS_WARN_IF_FALSE() would generate additional warnings in the
optimized build because I'm only using the 'nsresult rv' in debug builds.  I
trying to work with Brendan to add somekind of IF_DEBUG macro to avoid the
ugliness of using #ifdef DEBUG brackets in this idiom.  Since neither of these
calls in in a tight loop, it's highly likely that I've already spent more time
on this issue than all users put together will spend executing these particular
conditional checks, which a decent compiler should have optimized away in the
first place.  Anyway, Darin pointed out the same thing, and I'm continuing to
find a solution that everybody can live with.

About falling through in EvictEntries() if ClearDiskCache() fails:  yes, I do
want to fall through...probably.  Actually, I only want to fall through on
certain errors (if ClearDiskCache() fails because there are active bindings),
otherwise we may have found a more serious error and could bail early rather
than attempt eviction on an entry by entry basis.  The code isn't wrong the way
it stands now, but if we failed ClearDiskCache() for a reason other than active
bindings, we are likely to fail eviction as well.  I'll look into checking for a
specific error and only falling through for that case.

Thanks for the comments.
Comment on attachment 115955 [details] [diff] [review]
patch 4 - changes for Darin's latest comments

Ok, that sounds fine. r=bryner.
Attachment #115955 - Flags: review?(bryner) → review+
Patch checked in.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: