103851 - move Cache deletion to another thread so it doesn't block the UI

Reporter

Description

•

23 years ago

Using 2001100321/linux but have seen this for a long time now on linux: 
After Mozilla crashes or gets killed, starting it new will take a very long
time. There is a lot of disk activity before the mozilla window finally appears.
Sometimes there is a lot of disk activity when mozilla crashes and it takes a 
long time until it finishes termination, but this is not because a coredump
is being written. Maybe it has to do with the cache?

Matthias Versen [:Matti]

Comment 1

•

23 years ago

Reporter: 
How many RAM do you have ?
If mozilla crashed the OS will clear all the memory.
Maybe the OS clears the swap file ?

johann.petrak@gmail.com

Reporter

Comment 2

•

23 years ago

The system has 512MB RAM. I am pretty sure it isnt the swap file but rather the
cache because of this: I am running mozilla on a different machine than my
display, the swap is on the remote machine, but the cache is on my local machine.
When I start mozilla after a crash or kill, I can hear the disk activity, so
it has to be either the cache or my .mozilla, which is also stored locally.
Note that usually this activity appears when starting again after the killing
or crash, at that time there is no memory to clear. I just wanted to mention 
that *sometimes* this does occur when a crash occurs too (if it does occur,
again I can hear the disk activity so it cannot be the swap file).
The simplest way to try this out is: use Ctrl-C to terminate mozilla
in the terminal where it was started, then restart it again.

Matthias Versen [:Matti]

Comment 3

•

23 years ago

*** Bug 109828 has been marked as a duplicate of this bug. ***

Matthias Versen [:Matti]

Comment 4

•

23 years ago

Mozilla clears the whole cache after a crash.
Can you try it with a lower cache setting ?

Matthias Versen [:Matti]

Comment 5

•

23 years ago

-> Cache (?)

Assignee: asa → gordon

Status: UNCONFIRMED → NEW

Component: Browser-General → Networking: Cache

Ever confirmed: true

QA Contact: doronr → tever

gordon

Assignee

Updated

•

23 years ago

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla0.9.7

gordon

Assignee

Comment 6

•

23 years ago

I'm sure the delay is due to the clearing of the disk cache.  We detect a crash
has occurred, leaving the disk cache in an unknown state, and we need to delete
it before we can start over.

We don't have any immediate plans to do anything about this, but I'll leave this
bug open for now to discuss it in the performance meeting and gather feedback.

gordon

Assignee

Comment 7

•

23 years ago

I'll raise this issue at the performance meeting next week, but we won't be 
landing a fix for this by 0.9.7.  Retargeting.

Target Milestone: mozilla0.9.7 → mozilla0.9.8

gordon

Assignee

Updated

•

23 years ago

Target Milestone: mozilla0.9.8 → mozilla1.0

gordon

Assignee

Updated

•

23 years ago

Blocks: 122597

gordon

Assignee

Comment 8

•

23 years ago

I'd like to try moving the deletion of the cache directory to another thread,
and possibly even delay it until after startup, so we don't block the UI or
severely impact startup time.

Priority: -- → P1

gordon

Assignee

Updated

•

22 years ago

Keywords: nsbeta1

Gagan

Comment 9

•

22 years ago

per adt, not critical for nsbeta1. hence minus.

Keywords: nsbeta1 → nsbeta1-

Boris Zbarsky [:bzbarsky]

Comment 10

•

22 years ago

*** Bug 135063 has been marked as a duplicate of this bug. ***

TT

Comment 11

•

22 years ago

This happens on OS/2 as well.  The larger your Disk Cache, the longer it takes
to start up.  In my dual Athlon 1.2GHz system, a 50000KB cache takes about a
minute to clear.

I don't understand why this happens.  None of the other web browsers do this. 
If Mozilla wants to clear the whole cache, then why doesn't it just delete the
whole cache directory or something?  On my system, deleting 50MB of files takes
a few seconds if you do it the right way.

One idea is to create a new cache directory upon startup, and then use a
background thread to delete the old one.

Buckley Collum

Comment 12

•

22 years ago

Gordon,
I really think this issue needs to be re-prioritized.  Having to wait several
minutes for the browser to start on a 2Ghz/1GB-RAM computer seems very wrong.
I have found that I can get mozilla up quicker by doing two or three rounds of
starting mozilla and then killing the processes. Typically by the third launch,
mozilla starts.  This seems to work far quicker than waiting for mozilla to try
and find its way up.
It definitely seems that if mozilla is clearing the whole cache, this could be
done much quicker.
Definitely agree with your comments about at least delaying this til after startup.
(this bug still present in 0.9.x-rc1)

timeless

Comment 13

•

22 years ago

gordon the code is in nsDiskCacheDevice::InitializeCacheDirectory(), right?

ok. this bug is for moving deletion to another thread.
but while we're moving it, can someone from xpcom-file (hi dougt) tell us if 
cache is using the correct apis to delete a tree?

Keywords: helpwanted, mozilla1.1, mozilla1.2

Summary: After crashing or getting killed the restart is very slow → move Cache deletion to another thread so it doesn't block the UI

Target Milestone: mozilla1.0 → ---

Jo Hermans

Comment 14

•

22 years ago

Since bug 81724 was fixed, we're using a lot less files in the cache. As a
side-effect, clearing the cache (manually, or automatically after a crash) is
now much faster. It used to be a big problem on the Mac (bug 123157), but the
speed is now so much improved that it's not a problem for me anymore. Clearing a
20MB cache on a Mac HFS-disk is now done in 10 seconds or so.

Mike Pinkerton (not reading bugmail)

Comment 15

•

22 years ago

i don't believe locking up the user's machine for 10 seconds w/out telling them
why is acceptible. we need a UI.

Thomas F. O'Connell

Comment 16

•

22 years ago

i agree that this bug is neither invalid nor fixed. for whatever reason, it
seems to hit hardest on laptops rather than desktops from my experience. on a G3
PowerBook, Moz (or Moz-based) churns after any crash for much longer than 10
seconds. this continues even with 1.1. strangely, i don't notice any ill effects
on a G4 tower. but a crash on the laptop, which runs smoothly in all other
scenarios, makes me want to switch browsers instead of launching Moz again.

gordon

Assignee

Comment 17

•

22 years ago

*** Bug 123157 has been marked as a duplicate of this bug. ***

gordon

Assignee

Comment 18

•

22 years ago

Mike, how about if we don't lock up the UI.  I think that would be even better
than trying to explain why you can't use your computer.

Not sure I can get this in by Tuesday night, but I'll try.

Thomas F. O'Connell

Comment 19

•

22 years ago

FWIW, this bug also affects Chimera 0.5 in a major way. again, it seems to
happen most often on a PowerBook rather than for PowerMac towers.

might i suggest a change of platform, also, since one of my bugs was duped to
this, and i have encountered it on a variety of Macs?

gordon

Assignee

Updated

•

22 years ago

OS: Linux → All

Hardware: PC → All

Target Milestone: --- → mozilla1.3alpha

Jo Hermans

Comment 20

•

22 years ago

This bug is completely solved in the trunk, and has become almost invalid since
Mozilla 1.2a. On an iBook 300 MHz (with a very slow hard disk) I've never seen a
hang of larger than 10 seconds, it's often 1 or 2 seconds. Very insignificant
compared to the rest of the startup. We might still need an indication why we
hang, but that's bug 122597.

But Chimera is still based on Mozilla 1.0.1 - that's why the bug is still
visible there, especially on slow hard disks (like those in portables). When (if
?) Chimera will be merged with the trunk again, it's automatically solved.

Thomas F. O'Connell

Comment 21

•

22 years ago

well, if there's a patch checked in that fixes it, i'd say resolve it as fixed.
i'm not sure if "has become almost invalid since Mozilla 1.2a" means that this
is fixed by code or if you're just saying that a fast computer with decent hard
drive is unaffected. if the latter is the case, it should stay open until it's
actually fixed.

johann.petrak@gmail.com

Reporter

Comment 22

•

22 years ago

I will try recheck on the original system which I rarely use at the moment.

This problem was especially visible in configurations where the cache is on a
slow disk or on a disk mounted over the network. The symptoms seem to indicate
that each file in the cache gets deleted or invalidated individually instead of
simply 
removing all cashe files in one step.

If this is the case, the problam will also depend on the size of the cache set
in the preferences bythe user - a huge cache size will probably lead to very
long processing here.

Re comment #20: how has it been solved? What exactly has been changed? In which
bug is this documented?
I do not believe that bug 81724 will solve this problem, since the cache will at
some point grow to limit size. Note that this problem will not be visible if you
frequently clear the cache or do things that make Mozilla clear the cache or if
you have a small cache size setting from the start.

gordon

Assignee

Updated

•

21 years ago

Target Milestone: mozilla1.3alpha → mozilla1.4alpha

gordon

Assignee

Comment 23

•

21 years ago

Attached patch patch 1 - move cache deletion to separate thread. — Details — Splinter Review

Here's my first cut at moving the cache deletion to another thread.

gordon

Assignee

Updated

•

21 years ago

Attachment #115452 - Flags: superreview?(darin)

Attachment #115452 - Flags: review?(bryner)

Mike Kaply [:mkaply]

Comment 24

•

21 years ago

Context diff pretty please?

timeless

Comment 25

•

21 years ago

Comment on attachment 115452 [details] [diff] [review]
patch 1 - move cache deletion to separate thread.

function params should be aFoo instead of foo

>  nsDiskCacheBlockFile::Close(PRBool flush)
>-    nsresult rv  = FlushBitMap();
>+    nsresult rv;
>+    if (flush)
>+        rv  = FlushBitMap();
rv is uninitialized unless flush *bad*	 
>     if (NS_SUCCEEDED(rv) && (err != PR_SUCCESS))

>+nsDiskCacheDevice::Shutdown_Private(PRBool  flush)
>+{
>+        if (Initialized()) {
>         // check cache limits in case we need to evict.
afaik your patch wasn't -w, so you need to reindent this block

>+    nsCOMArray<nsIFile> * deleteList = new nsCOMArray<nsIFile>;
that's scary.. oh well

+#if 1
#ifdef DEBUG_<something>
...
+	     printf("## deleting: %s\n", path.get());
...
+#endif

Attachment #115452 - Flags: superreview?(darin)

Attachment #115452 - Flags: review?(bryner)

Attachment #115452 - Flags: review-

Darin Fisher

Comment 26

•

21 years ago

timeless: aFoo vs. foo in necko implementation code doesn't matter, especially
not when the function is so small.  no one needs the "a" prefix to remind them
of what they can easily see in the function signature.

gordon

Assignee

Comment 27

•

21 years ago

Attached patch patch 2 - fixed a couple bugs, added diff -u — Details — Splinter Review

Still testing edge cases, but looking for feedback.

Brian Ryner (not reading)

Comment 28

•

21 years ago

Comment on attachment 115593 [details] [diff] [review]
patch 2 - fixed a couple bugs, added diff -u

>Index: nsDiskCacheBlockFile.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/netwerk/cache/src/nsDiskCacheBlockFile.cpp,v
>retrieving revision 1.5
>diff -u -r1.5 nsDiskCacheBlockFile.cpp
>--- nsDiskCacheBlockFile.cpp	9 Oct 2002 03:00:09 -0000	1.5
>+++ nsDiskCacheBlockFile.cpp	26 Feb 2003 03:20:30 -0000
>@@ -103,11 +103,14 @@
>  *  Close
>  *****************************************************************************/
> nsresult
>-nsDiskCacheBlockFile::Close()
>+nsDiskCacheBlockFile::Close(PRBool flush)
> {
>     if (!mFD)  return NS_OK;
>-    
>-    nsresult rv  = FlushBitMap();
>+    nsresult rv;
>+
>+    if (flush)
>+        rv  = FlushBitMap();
>+

As timeless noted, won't rv be uninitialized here if flush is false?

The patch looks good to me in general.	I'm not really familiar with the disk
cache code so I could have missed something.  I'd advise cleaning up the patch
a bit (#if 0 code, and maybe resolve some of those XXX comments) before
landing.

Darin Fisher

Comment 29

•

21 years ago

Comment on attachment 115593 [details] [diff] [review]
patch 2 - fixed a couple bugs, added diff -u

>Index: nsDiskCacheBinding.cpp

PR_STATIC_CALLBACK(PLDHashOperator)

>+PLDHashOperator PR_CALLBACK
>+ActiveBinding(PLDHashTable *    table,

>+            ((PRBool *)arg) = PR_TRUE;

you meant to dereference this pointer first, right?


>Index: nsDiskCacheBlockFile.cpp

>     // write the bit map and flush the file
>+    // XXX except we would take a severe performance hit
>     // XXX rv = FlushBitMap();

if we leave this unimplemented, what is the cost/impact?


>Index: nsDiskCacheDevice.cpp

>+// "Cache.Trash directory is a sibling of the "Cache" directory

  "Cache.Trash"


more comments later.. i haven't reviewed much beyond this point.

Attachment #115593 - Flags: review-

gordon

Assignee

Comment 30

•

21 years ago

Attached patch patch 3 - incorporating comments received so far — Details — Splinter Review

I'm still waiting for additional comments from Darin.

Darin Fisher

Comment 31

•

21 years ago

Comment on attachment 115703 [details] [diff] [review]
patch 3 - incorporating comments received so far

>Index: nsDiskCacheBinding.cpp

>-                NS_ASSERTION(binding->mGeneration != p->mGeneration,  "generations collide!");
>+                NS_WARNING("### disk cache: generations collide!");

speaking from my own experience it's much nicer to be able to jump into
the debugger when one of these is hit.	i'd use NS_ERROR instead for this
reason.  only use NS_WARNING when it's not something you'd want to fix!


>Index: nsDiskCacheDevice.cpp

> nsresult
> nsDiskCacheDevice::Create(nsCacheDevice **result)
> {
>+    // not implemented because nsCacheService has to set attributes
>+    // on the device before Init() can be called.
>+    NS_NOTREACHED("Create");
>+    return NS_ERROR_NOT_IMPLEMENTED;
> }

maybe you should just remove this function then.


>+    NS_ASSERTION(mCacheMap == nsnull, "### leaking mCacheMap");

more assertion nits ;-)

### prefix really isn't necessary.  you'll get plenty of output
when an assertion is hit without this.


>+        nsCOMArray<nsIFile> * trashList;

too bad nsCOMArray doesn't have a shallow copy method.	then you
wouldn't be forced to heap allocate the nsCOMArray container object.


>+nsDiskCacheDevice::ClearDiskCache()
>+{
...
>+    nsCOMArray<nsIFile> * deleteList = new nsCOMArray<nsIFile>;
>+    if (!deleteList)  return NS_ERROR_OUT_OF_MEMORY;
>+
>+    rv = Shutdown_Private(PR_FALSE);  // false = don't bother flushing
>+    if (NS_FAILED(rv))  return rv;

failure case causes |deleteList| to leak.  more below...


>+#ifdef DEBUG
>+    printf("## nsDiskCacheDevice: now running DeleteDirectory()\n");
>+#endif

maybe now is a good time to start using PR_LOG in the cache?  you might want
to copy the LOG(args) macro from, for example, nsInputStreamPump.cpp


>+nsDiskCacheDevice::DeleteFiles(nsCOMArray<nsIFile> * fileList)
>+{
...
>+    if (!thread)  return NS_ERROR_UNEXPECTED;
>+    return NS_OK;
>+}

in the error case who deletes |fileList| ??  the caller of this
function certainly seems to fire and forget.


>+nsDiskCacheDevice::ListTrashContents(nsCOMArray<nsIFile> ** result)
>+{
...
>+    nsCOMArray<nsIFile> * array = new nsCOMArray<nsIFile>;
>+    if (!array)  return NS_ERROR_OUT_OF_MEMORY;

we really need a smart ptr class :(


>+    if (NS_FAILED(rv) || !dirEntries)  goto error_exit; // !dirEntries returns NS_OK
>+
>+    PRBool  more;

some compilers will give a build error if you place a goto statement that
steps over the declaration of a stack variable.  maybe MSVC or certain
versions of GCC?


>+nsDiskCacheDevice::MoveCacheToTrash(nsIFile ** result)
> {
...
>+    if (result)
>+        NS_ADDREF(*result = uniqueDir);

can you just assume |result| != NULL?  maybe add a debug only assertion??


otherwise, this looks great.  r=darin if you fix the leaks.

Attachment #115703 - Flags: review-

gordon

Assignee

Comment 32

•

21 years ago

Attached patch patch 4 - changes for Darin's latest comments — Details — Splinter Review

gordon

Assignee

Updated

•

21 years ago

Attachment #115955 - Flags: superreview?(darin)

Attachment #115955 - Flags: review?(bryner)

Darin Fisher

Comment 33

•

21 years ago

Comment on attachment 115955 [details] [diff] [review]
patch 4 - changes for Darin's latest comments

>Index: nsCache.cpp

>+//#include "nsXPIDLString.h"
>+//#include "nsPrintfCString.h"

please just delete these lines before checking in.

sr=darin

Attachment #115955 - Flags: superreview?(darin) → superreview+

Brian Ryner (not reading)

Comment 34

•

21 years ago

Comment on attachment 115955 [details] [diff] [review]
patch 4 - changes for Darin's latest comments

>@@ -580,11 +561,14 @@
>     if (!binding->mDoomed) {
>         // so it can't be seen by FindEntry() ever again.
>         nsresult rv = mCacheMap->DoomRecord(&binding->mRecord);
>-        NS_ASSERTION(NS_SUCCEEDED(rv), "DoomRecord failed.");
>+        if (NS_FAILED(rv)) {
>+            NS_WARNING("DoomRecord failed.");
>+        }
>         binding->mDoomed = PR_TRUE; // record in no longer in cache map
>     }

Could you use NS_WARN_IF_FALSE(NS_SUCCEEDED(rv))?  That way, you don't bother
checking rv in a non-debug build.

>@@ -785,8 +770,17 @@
> nsresult
> nsDiskCacheDevice::EvictEntries(const char * clientID)
> {
>+    nsresult  rv;
>+
>+    if (clientID == nsnull) {
>+        // we're clearing the entire disk cache
>+        
>+        rv = ClearDiskCache();
>+        if (NS_SUCCEEDED(rv))  return NS_OK;
>+    }

Do you really want to fall through here if NS_SUCCEEDED(rv) is false?



>+
>+// function passed to PR_CreateThread
>+void
>+DoDeleteFileList(void *arg)
>+{
>+    nsCOMArray<nsIFile> * fileList = NS_STATIC_CAST(nsCOMArray<nsIFile> *, arg);
>+    nsresult              rv;
>+
>+    // iterate over items in fileList, recursively deleting each
>+    PRInt32  count = fileList->Count();
>+    for (PRInt32 i=0; i<count; i++) {
>+        nsIFile * item = fileList->ObjectAt(i);
>+        CACHE_LOG_PATH(PR_LOG_ALWAYS, "deleting: %s\n", item);
>+
>+        rv = item->Remove(PR_TRUE);
>+        if (NS_FAILED(rv)) {
>+            NS_WARNING("failure cleaning up cache");
>+        }

Same as above (NS_WARN_IF_FALSE).

Andrew Hagen

Updated

•

21 years ago

Keywords: mozilla1.1, mozilla1.2

gordon

Assignee

Comment 35

•

21 years ago

Brian, just using NS_WARN_IF_FALSE() would generate additional warnings in the
optimized build because I'm only using the 'nsresult rv' in debug builds.  I
trying to work with Brendan to add somekind of IF_DEBUG macro to avoid the
ugliness of using #ifdef DEBUG brackets in this idiom.  Since neither of these
calls in in a tight loop, it's highly likely that I've already spent more time
on this issue than all users put together will spend executing these particular
conditional checks, which a decent compiler should have optimized away in the
first place.  Anyway, Darin pointed out the same thing, and I'm continuing to
find a solution that everybody can live with.

About falling through in EvictEntries() if ClearDiskCache() fails:  yes, I do
want to fall through...probably.  Actually, I only want to fall through on
certain errors (if ClearDiskCache() fails because there are active bindings),
otherwise we may have found a more serious error and could bail early rather
than attempt eviction on an entry by entry basis.  The code isn't wrong the way
it stands now, but if we failed ClearDiskCache() for a reason other than active
bindings, we are likely to fail eviction as well.  I'll look into checking for a
specific error and only falling through for that case.

Thanks for the comments.

Brian Ryner (not reading)

Comment 36

•

21 years ago

Comment on attachment 115955 [details] [diff] [review]
patch 4 - changes for Darin's latest comments

Ok, that sounds fine. r=bryner.

Attachment #115955 - Flags: review?(bryner) → review+

gordon

Assignee

Comment 37

•

21 years ago

Attached patch patch 5 - changes for Darin's & Brian's latest comments — Details — Splinter Review

gordon

Assignee

Comment 38

•

21 years ago

Patch checked in.

Status: ASSIGNED → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

patch 1 - move cache deletion to separate thread. 21 years ago gordon 19.68 KB, patch	timeless : review-	Details \| Diff \| Splinter Review
patch 2 - fixed a couple bugs, added diff -u 21 years ago gordon 26.93 KB, patch	darin.moz : review-	Details \| Diff \| Splinter Review
patch 3 - incorporating comments received so far 21 years ago gordon 34.84 KB, patch	darin.moz : review-	Details \| Diff \| Splinter Review
patch 4 - changes for Darin's latest comments 21 years ago gordon 37.56 KB, patch	bryner : review+ darin.moz : superreview+	Details \| Diff \| Splinter Review
patch 5 - changes for Darin's & Brian's latest comments 21 years ago gordon 38.53 KB, patch		Details \| Diff \| Splinter Review