Closed Bug 76293 Opened 23 years ago Closed 23 years ago

new cache needs to be removed if browser busting or mozilla is killed...

Categories

(Core :: Networking: Cache, defect)

Tracking

RESOLVED FIXED
mozilla0.9.1

People

(Reporter: chofmann, Assigned: gordon)

Details

(Whiteboard: r=beard sr=darin, drivers?)

Attachments

(2 files)

Not sure if it's just because I ran browser buster for a while
and generated lots of cache entries, or if it's because I
got hung up using mail and wasn't able to quit normally
by exiting windows, but on the next startup I was unable to start.

here is what I did...

browser busting for about 251 url loads.
surf around to a few sites
read some mail.
wrote some mail messages.
tried to send a few messages.
one of the send mail message windows got hung up and
 would not send....
brought up the task manager and killed off netscp6.exe..

tried to launch again...
disk spins for a very long time and mozilla does not start.
task manager says netscp6.exe is not responding
I kill it off...

If I remove newcache dir from my profile everything starts fine...
I notice there are about 2000 files and 6 MB of data
in the newcache directory when I could not start...

> dir newcache
...
     2,057 file(s)      5,945,186 bytes
         2 dir(s)       10,316.84 MB free

.\Users50\chofmann>mv newcache oldnewcache-2
Blocks: 76125
*** Bug 76376 has been marked as a duplicate of this bug. ***
*** Bug 75259 has been marked as a duplicate of this bug. ***
Target Milestone: --- → mozilla0.9.1
Target Milestone: mozilla0.9.1 → mozilla0.9
I think this bug is related to bug 76712.  It appears that after crashing on
Windows, file descriptors can be left open, which prevents the files from being
deleted.  The disk cache detects when it was not properly shut down and tries to
delete and reinitialize the cache directory.  My hypothesis is that the
combination of these two facts leaves us in an inconsistent state because the
disk cache is unable to remove the old "NewCache" directory.
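
The comment does not say how the disk cache detects an unclean shutdown. As a minimal sketch of one common approach, assuming a marker file that exists only while the cache is open, the detect-and-reset step might look like this in Python. The marker name `_dirty_` and the functions `open_cache` and `close_cache` are hypothetical and are not taken from the Mozilla source.

```python
import os
import shutil

# Hypothetical marker name; the real cache may use a different mechanism.
DIRTY_MARKER = "_dirty_"


def open_cache(cache_dir):
    """Open the cache; return True if the previous session closed it cleanly."""
    marker = os.path.join(cache_dir, DIRTY_MARKER)
    clean = not os.path.exists(marker)
    if not clean:
        # The previous session never reached close_cache(), so throw the
        # contents away. This is the step that can fail on Windows when
        # files in the directory are still held open.
        shutil.rmtree(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    # Mark the cache as in use until close_cache() runs.
    with open(marker, "w"):
        pass
    return clean


def close_cache(cache_dir):
    """Record a clean shutdown by removing the marker."""
    os.remove(os.path.join(cache_dir, DIRTY_MARKER))
```

If `shutil.rmtree` raises here, the cache directory is left half-deleted, which is the inconsistent state the hypothesis describes.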

Akkana's report on Linux could be a similar issue because she is accessing the same
profile (and therefore cache directory) from two different instances of mozilla
simultaneously (bug 76431 has been created to protect the cache from multiple
instances of mozilla).  One instance of mozilla will get access to the cache,
but the second instance will likely get confused.

Pink, what platforms were you having trouble with your cache folder on?  I've
only heard of one case on the Mac where this happens and that was when the user
replaced the entire NewCache folder with a locked file named "NewCache" (bug
73314).

This hypothesis also accounts for my inability to reproduce the problems with
the "corrupt" cache folders that have been sent to me.  The cache folders
sometimes contain small anomalies that are handled by the disk cache, but the
folders work fine in my builds.  The reason is that my machine wasn't the one
with the open file descriptors.

If this hypothesis accounts for all reports, I can solve the problem by fixing a
couple of separate bugs.

1) protect the disk cache from multiple instances of mozilla
2) make the disk cache react better to errors deleting old cache folders or
creating new ones (these "zombie" files can be moved aside, and automatically
deleted later).

I think #2 is more urgent to fix for 0.9.
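
Fix #2 could be sketched roughly as follows, in Python rather than the cache's own C++. This is only an illustration under stated assumptions: the `.Trash` suffix, the function names, and the injectable `remove` parameter are all hypothetical. Note also that on Windows a rename can fail for the same reason a delete does, in which case the error would simply propagate from this sketch.

```python
import os
import shutil


def reset_cache_dir(profile_dir, name="NewCache", remove=shutil.rmtree):
    """Delete the cache directory; if that fails, move it aside instead."""
    cache = os.path.join(profile_dir, name)
    if os.path.exists(cache):
        try:
            remove(cache)
        except OSError:
            # Deletion failed (for example, files still held open).
            # Pick an unused "zombie" name and move the directory aside.
            n = 0
            while os.path.exists(os.path.join(profile_dir, f"{name}.Trash{n}")):
                n += 1
            os.rename(cache, os.path.join(profile_dir, f"{name}.Trash{n}"))
    os.makedirs(cache)
    return cache


def purge_trash(profile_dir, name="NewCache"):
    """Best-effort removal of directories moved aside on earlier runs."""
    for entry in os.listdir(profile_dir):
        if entry.startswith(name + ".Trash"):
            shutil.rmtree(os.path.join(profile_dir, entry), ignore_errors=True)
```

A later run would call `purge_trash` once nothing holds the old files open; anything it still cannot delete is left for the next attempt.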
You would think then that if it was open FDs, a reboot would cure the 
problem, no? Next time I see this (I did yesterday on Mac), I'll try a reboot and 
see if that helps.
Yes, if a reboot doesn't fix things, then the hypothesis doesn't cover everything 
that's happening and we'll have to keep searching.  Let me know the next time you 
see this.  Thanks.
ok, I got my win98 laptop into a state where I could not launch the browser
(it would hang after the splash screen, but before the first window was fully
drawn; my home page was set to blank).  Several attempts to start
produced the same result.  Then I tried to remove the Newcache...

C:\Users50\chofmann>rm -rf Newcache
rm: cannot unlink entry "Newcache/_cache_map_": Access is denied.
rm: cannot remove directory "Newcache": Access is denied.
rm: cannot remove directory "Newcache": Access is denied.
C:\Users50\chofmann>ls -lrt Newcache
total 0
-rwxrwxrwx   1 0        0              0 Apr 22 20:02 _CACHE_MAP_


then I rebooted, and voilà!  the cache map was written to...

C:\Users50\chofmann>ls -lrt Newcache
total 264
-rwxrwxrwa   1 0        0         135168 Apr 22 21:55 _CACHE_MAP_
I think I may also have found a way to get the browser into the
state where the cache ends up in a funny state...

I was seeing how many windows might be opened before we 
start to run into performance/stability problems...
 [ctrl]n, [ctrl]n, [ctrl]n, ... about 14 times on my
laptop gets it into a pretty bad state where killing off
windows and the browser becomes a difficult task...
adding beard to the cc list...  looks like he has poked around in the
code that tries to open/use/delete the cache map...

the other question gordon was unsure of last week is "why 
would we be trying to hit the cache on startup if the
default home page is set to blank?"    does side bar
or anything else hit the cache when we start?
By default the sidebar contains at least 2 items which need access to the Net:
Tinderbox: SeaMonkey and Tinderbox: SeaMonkey-Ports.

Those 2 tabs will try to access the Net and maybe that's why it touches the cache...

How about if we hide the Sidebar and try to start up Moz again?
I just hit this again, the same way as I'd seen it before:
- Run today's optimized build for general browsing and leave it up.
- When my debug tree finishes building, run it under gdb with a URL argument
because I need to get a stack trace on something.
It takes forever to start up and autoregister all the libraries under gdb (maybe
5 minutes) so meanwhile, while it's starting up, do other things, including
going to pages with the release build.

Finally debug build comes up, gives error "can't load page".

When I've seen this before, it's always been on the first run of a debug build.
 Maybe that's because it takes so long to start up so the chances are that I'll
do something else in the release build while the second instance is starting. 
Is it possible that the new cache code checks for a lock or a date early in the
startup process, then depends on it much later in the startup process, instead
of waiting until right before it needs the info?
> including going to pages with the release build.

we really need to make it impossible to share profiles between multiple 
instantiations of mozilla. we get into so much trouble.
Why?  It's worked for years, up until a few weeks ago.  But that's better argued
in bug 76431.
Whiteboard: patch eta 4/24
Netscape 4.x explicitly prevents using the disk cache, global history, and 
certificates from multiple instances simultaneously.  We need the same protection 
we had in 4.x.
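
The comment does not describe how 4.x implemented this protection. One simple way to give a single instance exclusive use of a profile's cache is an exclusively created lock file, sketched below in Python. The class name `ProfileLock` and the file name `cache.lock` are hypothetical. A lock file like this goes stale if the process is killed, so a real implementation would also need a way to recognize and clear a dead owner's lock.

```python
import os


class ProfileLock:
    """Exclusive per-profile lock based on atomic file creation."""

    def __init__(self, profile_dir, name="cache.lock"):  # hypothetical file name
        self.path = os.path.join(profile_dir, name)
        self.held = False

    def acquire(self):
        """Return True if this instance now owns the lock, False otherwise."""
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one process can succeed in creating it.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        # Record the owner's pid to help diagnose stale locks.
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        self.held = True
        return True

    def release(self):
        """Remove the lock file if this instance owns it."""
        if self.held:
            os.remove(self.path)
            self.held = False
```

An instance that fails to acquire the lock could run without the disk cache rather than sharing it.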
Status: NEW → ASSIGNED
OS: Windows 98 → All
Hardware: PC → All
Please don't just delete the NewCache directory.  We need to try things in this 
order:
0) COPY the NewCache directory somewhere outside the profile directory.
1) relaunch mozilla
2) reboot the OS and then relaunch mozilla (okay, if it's really too painful you 
can skip this, but it may affect the results).
3) DELETE the NewCache directory in the profile directory (or try to) and 
relaunch mozilla.
4) COPY the copied NewCache directory back into the profile directory and 
relaunch mozilla.

If the browser works after 1, 2, or 4, the NewCache directory is not corrupt, but 
we will have narrowed the actual cause down.

Post all results here.
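
For anyone who prefers to script the COPY and copy-back steps of the procedure above, here is a minimal Python sketch. The function names and the separate backup location are assumptions made for illustration; the relaunch, reboot, and delete steps are still done by hand.

```python
import os
import shutil


def snapshot_cache(profile_dir, backup_root, name="NewCache"):
    """Copy the cache directory to a location outside the profile."""
    profile = os.path.realpath(profile_dir)
    backup = os.path.realpath(backup_root)
    # Refuse to place the copy inside the profile directory itself.
    if backup == profile or backup.startswith(profile + os.sep):
        raise ValueError("backup location must be outside the profile directory")
    # copytree raises if the destination already exists, so an earlier
    # snapshot is never silently overwritten.
    return shutil.copytree(os.path.join(profile, name), os.path.join(backup, name))


def restore_cache(profile_dir, backup_root, name="NewCache"):
    """Copy the saved cache directory back into the profile."""
    # The cache directory in the profile must already have been deleted.
    return shutil.copytree(
        os.path.join(backup_root, name), os.path.join(profile_dir, name)
    )
```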
*** Bug 77342 has been marked as a duplicate of this bug. ***
beard, darin, could you please review & super-review the latest patch?  Thanks.
Whiteboard: patch eta 4/24 → r=? sr=?
Please use unified diffs.
r=beard
Patrick, I selected "unified diff" from the Advanced Diff... menu in MacCVS.  
There appear to be some quirks about it that I would be happy to show you 
sometime.

Anyhow, thanks for the review.

I'll check this into the trunk, and the drivers can wait and see if they want this 
checked into the 0.9 branch.
Whiteboard: r=? sr=? → r=beard sr=darin
*** Bug 76712 has been marked as a duplicate of this bug. ***
sr=darin
If we're moving it, can we at least try to delete it on the next startup (when
it is unlikely to be in use)?  Leaving a huge remnant directory around is kinda
rude.
That's exactly what we're doing.  Take a look at the first few lines of code in 
the patch for nsDiskCacheDevice.cpp (or the description of the patch).

Some say we shouldn't delete this folder on startup because it adds to start up 
time.  Hyatt noticed the delay in his tests and was concerned because he was 
unaware of what we were doing.  I'm open to suggestions as to where the best 
place is to delete the old cache files.

Also, I hope to land changes soon that will greatly reduce the number of separate 
files we use to store data/metadata.  That may render the cache folder deletion 
unnoticeable.
How about asking the user upon detecting the "rubbish" cache directory?

E.g. "Cache directory since last crash was found at <some dir>.  Do you want to
clear the unnecessary cache files?"

Or simply create a checkbox in preference menu:
[check box]   Delete rubbish cache files upon program start
[push button] Clear rubbish cache files now

Comments?
Not only do we not remove the old Cache directory, we create it if it isn't
there.  I keep removing both Cache and NewCache, and they both come back on the
next run (but then the Cache directory stays empty).

If we stop recreating it, then the startup time hit will only happen the first
time.  If we're worried about the time it takes to check whether the old cache
folder is there at all (does one stat take that long on any platform?), we could
do that in the installer.
we could also delete the old cache directory on shutdown (assuming the browser
ever normally shuts down...)
I've seen the "Cache" directory created whenever I launch a new build.  The old 
cache code is gone, and the new cache code doesn't create this directory.  I 
don't know what is creating the directory.
Okay, so the patch has been checked into the trunk for a while now.  Should I 
check it into the 0.9 branch?  Should I move the deletion of Cache.Trash 
directories to shutdown instead of startup?

Looking for input from drivers.
Whiteboard: r=beard sr=darin → r=beard sr=darin, drivers?
The "Cache" folder is getting created here:

http://lxr.mozilla.org/seamonkey/source/xpfe/bootstrap/nsAppRunner.cpp#659

chofmann: do you guys want this for 0.9? If so, then gordon needs to know so that
he can check it in for 0.9; else we should move this off to 0.9.1 and mark it
fixed since it's already in the trunk.
per blizzard's email this is too late for 0.9. So off to 0.9.1 and marking as fixed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Target Milestone: mozilla0.9 → mozilla0.9.1