Closed Bug 105843 Opened 23 years ago Closed 8 years ago

Cache lost if Mozilla crashes

Categories

(Core :: Networking: Cache, defect, P2)



RESOLVED WORKSFORME

People

(Reporter: matt, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: perf, Whiteboard: [snappy:p2][notacrash][metro-mvp][LOE:2])

Attachments

(1 obsolete file)

Mozilla 2001101608, Linux 2.4.9 i686, RedHat 6.1, XFree86 4.1.0, glibc 2.2.4

If Mozilla crashes, then all the images in the cache are lost (and presumably
the cached HTML pages as well).  This is easy to reproduce: just go to a site
with a lot of graphics (like http://megatokyo.com), do "killall -9 mozilla-bin",
then restart Mozilla and go back to that site; it will start reloading all of
the images.
Are you sure this bug is valid?
The cache is in an unknown state if you kill Mozilla.
I think Mozilla detects this at the next startup and clears the cache.
This is better than crashing on a bad/corrupted cache file.
This is by design.  The cache is primarily an optimization.  Rather than add a
bunch of extra code to verify it and fix any potential corruption, we toss it
and start again.

Marking WONTFIX.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
*** Bug 116477 has been marked as a duplicate of this bug. ***
verified wontfix
Status: RESOLVED → VERIFIED
I don't think this should be won't fix.
I think it should be an enhancement.

It will probably need a re-write of the cache, and/or some fiddly recovery
code.

At minimum you need to tell the user the cache is being cleared at startup,
and explain that this is because Mozilla crashed or was killed.

A second best for people who want a stable cache may be to backup the cache
to a second directory.
Maybe you only need to backup the _CACHE_* files(which I assume are catalogs and
indexes), not the cache elements themselves.
Of course this option should be a power-user option.


It was bugging me for days as to why my cache kept behaving in odd ways.
I eventually realised that a crash killed the cache.
Yet a crash doesn't kill the history list.

A large cache can be extremely useful for offline browsing.
Imagine showing people pages on a laptop, from the cache.
Imagine the frustration when you find that all the pages you had cached are now gone.
You're offsite, and you know you don't have access to a PC.

It's also extremely annoying when you only have evening flat-rate connections.
You want to look at something you visited last evening...
But it's not there, because the browser crashed.
*** Bug 147504 has been marked as a duplicate of this bug. ***
I agree with Rob that it's pretty annoying, especially if it takes many minutes
to delete all those files (on a Mac HFS filesystem, for instance). I would
expect at least a dialog box to explain the situation.

Bug 123157 asked for a separate thread to clear the cache, so that Mozilla
would start *much* faster.
I also hear many complaints about losing the cache. We definitely must give
users a choice, or detect what was loaded into the cache, so that we do not
erase the whole cache.

Should I open a new feature request, or is there already one?
Eugene: The cache's main database file is always open. That means you can't
delete only the cache entries for the currently loaded site.
So the only way is to rewrite the cache completely? Many people on modem
connections suffer from this and return to IE... At least in Russia, where
internet access is roughly 98% dial-up.
What if we write the cache metadata out in a journaling manner, the way
journaling file systems do: every modification to the cache gets appended to the
metadata file, so if Mozilla crashes, we just have to step through the
journal to recreate the proper cache state.  Then we'd only have to delete
the last object to be cached, in case that object had been corrupted.
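A hedged sketch of that journaling idea (the record format and the `append_record`/`replay_journal` helpers are invented for illustration; this is not how Mozilla's cache actually works):

```python
import json

def append_record(journal, op, key, meta=None):
    # Every metadata change is appended as one JSON line; existing
    # journal content is never rewritten, so a crash can at worst
    # leave a torn final record.
    journal.append(json.dumps({"op": op, "key": key, "meta": meta}))

def replay_journal(lines):
    """Rebuild the in-memory cache index after a crash by replaying the log."""
    index = {}
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            break  # torn final record: stop replaying here
        if rec["op"] == "put":
            index[rec["key"]] = rec["meta"]
        elif rec["op"] == "evict":
            index.pop(rec["key"], None)
    return index
```

Replay simply stops at the first unparsable record, which is the "delete the last object" step in miniature.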
It's possible, but that is too much work (see comment #2).

*** Bug 187327 has been marked as a duplicate of this bug. ***
*** Bug 199875 has been marked as a duplicate of this bug. ***
*** Bug 216633 has been marked as a duplicate of this bug. ***
*** Bug 180260 has been marked as a duplicate of this bug. ***
If this bug isn't going to be fixed then I may have to go back to IE which is
something I really don't want to do but as a dial-up user I may have no choice.
 I think the developers need to get off their broadband and try using a dial-up
connection for a week then maybe they'd realize how important this bug is.
robert:

i understand your concern, but it begs the question: are you experiencing
frequent mozilla crashes on your system?  we try really hard to make mozilla not
crash.  if it is crashing, then it seems to me that we should focus more of our
energy on fixing the crash bugs.  after all, you could lose much more than just
your cache if we crash.  you could lose work in progress, etc.  that seems far
more valuable than a cache of data that can be re-downloaded albeit at great
cost to dial-up users.  honestly i would be willing to reconsider this bug as a
feature request, but i don't expect it will be high enough priority to get
worked on anytime soon.  that is, unless someone really takes it upon themselves
to do the work.
Well, I am using Win98 which is probably one of the reasons its crashing. :)  
Also, not all the crashes are due to Mozilla/Firebird.  Sometimes something 
else brings the system down.

I'm not a programmer or I'd fix it myself, but it doesn't seem that difficult to 
fix.  Just back up the _CACHE_* files every so often (user-selectable interval), 
and if Mozilla detects that the cache is bad, instead of deleting it it can just 
copy the backup files over the bad files.  You could even add a check to see if 
the _CACHE_* files needed backing up, so you wouldn't be needlessly writing to the disk 
and keeping your computer from going into standby.  You'd still lose a little, 
but that'd be much better than losing 100 MB or more.  Would that really need 
a lot of programming?  Mozilla already detects if the cache is bad, which would 
seem to be the most difficult part.

I'm sure there are better more elegant fixes but that would be a quick and 
dirty fix that should work.

unfortunately, there is no code to detect a bad cache.  what mozilla does
currently is detect that the cache was never closed after being previously
opened.  what this means, simply, is that when the cache is opened, a flag is set
indicating that its on-disk representation is dirty (i.e., possibly incomplete).
at shutdown everything is sync'd to disk and the dirty flag is cleared.
however, what you are proposing requires determining whether a cache is corrupt or
not, and that is a very complicated problem to solve.  we are nowhere close to
having enough support in necko to determine if the cache files are consistent.

a simple thing that we could do is sync up the cache periodically and only set
the dirty flag on disk when something changes in memory.  but, again.. while
that isn't terribly complicated to implement, i don't see it happening any time
soon :(
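The dirty-flag-plus-periodic-sync idea described above could be modeled like this (a toy Python sketch, not necko code; the `CacheIndex` class, the JSON file layout, and the flag handling are all invented for illustration):

```python
import json
import os

class CacheIndex:
    """Toy model of the scheme above: the on-disk index carries a dirty
    flag, and instead of writing only at shutdown, sync() can be called
    periodically. File layout and names are invented for this sketch."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        self.changed = False  # True when memory is ahead of disk

    def put(self, key, meta):
        self.entries[key] = meta
        self.changed = True

    def sync(self):
        # Write the index atomically with a clean flag; only bother
        # when something actually changed in memory.
        if not self.changed:
            return
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"dirty": False, "entries": self.entries}, f)
        os.replace(tmp, self.path)
        self.changed = False

    @classmethod
    def load(cls, path):
        # A dirty or missing index means the cache was not synced
        # cleanly; start over, as the current code does, except that
        # with periodic syncs at most one interval of work is lost.
        idx = cls(path)
        if os.path.exists(path):
            with open(path) as f:
                data = json.load(f)
            if not data["dirty"]:
                idx.entries = data["entries"]
        return idx
```

A crash between syncs then costs only the entries added since the last sync, not the whole cache.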
*** Bug 225435 has been marked as a duplicate of this bug. ***
*** Bug 226567 has been marked as a duplicate of this bug. ***
Is there any objection to reopening this bug as an enhancement? It could greatly
improve the performance of Mozilla on Windows 98 machines with dial-up access,
for example. It seems like low-hanging perf fruit to me.
steve: i think characterizing this bug as "low-hanging fruit" is a gross
understatement of reality.  this is not a very easy bug to fix.  it is a lot
like trying to implement a journaling filesystem, with one exception: the cache
can tolerate some lost files.

but marking this WONTFIX is perhaps wrong.  it could theoretically be fixed, if
someone really wants to do so.  reopening, but that doesn't mean anything is
going to really change ;)
Status: VERIFIED → REOPENED
Resolution: WONTFIX → ---
I wish someone would fix this bug.  I have DSL now but when I had dial-up I had
to switch to MyIE2 because of this bug.
Assignee: gordon → darin
Status: REOPENED → NEW
Target Milestone: --- → Future
Severity: major → enhancement
Why is this bug not listed in bug 19454, the meta bug tracking Crash Recovery
(of all temporary browser state data)?
This bug doesn't seem to have anything to do with saving the state of Mozilla
when it crashes. You could save the state but lose the cache, meaning that the
next time you start Mozilla the pages you had open when it crashed would be
re-opened but Mozilla would respond more slowly. You could also lose the state
and save the cache, which is what happens when you manually exit Mozilla and
don't have the "Display Last Page Visited Upon Startup" option enabled.
having just read some of the comments from so-called devs on here, im quite
frankly astonished at the attitude of 'can't be bothered to fix' because it's "too
much work"; it's incredible.  you've got several users saying they've had to go
back to IE because of it; does someone not using the browser anymore not give
you just a teeny clue as to this bug's importance?  IE has no problem like this
with the cache; afaik the cache doesn't get dumped at all, so why should mozilla
(and hence also firebird)?

the problem for me (firebird user) and a lot of others isn't dumping the cache
after a crash; it's the fact that the cache is wiped by so many things OTHER than firebird
crashing, for example shutting down windows without closing firebird first. now
why would firebird not be closed gracefully when every other program is?  it
also regularly dumps the cache if you right-click and choose close from the
windows taskbar.  once THAT huge problem is solved, users like me who rarely
ever get a crash in firebird will no longer have to worry about it as much.  at
the moment im losing my cache almost on a daily basis without ever even having
it crash.

a quick and simple fix to benefit all the reasonably experienced users is simply
to have an option in prefs.js that can disable the wiping of the cache
completely, which is a REALLY easy thing to add: simply have it so that enabling
that option always keeps the 'dirty' flag set to 'clean'.  it also gives modem users
and the like the option of not having their cache wiped when it's highly likely
it doesn't need wiping anyway.

at least that way, since nobody can be 'bothered' to fix it properly, the many,
many users who are plagued by this problem are able to sort it out for themselves.
I agree with comment #28 from Ewok.
There is no reason to set "dirty" on every startup, because Mozilla keeps the cache
status in memory only while it is active, and each cache data file contains only
received data.
I also think setting "dirty" only on a physical update of the _CACHE_* files is sufficient.
However, there is an exception: when the same cache data file is reused for another URI.
In that case, the _CACHE_* control files should be refreshed.

I believe many users can afford a cache clear due to a crash only while Mozilla is
updating the _CACHE_* files.
And if so, I can persuade a few less tolerant friends to back up Mozilla's
cache directory on boot-up, on shutdown, or periodically.
I've been suffering because of this bug for about 1.5 years. Now I am really 
tired of it and am using Opera. You can surely understand me: it is 
just about as bad as anything can be!... I've already posted several 
times to the mozillazine forums but no one listened... You can read the topic here: 
http://forums.mozillazine.org/viewtopic.php?t=41814&postdays=0&postorder=asc&start=0
I have already said everything that I could.. Now it's your turn to do something. 
And one more thing: I know of about 10-20 people who also suffer from this. 
Some have returned to IE, some are using Opera, the same as me. But I still 
have hope that someone will fix this.
The actual link to the topic is: 
http://forums.mozillazine.org/viewtopic.php?t=41814&postdays=0&postorder=asc&start=0
backing up the cache is most certainly not the answer, given the possible size of
someone's browser cache.  simply having an option to never clear the cache
(unless told to do so from within firebird) would be a temporary fix until a
proper solution is implemented; it would keep everyone happy for now and only take
about 5 minutes to implement.
Not clearing the cache would possibly be very easy to implement (never say that
if you don't know the code!), but the cache is broken after that.
Do you really expect that someone will implement such a pref with the result of
a broken cache (and the many pointless bug reports which will follow)?
yes i do, because if you bother to read my original suggestion, i suggest that 
it's done as an expert-only preference that can be manually added to prefs.js by 
users experienced enough to know what to do *IF*, in the HIGHLY unlikely event, 
something gets corrupted AND happens to stop firebird/mozilla working.

And i'd still like to see proof that this has ever happened on any occasion, 
because personally i doubt it has or will, so i think your comment about loads 
of bug reports is quite frankly ridiculous, since there's no evidence that the 
cache will EVER get corrupted in such a way as to make the browser crash. why 
don't you implement this option and then we can find out once and for all by 
using expert users to test it???

you've got hundreds, if not thousands, of people leaving firebird and mozilla due 
to this bug alone (and thousands more who are plagued by this **** every damn 
day), and yet for two and a half years since this bug was filed, a total refusal 
to do anything about it.  and people wonder why IE has something like 96% 
market share?
>yes i do, because if you bother to read my original suggestion, i suggest that 
>it's done as an expert-only preference that can be manually added to prefs.js by 
>users experienced enough to know what to do *IF*, in the HIGHLY unlikely event, 
>something gets corrupted AND happens to stop firebird/mozilla working.
>
>And i'd still like to see proof that this has ever happened on any occasion, 
>because personally i doubt it has or will, so i think your comment about loads 
>of bug reports is quite frankly ridiculous, since there's no evidence that the 
>cache will EVER get corrupted in such a way as to make the browser crash. why 
>don't you implement this option and then we can find out once and for all by 
>using expert users to test it???

i think you do not understand how the cache is implemented.  in a nutshell (and
this explains why this bug is hard to fix), the index of cache entries is an
in-memory data structure that is only written to disk when the browser exits
cleanly.  however, data files are written as we load pages.  the result is that
the data files on disk are not in sync (at all) with the index of cache entries
stored on disk.  hence, if we do not shut down cleanly, we do not have a
meaningful index of cache entries on disk.

the "solution" that you propose would cause the browser to crash or load
incorrect content in many cases.


>youve got hundreds, if not thousands of people leaving firebird and mozilla due 
>to this bug alone (and thousands more who are plagued by this crap every damn 
>day) and yet for 2 and a half years since this bug was filed, a total refusal 
>to do anything about it.  and people wonder why IE has something like 96% 
>market share?

i understand that this bug matters to some people, but so do many other bugs. 
fixing this bug is a non-trivial task.  if it were a trivial task, we wouldn't
be having this discussion.  please try to temper your comments in the future. 
getting angry doesn't solve anything.
What is the difference between Navigator and Firebird?
Firebird is designed to be fast. That needs some code rewrites.

The only right way to fix this bug is to rewrite the cache code, and we'll be 
waiting for that (and until then we'll be using another browser, mainly 
Opera; it's better to spend a little traffic on Opera's banner than to lose the 
whole cache and download it again).
I see.  Looks like the design of it was severely flawed from the start, then.  
How does IE's cache work, since that seems to function perfectly and with no 
loss of speed?  And for a bug that affects so many people, do you think two and 
a half years is an acceptable timeframe?  I mean, this should have been 
marked as blocking a Mozilla 1.0 release at the very least, imo, and we are now 
on 1.6 with no sign of anything happening (and until the last few days, no sign 
that anything would ever happen).

you say it affects "some people"; i say it affects thousands of people and has 
clearly already led to hundreds if not thousands going back to IE or other 
browsers. does that alone not ramp up the importance of this bug to being 
pretty damn severe?  it certainly would if it was my project.
Ewok is right... Very right!
The problem is that you'll never find volunteers to fix this,

because it's easier to complete (in a *smart* way) konqueror's khtml engine than 
work on slow gecko.

I think that 'wontfix' is good; it will speed up the desktop (kde or gnome) 
choice of many companies that are going to switch to the linux platform. how can 
they accept gnome if it doesn't even have a stable and fast browser?
it is apparently far easier to degrade the work of others than it is to offer up
constructive solutions.  please do not post here unless you have a constructive
comment or can in some way contribute to helping fix this bug.
> ...the data files on disk are not in sync (at all) with the index of cache
> entries stored on disk...

Then why not sync the index of cache entries after a certain period of
inactivity, say one to five minutes? Then, if Mozilla has been idle for some
time the cache would not be lost if the computer crashes or loses power. This
has been the most common case for me -- the computer crashes while Mozilla is
open but I'm currently using another program. Would that be so hard to implement?
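The idle-sync suggestion above could be modeled like so (a sketch under invented names; `IdleSyncer`, its callback, and the one-minute default are not from Mozilla's code):

```python
import time

class IdleSyncer:
    """Sketch of syncing the cache index after a period of inactivity,
    so a later crash or power loss costs at most the recent activity.
    The class name and one-minute default are invented."""

    def __init__(self, sync_fn, idle_seconds=60):
        self.sync_fn = sync_fn
        self.idle_seconds = idle_seconds
        self.last_activity = time.monotonic()
        self.synced = True

    def on_activity(self):
        # Call whenever the cache is written to.
        self.last_activity = time.monotonic()
        self.synced = False

    def tick(self, now=None):
        # Call periodically (e.g. from a timer); syncs once per idle span.
        now = time.monotonic() if now is None else now
        if not self.synced and now - self.last_activity >= self.idle_seconds:
            self.sync_fn()
            self.synced = True
```

Because it only fires once per idle span, a browser sitting in the background would not keep rewriting the index to disk.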
>...the index of cache entries is an in-memory data structure that is only
written to disk when the browser exits cleanly. however, data files are written
as we load pages. ...

How about this, then: on start, the index is loaded into memory but not deleted.
On a successful quit, the disk file is overwritten. On a crash, the in-memory
index is of course lost, leaving you with a cache with a mostly complete
index file and a few unindexed data bits floating around, and possibly some
indices that point to nothing.

All this requires is that we ignore anything that's unindexed, and that we delete
an entry and then rewrite it in a separate place to avoid corruption if
something changes, rather than rewriting it in the same place. do we do that
already? If we do, then this /should/ be easy to fix.
the contents of the cache can change. your comment does not address that. I
imagine this would be especially a problem for the _CACHE_00x_ files.
Is this the same issue as bug #216490? 
To comment #44 from rimez:

No, not the same, but it includes bug #216490 if that is caused by the cache
clear on unusual Mozilla termination.
This bug, which had been closed as WONTFIX two years ago, was re-opened to
relieve not only offline users but also slow-line users, e.g. dial-up or
mobile-phone users, from unwanted/unnecessary cache clearing on unusual
termination of Mozilla, by comment #23 from Steve Chapel and comment #24 from
Darin Fisher on 2003-12-08.
To comment #35 and comment #40 from Darin Fisher:

I looked into _CACHE_MAP_, _CACHE_001_, _CACHE_002_ and _CACHE_003_.
I found that these files do not contain the file names of cache data files.
I now understand your description slightly better: "the index of cache entries
is an in-memory data structure".
(I guess cache data files are pointed to by file sequence number or relative
position or something else in the map file.)
Sorry for my misunderstanding.

The easiest way is an automatic backup of the cache directory, I think.
 - On shutdown:
   - After a successful cache index write-back, set the spare's dirty flag,
     copy the current data files to the spare, then copy the index files to the spare.
 - On restart:
   - both clean, same timestamp -> use either
   - both clean, different timestamps -> copy the newer one, use the newer one
   - one clean, the other dirty -> copy the clean one, use the clean one
   - both dirty -> clear both, use either
With this small enhancement, the current cache management mechanism can be used with
no change.
One problem I can imagine is that the cache directory copy on restart/shutdown will
hurt restart/shutdown performance if the cache directory is big.
So this automatic backup should be optional, although the cache size can be limited
by a preference.
This is not a sophisticated solution, though.

The second proposal is making the _CACHE_* files recoverable instead of a simple
copy of the in-memory control blocks.
If the in-memory control blocks are converted on termination to something which
contains the cache data file name, timestamp, size, etc., and if the in-memory
control blocks are re-created from that on restart, the cache clear after an
unexpected termination can easily be avoided in many situations.
This also does not require changing the current mechanism, although the in-memory
control block re-creation logic becomes complicated, since the user can easily
add/delete/rename/modify cache data files.

Third: if both a small change and a small enhancement are acceptable, I guess the
solution in comment #42 from dolphinling is possible.

Darin Fisher, what do you think?
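The restart-time selection rules in the first proposal could be sketched as follows (hypothetical code; the clean/dirty booleans and timestamps stand in for whatever on-disk markers a real implementation would use):

```python
def choose_cache(primary, spare):
    """Pick which cache copy to use at restart, per the rules above.
    Each argument is a (clean, timestamp) pair; returns "primary",
    "spare", or None, meaning both copies must be cleared."""
    p_clean, p_ts = primary
    s_clean, s_ts = spare
    if p_clean and s_clean:
        # Both clean: use the newer copy (either one if timestamps match).
        return "primary" if p_ts >= s_ts else "spare"
    if p_clean:
        return "primary"  # spare is dirty: fall back to the clean copy
    if s_clean:
        return "spare"
    return None           # both dirty: clear both and start fresh
```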
I'm now trying to get all the coders I know involved in fixing this. The main
problem is that I'm not a programmer :(. I was thinking for a bit, and also
reading the comments of Darin Fisher. It seems that he is the only one who knows
much about the way the cache mechanism is implemented in Mozilla and also FB.
You were talking about the index of cache entries, which is kept only in memory
until Mozilla is safely closed. When Mozilla or FB or any other Gecko-based
browser crashes, the new index of cache entries is lost and not written to the
disk. But the data files of newly loaded pages are written to the cache folder,
and the old cache entries file "doesn't know" about the new data and the new
pages. So this is what makes the cache become dirty (and get destroyed by the
browser): the cache entries file is older than the data... Anyway, maybe there
is a variant where the cache entries file is saved NOT when the browser exits
but during the work of the browser, for example once per 5 seconds, or in some
other way, for example whenever the page being viewed has successfully loaded.
Then even when the browser crashes, the cache entries file will be sufficiently
synchronized with its data. So what do you think about it?
That question is mainly for Darin.
>Maybe there is such a variant so cache entries file will be
>saved NOT when the browser exits but during the work of the browser for example
>one time per 5 seconds or in any other way, for example if the page that is
>viewing succesfully loaded.

yes, i think this is what we want to do.
(In reply to comment #48)
> >Maybe there is such a variant so cache entries file will be
> >saved NOT when the browser exits but during the work of the browser for example
> >one time per 5 seconds or in any other way, for example if the page that is
> >viewing succesfully loaded.
> 
> yes, i think this is what we want to do.

that's bug 197431
Bug 197431 is about delaying the writing of cache entries to disk. The proposed
solution to this bug is writing the *index* of cache entries to disk *more
often*. Fixing bug 197431 would help to fix this bug by helping to keep the
index and the entries consistent more of the time. Maybe there should be a
dependency between the two bugs.
I am talking about Moz <= 1.7rc1 on W2K.

If Mozilla is loaded again after a forced termination it takes a while until it
reacts to user input due to this cache issue.

Could there be some information popping up at Mozilla startup? Please.

It was already mentioned in comment 7.
> Could there be some information popping up at Mozilla startup? Please.

I think that would be a separate bug. Maybe bug 122597 could be reopened, or you 
could submit a new enhancement request for this feature.
How about regarding the cache catalogs and index files _themselves_ as a cache
for the cache's metadata? What I mean is that these files should not be the
primary sources for the cache's metadata. Metadata should be stored along with
each cache entry (i.e. in the same data file).

The underlying idea is to directly map URIs to cache data file names and make
the handling of single cache data files reliable (e.g. if the file name is
"657648hjh455", then work with "657648hjh455.tmp" and rename it afterwards).
That way, it should be totally impossible to corrupt any data.

When Mozilla crashes, only the cached cache-metadata need to be rebuilt. The
nice thing about that solution is that the cache's meta-data cache could be
rebuilt in the background without delaying Mozilla's startup.

And, although such a background process does not necessarily need to be really
fast, it may be a good idea to keep "clean" backups of the cache's meta-data
cache. When the current meta-data cache gets "dirty", the new one could be
rebuilt largely out of that backup: copy of information for cache data files
whose timestamps and sizes have not changed.
> rebuilt largely out of that backup: copy of information for cache data files

"copy of" -> "copy over"

Is there any source code documentation available for the necko disk cache?
Don't make me laugh! ;-)  The documentation is the source itself.  I'm willing
to help you navigate it if you have specific questions.  Though I didn't write
the code, I've been through it several times and know the design well.
Keywords: helpwanted
What about making this bug dependent on bug #11050?
I think this bug is connected with bug #246961. That bug is marked
UNCONFIRMED, but some users noted in the comments above that similar behaviour
occurs.
(In reply to comment #58)
> I think this bug is connected with bug #246961. That bug is marked
> UNCONFIRMED, but some users noted in the comments above that similar behaviour
> occurs.

They look like duplicates to me; or at least they could be reworded and combined
into one.
*** Bug 265127 has been marked as a duplicate of this bug. ***
*** Bug 246961 has been marked as a duplicate of this bug. ***
please fix this or I WILL Turn back to opera
(In reply to comment #62)
> please fix this or I WILL Turn back to opera

That's the spirit! Let's get some action going! And nobody reply in 2 years time
saying "I note that somebody asked for some action 2 years ago and it looks to
me like Firefox is still one of the slowest browsers out there."

If this isn't a major issue that should be fixed ASAP then nothing is. It goes
towards the perception people have of the browser. If IE just 'seems' faster,
then people will use IE (or a derivative) instead.

Are the Gecko programmers still around? Maybe they've all died and there's
nobody left on the planet who knows how to get inside Gecko anymore. Oh well. RIP.

P.S. Paying for this fix to be implemented is also ridiculous. While I'm sure we
all understand that Firefox is completely free and looked after by a truly
dedicated bunch of talented individuals doing their bit for the community in
their limited free time in an amazing display of altruism in the face of
opposition from the multi-billion dollar MS empire, it sure is difficult to
understand how something like this can remain in the same semi-broken state for
the last 4 years. The above bug reports provide a good background story on this
whole issue, but at some point, talk must end and action begin.

Not intending to tread on any sensitive toes here. Just reviving a bug that
seems to have almost become lost in the mists of history...
thank you for volunteering to fix this very complicated bug. in the future i'd
suggest you volunteer to fix much easier bugs. this one involves rewriting the
entire cache architecture. please inform us as to your expected completion date.
Assignee: darin → ajhubble
Whiteboard: what needs to be done: see comment 48
(In reply to comment #64)
> thank you for volunteering to fix this very complicated bug. in the future i'd
> suggest you volunteer to fix much easier bugs. this one involves rewriting the
> entire cache architecture. please inform us as to your expected completion date.

Gee I apologise. It seems I posted in the wrong bug thread. I actually meant
this one: https://bugzilla.mozilla.org/show_bug.cgi?id=38486

But I think both are major issues. Losing the cache whenever you forget to close
down Firefox first before shutting down Windows is just no good. Actually, it
makes the cache pretty much useless.

Both of these bugs would have to be the two most pressing issues that would
prevent people from switching from IE, and wasn't that the whole point of
Mozilla  in the first place?

I'm not a programmer. If I had the know-how to fix these bugs, I would have. But
I don't. But it seems strange that nobody has tackled what should have been a
core issue before the mainstream release.
-> me (helpful comments welcome; please don't post gripes to bug reports!)
Assignee: ajhubble → darin
>But I think both are major issues. Losing the cache whenever you forget to close
>down Firefox first before shutting down Windows is just no good.

while that's true, that's a different bug.
(In reply to comment #67)
> while that's true, that's a different bug.

Should bug 145469 be reopened?
(In reply to comment #68)
> > while that's true, that's a different bug.
> Should bug 145469 be reopened?

In the case of a power failure or a BSOD on MS Windows, the problem won't be
resolved until this bug is fixed.
But if Mozilla can close at least the cache control files on abnormal termination,
many cache losses from the "usual" abnormal termination (usually caused by
logoff/shutdown without terminating Mozilla/Firebird) will be avoided.
Is that possible (especially on MS Windows)?
If so, I think that not all, but a fair number of, cache loss problems can be
relieved, and reopening bug 145469 is reasonable.
I still think we should do it the KISS way. Drop the cache control files and map
the cache object identifiers (URLs?) directly to file names. Put checksums into
these files and that's it (almost).
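A sketch of that KISS scheme: self-describing entry files with checksums, written via temp-file-and-rename (all names here are invented; `os.replace` provides the atomic rename, and the index can be rebuilt by scanning the directory, as suggested in the earlier comments):

```python
import hashlib
import json
import os

def write_cache_entry(cache_dir, name, uri, body):
    # Write metadata (URI and checksum) and data to a temporary file,
    # then rename. os.replace is atomic, so a crash leaves either the
    # old entry or the new one, never a half-written file.
    header = json.dumps({"uri": uri, "sha1": hashlib.sha1(body).hexdigest()})
    tmp = os.path.join(cache_dir, name + ".tmp")
    with open(tmp, "wb") as f:
        f.write(header.encode() + b"\n" + body)
    os.replace(tmp, os.path.join(cache_dir, name))

def rebuild_index(cache_dir):
    # After a crash the index can be rebuilt (even in the background)
    # by scanning the self-describing entry files; leftover .tmp files
    # and entries with bad checksums are simply discarded.
    index = {}
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if name.endswith(".tmp"):
            os.remove(path)
            continue
        with open(path, "rb") as f:
            header, body = f.read().split(b"\n", 1)
        meta = json.loads(header)
        if hashlib.sha1(body).hexdigest() == meta["sha1"]:
            index[meta["uri"]] = name
    return index
```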
(In reply to comment #68)
> Should bug 145469 be reopened?

yes (unless we have an open bug on the issue); programs are supposed to exit
cleanly on windows shutdown. mozilla must do something strange so that it
doesn't happen.
The bug about Mozilla losing the cache when Windows is shut down is bug 212251.
For clarification, this bug is about losing the cache when the computer crashes
or loses power.
OS: Linux → All
Hardware: PC → All
FYI, we are planning a rewrite of the cache based on SQLite.  Part of this
rewrite should solve this bug (hopefully).  If you are interested, see the
beginnings of this code here:
http://lxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheDeviceSQL.cpp
Darin, that's excellent news!
When do you expect to have it functioning on the trunk?
Are you going to use SQLite for other database files?
And why do you use blobs for metadata?
   For debugging purposes, it's often best to avoid blobs where possible,
   so one can use a standard viewer/editor tool to inspect and repair the database.
> When do you expect to be able to have it functioning on the trunk?

post mozilla 1.8


> Are you going to use SQL lite for other database files?

there is a plan in the works to move all (or nearly all) of mozilla's database
files into SQLite.


> And why do you use Blobs for metadata?

because the metadata may contain null bytes.


>    For purposes of debugging, it's often best to avoid blobs where possible,
>    so can use some standard viewer/editor tool to inspect and repair the database.

makes sense, but here you could also just delete the offending entry.  it's only
a cache, which is not a critical database.  or, if desired, you could write your
own tool using SQLite's API to do whatever you need to do.
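For illustration, a minimal SQLite-backed cache along the lines discussed above (the table name, schema, and helper functions are invented, not the actual nsDiskCacheDeviceSQL.cpp schema; metadata is stored as a blob since, as noted, it may contain null bytes):

```python
import sqlite3

def open_cache(path=":memory:"):
    # A minimal SQLite-backed cache table. SQLite journals every
    # transaction, so a crash rolls back to the last committed state
    # instead of invalidating the whole cache.
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS entries (
                      key  TEXT PRIMARY KEY,
                      meta BLOB,  -- blob because metadata may contain NULs
                      data BLOB)""")
    return db

def put(db, key, meta, data):
    with db:  # commit (or roll back) this entry atomically
        db.execute("INSERT OR REPLACE INTO entries VALUES (?, ?, ?)",
                   (key, meta, data))

def get(db, key):
    return db.execute("SELECT meta, data FROM entries WHERE key = ?",
                      (key,)).fetchone()
```

The transactional journal is what removes the "index only written at shutdown" failure mode this bug is about.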
*** Bug 282936 has been marked as a duplicate of this bug. ***
It seems to me that a partial solution to this bug should be simple. Just leave
the disk cache flag always set to clean unless the disk cache is being written
to at shutdown or sooner if other ideas are implemented. This way there would
almost always be a clean cache to reload into the program at startup. If the
browser crashes then there would at least be old web page data for the browser
to start with and compare with the present page.
> Just leave the disk cache flag always set to clean unless the disk cache is 
> being written to at shutdown or sooner if other ideas are implemented.

I'm sure I've said this before, but this solution requires that we periodically
sync the index file to disk.  Right now, the index file (_CACHE_MAP_) is kept in
memory and is only written to disk at shutdown.  Your proposal is not as simple
as you might think, and more to the point: patches are always welcome ;-)
> Right now, the index file (_CACHE_MAP_) is kept in
> memory and is only written to disk at shutdown.

Then a possible solution would be to give the disk cache its own index that is
synced at shutdown along with the rest of the cache, or whenever the disk cache
is written to.

Darin: I wish I still had the ability to write patches. I haven't written any
code in 30 years.
> Then a possible solution would be to give the disk cache it's own index that 
> is synced at shutdown along with the rest of the cache or whenever the disk 
> cache is written to.

I'm not sure I understand what you mean (we have an index that is only sync'd at
shutdown), but let me try to clarify.  Some of the disk files are modified
during a browser session, but the index itself is not written out until the end
of the session.  So, if we crash, then we have an old index file that may not
correspond to the other files stored in the Cache folder.


> Darin: I wish I still had the ability to write patches. I haven't written any
> code in 30 years.

OK, that's understandable, but please also understand that there is a reason for
this bug being around for so long.  If it were a trivial thing to fix, it would
have been fixed already.  I'd love to take time out to fix this bug, but I
haven't had that luxury.

BTW, it feels like I'm repeating myself (see comment #20).
What I'm suggesting is that an index file be created, if it doesn't already
exist, that resides with the disk cache files and reflects what is in the disk
cache, and that the disk cache is never written to without updating the index to
reflect the changes. This doesn't have to be done with every web page that is
viewed in the browser. They can be put in the RAM cache until they reach some
preset level, then transferred in a block, updating the disk cache index at the
same time. Then at shutdown it can be brought completely up to date for the next
start of the browser. This way a person might lose some of the cache but almost
never all of it.
*** Bug 284065 has been marked as a duplicate of this bug. ***
Where is the 'dirty' flag stored? If it's stored in some easy-to-access place,
it would be trivial to write some kind of launcher that would check the flag
upon startup and restore the cache from a backup if flag is 'dirty'.
It's stored in the _CACHE_MAP_ file (in the first few bytes somewhere).  See:
http://lxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheMap.cpp
Blocks: majorbugs
No longer blocks: majorbugs
I've been thinking about this bug some more and another question came to mind.
Why is the cache data being written directly to disk in the first place? If a
temporary cache is used, residing in RAM, with its own index for recent pages
and a link to the disk index, retrieving pages would be faster and would actually
speed up the browser. The disk cache and index could then be updated
periodically when the browser is idle during page reading, or at a preset time
interval if the idle time doesn't occur first.
Unfortunately, I don't know C++. Could anyone write a very simple loader that
checks the dirty bit and then:
* starts FF if cache is 'clean', or 
* restores the cache from a given location and starts FF.
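The loader idea above can be sketched in a few lines of Python rather than C++. Everything here is an assumption for illustration: per an earlier comment the dirty flag lives "somewhere in the first few bytes" of _CACHE_MAP_ (see nsDiskCacheMap.cpp), so the offset below is a placeholder, not the real on-disk format, and the paths are hypothetical.

```python
import shutil
from pathlib import Path

# ASSUMPTION: 0x0C is a placeholder offset, not the actual location of the
# dirty flag inside _CACHE_MAP_; the real value would have to be read from
# nsDiskCacheMap.cpp.
DIRTY_FLAG_OFFSET = 0x0C

def cache_is_dirty(cache_map: Path) -> bool:
    """True if the assumed dirty byte in _CACHE_MAP_ is non-zero."""
    data = cache_map.read_bytes()
    return len(data) > DIRTY_FLAG_OFFSET and data[DIRTY_FLAG_OFFSET] != 0

def launch(profile_cache: Path, backup: Path) -> None:
    """Restore the cache from a backup if it was left dirty, then start FF."""
    cache_map = profile_cache / "_CACHE_MAP_"
    if cache_map.exists() and cache_is_dirty(cache_map):
        # Swap in the last known-clean snapshot before Firefox sees the
        # dirty flag and throws the whole cache away.
        shutil.rmtree(profile_cache)
        shutil.copytree(backup, profile_cache)
    # ...then exec the real firefox binary here.
```

The missing half of the workaround, refreshing the backup after each clean shutdown, is left out of the sketch.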
> Unfortunately, I don't know C++. Could anyone write a very simple
why can't you all understand: nobody (of the developers) cares
even if you knew C++ it wouldn't be enough

the only thing that may help is suggesting money (and not $100) for the fix
i bet most of you are window$ users so why don't you just buy opera?
(In reply to comment #86)
> Unfortunaley, I don't know C++. Could anyone write a very simple loader, that
> checks the dirty bit and then:
> * starts FF if cache is 'clean', or 
> * restores the cache from a given location and starts FF.

For Mozilla under Windows there is a "bad" way, possible in FF too: Patch
Mozilla (tested with 1.7.8) with

patch.exe components\necko.dll /O=\83\7E\50\00\75\64 /N=\83\7E\50\00\90\90 /X /Z=1

Then Mozilla will never see the dirty flag. Patch.exe is from
ftp://ftp.heise.de/pub/ct/listings/0014-184.zip [10kb].

But if you have a problem, first delete the cache before you send any bug
reports to Bugzilla!
(In reply to comment #88)

> For Mozilla under Windows there is a "bad" way, possible in FF too: Patch

Thanks, but I've already written cache backup & restore scripts, so patching is
not a good solution for me. The idea of a loader I've offered above is the
best workaround IMHO, since it doesn't alter the browser's behaviour in any way.
can you guys discuss workarounds elsewhere? bugzilla is for tracking bugs and
their fixes.
Blocks: 216490
Cache Fixer 1.0 - The Extensions Mirror
(http://www.extensionsmirror.nl/index.php?showtopic=3630)
*** Bug 291903 has been marked as a duplicate of this bug. ***
Blocks: 291224
Flags: blocking-aviary2.0?
Yeah, I too find this "cache vanishing" quite annoying; it's mainly because I am a dial-up user, and so I am trying to avoid downloading these pages/graphics more times than necessary ...


But there is a solution out there, as suggested to me in the "Cache and closing the Firefox with EndTask method" thread: http://forums.mozillazine.org/viewtopic.php?t=359655, which I opened on the MozillaZine forums. It's the "Cache Fixer" extension: http://imglikeopera.mozdev.org/cache_fixer.html, which drops the "dirty flag" on every startup, so that the browser always sees the cache as uncorrupted/clean when the cache is left in a dirty/unexpected state on Firefox process exit.


P.S., For more info, also see the "About the cache-files and closing the "firefox.exe" process with "EndTask" method" thread:
http://episteme.arstechnica.com/groupee/forums/a/tpc/f/99609816/m/571007496731, that I opened on ArsTechnica forums.


best regards,
Ivan Tadej, Slovenija, Europe
http://users.volja.net/tayiper
I really doubt that this can be fixed in a branch-safe way, minusing.
Flags: blocking-aviary2? → blocking1.8.1-
The Cache Fixer extension is not 1.5+ compatible; it seems to cause a bug with signons. So there is now no workaround.
(In reply to comment #95)
> The Cache Fixer extension is not 1.5+ compatible, it seems to cause a bug with
> signons. So there is now no workaround.
> 

Well hmmm, that's pretty strange. You see, I am running Firefox 1.5.0.1 (i.e. upgraded from  1.5 with Cache Fixer extension installed), and I am having no problems with stored passwords whatsoever; at least apparent/obvious ones ...


--
<end>

best regards,
Ivan Tadej, Slovenia
http://www.tadej-ivan.be/
(In reply to comment #96)

Are you using Cache Fixer 1.0 or 2.0? I was using 2.0 with Firefox 1.5.0.1.
Priority: -- → P3
QA Contact: tever → networking.cache
Target Milestone: Future → mozilla1.9alpha
Assignee: darin → nobody
Target Milestone: mozilla1.9alpha → ---
*** Bug 348162 has been marked as a duplicate of this bug. ***
*** Bug 348557 has been marked as a duplicate of this bug. ***
What if the nsDiskCacheDeviceSQL cache device is used instead of the default nsDiskCacheDevice?  I suppose that because SQLite maintains ACID transaction properties, this problem would be solved?
I can live with the cache being cleared after a real crash. But if the operating system is rebooting (or the user is logging out), Firefox does not exit cleanly, and the cache is also cleared in this case!

There are many bug reports on the unclean exit: bug 333907, bug 365749, bug 336193, ...
Attached patch WIP Patch v1 (obsolete) — Splinter Review
OK, I have a patch to fix this issue.  I'm basically saving the cache map and cache block file in-memory data structures when they're modified.  To test this patch, I start with an empty cache directory, open a few tabs and pages, and verify about:cache?device=disk to make sure that some stuff exists in the cache.  Then, I close Firefox using task manager and re-open it.  I'm getting an assertion repeatedly when resurrecting the cache of the previous session:

<http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/netwerk/cache/src/nsDiskCacheMap.cpp&rev=1.28&mark=540-541#510>

I couldn't figure out what I'm doing wrong which causes this assertion.  Can anyone take a look into this?  Darin, maybe?

Thanks!
Assignee: nobody → ehsan.akhgari
Status: NEW → ASSIGNED
This is an issue reported long ago, and it is quite serious for users with slow Internet connections (see the number of votes).  Because session restore is enabled by default, all such users experience this issue.  Having an empty cache when restarting the browser after a crash/power outage means a lot of time wasted waiting for stuff already downloaded, and in addition, many of the restored pages could face network timeout issues because of the low bandwidth.  Other major browsers (including IE) do not have this problem.

Now that this has a WIP patch, I'm requesting blocking on this bug so that we finally get it fixed for the upcoming Firefox 3 release.
Flags: blocking1.9?
Keywords: helpwanted
Whiteboard: what needs to be done: see comment 48 → help needed to test the WIP patch, see comment 103
Target Milestone: --- → mozilla1.9beta4
We can't block the release for a 6-year-old bug at this stage - but Ehsan, that doesn't mean we wouldn't consider a properly reviewed and unit-tested patch if the risk was reasonable enough.  Or that we wouldn't take it in a dot release.
Flags: blocking1.9? → blocking1.9-
When I exit firefox cleanly and then reopen it, the images are loaded but then they all start reloading. Would that be related to this bug?
no; also, depending on the server headers, this may not even be a bug.
(In reply to comment #97)
> (In reply to comment #96)
> 
> Are you using Cache Fixer 1.0 or 2.0? I was using 2.0 with Firefox 1.5.0.1.
> 

First, sorry for the quite delayed response... Anyway, I didn't even know that there are Cache Fixer versions 1.0 and 2.0. But well, as you can read here, "Cache Fixer :: Firefox Add-ons": https://addons.mozilla.org/en-US/firefox/addon/6859 (i.e. the "official" plugin's page; btw. my comments/reviews are signed as "satyr_moz"), I cannot make it work with Firefox 3.0 even though I changed the "install.rdf" file (i.e. changed the "maxVersion" to 4.0) etc. I was so happy to find this plugin/extension and use it with FF 2.0, since I am still on dial-up, and losing all the cached stuff really made my head hurt!!

--
<end>

best regards,
Ivan Tadej, Slovenia
http://tadejpersic.50webs.com/
Target Milestone: mozilla1.9beta4 → ---
Blocks: 493436
Oldest bug I've ever seen: an 8-year-old bug. I guess this bug cannot be fixed in time for Firefox 3.5.
Keywords: perf
Whiteboard: help needed to test the WIP patch, see comment 103 → [notacrash] help needed to test the WIP patch, see comment 103
How are we looking to get this fixed?

When I found out through first-hand experience that the cache is deleted when Firefox crashes, I thought it was DESIGNED as a blanket measure to keep whatever crashed Firefox from staying in the cache and corrupting the cache storage.

Though I think it should be more of an advanced option, for power users to be able to choose whether or not the cache is deleted when a crash occurs.
Kyle@113: Unless you have actual experience of the cache causing crashes, there's nothing wrong with using the cache. There are existing mechanisms to prevent crashes when restoring the session; IIRC if the browser crashes twice in a row about:sessionrestore comes up and the user can pick tabs to restore.
(In reply to comment #114)
> Kyle@113: Unless you have actual experience of the cache causing crashes,
> there's nothing wrong with using the cache. There are existing mechanisms to
> prevent crashes when restoring the session; IIRC if the browser crashes twice
> in a row about:sessionrestore comes up and the user can pick tabs to restore.

I'm just putting out ideas if they were to change the way the cache system works
Please, this bug needs to be fixed ASAP... the cache problem may be FF's only major flaw. The cache is really important for users with a slower internet connection such as me, or those with a net browsing limit, and this area definitely needs to be re-looked at, as people have been complaining about it for over 5 years now.
Updating to reality: I won't have the time to work on this for the foreseeable future!
Assignee: ehsan.akhgari → nobody
Status: ASSIGNED → NEW
Is this bug still present?

The current version of firefox is 3.6; this is tagged as affecting 1.9, which should be obsolete by now. Right?

If not, then I'm looking at these 8-year old comments:

-- What if we write the cache meta-data out in a journaling manner, like how
journaling file systems do: every modification to the cache gets appened to the
meta-data file, so if Mozilla crashes, we just have to step through the
journal to recreate the proper cache state.  Then we'd only have to delete
the last object to be cached, just in case that object had gotten corrupted.


-- It's possible but that is too much work.. 

Somehow, the idea of a data journaling library system seems ... like something that would be wanted over and over again. Has no one written this? Can't a programmer who knows what's going on in a journaling file system just rip out the journal support stuff and put it into a generic data storage library?
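The append-only metadata idea quoted above is easy to sketch, even without a generic journaling library. This is a toy illustration, not Mozilla code: the record format and field names are invented, and a real cache would batch the syncs rather than fsyncing every record. The key property is that each record is self-contained, so a torn final write after a crash is simply dropped during replay.

```python
import json
import os

def journal_append(path, op, key, value=None):
    """Append one self-contained JSON record per line."""
    rec = json.dumps({"op": op, "key": key, "value": value})
    with open(path, "a", encoding="utf-8") as f:
        f.write(rec + "\n")
        f.flush()
        os.fsync(f.fileno())  # a real cache would batch these syncs

def journal_replay(path):
    """Rebuild the in-memory index; a corrupt trailing record is discarded."""
    index = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                break  # partial final write from a crash: drop it and stop
            if rec["op"] == "put":
                index[rec["key"]] = rec["value"]
            elif rec["op"] == "delete":
                index.pop(rec["key"], None)
    return index
```

As the journal grows, it would periodically be compacted into a fresh index file, exactly as journaling file systems checkpoint their logs.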
Blocks: http_cache
The 'bug' is still there. When a Mozilla app crashes, the cache is marked dirty, and will be deleted on next startup. 
Just 'ripping' some journalling code will not help. This is very complex stuff, and it very easily impacts performance of Firefox. Any action that is added to cache handling will result in performance degradation.
Three things will help reduce the impact of this bug:
1. Prevent crashes: a lot of work has been done, and with official releases crashes are now much rarer than 10 years ago.
2. Recover a 'dirty' cache: this will need code to rebuild the cache, recovering as much as possible instead of just deleting it. But it is tricky, as there are no file sizes or CRC checks that can be done on the cached items.
3. Replace with some other cache system. Also very difficult...
Blocks: 573012
No longer blocks: 573012
(In reply to comment #121)
> 2. Recover a 'dirty' cache: this will need code to rebuild the cache recovering
> as much as possible, instead of just deleting. But is tricky, as there are no
> filesizes or CRC checks that can be done on the cached items.

This thing can be made a lot easier by treating cache entries as immutable files. E.g. you have an index mapping of <URL + etag/timestamp> -> <SHA1(file content)> and store cache entries on the file system as cachedir/hashed_filename.mozcache or something like that.

This way files don't have to be replaced, only added and deleted, and they are immutable for their lifetime. Their content can also be rechecked. So all that's necessary is recovering the index file and scanning to check that all files are still present.
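A minimal sketch of the content-addressed layout proposed above. The flat directory and the ".mozcache" suffix come from the comment itself and are assumptions for illustration, not anything Mozilla actually does; the point is that an entry named by the hash of its contents is either complete and valid, or provably corrupt.

```python
import hashlib
from pathlib import Path

def store_entry(cache_dir: Path, content: bytes) -> str:
    """Write an entry under sha1(content); immutable once written."""
    digest = hashlib.sha1(content).hexdigest()
    (cache_dir / f"{digest}.mozcache").write_bytes(content)
    return digest

def verify_entry(cache_dir: Path, digest: str) -> bool:
    """True if the entry exists and its contents still match its name."""
    path = cache_dir / f"{digest}.mozcache"
    if not path.exists():
        return False
    return hashlib.sha1(path.read_bytes()).hexdigest() == digest
```

Post-crash recovery then reduces to loading the last saved index and calling verify_entry (or just an existence check) for each referenced hash.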
Flags: in-testsuite+
Flags: blocking1.9-
Flags: blocking1.8.1-
Flags: in-testsuite+
How hard is this really?

It sucks having to get back the large amount of cache every time I shutdown my PC or another program BSODs my PC.
Especially for some users on slower connections.

The fix is so easy it isn't funny: just do what the guy says in comment 70, or copy the index file that's saved on shutdown to disk every time the cache is modified, or at specific intervals, say every 10 minutes.

Why hasn't this been fixed in 10 years?
Actually fixing this is very, very hard.
The cache consists of three parts:
1. The index/map
2. Blockfiles containing the small pieces.
3. Files scattered in the cache directory.

Writes happen to each of them asynchronously, so at time of a crash the status of each is unknown. The only way is to make 'snapshots' at specific points of time in a stable moment. This requires snapshotting the cache filesystem as a whole which is very costly. Also any journalling required to keep track of changes since the last snapshot is costly for the normal operations (page loading and such).
(In reply to comment #126)
I think snapshotting the entire cache would be the wrong approach anyway.

Instead the cache should be built around either immutable or checksummed objects that allow the index to be verified after the crash. The idea is not relying on the state to be saved consistently after shutdown, instead we need a way to verify the consistency after a crash.

I.e. we need
1. a way to check the contents of the index and discard individual entries that are inconsistent.
2. to make sure that individual cache entries can never be in a corrupt state that cannot be detected


Here's an example how things could be done:
A) large files.
They're stored under sha1(file contents) as their file name. This way they're immutable and thus cannot be corrupted
Metadata is kept in the index file and mapped to file hashes
B) small files.
They're stored under sha1(domain name) and each new small file is appended as <sha1(content), length, content> tuples to the file. This way only the last entry can be corrupted with a partial write and this is easily detectable based on its length or hash.
Which file to load can be derived from the domain name stored in the index file.

The index gets copied every N minutes. After a crash all you have to do is load the old index and purge files that are not present in the index. All read operations are easy to verify without the index anyway, so no risk of data corruption.

Block files can occasionally be compacted with some copy-modify-move logic.
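The "small files" block format in (B) can be sketched as follows. The exact header layout (20-byte SHA-1 digest followed by a 4-byte little-endian length) is an invented rendering of the <sha1, length, content> tuple idea, not a real format; the property that matters is that only the final record can be torn by a crash, and that is detected by its length or hash.

```python
import hashlib
import struct

def append_record(buf: bytearray, content: bytes) -> None:
    """Append one <sha1(content), length, content> record to the block."""
    buf += hashlib.sha1(content).digest()
    buf += struct.pack("<I", len(content))
    buf += content

def scan_records(buf: bytes):
    """Yield valid records, stopping at the first truncated/corrupt one."""
    pos, hdr = 0, 20 + 4  # 20-byte sha1 + 4-byte little-endian length
    while pos + hdr <= len(buf):
        digest = buf[pos:pos + 20]
        (length,) = struct.unpack("<I", buf[pos + 20:pos + 24])
        content = buf[pos + 24:pos + 24 + length]
        if len(content) < length or hashlib.sha1(content).digest() != digest:
            break  # partial write from a crash: drop the tail
        yield content
        pos += hdr + length
```

Recovery after a crash is then just a forward scan: everything up to the first bad record survives, and only the torn tail is discarded.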
It just seems that every other browser doesn't have this issue; why Firefox?
The entire 4.x branch on Windows STILL has this stupid bug!

Just this March I hit this twice on two nightlies (4.1pre and 4.2pre).

<<Reported: 2001-10-20>> Damn! Is this a kind of Mozilla trademark?! WHY does the Mozilla team lose time on **** like angled, then rounded, then semi-angled tabs instead of fixing ancient bugs?!
I don't know what the point is of reporting this bug - if it has not been fixed in ten years it will probably never be a priority, but for what it is worth...

The fact is there will always be bugs and there will always be crashes. That will never change - it has not changed in the last ten years and it will not change in the next ten years. But when the browser crashes, or closes unexpectedly for whatever reason, you cannot simply wipe the entire cache. That is not an acceptable solution.

If it is too difficult to fix this bug immediately, a possible workaround for the interim is to limit the time that the cache stays in the dirty state (read/write mode), i.e. flush all files and clear the dirty flag (change to read-only mode) whenever the browser is idle. This might reduce the number of times that the entire cache is lost.
(In reply to comment #130)
> If it is too difficult to fix this bug immediately, a possible workaround for
> the interim is to limit the time that the cache stays in the dirty state
> (read/write mode), ie flush all files and clear the dirty flag (change to read
> only mode) whenever the browser is at idle. This might reduce the number of
> times that the entire cache will be lost.

Unfortunately that wouldn't work either, since the next time that something changes in the cache (read: a few milliseconds later perhaps), the bitmap entry would be inconsistent with the cache maps again.  And writing that to disk every time is also not an option, due to the performance impact of doing so.

As somebody who used to be on a dial-up connection for many years, I once tried to fix this.  In the process I realized that fixing this bug with the current design of the disk cache is extremely difficult.  And rewriting the disk cache code is not trivial either.  So trust me, this bug going unfixed doesn't mean that nobody cares, it's just too difficult to fix, and the amount of effort for doing so has outweighed its benefits in the past.

But the good news is that I've heard that the Necko team has plans to overhaul the cache, which would fix this issue, among others.  So stay tuned.  :-)
Damn! 500 MB of cache went away!

I keep dreaming that every developer associated with this part of the code gets sentenced by a court to three months of dialup or GPRS. Without appeal.

WHY NOT implement the quick-and-dirty -- but really working! -- method from the CacheFixer extension instead of talks, then talks, then talks again about a "complete redesign"?
(In reply to Ehsan Akhgari [:ehsan] from comment #131)
> (In reply to comment #130)
> But the good news is that I've heard that the Necko team has plans to
> overhaul the cache, which would fix this issue, among others.  So stay
> tuned.  :-)

Could you point to the appropriate link, docs, or bug regarding it?
It would really help, especially those who face net browsing/data usage limits.
(In reply to vitar from comment #133)
> WHY NOT implement a quick-and-dirty -- but really working! -- method from
> the CacheFixer extension instead of a talks then talks then talks again abt
> a "complete redesign"?
Could you post your OS, and Firefox version ?
http://kmeleon.sourceforge.net/forum/read.php?9,98959
You can also try here:
https://groups.google.com/forum/#!topic/mozilla.support.firefox/7N6xI63SbjQ
(In reply to Shriram ( irc: Mavericks ) from comment #135)
> Could you post your OS, and Firefox version ?
I'm on XP with Aurora, currently v9
CacheFixer is v1.x (some v1.0 fixed by somebody for the FF4 "Great Shift", AFAIR).

Unfortunately this version of CF stopped working at the end of the v7 branch (or immediately after FF v8 was out, I don't remember exactly) because of FF internal changes -- CF can't overlay the new cache-related code (rewritten and optimized, LOL) and therefore doesn't zero the "dirty" flag.
I'll mention that with recent changes in Firefox which automatically sets the cache size based on the free disk space (or size of disk), the cache can literally be hundreds of MB in size.   For example the max cache size on my PC as listed in about:cache is 819 MB.

This becomes problematic even for people with fast connections as cleaning out the cache takes a considerable amount of time when the cache is that large.  For example after a crash on my machine, Firefox's SessionStore component actually generated a "taking too long" alert while the cache is cleaned.


I'm curious why the _CACHE_MAP_ file(s) can't be written to the disk at the same time as the cache data?  That would keep things in sync.  Then change the dirty "flag" to a last updated time float.  On a crash simply clear out any cache data later than the last updated time.

A side effect of doing that, would be that checking the "Cache" box in the "Clear Recent History" and choosing one hour would actually delete the last hour of the cache, instead of the entire cache like it does now.
(In reply to Michael Kraft [:morac] from comment #137)
> I'm curious why the _CACHE_MAP_ file(s) can't be written to the disk at the
> same time as the cache data?  That would keep things in sync.  Then change
> the dirty "flag" to a last updated time float.  On a crash simply clear out
> any cache data later than the last updated time.

If there's a write error for the _CACHE_MAP_, what would replace the corrupt file (if corrupted)?
(In reply to Shriram ( irc: Mavericks ) from comment #138)
> If there's a write error for the _CACHE_MAP_ , what would replace the
> corrupt file(if corrupted)?

if an nsISafeOutputStream was used, then the original _CACHE_MAP_ file wouldn't be corrupted on a write error.  Any file written after the last valid _CACHE_MAP_ would be considered invalid (just like after a crash).
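For illustration, the safe-output-stream approach boils down to write-to-temp-then-atomic-rename. This Python sketch mirrors that pattern (it is not the actual nsISafeOutputStream implementation): a crash at any point leaves either the old file or the complete new file on disk, never a torn one.

```python
import os
import tempfile

def safe_write(path: str, data: bytes) -> None:
    """Write data to path so that a crash never leaves a torn file."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)  # temp file on the same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data is durable before we expose it
        os.replace(tmp, path)  # atomic rename on POSIX and modern Windows
    except BaseException:
        os.unlink(tmp)  # leave the old file untouched on any failure
        raise
```

Used for _CACHE_MAP_, this would at worst lose the changes since the last successful flush, rather than invalidating the whole cache.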
I've been using Firefox forever on dial-up but only became aware of this problem in the last six months.  So a large part of the frustration I and my family have had is due to this cache deletion.  I think I'll only recommend Firefox with NoScript when extra security is needed.

I lost a 130 MB cache last week, and just lost a 60 MB cache due to a TV card crash.
60 MB would take me 4 hours to download in ideal conditions, much longer for web page content I expect.  It's staggering to consider how much of my time has been wasted by Firefox.  4 hrs x 52 weeks x 10 years = 2080 hrs?

If someone like me is only now becoming aware of this flaw, I'd guess less than 2% of the people severely affected will ever know.  Very disappointed, disillusioned even; what a shocking thread this is.  Firefox just isn't suitable for dial-up; it never was, it seems.
Could you mention the operating system(OS), along with the version of Firefox you use?

(In reply to fredfish from comment #140)
> I've been using Firefox forever on dial up but only became aware of this
> problem in the last six months.  So a large part of the frustration I and my
> family have had is due to this cache deletion.  I think I'll only recommend
> Firefox with noscript when extra extra security is needed.
(In reply to Shriram ( irc: Mavericks ) from comment #141)
> Could you mention the operating system(OS), along with the version of
> Firefox you use?

:-D  Are you kidding? Look over this thread from the top.

OS: ANY Windows
Firefox: ANY version

Not sure about *NIX, but because of similar logic it looks possible in some cases, like pkill etc.
Comment on attachment 300868 [details] [diff] [review]
WIP Patch v1

Review of attachment 300868 [details] [diff] [review]:
-----------------------------------------------------------------

Alas, this patch solves cache corruption by using fsync (aka PR_SYNC) to flush out each and every cache record we write, and that's too high of a performance hit, especially on some filesystems (which will do a global filesystem flush each time fsync is called).  

We're in the process of evaluating whether it makes sense to tackle this crash-recovery issue on its own (we might want to do what Chrome does, i.e. write a hash of each entry at the end of it and detect corruption if the hash is wrong/missing, and only throw out the cache then) or whether it should come as part of a redesigned cache that addresses a whole host of other shortcomings with our current cache architecture (the list is long: performance, mobile support, pinning items, many others).  

If you're a user and not losing your HTTP cache when the browser crashes is a big deal to you, I can understand the frustration.  We're mitigating the situation in the meantime by reducing crashes (ex: Flash and other plugin crashes no longer cause the browser itself to crash, so no cache loss).  While we figure out the right solution to this bug and find programmer time to fix it, it'd be great to not turn this bugzilla entry into a venue for venting about the issue.  Thanks for your support (or at least, hopefully, your courtesy :)
Attachment #300868 - Flags: review-
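The Chrome-style per-entry trailer mentioned in the review above can be sketched as follows. This is a toy illustration of the idea, not Chrome's or Mozilla's actual format; CRC-32 stands in as a fast non-cryptographic check. A wrong or missing trailer condemns only that one entry, not the whole cache.

```python
import struct
import zlib

def seal_entry(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer when writing a cache entry."""
    return payload + struct.pack("<I", zlib.crc32(payload) & 0xFFFFFFFF)

def open_entry(blob: bytes):
    """Return the payload, or None if the trailer is missing or wrong."""
    if len(blob) < 4:
        return None  # too short to even hold a trailer: treat as corrupt
    payload, (crc,) = blob[:-4], struct.unpack("<I", blob[-4:])
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        return None  # torn or corrupted entry: evict just this one
    return payload
```

After a crash, a reader that gets None simply treats it as a cache miss and re-fetches that single resource.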
The current crash behavior is very bad.
The proposed flush-on-every-change is far from good too.
And the old trick of resetting the "dirty" flag is bad because someone claimed it could cause potential crashes.

At the same time:
- There is nothing hard in finding wrong (or potentially wrong) MAP and DISK entries (at least by time)
- FF currently does some background jobs already and nobody calls them unnecessary or big penalties (update checks, Places cleanup etc.)
- After EVERY crash every affected user loses time redownloading lost data

The right decision is clear and very simple:

Make FF check, and if necessary correct or simply disregard, any wrong entries right on startup if it finds this damned "dirty" flag.

Details of the logic can vary. The simplest way is to
1) launch, check the flag, and if the flag is set then
2) disregard all cache entries of the last session, entirely or after some date-time.
The more complex way is to
2) walk through the cache hierarchy and validate its consistency on a per-item basis, like the utility proposed above by someone.
Of course this could be made optional and even hidden in about:config.

Anyway, for crash-affected users these (sometimes huge) delays are inevitable, so let's spend that time on something useful instead of stupid redownloading.
So the problem is that flushing to disk is too slow (at least, has too much latency for the main thread to have to wait for it), and the reason you want to flush to disk is that the data has to be in a consistent state (that is, Firefox can't be updating it while you're copying it over in the background), right? So why not copy it synchronously into RAM, then flush that to disk asynchronously? As long as you're only updating the chunks that need updating, the latency should be okay (certainly better than waiting for the HDD), and if you make sure the asynchronous task is atomic and finished synchronously on shutdown it should be safe.

I guess implementing something in practice is always harder than it sounds, but am I missing anything major?
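As a sketch of the proposal above (all names invented, not actual Necko code): the main thread takes a cheap, consistent copy of the in-memory index, and a background thread pays the disk latency.

```python
import threading

class IndexFlusher:
    """Snapshot the index synchronously, write it to disk off the main thread."""

    def __init__(self, write_fn):
        self._write = write_fn  # the slow part: serialize + write + fsync

    def flush_async(self, index: dict) -> threading.Thread:
        # The main thread only pays for a shallow copy; Firefox can keep
        # mutating `index` while the snapshot is being written out.
        snapshot = dict(index)
        t = threading.Thread(target=self._write, args=(snapshot,))
        t.start()
        return t  # join() the last flush at shutdown so it completes
```

The remaining subtlety, which the comment acknowledges, is making the on-disk write itself atomic (e.g. via a temp file and rename), so an interrupted background flush cannot clobber the previous index.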
(In reply to Jason Duell (:jduell) from comment #143)
> we might want to do what Chrome does, i.e. write a hash of each entry at the end of it and 
> detect corruption if the  hash is wrong/missing, and only throw out the cache the
If applied to FF, how would it affect performance ?
Whiteboard: [notacrash] help needed to test the WIP patch, see comment 103 → [snappy][notacrash] help needed to test the WIP patch, see comment 103
I think we should lazily check cache entries after a crash. There is no reason to throw away our cache or eagerly re-verify it.
Whiteboard: [snappy][notacrash] help needed to test the WIP patch, see comment 103 → [snappy:p2][notacrash] help needed to test the WIP patch, see comment 103
(In reply to Shriram (irc: Mavericks) from comment #146)
> (In reply to Jason Duell (:jduell) from comment #143)
> > we might want to do what Chrome does, i.e. write a hash of each entry at the end of it and 
> > detect corruption if the  hash is wrong/missing, and only throw out the cache the
> If applied to FF, how would it affect performance ?

Except for very large things like ISOs and videos, the CPU cost of calculating the hash or checksum would not be too high. Also, if the choice is hash vs. no hash, it doesn't have to be a cryptographic hash; it can be a very fast one. Some processors even have CRC-32 instructions now (see SSE4.2) that would make it basically free.
Doing a CRC check on a file before using it will never be free. All the bytes will still need to be scanned completely before the byte stream can be passed to an image or video decoder, which are all optimized for decoding during loading, showing the first part of an image or video after having read only part of the file.
The question still is whether keeping the cache is really that useful in the case of a crash. Keeping a crashed cache may ensure that Firefox crashes again, and again, ...
Comment on attachment 300868 [details] [diff] [review]
WIP Patch v1

FWIW, my patch here is probably not what we want to do at all, so there should be very little point in testing it.  And I would be surprised if it even would apply cleanly after about 4 years.  :-)
Attachment #300868 - Attachment is obsolete: true
(In reply to Brian Smith (:bsmith) from comment #148)
> (In reply to Shriram (irc: Mavericks) from comment #146)
> > (In reply to Jason Duell (:jduell) from comment #143)
> > > we might want to do what Chrome does, i.e. write a hash of each entry at the end of it and 
> > > detect corruption if the  hash is wrong/missing, and only throw out the cache the
> > If applied to FF, how would it affect performance ?
> 
> Except for very large things like ISOs and videos, the CPU cost of
> calculating the hash or checksum would not be too high. Also, if the choice
> is hash vs. no hash, it doesn't have to be a cryptographic hash; it can be a
> very fast one. Some processors even have CRC-32 instructions now (see
> SSE4.2) that would make it basically free.

Good points re file size and hashing. I think rereading the whole file is an acceptable price to pay even for large files (especially if one does chunked hashes to reduce the delay).
Sequentially reading in an occasional large file for a checksum beats both fsyncing everything preemptively (i.e., while writing out the cache; CPU time is cheaper than I/O) and re-downloading (assuming the hard drive isn't an overloaded piece of crap, but that is a story for another bug).

For operating systems that let us query when a file was written out to disk, we can set a global flag so that we don't check the file again.
For reference: 
There are apis of varying quality that tell us when data has hit disk. On linux there is FIEMAP_EXTENT_UNWRITTEN http://www.mjmwired.net/kernel/Documentation/filesystems/fiemap.txt#121

On Windows you can combine FILE_FLAG_OVERLAPPED with FILE_FLAG_WRITE_THROUGH to get notifications of when data hits disk. FILE_FLAG_NO_BUFFERING is another option to look into. Some combination of flags seems like it would help avoid evicting smaller cache entries while caching bigger ones.
Whiteboard: [snappy:p2][notacrash] help needed to test the WIP patch, see comment 103 → [snappy:p2][notacrash]
Blocks: 716949
I think Fennec could really benefit if this bug were fixed. On Android, any apps in the background may be OOM killed. This occurs frequently, and when it does, we need to restore the session as quickly as possible so that the OOM kill is transparent to the user. If the cache is lost, this significantly slows the browser restoration - especially considering that all resources must be re-fetched on a mobile network.
I filed bug 745075 to see if we could flush the disk cache when going into the background on Android. I think the situation on Android is less intrusive than on desktop since we have a trigger point where we could do a single costly flush - if that's enough to save the cache.
I've faced this problem so many times.
When Firefox crashes or is ended via the task manager, it loses all its cache (unlike other browsers such as Opera, Chrome, IE, and Safari, which keep it).
Firefox should not discard this cache, or should at least provide an option letting the user choose the default behavior, i.e. always keep the cache no matter what, or discard it when the browser crashes (not everyone has broadband, and many people keep the cache for weeks for various reasons).
Dropping the "dirty flag" on every start-up, so that the browser always sees the cache as uncorrupted/clean even when it was left in a dirty/unexpected state when the Firefox process exited, should be helpful too (see Opera's implementation of the cache).
beelzebub360: You don't tell us anything new in this 11-year-old bug report. The facts are all known. The cache needs to be rewritten, and that costs manpower. Just dropping the dirty flag is nonsense, as it would lead to a broken cache with pages failing to load and other malfunctions.
(In reply to Matthias Versen (Matti) from comment #158)
> beelzebub360: You don't tell us news in this 11year old bug report. The
> facts are all known.
okay, so you know it's 11 years old, right? And a pretty nasty one?

>The cache needs to be rewritten and that costs
> manpower
Well, that's why we're all here: to make it better.
(any small contribution helps)

>Just dropping the dirty flag is nonsense as it would lead to a
> broken cache with not loading pages and other malfunctions.

Thanks for that, I didn't know that :)
The current cache is very tight and heavily optimized. Any little change risks making it behave worse. But any sensible improvement in the form of a patch, including tests and benchmarks proving that the patch makes the cache better, is seriously welcome. Further discussion of its 'badness' is not.
Blocks: 753813
According to http://metrics.mozilla.com/data, roughly 10% of startups encounter a corrupted cache.
That's both astoundingly bad for users and not surprising, as many people never really exit the browser.
(In reply to Randell Jesup [:jesup] from comment #162)
> That's both astoundingly bad for users, and not surprising as many people
> never really exit the browser.

Totally agree!
this bug should be at the top of the to-do list
Couldn't we kill two birds with one stone: encrypt the cache when it's written, for security, which would also provide file hashing, etc.?

Is there any reason the cache couldn't be moved to an SQLite database?

Or have a "safe" cache directory, which is the cache from the last successful shutdown. Write any changes into a new directory, which would get merged with the "safe" cache on a successful shutdown. That way, if the client crashes, it can still load from the "safe" cache.
One issue there is that Firefox already takes a LONG time to shut down, and making it longer would be a pain.

The time for Firefox to quit is so long that it times out my "logout" cycle on OsX 10.7 -- I have to manually quit Firefox first, and wait for it to exit, before I can logout.
@michael-bugzilla-firefox@stb04.nccom.com,

> The time for Firefox to quit is so long that it times out my "logout" cycle on OsX

hmm, that sounds like a different bug.  Could you open a new bug for it?  Make sure to list your firefox version and exactly how long it takes to shut down:  info on how big your cache directory is would also be useful.  (But in recent versions, we're no longer deleting the cache at shutdown, so your issue is likely to be something besides the cache.)
(In reply to Jason Duell (:jduell) from comment #166)
> @michael-bugzilla-firefox@stb04.nccom.com,
> 
> > The time for Firefox to quit is so long that it times out my "logout" cycle on OsX
> 
> hmm, that sounds like a different bug.  Could you open a new bug for it? 
> Make sure to list your firefox version and exactly how long it takes to shut
> down:  info on how big your cache directory is would also be useful.  (But
> in recent versions, we're no longer deleting the cache at shutdown, so your
> issue is likely to be something besides the cache.)

That is bug 731741, which is actively being worked on.
I wanted to pass by a strategy for this, please let me know if you have any objections or see any flaws. 
If not, I will go ahead and make a patch for the below, built on top of Ehsan's patch. (We'd end up pushing Ehsan's patch and my modified patch on top of it).

So the idea is similar to what Ehsan did, but with the following changes:
- Keep the mHeader.mIsDirty flag
- Remove mHeader.mIsDirty = true and the header flush from Init; also initialize a new variable IsFlushed = false (IsFlushed is not part of the actual header, since the header is written to disk.)
- In the member functions that change the header, set mIsDirty = true and flush the header only on the first call:
  - If !IsFlushed, then set mHeader.mIsDirty = true and flush the header. 
  - Set IsFlushed = true so that we won't re-flush the header on subsequent calls.
- Collect the indexes that need to be flushed in the meantime.
- On a timer issued to the same cache IO thread, if we detect there was no change to the cache within a threshold time X, then
  - Flush the modified records
  - Set mIsDirty = false
  - Flush the header
  - Note: This timer can be pretty frequent since we are gating it by the threshold time X.
- PR_SYNC is not needed because if there is a crash at ANY point when the header is dirty, the mIsDirty = true header was already written
- Flushing records would of course always come before flushing the header.

The benefit of this is that we can probably dramatically reduce the 12% of corrupted-cache startups on Windows/Mac (and from filtering the telemetry data, I've seen it is 20% on Linux).
Perhaps a large part of this 12% comes from killed processes, or from crashes that don't happen exactly around the time we modify the cache.

The benefits are also obvious: we would have faster page load times under these formerly cache-clearing circumstances, and we wouldn't issue extra I/O, slowing down the system, to delete the old cache.
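The protocol from comment 168 can be sketched as follows. This is a minimal illustration only (CacheIndex, on_idle_timer, and the one-word header layout are invented for this sketch; the real code is the C++ cache backend):

```python
import os
import struct

class CacheIndex:
    """Sketch of the dirty-flag protocol: flush a dirty header once per
    spurt of writes, then mark the cache clean after a quiet period."""

    def __init__(self, path):
        self.path = path
        self.flushed_dirty = False   # IsFlushed in the proposal
        self.pending = []            # record indexes awaiting flush
        if not os.path.exists(path):
            self._write_header(dirty=False)

    def _write_header(self, dirty):
        mode = "r+b" if os.path.exists(self.path) else "wb"
        with open(self.path, mode) as f:
            f.write(struct.pack("<I", 1 if dirty else 0))
            f.flush()
            os.fsync(f.fileno())     # make the ordering durable

    def modify(self, record_index):
        # First change in a spurt: persist mIsDirty = true before any data write.
        if not self.flushed_dirty:
            self._write_header(dirty=True)
            self.flushed_dirty = True
        self.pending.append(record_index)

    def on_idle_timer(self):
        # No activity within the threshold: flush records, then mark clean.
        self.pending.clear()         # stand-in for flushing the real records
        self._write_header(dirty=False)
        self.flushed_dirty = False

    def is_dirty(self):
        with open(self.path, "rb") as f:
            return struct.unpack("<I", f.read(4))[0] == 1
```

A crash at any point between modify() and on_idle_timer() leaves the dirty bit set on disk, which is exactly what the startup check would look for.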
This sounds good, so long as writes of the header are preceded and followed by flushes of outstanding IO (fflush() or equivalent) so as to guarantee write ordering (and I assume this is the plan).  Note of course that the flushes can occur later; they just have to occur between the header and cache data writes.
(In reply to Randell Jesup [:jesup] from comment #169)
> This sounds good, so long as writes of the header are preceded and followed
> by flushes of outstanding IO (fflush() or equivalent) so as to guarantee
> write ordering (and I assume this is the plan).  Note of course that the
> flushes can occur later; they just have to occur between the header and
> cache data writes.

This is the plan except for the header write that happens when !IsFlushed at the start of a spurt of cache usage.  In that case we simply need to tweak the header on disk to store mIsDirty = true without changing anything else.
Could someone from the Necko team comment on Comment 168 before I begin working on this? Thanks!
(In reply to Brian R. Bondy [:bbondy] from comment #171)
> Could someone from the Necko team comment on Comment 168 before I begin
> working on this? Thanks!

Michal? Nick?
From my understanding this will still call fflush fairly often?   If so that may make this too expensive (fflush allegedly syncs all pending I/O filesystem-wide on some filesystems, most notably ext3).  That said, I'm not the cache or I/O expert, and Taras and Michal/Nick are more likely to have an informed opinion.

I just poked around a bit and it seems that there's some debate about whether Firefox has been too paranoid about avoiding fsync:

  http://en.wikipedia.org/wiki/Sync_(Unix)
  http://www.linuxfoundation.jp/news-media/blogs/browse/2009/03/don’t-fear-fsync
  http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/
  http://marcansoft.com/blog/2009/12/making-firefox-play-nicely-with-laptop-mode/

I'd be curious to know 

1) how often our SQLite code still calls fsync, to get a sense of how often we'd be ok with the cache calling it. 

2) Ted Tso's post mentions that "if the fsync() is done off the main UI thread, the fact that fsync() triggers a commit won’t actually disturb other processes doing normal I/O; in ext3 and ext4, we start a new transaction to take care of new file system operations while the committing transction completes.", which makes it sound like we could avoid jank if we did the fsync in a dedicated thread (assuming we could do that in a way that still captures a clean snapshot of the cache, which I assume means "so long as we haven't done any writes by the time fsync is called")

Generally, this algorithm only guarantees that the cache is clean if we haven't done any writes since the last fsync, right?  It's true that this would save cache loss if we're killed or crash while not doing cache writes--but I'm not sure those amount to a large % of cases.
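The dedicated-fsync-thread idea from point 2 can be sketched like this. A rough Python illustration only (class and method names are invented; the real cache would coordinate this with its own I/O thread):

```python
import os
import queue
import threading

class FsyncWorker:
    """Run fsync() calls on a dedicated thread so the caller never blocks
    on the commit, even if fsync stalls (e.g. on ext3)."""

    def __init__(self):
        self.q = queue.Queue()
        self.t = threading.Thread(target=self._loop, daemon=True)
        self.t.start()

    def _loop(self):
        while True:
            fd, done = self.q.get()
            if fd is None:
                break
            os.fsync(fd)       # only this thread waits for the commit
            done.set()

    def sync_async(self, fd):
        """Queue an fsync; returns an Event the caller may wait on, or ignore."""
        done = threading.Event()
        self.q.put((fd, done))
        return done

    def shutdown(self):
        self.q.put((None, None))
        self.t.join()
```

The caveat from the comment still applies: this only yields a clean snapshot if no new writes are issued between queueing the fsync and its completion.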
Note that Chrome's strategy is still resulting in cache loss for 19% of users per week, so that's a lot of work for a questionable amount of win:

   https://plus.google.com/103382935642834907366/posts/XRekvZgdnBb

Almost makes me feel like we're back to comment 11, or maybe even exploring a sqlite backend...
Basically how I see it is that there are large spurts of cache usage when a page loads.  The proposed approach will only do a single write into the header to mark it as dirty on the first cache change in that spurt. 

At the end of the spurt, after some threshold time expires, off the main thread we'd write out the changes to disk and mark the header as clean.  This is in contrast to what Ehsan's patch did, which was to do writes on every cache change.

I don't think we should fear flushes in this case off the main thread.  Especially since there would be at most one flush per threshold.  If you are afraid of it, we could just increase what that threshold is.  Our cache clearing code itself does let's say ~1k IO operations on its own to delete old cache while the browser is open.  Having that kind of overhead is much larger than a flush every once in a while after a spurt of cache changes. And this happens on 1 out of 10 Firefox runs in Windows/Mac and 1 out of 5 runs in Linux.

This approach will result in a significant drop of cache loss for Firefox.  We can verify with telemetry data that we aren't causing any regressions in performance and we can also add telemetry data in the new parts of the code to see how long those operations take.  We could disable this approach only on a specific filesystem or platform if we found that to be a problem.  We could also change what triggers the cache header clean=true operation if we need to by using the idle service for example.

I think a targeted solution like this is better than a generalized let's move to sqlite.  I don't think sqlite will be a blind fix and it will bring with it a bunch of other performance problems that we'd have to deal with for years to come.  This could be implemented for Firefox 17 and would help a large amount of people especially those with slow internet connections.
Depends on: 774146
We did an experiment on mobile where we got rid of SQLite fsyncs for a few weeks. The end result was that our IO performance got slightly worse: we traded a bunch of smaller explicit flushes for implicit OS-triggered flushes that caused long pauses to become more frequent. I suspect that increasing cache flushes will not hurt significantly, since the cache already does a lot of IO.

What we should do here is do the fsyncs randomly if possible: i.e., randomly set a pref for people participating in the more-robust-cache experiment and track how perf + cache robustness change.
To start, we could also land a smaller patch in a dependent bug that simply sets a pref browser.cache.cleanHeader to true or false.  Then we could report telemetry data on it in relation to when we encounter corrupt cache. In this way we could see exactly how much this approach would help the situation without doing all the work up front.  Most of that work would then be re-used assuming the benefit is significant.
To clarify Comment 177, that patch would just detect which crashes happen during a spurt of cache usage and which ones happen outside of spurt+a few seconds.  It wouldn't actually modify how the cache works.
OK I went back and re-read my patch and thought about this a bit.  I think comment 168 is a decent plan and I agree with comment 169.  Flushing doesn't hurt that much if it's not on the main thread, and IIRC that used to be the main problem with my patch.  And let's not forget that back in the time when I wrote that patch our performance evaluation criteria was mostly a number of guesses, but we can now measure perf changes using telemetry, which is great for this kind of thing.

Also, I think the sqlite cache plan is totally outside of the scope of this bug, and according to the experience of the places team will cause everything to be worse for a couple of years until we learn how to use it efficiently, so let's not talk about that here!  :-)
Yes to A|B testing.   And I agree that Brian's approach should be implementable and testable in a much shorter timeframe than some other approaches.  So I'd be ok with trying this.  I'd still like to hear feedback from the folks who know the cache code better, though.

> I don't think we should fear flushes in this case off the main thread

The one case in your description that troubles me is at the start of page loads: we'd likely be pausing the cache thread to fflush() the header as dirty (if some resources are new) just as we'd like to be triggering reads (if some other resources are cached). Perhaps marking the cache dirty on a different thread would avoid that? (In the long term perhaps we should have separate reader/writer cache threads, so the writer thread can pause for fsyncs without blocking reads.)
(In reply to comment #180)
> Yes to A|B testing.   And I agree that Brian's approach should be implementable
> and testable in a much shorter timeframe than some other approaches.  So I'd be
> ok with trying this.  I'd still like to hear feedback from the folks who know
> the cache code better, though.
> 
> > I don't think we should fear flushes in this case off the main thread
> 
> The one case in your description that troubles me is that at the start of page
> loads: we'd be likely be pausing the cache thread to fflush() the header as
> dirty (if some resources are new) just as we'd like to be triggering reads (if
> some other resources are cached).  Perhaps marking the cache dirty on a
> different thread would avoid that? (in the long term perhaps we should have
> separate reader/writer cache threads, so the writer thread can pause for fsyncs
> w/o blocking reads?).

Flushes don't need to happen on the cache thread, right?
> Flushes don't need to happen on the cache thread, right?

Well, they need to happen at the right time.  If we move the flush that marks the cache as "dirty" to another thread, we shouldn't initiate any writes from the regular cache thread until the flush is complete (else we have a race where we could crash with new, partial writes while the header hasn't yet been marked dirty).   The flush to mark as clean needs to not happen until all writes are done (that's the easy case: since we're not going to flush-as-clean until we've seen no activity for a while, we don't really even need to offload this to a different thread).
(In reply to comment #182)
> > Flushes don't need to happen on the cache thread, right?
> 
> Well, they need to happen at the right time.  If we move the flush that marks
> the cache as "dirty" to another thread, we shouldn't initiate any writes from
> the regular cache thread until the flush is complete (else we have a race where
> we could crash with new, partial writes while the header hasn't yet been marked
> dirty).   The flush to mark as clean needs to not happen until all writes are
> done (that's the easy case: since we're not going to flush-as-clean until we've
> seen no activity for a while, we don't really even need to offload this to a
> different thread).

Yeah agreed.  I don't remember a lot about how the cache works but I'm sure that we can maintain these constraints by being careful...
> The one case in your description that troubles me is that at the start of page
>  loads: we'd be likely be pausing the cache thread to fflush() the header as
>  dirty (if some resources are new) just as we'd like to be triggering reads 
> (if some other resources are cached).

We can do 2 things in that case:
1) Write the dirty bytes to disk and flush mMapFD.
2) Create an empty file in the cache directory called '.cacheclean' when the cache is in a clean state. When we want to mark it dirty, we simply delete the file, so the first cache change in the spurt would delete it. The first release that ships this change would mark everyone's cache as dirty.

I think #2 would solve your concern because we aren't writing any data; we're just creating a file inode and removing it.
The corrupt-cache detection code would be updated to check for the existence of the zero-byte .cacheclean file. (If you prefer, we can use .cachedirty with the reverse logic.)
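Option #2 amounts to just three operations on the marker file. A sketch (the '.cacheclean' name comes from the comment; the helper names are invented):

```python
import os

CLEAN_MARKER = ".cacheclean"   # zero-byte file; present means the cache is clean

def mark_dirty(cache_dir):
    # First cache change in a spurt: remove the marker (one inode operation).
    try:
        os.unlink(os.path.join(cache_dir, CLEAN_MARKER))
    except FileNotFoundError:
        pass                   # already dirty; nothing to do

def mark_clean(cache_dir):
    # After all pending records are flushed: recreate the zero-byte marker.
    with open(os.path.join(cache_dir, CLEAN_MARKER), "wb"):
        pass

def was_shutdown_clean(cache_dir):
    # At startup: a missing marker means we crashed mid-spurt.
    return os.path.exists(os.path.join(cache_dir, CLEAN_MARKER))
```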
re: #2: I'm guessing you'd need to fsync the file deletion?  Unless directory/inode changes have some ordering/completion guarantee that make them guaranteed to hit the disk w/o fsync. This also looks like 2 writes to me (deletion of inode, change to directory listing), so #1 is probably better.  The question I had is whether we can do it on a different thread and block cache writes (while permitting reads) until it completes.  It's an optimization--don't let it stop you, but it's something perhaps to look at once you've got initial perf numbers.
Really happy to see the discussion here progressing, thanks to all of you. Who is planning to implement the strategy we ultimately arrive at here?
Sorry, brain lapse--you can't fsync unlink--but you'd want to see if it's faster/slower than write + fsync.    [If unlink doesn't actually guarantee that the file is removed on disk, but it does guarantee that the OS will perform that deletion (unless the OS itself crashes), I'd consider that fine (actually a win)]
(In reply to Jason Duell (:jduell) from comment #187)
> but you'd want to see if it's
> faster/slower than write + fsync.    [If unlink doesn't actually guarantee
> that the file is removed on disk, but it does guarantee that the OS will
> perform that deletion (unless the OS itself crashes), I'd consider that fine
> (actually a win)]

I'm not sure if writing 4 bytes and flushing to an already opened handle is faster than linking (or unlinking) a file. But I think they're probably comparable. I suspect that both are negligible, but I can add a telemetry ping that times it. I think that if the DeleteFile and unlink functions return success, then the file is considered deleted on disk. Both APIs just take a file path, though, so as you found, you can't flush it.

> This also looks like 2 writes to me (deletion of inode, change to directory listing)

Could you expand on this more? I think it's only 1 disk access on the cache IO thread.

(In reply to Josh Aas (Mozilla Corporation) from comment #186)
> Who is planning to implement the strategy we ultimately arrive at here?

I'll be doing this work and will try to land it in Firefox 17.  I'll also do work for the related telemetry bugs.  I'm busy with Metro work as well but I think a v17 landing is possible, and surely if not, a v18 landing.
Assignee: nobody → netzen
> I'm not sure if writing 4 bytes and flushing to an already opened handle 
> is faster than linking (or unlinking) a file.

I don't know either.  Note that it might depend on how much outstanding I/O is occurring (at least for fsync in the ext3 case, which is the problematic one).  I also don't know how many disk accesses are involved in a file deletion--I assumed both the inode table and the parent directory listing would need modification.   Anyway, benchmarking will say more than my handwaving...

Thanks for taking this on Brian.
(In reply to comment #189)
> > I'm not sure if writing 4 bytes and flushing to an already opened handle 
> > is faster than linking (or unlinking) a file.
> 
> I don't know either.  Note that it might depend on how much outstanding I/O is
> occuring (at least for fsync in the ext3 case, which is the problematic one). 
> I also don't know how many disk accesses are involved in a file deletion--I
> assumed both the inode table and the parent directory listing would need
> modification.   Anyway, benchmarking will say more than my handwaving...

This depends on the file system.  But it's something that we can micro-measure without having to rely on telemetry.
> This depends on the file system. 

Agreed, and most modern filesystems are journaled, so if you are counting I/Os at this lower level, it'll probably be higher for creating/deleting a file.
(In reply to comment #191)
> > This depends on the file system. 
> 
> Agreed and on most modern filesystems they are journaled, so if you are
> counting IOs at this lower level then it'll probably be higher with
> creating/deleting a file.

It also depends on where small/empty files are stored.  Some modern filesystems store them in the same inode as the parent folder, which saves one write to disk, but I doubt that NTFS does that (since it doesn't do many of the common smart things ;-)
Classically, file creation/deletion is one of the slowest operations, since it often involves changing the last-changed date of the directory, possibly rebalancing directory B-trees (NTFS), etc.   It's hard to see how that could ever be more performant than writing a byte into a file and flushing it (especially if you're just overwriting a single byte of the file).  You could have a one-byte file for "clean" that contains 0 or 1.

This may be somewhat different for journaled filesystems, but I doubt it matters to the level needed.
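The write+fsync vs. create/unlink question can be micro-measured directly. A rough benchmark sketch (function names invented; absolute numbers will vary wildly with filesystem, journaling mode, and disk):

```python
import os
import tempfile
import time

def time_write_fsync(path, rounds=50):
    """Overwrite one byte in an already-open file and fsync each time."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY)
    start = time.perf_counter()
    for i in range(rounds):
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, b"\x01" if i % 2 else b"\x00")
        os.fsync(fd)
    elapsed = time.perf_counter() - start
    os.close(fd)
    return elapsed

def time_create_unlink(path, rounds=50):
    """Create and delete a zero-byte marker file each time."""
    start = time.perf_counter()
    for _ in range(rounds):
        os.close(os.open(path, os.O_CREAT | os.O_WRONLY))
        os.unlink(path)
    return time.perf_counter() - start
```

Running both against the same directory on the target filesystem gives a direct, if crude, comparison of the two marker strategies.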
Depends on: 777328
Note that since bug 745075 landed, on mobile we're "flushing" the disk cache by closing it down completely when we're backgrounded, then opening it up again.  Which raises two questions:

1) How fast is that compared with the fsyncs we're planning to do here?  Strong bet that the approach here is faster, but data is good.

2) Whichever is faster, we should adopt it for both mobile and the periodic fsync-after-inactivity planned here.
Depends on: 787576
Depends on: 789939
So before doing this task, I did a series of tasks that measured if it was a good idea or not.  michal has been reviewing everything.
Those series of tasks contain about 50% of the actual implementation of this task.

I'm now at a point where no more extra data can be collected without actually implementing the entire task.

Data is in from the telemetry bugs, here is the break down of corruption currently, and what it would be reduced to:

Windows: 9.7% corrupt caches at startup -> 4.4% corrupt caches at startup 
Linux: 25.9% corrupt  caches at startup -> 11.2% corrupt caches at startup 
Darwin: 8.86% corrupt caches at startup -> 3.2% corrupt caches at startup 

These seem like significant enough wins on all platforms to implement the proposal.  The time it takes to write the cache clean file and fsync was also measured with telemetry and looks good.

Before we can make this a reality we need to:
1) Fix bug 789939
2) Finish the implementation of this task and do a trial run on m-c.  Then test it out to make sure there is no big performance hit from actually writing the cache changes to disk.

If #1 and #2 go well above, we can then look for further changes to get even more reductions in corrupt caches.
(In reply to Brian R. Bondy [:bbondy] from comment #195)
> So before doing this task, I did a series of tasks that measured if it was a
> good idea or not.  michal has been reviewing everything.
> Those series of tasks contain about 50% of the actual implementation of this
> task.
> 
> I'm now at a point where no more extra data can be collected without
> actually implementing the entire task.
> 
> Data is in from the telemetry bugs, here is the break down of corruption
> currently, and what it would be reduced to:
> 
> Windows: 9.7% corrupt caches at startup -> 4.4% corrupt caches at startup 
> Linux: 25.9% corrupt  caches at startup -> 11.2% corrupt caches at startup 
> Darwin: 8.86% corrupt caches at startup -> 3.2% corrupt caches at startup 
Brian,
These corruption rates still seem bad. Do you know what is causing the remaining corruption and whether we can work around it?
(In reply to Taras Glek (:taras) from comment #196)
> Brian,
> These corruption rates still seem bad. Do you know what is causing remaining
> corruption and whether we can work around it?(

Crashes around the time a page is loaded (or AJAX requests happen), and unclean shutdowns?

Each stat is more than a 2x improvement; it's better to have an improvement than nothing, right?  More than 1 out of every 2 startups will have significantly less IO work to do from deleting files.  And more than 1 out of every 2 startups will not have to re-create everything in the cache.

For Linux, I suspect that when I look into it more I'll find some Linux-only bug that is fixable.
If there is a better solution that someone can make happen in the somewhat near future, let me know so I won't waste my time implementing this.  But as far as I know, this bug has been open since 2001 and this is the only serious plan I've heard for improving it.
Reducing our cache loss by a factor >2 would be great. And you're right that yours is the only short-term plan we have for improving things.  So go for it.
After letting this sit a little longer, here is the updated telemetry. It's looking a bit better than last time:

Windows: 9.3% -> 2.88%
Linux: 22.34% -> 6.07%
Darwin: 11.85% -> 2.96%
Adding the metro-beta tag since telemetry stats for Metro show 25% corruption in that browser.
Whiteboard: [snappy:p2][notacrash] → [snappy:p2][notacrash] metro-beta
Whiteboard: [snappy:p2][notacrash] metro-beta → [snappy:p2][notacrash][metro-beta]
Depends on: 795887
Whiteboard: [snappy:p2][notacrash][metro-beta] → [snappy:p2][notacrash][metro-mvp]
Severity: enhancement → normal
Priority: P3 → P2
Whiteboard: [snappy:p2][notacrash][metro-mvp] → [snappy:p2][notacrash][metro-mvp][LOE:2]
(In reply to Jason Duell (:jduell) from comment #194)
> Note that since bug 745075 landed, on mobile we're "flushing" the disk cache
> by closing it down completely when we're backgrounded, then opening it up
> again.  

The patch for bug 745075 has been problematic. Recently we found that we were shutting down and re-initing the cache much more frequently than desired, causing bugs 826385, 792242, and 829419. A new patch is up for review in bug 826385, but it is a terrible hack and likely to break over time.
I just bumped into this bug report and can confirm this is a problem, as evidenced by my situation. I use TM+, and when I check my plugins and extensions I sometimes need to restart with hundreds of tabs open.

When I do a restart it takes barely any time, since I use the noscript.autoreload.???? variables from about:config to avoid a full reload; without them it takes a couple of minutes to reload.

If there is a crash, then a re-download of every tab is forced (the cache is lost) and reloading takes much longer. This convinces me that the cache is lost and that better coding could save most of it.

As a very simple programming example (I am not saying to actually implement it in this manner), could we not have:
1. One large file with records, where the last few bytes of each record hold a CRC and a date/time stamp.
2. When the browser is restarted (for whatever reason), each record could be read in and assigned (remalloced) as the cache for each tab (and its tab history, images, etc.).
3. The last few bytes could be used to calculate the record's CRC (to detect whether the writing subroutine had confidence in the data).
4. The last few bytes would also hold the date/time, which could be compared with the last-updated date/time of the single file that is the cache.
5. Every record that passes the CRC check, and has a date/time prior to the date/time of the cache file, is considered a validly written portion of the cache file (a valid record).
6. Reload each tab (and its tab history, images, etc.) up to the point where the CRC does not check out or the date/time stamp (where the failure actually occurred) is later than that of the cache file.

Isn't that failproof, barring hardware error, a write to the cache from somewhere else (e.g. calling the wrong subroutine directly), or another program (e.g. a virus, or the OS) doing something incorrect?

This bug could be fixed to make Firefox work better and faster; it would not be an unexpectedly large amount of extra work. There are many who could do this.

Disclaimer: The above is only intended as a simple way of explaining the method I am suggesting could resolve this issue. It is not intended as an expert tutorial on cache, OS, or program design, nor as a complete program.
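The recovery loop from the numbered list above might look roughly like this in code. Purely illustrative of the commenter's suggestion (record layout, field sizes, and function names are all invented, not the actual cache format):

```python
import struct
import zlib

def pack_record(body, timestamp):
    """size + (body + little-endian timestamp) + CRC-32 over the payload."""
    payload = body + struct.pack("<Q", timestamp)
    return (struct.pack("<I", len(payload)) + payload +
            struct.pack("<I", zlib.crc32(payload) & 0xFFFFFFFF))

def recover(blob, file_mtime):
    """Return valid record bodies, stopping at the first corrupt,
    truncated, or too-new record, per steps 5 and 6 above."""
    out, pos = [], 0
    while pos + 4 <= len(blob):
        (size,) = struct.unpack_from("<I", blob, pos)
        end = pos + 4 + size + 4
        if end > len(blob):
            break                      # truncated tail: writing was cut short
        payload = blob[pos + 4:pos + 4 + size]
        (crc,) = struct.unpack_from("<I", blob, pos + 4 + size)
        if zlib.crc32(payload) & 0xFFFFFFFF != crc:
            break                      # CRC mismatch: stop recovery here
        body, (ts,) = payload[:-8], struct.unpack("<Q", payload[-8:])
        if ts > file_mtime:
            break                      # written after the file's own stamp
        out.append(body)
        pos = end
    return out
```

Everything before the first failure point is salvaged; everything after it is discarded, which is the partial-recovery behavior the comment asks for.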
Bumping bug for 12 years of ****
There is an effort to rewrite the disk cache, you can read more about it here:
https://wiki.mozilla.org/Necko/Cache/Plans

I believe this bug will be resolved when that lands.
Assignee: netzen → nobody
That's correct--we're doing a total rewrite of the cache, and it will be resilient to crashes.  We'll close this bug when that lands.
(In reply to Brian R. Bondy [:bbondy] from comment #205)
> There is an effort to rewrite the disk cache, you can read more about it
> here:
> https://wiki.mozilla.org/Necko/Cache/Plans
> 
> I believe this bug will be resolved when that lands.

Good to hear

Thank you very much
On v21, firefox closed by normal means (FIREFOX -> [X] EXIT)

Cache lost
Depends on: 877301
But it was not a crash; it was closed by normal means, and the cache is still dumped.


ETA on overall cache fix?
(In reply to 1337guy from comment #208)
> On v21, firefox closed by normal means (FIREFOX -> [X] EXIT)
> 
> Cache lost

"Normal means" (from a Mozilla viewpoint) is File→Quit (or File→Exit, depending on the OS) or the corresponding hotkey. The [X] button at the top right of the window is known to have "interesting" (to say the least) side effects.
So no effort is being put into making an option to retain the cache?
@1337guy
As already mentioned above, it will be fixed with the ongoing Necko cache changes. So please bear with it; hopefully one day it will be fixed and we can all benefit from it.
Firefox updated just today, and I see this bug is still present.

And a way of manually disabling the cache-dump behaviour is still not available.

It's the same lie, over and over again, for 12 years.
(In reply to 1337guy from comment #213)
> firefox updated just today and that bug is still retained i see
> 
> and ways of manually disabling the cache dump feature is still not available
> 
> it's the same lie over and over again for 12 years

You clearly cannot read, because other people have told you it is being worked on.

You even commented saying "Thanks". It'll take as long as it takes, man.

Want some proof? Here's a blog post outlining the work in progress....

http://www.janbambas.cz/new-firefox-http-cache-backend-implementation/
> other people have told you it is being worked on.
> You even commented saying "Thanks". It'll take as long as it takes

"Thanks" FOR WHAT? Everybody is free to construct a perpetuum mobile, to grow bonsai, and to do a lot of other funny things. So somebody invents "a genuine new super-duper-resilient extra-mega-fast caching engine", making, by the way, some quite questionable decisions, such as "one file per URL" (it seems this "inventor" is so rich that he never heard about cluster waste).

But THIS bug is about "cache loss on crash", not about these "genuine new" bells and whistles.

And all users affected by this bug have seen absolutely no improvement during these 12 (read again: twelve!) years. Neither real fixes, nor even any workarounds (which, by the way, had already been proven to do the job and only needed to be "glued" together).
(In reply to vitar from comment #215)
> > other people have told you it is being worked on.
> > You even commented saying "Thanks". It'll take as long as it takes
> 
> "Thanks" FOR WHAT? Everybody free to construct a perpetuum mobile, to grow
> bonsai and to make a lot of other funny things. So somebody invents "a
> genuine new super-duper-resilent extra-mega-fast caching engine". Making BTW
> a quite questionable decisions like "one file for everyone url" (seems this
> "inventor" so rich that newer heard abt "cluster waste").
> 
> But THIS bug about "cache loss on crach" and not this "genuine new" bells
> and whistles.
> 
> And all users who's affected with this bug had not seen absolutely no any
> improvements during this 12 (read again: twelve!) yrs. Nor real fixes nor
> even any workarounds (that's BTW already been proven to do its job and needs
> only to b "glued" together).

You seem to be the expert, so I take it that if you need this fixed so urgently, you'll be submitting some patches shortly?
Stop struggling.
Another situation where the cache is needlessly lost: Preferences -> Network -> Cached Web Content -> [x] Override... -> Limit cache to ... MB of space.
Start typing a new size there, such as 1024 to get a 1 GB cache, and the first "1" character already squeezes the old 256 MB cache down to 1 MB.
Stop any discussion on this bug. Bugzilla is not a discussion forum.
Everyone wants to have this issue solved. Complaining about it or the progress just doesn't solve the issue.

Re comment #217: please file a new bug for this, as it is a separate (new) issue that by itself can probably be solved more easily.
Firefox has updated to v23, and I believe the problem is still not resolved.

Where is the Mozilla devs' promised "new cache system"?

A good browser ruined by asshat developers who enjoy seeing people get upset.
Mozilla just crashed again, cache is still lost.

No attention from devs?
(In reply to 1337guy from comment #220)
> No attention from devs?

Apart from the previous comments, which you seem to ignore or refuse to read:
See the linked Bug 877301: there is an (obviously non-trivial) rewrite going on.

Also see: https://wiki.mozilla.org/Necko/Cache/Plans and http://www.janbambas.cz/new-firefox-http-cache-backend-story-continues/

Patch/Checkin Progress per: https://tbpl.mozilla.org/?tree=Gum
Win32 Builds e.g. available per: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/gum-win32/

Does that answer your question sufficiently?
(In reply to XtC4UaLL [:xtc4uall] from comment #221)
> Apart of previous Comments you seem to ignore/refuse to read:
> See linked Bug 877301: there's a (obviously non-trivial) Rewrite going on.
> 
> Also see: https://wiki.mozilla.org/Necko/Cache/Plans and
> http://www.janbambas.cz/new-firefox-http-cache-backend-story-continues/
> 
> Patch/Checkin Progress per: https://tbpl.mozilla.org/?tree=Gum
> Win32 Builds e.g. available per:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/gum-win32/
> 
> Does that answer your question sufficiently?

The devs have been ignoring this problem for 12 **** years, and that's OK with you? ****.

I would believe that "work" on the cache is being done if there were at least some minor changes in updates, but Mozilla actually removed the ability to mask a crash using Cache Fixer; they did NOT allow a dirty cache to be reused, with "corruption" as the excuse.

But no: no changes in 12 years, and the problem still exists.

That blog link you gave is just an excuse to create the illusion of an effort actually being made; it's nothing more than an indefinitely stalled plan.


How **** hard is it to allow the cache bit to be flipped from 0 to 1 and to retain the cache manually after a crash? I could do it in less than a minute if Mozilla allowed it. But Firefox still rejects the third-party edit and deletes my cache.


Don't stand up for these idiots, who are promising you nothing.


12 **** years; rethink and go **** yourself
(In reply to 1337guy from comment #222)
> 
> The devs have been ignoring this problem for 12 **** years, and that's ok
> with you? ****.
> 
> I would believe you that "work" on the cache is being done if there were at
> least some minor changes in updates, but mozilla actually removed the
> ability to mask a crash using cache fixer, they did NOT allow a dirty cache
> to be reused, and having an excuse by saying corruption
> 
> But no, no changes in 12 years, problem still exists
> 
> That blog link you have is just an excuse to create an illusion of an effort
> actually being done, it's nothing more than an indefinitely stalled plan
> 
> 
> How **** hard is it to allow the cache bit to be flipped from 0 to 1 and
> manually retain cache after crash? I can do it in less than a minute if
> mozilla allows it. But firefox still rejects the 3rd party edit and deletes
> my cache
> 
> 
> Don't stand up for these idiots promising you nothing
> 
> 
> 12 **** years, rethink and go **** yourself

Feel free to write the patch to fix this yourself, as you seem to know all the answers?

I'm waiting on you to fix this problem as you seem to be of the impression that this is an easy fix - go on, fix it.

Have you ever stopped to think that the cache being dirty could cause pages to be rendered wrongly, or cause the crash in the first place?

Stop being a small child who has a tantrum because he doesn't get the chocolate at the supermarket, and be an adult. Provide solutions or wait until someone who has experience fixes it.

Did you even read the blog post? You can try the new build for yourself and help give feedback if you like - you don't have to wait any longer.
(In reply to Grant from comment #223)
> Feel free to write the patch to fix this yourself, as you seem to know all
> the answers?
> 
> I'm waiting on you to fix this problem as you seem to be of the impression
> that this is an easy fix - go on, fix it.
> 
> Have you ever stopped to think that the cache being dirty could cause pages
> to be rendered wrongly, or cause the crash in the first place?
> 
> Stop being a small child who has a tantrum because he doesn't get the
> chocolate at the supermarket, and be an adult. Provide solutions or wait
> until someone who has experience fixes it.
> 
> Did you even read the blog post? You can try the new build for yourself and
> help give feedback if you like - you don't have to wait any longer.

Easy fix -> editing the bitmap using a hex editor like HxD is not hard; it has simply been PATCHED out.

The "Cache Fixer" extension was made by an anonymous author, and THAT EXTENSION WORKED, until Mozilla patched it out.

And if I try the new build and something goes wrong, you'll blame me for being impatient, right?

My point still stands: for 12 years there has been no fix, and Mozilla patched out the fixes the community discovered. You're standing up for ****
"1337guy" (who needs to read http://www.catb.org/~esr/faqs/hacker-howto.html#style) has had his account disabled.

Gerv
> his account disabled

Well, that may be right, but doesn't anybody think it was a cry of despair? That this guy, like many others, feels fooled by a street card sharp?

For those with a short memory, I'll recall a bit of history.

Netscape Navigator was very fine software; all who can will remember its positives. But it died. Then Firefox (under a different name) was born, giving us a lot of impatient expectations. Wow! The new modular Netscape! Supporting a lot of W3C's stuff! Extensions, a runtime platform for apps, and so on: use the integrated mail or chat or whatever else if you want it, and simply kick it out if you don't...

Yes, it's a bit buggy, but it's still young... Crashing? Well, calm down, let's wait a bit... And a bit more... And once more... Here's a quick fix for a while, so let's wait... Yes, you can install it on your buddy's PC. And even on his mom's. It will be absolutely fine for them. Just don't forget this, this, and that.

And then, at some damned point, Firefox turned its face toward absolute novices. It's called "market growth", IIRC. It started adding a bell here and a whistle there, possibly aiming to end up with a very smooth, rounded, full-colour button reading "Show me something funny", with blackjack and whores.

BTW, the nearest competitor, Chrome, went the opposite way. Yes, it started from a dumbed-down window. But at some point (not very soon, to say the least) it gained a previously absent API that allowed AdBlock(s) to be ported.

> You seem to be the expert - I take it if you so urgently need
> this fixed you'll be submitting some patches through shortly?
>
> Feel free to write the patch to fix this yourself, as you
> seem to know all the answers?

LOL, what a funny school-age argument! Hey, dude, you make me cry! Do you really think that "no JS+XUL knowledge" equals "no brain at all"? Well, if I write the core algorithm of the fix in Clarion or Forth, will YOU integrate it?

The ways to FIX (just fix, but really fix) this bug have been written up here for years. Nothing has changed. The only argument against Cache Fixer itself was that "it CAN cause crashes because of cache corruption". I myself, for example, have had only a few such crashes over the years, and so has everyone I know, but sure, this possibility needs attention...

But look above for the "cache fix + startup check" approach. It's THE SAME thing that MS chkdsk or *NIX fsck does, sometimes regardless of the user's wishes, and nobody has died. Yet there have been NO arguments (let alone "reasonable" ones) against it.

It looks like the True Painters [irony indeed]. A rough drawing? Oh no, it's not rough, it's the Author's Style. And no, I will not redraw it; I'll paint a new Great Picture instead. It's not a bug, it's a speciality. And no, we'll not fix it; we'll write The True New Engine instead.
> BTW the nearest competitor Chrome wents opposite way. Yes, it's started from a blondie-oriented window. But at the some point (yes, not very soon to not say more) it got a previously absent API that allowed to port AdBlock(s).

Wait... are you saying that Chrome now lets you run a full AdBlock? Does it also support NoScript?

If so, time to convert.
(In reply to comment #226)
In addition to the page mentioned in comment #225 (which I didn't know and found fun), I recommend https://bugzilla.mozilla.org/page.cgi?id=etiquette.html and maybe even http://www.catb.org/~esr/faqs/smart-questions.html

I don't say you haven't read them but I think every Bugzilla and support-site user ought to. This said, I'll shut up because 228 comments are already too many. Developers, please forgive the bugspam.
(In reply to Tony Mechelynck [:tonymec] from comment #228)
> (In reply to comment #226)
> In addition to the page mentioned in comment #225 (which I didn't know and
> found fun), I recommend
> https://bugzilla.mozilla.org/page.cgi?id=etiquette.html and maybe even
> http://www.catb.org/~esr/faqs/smart-questions.html
> 
> I don't say you haven't read them but I think every Bugzilla and
> support-site user ought to. This said, I'll shut up because 228 comments are
> already too many. Developers, please forgive the bugspam.

228 comments in 12 years, bro; it's not much.

It still sounds like they're doing nothing.

And if a new cache engine is under development, why don't they put it in the nightlies? Oh wait, that's right: because they don't have faith in their own product(s).

And notice how no developer has posted in this thread for a while; mostly just forum mods who are told to suppress the need for a patch.
(In reply to mikey2123 from comment #229)

> 
> and if a new cache engine is under development, why don't they put in in the
> nightlies? - oh wait, that's right, because they don't have faith on their
> own product(s)

You may track bug 913807 if you're interested in the dependencies for landing the new cache code in Nightly.
Restrict Comments: true
No longer blocks: 716949
cache2 made this much better
Status: NEW → RESOLVED
Closed: 23 years ago → 8 years ago
Resolution: --- → WORKSFORME