SetSizeEntry(..) called during shutdown could lead to corrupted cache.

RESOLVED FIXED

Status

()

P3
normal
RESOLVED FIXED
18 years ago
18 years ago

People

(Reporter: neeti, Assigned: neeti)

Tracking

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [rtm++])

Attachments

(6 attachments)

(Assignee)

Description

18 years ago
Currently, we write dowm the number of entries, and the number of storage bytes 
in use to the database only during shutdown.
(Assignee)

Comment 1

18 years ago
The SetSizeEntry(..) method is called in the destructer of nsNetDiskCache, which  
is called only during shutdown. If there is a leak, or if the app crashes, then 
these values are never written out to the database, leading to cache corruption.

I think we should call SetSizeEntry at the end of each url load, so the cache is  
always in sync.

My fix is to call SetSizeEntry in nsCachedNetData::Release(..).
Keywords: rtm
(Assignee)

Comment 2

18 years ago
Created attachment 16322 [details] [diff] [review]
patch for the bug
(Assignee)

Updated

18 years ago
Blocks: 54072
(Assignee)

Updated

18 years ago
No longer blocks: 54072

Comment 3

18 years ago
neeti: once you have the review-set this to rtm+
Whiteboard: [rtm need info]
(Assignee)

Comment 4

18 years ago
I am attaching a new patch which instead of updating the database for every url 
load, detects a corrupt cache (Which could be due to a leak or due to a crash) 
at startup and does a DbRecovery. 
(Assignee)

Comment 5

18 years ago
Created attachment 16652 [details] [diff] [review]
patch to detect and clean cache corruption at startup.

Comment 6

18 years ago
Created attachment 16819 [details] [diff] [review]
Added xpcom shutdown observer to earlier patch from neeti.
(Assignee)

Comment 7

18 years ago
Created attachment 16903 [details] [diff] [review]
patch using AssignWithConversion instead of NS_LITERAL_STRING
(Assignee)

Comment 8

18 years ago
I have attached another patch, using AssignWithConversion instead of 
NS_LITERAL_STRING, since it does not compile with NS_LITERAL_STRING.

Rick,

We need a super review on this patch.

Thanks,
Neeti

Comment 9

18 years ago
hey neeti,

this patch looks fine.  The only question I have is why the NetDiskCache needs 
to be an nsWeakReference?  It doesn't look like your changes take advantage of 
that...

(sr=rpotts)

-- rick

Comment 10

18 years ago
ObserverService will store a weakref to an observer if it implemented it. That
way the observer service doesn't have a owning reference to the object
increasing its lifetime, and the object and observerservice can get destroyed in
any order and all will be fine. That is why we dont even try to RemoveObserver too.

This is the std way for doing observers. :-)

Setting rtm+ as this bug has been reviewed and super reviewed.
Whiteboard: [rtm need info] → [rtm+]
(Assignee)

Comment 11

18 years ago
Created attachment 16996 [details] [diff] [review]
latest patch to return error from DBRecovery if cache.db file in use

Comment 12

18 years ago
r=dp That is for being paranoid and preventing an infinite loop if the db file 
is not writeable (no permission and stuff). We disable the cache in that 
senario.
(Assignee)

Comment 13

18 years ago
I attached a patch to return an error from RemoveAll(...) which is called by 
DBRecovery(..), if the cache.db file exists and it could not be deleted (it 
could be in use). This prevents an infinite loop condition, where we could end 
up calling DBRecovery(..) again and again.

Rick, could you please do a super review for this?

Thanks,
Neeti
(Assignee)

Comment 14

18 years ago
Created attachment 17024 [details] [diff] [review]
patch with updated cache version
(Assignee)

Comment 15

18 years ago
I added a one line change to nsNetDiskCache.h to update the cache version. This 
will blow away all old corrupt caches, so everyone starts with a clean new 
cache.

Comment 16

18 years ago
(sr=rpotts)

Comment 17

18 years ago
This all sounds like good stuff, but do we have evidence that this is really
needed? It sounds like we're talking about hypothetical bugs here, and without a
real bug, it's hard to verify that the fix was effective. Marking [rtm need
info] until the cost of shipping without this is clearer.
Whiteboard: [rtm+] → [rtm need info]
(Assignee)

Comment 18

18 years ago
I have noticed the leak of nsNetDiskCache on many instances. Just browsing on 
the mozilla site causes these leaks. Putterman has also seen these leaks. Also, 
anytime the broser crashes,these values are never written out to the cache. For 
the cache to work correctly, we need an accurate values of the number of entries 
and number of storage bytes in use, since the cache logic relies heavily on 
these numbers. The cost of shipping without this is we have all these above 
cases where we would corrupt the cache, which would lead to all sorts of other 
cache corruption bugs, including crashers. See bug 54072. This fix would 
minimize the cases,where we have the corrupt cache, and in the cases where we 
get a corrupt cache (due to a crash), it would clean up the corrupted cache.
Whiteboard: [rtm need info] → [rtm+]

Comment 19

18 years ago
This bug isn't hypothetical. When I was testing cache crashes I continually hit
this case. It wasn't hard.  When this happens the db gets completely out of
synch with what it really contains (because it thinks it has 10 items when it
really has 100, for example) and eventually either led to the crash in 54072 or
made the cache unuseable because other error checking prevented new items from
getting added.

Comment 20

18 years ago
PDT: As putterman pointed out this is a common crash scenario and a much needed
fix. Recommending we accept this. 

Comment 21

18 years ago
PDT needs info: JAR thought cache corruption was no longer fatal, i.e., cache
was self-healing.  Is this true?  
Whiteboard: [rtm+] → [rtm need info]
(Assignee)

Comment 22

18 years ago
This is the fix for self-healing of the cache. Without this fix, the cache would 
not function properly, or would become useless, because other error checking 
would prevent new items from being added. See bug 50559. 

Renominating again for rtm++.
Whiteboard: [rtm need info] → [rtm+]
(Assignee)

Comment 23

18 years ago
Cache corruptness could also cause the disk size to grow without bound, because 
the actual size of the database could be out of sync with what it really 
contains. See bug 42606.
(Assignee)

Updated

18 years ago
Blocks: 50559
(Assignee)

Updated

18 years ago
Blocks: 42606

Comment 24

18 years ago
PDT says rtm++, land on branch ASAP
Whiteboard: [rtm+] → [rtm++]
(Assignee)

Comment 25

18 years ago
Fix checked in on branch and trunk.
(Assignee)

Comment 26

18 years ago
Marking it fixed
Status: NEW → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED

Comment 27

18 years ago
verified on branch:
WinNT4 10/30
Linux rh6 10/30
Mac os9 10/30

needs verified on trunk
Keywords: vtrunk
You need to log in before you can comment on or make changes to this bug.