Closed Bug 55405 Opened 24 years ago Closed 24 years ago

SetSizeEntry(..) called during shutdown could lead to corrupted cache.

Categories

(Core :: Networking: Cache, defect, P3)


Tracking


RESOLVED FIXED

People

(Reporter: neeti, Assigned: neeti)

References

Details

(Whiteboard: [rtm++])

Attachments

(6 files)

Currently, we write down the number of entries and the number of storage bytes in use to the database only during shutdown.
The SetSizeEntry(..) method is called in the destructor of nsNetDiskCache, which runs only at shutdown. If there is a leak, or if the app crashes, these values are never written out to the database, leading to cache corruption. I think we should call SetSizeEntry at the end of each URL load, so the cache is always in sync. My fix is to call SetSizeEntry in nsCachedNetData::Release(..).
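The proposed change can be sketched as follows. This is an illustrative model, not the actual Mozilla code: `CacheDb`, `CacheRecord`, and `FlushSizeEntry` are hypothetical stand-ins for the database, `nsCachedNetData`, and `SetSizeEntry(..)` respectively.

```cpp
#include <cassert>

// Hypothetical model of the proposal: instead of persisting the cache
// bookkeeping only in the destructor (i.e. at shutdown), flush it every
// time the last reference to a cache entry is released.
struct CacheDb {
    int storedNumEntries = 0;
    long storedStorageBytes = 0;
    // Stand-in for nsNetDiskCache::SetSizeEntry(): write the live
    // counters into the database record immediately.
    void FlushSizeEntry(int numEntries, long storageBytes) {
        storedNumEntries = numEntries;
        storedStorageBytes = storageBytes;
    }
};

class CacheRecord {
    int mRefCnt = 1;
    CacheDb* mDb;
    const int* mNumEntries;
    const long* mStorageBytes;
public:
    CacheRecord(CacheDb* db, const int* numEntries, const long* storageBytes)
        : mDb(db), mNumEntries(numEntries), mStorageBytes(storageBytes) {}
    void AddRef() { ++mRefCnt; }
    // Mirrors the idea of flushing in nsCachedNetData::Release(): once the
    // load finishes and the entry is dropped, the database is back in
    // sync, so a later crash cannot leave stale totals on disk.
    int Release() {
        int cnt = --mRefCnt;
        if (cnt == 0) {
            mDb->FlushSizeEntry(*mNumEntries, *mStorageBytes);
            delete this;
        }
        return cnt;
    }
};
```

The point of the sketch is the placement of the flush: it happens on every final Release, not once at shutdown, so the on-disk totals are only ever one load behind reality.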
Keywords: rtm
Blocks: 54072
No longer blocks: 54072
neeti: once you have the review, set this to rtm+
Whiteboard: [rtm need info]
I am attaching a new patch which, instead of updating the database for every URL load, detects a corrupt cache (which could be due to a leak or a crash) at startup and does a DBRecovery.
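The revised startup check can be modeled like this. The struct and function names are illustrative, not the actual Mozilla identifiers; the assumption is that a scan of the stored entries yields the true totals to compare against.

```cpp
#include <cassert>

// Illustrative model: at startup, compare the entry/byte totals recorded
// in the database against the totals recomputed by walking the stored
// entries. A mismatch means the app crashed or leaked before shutdown
// wrote the totals, so the cache is rebuilt (the DBRecovery step).
struct Db {
    int recordedEntries;
    long recordedBytes;
    int actualEntries;    // what a scan of the db really finds
    long actualBytes;
    bool recovered = false;
};

bool IsCorrupt(const Db& db) {
    return db.recordedEntries != db.actualEntries ||
           db.recordedBytes != db.actualBytes;
}

void StartupCheck(Db& db) {
    if (IsCorrupt(db)) {
        // Stand-in for nsNetDiskCache::DBRecovery(): discard the stale
        // totals and rebuild them from the entries actually present.
        db.recordedEntries = db.actualEntries;
        db.recordedBytes = db.actualBytes;
        db.recovered = true;
    }
}
```

The design trade-off versus the first patch: no extra database write per URL load, at the cost of detecting corruption only on the next startup.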
I have attached another patch using AssignWithConversion instead of NS_LITERAL_STRING, since it does not compile with NS_LITERAL_STRING. Rick, we need a super review on this patch. Thanks, Neeti
hey neeti, this patch looks fine. The only question I have is why the NetDiskCache needs to be an nsWeakReference? It doesn't look like your changes take advantage of that... (sr=rpotts) -- rick
ObserverService will store a weakref to an observer if the observer implements it. That way the observer service doesn't hold an owning reference to the object, extending its lifetime, and the object and observer service can be destroyed in any order and all will be fine. That is also why we don't even try to RemoveObserver. This is the standard way of doing observers. :-) Setting rtm+ as this bug has been reviewed and super reviewed.
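The lifetime argument above can be modeled with `std::weak_ptr`; in the Mozilla code, nsWeakReference plays the analogous role. This is a sketch of the pattern, not the actual nsIObserverService implementation.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Model of a weak-reference observer service: registering never extends
// an observer's lifetime, so neither side needs to outlive the other.
struct Observer {
    int notified = 0;
    void Observe() { ++notified; }
};

struct ObserverService {
    std::vector<std::weak_ptr<Observer>> observers;
    void AddObserver(const std::shared_ptr<Observer>& obs) {
        observers.push_back(obs);  // weak: does not bump the refcount
    }
    void NotifyAll() {
        for (auto& weak : observers) {
            if (auto obs = weak.lock())  // skip already-destroyed observers
                obs->Observe();
            // No RemoveObserver call is needed anywhere: an expired
            // weakref is simply skipped at notification time.
        }
    }
};
```

Destroying an observer without unregistering it is safe: the service's `lock()` simply fails for the expired entry.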
Whiteboard: [rtm need info] → [rtm+]
r=dp. That is for being paranoid and preventing an infinite loop if the db file is not writable (no permission and such). We disable the cache in that scenario.
I attached a patch to return an error from RemoveAll(...), which is called by DBRecovery(..), if the cache.db file exists and could not be deleted (it could be in use). This prevents an infinite loop condition where we could end up calling DBRecovery(..) again and again. Rick, could you please do a super review for this? Thanks, Neeti
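A minimal model of this guard, with hypothetical names (`DiskCache`, `dbFileDeletable`) standing in for the real nsNetDiskCache state:

```cpp
#include <cassert>

// Model of the guard: RemoveAll() reports failure when cache.db cannot
// be deleted, and the recovery path disables the cache instead of
// retrying forever.
enum Result { OK, ERR_DELETE_FAILED };

struct DiskCache {
    bool dbFileDeletable;
    bool enabled = true;
    int recoveryAttempts = 0;

    Result RemoveAll() {
        // If cache.db exists but cannot be deleted (file in use, no
        // permission), surface the error instead of pretending success.
        return dbFileDeletable ? OK : ERR_DELETE_FAILED;
    }

    void DBRecovery() {
        ++recoveryAttempts;
        if (RemoveAll() != OK) {
            // Without this check, a failed recovery would trigger another
            // recovery, looping forever. Disable the cache instead.
            enabled = false;
        }
    }
};
```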
I added a one-line change to nsNetDiskCache.h to update the cache version. This will blow away all old corrupt caches, so everyone starts with a clean new cache.
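The effect of that version bump can be sketched as follows; the constant name and value are illustrative, not taken from nsNetDiskCache.h.

```cpp
#include <cassert>

// Sketch of a cache-version bump: if the on-disk cache was written by an
// older version, it is discarded wholesale on open, so every user starts
// with a fresh cache regardless of prior corruption.
const int kCurrentCacheVersion = 5;  // hypothetical new value

struct OnDiskCache {
    int version;
    bool wiped = false;
};

void OpenCache(OnDiskCache& cache) {
    if (cache.version != kCurrentCacheVersion) {
        cache.wiped = true;                    // blow away the old cache
        cache.version = kCurrentCacheVersion;  // and stamp the new version
    }
}
```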
(sr=rpotts)
This all sounds like good stuff, but do we have evidence that this is really needed? It sounds like we're talking about hypothetical bugs here, and without a real bug, it's hard to verify that the fix was effective. Marking [rtm need info] until the cost of shipping without this is clearer.
Whiteboard: [rtm+] → [rtm need info]
I have noticed the leak of nsNetDiskCache on many occasions; just browsing the mozilla site causes these leaks. Putterman has also seen them. Also, any time the browser crashes, these values are never written out to the cache. For the cache to work correctly, we need accurate values for the number of entries and the number of storage bytes in use, since the cache logic relies heavily on these numbers. The cost of shipping without this is that in all of the above cases we would corrupt the cache, which would lead to all sorts of other cache corruption bugs, including crashers. See bug 54072. This fix would minimize the cases where we get a corrupt cache, and in the cases where we still do (due to a crash), it would clean up the corrupted cache.
Whiteboard: [rtm need info] → [rtm+]
This bug isn't hypothetical. When I was testing cache crashes I continually hit this case; it wasn't hard. When this happens, the db gets completely out of sync with what it really contains (because it thinks it has 10 items when it really has 100, for example), and eventually either led to the crash in bug 54072 or made the cache unusable because other error checking prevented new items from getting added.
PDT: As putterman pointed out this is a common crash scenario and a much needed fix. Recommending we accept this.
PDT needs info: JAR thought cache corruption was no longer fatal, i.e., cache was self-healing. Is this true?
Whiteboard: [rtm+] → [rtm need info]
This is the fix for self-healing of the cache. Without it, the cache would not function properly or would become useless, because other error checking would prevent new items from being added. See bug 50559. Renominating for rtm++.
Whiteboard: [rtm need info] → [rtm+]
Cache corruption could also cause the disk usage to grow without bound, because the recorded size of the database could be out of sync with what it really contains. See bug 42606.
Blocks: 50559
Blocks: 42606
PDT says rtm++, land on branch ASAP
Whiteboard: [rtm+] → [rtm++]
Fix checked in on branch and trunk.
Marking it fixed
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
verified on branch: WinNT4 10/30, Linux rh6 10/30, Mac OS9 10/30. Still needs verification on trunk.
Keywords: vtrunk
