Closed Bug 44452 Opened 25 years ago Closed 24 years ago

nsReplacementPolicy goes into infinite recursion if disk cache size is too small

Categories

(Core :: Networking: Cache, defect, P3)

x86
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: matt, Assigned: gordon)

References

()

Details

(Whiteboard: [nsbeta2-])

Attachments

(2 files)

Mozilla 2000070209, Linux 2.2.14 i686, RedHat 6.1. When attempting to download the latest Mozilla Linux build via HTTP, the "Unknown File Type" dialog tells me the type is "application/x-tar"; when I then try to download, after about 9% of the file has been fetched, NSGetModule goes into an infinite recursion, crashing after recursing about 55,000 times. If I try downloading exactly the same file with FTP, the MIME type is reported as "application/gzip", and the file is fetched with no problem. This might have something to do with bug 35956.
It seems that this isn't just a MIME types thing. Another way to reproduce is:
1) Go to http://www.nizkor.org/
2) In the right-hand column, click on "People A to Z"
3) Click on "H"
4) Click on "Hoess, Rudolph"
5) Sit there for a while.
I really have no clue what's causing this. I also got the same thing on a Bugzilla report result a few minutes ago.
Summary: gzip <-> x-tar MIME type confusion leads to crash → NSGetModule() goes into infinite recursion
updating component and setting default owner.
Assignee: asa → gagan,ruslan
Component: Browser-General → Networking
QA Contact: doronr → tever
Looks like unknown handler problem
Assignee: gagan,ruslan → mscott
Hmm this sounds bad...nsbeta2?
Keywords: nsbeta2
I recommend a + for nsbeta2.
Hmmm, I followed the nizkor.org example and it worked fine for me on a Windows build from 7/6. Either this got fixed or it really is Linux-only. I haven't tried on Linux yet.
Hmm, it works on Linux for me too.
Downloading a nightly build using HTTP on Linux also works for me. Sounds like whatever was causing the problem has been fixed.
The problem seems to have been caused by an old version of cache.db in my profile; when I recreated my profile, the problem went away. Using the debug compile, I saw that the message "Error: RecordID not in DB" was being constantly repeated on pages that crashed; this is from method nsNetDiskCache::GetCachedNetDataByID() of file ./netwerk/cache/filecache/nsNetDiskCache.cpp. The NSGetModule() recursion was happening with the NSGetModule of libnecko.so. I tried adding debugging statements that would figure out what was going on, but when I did, it looked like NSGetModule() was never being called recursively before the crash (I don't have enough memory to use gdb on the debug build).
Grrrrrr! The problem with downloading the nightly Linux build has returned, although not the problem with the nizkor.org pages. It seems to be tied into the file cache. If I go into my profile directory and rename the Cache directory, the problem goes away, and this time the problem can't be blamed on an outdated cache.db file, since this profile was created just today. Maybe I should upload my cache.db as an attachment? Could the version of the BerkeleyDB library I'm using be important? How do I figure out which version I'm using?
Putting on [NEED INFO] radar. PDT needs to know impact to user and risk of fix to make a call on this bug. Can QA please try to reproduce this? Do we know exactly what might be causing this? If it cannot be reproduced, please mark worksforme.
Whiteboard: [NEED INFO]
Alright, I've finally got it figured out. The problem seems to be that the disk cache size was smaller than the file I was trying to cache, namely the nightly download; the default disk cache size is about 5 Meg, while the daily build is about 7 Meg; when I upped my disk cache size to 10 Meg, everything went fine.
I think that the real root of the problem is InterceptStreamListener::Read() in mozilla/netwerk/cache/mgr/nsCachedNetData.cpp. If a write to the cache stream fails, then it (probably) means that the cache is full, so the cache entry is marked with the flag TRUNCATED_CONTENT. However, subsequent calls to InterceptStreamListener::Read() will try to write to the cache stream again. As far as I can tell, if the cache stream is written to under full-cache conditions too many times, things start going weird; I don't know why. I've fixed this problem in a separate patch, which I'll attach to another bug which I'll open shortly.
The infinite recursion happens in mozilla/netwerk/cache/mgr/nsReplacementPolicy.cpp; for some reason, gdb reports this as NSGetModule() of libnecko.so when I use gdb on the nightly build. There can be an infinite mutual recursion between these five methods:
  LoadAllRecordsInAllCacheDatabases
  AddAllRecordsInCache
  AssociateCacheEntryWithRecord
  DeleteOneEntry
  CheckForTooManyCacheEntries
This is how they call each other when the infinite recursion happens:
  LoadAll -> AddAll      Uses add all
  AddAll -> Assoc        Uses Assoc
  Assoc -> CheckFor      Can't add new assoc if too many cached entries
  CheckFor -> Del        If too many, must delete one
  Del -> LoadAll         Must have all records in memory to delete one
I think that this happens because each time that the cache stream is written to under cache-full conditions, a new entry is created in the cache database, and after a while the cache database gets filled with entries.
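For readers following the write-failure analysis above, here is a minimal sketch of the idea of not writing to the cache stream again once a write has failed. It is not the patch from the other bug and does not use the real necko classes: FakeCacheStream, TeeReader, and the truncated flag are hypothetical stand-ins for the cache stream, InterceptStreamListener, and TRUNCATED_CONTENT.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the cache output stream: accepts bytes until a
// fixed capacity is reached, then rejects any write that would exceed it.
struct FakeCacheStream {
    explicit FakeCacheStream(std::size_t cap) : capacity(cap) {}

    bool Write(const char* /*data*/, std::size_t len) {
        ++writeAttempts;
        if (used + len > capacity) return false;  // cache is full
        used += len;
        return true;
    }

    std::size_t capacity;
    std::size_t used = 0;
    int writeAttempts = 0;
};

// Hypothetical stand-in for the intercepting listener: each chunk goes to the
// consumer and, while the entry is not truncated, also to the cache.
struct TeeReader {
    explicit TeeReader(FakeCacheStream& c) : cache(c) {}

    void Read(const std::string& chunk) {
        delivered += chunk;        // the consumer always receives the data
        if (truncated) return;     // never retry the cache after a failure
        if (!cache.Write(chunk.data(), chunk.size())) {
            truncated = true;      // plays the role of TRUNCATED_CONTENT
        }
    }

    FakeCacheStream& cache;
    bool truncated = false;
    std::string delivered;
};
```

With a 5-byte cache and four 3-byte chunks, the second write fails, the entry is marked truncated, and the remaining chunks are delivered without touching the cache again.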
I have no clue as to how to fix this properly, so I'm adding a hack that detects the infinite recursion and breaks out of it, which I'll upload as an attachment.
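One common way to detect and break out of this kind of mutual recursion is a re-entry flag on the function that closes the cycle. The sketch below is only an illustration under that assumption; the attached patch may work differently. PolicyModel and ReentryGuard are hypothetical, and the five methods are reduced to counters that mirror the call chain described in this bug.

```cpp
#include <cassert>

// RAII helper: sets a flag for the lifetime of a scope, clears it on exit.
class ReentryGuard {
public:
    explicit ReentryGuard(bool& flag) : mFlag(flag) { mFlag = true; }
    ~ReentryGuard() { mFlag = false; }
    ReentryGuard(const ReentryGuard&) = delete;
    ReentryGuard& operator=(const ReentryGuard&) = delete;
private:
    bool& mFlag;
};

// Hypothetical, heavily simplified model of the replacement-policy cycle.
class PolicyModel {
public:
    PolicyModel(int recordsOnDisk, int maxInMemory)
        : mOnDisk(recordsOnDisk), mMax(maxInMemory) {}

    void LoadAllRecords() {
        if (mLoading) {            // already loading: break the cycle here
            ++mReentriesBlocked;
            return;
        }
        ReentryGuard guard(mLoading);
        AddAllRecords();
    }

    int inMemory() const { return mInMemory; }
    int reentriesBlocked() const { return mReentriesBlocked; }

private:
    void AddAllRecords() {
        for (int i = 0; i < mOnDisk; ++i) Associate();
    }
    void Associate() {
        CheckForTooManyEntries();  // make room before adding an association
        ++mInMemory;
    }
    void CheckForTooManyEntries() {
        if (mInMemory >= mMax) DeleteOneEntry();
    }
    void DeleteOneEntry() {
        LoadAllRecords();          // without the guard this never returns
        if (mInMemory > 0) --mInMemory;
    }

    int mOnDisk;
    int mMax;
    int mInMemory = 0;
    int mReentriesBlocked = 0;
    bool mLoading = false;
};
```

In this model, loading 10 on-disk records with a limit of 3 in memory finishes with 3 records in memory; each of the 7 nested load attempts is cut short instead of recursing.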
Status: NEW → ASSIGNED
Summary: NSGetModule() goes into infinite recursion → nsReplacementPolicy goes into infinite recursion if disk cache size is too small
The bug with the other patch is bug 44856. Changing component to "Networking: Cache". Assigning bug to myself.
Assignee: mscott → matt
Status: ASSIGNED → NEW
Component: Networking → Networking: Cache
Accepting bug again, since I've got Bugzilla confused (probably because *I* was confused about how to use Bugzilla).
Status: NEW → ASSIGNED
Hi Matt, thanks for the great work on this bug. This got incorrectly assigned to me as a uriloader bug. I'd suggest you take your patch and work with gagan or ruslan to get it checked in. They are the networking folks that work with the cache.
Putting on [nsbeta2-] radar. Not critical to beta2. Adding "relnote" keyword for PR2 release. Matthew, talk to Gagan and, if the fix is good, check it in.
Keywords: relnote
Whiteboard: [NEED INFO] → [nsbeta2-]
Are we sure we want to not fix it for B2? Sounds like a serious bug to me.
matt@nightrealms.com - are you still working on this bug? If not, I'll assign it back to the Component owner. Gerv
No, I'm not working on it any more; I'll do the reassignment with this comment commit.
Assignee: matt → neeti
Status: ASSIGNED → NEW
qawanted - can someone see if this bug is still relevant/a bug? Gerv
Keywords: qawanted
Cache bugs to Gordon
Assignee: neeti → gordon
Target Milestone: --- → mozilla1.0
Target Milestone: mozilla1.0 → mozilla0.9
Anyone: will this be fixed by the cache rewrite? Gerv
You bet. The class nsReplacementPolicy will go away.
marking fixed, this was a problem with the old cache
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
verified
Status: RESOLVED → VERIFIED
Keywords: qawanted
relnote
Keywords: relnote
This was a bug in the old cache, which doesn't exist anymore.
I meant to say "-relnote". Oops.