Open Bug 307815 Opened 15 years ago Updated 11 years ago
kill stat caches because they are fundamentally flawed
stat caches (see nsLocalFileWin.h/.cpp for an example) are fundamentally flawed, and cause problems like the one in bug 307429. If you have multiple nsIFile objects pointing to the same file on disk, their stat caches are bound to get out of sync in some cases. Managing stat caches correctly would just be too difficult to do in a reasonable way, and they are probably no longer a necessary perf feature. They should be removed.
if it hurts performance, I think we are doomed to live with it. Are you able to put together a patch and get some numbers?
This patch should kill stat caches on all platforms. I have only tested on Mac OS X because that is all I have available to build on at the moment. The rest I just wrote up quickly, but its not too complicated and should work.
I don't think it is going to hurt perf too much. Seems like the only nice way to allow people to work around problems that arise from stat caches would be some function in nsIFile like "setStatCacheDirtyIfCacheExists" :)
All I have are Macs, so I can't test startup time with my patch. Can someone on Windows and Linux apply my patch and post pre and post patch startup numbers?
Requesting blocking status to get this on the radar. I'm not sure if it should really block or not, but we may want it for general stability reasons.
Comment on attachment 195490 [details] [diff] [review] fix v1.0 v1.0 is broken on Windows, new patch coming up
doh! Forgot to get rid of invalidating function calls.
more compile fixes - I'd catch this myself before posting, but again I only have a mac...
v2.1 compiles for me (WinXP, SP2) and works just as i would expect it to. Bug 307429 is helped with this one (using v2 patch for that bug).
Josh: To make this fly, we need to understand the performance impact. Can you run some startup tests to determine if indeed the stat caches have insignficant benefit?
if you don't have boxes, on a slow night locally patch a tbox.
I'll do it this weekend or on Monday. I'll try it on a Linux box at the office and if that doesn't show anything too bad then maybe I can put it on the trunk for an hour and back it out after I get a bunch of startup numbers. Can I get non-perf reviews necessary to try a one-hour trunk landing if it works out locally?
Did some preliminary performance testing on XP. Looks like almost an 8% increase in load times (although it's still less than a second...). Anyway, I dumped the tick count in main() and again for nsWindow::StandardWindowCreate. The time interval was thus between app startup and the creation of the first window. 10 tests per build (unpatched, patched). Here are the numbers: Avg ms unpatched: 1431 Avg ms patched: 1544 (a 7.9% increase) Is this acceptable?
I would think that this wouldn't be acceptable. There are workarounds to this problem as I discussed with Josh on Friday.
Yeah, that is too much of a perf hit. A few thoughts: - Is there a single thing in the stat cache that is all we really need to cache so we could cut down on the bad data we hand out and get back most of our perf hit? - It would be interesting to add code that compares what we would be handing out as stat values when using the cache vs. a fresh stat. By counting the number of times the values don't match up over about 10 minutes of stressing the browser, we could get a real understanding of how bad the stat cache problem is. I don't think we really know how often we're getting bad values, and if it is more often than we hope it could be contributing to any number of seemingly random problems (especially when you bring multiple threads into the mix). Profile data issues come to mind... - If we decide to keep the stat cache, we should add stat caching to Mac OS X. If we're going to work around the problems when they come up, Macs might as well benefit from the speed boost there is to be had. - I'm still a bit curious to see what would happen across all affected platforms if we dropped this on the trunk for an hour...
OK, so I think there are several problems with the way we are removing the stat cache on Win32. First of all, ResolveAndStat does more than just read the file size. It resolves the path into the target, and that operation can be safely cached. Finally, it also ends up calling PR_GetFileInfo64 twice, which calls FindFirstFile. And, that function returns more than just the file size. I wonder if we could get better performance by calling CreateFile + GetFileSize, while still caching the target resolution.
Yeah - I've been investigating this some more and I'm going to try to do something better about this problem over the next week or two. I've already looked into breaking up resolve and stat.
At the very least, we should not copy the stat cache when cloning a nsLocalFile. It sucks to have to not be able to use nsIFile::clone to get a new nsIFile that does not have a cached stat buffer.
According to bug 122892, preserving the stat cache across clones saved 0.2% of startup time under Linux.
Comment on attachment 195494 [details] [diff] [review] fix v2.1 OK, let's not completely kill the stat caches. Maybe it would be wise to mark the stat cache dirty whenever we return a writable file descriptor though. Want to write a patch that does that?
Attachment #195494 - Flags: review?(darin) → review-
please renominate if we get more agreement on what should be done here.
Flags: blocking1.8b5? → blocking1.8b5-
I noticed that there's a patch in the works that will greatly improve the preformance of PR_GetFileInfo. See bug 285157. Once we have that patch in NSPR, we should give removing the stat caches a try again.
Depends on: 285157
I came here from http://blog.mozilla.com/bienvenu/2007/05/04/fun-with-nsilocalfile/ and wondered, before bug 285157 is fixed and we try removing the stat cache, whether we could not fix the implementations to be coherent. We know when we're writing to a file, so we could invalidate or even compute the new size for the scenario described in David's blog. Comments? /be
We should really do this in Mozilla 2.
Assignee: joshmoz → nobody
You need to log in before you can comment on or make changes to this bug.