Closed Bug 112308 Opened 19 years ago Closed 18 years ago
.dat file adds 30% to startup time and 1MB or more to RAM
33.03 KB, text/html
453 bytes, patch
|Details | Diff | Splinter Review|
3.35 KB, image/png
652 bytes, patch
|Details | Diff | Splinter Review|
2M history.dat file adds 30% to the startup time on openwound! 7sec of 27sec go away when we whack the 2M history.dat file. We need to profile this! A 2M history.dat file is probably pretty common.
jprof data showed problems are caused by mork
the problem is that you have a 2MB history file - loading that into memory is going to be slow, obviously. History should be pruned occasionally. If you keep months of history around, you're going to pay the price.
Well, maybe we can be smart about how we load this file. Maybe partially load recent entries, finish startup, then finish loading the rest of the history file in the background?
Our default of 9 days costs me 1MB for history.dat + history.mab. IE has a default of 20 days, but their history items seem lighter weight. Why do we need to store so much information (title, location, last visited, first visited, referrer, visit count, hostname)? Wouldn't everyone prefer faster startup? If history file size remains such an important performance factor, we should surface it in the UI, perhaps by offering a way to specify that as a limit, instead of/in addition to number of days.
history.mab is your history address book, and really isn't part of this discussion (though mine is 192K and I have the max number of entries in it - it's also not loaded on mail or browser startup). And, you can control how big your history address book is in prefs. My history.dat is 167K by comparison, though I don't have time to do a lot of browsing :-( We could shrink the history.dat file by maybe 25% by using the form field in mork yarns to avoid writing out binary data, but we couldn't get this to work because of a mork bug. I could try again given time. This would not give a 25% speedup, though, more like a 3-5% speedup, and that's just in the case where you have a gigantic history.dat. I'm a little worried that the history.dat is not getting compressed often enough (normally we just write out change logs, and when enough space is wasted, we do a compress commit). Chris, we could do what you suggest, but it would involve two databases, and a bit of merging fun in the history code
I think we need to do a compressed-commit on shutdown. that will compress the history file down to its smallest possible size...if we're going to do it, shutdown seems like the best time... I think the next best win would be the morkform stuff.. the problem I ran into with the mork Form stuff was that after doing a sweeping change to convert all titles from UCS2 to UTF8 and doing a compressed commit, the history would be slightly corrupted the next time I started the browser - the file looked fine but the parser would barf on 4-byte ASCII title strings, like "Trek" (4 bytes is the threshold where mork stops storing atom IDs and starts storing the raw string instead)
doing a compress commit is going to slow down shutdown - we have logic in there to do a compress commit when it seems like a certain amount of space is wasted. We could try tweaking that logic a little to be more agressive. It's also possible that there really is 2MB of data, though I can't imagine how many entries that would be. (If Cathleen wants to count them, that would be very interesting) I could also try to fix mork so it knew exactly how much space is wasted, but that code is very cryptic.
hey, I'd rather eat that time at shutdown than startup :)
here is a link to the 2MB history.dat file. tons of entries. (sorry it's not available for external right now, contact me if you need a copy) http://blues/users/cathleen/publish/performance/bug112308/history.dat you can view it by just clicking on it. to test it, copy the file to your profile directory and set your date back to 11/27/2001 around 7pm (we think things might get purged, if trying it 9 days out)
For this particular file, I think this is kind of an abnormal case.. you've definitely run browserbuster or some other thing that adds many, many unique urls to your history. This particular file would benefit from the use of "tries" (I think that's what they're called) where we have aggressive atomization of string fragments - basically you've got 1000's of urls which are exactly the same up until the last 10 or so characters.. I don't browse the web a huge amount, but I have about 20-30 sites which I visit frequently...my history is only 352k
Assignee: blaker → alecf
I think we need to compact the db more often.. 2M is pretty big, all things considered.
hey dp - this is somewhere I could use that not-recently-used timer thing... I could compress the database when the user isn't doing anything. Just to update everyone on what is going on here - mork is a journalling database, which means that if we don't do a compress commit, it will just continue to grow and grow... we do do compressed commits under certain circumstances, but maybe we aren't hitting that code often enough. you really shouldn't have a 2M history file.. I will add that mine is 940k with 9 days of surfing stored there...
Alec, I didnt add in a no-doing-anything timer. Pav sez that an LOWEST_PRIORITY timer would be that. So just go ahead and hold your own timer and say every 5 mins do the compaction. To see what I did: mozilla/xpcom/ds/nsRecyclingAllocator.cpp
I really don't think we should have a timer going off every x seconds, ad infinitum. What about machines that have a sleep mode, or that spin down the hard disk. we got a lot of flak in 4.x when the browser kept waking up the hard drive. dp, you had talked about a mechanism where you would essentially have a sort of idle-resource mechanism - basically every time you used a named resource, you'd say "hey I just used <resource>" - and then a timer would go off something like 5 minutes from the last use of <resource> to notify the resource to clean itself up...that would be perfect in this case. in the mean time, we can probably just leverage the existing history flush mechanism (see nsGlobalHistory::Flush) - we could make a 2nd timer that goes off less often (the current one goes off 10 seconds after the last url is run) which does a compress commit. blake, this would be a great way for you to get more familiar with the history back end :)
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.9
nominating for nsbeta1
ok, I'll bite.
Assignee: alecf → blaker
Status: ASSIGNED → NEW
So I was in the midst of adding a 5-minute or so compress commit timeout when I noticed that our sync timer only fires the first time in a session. I talked to Stuart and he said that he, er, doesn't recommend calling init more than once on any given timer, but he thought it should work. He suggested calling SetDelay(), but I tried that without success. That no place else in the codebase currently uses SetDelay() properly doesn't give me much hope. However, setting the timer to nsnull and creating a new one each time works. I then found http://lxr.mozilla.org/seamonkey/source/editor/composer/src/nsInterfaceState.cpp#227, which seems to be the same situation. I note this comment in ::Commit: // check how much space would be saved by doing a compress commit. // If it's more than 30%, go for it. So maybe fixing this sync timer problem will help out with this somewhat? Or is that bogus? At the very least, it seems as if fixing this problem will fix bug 63292, no...?
Relevant snippet from irc: <stuart> the problem is, is that if you cancel, and we've already fired an event to your event queue, we need to drop the resulting event processing <stuart> but we can't pull the event out of the event queue, so we just wait until we are told about it and then ignore it <stuart> but if you were to re-init it, and we changed that flag, you could get events you didn't want <stuart> hrm :) <stuart> i'm not quite sure how to fix it without keeping a list of plevents around and doing lookups on them <stuart> i'd create a new timer each time <stuart> i'll add asserts to the code though so can I get r/sr on this patch, for a start?
that is interesting about the timers not firing. That said, I had forgotten about the compress-commit-on-30%-savings.. I think that the 30% calculation is kinda bogus (it might say that in the comments, actually) - I wonder what would happen if we increased the threshold up to 50% or something. We'd have to be careful that we don't end up doing a compress commit every time!
Comment on attachment 67626 [details] [diff] [review] patch (not necessarily for checkin) damn, that's easy. sr=alecf
Attachment #67626 - Flags: superreview+
Back to the original question, couldn't the loading of history be delayed until after the user has started the browser? I always change the 9 days to something longer because history might be the only way to find that page you visited list week or so. (I remember when I finally found out that it was my 90 day history file that made my NS3/NS4 go as fast as a speeding snail so handling large history data isn't a new problem)
One more aspect to consider: http://bugzilla.mozilla.org/show_bug.cgi?id=38621#c50 history.dat is being read in after main window is visible. So although it takes 30% longer to say Document Done, all this time is spent post window being shown to the user. It is needed to do link coloring I think. We could do a two pass on it but then it aint worth the trouble since this is all after window visible.
it is quite possible that the delayed-history-loading has broken recently, we should probably investigate that too. You can just put a breakpoint in nsGlobalHistory::OpenDB() and see who calls it first.
Delayed history loading seems to be working okay. I checked in the timer fix yesterday, don't know if it impacted this or not.
Nav triage team needs info: Is delayed loading a sufficient solution here? What is the risk/reward ratio?
Whiteboard: [need info]
triage team: we've delayed the load as much as possible, it can't be delayed any longer! At this point, the startup hit happens after the first window has been created. This is a very old bug and really needs someone to re-test and verify the behavior.
reassigning to perfQA. cathleen or dbaron, you may want to take another look at this since profiling is beyond my scope.
QA Contact: claudius → hong
let me take a look, I can go test this on openwound.
I need a ~2M history file to test with, anyone got one of these?
ADT triage team: Thanks Alec. We are still not sure if this performance is sufficient for MachV, or if further work is required, and what the cost/risk/benefit of that work would be.
somebody find me a 2M history file, and I'll get some numbers so we don't have to guess.
Is history.dat the right file? Mine is 2361k, and I can e-mail that to someone, if you like (as I wouldn't want to display my browsing habits to the entire world by attaching it to this bug).
Alex: McAfee solicited such a file, so email should be fine. thank you.
Peter: I've e-mailed McAfee my history.dat :).
I ran into delays to get to testing this, lemme try now.
This looks like about a ~22% increase
Thanks Chris! One followup Q, this represents an increase over what? Do we have a control case where history is cleared? It would be good to know the typical cost of history on startup. Also, are the history files on tboxen growing over time?
beginning of test case was no, or very small history.dat file. We started clearing off history.dat (and the rest of the profile) for each tinderbox cycle a few months ago.
A few things: we compress commit in certain conditions when the user is idle. So maybe if the tester is constantly loading pages, it doesn't get this win. I don't know. (I thought we also compressed on shutdown, so perhaps it's irrelevant). And please see comment 25. This may be unimportant at this point if it's happening after the window appears. Someone with a large history file should test by perception.
Since we already delay as long as we can, what would we do if it is still perceived slow? Does anyone consider the changes mentioned here worth the risk at this point? Few people change their default prefs, so this should mostly affect people in proporition to the amount of pages they load. Few people, modest serverity ->low impact.
nsbeta1- per Nav triage team
jrgm asked me to comment here with my history file size: -rw-r--r-- 1 mozilla 2894079 Jun 26 23:35 history.dat My history prefs are set to only retain history for 7 days. trudelle mentioned in comment 5, and there was never an answer, why are we storing so much information in history? Assuming the history window columns are 1:1 with what we store, what's the purpose of a hostname *and* URL field? I suppose it's vagely interesting that I've visited news.bbc.co.uk 125 times, and news.bbc.co.uk/ticker/ticker.stm (an IFRAME on the page) 117 times, with my first visit on 4/28/02 (why it knows about dates more then 7 days old I don't know) at 10:22:27, and there was no referer (ever? the first time? the most recent?), but, IE seems to manage without storing all of this data. In fact, I'd opt for storing less of this useless-to-the-user data for privacy reasons alone. Consider the perf just a bonus. :)
yeah the referer bit is interesting, basically, the GUI, the RDF and DB and all of history *except* the Add method expose referer. the minor problem is of course the add method. I'm actually currently looking into implementing referer. I could have it done w/in 2 days, most of the work is in DocShell (since that's the consumer of the Add method), with a tiny bit of code in history so that the Add method takes a referer. -- You can see my progress by looking for funny bugs against docshell.
Okay, I believe (with pretty good evidence) that we _never_ compress-commit 'history.dat'. It grows without bounds, and barring the user manually cleaning this up, it will collect every URL ever visited and will display those in the UI. This has enormous performance impacts on startup-to-first-url and memory consumption. We need to force a compress-commit on shutdown, and this should be tested on the trunk asap and should be considered a rtm stopper, imho. Example A: I had a win32 tinderbox running for some time several weeks ago. The history.dat in that profile had grown to 11.0 MB. Now, the newest entry in that db was May 1st, and when I dropped that into my profile, the history UI showed me all those long-expired URLs. I shut down but nothing compressed that db and upon restart all of that history was still present. I stepped through this in the debugger, both for the timer call to nsGlobalHistory::Commit() and for the shutdown call (from CloseDB). There is logic in Commit() to convert a request for a kLargeCommit or a kSessionCommit into a kCompressCommit. However, even with this huge db, both the check of Mork's ShouldCompress() and for the backup heuristic that follows, neither flipped the kCompressCommit on. The cost for loading this history datasource was about 12 seconds before loading the first URL in the browser and about 15MB of RAM. Example B: I found a profile with a more normal history.dat. This profile was created 06/14 and used regularly to 06/20. The file was 750KB in size. (See below for some more fun facts). Similar to the above, all the visited URLs, even those from more than nine days ago were shown in the UI and a shutdown and restart did nothing to remove these entries or compress the db. Similarly, stepping through the code showed that Commit() passed on the chance to compress-commit the db. The cost for loading this history datasource was several seconds (I didn't time real closely, but noticeable lockup) before loading the first URL in the browser and over 1 MB of RAM. Opening and closing the history UI was also slow (several seconds) and just closing the window was slow to allow the browser behind to repaint (one or two seconds). Really bad user experience. So, I converted the call to Commit() from CloseDB() [called during shutdown] to do a compress-commit (as alecf had earlier suggested). Index: src/nsGlobalHistory.cpp =================================================================== RCS file: /cvsroot/mozilla/xpfe/components/history/src/nsGlobalHistory.cpp,v retrieving revision 1.145 diff -u -r1.145 nsGlobalHistory.cpp --- src/nsGlobalHistory.cpp 22 Jun 2002 20:06:37 -0000 1.145 +++ src/nsGlobalHistory.cpp 28 Jun 2002 07:32:13 -0000 @@ -2776,7 +2776,7 @@ mdb_err err; ExpireEntries(PR_FALSE /* don't notify */); - err = Commit(kSessionCommit); + err = Commit(kCompressCommit); if (mTable) mTable->Release(); That works like a charm and appears to take negligible time even for the 11MB file. A user will not notice the difference. (I couldn't). In the case of the 06/14-06/20 db, it removed all the entries for 06/14-06/18 and left behind the two days that were within the 9 day window, 06/19-06/20. I set the home page to a window.close page (i.e., shutdown as abruptly as possible) and this seems pretty robust. Okay, to recap, I think that we never compress-commit the history.dat db, we grow it forever, this consumes enormous wasted resources, and is visibly slow. The fix is to compress-commit at each shutdown which seems to work without a problem. I think we should land this on the trunk asap (even if we decide later to fix the logic in nsGlobalHistory::Commit()). If there are no regressions or problems reported in trunk builds, then we should take this on the branch. ------ [Fun Facts: the 750KB history.dat, in use for six days, contained 3660 URLs! Of these, I could tell (by URL) that about 3400 were just for iframe ad banners of which about 3000 from the ar.atwola.com ad server that serves aoltw ad content!! Huh? Well, from the timestamps, visit count and URLs, it was apparent that this browser had been left on overnight, pointing at the netscape.com home page. Over the course of the night, it added 3 URLs per minute to global history. Sweet!!]
Hmm, I mangled that inline copy of the patch with a fill-region. Try this one. Index: src/nsGlobalHistory.cpp =================================================================== RCS file: /cvsroot/mozilla/xpfe/components/history/src/nsGlobalHistory.cpp,v retrieving revision 1.145 diff -u -r1.145 nsGlobalHistory.cpp --- src/nsGlobalHistory.cpp 22 Jun 2002 20:06:37 -0000 1.145 +++ src/nsGlobalHistory.cpp 28 Jun 2002 07:32:13 -0000 @@ -2776,7 +2776,7 @@ mdb_err err; ExpireEntries(PR_FALSE /* don't notify */); - err = Commit(kSessionCommit); + err = Commit(kCompressCommit); if (mTable) mTable->Release();
compress commit and url aging are two different things - compress commit just removes deleted rows from the backing store; it does not delete rows. If rows are not being removed from history, that's a problem with aging, or a problem with the history code. My history is aged fine, so I doubt that everyone is seeing every url they ever visited in the history UI. I'm not sure why the backup logic for issuing compress commits isn't working, but my only guess would be that the avg row size is configured too high. What did you see in the debugger as to why that code wasn't hit? Compress commit will work OK if your history file is small (i.e, you only keep around a few days of history). But if you keep a month, for example, you could be writing out a multi-megabyte file every time you shut down. At some point, that's going to be noticeable.
So, I dropped by the cube next to mine and had shrir start up his browser on one of his profiles. This profile was set to expire in the default 9 days yet contained entries going back well into May and these were all loaded and shown in the UI. The disk size for this file was >500KB and it took approx. 750KB of RAM to load it. At shutdown, none of those entries were deleted. (In fact, history.dat grew by an additional 20KB just by starting and shutting down).
Summary: 2M history.dat file adds 30% to startup time on openwound → 2M history.dat file adds 30% to startup time and 1MB or more to RAM
Hee-hee. So |mFileSizeOnDisk| is _always_ zero, which makes the following a permanently losing comparison: 2716 LL_UI2L(numRows, count); 2717 LL_DIV(bytesPerRow, mFileSizeOnDisk, numRows); 2718 LL_I2L(desiredAvgRowSize, 400); 2719 if (LL_CMP(bytesPerRow, >, desiredAvgRowSize)) 2720 commitType = kCompressCommit; http://lxr.mozilla.org/seamonkey/source/xpfe/components/history/src/nsGlobalHistory.cpp#2688 http://lxr.mozilla.org/seamonkey/search?string=mFileSizeOnDisk If mFileSizeOnDisk actually represented the file size of history.dat, then, for the history.dat in example B: above, |bytesPerRow| would be ~4500, and a compress-commit would be performed. (sz = 841954, rows = 188).
Okay, this sets mFileSizeOnDisk when we have successfully opened history.dat in nsGlobalHistory::OpenDB(). Works well on win32 and truncates an 800KB history file from example B: to about 60K on disk and permanently removes all those entries that are older than 9 days (the current [default] setting in that profile). But, after six months, can we stop pretending there is no bug here and that real users are not significantly hurt by this.
I hope I didn't give that impression that there was no bug here. I merely was trying to correct some misconceptions flying around about what compress commit does. It does reduce disk size, there's no doubt about that. But it doesn't remove history entries. This is because of the way mork does incremental updates. If you delete a row (e.g., expire a history entry), it just writes out change record at the end of the file, saying the row has been deleted). The earlier table definition in the db still has that row in it, but when the db is read in, the change record for the delete row removes the row from the in-memory version of the table. The more changes you make, the more change records you get in the db. When you do a compress commit, the tables are completely re-written out to disk, so in this example, the table would be written out without the deleted row. So you save the space for the deleted row, and the change record. Re mFileSizeOnDisk, it was set in the initial patch I submitted for bug 55293 but it looks like that part of the code got lost in all the history excitement. I looked at fixing the underlying mork bug with shouldcompress but I never got very far. If I had a week to look at it, I could figure it out, probably.
I apologize for the "no bug". That was a tad harsh. Sorry. Thanks for the explanation of removing history entries vs. compress-commit. It seemed, though, that expired entries were not removed from the visible UI ever, for the two example history.dat files. But they would be properly cleaned up if a compress commit was done (by forcing it in CloseDB(), patch #1). It may that because the history.dat in example b: and in shrirang's case hadn't been used in a few days, this tripped up some logic in mork or the global history client code. Perhaps if a profile is in regular use every day, a user will not have this problem. But, nonetheless, I don't think our license terms allow us to dictate to users that they must run the browser every day for normal operation :-). At any rate, I think that we should land attachment 89642 [details] [diff] [review] on the trunk, check for regressions and land it on the branch. We know now that we never compress history.dat, and even if rows get properly expired and removed, that file must still be parsed into memory at startup, which slows the show of the first window and wastes significant amounts of RAM. If that file grows without bound, then this will hurt users down the road.
I agree that we should land the patch on the trunk and try to get it into the branch. I suspect that the mork "shouldCompress" call occasionally does return true, so we do occasionally compress commit, and it probably does depend on your useage pattern - I've forgotten what the logic in Mork was checking for - it was some goofy highwater mark thing that was not doing the right thing, except if you were lucky.
r=blake please have bienvenu sr=.
Comment on attachment 89642 [details] [diff] [review] alternative patch; set mFileSizeOnDisk when opening the db sr=bienvenu
Attachment #89642 - Flags: superreview+
Comment on attachment 89642 [details] [diff] [review] alternative patch; set mFileSizeOnDisk when opening the db noting my r=
Attachment #89642 - Flags: review+
marking [nsbeta1+ adt3] per discussion on navtriage (although I protest the adt3 :-), but I'll debate the issue when it comes time for adt1.0.1+, etc.)
Checked in for jrgm - marking fixed and adding adt1.0.1 keyword. cvs commit: Examining . Checking in nsGlobalHistory.cpp; /cvsroot/mozilla/xpfe/components/history/src/nsGlobalHistory.cpp,v <-- nsGlobalHistory.cpp new revision: 1.146; previous revision: 1.145 done
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
I've (of course) already tried out this patch on Linux and win32. On win32, I've tried several different history.dat files including the massive 11MB history.dat and haven't had problems (the file is compress committed as appropriate; no crash or dataloss in any test). I've started to do the same on Linux and will get Mac when builds come out tomorrow. I'd like to see a couple of days of trunk data from talkback to make sure there are no problems there, and then ask for approval to land on the branch. Yes, I consider this something I would hold the release for.
Target Milestone: mozilla1.2alpha → mozilla1.0.1
I've tried this more thoroughly on Linux, and thoroughly on Mac osx and os9 (using both the 11MB history.dat and an 800KB one, and with just a normal history.dat file, going through starts and stops and watching the file get appropriately compressed). The only hiccup was that I couldn't get the Netscape build on os9 to compress the history.dat, yet the mozilla build from the same install dir would do so correctly. That makes no sense. Perhaps that comm. build is mislabeled or otherwise doesn't have the fix. I'll try again in a later comm. build.
Whiteboard: essentially verified on trunk; just waiting to check talkback for any snafus
Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.1a+) Gecko/20020703 [NT4sp6a] I found this bug ages ago, and was hoping it would be fixed soon. My history file is 4.6Mb, with ~13000 entries in the history dialog I have history set to 250days (yes I know, it is large,but hey) and the last visted entry was 2001-11-26. I ran this build, closed it, opened again, went to multiple pages, closed it, went to mail&news, closed Moz again, went to history and scrolled, then closed Moz again. After each of these attempts, the history.dat file stayed the same size, so either it doesn't work, I don't have a new enough build, or it works and I just have to live with a large history file and prune it every few months...
Hmm, well, from the build ID above, you appear to have the build with this patch. However, sorry to say it, but given a 250 day history limit, this isn't "unexpected". This patch doesn't really address your situation. 2001-11-26 was 220 days ago; 13000 entries in 220 days is about 60 per day (seems reasonable enough for average browser usage); 4.6MB for 13,000 entries is about 350 bytes per entry (which seems somewhat high on a per-entry basis, but is below the 400 bytes per entry cutoff that will trigger a compress commit. [And it may be the case that, including overhead, it does require that much storage for 13000 entries and a compress commit won't find any savings]). I imagine you're taking a large hit on RAM and startup. You may want to consider cutting back on the amount of history you save. Because the current implementation will load the entire data set into memory (correct if I'm wrong, but I believe that's the case) it is not really optimal for dealing with 6 months of history data.
Yeah, I understand the disadvantages in having a large history time, though I like to keep a recond of where I have been for the last semester (+ a bit more), hence 250 days. I'd even go 365 but history would be too big. I just went through and pruned the history via the UI, deleting old news articles, session unique urls etc. I got it down to 6300 entries. It didn't compress when I first started deleting, but when I looked after I finished, history.dat was 2.4Mb, which sounds right given 13,000 entries --> 4.6Mb. I guess its the best it will do for me, and has compressed as much as it can. Of course I'd like the overhead of each entrie to be lower, but either way, nice work on this bug!!!!
verified fixed for 07/03 win32, linux, osx builds and 07/06 os9 build. This correctly chooses to do a compress commit now for those platforms. I've checked the talkback data on climate, and there is no indication of any crash associated with this change. I've reviewed bugzilla reports for 07/03 to present, and there are no noted problems with this change.
Status: RESOLVED → VERIFIED
adding adt1.0.1+. Please get drivers approval before checking into the branch.
Comment on attachment 89642 [details] [diff] [review] alternative patch; set mFileSizeOnDisk when opening the db a=chofmann for 1.0.1 add the fixed1.0.1 keyword after checking in on the branch.
Attachment #89642 - Flags: approval+
Patch checked in by bzbarsky, rev 220.127.116.11, at 23:42 07/09
verified on branch then? When verified, pls replace fixed1.0.1 keyword with verified1.0.1. Thanks.
You need to log in before you can comment on or make changes to this bug.