So, current expiration demonstrated to work pretty well, but it's still a bit complex due to having to recalculate a bunch of stuff in both history and expiration and failing in some edge case. Plus it may do a better job at containing memory and disk usage. The idea is to keep the current system but changing the heuristics to be more in par with the real system specs: - Calculate a target max database filesize, based on these data: * disk space available * memory available * number of cores (mostly if > 1) * have a minimum size to fallback (even in case of broken system apis) * have a maximum size for high-end specced systems (like our building boxes with 16 or more gigs) - Calculate a max memory cache size based on the current size of the database * have a minimum size to fallback and preserve performances - Expire if the database is over the target size * but never expire the last 7 days of history, to preserve functionality for users with more bookmarks than history - set the journal size relative to the database filesize * have a minimum as well, to preserve functionality This change is going to solve the follwing problems: - places.sqlite memory cache is sometimes really large, I've seen it taking over 1/4 of my FF memory usage (100MB out of 400MB) - Users with lots of bookmarks keep losing history - database still takes lot of space on device with lots of memory but limited storage - journal grows too much on some device - stop allowing to disable the page cache, it may hurt wal badly - Move all calculation to a single point rather than duplicating in history and expiration, as far as possible. Considering current users feedback on existing expiration: - It will still allow to disable expiration setting the db size to large values, but at least it's clear what the disk usage will look like rathere than guessing based on number of pages. - I'll make a restartless add-on to expire by days, it's the most requested miss from the old timed expiration system, and it's easy to do.
Whiteboard: [MemShrink] → [MemShrink:P1]
Created attachment 556040 [details] [diff] [review] wip 1.0 Saving work, just in case. There are a couple things I forgot in the overall approach. Mostly the fact we have chunked grow that inflates the database file and that SQLite dbs never shrink till vacuum. So I'm using the target db size as a guess to calculate a good cache size, but I can't expire based on the current db size; I may workaround the chunked growth problem by using page_count, but there is no way to know how much space is really used and how much space is wasted in fragmented pages. So I will retain the previous page count limit approach, this has advantage of not forcing me to change prefs, the existing ones are fine. All points in comment 0 will still be addressed, but the journal size limit (bug 658305 will take care of that), and moving all calculation to a single point, it's just not feasible, btw bug 661447 is about that, for future investigation. Plus I've found a better way to avoid wrongly expiring just-added-pages, through frecency. The only remaining thing to do is to take a memory threashold (like 1GB) to better differentiate low-end hardware from high-end hardware and allow a bit more history on the latter, I tried parabolic and exponential distributions but they are not as controllable as I'd like. I'm instead dropping the core check, since recent phones/tablets are going to get dual core, but are not as powerful as a desktop multi-core. Globally on my system (8GBs) I've often seen places.sqlite cache going over 100MB, with the patch I plan to force a maximum of 60MB (remember this will vary based on hardware specs), that is a >40% improvement. We may further reduce in future but better being on the safe side for now.
Created attachment 556121 [details] [diff] [review] wip 1.1
Attachment #556040 - Attachment is obsolete: true
Created attachment 556179 [details] [diff] [review] patch v1.0 it took a bit but I think I'm satisfied enough for now, further tweaking may be done in future based on feedback. This is actually building on try to check I didn't miss some tests and Talos. So, on startup history calculates the new cache size based on a memory percentage and current database size, we will try to never have a cache that is larger than half the database size. There is also a minimum of 5M, we should never work without a cache since we use WAL. It also calculates an "optimal" database size, based on cache size, a maximum limit and available disk space. Expiration uses this calculated size to guess a number of pages we can store before starting expiring some. The guessing is based on old places stats and the large databases I've collected (included mine), hope to have new stats soon with telemetry. Finally, it always avoids expiring history for the last 7 days, in case some user has lots of bookmarks that cover our calculated limit, and I've figured out a better way to avoid expiring pages that are in the process of being added.
Created attachment 556239 [details] [diff] [review] patch v1.1 Fixes a build error on Linux64 and the evergreen downloads history expiration test, that I keep forgetting since it's not in Places :)
Created attachment 556240 [details] [diff] [review] patch v1.2 typo :(
Comment on attachment 556240 [details] [diff] [review] patch v1.2 Review of attachment 556240 [details] [diff] [review]: ----------------------------------------------------------------- r=me. i'd prefer to fix the bookmarks vs history problem by not counting bookmarks in the set (however we do that, separate dbs or whatever), but that certainly doesn't block this from moving forward. i'm also a little concerned about the static 7 day limit on expiration. a burst in browsing activity might result in sacrificing higher frecency older visits for lower frecency newer visits. but let's try it out and get feedback on the approach. please do blog about these changes, so that we can get the word to people to keep an eye out for problems. ::: toolkit/components/downloads/test/unit/test_history_expiration.js @@ +92,5 @@ > getService(Ci.nsINavHistoryService); > // Add the download to places > // Add the visit in the past to circumvent possible VM timing bugs > + // Go back by 8 days, since expiration ignores history in the last 7 days. > + let yesterday = Date.now() - 8 * 24 * 60 * 60 * 1000; nit: for clarity, rename variable. ::: toolkit/components/places/nsNavHistory.cpp @@ +701,5 @@ > nsresult rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING("PRAGMA page_size"), > getter_AddRefs(statement)); > NS_ENSURE_SUCCESS(rv, rv); > > PRBool hasResult; why removing scoper here?
Attachment #556240 - Flags: review?(dietrich) → review+
(In reply to Dietrich Ayala (:dietrich) from comment #6) > Comment on attachment 556240 [details] [diff] [review] > patch v1.2 > > Review of attachment 556240 [details] [diff] [review]: > ----------------------------------------------------------------- > > r=me. i'd prefer to fix the bookmarks vs history problem by not counting > bookmarks in the set (however we do that, separate dbs or whatever), but > that certainly doesn't block this from moving forward I initially thought about that, it is expensive though (both from a perf and a code point of view) > i'm also a little > concerned about the static 7 day limit on expiration. a burst in browsing > activity might result in sacrificing higher frecency older visits for lower > frecency newer visits. but let's try it out and get feedback on the approach. ideally this only hits in the edge case where we reach max pages in 7 days, based on the calculated limit that's pretty hard (my history atm goes back to 2 years ago, and I use the browser a lot every day). Btw it's a valid concern to check. > please do blog about these changes, so that we can get the word to people to > keep an eye out for problems. sure > > + // Go back by 8 days, since expiration ignores history in the last 7 days. > > + let yesterday = Date.now() - 8 * 24 * 60 * 60 * 1000; > > nit: for clarity, rename variable. sure, this is not a nit, it's plain wrong :) > ::: toolkit/components/places/nsNavHistory.cpp > @@ +701,5 @@ > > nsresult rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING("PRAGMA page_size"), > > getter_AddRefs(statement)); > > NS_ENSURE_SUCCESS(rv, rv); > > > > PRBool hasResult; > > why removing scoper here? The scoper is used to reset a stmt when it goes out of the scope, in this case when the statement goes out of scope it should just be finalized. But since I was curious and a light switched on in my brain, I went back to blame and figured we added that scoper in Bug 617779, the fact is that while scoping (by bracing) is important I'm not sure if resetting the statement is really needed. So I'll double verify that before proceeding. I think it's worth to add a comment there, we didn't add one originally and that's a bad choice.
I've confirmed the scoper is just useless work there by forcing history to change journal mode on the same test profile, from wal to truncate and from truncate to wal, the scope ensures the statement is finalized, that is enough for our needs.
Created attachment 556832 [details] [diff] [review] patch v1.3
Attachment #556240 - Attachment is obsolete: true
Summary: Reorganize history expiration to use a target database size → Reduce places.sqlite cache size and reorganize history expiration around the new value
Whiteboard: [MemShrink:P1] → [MemShrink:P1][inbound]
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla9
You need to log in before you can comment on or make changes to this bug.