Closed Bug 419170 Opened 13 years ago Closed 13 years ago

The history includes too few days, browser.history_expire_days_min is ignored

Categories

(Firefox :: Bookmarks & History, defect, P2)

defect

Tracking

()

VERIFIED FIXED
Firefox 3 beta5

People

(Reporter: broedli, Assigned: mak)

References

Details

(Keywords: dataloss)

Attachments

(3 files, 3 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; de; rv:1.8.1.12) Gecko/20080207 Ubuntu/7.10 (gutsy) Firefox/2.0.0.12
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022303 Minefield/3.0b4pre

My history has always circa 40000 visits, even though it includes less than 90 days. This behavior contradicts the preference in the ui ("Keep my history for at least 90 days"). It seems that for the case:
1. Zero visits older than "browser.history_expire_days_min" and
2. more visits than "browser.history_expire_sites"
the history is limited through "browser.history_expire_sites" and hence includes less than "browser.history_expire_days_min" days.

Reproducible: Always

Steps to Reproduce:
1.Set browser.history_expire_days_min to a higher value than the age of the oldest history visit
2.Set browser.history_expire_sites to a considerably lower value than the number of history visits
3.Restart firefox several times
Actual Results:  
The number of history visits is reduced to the value specified in step 2


This bug is similar to bug 403572, i can also reproduce it on Windows XP.
Version: unspecified → Trunk
Keywords: dataloss
Flags: blocking-firefox3?
Assigning to Dietrich and blocking for resolution, which might be to dupe against and re-open bug 403572?
Assignee: nobody → dietrich
Flags: blocking-firefox3? → blocking-firefox3+
Priority: -- → P2
Target Milestone: --- → Firefox 3
Assignee: dietrich → mak77
Robert, how is your browser.history_expire_days setup? (Should be 180 by default)
i suppose this could happen if we end up having history_expire_days < history_expire_days_min.

while there i've seen that the query in findVisits does not do what we expect from it, so it would be better rewrite that to behave correctly
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
I did only test cases with expire_days_min < expire_days and could reproduce this with my normal profile (expire_days_min: 90, expire_days: 180, expire_sites: 40000, reduced to 39000 for testing) and a new test profile (expire_days_min: 90, expire_days: 180, expire_sites: 10).

As far as my testing was correct build 2007120711 (before second check-in from bug 407018) limited the history only through expire_days, hence bug 403572 was verified fixed (before first check-in from bug 407018). In build 2007120802 (current situation) the history is limited through expire_days or expire_sites, depending on which preference is hit first. (Build 2007120508 and build 2007120515 show the corresponding behavior, so i think it's related to bug 407018.)
The problematic case (days < expire_days_min and sites > expire_sites) doesn't seem to be included in the automatic tests.

> test 1: NO EXPIRATION CRITERIA MET
>
> 1. zero visits > {browser.history_expire_days}
> 2. zero visits > {browser.history_expire_days_min}
>    AND total visited site count < {browser.history_expire_sites}
(In reply to comment #4)

> The problematic case (days < expire_days_min and sites > expire_sites) doesn't
> seem to be included in the automatic tests.
> 
> > test 1: NO EXPIRATION CRITERIA MET
> >
> > 1. zero visits > {browser.history_expire_days}
> > 2. zero visits > {browser.history_expire_days_min}
> >    AND total visited site count < {browser.history_expire_sites}
> 

test 3: MIN-AGE+VISIT-CAP CRITERIA MET

1. zero visits > {browser.history_expire_days}
2. some visits > {browser.history_expire_days_min}
   AND total visited sites count > {browser.history_expire_sites}
The difference is "_zero_ visits > {browser.history_expire_days_min}" and "_some_ visits > {browser.history_expire_days_min}". I could be wrong, but shouldn't a test exist like:

test x: NO EXPIRATION CRITERIA MET

1. zero visits > {browser.history_expire_days}
2. zero visits > {browser.history_expire_days_min}
   AND total visited site count _>_ {browser.history_expire_sites}
i have been able to see the problem a couple of times but that does not happen always... I've rechecked the queries multiple times, but they appear correct.

my new theory about this is that is due to a bad cast
PRInt64 minDaysAgo = mHistory->mExpireDaysMin * 86400 * PR_USEC_PER_SEC;

mHistory->mExpireDaysMin is a PRInt32, if that overflows the search date will be wrong.

Robert if i put here a patch, can you compile it and see if that solves the problem on your side?
I'm sorry, i can't compile it. I've had also problems to reproduce it in a new profile, firefox seems to start the expiring only after some time. That's the reason because i suggested to use an already existing profile (places.sqlite) in the STR and to change the preferences accordingly. I will attach a places.sqlite file with 147 history_visits from 2007-02-21. If i put it in a new profile, start firefox, select "Show a blank page" and set browser.history_expire_sites to 10, after 4 restarts the number of history_visits is reduced to 12 (1. Restart: 143, 2.: 93, 3.: 43, 4.: 12). The Startpage and the Milestone-Startpage seem to be to "fresh" to expire.
Whiteboard: [swag: 0.5d]
thank you robert, your report and test file were very useful, i have confirmation that this was a cast problem
Attached patch patch (obsolete) — Splinter Review
this fixes our expiration criteria. We wanted to expire visits over the min only if we had more than 40 000 sites (history_expire_sites) but the real query was wrong because:

- it was offsetting the final resultset of the search after WHERE clause
- it was not checking number of sites at all

so we were ending up having a maximum of 40 000 visits for sites over 90 days, not surely the intended behaviour.

This patch fixes:

- case in which expire_days_min > expire_days, we would not observe the user pref choice if this happens, we always delete before expire_days

- a bad cast was forcing us to expire visits before expire_days_min, i've refactored getExpirationTimeAgo to do conversion to PRTime for our prefs, so this should not happen anymore

- FindVisits was limiting on the number of visits, not on the number of sites as  was stated during its creation, refactored

- FindVisits will always try to expire aNumToExpire visits if possible, so if we don't find another visits to expire before max, we go on expiring the remaining part before min (only if we are over sites cap)

- added a new test to check that that we don't expire valid visits if we are over the sites cap

- ClearHistory does not use anymore ExpireItems (Bug 409723), this has brought my clearhistory time (150 000 visits) from 28s -> 8s. Our paranoid functions are good enough to do the cleanup by themselves, in onClearHistory then we call Refresh()
Attachment #309396 - Flags: review?(dietrich)
Comment on attachment 309396 [details] [diff] [review]
patch

> nsNavHistory::LoadPrefs(PRBool aInitializing)
..
mPrefBranch->GetIntPref(PREF_BROWSER_HISTORY_EXPIRE_DAYS_MAX, &mExpireDaysMax);
>   mPrefBranch->GetIntPref(PREF_BROWSER_HISTORY_EXPIRE_DAYS_MIN, &mExpireDaysMin);
>+  // if this happens we could expire wrong visits
>+  if (mExpireDaysMax < mExpireDaysMin)
>+    mExpireDaysMax = mExpireDaysMin;
How about a comment like "cap max days to min days to prevent expiring pages younger than min"

> // nsNavHistoryExpire::FindVisits
> //
> //    Find visits to expire, meeting the following criteria:
> //
>-//    * With a visit date greater than (now - browser.history_expire_days_min)
> //    * With a visit date less than (now - browser.history_expire_days)
>-//    * With a visit date greater than the minimum, and less than the maximum,
>-//      and over the visit cap of browser.history_expire_sites.
>+//    * With a visit date less than (now - browser.history_expire_days_min)
>+//      if we are over the unique urls limit (browser.history_expire_sites)
Could you clean up the comments to be more understandable.. at least to me :p
"With a visit date older than history_expire_days ago (mExpireDaysMax)"
"With a visit date older than history_expire_days_min ago (mExpireDaysMin)"

> nsNavHistoryExpire::FindVisits(PRTime 
..
>+  // If we have found less than aNumToExpire over-max-age records, and we are
>+  // over the unique urls cap, select records older than the min-age cap .
>+  if (aNumToExpire - aRecords.Length() > 0) {
Switch that to (aNumToExpire > aRecords.Length()) ?

>+    if (pageCount <= mHistory->mExpireSites)
>+        return NS_OK;
Add a comment "Don't find any more pages to expire if we have fewer than the max number of pages to keep in history"
Attachment #309396 - Flags: review+
> >+  // If we have found less than aNumToExpire over-max-age records, and we are
> >+  // over the unique urls cap, select records older than the min-age cap .
> >+  if (aNumToExpire - aRecords.Length() > 0) {
> Switch that to (aNumToExpire > aRecords.Length()) ?

Actually, other way around to keep with the comment:
(aRecords.Length() < aNumToExpire)
Attached patch patch (obsolete) — Splinter Review
addressed comments
Attachment #309396 - Attachment is obsolete: true
Attachment #309439 - Flags: review?(dietrich)
Attachment #309396 - Flags: review?(dietrich)
(In reply to comment #13)
> Actually, other way around to keep with the comment:
> (aRecords.Length() < aNumToExpire)

yeah that could be more readable, will address with final review

also
+  PRTime GetExpirationTimeAgo(PRInt32 aExpireTime);

i probably should use aExpireDays since it's not a PRTime
Attached patch patch (obsolete) — Splinter Review
updated with latest comments
Attachment #309439 - Attachment is obsolete: true
Attachment #309941 - Flags: review?(dietrich)
Attachment #309439 - Flags: review?(dietrich)
Comment on attachment 309941 [details] [diff] [review]
patch


>@@ -347,17 +347,17 @@ nsNavHistoryExpire::ExpireItems(PRUint32
> 
>   *aKeepGoing = PR_TRUE;
> 
>   PRInt64 expireTime;
>   if (aNumToExpire == 0) {
>     // special case: erase all history
>     expireTime = 0;
>   } else {
>-    expireTime = PR_Now() - GetExpirationTimeAgo();
>+    expireTime = PR_Now() - GetExpirationTimeAgo(mHistory->mExpireDaysMax);
>   }
> 
>   // find some visits to expire
>   nsTArray<nsNavHistoryExpireRecord> expiredVisits;
>   nsresult rv = FindVisits(expireTime, aNumToExpire, connection,
>                            expiredVisits);
>   NS_ENSURE_SUCCESS(rv, rv);
> 
>@@ -435,94 +435,101 @@ nsNavHistoryExpireRecord::nsNavHistoryEx
>   erased = PR_FALSE;
> }
> 
> 
> // nsNavHistoryExpire::FindVisits
> //
> //    Find visits to expire, meeting the following criteria:
> //
>-//    * With a visit date greater than (now - browser.history_expire_days_min)
>-//    * With a visit date less than (now - browser.history_expire_days)
>-//    * With a visit date greater than the minimum, and less than the maximum,
>-//      and over the visit cap of browser.history_expire_sites.
>+//    * With a visit date older than browser.history_expire_days ago.
>+//    * With a visit date older than browser.history_expire_days_min ago
>+//      if we have more than browser.history_expire_sites unique urls.
> //
> //    aExpireThreshold is the time at which we will delete visits before.
>-//    If it is zero, we will not use a threshold and will match everything.
>+//    If it is zero, we will match everything.
> //
> //    aNumToExpire is the maximum number of visits to find. If it is 0, then
> //    we will get all matching visits.
> 
> nsresult
> nsNavHistoryExpire::FindVisits(PRTime aExpireThreshold, PRUint32 aNumToExpire,
>                                mozIStorageConnection* aConnection,
>                                nsTArray<nsNavHistoryExpireRecord>& aRecords)
> {
>-  // Select moz_places records, including whether visited and whether the URI
>-  // is bookmarked or not.
>-  nsCAutoString sqlBase;
>-  sqlBase.AssignLiteral(
>-    "SELECT v.id, v.place_id, v.visit_date, h.url, h.favicon_id, h.hidden, "
>-    "(SELECT fk FROM moz_bookmarks WHERE fk = h.id) "
>-    "FROM moz_places h JOIN moz_historyvisits v ON h.id = v.place_id ");
>-
>-  // 1. Expire records older than the max-age cap
>-  nsCAutoString sqlMaxAge;
>-  sqlMaxAge.Assign(sqlBase);
>-
>-  if (aNumToExpire) {
>-    // Select records older than the max-age cap
>-    sqlMaxAge.AppendLiteral("WHERE v.visit_date < ?1 "
>-                            "ORDER BY v.visit_date ASC LIMIT ?2");
>-  }
>-
>+  // Select a limited number of visits older than a time
>   nsCOMPtr<mozIStorageStatement> selectStatement;
>-  nsresult rv = aConnection->CreateStatement(sqlMaxAge, getter_AddRefs(selectStatement));
>-  NS_ENSURE_SUCCESS(rv, rv);
>-
>-  if (aNumToExpire) {
>-    rv = selectStatement->BindInt64Parameter(0, aExpireThreshold);
>-    NS_ENSURE_SUCCESS(rv, rv);
>-    rv = selectStatement->BindInt64Parameter(1, aNumToExpire);
>+  nsresult rv = aConnection->CreateStatement(NS_LITERAL_CSTRING(
>+      "SELECT v.id, v.place_id, v.visit_date, h.url, h.favicon_id, h.hidden, "
>+        "(SELECT fk FROM moz_bookmarks WHERE fk = h.id) "
>+      "FROM moz_places h JOIN moz_historyvisits v ON h.id = v.place_id "
>+      "WHERE v.visit_date < ?1 "
>+      "ORDER BY v.visit_date ASC LIMIT ?2"),
>+    getter_AddRefs(selectStatement));
>     NS_ENSURE_SUCCESS(rv, rv);
>-  }
>+
>+  // browser.history_expire_days || match all visits
>+  rv = selectStatement->BindInt64Parameter(0, aExpireThreshold ?
>+                                                aExpireThreshold :
>+                                                LL_MAXINT);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+  // use LIMIT -1 to not limit
>+  rv = selectStatement->BindInt64Parameter(1, aNumToExpire ?
>+                                                aNumToExpire : -1);
>+  NS_ENSURE_SUCCESS(rv, rv);

these inline conditionals are oddly indented and decrease readability. please put in  a temp variable.

>+  // If we have found less than aNumToExpire over-max-age records, and we are
>+  // over the unique urls cap, select records older than the min-age cap .
>+  if (aRecords.Length() < aNumToExpire) {
>+    // check the number of visited unique urls in the db.
>+    // NOTE: this is an hard guess on the number, the correct query should be:
>+    //
>+    //  SELECT count(*) FROM moz_places h WHERE EXISTS
>+    //    (SELECT v.id FROM moz_historyvisits v WHERE v.place_id = h.id)
>+    //
>+    // but that would be slower, so we are limiting on the full number of
>+    // sites in places, that includes also unvisited bookmarks.
>+    nsCOMPtr<mozIStorageStatement> countStatement;
>+    rv = aConnection->CreateStatement(NS_LITERAL_CSTRING(
>+        "SELECT count(*) FROM moz_places"),
>+      getter_AddRefs(countStatement));
>     NS_ENSURE_SUCCESS(rv, rv);

why not use visit_count > 0?

also, how does sqlite evaluate "*" for count()? how does that perform vs something specific and indexed, such as count(id)?

>+    rv = selectStatement->Reset();
>     NS_ENSURE_SUCCESS(rv, rv);
>-    rv = selectMinStatement->BindInt64Parameter(1, aNumToExpire);
>+
>+    // browser.history_expire_days_min
>+    rv = selectStatement->BindInt64Parameter(0,
>+                    PR_Now() - GetExpirationTimeAgo(mHistory->mExpireDaysMin));

again, cleaner to put this in a tmp var.

r=me, with these fixed
Attachment #309941 - Flags: review?(dietrich) → review+
Attached patch for check-inSplinter Review
> >+  // over the unique urls cap, select records older than the min-age cap .
> >+  if (aRecords.Length() < aNumToExpire) {
> >+    // check the number of visited unique urls in the db.
> >+    // NOTE: this is an hard guess on the number, the correct query should be:
> >+    //
> >+    //  SELECT count(*) FROM moz_places h WHERE EXISTS
> >+    //    (SELECT v.id FROM moz_historyvisits v WHERE v.place_id = h.id)
> >+    //
> >+    // but that would be slower, so we are limiting on the full number of
> >+    // sites in places, that includes also unvisited bookmarks.
> >+    nsCOMPtr<mozIStorageStatement> countStatement;
> >+    rv = aConnection->CreateStatement(NS_LITERAL_CSTRING(
> >+        "SELECT count(*) FROM moz_places"),
> >+      getter_AddRefs(countStatement));
> >     NS_ENSURE_SUCCESS(rv, rv);
> 
> why not use visit_count > 0?

it is slightly slower (2x, not a problem), and does not take in count 0,4,7 visits, imho we should have a stricter limit on the global number of places, but since we can trimmer the expire_sites value and embedded visits are expired by themselves we can take this change.
We will need to check-in the visit_count patch after this.

In future we should see how expiration is working for power users, 40000 visited pages could generate a huge historyvisits table, so we could need to trimmer it down a bit since it's a limit used only on visits older than min_days

> also, how does sqlite evaluate "*" for count()? how does that perform vs
> something specific and indexed, such as count(id)?

should be almost the same, but from tests count(*) is faster than count(id)
Attachment #309941 - Attachment is obsolete: true
mh forgot: all comments (but count(id)) applied.
Keywords: checkin-needed
Whiteboard: [swag: 0.5d]
Blocks: 409723
Depends on: 416313
Checking in toolkit/components/places/src/nsNavHistory.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.cpp,v  <--  nsNavHistory.cpp
new revision: 1.275; previous revision: 1.274
done
Checking in toolkit/components/places/src/nsNavHistoryExpire.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistoryExpire.cpp,v  <--  nsNavHistoryExpire.cpp
new revision: 1.43; previous revision: 1.42
done
Checking in toolkit/components/places/src/nsNavHistoryExpire.h;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistoryExpire.h,v  <--  nsNavHistoryExpire.h
new revision: 1.11; previous revision: 1.10
done
Checking in toolkit/components/places/tests/unit/test_expiration.js;
/cvsroot/mozilla/toolkit/components/places/tests/unit/test_expiration.js,v  <--  test_expiration.js
new revision: 1.17; previous revision: 1.16
done
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Keywords: checkin-needed
Hardware: PC → All
Resolution: --- → FIXED
Target Milestone: Firefox 3 → Firefox 3 beta5
Attached image screenshot
clicked link color is changed.

[setting]
http://img255.imageshack.us/img255/7905/setyj4.jpg

OK: 20080318_0945_firefox-3.0b5pre.en-US.win32.zip
NG: 20080318_1117_firefox-3.0b5pre.en-US.win32.zip

[Bonsai Query]
http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&date=explicit&mindate=1205858700&maxdate=1205864219

regression or intended ?
(In reply to comment #22)
> regression or intended ?

regression, i'm going filling that

Depends on: 423960
filled bug 423960, thank you VERY MUCH!
Flags: in-testsuite+
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008060306 Minefield/3.0pre

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9) Gecko/2008053008 Firefox/3.0

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1a1pre) Gecko/2008060302 Minefield/3.1a1pre

This doesn't seem to be a problem anymore. I tried latest build and RC2 on Windows and Mac, and the history is fine. It shows all history and leaves visited links colored different. Marked as verified.
Status: RESOLVED → VERIFIED
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
You need to log in before you can comment on or make changes to this bug.