Closed Bug 394038 Opened 17 years ago Closed 17 years ago

make url bar autocomplete frecency algorithm global

Categories

(Firefox :: Address Bar, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 3 beta3

People

(Reporter: moco, Assigned: dietrich)

References

(Blocks 1 open bug, )

Details

Attachments

(5 files, 53 obsolete files)

98.53 KB, image/jpeg
Details
1.36 KB, patch
rhelmer
: review+
Details | Diff | Splinter Review
4.55 KB, text/plain
Details
122.99 KB, patch
mconnor
: review+
Details | Diff | Splinter Review
2.47 KB, text/plain
Details
make url bar autocomplete frecency algorithm global

The current "frecency" algorithm (as checked in) is:  among each chunk of time (currently 4 days), sort by visit count.

mconnor writes:

Yeah, I think we need global frecency for sure, file a new bug.  I think we want to chunk searching but re-sort based on the frecency rating, this should not result in jumping around much, though some results might get pushed down (but never up).

http://wiki.mozilla.org/User:Mconnor/PlacesFrecency outlines what I've been thinking about wrt generating a frecency rating that we can calculate at visit time, and weight at sort time with relatively little effort/overhead.  We can discuss how bad it is in the bug though.
Assignee: nobody → sspitzer
Depends on: 394066
Flags: blocking-firefox3+
Target Milestone: --- → Firefox 3 M10
Priority: -- → P2
met with dietrich and mconnor today to discuss how we could implement this.

separate from the algorithm for calculating the frecency value, we'll need to add the value to the moz_places table, and fix the autocomplete code to chunk over the top <n> places at a time, as well as adding code to (on idle) update and age the frencency value, as well as how to handle first-run import from fx 2 -> fx 3

I'll write up a detailed summary of what we discussed today and link to it.
Status: NEW → ASSIGNED
note, this work is separate from the work Edward is doing in bug #395739.

the plan is to use his idea/code (in addition to global frecency), in order to improve the results.
this will also allow us to show unvisited bookmarks in url history, which is bug #393570
Blocks: 393570
Target Milestone: Firefox 3 M10 → Firefox 3 M11
(In reply to comment #0)
> though some results might get pushed down (but never up).
Some results might go up if we consider it a better match. E.g., an older-than-4-day item that has been visited millions of times. That doesn't get caught in the 4-day chunking.

Or more likely for my stuff.. typing more can cause an "exact match" that gets a rank boost, but because there's existing items in the list, it gets appended.
Priority: P2 → P1
note, fixing this will probably turn #392399 into a wontfix.
Status: ASSIGNED → NEW
Attached patch wip patch (obsolete) — Splinter Review
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #293016 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #293025 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #293028 - Attachment is obsolete: true
Attached patch updated patch (obsolete) — Splinter Review
The intent here is to implement option 3 from http://wiki.mozilla.org/User:Mconnor/PlacesFrecency
Attachment #293031 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #293062 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #293068 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #293071 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
when creating new bookmarks, we should generate a frecency to act like we visited that page when we created it.

I think the same should go for when we modify a bookmark, so I've added several xxx todo items.

note, newly created bookmarks are still hidden, so they are not going to show up in autocomplete yet.
Attachment #293074 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
address the hidden bookmark issue.

we have fought the hidden battle with bookmarks before, see bug #369887

with a new bookmark, we create a moz_place for it, with hidden = 1.  if you were to visit that bookmark, we would set hidden to 0, even if you later cleared all visits.

my point is, for bookmarks, we should not be setting hidden.  I've addressed this (at least partially) in the new version of this patch.

as for existing bookmarks, we will fix the hidden when we update frecencies on the idle timer.

my fix should take care of the fx 2 migration case, as well as generic import, and any new bookmarks created.

we do need to recalculate frecency on history import, still.

we might also decide to do a large amount of frecency recalculation for existing fx 3 users, not sure yet.
Attachment #293082 - Attachment is obsolete: true
> note, newly created bookmarks are still hidden, so they are not going to show
> up in autocomplete yet

that is now fixed, as is bug #393570:  "unvisited bookmarks should show up in autocomplete result."
Status: NEW → ASSIGNED
Attached patch updated wip patch (obsolete) — Splinter Review
adding some xxx todo for expiration
Attachment #293086 - Attachment is obsolete: true
No longer depends on: 409327
Should there be a continuous function that maps an infinite domain to a particular range?

For example, recent results (smaller number of hours ago) should have a higher weight than older results: weight = 1 / hours

Or frequently visited pages (large number of visits) get more weight than infrequently visited: weight = 1 - 1 / visits


And just making sure, "frecency" is essentially going to be a statically computed rank so that we don't spend time recalculating on each query?
a potential problem with the current approach:

1) say I type in a page several times in one day, like ebay.com, because I'm trying to find something.
2) on each visit, we recalc frecency, and with several times in one day (and typed) we are going to generate a high frecency value.  (count matters here, too, so assume I do it enough to hike up the count.)
3) after my last visit, I buy what I'm looking for, and then I'm done with ebay.
4) ebay will continue to show up high in my autocomplete results (long after I stop going there) because the last time I visited it we generated a high frecency value.
 
a potential solution:

We already recalc frecencies values on idle.  From the patch:

+  // XXX todo, probably can do more than 10 at a time
+  // recalculate the non-hidden items that are typed and have a high visit count
+  // as those will probably have the highest frecency
+  rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
+    "SELECT id, url, visit_count, hidden, typed "
+    "FROM moz_places WHERE frecency = -1 AND hidden <> 1 ORDER BY typed DESC, visit_count DESC LIMIT ") +
+     nsPrintfCString("%d", COUNT_TO_RECALCULATE_FRECENCY_ON_IDLE),
+    getter_AddRefs(mDBInvalidFrecencies));
+  NS_ENSURE_SUCCESS(rv, rv);

But this just attempts to calculate frecencies where we don't have them (from migration and clearing private data, etc.)

But to solve the ebay problem, we should also recalc the moz_places with high frecency and "old" last visit date.  (In another patch, marco has added last visit date to moz_places, which will make the query we need more efficient as we don't have to join against moz_historyvisits)

If something has a recent last visit date, then we recently recalculated frecency, so the frecency would be accurate.  

If something has a low frecency and an old last visit date, recalculating would only lower the frecency (and low frecency wouldn't be hurting our autocomplete results.)

But if something has a high frecency and an old last visit date, recalculating the frecency would lower the frecency, which would solve the ebay problem.
Attached patch updated wip patch (obsolete) — Splinter Review
better handling of bookmarks that are livemark items (which, unlike other bookmarks) should not show up in autocomplete unless visited or independently bookmarked.
Attachment #293087 - Attachment is obsolete: true
Just to make sure I've understood how things are going to be.. the 'global frecency' is to make a first level cut based on the number and time of visits. The top #X places picked out here are then sorted by some ranking algorithm such as bug 410133. Then with user feedback, the results can be re-prioritized with bug 395739.
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #294715 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #294860 - Attachment is obsolete: true
No longer depends on: 392399
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #294908 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #294937 - Attachment is obsolete: true
Attached patch update wip patch (obsolete) — Splinter Review
exclude anything with frecency = 0 from autocomplete

after invalidating frecencies, fix livemarkitems and place queries to prevent them from showing up.

as for the rest of the places with invalid frecency (-1), we get to them on idle (or manually, see todo items)
Attachment #295165 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #295193 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #295469 - Attachment is obsolete: true
Attached patch updated patch (obsolete) — Splinter Review
Attachment #295474 - Attachment is obsolete: true
Attached patch updated wip to trunk (obsolete) — Splinter Review
Attachment #295715 - Attachment is obsolete: true
Seth (and others), is "frecency", in the summary of this bug and everywhere in the comments, a codename or is it just a typo for "frequency"? Its constant use makes me believe there is something I don't quite grasp.
Tony, see http://quotes.burntelectrons.org/2831

As far as I know, credit for the term goes to Patrick Hunt (from work on his delicious extension for yahoo!)
frececny = frequency + recent
Whiteboard: [work ongoing, wip patch]
Attached patch updated wip patch (obsolete) — Splinter Review
fix a problem where duplicate places would appear in autocomplete results when clicking on the history drop down.

add comments about an existing bug (due to how we remove duplicates)
Attachment #295737 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
Attachment #295850 - Attachment is obsolete: true
Attached patch updated wip patch (obsolete) — Splinter Review
this patch is not ready for checkin, and there are several decisions, spin off bugs, and XXX todo comments to address, but it is ready for a first review so that dietrich can come up to speed on this beast.
Attachment #295875 - Attachment is obsolete: true
Attachment #295881 - Flags: review?(dietrich)
Comment on attachment 295881 [details] [diff] [review]
updated wip patch

>+// bonus for visit transition types for frecency calculations
>+pref("browser.frecency.embedVisitBonus", 0);
>+pref("browser.frecency.linkVisitBonus", 20);
>+pref("browser.frecency.typedVisitBonus", 100);
>+pref("browser.frecency.bookmarkVisitBonus", 40);
>+pref("browser.frecency.downloadVisitBonus", 0);
>+pref("browser.frecency.permRedirectVisitBonus", 0);
>+pref("browser.frecency.tempRedirectVisitBonus", 0);
>+pref("browser.frecency.defaultVisitBonus", 0);
>+
>+// bonus for place types for frecency calculations
>+pref("browser.frecency.unvisitedBookmarkBonus", 40);
>+pref("browser.frecency.unvisitedTypedBonus", 100);
>+

these prefs should not be on the browser prefbranch. should use "places" or "toolkit".

>@@ -907,16 +906,54 @@ nsNavBookmarks::InsertBookmark(PRInt64 a
>   NS_ENSURE_SUCCESS(rv, rv);
>
>   // get row id of the new bookmark
>   PRInt64 rowId;
>   rv = dbConn->GetLastInsertRowID(&rowId);
>   NS_ENSURE_SUCCESS(rv, rv);
>   *aNewBookmarkId = rowId;
>
>+  // XXX
>+  // on import / fx 2 migration, is the frecency work going to slow us down?
>+  // find out if we import history before or after bookmarks for fx 2

bookmark migration happens after history

>+  // we might want to skip this stuff, as well as the frecency work
>+  // caused by GetUrlIdFor() which calls InternalAddNewPage()
>+  // if we do skip this, after import, we will
>+  // need to call FixInvalidFrecenciesForExcludedPlaces()
>+  // we might need to call it anyways, if items aren't properly annotated
>+  // as livemarks feeds yet.

if we don't have to do during the insert path, we shouldn't. it'll slow down import, migration, sync, etc, any batch operation really.

hm, but that's a bit premature - we should try the synchronous calculation, and do a test to see how it affects transaction speed (do an import or something like that). if it turns out to be very slow, we can move to larger idle batches instead.

>+
>+  nsCAutoString url;
>+  rv = aItem->GetSpec(url);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  // prevent place: queries from showing up in the URL bar autocomplete results
>+  PRBool isBookmark = !IsQueryURI(url);
>+
>+  if (isBookmark) {
>+    // if it is a livemark item (the parent is a livemark),
>+    // we pass in false for isBookmark.  otherwise, unvisited livemark
>+    // items will appear in URL autocomplete before we visit them.
>+    PRBool parentIsLivemark;
>+    nsCOMPtr<nsILivemarkService> lms =
>+      do_GetService(NS_LIVEMARKSERVICE_CONTRACTID, &rv);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+
>+    rv = lms->IsLivemark(aFolder, &parentIsLivemark);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+
>+    isBookmark = !parentIsLivemark;

hrm, considering that this will happen for every single bookmark, i think the perf effect of this outweighs the small amount of code cleanliness this provides. i'd recommend making the livemark anno name a #define here in the bookmark service and checking via the annotation service directly.

>@@ -658,20 +693,16 @@ nsNavHistory::InitDB(PRBool *aDoImport)
>         NS_ENSURE_SUCCESS(rv, rv);
>       }
>     }
>
>     // update schema version in the db
>     rv = UpdateSchemaVersion();
>     NS_ENSURE_SUCCESS(rv, rv);
>   }
>-  else {
>-    rv = EnsureCurrentSchema(mDBConn);
>-    NS_ENSURE_SUCCESS(rv, rv);
>-  }

didn't this change already land?

>@@ -917,16 +956,84 @@ nsNavHistory::InitStatements()
>   // mFoldersWithAnnotationQuery
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>     "SELECT annos.item_id, annos.content FROM moz_anno_attributes attrs "
>     "JOIN moz_items_annos annos ON attrs.id = annos.anno_attribute_id "
>     "WHERE attrs.name = ?1"),
>     getter_AddRefs(mFoldersWithAnnotationQuery));
>   NS_ENSURE_SUCCESS(rv, rv);
>
>+  // mDBVisitsForFrecency
>+  // visit_type <> 0 == undefined (see bug #375777 for details)
>+  rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>+    "SELECT visit_date, visit_type FROM moz_historyvisits "
>+    "WHERE place_id = ?1 AND visit_type <> 0 and visit_type <> ") +

note that in bug 392399 it was found that "NOT IN (0,4)" was sometimes faster than this. would it be worth doing a quick test to see if it makes a difference here?

>+     nsPrintfCString("%d", nsINavHistoryService::TRANSITION_EMBED) +
>+     NS_LITERAL_CSTRING(" ORDER BY visit_date DESC LIMIT ") +
>+     nsPrintfCString("%d", mNumVisitsForFrecency),
>+    getter_AddRefs(mDBVisitsForFrecency));
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  // find places with invalid frecencies (frecency = -1)
>+  // invalid frecencies can happen in these scenarios:
>+  // 1) we've done "clear private data"
>+  // 2) we've expired or deleted visits
>+  // 3) we've migrated from an older version, before global frecency
>+  //
>+  // from older versions, unmigrated bookmarks might be hidden,
>+  // so we can't exclude hidden places (by doing "WHERE hidden <> 1")
>+  // from our query, as we want to calculate the freceny for those

s/freceny/frecency/

>-    rv = pageindexTransaction.Commit();
>+  // for existing profiles, we may not have a frecency column
>+  nsCOMPtr<mozIStorageStatement> statement;
>+  rv = mDBConn->CreateStatement(
>+    NS_LITERAL_CSTRING("SELECT frecency from moz_places"),
>+    getter_AddRefs(statement));
>+
>+  if (NS_FAILED(rv)) {
>+    // add frecency column to moz_places, default to -1
>+    // so that all the frecencies are invalid and we'll
>+    // recalculate them on idle.
>+    rv = aDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>+      "ALTER TABLE moz_places ADD frecency INTEGER DEFAULT -1 NOT NULL"));
>+    NS_ENSURE_SUCCESS(rv, rv);
>+
>+    // for place: items and unvisited livemark items, we need to set
>+    // the frecency to 0 so that they don't show up in url bar autocomplete
>+    rv = FixInvalidFrecenciesForExcludedPlaces();
>+    NS_ENSURE_SUCCESS(rv, rv);
>+
>+    // XXX todo
>+    // forcibly call the "on idle" timer here to do a little work
>+    // but the rest will happen on idle.
>+
>+    // create index for the frecency column
>+    // XXX create this index before or after updating?

iirc we found that it was faster to create the index first. however, i don't remember which bug that contains those comments.

reference in the source here:

http://mxr.mozilla.org/seamonkey/source/toolkit/components/places/src/nsNavHistory.cpp#1119

note to self: create that SQLite best-practices wiki page.

>@@ -1787,16 +1984,68 @@ nsNavHistory::GetHasHistoryEntries(PRBoo
>   nsCOMPtr<mozIStorageStatement> dbSelectStatement;
>   nsresult rv = mDBConn->CreateStatement(
>       NS_LITERAL_CSTRING("SELECT id FROM moz_historyvisits LIMIT 1"),
>       getter_AddRefs(dbSelectStatement));
>   NS_ENSURE_SUCCESS(rv, rv);
>   return dbSelectStatement->ExecuteStep(aHasEntries);
> }
>
>+nsresult
>+nsNavHistory::FixInvalidFrecenciesForExcludedPlaces()
>+{
>+  // for every moz_place that has an invalid frecency (-1) and
>+  // begins with "place:" or is an unvisited child of a livemark feed,
>+  // set frecency to 0 so that it is excluded from url bar autocomplete.
>+  nsCOMPtr<mozIStorageStatement> dbUpdateStatement;
>+  nsresult rv = mDBConn->CreateStatement(
>+    NS_LITERAL_CSTRING("UPDATE moz_places SET frecency = 0 WHERE id IN (SELECT h.id FROM moz_places h JOIN moz_bookmarks b ON h.id = b.fk WHERE frecency = -1 AND (b.parent IN (SELECT annos.item_id FROM moz_anno_attributes attrs JOIN moz_items_annos annos ON attrs.id = annos.anno_attribute_id WHERE attrs.name = ?1) AND (SELECT visit_date FROM moz_historyvisits WHERE place_id = h.id AND visit_type NOT IN (0,4) LIMIT 1) is null) OR SUBSTR(h.url,0,6) = 'place:')"),

nit: line length

we should confirm that the proper indexes are being taken advantage of here.

>+    getter_AddRefs(dbUpdateStatement));
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  rv = dbUpdateStatement->BindUTF8StringParameter(0, NS_LITERAL_CSTRING(LMANNO_FEEDURI));
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  rv = dbUpdateStatement->Execute();
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  return NS_OK;
>+}
>+
>+nsresult
>+nsNavHistory::CalculateVisitCount(PRInt64 aPlaceId, PRInt32 *aVisitCount)
>+{
>+  // should we add "AND visit_type NOT IN (0,4)"
>+  // XXX aren't embed's affecting visit count already in ::AddVisit()?
>+  // I think we have a bug about not letting them do that.
>+  // XXX note, should we fix AddVisit() to recalc visit count by using
>+  // CalculateVisitCount()?

all 3 of these comments sound right

to here...
thanks for the first pass, dietrich.  I'm working on these changes and one more (see bug #411452), but while testing, I ran into a few things:

>+  // mDBVisitsForFrecency
>+  // visit_type <> 0 == undefined (see bug #375777 for details)
>+  rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>+    "SELECT visit_date, visit_type FROM moz_historyvisits "
>+    "WHERE place_id = ?1 AND visit_type <> 0 and visit_type <> ") +

>note that in bug 392399 it was found that "NOT IN (0,4)" was sometimes faster
>than this. would it be worth doing a quick test to see if it makes a difference
>here?

in general, yes, we should be doing NOT IN (0,4).

but in this case, we don't want to limit by visit_type at all, as I found out while testing.

from a patch I'm about to upload:

  // NOTE: we are not limiting to visits with "visit_type NOT IN (0,4)"
  // because if we do that, mDBVisitsForFrecency would return no visits
  // for places with only embed (or undefined) visits.  That would
  // cause use to estimate a frecency based on what information we do have,
  // see CalculateFrecencyInternal().  that would result in a   // non-zero frecency for a place with only
  // embedded visits, instead of a frececny of 0.


>+nsresult
>+nsNavHistory::CalculateVisitCount(PRInt64 aPlaceId, PRInt32 *aVisitCount)
>+{
>+  // should we add "AND visit_type NOT IN (0,4)"
>+  // XXX aren't embed's affecting visit count already in ::AddVisit()?
>+  // I think we have a bug about not letting them do that.
>+  // XXX note, should we fix AddVisit() to recalc visit count by using
>+  // CalculateVisitCount()?

>all 3 of these comments sound right

When we call CalculateVisitCount() and pass the result to CalculateFrecency(), we should not add that clause, for the same reason above.  But, for the visit_count field we do want to exclude those visit_types.

So I think I need to tweak the code to never use the visit_count field when calculating frecency, but also (when updating visit_count), call CalculateVisitCount())

I'm adding a boolean param to CalculateVisitCount() so that we can use it from both places.

Additionally, in AddVisit(), we should not blindly add 1 to the visit count.  we should only do that if aTransitionType is not TRANSITION_EMBED.

new patch coming...
more partial responses to dietrich's review:

1)

>-  else {
>-    rv = EnsureCurrentSchema(mDBConn);
>-    NS_ENSURE_SUCCESS(rv, rv);
>-  }

> didn't this change already land?

no, it is a supplimental patch part of https://bugzilla.mozilla.org/show_bug.cgi?id=392399, awaiting your review.

2)

>+  // from our query, as we want to calculate the freceny for those

s/freceny/frecency/

fixed, thanks.
another problem I came across while testing:

say you visit http://bar.com and the title is foox, but you bookmark it with the title foo.

type "foo", and you'll get:

foo
http://bar.com

keep typing to get "foox" and you should get:

foox
http://bar.com

but you will get nothing.

this is because of an optimization in the code to re-use previous search result if the old search term (foo) is a subset of the new search term (foox).

when re-using the previous search result, the title is "foo", so we won't match.

One solution it do not do this re-using optimization, but that would be a perf hit.

Another solution is to include both results:

foo
http://bar.com

foox
http://bar.com

(currently, we use the url to remove duplicates.)

But this has problems because is the worst case, you'd see n+1 result for the same result (if there were n bookmarks), and the + 1 comes from the title.

i might spin this issue off to a separate bug.
Attachment #295881 - Attachment is obsolete: true
Attachment #295881 - Flags: review?(dietrich)
> One solution it do not do this re-using optimization, but that would be a perf hit.

just following up, this solution does work, but I have not tried this out with
the dreaded ispiked profile.

edward, I would think that you would have run into a similar issue in your
autocomplete patches, specifically the one about preferring word boundaries. 
(see bug #410133 and bug #410136)

when re-using the previous search result, we would not be recalculating rank,
so the order of the results would be the order from the first search (that we
are reusing), right?

the "cheap" fix is to do:

  // Search through the previous result
-  if (searchPrevious) {
+ if (0 && searchPrevious) {

in nsNavHistoryAutoComplete.cpp
(In reply to comment #50)
> edward, I would think that you would have run into a similar issue in your
> autocomplete patches
Right. There's some quirks with the ranking because we put "more recent but poorer ranked" results over better results from later chunks. That can be refreshed by clearing the list and pushing down to have the list repopulated.

So either we remove the optimization or allow for insertion instead of just appending items.
Attached patch updated patch (obsolete) — Splinter Review
Attachment #296105 - Attachment is obsolete: true
Attached patch updated patch (obsolete) — Splinter Review
keep sites that only have embedded visits out of url bar autocomplete and other changes.

also, address these items:

>+  // XXX aren't embed's affecting visit count already in ::AddVisit()?
>+  // I think we have a bug about not letting them do that.
>+  // XXX note, should we fix AddVisit() to recalc visit count by using
>+  // CalculateVisitCount()?
Attachment #296211 - Attachment is obsolete: true
Attached patch updated patch (obsolete) — Splinter Review
Attachment #296236 - Attachment is obsolete: true
Attached patch updated patch (obsolete) — Splinter Review
Attachment #296247 - Attachment is obsolete: true
dietrich, since you last review, here are the diffs:

https://bugzilla.mozilla.org/attachment.cgi?oldid=295881&action=interdiff&newid=296271&headers=1

looking that interdiff over, here's a list of what has changed.

I've been using this against the dreaded ispiked profile (and a new profile) to help catch bugs and problems.

1)

bonuses in default prefs are percent, to avoid adding 100 in the code, but more importantly, to avoid the problem of things with bonus 0 getting a non zero frecency!

I caught this when transition embeds only places (from espn.com ad's) were in autocomplete, doh!

2)

removed unused local variable.

3)

when getting visits for frecency, we can't exclude those of type 0,4 (see comment in the patch)

also found this when visiting espn.

4)

two new queries, one for getting visit count that excludes embeds (that we use for updating visit_count, which we can't trust in older versions of fx 3) and one that gets them all (for frecency)

5)

create index for frecency before we make changes.

6)

oops, I'll remove that NS_BREAK()

7)

fix code that drops user_title to handle the frecency column

8)

only increment visit_count on a visit (and the visit count we pass when calculating frecency) if the transition is not embed.

9)

optimization when calculating frecency, if the bonus is zero, we don't need to figure out the weight

9)

fix a problem in the calculating frecency code where if we calculated zero (which can happen), we should use that value.

10)

oops, order the url bar autcomplete by frecency DESC (not ASC)

11)

as discussed in another bug, favor bookmark title over page title.
Attached patch updated to trunk (obsolete) — Splinter Review
Attachment #296271 - Attachment is obsolete: true
Just to clarify about reusing search, chunking and inserting.. Reusing search is problematic because old results can contain pages from "old chunk #3" which shouldn't show up before "new chunk #1".

However, we shouldn't need to insert "better ranked" results from "chunk #2" into "chunk #1" results because we can use the frecency chunking to figure out the high order bit.
>@@ -1001,16 +1038,27 @@ nsNavBookmarks::RemoveItem(PRInt64 aItem
>   NS_ENSURE_SUCCESS(rv, rv);
>
>   rv = transaction.Commit();
>   NS_ENSURE_SUCCESS(rv, rv);
>
>   rv = UpdateBookmarkHashOnRemove(placeId);
>   NS_ENSURE_SUCCESS(rv, rv);
>
>+  // XXX is this too expensive when updating livemarks?
>+  // UpdateBookmarkHashOnRemove() does a sanity check using
>+  // IsBookmarkedInDatabase(),  so it might not have actually
>+  // removed the bookmark.  should we have a boolean out param
>+  // for if we actually removed it, and use that to decide if we call
>+  // UpdateFrecency() and the rest of this code?

can be spun off

>+  // mDBVisitsForFrecency
>+  // NOTE: we are not limiting to visits with "visit_type NOT IN (0,4)"
>+  // because if we do that, mDBVisitsForFrecency would return no visits
>+  // for places with only embed (or undefined) visits.  That would
>+  // cause use to estimate a frecency based on what information we do have,
>+  // see CalculateFrecencyInternal().  that would result in a
>+  // non-zero frecency for a place with only
>+  // embedded visits, instead of a frececny of 0.

nit: s/frececny/frecency

>@@ -1040,33 +1165,58 @@ nsNavHistory::EnsureCurrentSchema(mozISt
>   // to speed up finding last visit date when joinin with moz_places.
>   // See bug 392399 for more details.
>   PRBool oldIndexExists = PR_FALSE;
>   nsresult rv = aDBConn->IndexExists(
>     NS_LITERAL_CSTRING("moz_historyvisits_pageindex"), &oldIndexExists);
>   NS_ENSURE_SUCCESS(rv, rv);
>
>   if (oldIndexExists) {
>-    // wrap in a transaction for safety and performance
>-    mozStorageTransaction pageindexTransaction(aDBConn, PR_FALSE);
>-

instead of removing the transaction, maybe wrap each migration scenario in it's own transaction?

>+nsresult
>+nsNavHistory::FixInvalidFrecenciesForExcludedPlaces()
>+{
>+  // for every moz_place that has an invalid frecency (-1) and
>+  // begins with "place:" or is an unvisited child of a livemark feed,
>+  // set frecency to 0 so that it is excluded from url bar autocomplete.
>+  nsCOMPtr<mozIStorageStatement> dbUpdateStatement;
>+  nsresult rv = mDBConn->CreateStatement(
>+    NS_LITERAL_CSTRING("UPDATE moz_places SET frecency = 0 WHERE "
>+      "id IN (SELECT h.id FROM moz_places h JOIN moz_bookmarks b ON "
>+      "h.id = b.fk WHERE frecency = -1 AND "
>+      "(b.parent IN (SELECT annos.item_id FROM moz_anno_attributes attrs "
>+      "JOIN moz_items_annos annos ON attrs.id = annos.anno_attribute_id "
>+      "WHERE attrs.name = ?1) AND (SELECT visit_date FROM moz_historyvisits "
>+      "WHERE place_id = h.id AND visit_type NOT IN (0,4) LIMIT 1) is null) "
>+      "OR SUBSTR(h.url,0,6) = 'place:')"),

nit: can you break this up by subselect for better readability?

> // nsNavHistory::MarkPageAsFollowedBookmark
> //
> // We call MarkPageAsFollowedBookmark() before visiting a URL in order to
> // help determine the transition type of the visit.
> // We keep track of the URL so that later, in AddVisitChain()
> // we can use TRANSITION_BOOKMARK as the transition.
> // Note, AddVisitChain() is not called immediately when we are doing LAZY_ADDs
>@@ -1891,16 +2159,18 @@ nsNavHistory::SetPageDetails(nsIURI* aUR
>           "hidden = ?5, "
>           "typed = ?6 "
>        "WHERE id = ?1"),
>     getter_AddRefs(statement));
>   NS_ENSURE_SUCCESS(rv, rv);
>   rv = statement->BindInt64Parameter(0, pageID);
>   NS_ENSURE_SUCCESS(rv, rv);
>
>+  // XXX should we be calculating / setting frecency here?
>+

hrm, doesn't really have any callers, maybe this should be removed altogether, if the equivalent functionality is achievable via other apis.

http://mxr.mozilla.org/seamonkey/search?string=setpagedetails

>+
>+    // only increment the visit_count if the transition was not EMBED
>+    // XXX what about TRANSITION_DOWNLOAD?
>+    if (aTransitionType != TRANSITION_EMBED)
>+      trueVisitCount++;

hrm, a download is fairly different from a visit

>+
>+    // only increment the visit_count if the transition was not EMBED
>+    // XXX what about TRANSITION_DOWNLOAD?
>+    if (aTransitionType != TRANSITION_EMBED)
>+      visitCountForFrecency++;

ditto

>+
>+    // placeId could have a livemark item, so setting the frecency to -1
>+    // would cause it to show up in the url bar autocomplete
>+    // call FixInvalidFrecenciesForExcludedPlaces() to handle that scenario
>+    rv = FixInvalidFrecenciesForExcludedPlaces();
>+    NS_ENSURE_SUCCESS(rv, rv);

i'm worried this will really slow down bulk removes (eg: removing a selection from history sidebar)

>@@ -2949,16 +3276,39 @@ nsNavHistory::RemovePagesFromHost(const
>   // - do not have EXPIRE_NEVER annotations
>   rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>     "DELETE FROM moz_places WHERE id IN (") + deletedPlaceIds +
>       NS_LITERAL_CSTRING(") AND id NOT IN (") + deletedPlaceIdsBookmarked +
>       NS_LITERAL_CSTRING(") AND id NOT IN (") + deletedPlaceIdsWithAnno +
>       NS_LITERAL_CSTRING(")"));
>   NS_ENSURE_SUCCESS(rv, rv);
>
>+  // reset the frecency the places we did not delete.  Note, we don't

nit: s/the places/for the places/

>@@ -5240,16 +5598,18 @@ nsNavHistory::RemoveDuplicateURIs()
>     rv = countStatement->BindInt64Parameter(0, visit_count);
>     NS_ENSURE_SUCCESS(rv, rv);
>     rv = countStatement->BindInt64Parameter(1, id);
>     NS_ENSURE_SUCCESS(rv, rv);
>     rv = countStatement->Execute();
>     NS_ENSURE_SUCCESS(rv, rv);
>   }
>
>+  // XXX when removing duplicate ids, should we be resetting recency to -1?
>+

this is only called at history import, so i don't think so.

>+nsresult
>+nsNavHistory::UpdateFrecency(PRInt64 aPlaceId, PRBool aIsBookmarked)
>+{
>+  mozStorageStatementScoper statsScoper(mDBGetPlaceVisitStats);
>+  nsresult rv = mDBGetPlaceVisitStats->BindInt64Parameter(0, aPlaceId);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  PRBool hasResults = PR_FALSE;
>+  rv = mDBGetPlaceVisitStats->ExecuteStep(&hasResults);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  if (!hasResults) {
>+    NS_WARNING("attempting to update frecency for a bogus place");
>+    // before I added the check for itemType == TYPE_BOOKMARK
>+    // I hit this with aPlaceId of 0 (on import)
>+    return NS_OK;
>+  }

to reduce db traffic and not regress perf of history import maybe we should check the place id value before calling UpdateFrecency.

Assignee: sspitzer → dietrich
Status: ASSIGNED → NEW
Blocks: 412132
>+
>+  PRInt32 typed = 0;
>+  rv = mDBGetPlaceVisitStats->GetInt32(0, &typed);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  PRInt32 hidden = 0;
>+  rv = mDBGetPlaceVisitStats->GetInt32(1, &hidden);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  PRInt32 oldFrecency = 0;
>+  rv = mDBGetPlaceVisitStats->GetInt32(2, &oldFrecency);
>+  NS_ENSURE_SUCCESS(rv, rv);

all three of these cols have default values. is there a reason you're initializing them to 0?

>+  // if we calculated a non-zero frecency we should unhide this place
>+  // so that previously hidden (non-livebookmark item) bookmarks 
>+  // will now appear in autocomplete
>+  // if we calculated a zero frecency, we re-use the old hidden value.
>+  rv = mDBUpdateFrecencyAndHidden->BindInt32Parameter(2, 
>+         newFrecency ? 0 /* not hidden */ : hidden);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  rv = mDBUpdateFrecencyAndHidden->Execute();
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  return NS_OK;
>+}
>+
>+nsresult
>+nsNavHistory::CalculateFrecencyInternal(PRInt32 aTyped, PRInt32 aVisitCount, PRInt64 aPlaceId, PRBool aIsBookmarked, PRInt32 *aFrecency)

nit: aPlaceId should be first input param (all the other in-params are characteristics of it)

>+ 
>+    // estimate frecency by the last few visits
>+    // use NS_ceilf() so that we don't round down to 0, which
>+    // would cause us to completely ignore the place during autocomplete
>+    if (numSampledVisits) {
>+      *aFrecency = (PRInt32) NS_ceilf(aVisitCount * pointsForSampledVisits / numSampledVisits);
>+      return NS_OK;
>+    }
>+  }
>+ 
>+  // XXX the code below works well for guessing the frecency on import, and we'll correct later once we have
>+  // visits.
>+  // what if we don't have visits and we never visit?  we could end up with a really high value
>+  // that keeps coming up in ac results?  only do this on import?  something to figure out.
>+  PRInt32 bonus = 0;
>+
>+  // not the same logic above, as a single visit could not both

s/not/not be/

>+  // for a unvisited bookmark, produce a non-zero frecency
>+  // so that unvisited bookmarks show up in URL bar autocomplete
>+  if (!aVisitCount && aIsBookmarked)
>+    aVisitCount = 1;
>+
>+  // use NS_ceilf() so that we don't round down to 0, which
>+  // would cause us to completely ignore the place during autocomplete
>+  *aFrecency = (PRInt32) NS_ceilf(aVisitCount * pointsForSampledVisits);
>+  return NS_OK;
>+}
>+
>+nsresult 
>+nsNavHistory::CalculateFrecency(PRInt32 aTyped, PRInt32 aVisitCount, PRInt64 aPlaceId, nsCAutoString &aURL, PRInt32 *aFrecency)
>+{

nit: aPlaceId and aURL should be first params

>+  *aFrecency = 0;
>+
>+  nsresult rv;
>+
>+  nsCOMPtr<nsILivemarkService> lms = 
>+    do_GetService(NS_LIVEMARKSERVICE_CONTRACTID, &rv);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  PRBool isBookmark = PR_FALSE;
>+
>+  // determine if the place a (non-livemark item) bookmark and prevent

nit: s/a/is a/

>+    // this query can return multiple parent folders because
>+    // it is possible for the user to bookmark something that
>+    // is also a livemark item
>+    PRBool hasMore = PR_FALSE;
>+    while (NS_SUCCEEDED(mDBGetBookmarkParentsForPlace->ExecuteStep(&hasMore)) 
>+           && hasMore) {
>+      PRInt64 folderId;
>+      rv = mDBGetBookmarkParentsForPlace->GetInt64(0, &folderId);
>+      NS_ENSURE_SUCCESS(rv, rv);
>+
>+      PRBool parentIsLivemark;
>+      rv = lms->IsLivemark(folderId, &parentIsLivemark);
>+      NS_ENSURE_SUCCESS(rv, rv);

not stoked to expose implementation details, but we should skip xpconnect traversal and instead  go direct to annosvc here

> /**
>  * Autocomplete algorithm:
>  *
>- * The current algorithm searches history from now backwards to the oldest
>- * visit in chunks of time (AUTOCOMPLETE_SEARCH_CHUNK).  We currently
>- * do SQL LIKE searches of the search term in the url and title
>- * within in each chunk of time.  the results are ordered by visit_count 
>- * (and then visit_date), giving us poor man's "frecency".
>+ * Searches moz_places by frecency (in descending order)
>+ * in chunks (AUTOCOMPLETE_SEARCH_CHUNK_SIZE).  We currently
>+ * do SQL LIKE searches of the search term in the place title, place url
>+ * and bookmark titles (since a "place" can have multiple bookmarks)
>+ * within in each chunk.  The results are ordered by frecency.
>+ * Note, we exclude places with no frecency (0) because that indicates that...

fill-in

>+ * but places with frecency (-1) are included, as that means that...

fill-in
Status: NEW → ASSIGNED
Blocks: 412146
dietrich is taking over this beast, so I'm going to review his review and then review the changes he ends up making.

part 1:

> these prefs should not be on the browser prefbranch. should use "places" or
> "toolkit".

1)

No objection from me.

(on a related note, I see a comment in firefox.js that we should one day make a toolkit.js)

2)

>+  // XXX
>+  // on import / fx 2 migration, is the frecency work going to slow us down?
>+  // find out if we import history before or after bookmarks for fx 2

> bookmark migration happens after history

If so, that's very cool for us, because it means that when we call InsertBookmark() from nsPlacesImportExportService, we would be able to determine the proper frecency by the last view visits.

Will this slow us down?  It depends on how many bookmarks you have.  Some thoughts:

Calculating frecency when you do InsertBookmark() in the single case (like you star a page you've visited) is not an issue.

But calling InsertBookmark() (and then calculating frecency) from nsPlacesImportExportService many times on first run (from fx 2 -> fx 3) might really slow us down.

If it turns out to slow us down too much, we might have to figure out a way to not calculate frecency on bm import.

Note, on history import, we don't calculate frecency.  See nsNavHistory::AddPageWithVisit()

Not calculating frecency is acceptable, as long as we at least call FixInvalidFrecenciesForExcludedPlaces() after we import the bookmarks.  (but maybe not, if children of livemark items were not written to bookmarks.html in firefox 2!)

3)


>+  // we might want to skip this stuff, as well as the frecency work
>+  // caused by GetUrlIdFor() which calls InternalAddNewPage()
>+  // if we do skip this, after import, we will
>+  // need to call FixInvalidFrecenciesForExcludedPlaces()
>+  // we might need to call it anyways, if items aren't properly annotated
>+  // as livemarks feeds yet.

>if we don't have to do during the insert path, we shouldn't. it'll slow down
>import, migration, sync, etc, any batch operation really.

>hm, but that's a bit premature - we should try the synchronous calculation, and
>do a test to see how it affects transaction speed (do an import or something
>like that). if it turns out to be very slow, we can move to larger idle batches
>instead.

agreed.  see #2 above.  I think it really matters on how many bookmarks we have.

Keep in mind that after we migration fx 2 history we will end up with frecencies of all -1, and then the InsertBookmark() calls will recalcuate frecency for the bookmarks, but that is not ideal.

After history import, regardless of what we do about InsertBookmark() perf problems on import, we should call RecalculateFrecencies() [which is what we call on idle]

Calling that will re-calculate the frecencies for the top few "best" places.  We might even consider calling it a couple times

Search for "forcibly call the "on idle" timer here to do a little work" to see other areas where I think this might be necessary.


4)


>+  if (isBookmark) {
>+    // if it is a livemark item (the parent is a livemark),
>+    // we pass in false for isBookmark.  otherwise, unvisited livemark
>+    // items will appear in URL autocomplete before we visit them.
>+    PRBool parentIsLivemark;
>+    nsCOMPtr<nsILivemarkService> lms =
>+      do_GetService(NS_LIVEMARKSERVICE_CONTRACTID, &rv);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+
>+    rv = lms->IsLivemark(aFolder, &parentIsLivemark);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+
>+    isBookmark = !parentIsLivemark;

> hrm, considering that this will happen for every single bookmark, i think the
> perf effect of this outweighs the small amount of code cleanliness this
> provides. i'd recommend making the livemark anno name a #define here in the
> bookmark service and checking via the annotation service directly.

I have no objection to that, but I would recommend confirming it is a perf issue first.

5)

>   }
>-  else {
>-    rv = EnsureCurrentSchema(mDBConn);
>-    NS_ENSURE_SUCCESS(rv, rv);
>-  }

> didn't this change already land?

already addressed this.

6)

>+  // mDBVisitsForFrecency
>+  // visit_type <> 0 == undefined (see bug #375777 for details)
>+  rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>+    "SELECT visit_date, visit_type FROM moz_historyvisits "
>+    "WHERE place_id = ?1 AND visit_type <> 0 and visit_type <> ") +

> note that in bug 392399 it was found that "NOT IN (0,4)" was sometimes faster
> than this. would it be worth doing a quick test to see if it makes a difference
> here?

already addressed this.

7)

>+  // from older versions, unmigrated bookmarks might be hidden,
>+  // so we can't exclude hidden places (by doing "WHERE hidden <> 1")
>+  // from our query, as we want to calculate the freceny for those

s/freceny/frecency/

already addressed this.

8)

>+    // XXX todo
>+    // forcibly call the "on idle" timer here to do a little work
>+    // but the rest will happen on idle.
>+
>+    // create index for the frecency column
>+    // XXX create this index before or after updating?

>iirc we found that it was faster to create the index first. however, i don't
>remember which bug that contains those comments.

>reference in the source here:

>http://mxr.mozilla.org/seamonkey/source/toolkit/components/places/src/nsNavHistory.cpp#1119

>note to self: create that SQLite best-practices wiki page.

I may have already addressed this.  If I haven't, we should.

9)

+nsresult
>+nsNavHistory::FixInvalidFrecenciesForExcludedPlaces()
>+{
>+  // for every moz_place that has an invalid frecency (-1) and
>+  // begins with "place:" or is an unvisited child of a livemark feed,
>+  // set frecency to 0 so that it is excluded from url bar autocomplete.
>+  nsCOMPtr<mozIStorageStatement> dbUpdateStatement;
>+  nsresult rv = mDBConn->CreateStatement(
>+    NS_LITERAL_CSTRING("UPDATE moz_places SET frecency = 0 WHERE id IN (SELECT h.id FROM moz_places h JOIN moz_bookmarks b ON h.id = b.fk WHERE frecency = -1 AND (b.parent IN (SELECT annos.item_id FROM moz_anno_attributes attrs JOIN moz_items_annos annos ON attrs.id = annos.anno_attribute_id WHERE attrs.name = ?1) AND (SELECT visit_date FROM moz_historyvisits WHERE place_id = h.id AND visit_type NOT IN (0,4) LIMIT 1) is null) OR SUBSTR(h.url,0,6) = 'place:')"),

> nit: line length

addressed.

> we should confirm that the proper indexes are being taken advantage of here.

agreed.  perhaps marco can help?

10)

>+nsNavHistory::CalculateVisitCount(PRInt64 aPlaceId, PRInt32 *aVisitCount)
>+{
>+  // should we add "AND visit_type NOT IN (0,4)"
>+  // XXX aren't embed's affecting visit count already in ::AddVisit()?
>+  // I think we have a bug about not letting them do that.
>+  // XXX note, should we fix AddVisit() to recalc visit count by using
>+  // CalculateVisitCount()?

all 3 of these comments sound right

already addressed, see previous comments and the latest patch.
part 2 of the review of the review:

1)

>+  PRInt32 typed = 0;
>+  rv = mDBGetPlaceVisitStats->GetInt32(0, &typed);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  PRInt32 hidden = 0;
>+  rv = mDBGetPlaceVisitStats->GetInt32(1, &hidden);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  PRInt32 oldFrecency = 0;
>+  rv = mDBGetPlaceVisitStats->GetInt32(2, &oldFrecency);
>+  NS_ENSURE_SUCCESS(rv, rv);

>all three of these cols have default values. is there a reason you're
>initializing them to 0?

just defensive programming:  I prefer to initalize variables.

2)

>+nsresult
>+nsNavHistory::CalculateFrecencyInternal(PRInt32 aTyped, PRInt32 aVisitCount, PRInt64 aPlaceId, PRBool aIsBookmarked, PRInt32 *aFrecency)

>nit: aPlaceId should be first input param (all the other in-params are
>characteristics of it)

agreed, thanks.

3)


>+  // not the same logic above, as a single visit could not both

>s/not/not be/

thanks for spotting that.

4)

>+nsresult 
>+nsNavHistory::CalculateFrecency(PRInt32 aTyped, PRInt32 aVisitCount, PRInt64 aPlaceId, nsCAutoString &aURL, PRInt32 *aFrecency)
>+{

>nit: aPlaceId and aURL should be first params

agreed.

5)


>+  // determine if the place a (non-livemark item) bookmark and prevent

>nit: s/a/is a/

thanks.

6)

>+    // this query can return multiple parent folders because
>+    // it is possible for the user to bookmark something that
>+    // is also a livemark item
>+    PRBool hasMore = PR_FALSE;
>+    while (NS_SUCCEEDED(mDBGetBookmarkParentsForPlace->ExecuteStep(&hasMore)) 
>+           && hasMore) {
>+      PRInt64 folderId;
>+      rv = mDBGetBookmarkParentsForPlace->GetInt64(0, &folderId);
>+      NS_ENSURE_SUCCESS(rv, rv);
>+
>+      PRBool parentIsLivemark;
>+      rv = lms->IsLivemark(folderId, &parentIsLivemark);
>+      NS_ENSURE_SUCCESS(rv, rv);

>not stoked to expose implementation details, but we should skip xpconnect
>traversal and instead  go direct to annosvc here

No objection, but consider measuring if it is really a perf hot spot first.

7)

> /**
>  * Autocomplete algorithm:
>  *
>- * The current algorithm searches history from now backwards to the oldest
>- * visit in chunks of time (AUTOCOMPLETE_SEARCH_CHUNK).  We currently
>- * do SQL LIKE searches of the search term in the url and title
>- * within in each chunk of time.  the results are ordered by visit_count 
>- * (and then visit_date), giving us poor man's "frecency".
>+ * Searches moz_places by frecency (in descending order)
>+ * in chunks (AUTOCOMPLETE_SEARCH_CHUNK_SIZE).  We currently
>+ * do SQL LIKE searches of the search term in the place title, place url
>+ * and bookmark titles (since a "place" can have multiple bookmarks)
>+ * within in each chunk.  The results are ordered by frecency.
>+ * Note, we exclude places with no frecency (0) because that indicates that...

> fill-in

>+ * but places with frecency (-1) are included, as that means that...

> fill-in

frecency = 0 means "don't show this in autocomplete".  place: queries should have that, as should unvisited children of livemark feeds (that aren't bookmarked elsewhere, which is possible)

we exclude places with frecency = 0 when looking for frecencies to update (on idle.)  Because of that, we need to make sure that it is NOT possible for our frecency calculating code to return 0.

this is a big deal, and worthy of a spin off bug, so I will go log it now.

frecency = -1 means invalid, or really:  this thing will have a non-zero frecency, but we don't know what it is.  we search for these on idle to calculate.

frecency > 0, we've calculated the frecency.
>+  //
>+  // XXX todo
>+  // in the case of a frecency tie, break it with h.typed and h.visit_count
>+  // which is better than nothing.  but this is slow, so not doing it yet.
>   sql += NS_LITERAL_CSTRING(
>-    "(h.title LIKE ?3 ESCAPE '/' OR h.url LIKE ?3 ESCAPE '/') "
>-    "GROUP BY h.id ORDER BY h.typed DESC, h.visit_count DESC, MAX(v.visit_date) DESC;");
>+    "(b.title LIKE ?1 ESCAPE '/' OR " 
>+     "h.title LIKE ?1 ESCAPE '/' OR "
>+     "h.url LIKE ?1 ESCAPE '/') "

nit: 2 space indent

>+
>+    // determine the number of places we will search
>+    mozStorageStatementScoper scope(mDBAutoCompleteCountQuery);
>+
>+    PRBool hasCount = PR_FALSE;
>+    rv = mDBAutoCompleteCountQuery->ExecuteStep(&hasCount);
>     NS_ENSURE_SUCCESS(rv, rv);
>   
>-    if (hasMinVisit) {
>-      rv = dbSelectStatement->GetInt64(0, &mCurrentOldestVisit);
>+    if (hasCount) {
>+      rv = mDBAutoCompleteCountQuery->GetInt32(0, &mNumPlacesToSearch);
>       NS_ENSURE_SUCCESS(rv, rv);
>     }
> 

hm. instead of this using the count query, would it work to set LIMIT to (chunk size + 1), throw away the last result, and have AutoCompleteFullHistorySearch return a bool indicating whether the "there's more" record was found or not?

note: discussed over irc, sounds good

>-    if (!mCurrentOldestVisit) {
>-      // if we have no visits, use a reasonable value
>-      mCurrentOldestVisit = PR_Now() - USECS_PER_DAY;
>+    if (!mNumPlacesToSearch) {
>+      // if we have no places, don't bother searching
>+      // just report back that we have no matches
>+      mCurrentResult->SetSearchResult(
>+        nsIAutoCompleteResult::RESULT_NOMATCH);
>+      mCurrentResult->SetDefaultIndex(-1);
>+      
>+      rv = mCurrentResult->SetListener(this);
>+      NS_ENSURE_SUCCESS(rv, rv);
>+
>+      mCurrentListener->OnSearchResult(this, mCurrentResult);
>+      DoneSearching();
>+      return NS_OK;

[12:10pm] sspitzerMsgMe:     if (!mNumPlacesToSearch) {      // if we have no places, don't bother searching      // just report back that we have no matches
[12:10pm] sspitzerMsgMe: I think that we can remove that code.
[12:10pm] sspitzerMsgMe: because I think we will always have a place to search
[12:10pm] sspitzerMsgMe: even if you expire everything
[12:10pm] sspitzerMsgMe: because you should have one bookmark
[12:10pm] sspitzerMsgMe: right?
[12:10pm] sspitzerMsgMe: even if you didn't
[12:11pm] dietrich: unless you removed them all, yeah
[12:11pm] sspitzerMsgMe: it is not a common case
[12:11pm] dietrich: yep
[12:11pm] sspitzerMsgMe: so we can remove that if () block!
[12:11pm] sspitzerMsgMe: acutally the whole thing:
[12:11pm] sspitzerMsgMe: else if (!mCurrentSearchString.IsEmpty()) {
[12:11pm] sspitzerMsgMe: ...
[12:11pm] sspitzerMsgMe: }
[12:12pm] sspitzerMsgMe: and the nthe code that builds up that query
[12:12pm] sspitzerMsgMe: for moreChunksToSearch
[12:12pm] sspitzerMsgMe: lets make AutoCompleteFullHistorySearch have an out param
[12:12pm] sspitzerMsgMe: a bool
[12:13pm] sspitzerMsgMe: so we'll do:
[12:13pm] sspitzerMsgMe: AutoCompleteFullHistorySearch(&moreChunksToSearch);
[12:13pm] sspitzerMsgMe: see what I mean?
[12:14pm] sspitzerMsgMe: and no need to do +1 row
[12:14pm] sspitzerMsgMe: just if:
[12:14pm] sspitzerMsgMe: while (NS_SUCCEEDED(mDBAutoCompleteQuery->ExecuteStep(&hasMore)) && hasMore) { fails before we find any matches
[12:14pm] sspitzerMsgMe: we set the out param to false.

>@@ -500,32 +520,61 @@ nsNavHistoryExpire::FindVisits(PRTime aE
> 
> 
> // nsNavHistoryExpire::EraseVisits
> 
> nsresult
> nsNavHistoryExpire::EraseVisits(mozIStorageConnection* aConnection,
>     const nsTArray<nsNavHistoryExpireRecord>& aRecords)
> {
>-  // build a comma separated string of visit ids to delete
>+  // build a comma separated string of visit ids to delete.
>+  // also build a comma separated string of place ids to reset frecency and
>+  // visit_count.
>   nsCString deletedVisitIds;
>+  nsCString placeIds;
>   for (PRUint32 i = 0; i < aRecords.Length(); i ++) {
>-    // Do not add comma separator for the first entry
>+    // Do not add comma separator for the first visit id 
>     if (! deletedVisitIds.IsEmpty())
>       deletedVisitIds.AppendLiteral(", ");
>     deletedVisitIds.AppendInt(aRecords[i].visitID);
>+
>+    // Do not add comma separator for the first place id
>+    // NOTE: this can contain duplicate place ids
>+    if (! placeIds.IsEmpty())

nit: no space after ! (fix above as well)

>+
>+// buckets (in days) for frecency calculation
>+pref("browser.frecency.firstBucketCuttoff", 4);
>+pref("browser.frecency.secondBucketCuttoff", 14);
>+pref("browser.frecency.thirdBucketCuttoff", 31);
>+pref("browser.frecency.fourthBucketCuttoff", 90);

s/Cuttoff/Cutoff (and elsewhere used)
part 3 of the review of the review:

1)

>+  // XXX is this too expensive when updating livemarks?
>+  // UpdateBookmarkHashOnRemove() does a sanity check using
>+  // IsBookmarkedInDatabase(),  so it might not have actually
>+  // removed the bookmark.  should we have a boolean out param
>+  // for if we actually removed it, and use that to decide if we call
>+  // UpdateFrecency() and the rest of this code?

> can be spun off

will do.

2)

>+  // mDBVisitsForFrecency
>+  // NOTE: we are not limiting to visits with "visit_type NOT IN (0,4)"
>+  // because if we do that, mDBVisitsForFrecency would return no visits
>+  // for places with only embed (or undefined) visits.  That would
>+  // cause use to estimate a frecency based on what information we do have,
>+  // see CalculateFrecencyInternal().  that would result in a
>+  // non-zero frecency for a place with only
>+  // embedded visits, instead of a frececny of 0.

nit: s/frececny/frecency

yes, thanks

3)
>   if (oldIndexExists) {
>-    // wrap in a transaction for safety and performance
>-    mozStorageTransaction pageindexTransaction(aDBConn, PR_FALSE);
>-

>instead of removing the transaction, maybe wrap each migration scenario in it's
>own transaction?

that seems reasonable.

4)

>+  // for every moz_place that has an invalid frecency (-1) and
>+  // begins with "place:" or is an unvisited child of a livemark feed,
>+  // set frecency to 0 so that it is excluded from url bar autocomplete.
>+  nsCOMPtr<mozIStorageStatement> dbUpdateStatement;
>+  nsresult rv = mDBConn->CreateStatement(
>+    NS_LITERAL_CSTRING("UPDATE moz_places SET frecency = 0 WHERE "
>+      "id IN (SELECT h.id FROM moz_places h JOIN moz_bookmarks b ON "
>+      "h.id = b.fk WHERE frecency = -1 AND "
>+      "(b.parent IN (SELECT annos.item_id FROM moz_anno_attributes attrs "
>+      "JOIN moz_items_annos annos ON attrs.id = annos.anno_attribute_id "
>+      "WHERE attrs.name = ?1) AND (SELECT visit_date FROM moz_historyvisits "
>+      "WHERE place_id = h.id AND visit_type NOT IN (0,4) LIMIT 1) is null) "
>+      "OR SUBSTR(h.url,0,6) = 'place:')"),

> nit: can you break this up by subselect for better readability?

yes, that would help.

5)

>+  // XXX should we be calculating / setting frecency here?
>+

> hrm, doesn't really have any callers, maybe this should be removed altogether,
> if the equivalent functionality is achievable via other apis.

I agree, let's remove that from the interface and the code (and the test).  

For bug #392399, I removed a bunch of unused interfaces / code.

unless we think that people extending places will find it valuable (but unable to do the same with other apis)

6)

>+
>+    // only increment the visit_count if the transition was not EMBED
>+    // XXX what about TRANSITION_DOWNLOAD?
>+    if (aTransitionType != TRANSITION_EMBED)
>+      trueVisitCount++;

> hrm, a download is fairly different from a visit

>+
>+    // only increment the visit_count if the transition was not EMBED
>+    // XXX what about TRANSITION_DOWNLOAD?
>+    if (aTransitionType != TRANSITION_EMBED)
>+      visitCountForFrecency++;

> ditto


In our code, we've begun to treat TRANSITION_DOWNLOAD similar to TRANSITION_EMBED, but we need to finish that up.  For example, switch to NOT IN(0,4,7)

I'll log a spin off bug about this, and cc sdwilsh.

7)

>+
>+    // placeId could have a livemark item, so setting the frecency to -1
>+    // would cause it to show up in the url bar autocomplete
>+    // call FixInvalidFrecenciesForExcludedPlaces() to handle that scenario
>+    rv = FixInvalidFrecenciesForExcludedPlaces();
>+    NS_ENSURE_SUCCESS(rv, rv);

i'm worried this will really slow down bulk removes (eg: removing a selection
from history sidebar)

Yes, you are right to worry.  bulk removes are slow enough already, let's not make it worse.

for bulk removes, perhaps we should call that code after all the removes are done from controller.js / _removeRowsFromHistory().

8)


>@@ -2949,16 +3276,39 @@ nsNavHistory::RemovePagesFromHost(const
>   // - do not have EXPIRE_NEVER annotations
>   rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>     "DELETE FROM moz_places WHERE id IN (") + deletedPlaceIds +
>       NS_LITERAL_CSTRING(") AND id NOT IN (") + deletedPlaceIdsBookmarked +
>       NS_LITERAL_CSTRING(") AND id NOT IN (") + deletedPlaceIdsWithAnno +
>       NS_LITERAL_CSTRING(")"));
>   NS_ENSURE_SUCCESS(rv, rv);
>
>+  // reset the frecency the places we did not delete.  Note, we don't

nit: s/the places/for the places/

yes, thanks

9)

>+  // XXX when removing duplicate ids, should we be resetting recency to -1?
>+

> this is only called at history import, so i don't think so.

you are right, we can remove that.  currently, on import from history, to make it faster we don't calculate frecency (as mentioned above in the review of the review).

so, we don't need to reset the frecency to -1.

10)

>+nsresult
>+nsNavHistory::UpdateFrecency(PRInt64 aPlaceId, PRBool aIsBookmarked)
>+{
>+  mozStorageStatementScoper statsScoper(mDBGetPlaceVisitStats);
>+  nsresult rv = mDBGetPlaceVisitStats->BindInt64Parameter(0, aPlaceId);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  PRBool hasResults = PR_FALSE;
>+  rv = mDBGetPlaceVisitStats->ExecuteStep(&hasResults);
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  if (!hasResults) {
>+    NS_WARNING("attempting to update frecency for a bogus place");
>+    // before I added the check for itemType == TYPE_BOOKMARK
>+    // I hit this with aPlaceId of 0 (on import)
>+    return NS_OK;
>+  }

> to reduce db traffic and not regress perf of history import maybe we should
> check the place id value before calling UpdateFrecency.

we could do that, but note, nsNavHistory::UpdateFrecency() is not called on history import.  it is currently only called from bookmarks code.
edward wrote:

>Just to clarify about reusing search, chunking and inserting.. Reusing search
>is problematic because old results can contain pages from "old chunk #3" which
>shouldn't show up before "new chunk #1".

correct.

>However, we shouldn't need to insert "better ranked" results from "chunk #2"
>into "chunk #1" results because we can use the frecency chunking to figure out
>the high order bit.

for your work (for example, the word boundary improvement) does not have to be global, and we are ok with it being "per chunk", then you are right.

actually, your stuff *has* to be per chunk for performance reasons.
Blocks: 412211
Blocks: 412217
Blocks: 412218
Blocks: 412219
Attached patch fixed nits + other (obsolete) — Splinter Review
- nits fixed
- count query removed
- no longer calling FixInvalidFrecenciesForExcludedPlaces in RemoveURI
- transactionalized each chunk of migration work
- prefs re-branched

all the other issues look like they could be spun off, and handled after the initial landing.

the only other todo is deciding default idle count, so asking for first review on these changes.
Attachment #296438 - Attachment is obsolete: true
Attachment #296910 - Flags: review?(seth)
Attached patch patch for the sake of interdiff (obsolete) — Splinter Review
Attachment #297125 - Attachment is obsolete: true
dietrich, thanks for taking over here.  here is a review of the changes between the last patch and yours (see https://bugzilla.mozilla.org/attachment.cgi?oldid=297125&action=interdiff&newid=296910&headers=1 for the interdiff)

1)

"no longer calling FixInvalidFrecenciesForExcludedPlaces in RemoveURI"

You mean RemovePage() right?  If so, I think we want to keep that code.

say I have a livemark item that I visit.  since it is visited, it should show up in autocomplete.  when I visit, the frecency will be calculated to non-zero.

then, in the history sidebar, I remove that page, which calls RemovePage(), right?

after that happens, because it is non an unvisited livemark item, it should not be in ac results.

that is why we call FixInvalidFrecenciesForExcludedPlaces(), so that the unvisited livemark item will get a frecency of 0.

@@ -2793,16 +3095,41 @@ nsNavHistory::RemovePage(nsIURI *aURI)
         "DELETE FROM moz_places WHERE id = ?1"),
         getter_AddRefs(statement));
     NS_ENSURE_SUCCESS(rv, rv);
     rv = statement->BindInt64Parameter(0, placeId);
     NS_ENSURE_SUCCESS(rv, rv);
     rv = statement->Execute();
     NS_ENSURE_SUCCESS(rv, rv);
   }
+  else {
+    // because the moz_place was annotated (or it was a bookmark), 
+    // we didn't delete it, but we did delete the moz_visits
+    // so we need to reset the frecency.  Note, we don't
+    // reset the visit_count, as we use that in our "on idle"
+    // query to figure out which places to recalcuate frecency first.
+    rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
+        "UPDATE moz_places SET frecency = -1 WHERE id = ?1"),
+        getter_AddRefs(statement));
+    NS_ENSURE_SUCCESS(rv, rv);
+    rv = statement->BindInt64Parameter(0, placeId);
+    NS_ENSURE_SUCCESS(rv, rv);
+    rv = statement->Execute();
+    NS_ENSURE_SUCCESS(rv, rv);
+
+    // placeId could have a livemark item, so setting the frecency to -1
+    // would cause it to show up in the url bar autocomplete
+    // call FixInvalidFrecenciesForExcludedPlaces() to handle that scenario
+    rv = FixInvalidFrecenciesForExcludedPlaces();
+    NS_ENSURE_SUCCESS(rv, rv);
+
+    // XXX todo
+    // forcibly call the "on idle" timer here to do a little work
+    // but the rest will happen on idle.
+  }

2)

+ Seth Spitzer <sspitzer@sspitzer.org>

thanks for the credit!  but can you use sspitzer@mozilla.org instead?

3)

+  /*
+  sql = NS_LITERAL_CSTRING(
+    "SELECT COUNT(*) FROM moz_places h WHERE h.frecency <> 0 ");
 
-  nsresult rv = mDBConn->CreateStatement(sql, getter_AddRefs(mDBAutoCompleteQuery));
+  if (mAutoCompleteOnlyTyped)
+    sql += NS_LITERAL_CSTRING("AND h.typed = 1 ");
+
+  rv = mDBConn->CreateStatement(sql, getter_AddRefs(mDBAutoCompleteCountQuery));
   NS_ENSURE_SUCCESS(rv, rv);
+  */

please remove this instead of commenting it out.

4)

if (!aHasMoreResults)
  *aHasMoreResults = PR_TRUE;

aHasMoreResults is a pointer, and never null, right?

that change is also in an odd place in the patch, inside the code that checks for duplicates.  What if in a given chunk all the matches are duplicates?  we'd stop searching.

I think you something like want:

*aHasMoreResults = PR_FALSE;
while (NS_SUCCEEDED(mDBAutoCompleteQuery->ExecuteStep(&hasMore)) && hasMore) {
  *aHasMoreResults = PR_TRUE;
...
}

*aHasMoreResults = PR_FALSE; is not be necessary if the input param is guaranteed to be initialized as false, though.

please let me know what you think of these comments.
  
Comment on attachment 296910 [details] [diff] [review]
fixed nits + other

clearing request, please see my previous comments
Attachment #296910 - Flags: review?(seth)
(In reply to comment #68)

> say I have a livemark item that I visit.  since it is visited, it should show
> up in autocomplete.  when I visit, the frecency will be calculated to non-zero.
> 
> then, in the history sidebar, I remove that page, which calls RemovePage(),
> right?
> 
> after that happens, because it is non an unvisited livemark item, it should not
> be in ac results.
> 
> that is why we call FixInvalidFrecenciesForExcludedPlaces(), so that the
> unvisited livemark item will get a frecency of 0.

this seems like an uncommon scenario, which even if happens is only temporary (item will eventually be removed by the livemark svc). but yeah, removing it altogether is probably premature.

what if we call FixInvalidFrecenciesForExcludedPlaces() prior to updating frecency on idle?

either way, i'm going to leave it in for now, and spin it off to investigate in a new bug.

> 2)
> 
> thanks for the credit!  but can you use sspitzer@mozilla.org instead?
> 

fixed

> 3)
> 
> please remove this instead of commenting it out.

fixed

> 4)
> 
> if (!aHasMoreResults)
>   *aHasMoreResults = PR_TRUE;
> 
> aHasMoreResults is a pointer, and never null, right?
> 
> that change is also in an odd place in the patch, inside the code that checks
> for duplicates.  What if in a given chunk all the matches are duplicates?  we'd
> stop searching.

yep, you're right we only need to know if result count is non-zero, fixed.

Attached patch fixed comment #68 issues (obsolete) — Splinter Review
Attachment #296910 - Attachment is obsolete: true
Attachment #297209 - Flags: review?
Attachment #297209 - Flags: review? → review?(seth)
> what if we call FixInvalidFrecenciesForExcludedPlaces() prior to updating
> frecency on idle?

are you suggesting that because you'd rather do this work on idle, instead of where I'm currently doing it, for performance reasons?

not a bad idea, especially if I happen to be using it in any place where it is going to impact performance (like import or bulk delete, copy and paste of bookmarks, etc.  we should audit the callers to see.)

let's spin that off.

note, FixInvalidFrecenciesForExcludedPlaces() also attempts to fix place: urls.  on migration from 3b2 (for example) we want to call it.

I don't think any new place urls (from savings a search, for example) get a non zero frecency, but can you double check?

I think having a place: url show up is much worse than a unvisited livemark.
Comment on attachment 297209 [details] [diff] [review]
fixed comment #68 issues

r=sspitzer (on the changes dietrich has made since the original patch.)

(I <3 bugzilla interdiff!)
Attachment #297209 - Flags: review?(seth) → review+
Note: I noticed problems in the content of the awesomebar last night while doing more testing. I'm going to hold off on checking this in until I get tests completed to accompany it.
>  I noticed problems in the content of the awesomebar last nigh

can you elaborate on the problems you saw?
Attached image screenshot(s)
attachment shows screenshots of several different results for the same search string.
from comment #48:

> "i might spin this issue off to a separate bug."

doh.  spun off as bug #412730 – problems re-using autocomplete results due to bookmark titles and cached results

dietrich thinks this be what he is seeing in comment #74.
Blocks: 412734
Blocks: 412736
Attached patch disable result re-use (obsolete) — Splinter Review
- disable re-use of results for now (bug 412730). possibly will adversely affect ac perf in some instances
- minor change to only get record's properties if going to be used.
- comment clarification, bug # pointers
Attachment #297209 - Attachment is obsolete: true
Attachment #297506 - Flags: review?(seth)
Attached file interdiff for previous patch (obsolete) —
Comment on attachment 297506 [details] [diff] [review]
disable result re-use

r=sspitzer, based on the interdiff

note, your optimization to get strings after we check the mCurrentResultURLs hash looks good, and we might be able to do that in other places, like in nsNavHistory::AutoCompleteTagsSearch()

can you check?
Attachment #297506 - Flags: review?(seth) → review+
added suggested optimization. is minor, so carrying over r+.
Attachment #297506 - Attachment is obsolete: true
Attachment #297582 - Flags: review+
Attached patch interdiff for previous patch (obsolete) — Splinter Review
+
+    // Do not add comma separator for the first place id
+    // NOTE: this can contain duplicate place ids
+    if (!placeIds.IsEmpty())
+      placeIds.AppendLiteral(", ");
+    placeIds.AppendInt(aRecords[i].placeID);
   }

please, remove the space after the comma and avoid duplicates here...

also, insted of building a placeIds array and delete them, you could use a simple query that put frecency = -1 for all places without visits and with frecency <> -1 (note i'm not sure if we can have frecency for items without visits, i have still to read out all the patch).

i hope to have the time to take a look at the full patch soon
I have almost 10k rows in moz_places, so that would require 1000 idle timer triggers meaning almost 17 hours of me doing nothing in the browser for all the places to get a frecency.

I've been using a try server build of the latest patch and I still haven't seen any updates from the idle service. Maybe I'm just not idle enough, but at least I'm the type of user that leaves the browser open all day long. However, there could be users that open the browser, actively visit sites, then quit resulting in no idle timer updates... but at least the pages that the user visits would get frecency calculated.

Should there be a way to initially update the first 10 batches on migrate so that there's at least one full chunk of valid frecencies?

Fortunately, most of the interesting sites (at least for me) can be covered in the first 2 idle updates (because we update frecencies based on most visited).
It seems like the frecency is being calculated right away for urls that you type in which is useful for showing that entry the next time the user types it. But if that url redirects to another page, the target page won't have a frecency until idle.

e.g., http://bonsai.mozilla.org/ redirects to http://bonsai.mozilla.org/cvsqueryform.cgi but the latter doesn't show up right away if you type "bonsai".
I've put up a description of how frecency is calculated at http://developer.mozilla.org/en/docs/Places:Awesomebar.

(In reply to comment #83)
> please, remove the space after the comma and avoid duplicates here...

fixed locally

> also, insted of building a placeIds array and delete them, you could use a
> simple query that put frecency = -1 for all places without visits and with
> frecency <> -1 (note i'm not sure if we can have frecency for items without
> visits, i have still to read out all the patch).

Yes, we can have unvisited items with frecency <> -1, unvisited bookmarks for example.

Also, we already have the place ids. Would it be slower to have to re-find them?

(In reply to comment #84)
> I've been using a try server build of the latest patch and I still haven't seen
> any updates from the idle service.

Can you do |select frecency from moz_places where frecency <> -1 order by frecency desc| or something like that to see if any calculation has occured?

> Should there be a way to initially update the first 10 batches on migrate so
> that there's at least one full chunk of valid frecencies?

Maybe we could call RecalculateFrecencies() at the end of nsNavHistory::Init(), that way it'd be called after migration and before startup. Would probably hurt Ts though.
(In reply to comment #86)
> Can you do |select frecency from moz_places where frecency <> -1 order by
> frecency desc| or something like that to see if any calculation has occured?
Yes, there were updates after I left the computer, so the code is working. But I was just pointing out that some user behavior might lead to the idle timer never triggering, so all autocomplete results are based on -1 frecency.

But like I noted earlier, just a couple recalculations would be "good enough" for the commonly visited sites for users. So that's either 2-3 minutes of idle or a one-time triggering of recalculate when migrating.
> Also, we already have the place ids. Would it be slower to have to re-find
> them?

a conditional query should be faster than a id IN(list) query, so when possible it should be preferred (however perf depends on the number of ids in the list, if it is too large it will slow down more, while if it is shorter it will be probably faster). You have to iterate to build the list, and to check for every item for duplicates, while if you could identify clearly what unvisited frecency should be updated you could do with a cleaner (and maybe faster?) code, and that could also fix some bad frecency lost from other code. it's something that should be profiled with real numbers.
Attached patch latest (obsolete) — Splinter Review
changes in this patch:

- calculate frecency for new visits/uris *after* it is added to the db

- fix bug 412219 by explicitly setting frecency = -1 for visited uris w/ a calculated frecency of zero (also allows to remove the float cast which was causing frecencies to round too high)

- fixes marco's comment about dupe place ids

- adds basic test for single visit frecency for many of the permutations of available preferences

still need tests for:
- resulting title
- multiple visits patterns
- matching unvisited (bookmarked, typed)
- valid frecency = 0
- matching on valid frecency = -1
- matches outside of first chunk

still need to fix mardak's comment about "priming the pump" post-migration. this is important as it's the first impression of the frecency change for nightly testers and beta upgraders. todo:
- should fix tie breaker
- add calculation post-migration (maybe have RecalculateFrecencies() take a count param)
Attachment #297582 - Attachment is obsolete: true
Attached file interdiff for previous patch (obsolete) —
Attachment #297508 - Attachment is obsolete: true
Attachment #297583 - Attachment is obsolete: true
dietrich, some unsolicited review comments, I hope you don't mind:

1)

   // mDBUpdatePageVisitStats (see InternalAdd)
   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
       "UPDATE moz_places "
-      "SET visit_count = ?2, hidden = ?3, typed = ?4 "
+      "SET visit_count = ?2, hidden = ?3, typed = ?4, frecency = ?5 "
       "WHERE id = ?1"),
     getter_AddRefs(mDBUpdatePageVisitStats));
   NS_ENSURE_SUCCESS(rv, rv);

Actually, let's not do this change.  as of your current patch, we only call this with -1, so why bother, since
we are going to really update the frecency with our subsequent call to UpdateFrecency() right?

updating that value (and then doing it again) is a waste, right?  (and, it is indexed, so we want to avoid that.)


2)

+    nsCAutoString url;
+    rv = aURI->GetSpec(url);
+    NS_ENSURE_SUCCESS(rv, rv);

was that left over from some earlier printf code?  I think you can remove it.

3)

see comment #1, but we should remove this.

+    // frecency gets calculated below
+    rv = mDBUpdatePageVisitStats->BindInt32Parameter(4, -1);
+    NS_ENSURE_SUCCESS(rv, rv);

4)

+      if (pointsForSampledVisits == 0 && aVisitCount > numSampledVisits)
+        *aFrecency = -1;
+      else
+        *aFrecency = (PRInt32) NS_ceilf(aVisitCount * NS_ceilf(pointsForSampledVisits) / numSampledVisits);

it appear that the if () part of this is your fix for bug #412219

but I don't think it is correct.  aVisitCount is the total number of visits including types 0,4.

so if I have a place with 100 visits, and they are all type 4 (embed), my frecency should be 0.

pointsForSampledVisits would be 0, but 100 > 10, so we'd set frecency to -1, which would cause that
place to show up in autocomplete.

What I think we want is:

+      if (pointsForSampledVisits == 0 && trueVisitCount)
+        *aFrecency = -1;

as that means that, while you did calculate 0 from the sampled points (say the 10 most recent were of type 4),
you have at least 1 non 0,4 visit (otherwise trueVisitCount would be 0).


note, I don't think you have trueVisitCount at this point, so we might have to pass it through, or do something like:


+      // fix for bug #412219
+      if (!pointsForSampledVisits) {
+        PRInt32 trueVisitCount = 0;
+        rv = CalculateVisitCount(aPlaceId, PR_FALSE /* not for frecency */,
+                                 &trueVisitCount);
+        if (NS_SUCCEEDED(rv) && trueVisitCount)
+          *aFrecency = -1;

5)

+    if (deletedVisitIdsArray.IndexOf(aRecords[i].visitID) == -1) {
+      if (! deletedVisitIds.IsEmpty())

nit, remove that space between ! and deletedVisitIds

6)

> still need to fix mardak's comment about "priming the pump" post-migration.

agreed.

in the patch (unless you've removed them), look for things like:

+  // XXX todo
+  // forcibly call the "on idle" timer here to do a little work
+  // but the rest will happen on idle.

There are times, for example after import, where we should call the RecalculateFrecencies() to prime the pump.

See also http://wiki.mozilla.org/User:Sspitzer/GlobalFrecency (and search for "force")
> - should fix tie breaker

are you talking about a general tie breaker (on autocomplete query), or one just for what to recalculate?

if the former, watch out for perf hits (as discussed earlier in another bug)

if the latter, that's a good idea, even if it is a perf hit, because the user is idle, or this is done on migration, and figuring out the top (100?) sites to recalculate would improve the users experience.
Comment on attachment 298004 [details] [diff] [review]
latest

> nsNavHistory::CreateAutoCompleteQueries()
> {
>   nsCString sql = NS_LITERAL_CSTRING(
>-    "SELECT h.url, h.title, f.url, b.id, b.parent "
>+    "SELECT h.url, h.title, f.url, b.id, b.parent, b.title "
Would this be better as ..., IFNULL(b.title, h.title) _title, ...
>     "FROM moz_places h "
..
>+    "WHERE h.frecency <> 0 AND ");
..
>+    "(b.title LIKE ?1 ESCAPE '/' OR " 
>+     "h.title LIKE ?1 ESCAPE '/' OR "
>+     "h.url LIKE ?1 ESCAPE '/') "
_title LIKE ?1 to replace the first 2 lines

>+nsNavHistory::AutoCompleteFullHistorySearch(PRBool* aHasMoreResults)
> {
..
>+      PRInt64 parentId = 0;
>+      // only bother to fetch parent id and bookmark title 
>+      // if we have a bookmark (itemId != 0)
>+      if (itemId) {
>+        rv = mDBAutoCompleteQuery->GetInt64(kAutoCompleteIndex_ParentId, &parentId);
>+        NS_ENSURE_SUCCESS(rv, rv);
>+        rv = mDBAutoCompleteQuery->GetString(kAutoCompleteIndex_BookmarkTitle, entryBookmarkTitle);
>+        NS_ENSURE_SUCCESS(rv, rv);
No need for this separate bookmark title index.
..
>-      rv = mCurrentResult->AppendMatch(entryURL, entryTitle, 
>-        NS_ConvertUTF8toUTF16(faviconSpec), isBookmark ? NS_LITERAL_STRING("bookmark") : NS_LITERAL_STRING("favicon"));
>+
>+      // if the search string is in the bookmark title, show that in the
>+      // result (instead of the page title)
>+      PRBool matchInBookmarkTitle = itemId && 
>+        CaseInsensitiveFindInReadable(mCurrentSearchString, entryBookmarkTitle);
>+
>+      rv = mCurrentResult->AppendMatch(entryURL, 
>+        matchInBookmarkTitle ? entryBookmarkTitle : entryTitle, 
>+        NS_ConvertUTF8toUTF16(faviconSpec), 
>+        isBookmark ? NS_LITERAL_STRING("bookmark") : NS_LITERAL_STRING("favicon"));
This code change wouldn't be necessary either.

This would also help simplify matching against the title because there's only one title to consider. Locally, I've got it changed to IFNULL(b.title || ' — ' || h.title, h.title) _title, but that might just duplicate text if the user uses the default bookmark title which is the same as the page title.
Attached patch latest (obsolete) — Splinter Review
- fixes issues in comment 91
- tests resulting titles (could be more complete)
- tests matching unvisited bookmarks
- "primes the pump" after db creation or a migration
Attachment #298004 - Attachment is obsolete: true
Attached file interdiff for previous patch (obsolete) —
Attachment #298005 - Attachment is obsolete: true
Attachment #298311 - Flags: review?(seth)
(In reply to comment #93)
> >+      rv = mCurrentResult->AppendMatch(entryURL, 
> >+        matchInBookmarkTitle ? entryBookmarkTitle : entryTitle, 
> >+        NS_ConvertUTF8toUTF16(faviconSpec), 
> >+        isBookmark ? NS_LITERAL_STRING("bookmark") : NS_LITERAL_STRING("favicon"));
> This code change wouldn't be necessary either.
> 
> This would also help simplify matching against the title because there's only
> one title to consider. Locally, I've got it changed to IFNULL(b.title || ' —
> ' || h.title, h.title) _title, but that might just duplicate text if the user
> uses the default bookmark title which is the same as the page title.
> 

hrm, the downside of this change is non-match-specific result titles. not sure if we want this, but doesn't block landing frecency, so can you file a followup bug for it?
Comment on attachment 298312 [details]
interdiff for previous patch

chatted with dietrich over irc about why I think it must be:

if (NS_SUCCEEDED(rv) && trueVisitCount)

and can't be:

if (NS_SUCCEEDED(rv) && trueVisitCount > numSampledVisits)

imagine 11 visits, the last 10 are embeds.  truevisit count = 1, sampled visits = 10, but we want frecency of -1.

if I turn out to be correct, r=sspitzer (once you make that change.)
Attached patch for checkin (obsolete) — Splinter Review
Attachment #298311 - Attachment is obsolete: true
Attachment #298311 - Flags: review?(seth)
(In reply to comment #97)
> (From update of attachment 298312 [details])
> chatted with dietrich over irc about why I think it must be:
> 
> if (NS_SUCCEEDED(rv) && trueVisitCount)
> 
> and can't be:
> 
> if (NS_SUCCEEDED(rv) && trueVisitCount > numSampledVisits)
> 
> imagine 11 visits, the last 10 are embeds.  truevisit count = 1, sampled visits
> = 10, but we want frecency of -1.
> 
> if I turn out to be correct, r=sspitzer (once you make that change.)
> 

Yes, you're correct. I've updated the patch with only 3 changes:

1. the change in comment #97
2. change in the test to account for that change
3. un-rot from the changes in bug 340795
Attached patch more (obsolete) — Splinter Review
made the idle update interval a pref, so we won't be updating frecencies during the tinderbox tests. i'll add a supplemental patch for tweaking that pref in the tinder configs.
Attachment #298545 - Attachment is obsolete: true
Attachment #298614 - Flags: review?(seth)
Attachment #298312 - Attachment is obsolete: true
Comment on attachment 298614 [details] [diff] [review]
more

r=sspitzer, thanks dietrich!
Attachment #298614 - Flags: review?(seth) → review+
Attached patch fix idle timer (obsolete) — Splinter Review
Changed to properly support disabling frecency updating on idle by setting places.frecency.updateIdleTime to zero.
Attachment #298614 - Attachment is obsolete: true
Attachment #298667 - Flags: review?(seth)
Attached file interdiff for previous patch (obsolete) —
Attachment #298615 - Attachment is obsolete: true
I was thinking you were going to keep the existing patch, and do:

set_pref($pref_file, 'places.frecency.updateIdleTime', <super big number>);

but this way works, too.

Comment on attachment 298667 [details] [diff] [review]
fix idle timer

r=sspitzer, thanks for the interdiff.

I think this is better, as it gives people a definitely way to turn off frecency recalculation on idle.
Attachment #298667 - Flags: review?(seth) → review+
Attachment #298669 - Flags: review?(rhelmer) → review+
Checking in tools/tinderbox/build-seamonkey-util.pl;
/cvsroot/mozilla/tools/tinderbox/build-seamonkey-util.pl,v  <--  build-seamonkey-util.pl
new revision: 1.380; previous revision: 1.379
done

will check in the frecency patch once i figure out how to propagate the tinderbox change out to the test boxes.
Checking in browser/app/profile/firefox.js;
/cvsroot/mozilla/browser/app/profile/firefox.js,v  <--  firefox.js
new revision: 1.253; previous revision: 1.252
done
Checking in toolkit/components/places/public/nsILivemarkService.idl;
/cvsroot/mozilla/toolkit/components/places/public/nsILivemarkService.idl,v  <--  nsILivemarkService.idl
new revision: 1.10; previous revision: 1.9
done
Checking in toolkit/components/places/src/nsAnnotationService.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsAnnotationService.cpp,v  <--  nsAnnotationService.cpp
new revision: 1.34; previous revision: 1.33
done
Checking in toolkit/components/places/src/nsNavBookmarks.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavBookmarks.cpp,v  <--  nsNavBookmarks.cpp
new revision: 1.145; previous revision: 1.144
done
Checking in toolkit/components/places/src/nsNavHistory.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.cpp,v  <--  nsNavHistory.cpp
new revision: 1.228; previous revision: 1.227
done
Checking in toolkit/components/places/src/nsNavHistory.h;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.h,v  <--  nsNavHistory.h
new revision: 1.121; previous revision: 1.120
done
Checking in toolkit/components/places/src/nsNavHistoryAutoComplete.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistoryAutoComplete.cpp,v  <--  nsNavHistoryAutoComplete.cpp
new revision: 1.30; previous revision: 1.29
done
Checking in toolkit/components/places/src/nsNavHistoryExpire.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistoryExpire.cpp,v  <--  nsNavHistoryExpire.cpp
new revision: 1.35; previous revision: 1.34
done
RCS file: /cvsroot/mozilla/toolkit/components/places/tests/unit/test_000_frecency.js,v
done
Checking in toolkit/components/places/tests/unit/test_000_frecency.js;
/cvsroot/mozilla/toolkit/components/places/tests/unit/test_000_frecency.js,v  <--  test_000_frecency.js
initial revision: 1.1
done
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Whiteboard: [work ongoing, wip patch]
Depends on: 413788
I can reproduce the Win2k3 test failure on my linux machine under valgrind. This patch should be backed out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'm not sure about using the frecency score for this drop down.  Traditionally this is meant to be a list of sites that you very commonly go to, and the drop down allows you to revisit common sites using just the mouse.  However, with the current implementation my drop down contains a site that I have only been to once, for the first time today.  It has high recency, but very poor frequency, resulting in a moderate frecency.

What do people think of using this drop down for pure frequency, since we have the history menu and sidebar for pure recency?

Pure frequency will map better the previous behavior of this field, right? (I should note that I've never really used the drop down in my daily browsing).
shoot, wrong bug, please ignore the above comment.
Depends on: 413794
Backed out due to failing unit test. Please fix both bug 413788 and bug 413794 in your next iteration of the patch.
Blocks: 394066
No longer depends on: 394066
- load prefs prior to initializing the db (bug 413794)
- include the fix for bug 413788
Attachment #298667 - Attachment is obsolete: true
Attachment #298975 - Flags: review?(sayrer)
Attachment #298668 - Attachment is obsolete: true
Comment on attachment 298975 [details] [diff] [review]
patch w/ ts and unit'd prefs fixes

looks sane, let's ship it
Attachment #298975 - Flags: review?(sayrer) → review+
Blocks: 413944
Whiteboard: [ready to land]
relanded
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
Whiteboard: [ready to land]
This seems to have regressed url autocomplete responsiveness, typing seems to lag as the results are looked up, whereas previously typing did not lag, even though the result lookup was about the same speed.
(In reply to comment #119)
> This seems to have regressed url autocomplete responsiveness, typing seems to
> lag as the results are looked up, whereas previously typing did not lag, even
> though the result lookup was about the same speed.
> 

see bug 414229 and friends.
Flags: in-testsuite+
Depends on: 414229
There is another regression detected when refreshing livemarks - see bug 417729.
Depends on: 417810
Depends on: 422944
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: