Closed Bug 704977 Opened 13 years ago Closed 12 years ago

Awesomeness calculation should use nicer recency decay

Categories

(Firefox for Android Graveyard :: General, defect, P3)

ARM
Android
defect

Tracking

(firefox13 fixed, firefox14 fixed, blocking-fennec1.0 beta+, fennec11+)

RESOLVED FIXED
Firefox 14
Tracking Status
firefox13 --- fixed
firefox14 --- fixed
blocking-fennec1.0 --- beta+
fennec 11+ ---

People

(Reporter: johnath, Assigned: Margaret)

References

Details

Attachments

(4 files, 3 obsolete files)

Attached image Linear vs. e^(-x^2) (obsolete) —
Right now we calculate our version of frequency by taking visit count and then applying a multiplier based on the recency of the last visit. Basically

score = numVisits * (120-daysSinceLastVisit)

Blassey and I were discussing this in IRC, and I asserted that we wanted a different calculation for the date multiplier. In particular, we wanted the last couple days to all be relatively equivalent, but then it should fall off more quickly than linear (no reason for 3 month old stuff to be 30x more potent than 4 month old stuff).

After consulting with my local math ph.d, I learned that I wanted gaussian decay (e^-(x^2)).

See attachment:
- x axis is "days since last visit"
- y axis is multiplier to apply
- blue is current linear
- orange is gaussian decay with some constant factors to fit our calendars and such - 120*e^(-((x/30)^2))

Code that does the calculation currently lives here:
http://hg.mozilla.org/projects/birch/annotate/fec276ae77d5/mobile/android/base/AwesomeBarTabs.java#l520

I think we should change the calculation to use 120*e^(-((x/30)^2)) (though at this point the 120 is arbitrary, could just as easily be 100, or 10, or omitted altogether depending on how much we want recency to amplify visit count)
Also - we should give more weight to bookmarked sites. Probably also to sites that are open right now on one of your other computers. Possibly also to things you've shared.

But at least to bookmarks.
https://wiki.mozilla.org/User:Jesse/NewFrecency proposes a a similar exponential decay function for frecency, including a way to implement it efficiently.
(In reply to Matt Brubeck (:mbrubeck) from comment #2)
> https://wiki.mozilla.org/User:Jesse/NewFrecency proposes a a similar
> exponential decay function for frecency, including a way to implement it
> efficiently.

Yeah - very similar approach, though Jesse's still requires computing a sum of each visit's score, whereas we may not (??) have the luxury of storing such things using the system history APIs.

The approach in comment 0 is utterly at the mercy of the last visit, but it has the advantage of being in closed form, so as far as efficiency goes it ought to be O(1) (though in actual fact it won't be since it requires knowing the last visit to the page, which means history queries, which I hope are O(log N) on the size of the history. I think, though, that at least *one* hit to the history DB will be required of any awesomeness computation :)
After a bit of coefficient twiddling in a scratchpad, I'd assert that this gives us approximately what we'd want -- yesterday is almost as good as today, 3 days ago is still pretty strong, a week ago is lower but still solid. Visits from any time in the last month still get amplified, but older stuff flatlines.


function multiplier(age) {
 var score = Math.exp( -(Math.pow((age/15),2) ))
 return Math.max(1, 100*score);
}

multiplier(0); /* 100 */
multiplier(1); /* 99.55654174830929 */
multiplier(3); /* 96.07894391523232 */
multiplier(7); /* 80.43041560655723 */
multiplier(14);/* 41.84863060425645 */
multiplier(30);/* 1.8315638888734178 */
multiplier(60);/* 1 */


Of course, it may all be moot, because I don't think sqlite has built in pow and exp functions, and I suspect (??) that we're not in the mood to add new stored functions to the system sqlite?
OS: Mac OS X → Android
Hardware: x86 → ARM
Assignee: nobody → lucasr.at.mozilla
Priority: -- → P3
tracking-fennec: --- → 11+
Additionally - do we, on desktop, give extra weight in the results when your entered string matches the _beginning_ of a URL or title? Versus a set of characters in the middle of a string, which is less likely how a user is going to be looking for things.
Actually - that should be matching to the beginning of a word in the title or section of a URL.
I should note that the math in comment 4 is just a question of how to weight visit count - we should probably also be deliberate about how much weight to give a bookmark here. How should a bookmark weigh against a site you visited 5 times yesterday?

If I check my gut, I'd say

- 3 visits in the last day is worth more than a bookmark
- a bookmark is worth more than 1 visit last week, but probably not more than 1 visit yesterday

to that end, I suspect "being a bookmark" should be worth a 100 point bonus (presumably they will also get visit points, because presumably you also visit your bookmarked sites). 

Also, the code that does this now lives here: http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/db/LocalBrowserDB.java#94

The link in comment 0 dead ends, now that birch has been resorbed into the collective.
If it helps, Math people will cringe, but we used to use Lorentzians (or apparently Cauchy distributions to smarty math people) to approximate exponentials.

http://en.wikipedia.org/wiki/Cauchy_distribution

I think they fall off a little faster than Gaussians though, but something like:

 var score = 15^2 / (age^2 + 15^2)
I've opened Bug 721476 because we seem not to strain out duplicates in top sites or search results.
Blocks: 722709
blocking-fennec1.0: --- → ?
blocking-fennec1.0: ? → beta+
Assignee: lucasr.at.mozilla → margaret.leibovic
So, I started looking into this, and I'm trying to figure out where/how we'll compute this new score multiplier. As johnath mentioned in comment 4, we don't have built-in pow or exp functions in sqlite, so should we look into adding those? (Are we using our own sqlite now? Would this be easy or hard? I'm not familiar with how to go about doing this.)

Alternately, now that we're dependent on LocalDB, we could change our DB schema to add a frecency column, and compute these values in the background. This option sounds like it would be more complicated to implement, and we'd have to be careful to avoid regressing performance, but it might be better in the long run if it could lead to things like syncing frecency. (Did XUL fennec do that? To do that, would we need the same frecency calculation as desktop?)

Anyway, anyone care to advise on what our course of action should be here?
>So, I started looking into this, and I'm trying to figure out where/how we'll 
>compute this new score multiplier. As johnath mentioned in comment 4, we don't 
>have built-in pow or exp functions in sqlite, so should we look into adding those? 
>(Are we using our own sqlite now? Would this be easy or hard? I'm not familiar 
>with how to go about doing this.)

We can use our own SQLite, but you will likely need to add connection caching to it if you want to use it for performance sensitive stuff (right now we use it for compatibility with Gecko-originating databases).

Comment 8 explains how to approximate it with math that uses basic operations. This should allow you to do the calculation in current SQLite - if you choose to go that route.
Cauchy, in green, is a not-bad approximation, as Wes says, and only requires multiplication. I'd say ship it (with the rest of this bug's bookmark bonuses and such), it's gotta be an improvement over linear.
Attachment #576621 - Attachment is obsolete: true
One route could be to update a frecency score column in the history table on each URL visit. Then the query would be a simple SORT BY frecency. Need to investigate implications for sync though.
(In reply to Lucas Rocha (:lucasr) from comment #14)
> One route could be to update a frecency score column in the history table on
> each URL visit. Then the query would be a simple SORT BY frecency. Need to
> investigate implications for sync though.

I think that's a better long term play, if it's robust and not-sync-breaking. To me it feels like follow-up work, though. If my vague mathematical conjectures hold water, we can make things better right now by just replacing the math with different math. Do you agree?
(In reply to Johnathan Nightingale [:johnath] from comment #15)
> (In reply to Lucas Rocha (:lucasr) from comment #14)
> > One route could be to update a frecency score column in the history table on
> > each URL visit. Then the query would be a simple SORT BY frecency. Need to
> > investigate implications for sync though.
> 
> I think that's a better long term play, if it's robust and
> not-sync-breaking. To me it feels like follow-up work, though. If my vague
> mathematical conjectures hold water, we can make things better right now by
> just replacing the math with different math. Do you agree?

I agree.
(In reply to Johnathan Nightingale [:johnath] from comment #15)
> (In reply to Lucas Rocha (:lucasr) from comment #14)
> > One route could be to update a frecency score column in the history table on
> > each URL visit. Then the query would be a simple SORT BY frecency. Need to
> > investigate implications for sync though.
> 
> I think that's a better long term play, if it's robust and
> not-sync-breaking. To me it feels like follow-up work, though. If my vague
> mathematical conjectures hold water, we can make things better right now by
> just replacing the math with different math. Do you agree?

Agree. The only limitation we'd have is that the frecency calculation wouldn't be able to account for bookmarks though (as they're in a separate table). But, yes, that's a follow-up.
Via comment 4's scratchpad, here's what the updated math does:

function multiplier(age) {
// var score = Math.exp( -(Math.pow((age/15),2) ))
 var score = 225 / ( age*age + 225); 
 return Math.max(1, 100*score);
}

multiplier(0);  /* 100 */
multiplier(1);  /* 99.5575221238938 */
multiplier(3);  /* 96.15384615384616 */
multiplier(7);  /* 82.11678832116789 */
multiplier(14); /* 53.44418052256532 */
multiplier(30); /* 20 */
multiplier(60); /* 5.88235294117647 */
multiplier(90); /* 2.7027027027027026 */

More gradual fall off, but still decently steep.
Attached patch (1/2) Clean up filterAllSites (obsolete) — Splinter Review
The query in filterAllSites was really difficult to read, so I decided to break it apart using temp variables. There is some repetition here now, but I feel like it's worth it to understand what's going on. (Also, it was confusing that there was both a |urlFilter != null ? ... | and |urlFilter == null ? ...| in the query, so this makes it clearer what's going on in each case).
Attachment #605866 - Flags: review?(lucasr.at.mozilla)
This updates sortOrder to use the Cauchy approximation we've been talking about. Since we need bug 721731 before we can take bookmarks into account, we can split off that twiddling into a follow-up.
Attachment #605875 - Flags: review?(lucasr.at.mozilla)
Margaret, can you patch your new formula into the Profile Migrator? Look for the comment "// see BrowserDB.filterAllSites for this formula".

If you figure out a way to share this so we don't have to do it in the future, that's even more awesome.
Comment on attachment 605866 [details] [diff] [review]
(1/2) Clean up filterAllSites

Review of attachment 605866 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/db/LocalBrowserDB.java
@@ +123,5 @@
> +            int limit, CharSequence urlFilter) {
> +        // The history selection queries for sites with a url or title
> +        // containing the constraint string
> +        String historySelection = "(" + History.URL + " LIKE ? OR " +
> +                                        History.TITLE + " LIKE ?)";

final String

@@ +124,5 @@
> +        // The history selection queries for sites with a url or title
> +        // containing the constraint string
> +        String historySelection = "(" + History.URL + " LIKE ? OR " +
> +                                        History.TITLE + " LIKE ?)";
> +        String historySelectionArg = "%" + constraint.toString() + "%";

final String

@@ +129,5 @@
> +
> +        // ORDER BY is number of visits times a multiplier from 1 - 120 of how recently the site
> +        // was accessed with a site accessed today getting 120 and a site accessed 119 or more
> +        // days ago getting 1
> +        String sortOrder = History.VISITS + " * MAX(1, (" +

final String

@@ +133,5 @@
> +        String sortOrder = History.VISITS + " * MAX(1, (" +
> +                           History.DATE_LAST_VISITED + " - " + System.currentTimeMillis() + ") / 86400000 + 120) DESC";
> +
> +        Cursor c;
> +        if (urlFilter != null) {

I'd prefer something like:

String selection = historySelection;
String[] selectionArgs = new String[] { historySelectionArg, historySelectionArg};
if (urlFilter != null) {
    selection = DBUtils.concatenateWhere(selection, "(" + History.URL + " NOT LIKE ?));
    selectionArgs = DBUtils.appendSelectionArgs(selectionArgs, new String[] { urlFilter.toString() });
}

Cursor c = cr.query(historyUriWithLimit(limit),
                    projection,
                    selection,
                    selectionArgs,
                    sortOrder);

Simpler and no code duplication.
Attachment #605866 - Flags: review?(lucasr.at.mozilla) → review-
Comment on attachment 605875 [details] [diff] [review]
(2/2) Sort by new frencency calculation

Review of attachment 605875 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.

::: mobile/android/base/db/LocalBrowserDB.java
@@ +132,5 @@
> +        // Since we're limited by the math we can do with sqlite, we're calculating this
> +        // approximation using the Cauchy distribution: multiplier = 15^2 / (age^2 + 15^2).
> +        // Using 15 as our scale parameter, we get a constant 15^2 = 225. Following this math,
> +        // frecencyScore = numVisits * max(1, 100 * 225 / (age*age + 225)). (See bug 704977)
> +        String age = "(" + History.DATE_LAST_VISITED + " - " + System.currentTimeMillis() + ") / 86400000";

final String
Attachment #605875 - Flags: review?(lucasr.at.mozilla) → review+
Before I forget, please file a follow-up bug to write tests for the frecency code.
Depends on: 736171
Thanks for the DBUtils tip. That makes this nicer.
Attachment #605866 - Attachment is obsolete: true
Attachment #606296 - Flags: review?(lucasr.at.mozilla)
Attachment #605875 - Attachment is obsolete: true
I wasn't sure if there was a way to factor out the age calculation like I did in my other patch, so I just did it inline here.
Attachment #606298 - Flags: review?(gpascutto)
Comment on attachment 606296 [details] [diff] [review]
(1/3) Clean up filterAllSites (v2)

Review of attachment 606296 [details] [diff] [review]:
-----------------------------------------------------------------

Nice.
Attachment #606296 - Flags: review?(lucasr.at.mozilla) → review+
Comment on attachment 606298 [details] [diff] [review]
(3/3) Update frecency calcuation in profile migrator

Review of attachment 606298 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/ProfileMigrator.java
@@ +152,5 @@
>          "       places.title            AS p_title, "     +
>          "       MAX(history.visit_date) AS h_date, "      +
>          "       COUNT(*) AS h_visits, "                   +
>          // see BrowserDB.filterAllSites for this formula
> +        "       MAX(1, 100 * 225 / (" + 

nit: trailing whitespace

@@ +446,5 @@
>              try {
>                  final String[] queryParams = new String[] {
>                      /* current time */
>                      Long.toString(System.currentTimeMillis()),
> +                    Long.toString(System.currentTimeMillis()),

nit: factor this out. Although it doesn't matter, there's subtleties here like the times not necessarily being equal.
Attachment #606298 - Flags: review?(gpascutto) → review+
Comment on attachment 606296 [details] [diff] [review]
(1/3) Clean up filterAllSites (v2)

[Approval Request Comment]
Mobile-only. Blocking fennec-1.0.
Attachment #606296 - Flags: approval-mozilla-aurora?
Comment on attachment 606297 [details] [diff] [review]
(2/3) Sort by new frencency calculation (rebased)

[Approval Request Comment]
Mobile-only. Blocking fennec-1.0.
Attachment #606297 - Flags: approval-mozilla-aurora?
Comment on attachment 606298 [details] [diff] [review]
(3/3) Update frecency calcuation in profile migrator

[Approval Request Comment]
Mobile-only. Blocking fennec-1.0.
Attachment #606298 - Flags: approval-mozilla-aurora?
Comment on attachment 606296 [details] [diff] [review]
(1/3) Clean up filterAllSites (v2)

[Triage Comment]
Mobile only - approved for Aurora 13.
Attachment #606296 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #606297 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #606298 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: