Closed Bug 704025 (newfrecency) Opened 13 years ago Closed 1 year ago

Replace "buckets-and-sampling" frecency algorithm with "exponential decay" algorithm

Categories

(Toolkit :: Places, enhancement, P3)

Tracking

RESOLVED WONTFIX

People

(Reporter: jruderman, Unassigned)

Details

(Whiteboard: [MemShrink:P3][Snappy:P2])

Using exponential decay would reduce memory use when adding a visit, remove the need for the idle-daily recomputation, and probably improve awesomebar results.

Details: https://wiki.mozilla.org/User:Jesse/NewFrecency
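
For readers who don't want to click through, here is a minimal sketch of the exponential-decay idea: each visit contributes a weight that halves every fixed half-life, and a page's frecency is the sum of its visits' decayed contributions. The half-life and the weights below are assumed placeholders, not values from the proposal, and Rust is used only for illustration.

    // Minimal sketch of exponential-decay frecency. Constants are assumptions,
    // not values from the wiki proposal.
    const HALF_LIFE_DAYS: f64 = 30.0;

    /// Contribution of one visit of weight `weight` that happened `age_days` ago.
    fn visit_score(weight: f64, age_days: f64) -> f64 {
        weight * 0.5f64.powf(age_days / HALF_LIFE_DAYS)
    }

    /// Frecency of a page: the sum of its (weight, age_days) visits' decayed scores.
    fn frecency(visits: &[(f64, f64)]) -> f64 {
        visits.iter().map(|&(weight, age)| visit_score(weight, age)).sum()
    }

    fn main() {
        // A weighted visit from yesterday outweighs several plain visits from months ago.
        let visits = [(2.0, 1.0), (1.0, 90.0), (1.0, 120.0), (1.0, 150.0)];
        println!("frecency = {:.3}", frecency(&visits));
    }

Because every page decays by the same factor as time passes, relative order only changes when new visits arrive, which is what removes the need for periodic recomputation.
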
From a MemShrink point of view this doesn't seem like a huge deal, especially with bugs like bug 703427 possibly in the pipeline, so we've made it a MemShrink:P2.

It does sound like a good idea in general -- the current algorithm is very ad hoc.  But it's not clear to me who would implement it and how it would be evaluated.
Whiteboard: [MemShrink][Snappy] → [MemShrink:P2][Snappy]
The advantages of this approach would be many:
- The current pain point when adding visits is frecency recalculation, so this may speed up visit addition.
- The recalculation on idle-daily may be ineffective for users who never hit idle (mobile users, for example, but other categories may exist).
- It's cheap, so it would be suitable for all platforms, including Mobile.
- Expiring old visits cannot "regress" frecency.

There are also downsides:
- It needs careful testing to check how well its results match the old algorithm's.
- While the old frecency also adapts to page removals (by requesting a recalculation), this won't; bumping down an unwanted match by the right amount may be harder.
- While the old frecency adapts to bonus changes (for example, if a page is unstarred its frecency gets bumped down), this won't. More generally, how the bonuses (bookmarked, typed, ...) should be handled has yet to be figured out.
- While the old frecency can be rebuilt at any time from the data in the database, this one can't, since it may be based on past visits that no longer exist. An approximation could be rebuilt from the available data, but it may not be exactly the same.
Alias: newfrecency
Marking as [Snappy:P2] because I don't think this is a huge source of pauses.
Whiteboard: [MemShrink:P2][Snappy] → [MemShrink:P2][Snappy:P2]
Whiteboard: [MemShrink:P2][Snappy:P2] → [MemShrink:P3][Snappy:P2]
Priority: -- → P3

While evaluating whether to do this in https://github.com/mozilla/application-services, I took the time to work through several cases that I found vague or confusing in https://wiki.mozilla.org/User:Jesse/NewFrecency.

https://github.com/mozilla/application-services/issues/610#issuecomment-459908788

  • While the old frecency also adapts to page removals (by requesting a recalculation), this won't; bumping down an unwanted match by the right amount may be harder.

This is addressed explicitly in that comment, since it's a big concern for sync, as is inserting visits that took place in the past.

  • While the old frecency adapts to bonus changes (for example, if a page is unstarred its frecency gets bumped down), this won't. More generally, how the bonuses (bookmarked, typed, ...) should be handled has yet to be figured out.

and

  • While the old frecency can be rebuilt at any time from the data in the database, this one can't, since it may be based on past visits that no longer exist. An approximation could be rebuilt from the available data, but it may not be exactly the same.

Not listed explicitly, but at all times (with the exception of expiration, which can be handled in the bonus-change case but not in the 'rebuild everything' case) the frec_date column of each places entry should equal freq_score_to_freq_date(sum(score_modifier_for_visit(visit.visit_type, visit.visit_date, ...))) (those functions are defined in that GitHub link).

Then, for the 'rebuild' case it's just a matter of executing that in a query over all rows. For the bonus-change case, you just need to subtract the old values for the visits whose bonus changed and add the new values.
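
To make the two cases concrete, here is a hypothetical sketch in Rust. The bodies of score_modifier_for_visit and freq_score_to_freq_date below are stand-ins for the definitions in the linked GitHub comment (which aren't reproduced here), and the half-life constant and the Visit fields are assumptions for illustration only.

    // Hypothetical stand-ins; names follow the comment above, bodies are assumed.
    const HALF_LIFE_DAYS: f64 = 30.0;

    struct Visit {
        visit_type_bonus: f64, // assumed per-type weight (typed, bookmarked, ...)
        visit_date_days: f64,  // visit time, in days since some epoch
    }

    // Assumed: the score a single visit contributes, decayed to `now_days`.
    fn score_modifier_for_visit(v: &Visit, now_days: f64) -> f64 {
        v.visit_type_bonus * 0.5f64.powf((now_days - v.visit_date_days) / HALF_LIFE_DAYS)
    }

    // Assumed: map a score measured at `now_days` to the future date stored in frec_date.
    fn freq_score_to_freq_date(score: f64, now_days: f64) -> f64 {
        now_days + HALF_LIFE_DAYS * score.log2()
    }

    // Rebuild case: recompute frec_date for one page from all of its visits,
    // i.e. what the per-row query described above would do.
    fn rebuild_frec_date(visits: &[Visit], now_days: f64) -> f64 {
        let score: f64 = visits.iter().map(|v| score_modifier_for_visit(v, now_days)).sum();
        freq_score_to_freq_date(score, now_days)
    }

    // Bonus-change case: subtract the affected visits' old contributions and
    // add their new ones, instead of rebuilding from scratch.
    fn apply_bonus_change(score_now: f64, old: &[Visit], new: &[Visit], now_days: f64) -> f64 {
        let removed: f64 = old.iter().map(|v| score_modifier_for_visit(v, now_days)).sum();
        let added: f64 = new.iter().map(|v| score_modifier_for_visit(v, now_days)).sum();
        score_now - removed + added
    }

    fn main() {
        let now = 130.0;
        let visits = [
            Visit { visit_type_bonus: 2.0, visit_date_days: 100.0 }, // e.g. a typed visit
            Visit { visit_type_bonus: 1.0, visit_date_days: 40.0 },
        ];
        println!("frec_date = {:.2}", rebuild_frec_date(&visits, now));
        // If the first visit loses its bonus (2.0 -> 1.0), adjust incrementally.
        let old = [Visit { visit_type_bonus: 2.0, visit_date_days: 100.0 }];
        let new = [Visit { visit_type_bonus: 1.0, visit_date_days: 100.0 }];
        let score: f64 = visits.iter().map(|v| score_modifier_for_visit(v, now)).sum();
        let adjusted = apply_bonus_change(score, &old, &new, now);
        println!("adjusted frec_date = {:.2}", freq_score_to_freq_date(adjusted, now));
    }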

Edit:

but it may not be exactly the same.

It should be the same if you ignore

  • expiration
  • floating point roundoff
  • weird timestamp shenanigans.

(Note that I'm not arguing this should be a higher priority or anything, just wanted to clarify those points now that I've gone through the trouble of figuring some of these details out)

Severity: normal → S3

This is a WONTFIX in its initial shape, but we are still planning to experiment with how frecency is calculated and to introduce new algorithms fairly soon.
There were multiple reasons not to use this as-is, from storing exponentially growing values in the database to taking removals and bonuses into account.
In particular, the idea of using times in the future to avoid decay is really sound and we're likely to use it; we'll also use some form of exponential decay with a half-life concept.
We'll use new bugs for the implementation of the experiments.
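
To illustrate the "times in the future" idea (with an assumed half-life; none of this is a committed design): if every page's score decays with the same half-life, storing the future day at which the score would reach 1.0 preserves the relative ordering of pages forever, so rows never need to be rewritten just because time passed, and the decayed score can still be recovered on demand.

    // Sketch of the decay-free encoding. The half-life is an assumed constant.
    const HALF_LIFE_DAYS: f64 = 30.0;

    /// Map a score measured at `now_days` to the day it would decay to 1.0.
    fn score_to_date(score_now: f64, now_days: f64) -> f64 {
        now_days + HALF_LIFE_DAYS * score_now.log2()
    }

    /// Recover the decayed score at any later time from the stored date.
    fn date_to_score(frec_date: f64, at_days: f64) -> f64 {
        0.5f64.powf((at_days - frec_date) / HALF_LIFE_DAYS)
    }

    fn main() {
        let (now, later) = (0.0, 45.0);
        let (a, b) = (score_to_date(8.0, now), score_to_date(3.0, now));
        // The stored dates keep the same order as the live scores, 45 days later.
        assert!(a > b);
        assert!(date_to_score(a, later) > date_to_score(b, later));
        println!("score of A after 45 days: {:.3}", date_to_score(a, later));
    }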

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX