Replace "buckets-and-sampling" frecency algorithm with "exponential decay" algorithm
Categories
(Toolkit :: Places, enhancement, P3)
Tracking
()
People
(Reporter: jruderman, Unassigned)
References
()
Details
(Whiteboard: [MemShrink:P3][Snappy:P2])
Comment 1•14 years ago
|
||
| Reporter | ||
Updated•14 years ago
|
Comment 2•14 years ago
|
||
Updated•14 years ago
|
Comment 3•14 years ago
|
||
Updated•13 years ago
|
Updated•9 years ago
|
Comment 4•7 years ago
•
|
||
While evaluating whether to do this in https://github.com/mozilla/application-services, I took the time to elaborate several cases that I found vague or confusing in https://wiki.mozilla.org/User:Jesse/NewFrecency.
https://github.com/mozilla/application-services/issues/610#issuecomment-459908788
- while the old frecency also adapts to pages removals (by requesting a recalculation), this won't. Bumping down an unwanted match by the right amount may be harder.
This is addressed explicitly in that comment, since it's a big concern for sync. Inserting visits that took place in the past too.
- while the old frecency adapts to bonus changes (for example if a page is unstarred, its frecency gets bumped down) this won't. More generically, the bonuses (bookmarked, typed, ...) handling has to be figured out yet.
and
- while the old frecency can be rebuilt at any time, starting from the data in the database, this one can't be rebuilt, since it may be based on past and no more existing visits. A guessed one could be rebuilt with available data, but it may not be exactly the same.
Not listed explicitly, but at all times (with the exception of expiration, which can be handled in the case for bonus changes, but not for the 'rebuild everything' case) the frec_date column of each places entry should be freq_score_to_freq_date(sum(score_modifier_for_visit(visit.visit_type, visit.visit_date, ...))) (those functions are defined in that github link).
Then, for the 'rebuild' case it's just a matter of executing that in a query for all rows. For the bonus case change, this just means you need to subtract the values for the visits that have their bonus changed, and add the new values.
Edit:
but it may not be exactly the same.
It should be the same if you ignore
- expiration
- floating point roundoff
- weird timestamp shenanigans.
(Note that I'm not arguing this should be a higher priority or anything, just wanted to clarify those points now that I've gone through the trouble of figuring some of these details out)
Updated•3 years ago
|
Comment 5•3 years ago
|
||
This is a wontfix in this initial shape, but we are still planning to experiment with how Frecency is calculated, and introduce new algorithms pretty soon.
The reasons to not just use this were multiple: from storing exponentially growing values in the db, to keeping into account removals and bonuses.
In particular the idea of using times in the future to avoid decay is really sound and we're likely to use it, we'll also use some part of an exponential decay with a half-life concept.
We'll use new bugs for experiments implementation.
Description
•