frecency algorithm is high-maintenance

RESOLVED FIXED in mozilla1.9.2a1

Status

()

Toolkit
Places
RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: dietrich, Unassigned)

Tracking

({meta})

Trunk
mozilla1.9.2a1
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 obsolete attachment)

(Reporter)

Description

9 years ago
jesse had an idea (https://bugzilla.mozilla.org/show_bug.cgi?id=449124#c2):

> I'm guessing the frequent computation and database updates are necessary
> because of the use of "buckets"
> (http://developer.mozilla.org/en/docs/The_Places_frecency_algorithm).  What if
> the value of a visit decayed continuously (e.g. 3% per day) instead?  Then the
> total value could be stored as "the time at which the value was or will be 100"
> and no updating would be necessary except upon a new visit.

doing something like this would reduce cpu usage, db fragmentation and filesystem hits.

Comment 1

9 years ago
When you say "per day" do you mean.. 1) keep track of the last time we decayed everything then 2) check for that last decay time on idle and see if the timediff is over 24 hours, do a single decay then 3) update last time decayed.

I suppose we could just do a plain |UPDATE moz_places SET frecency = frecency * .95 WHERE frecency > 0| without moz_places_temp because we're idle anyway.

Comment 2

9 years ago
The only issue here is that on migrate, we currently leave things as uncalculated frecencies -1, and those would show up lower in the list until they're visited by the user.

Alternatively, we can be more aggressive on migrate and calculate everything as a one time hit.
(In reply to comment #1)
> I suppose we could just do a plain |UPDATE moz_places SET frecency = frecency *
> .95 WHERE frecency > 0| without moz_places_temp because we're idle anyway.
We need to get away from that because of bugs like bug 449124.

Comment 4

9 years ago
I was referring to the new "24 hour idle check" not the current "1 minute idle check".

Comment 5

9 years ago
Created attachment 343994 [details] [diff] [review]
proposal v1

Here's a quick stab at what I was describing earlier if you'd like to take a look dietrich. It consists of 2 patches:

1) remove the existing idle stuff
2) add in a once-a-day-on-idle notification that triggers navhistory to decay stuff
Attachment #343994 - Flags: review?(dietrich)

Comment 6

9 years ago
Comment on attachment 343994 [details] [diff] [review]
proposal v1

>-// Perform expiration after 5 minutes of idle time, repeating.
>-// 5 minutes = 300 seconds = 300000 milliseconds
>-#define EXPIRE_IDLE_TIME_IN_MSECS (300000)
>-// Amount of items to expire at idle time.
>-#define MAX_EXPIRE_RECORDS_ON_IDLE 200
>-  if (idleTime > EXPIRE_IDLE_TIME_IN_MSECS) {
>-    (void)mExpire.ExpireItems(MAX_EXPIRE_RECORDS_ON_IDLE, &dummy);

Oh, I forgot to mention that even without this expire-up-to-200-every-5-minutes thing, there still exists a expire-up-to-6-pages-every-page-visit thing happening.

We could get some of the original behavior by expiring a large number of pages on the once-a-day-idle notification.

There's still the issue of -1 frecency pages staying as -1 until visited.
i think that the original idea from jesse was to only hold the bonus values into frecency, but add a new last_visit_date column, when getting values from the db we should then calculate the frecency as (frecency - frecency * 0.03 * (today - last_visit_date)), this way we should only update last_visit_date on new visit, and bonus on itemAdded/removed.

apart that, probably this kind of stuff should be done with an async statement in chunks rather than on idle, but since it's once per day should be better than actual behaviour.

Comment 8

9 years ago
(In reply to comment #7)
> i think that the original idea from jesse was to only hold the bonus values
> into frecency, but add a new last_visit_date column, when getting values from
> the db we should then calculate the frecency as (frecency - frecency * 0.03 *
> (today - last_visit_date)), this way we should only update last_visit_date on
> new visit, and bonus on itemAdded/removed.
That sounds less than ideal because there would no longer be an index to sort on to quickly fetch results and we'll have to dynamically compute the ranking for each query.

> apart that, probably this kind of stuff should be done with an async statement
> in chunks rather than on idle, but since it's once per day should be better
> than actual behaviour.
Sounds like a good idea similar to the async autocomplete. Async expire and no need for partial expire timers and such.
(In reply to comment #8)
> That sounds less than ideal because there would no longer be an index to sort
> on to quickly fetch results and we'll have to dynamically compute the ranking
> for each query.

yes that's an issue, i only wanted to point to what i thought it was the original idea (or what i got from that).

For invalid frecencies (-1) we should probably have a timer that updates a bunch of them every X minutes. I think we can obtain something like that with what i've posted here: https://bugzilla.mozilla.org/show_bug.cgi?id=431558#c15
or we could prepare a bunch of UPDATE statements and pass them to dbConn->executeAsync

Updated

9 years ago
Attachment #343994 - Flags: review?(dietrich) → review?(edilee)

Updated

9 years ago
Depends on: 476254

Updated

9 years ago
Depends on: 476297

Updated

9 years ago
Depends on: 476298

Updated

9 years ago
Depends on: 476299

Comment 11

9 years ago
I've hijacked this bug and made it a meta bug for the 3-part fix of..

bug 476297 bug 476298 bug 476299
Component: Places → Places
Keywords: meta
Product: Firefox → Toolkit
QA Contact: places → places
Target Milestone: --- → mozilla1.9.2a1

Updated

9 years ago
Attachment #343994 - Attachment is obsolete: true
Attachment #343994 - Flags: review?(edilee)

Updated

9 years ago
Blocks: 449124

Comment 12

9 years ago
All dependent bugs are fixed, so now we have an idle-daily notification that triggers recalculation of invalid frecencies as well as estimation of valid ones.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.