Closed Bug 458801 Opened 11 years ago Closed 11 years ago

frecency algorithm is high-maintenance

Categories

(Toolkit :: Places, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: dietrich, Unassigned)

References

Details

(Keywords: meta)

Attachments

(1 obsolete file)

jesse had an idea (https://bugzilla.mozilla.org/show_bug.cgi?id=449124#c2):

> I'm guessing the frequent computation and database updates are necessary
> because of the use of "buckets"
> (http://developer.mozilla.org/en/docs/The_Places_frecency_algorithm).  What if
> the value of a visit decayed continuously (e.g. 3% per day) instead?  Then the
> total value could be stored as "the time at which the value was or will be 100"
> and no updating would be necessary except upon a new visit.

doing something like this would reduce cpu usage, db fragmentation and filesystem hits.
When you say "per day" do you mean.. 1) keep track of the last time we decayed everything then 2) check for that last decay time on idle and see if the timediff is over 24 hours, do a single decay then 3) update last time decayed.

I suppose we could just do a plain |UPDATE moz_places SET frecency = frecency * .95 WHERE frecency > 0| without moz_places_temp because we're idle anyway.
The only issue here is that on migrate, we currently leave things as uncalculated frecencies -1, and those would show up lower in the list until they're visited by the user.

Alternatively, we can be more aggressive on migrate and calculate everything as a one time hit.
(In reply to comment #1)
> I suppose we could just do a plain |UPDATE moz_places SET frecency = frecency *
> .95 WHERE frecency > 0| without moz_places_temp because we're idle anyway.
We need to get away from that because of bugs like bug 449124.
I was referring to the new "24 hour idle check" not the current "1 minute idle check".
Attached patch proposal v1 (obsolete) — Splinter Review
Here's a quick stab at what I was describing earlier if you'd like to take a look dietrich. It consists of 2 patches:

1) remove the existing idle stuff
2) add in a once-a-day-on-idle notification that triggers navhistory to decay stuff
Attachment #343994 - Flags: review?(dietrich)
Comment on attachment 343994 [details] [diff] [review]
proposal v1

>-// Perform expiration after 5 minutes of idle time, repeating.
>-// 5 minutes = 300 seconds = 300000 milliseconds
>-#define EXPIRE_IDLE_TIME_IN_MSECS (300000)
>-// Amount of items to expire at idle time.
>-#define MAX_EXPIRE_RECORDS_ON_IDLE 200
>-  if (idleTime > EXPIRE_IDLE_TIME_IN_MSECS) {
>-    (void)mExpire.ExpireItems(MAX_EXPIRE_RECORDS_ON_IDLE, &dummy);

Oh, I forgot to mention that even without this expire-up-to-200-every-5-minutes thing, there still exists a expire-up-to-6-pages-every-page-visit thing happening.

We could get some of the original behavior by expiring a large number of pages on the once-a-day-idle notification.

There's still the issue of -1 frecency pages staying as -1 until visited.
i think that the original idea from jesse was to only hold the bonus values into frecency, but add a new last_visit_date column, when getting values from the db we should then calculate the frecency as (frecency - frecency * 0.03 * (today - last_visit_date)), this way we should only update last_visit_date on new visit, and bonus on itemAdded/removed.

apart that, probably this kind of stuff should be done with an async statement in chunks rather than on idle, but since it's once per day should be better than actual behaviour.
(In reply to comment #7)
> i think that the original idea from jesse was to only hold the bonus values
> into frecency, but add a new last_visit_date column, when getting values from
> the db we should then calculate the frecency as (frecency - frecency * 0.03 *
> (today - last_visit_date)), this way we should only update last_visit_date on
> new visit, and bonus on itemAdded/removed.
That sounds less than ideal because there would no longer be an index to sort on to quickly fetch results and we'll have to dynamically compute the ranking for each query.

> apart that, probably this kind of stuff should be done with an async statement
> in chunks rather than on idle, but since it's once per day should be better
> than actual behaviour.
Sounds like a good idea similar to the async autocomplete. Async expire and no need for partial expire timers and such.
(In reply to comment #8)
> That sounds less than ideal because there would no longer be an index to sort
> on to quickly fetch results and we'll have to dynamically compute the ranking
> for each query.

yes that's an issue, i only wanted to point to what i thought it was the original idea (or what i got from that).

For invalid frecencies (-1) we should probably have a timer that updates a bunch of them every X minutes. I think we can obtain something like that with what i've posted here: https://bugzilla.mozilla.org/show_bug.cgi?id=431558#c15
or we could prepare a bunch of UPDATE statements and pass them to dbConn->executeAsync
Attachment #343994 - Flags: review?(dietrich) → review?(edilee)
Depends on: 476254
Depends on: 476297
Depends on: 476298
Depends on: 476299
I've hijacked this bug and made it a meta bug for the 3-part fix of..

bug 476297 bug 476298 bug 476299
Keywords: meta
Product: Firefox → Toolkit
QA Contact: places → places
Target Milestone: --- → mozilla1.9.2a1
Attachment #343994 - Attachment is obsolete: true
Attachment #343994 - Flags: review?(edilee)
Blocks: 449124
All dependent bugs are fixed, so now we have an idle-daily notification that triggers recalculation of invalid frecencies as well as estimation of valid ones.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.