Closed Bug 1164660 Opened 9 years ago Closed 9 years ago

Sync doesn't sync enough visits

Categories

(Firefox :: Sync, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox40 --- fixed
firefox41 --- fixed

People

(Reporter: rnewman, Assigned: rnewman)

References

Details

Attachments

(1 file)

Desktop Firefox syncs a very limited set of history:

* Within the last 30 days:

  getAllIDs: function HistStore_getAllIDs() {
    // Only get places visited within the last 30 days (30*24*60*60*1000ms)
    this._allUrlStm.params.cutoff_date = (Date.now() - 2592000000) * 1000;

* No more than 5000 items.

* No more than ten visits per item:

      "SELECT visit_type type, visit_date date " +
      "FROM moz_historyvisits " +
      "WHERE place_id = (SELECT id FROM moz_places WHERE url = :url) " +
      "ORDER BY date DESC LIMIT 10");


This obviously results in a very warped frecency for new clients; they'll see no more than ten visits for any history items, so your top sites on your desktop won't be top on the new device.


We should dramatically increase that visit limit.


Additionally, the visit format makes deletion of individual visits -- as you might expect from Clear Recent History -- utterly impossible: these short lists of visits are 'open world', where absence does not imply non-existence.

Making these records comprehensive is probably impossible, so we should consider adding a virtual 'VISIT_TYPE_DELETED' to allow for this signaling.
CC mak FYSA.
could you please better clarify what that TYPE_DELETED thing would do? I'm not sure I got it.
Right now you can't delete an individual visit via Sync -- you can only delete every visit (including those in the future!) by deleting the record.

This would be a fake type that Sync would understand; when applying a record with a list of visits, the ones with a normal type would be added as usual, and the ones with the special 'deleted' type code would instead be *removed* if locally present.
Trivial change from 10 to 20.

Notably this will increase the size of some history records... but not by as much as you might think.

My profile has 104,000 visited places. 670 of those have more than 10 visits, of which about 450 have more than 15, and only 250 are 20 or more. That is: this will only affect top sites, which is where the value is!

My profile only has 250 places with more than 10 visits with any visits in the last sixty days, which is the range we sync. That's one quarter of one percent of my DB.

For a user with 5,000 history items on the server, we might expect fewer than 50 records to grow, probably to a maximum of a few hundred (because frequent visits will be uploaded more). They'll grow by no more than about 450 bytes each, so the typical impact will be approximately 15KB per user.

Worst case -- 250 places with 20 visits each -- that's an extra 112KB.

CCing bobm and rfkelly FYSA re storage impact.
Attachment #8615751 - Flags: review?(mak77)
Attachment #8615751 - Flags: feedback?(rfkelly)
Attachment #8615751 - Flags: feedback?(bobm)
Assignee: nobody → rnewman
Status: NEW → ASSIGNED
Comment on attachment 8615751 [details] [diff] [review]
Sync more visits. v1

Review of attachment 8615751 [details] [diff] [review]:
-----------------------------------------------------------------

Seems sane to me from storage POV, but I'll defer to :bobm's opinion on these matters
Attachment #8615751 - Flags: feedback?(rfkelly) → feedback+
Comment on attachment 8615751 [details] [diff] [review]
Sync more visits. v1

Review of attachment 8615751 [details] [diff] [review]:
-----------------------------------------------------------------

So, frecency calculation considers only the last N visits where N is defined by places.frecency.numVisits pref, that defaults to 10, and I suppose this is the reason why 10 was chosen in the past.
But those are only considered to calculate bonuses, indeed we also considers last visit date (recency) and visit count (frequency).

The increase of synced visits from 10 to 20 will somehow help, by reducing the number of pages that will fight for the first position in the awesomebar.
I have 286 entries with more than 20 visits, so this sort-of confirms the data for your profile (interesting stats!).
Basically instead of having 600 pages trying to get my attention, it will be "just" 200.

This is still a bad approximation of frecency.
The accuracy could clearly be improved further. In an ideal world we should not flatten on a number, but rather try to reproduce the visits distribution. Basically take the distribution of visits for the records you want to migrate, then translate it to the max number of visits you can sync, and recalculate the limit per each page.
It should be possible to build a simple equation that represents the distribution of the first N most visited pages, and then just pass in the original visit_count of a page to get out a squashed value (capped between 1 and max_visits_we_can_store). This would only flatten frecency of pages that fall out of the caps, so to be accurate the cap should be a little bit higher.

But I agree that is likely not what you were looking for here, so I'll leave the brainstorming to a follow-up.

For now we can take this simple change.

Should you also update the query in tps accordingly, or are we fine testing with a smaller limit?
http://mxr.mozilla.org/mozilla-central/source/services/sync/tps/extensions/tps/resource/modules/history.jsm#66
Attachment #8615751 - Flags: review?(mak77) → review+
Comment on attachment 8615751 [details] [diff] [review]
Sync more visits. v1

Approval Request Comment
[Feature/regressing bug #]:
  None.

[User impact if declined]:
  Top sites and frecency will be less useful.

[Describe test coverage new/current, TreeHerder]:
  None.

[Risks and why]: 
  Low risk: we will simply upload more visits for a user's most visited sites. This can dramatically improve frecency calculations and top sites on Android and iOS (and desktop, too!), particularly for new devices.

[String/UUID change made/needed]:
  None.
Attachment #8615751 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/2dacc2035aad
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Comment on attachment 8615751 [details] [diff] [review]
Sync more visits. v1

OK, should be safe and improve the user experience.
Attachment #8615751 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Richard Newman [:rnewman] from comment #4)
> frequent visits will be uploaded more). They'll grow by no more than about
> 450 bytes each, so the typical impact will be approximately 15KB per user.
> 
> Worst case -- 250 places with 20 visits each -- that's an extra 112KB.
> 
> CCing bobm and rfkelly FYSA re storage impact.

With the current user distribution, the worst case scenario would be 2% more storage per node.  That shouldn't be enough to cause any problems.  If the impact is mostly typical, I don't think we'll notice the change.
Attachment #8615751 - Flags: feedback?(bobm) → feedback+
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: