Closed Bug 1447891 Opened 6 years ago Closed 6 years ago

Top Sites keep changing dramatically

Categories

(Firefox :: New Tab Page, defect, P2)

RESOLVED FIXED
Firefox 61
Iteration:
61.4 - May 7
Tracking Status
firefox60 --- wontfix
firefox61 --- fixed

People

(Reporter: mossop, Assigned: Mardak)

It seems like over short periods of time there can be sudden dramatic shifts in the ordering of Top Sites that appear to not be related to the pages I visit.

All of a sudden a site that was previously in my Top Sites (1 row viewed normally) will be relegated to the second row for no apparent reason.

I'd guess that the vast majority of my page loads are my pinned tabs, twitter, facebook and reddit. I don't particularly notice how my pinned tabs move around in the Top Sites but the other three I sure do, particularly when they fall out of the first row.

For most of today reddit was in the 1st row. Somewhere in the middle if I remember right. Suddenly this evening it isn't. It's now the 4th item in the 2nd row. Above it are:

Wikipedia: 1 visit today
Home depot: Last visited three days ago
Phonebook: Last visited yesterday

Why would a site I visited earlier today suddenly drop below a site I haven't visited for three days?
What's the magnitude of your frecencies?

NewTabUtils.activityStreamLinks.getTopSites().then(v => console.log(v.map(s => [s.frecency, s.url]).join("\n")))

My history has the first few in the millions / 7 digits, then 5 more with 6 digits then last few with 5 digits. I haven't really seen my top sites change for a long time.

I wonder if synced data is causing some frecency recalculations?
This is what I get:

455000,https://irccloud.mozilla.com/#!/ircs://irc1.dmz.scl3.mozilla.com:6697/%23developers
192560,https://www.facebook.com/
178640,https://www.reddit.com/
176500,https://twitter.com/
155610,https://www.amazon.com/
128300,https://mozilla.slack.com/messages/C4D3JFF26/
76963,https://mozilla.github.io/mozpdx-lunch/
73620,https://www.washingtonpost.com/
72072,https://www.wikipedia.org/
66721,https://mail.mozilla.org/admindb/firefox-dev
66464,https://phonebook.mozilla.org/
65879,https://www.homedepot.com/

reddit wasn't showing in the first row after I logged that, then I went to settings and turned on the second row, came back, and then reddit was correctly showing in the 3rd spot in the 1st row.
I do sync with my mobile device, none of the sites that show up in the top sites on my android phone are listed in here.
Hrmm... I could see reddit/twitter swapping 3rd and 4th places, but having it disappear from the first row definitely seems odd.

Do you have pinned top sites -- in particular reddit? NewTabUtils.pinnedLinks.links.map((l, i) => [i, l && l.url]).join("\n")

At first I was thinking it could be related to bug 1422867 comment 3 where old Tiles pinned links could be pinned to a position out of view, but this seems to be the opposite…?

r1cky, andreio any ideas?
No pinned top sites
(In reply to Dave Townsend [:mossop] from comment #6)
> 516780,https://twitter.com/
> 179000,https://twitter.com/
> I'm surprised that the frecency can change so quickly.
Oh ho ho. Very suspicious. mak, ideas?

Maybe it is sync related. I believe that because not all history is synced, some subset is used to estimate the frecency value.
Flags: needinfo?(mak77)
I'm honestly not sure. I don't expect big jumps either, and I can't reproduce anything similar. Unfortunately we don't have good logging of Places actions, apart from MOZ_LOG storage, which produces huge logs. We should really have a circular buffer log in Places that one can enable through prefs.

The only things that come to my mind are:
1. expiration
   should remove very old pages, and frecency calculation only uses the last 10 visits, so it should not cause big changes, unless the database has tens of thousands of bookmark data that pushes away very recent history data
2. Sync
   Can surely cause a recalculation and if added visits are within the last 10 it can pump up or down frecency. Like, if most of the locally stored visits are "typed" and Sync adds recent "link" visits, the value is likely to go down much. Opposite, if most of the local visits are "link" and Sync adds recent "typed" visits, the value is likely to go up much.
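
A minimal sketch of the Sync scenario in point 2, in plain JavaScript. It is a toy model, not the Places implementation: it assumes a bonus of 100 for a "link" visit and 2000 for a "typed" visit (the figures quoted later in this bug), and that only the 10 most recent visits are sampled. The function name averageBonus is hypothetical.

```javascript
// Toy model only: bonuses and sample size are assumptions based on figures
// quoted later in this bug, not values read from the Places code.
const BONUS = { link: 100, typed: 2000 };
const SAMPLE_SIZE = 10;

// visits: array of { date: number, type: "link" | "typed" }
function averageBonus(visits) {
  const recent = [...visits]
    .sort((a, b) => b.date - a.date) // newest first
    .slice(0, SAMPLE_SIZE);
  const total = recent.reduce((sum, v) => sum + BONUS[v.type], 0);
  return total / recent.length;
}

// Ten local "typed" visits on days 1..10.
const local = Array.from({ length: 10 }, (_, i) => ({ date: i + 1, type: "typed" }));
// Five newer "link" visits arriving from another device on days 11..15.
const remote = Array.from({ length: 5 }, (_, i) => ({ date: 11 + i, type: "link" }));

const before = averageBonus(local);
const after = averageBonus([...local, ...remote]);
console.log(before, after); // 2000 1050
```

In this model the five synced link visits push five typed visits out of the sample, so the average bonus roughly halves without any change in local browsing.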

Do you have twitter in a pinned tab, or restored through session restore? IIRC we still don't store any visit if a page is restored, and then if you visit that page on mobile you'd likely get a remote visit quite newer than a local one. This may explain a grow, but I'm not sure how it would explain a sudden fall.

Sync has decent logging, so likely enabling that could be a good first step to see if there's a relation. Ask Thom or Kit for that.

Another interesting thing would be to check coherence of the reddit entry in the DB, you can use this snippet:

(async function() {
  let db = await PlacesUtils.promiseDBConnection();
  let rows = await db.execute(`
    SELECT h.visit_count, count(*),
           h.last_visit_date, MAX(v.visit_date),
           h.frecency, CALCULATE_FRECENCY(h.id)
    FROM moz_places h
    JOIN moz_historyvisits v ON v.place_id = h.id
    WHERE h.url_hash = hash(:url) AND h.url = :url
    GROUP BY h.id
  `, {url: "https://www.reddit.com/"});
  console.log(`count: ${rows[0].getResultByIndex(0)} == ${rows[0].getResultByIndex(1)}\n
               date: ${rows[0].getResultByIndex(2)} == ${rows[0].getResultByIndex(3)}\n
               frec: ${rows[0].getResultByIndex(4)} == ${rows[0].getResultByIndex(5)}`);
})();
Flags: needinfo?(mak77)
So, I'm looking at my db, I have various pages that apparently have very different frecency values from the ones that would normally be calculated:
https://www.reddit.com/r/Amd/new/ count: 637767 == 1195236
https://www.amazon.it/ count: 457758 == 1195236
https://www.facebook.com/ count: 293961 == 1195236
https://www.reddit.com/r/firefox/ count: 228919 == 1195236
off-hand looks like there's some common operation that can have a large impact on frecency but doesn't seem to care to update frecency. When something else runs and actually updates frecency, we move to the right value, that could be quite different.
Now, the hunting is open.
fwiw, expiration doesn't recalculate frecency, and the current frecency algo seems to depend a lot on visit_count.
A possible theory is that Sync adds a bunch of visits and increases frecency, then expiration kicks in because the db is at its maximum and removes some old visits, but doesn't recalculate frecency (it's expensive in general). When some other operation happens that causes a recalculate, the value falls to the right one.
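
The theory above can be sketched as a toy model. Everything here is illustrative: the class ToyPlace, its method names, and the "100 points per visit" score are assumptions made for the example, not Places behaviour.

```javascript
// Toy model of the staleness theory; not the Places implementation.
class ToyPlace {
  constructor() {
    this.visitCount = 0;
    this.storedFrecency = 0;
  }

  // Stand-in for the expensive recalculation.
  recalculate() {
    this.storedFrecency = this.visitCount * 100;
  }

  // In this model, Sync adds visits and recalculates immediately.
  addSyncedVisits(n) {
    this.visitCount += n;
    this.recalculate();
  }

  // Expiration removes visits but skips the recalculation.
  expireVisits(n) {
    this.visitCount = Math.max(0, this.visitCount - n);
  }
}

const place = new ToyPlace();
place.addSyncedVisits(50);
place.expireVisits(30);
const stale = place.storedFrecency; // still reflects 50 visits
place.recalculate();                // some later operation triggers this
const fresh = place.storedFrecency; // now reflects 20 visits
console.log(stale, fresh); // 5000 2000
```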
(In reply to Marco Bonardo [::mak] from comment #8)
> Do you have twitter in a pinned tab, or restored through session restore?
> IIRC we still don't store any visit if a page is restored, and then if you
> visit that page on mobile you'd likely get a remote visit quite newer than a
> local one. This may explain a grow, but I'm not sure how it would explain a
> sudden fall.

I don't pin twitter, just visit it a lot. I guess sometimes when I restart for app updates twitter is open so would get restored then.

> Another interesting thing would be to check coherence of the reddit entry in
> the DB, you can use this snippet:
> 
> (async function() {
>   let db = await PlacesUtils.promiseDBConnection();
>   let rows = await db.execute(`
>     SELECT h.visit_count, count(*),
>            h.last_visit_date, MAX(v.visit_date),
>            h.frecency, CALCULATE_FRECENCY(h.id)
>     FROM moz_places h
>     JOIN moz_historyvisits v ON v.place_id = h.id
>     WHERE h.url_hash = hash(:url) AND h.url = :url
>     GROUP BY h.id
>   `, {url: "https://www.reddit.com/"});
>   console.log(`count: ${rows[0].getResultByIndex(0)} == ${rows[0].getResultByIndex(1)}\n
>                date: ${rows[0].getResultByIndex(2)} == ${rows[0].getResultByIndex(3)}\n
>                frec: ${rows[0].getResultByIndex(4)} == ${rows[0].getResultByIndex(5)}`);
> })();

count: 620 == 620
date: 1521761409575878 == 1521761409575878
frec: 179800 == 179800

(In reply to Marco Bonardo [::mak] from comment #10)
> fwiw, expiration doesn't recalculate frecency, and the current frecency algo
> seems to depend a lot on visit_count.
> A possible theory is that Sync adds a bunch of visits and increases
> frecency, then expiration kicks in because the db is at its maximum and
> removes some old visits, but doesn't recalculate frecency (it's expensive in
> general). When some other operation happens that causes a recalculate, the
> value falls to the right one.

Not sure if it is useful data or not, but I basically never visit reddit or twitter on my phone. The top sites on my phone are a completely different set to those on desktop in that I basically never visit the top sites I have on my desktop on my phone and vice versa.
I noticed Twitter had jumped up to the top again, even though I haven't visited it today. Ran the DB query for it and got this:

count: 1805 == 1812
date: 1521786905015043 == 1521786905015043
frec: 1209350 == 1209350
Well, visit_count can be a little bit smaller than the real visit count, because we exclude a few visit types. Unfortunately that doesn't help.
The only clear thing so far is that frecency value can become stale and be updated after a while.

Btw, even if you don't visit twitter on your phone, you may have multiple profiles/computers/mac on the same firefox account?
(In reply to Marco Bonardo [::mak] from comment #13)
> Btw, even if you don't visit twitter on your phone, you may have multiple
> profiles/computers/mac on the same firefox account?

The only things I have signed into my Firefox account right now are my phone and desktop, one profile on each.
Ran into a case where reddit dropped out of top sites again:

477500,https://irccloud.mozilla.com/#!/ircs://irc1.dmz.scl3.mozilla.com:6697/%23developers
464310,https://www.facebook.com/
189100,https://twitter.com/
133200,https://mozilla.slack.com/messages/C4D3JFF26/
110210,https://www.amazon.com/
85647,https://www.wikipedia.org/
69985,https://www.washingtonpost.com/
68281,https://phonebook.mozilla.org/
63400,https://www.reddit.com/
62370,https://treeherder.mozilla.org/
61845,https://mozilla.github.io/mozpdx-lunch/
61060,https://www.homedepot.com/

count: 634 == 634
date: 1522093826365264 == 1522093826365264
frec: 63400 == 63400

I ran multiple syncs on my phone and desktop and it didn't change reddit's position.
I visited reddit once and the numbers changed a bunch:

477700,https://irccloud.mozilla.com/#!/ircs://irc1.dmz.scl3.mozilla.com:6697/%23developers
333600,https://www.facebook.com/
189400,https://twitter.com/
184440,https://www.reddit.com/
133200,https://mozilla.slack.com/messages/C4D3JFF26/
110210,https://www.amazon.com/
85647,https://www.wikipedia.org/
69985,https://www.washingtonpost.com/
68281,https://phonebook.mozilla.org/
62370,https://treeherder.mozilla.org/
61845,https://mozilla.github.io/mozpdx-lunch/
61060,https://www.homedepot.com/

count: 636 == 636
date: 1522096772053442 == 1522096772053442
frec: 184440 == 184440

BUT, reddit still isn't in the top row :s
(In reply to Dave Townsend [:mossop] from comment #16)
> BUT, reddit still isn't in the top row :s

It just appeared!
The delay in reddit moving back to position 4 is expected due to activity stream's caching.

The wildly different calculated frecency seems likely dependent on the type of visits. Here's a script that should print out the most recent 100 visit types:

PlacesUtils.promiseDBConnection().then(db => db.execute("SELECT GROUP_CONCAT(visit_type) FROM moz_places h JOIN moz_historyvisits v ON v.place_id = h.id WHERE url = 'https://www.reddit.com/'")).then(v => console.log(v[0].getResultByIndex(0).split(",").slice(-100).join(" ")))

If you typically type in the url / select from the address bar, most likely they'll be "2"s:
https://searchfox.org/mozilla-central/rev/003262ae12ce937950ffb8d3b0fa520d1cc38bff/toolkit/components/places/nsINavHistoryService.idl#1196-1248
(In reply to Dave Townsend [:mossop] from comment #15)
> count: 634 == 634
> frec: 63400 == 63400

(In reply to Dave Townsend [:mossop] from comment #16)
> count: 636 == 636
> frec: 184440 == 184440

The former seems to be visit count * 100 while the latter is visit count * 290. I believe frecency looks at each visit's type, so unless all visits to the page were link clicks, the former's * 100 seems to be wrong.
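
For anyone who wants to reproduce those two multipliers, here is a small sketch that divides the stored frecency by the visit count for the two reports quoted above. The helper name perVisitMultiplier is made up for this example.

```javascript
// (visit count, stored frecency) pairs copied from comments 15 and 16.
const samples = [
  { label: "comment 15", count: 634, frecency: 63400 },
  { label: "comment 16", count: 636, frecency: 184440 },
];

// Frecency points contributed per counted visit.
function perVisitMultiplier({ count, frecency }) {
  return frecency / count;
}

for (const s of samples) {
  console.log(s.label, perVisitMultiplier(s));
}
// comment 15 100
// comment 16 290
```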
> 2 1 1 1 1 1 2 2 1 1 1 2 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 2 2 1 2 1 2 1 2 1 2 1 1 1 1 2 2 1 2 2 1 2 1 2 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1

I normally visit by clicking the top sites, I assume that just counts as a link click. Is it possible that doing that slowly pushes off url bar visits causing it to suddenly drop and then require a url bar visit to go there causing it to jump up again?
(In reply to Dave Townsend [:mossop] from comment #20)
> 2 1 1 1 1 1 1 1 1 1 1 1 2 1
> I normally visit by clicking the top sites, I assume that just counts as a
> link click.
Oh well would you look at that ;)

https://searchfox.org/mozilla-central/rev/49cc27555d5b7ed2eb06baf156693dd578daa06f/toolkit/components/places/nsNavHistory.cpp#283
  , mNumVisitsForFrecency(10)

https://searchfox.org/mozilla-central/rev/49cc27555d5b7ed2eb06baf156693dd578daa06f/toolkit/components/places/SQLFunctions.cpp#638-643

Looks like it grabs the most recent 10 visits to determine the average visit type. And between your two most recent "2" visit types, there were 11 link click visits!
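
A rough way to see this is to slide a 10-visit window over the trimmed sequence quoted above and count how many typed visits are inside it after each visit. This is only a sketch: it assumes the sequence is ordered oldest first and that 1 means a link visit and 2 a typed visit. The function name is hypothetical.

```javascript
// Visit types from the trimmed quote above (assumed oldest first).
const visitTypes = "2 1 1 1 1 1 1 1 1 1 1 1 2 1".split(" ").map(Number);
const SAMPLE_SIZE = 10; // mirrors mNumVisitsForFrecency(10)
const TYPED = 2;

// For each point in time, count the typed visits among the most recent
// `size` visits recorded so far.
function typedInWindowOverTime(types, size = SAMPLE_SIZE) {
  return types.map((_, i) => {
    const sample = types.slice(Math.max(0, i + 1 - size), i + 1);
    return sample.filter(t => t === TYPED).length;
  });
}

console.log(typedInWindowOverTime(visitTypes).join(" "));
// 1 1 1 1 1 1 1 1 1 1 0 0 1 1
```

The two zeros are the points where the older typed visit has left the window and the next one has not happened yet, which is where the score would fall back to the plain link bonus.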

Should the sampling range be increased? Should activity stream treat top site clicks as TYPED… or some other transition type?
Flags: needinfo?(mak77)
Oh, for anyone wanting to double check the math :p

9 * 100 (regular link visits) + 1 * 2000 (typed bonus) -> 290 average bonus matching up with comment 16's frecency score.
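
The same arithmetic as a runnable sketch. This is a simplified model that multiplies the visit count by the average bonus of the sampled visits. It ignores recency weighting and anything else the real calculation may take into account, and estimateFrecency is a name invented for the example.

```javascript
// Simplified model of the arithmetic above; not the real Places algorithm.
const LINK = 1;
const TYPED = 2;
const BONUS = { [LINK]: 100, [TYPED]: 2000 };

// visitCount: total counted visits; sampledTypes: types of the sampled visits.
function estimateFrecency(visitCount, sampledTypes) {
  const total = sampledTypes.reduce((sum, t) => sum + BONUS[t], 0);
  return Math.round(visitCount * total / sampledTypes.length);
}

const nineLinksOneTyped = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2];
const tenLinks = Array(10).fill(LINK);

console.log(estimateFrecency(636, nineLinksOneTyped)); // 184440, as in comment 16
console.log(estimateFrecency(634, tenLinks));          // 63400, as in comment 15
```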
From the activity stream side, I suppose we would need to send a message to main to call something like markPageAsTyped:

https://searchfox.org/mozilla-central/rev/49cc27555d5b7ed2eb06baf156693dd578daa06f/browser/components/places/PlacesUIUtils.jsm#419-428

And then navigate instead of allowing it to do a normal link click navigation. (Or maybe if we allow navigation and send a message and hopefully it sets the "next" type sooner than the actual visit…)
(In reply to Ed Lee :Mardak from comment #23)
> From the activity stream side, I suppose we would need to send a message to
> main to call something like markPageAsTyped:
> And then navigate instead of allowing it to do a normal link click
> navigation. (Or maybe if we allow navigation and send a message and
> hopefully it sets the "next" type sooner than the actual visit…)

Over the years "typed" has grown a very relaxed definition. It pretty much means the user did one of: typed the url, clicked from a Places view (like the history menu or sidebar), or used paste & go.
Additionally we have bug 1330343, that pretty much causes us to mark as typed anything from the urlbar.
At this point I'd say "typed" pretty much means "revisited from the UI", while "link" is just a link click from a page. As such it makes sense for Activity Stream to use it, but it should do something similar to _openNodeIn, where it sets the visit as "typed" or "bookmark" appropriately.
markPageAs must be invoked before history actually adds the visit, it's a synchronous call that just adds to a memory hash table.
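
To make the ordering constraint concrete, here is a toy stand-in for a history service. The class ToyHistory and its methods are hypothetical and only mimic the behaviour described above (a synchronous hint stored in memory and consumed when the visit is recorded). It is not the Places API.

```javascript
// Toy stand-in; names are hypothetical and only illustrate call ordering.
class ToyHistory {
  constructor() {
    this.pendingTyped = new Set(); // in-memory hints, keyed by URL
    this.visits = [];
  }

  // Synchronous: remembers that the next visit to `url` should be "typed".
  markPageAsTyped(url) {
    this.pendingTyped.add(url);
  }

  // Called when the navigation is actually recorded; consumes the hint.
  addVisit(url) {
    const type = this.pendingTyped.delete(url) ? "typed" : "link";
    this.visits.push({ url, type });
    return type;
  }
}

const toy = new ToyHistory();

// Correct order: mark first, then navigate.
toy.markPageAsTyped("https://www.reddit.com/");
const first = toy.addVisit("https://www.reddit.com/");

// Wrong order: the visit is recorded before the hint arrives.
const second = toy.addVisit("https://twitter.com/");
toy.markPageAsTyped("https://twitter.com/");

console.log(first, second); // typed link
```

In this sketch a hint that arrives late stays in the set and would be applied to the following visit instead, which is why sending the message and hoping it wins the race looks fragile.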
Flags: needinfo?(mak77)
(In reply to Ed Lee :Mardak from comment #21)
> Should the sampling range be increased?

Probably not. Not because it wouldn't be good, but because frecency is expensive enough as-is, we can't increase its costs.
In the past Jesse proposed a different implementation of a monotonic frecency (https://wiki.mozilla.org/User:Jesse/NewFrecency), that would be much cheaper, but it requires analysis and on-the-field comparison with the current frecency to understand if it fits our necessities. We never found the time/resources for that, but the frecency costs are an evergreen problem.
Iteration: --- → 61.4 - May 7
Priority: -- → P2
Is there any hope to have AS use markAsTyped in 61? Some users seem to notice frecency becoming less useful recently, and this sounds like part of the problem.
The intention is to get it in during the 61.4 iteration, focusing on more direct-UI changes in earlier iterations.
Assignee: nobody → edilee
Commit pushed to master at https://github.com/mozilla/activity-stream

https://github.com/mozilla/activity-stream/commit/d43f1404071158033527d080359691f8784c01b4
fix(topsites): Give all top site link clicks a typed frecency bonus (#4119)

Fix Bug 1447891 - Top Sites keep changing dramatically
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Blocks: 1457192
Component: Activity Streams: Newtab → New Tab Page