I have a page that I viewed once on my device. It 404ed on my second attempt to load it. It's now my number one top site. Why? I'm going to look into this, but might bump this to someone with more time.
Questions:

Why does this get a frecency of 28?

6540 https://smile.amazon.com/gp/dmusic/cloudplayer/player?ie=UTF8&ref_=m_ty_cp#stations Amazon Mus SaMyhhLWp6Mh 350 smile.amazon.com 1.43942622691112e+15 1.4375357522649 60 231 28.7456711035592

Why does this row have only one local visit, when there are dozens in the DB that apparently contribute to the frecency of 542?

8214 http://r.duckduckgo.com/l/?kh=-1&uddg=http%3A%2F%2Fwww.amazon.com%2Fb%3Fie%3DUTF8%26node%3D8335758011 JUAkDkBsn2KU 324 r.duckduckgo.com 1.43915351772058e+15 0 1 0 542.315427090679
The 'fix' for this is to compute frecency for each individual site (or even visit), then aggregate. Desktop does it on a per-visit basis. In order to be performant we need an additional filter on recency. An updated query looks something like:

SELECT historyID, url, title, guid, domain_id, domain,
       localVisitDate, remoteVisitDate, localVisitCount, remoteVisitCount,
       iconID, iconURL, iconDate, iconType, iconWidth
FROM (
  SELECT historyID, url, title, guid, domain_id, domain,
         max(localVisitDate) AS localVisitDate,
         max(remoteVisitDate) AS remoteVisitDate,
         sum(localVisitCount) AS localVisitCount,
         sum(remoteVisitCount) AS remoteVisitCount,
         sum(frecency) AS frecencies
  FROM (
    SELECT *,
           (localVisitCount * (5 + localVisitCount)) * max(1, 100 * 225 / (((1439576194960535 - (localVisitDate)) / 86400000000.0) * ((1439576194960535 - (localVisitDate)) / 86400000000.0) + 225))
           + remoteVisitCount * max(1, 100 * 225 / (((1439576194960600 - (remoteVisitDate)) / 86400000000.0) * ((1439576194960600 - (remoteVisitDate)) / 86400000000.0) + 225))
           AS frecency
    FROM (
      SELECT history.id AS historyID, history.url AS url, title, guid, domain_id, domain,
             COALESCE(max(case visits.is_local when 1 then visits.date else 0 end), 0) AS localVisitDate,
             COALESCE(max(case visits.is_local when 0 then visits.date else 0 end), 0) AS remoteVisitDate,
             COALESCE(sum(visits.is_local), 0) AS localVisitCount,
             COALESCE(sum(case visits.is_local when 1 then 0 else 1 end), 0) AS remoteVisitCount
      FROM history
      INNER JOIN domains ON domains.id = history.domain_id
      INNER JOIN visits ON visits.siteID = history.id
      WHERE (history.is_deleted = 0) AND (domains.showOnTopSites IS 1)
      GROUP BY historyID)
    WHERE ((localVisitCount + remoteVisitCount) > 0)
      AND max(localVisitDate, remoteVisitDate) > 143950424963682
  )
  GROUP BY domain_id
  ORDER BY frecencies DESC
  LIMIT 15
)
LEFT OUTER JOIN view_history_id_favicon ON historyID = view_history_id_favicon.id;
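For anyone who wants to sanity-check the numbers outside SQLite, here is a rough Python sketch of the scoring in the query above: score each history row, then sum the scores per domain. The function names and the tuple layout are made up for illustration, and it uses a single "now" timestamp where the query has two slightly different ones, so treat it as an approximation rather than the shipped code.

```python
MICROSECONDS_PER_DAY = 86_400_000_000.0


def recency_weight(now_us, visit_us):
    """max(1, 100 * 225 / (age_days^2 + 225)), mirroring the query's recency term."""
    age_days = (now_us - visit_us) / MICROSECONDS_PER_DAY
    return max(1.0, 100 * 225 / (age_days * age_days + 225))


def site_frecency(now_us, local_count, local_date, remote_count, remote_date):
    """Frecency of one history row: local visits count quadratically, remote linearly."""
    local = local_count * (5 + local_count) * recency_weight(now_us, local_date)
    remote = remote_count * recency_weight(now_us, remote_date)
    return local + remote


def domain_frecencies(now_us, rows):
    """Sum per-site frecencies by domain and sort highest first.

    rows: iterable of (domain, local_count, local_date, remote_count, remote_date);
    this layout is an assumption for the sketch, not the real schema.
    """
    totals = {}
    for domain, lc, ld, rc, rd in rows:
        totals[domain] = totals.get(domain, 0.0) + site_frecency(now_us, lc, ld, rc, rd)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    now = 1439576194960535
    fifteen_days = 15 * 86_400_000_000
    rows = [
        ("example.com", 1, now, 0, 0),                 # one local visit just now
        ("example.com", 1, now - fifteen_days, 0, 0),  # one local visit 15 days ago
        ("example.org", 0, 0, 10, now),                # ten remote visits just now
    ]
    for domain, score in domain_frecencies(now, rows):
        print(domain, score)
```

The point of scoring per row first is that each site's recency weight is applied to its own visit count, instead of one recent visit lending its weight to every other visit in the domain.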
I have a slightly tweaked query that I'll put into code this afternoon. This needs one other change, which is to load the domain instead of the topmost URL -- the URL is likely to be some particular search engine redirect, rather than the engine itself. We should do this by calling .baseDomain on the returned URL; not only is this simpler, but it'll preserve the protocol. I also think that before Bug 1194852 is addressed, we should add (domains.domain NOT LIKE 'r.%') to the frecency query. That kicks r.search.yahoo.com and r.duckduckgo.com out of my top sites, which is a big help. Karen, lemme know if you want me to simulate this against your browser DB, or if you just want it to hit Nightly so you can try it there.
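A quick Python sketch of the two tweaks described above, for illustration only. `is_redirect_domain` mirrors the `NOT LIKE 'r.%'` filter, and `site_root` stands in for the idea of loading the site rather than the topmost URL while keeping the protocol. Both names are hypothetical, and `site_root` only keeps the scheme and host; it does not claim to reproduce what `.baseDomain` actually returns.

```python
from urllib.parse import urlsplit


def is_redirect_domain(domain):
    """True for hosts the 'r.%' pattern would match (e.g. r.duckduckgo.com).

    SQLite's LIKE is case-insensitive for ASCII by default, hence lower().
    """
    return domain.lower().startswith("r.")


def site_root(url):
    """Reduce a URL to scheme://host/, preserving the protocol.

    Assumption: this is a simplification; it performs no public-suffix
    handling and is not the real .baseDomain implementation.
    """
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.hostname}/"


if __name__ == "__main__":
    for domain in ("r.duckduckgo.com", "r.search.yahoo.com", "reddit.com"):
        print(domain, is_redirect_domain(domain))
    print(site_root("https://smile.amazon.com/gp/dmusic/cloudplayer/player?ie=UTF8&ref_=m_ty_cp#stations"))
```

Note that the prefix test is deliberately blunt: it would also drop any legitimate host that happens to start with `r.`, which is why it is only proposed as a stopgap until Bug 1194852 is addressed.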
Created attachment 8648238
justdomain.sql

This is a domains-only version of the query. You can run it against your own DB to see what happens.
Comment on attachment 8648238
justdomain.sql

I ran the query on my DB and it gives better results. It definitely seems more correct, and it seems to put more weight on the number of times I've visited a domain: some of my more frequently visited sites have moved up.
Attachment #8648238 - Flags: feedback?(sleroux) → feedback+
N.B. to self and future reviewers: pay close attention to not b0rking history substring search, which uses part of the same query mechanism.
I've been playing around on Nightly all day yesterday and still can't get it to populate with relevant mobile top sites. In fact, Top Sites isn't updating at all. I have hit Facebook about 10 times in a very short period of time and it's not showing up anywhere on the Top Sites grid. Although it has picked up that I visit the BBC, it's only showing the 'most read' page and not the main 'news' page, which is where I start off. And after visiting the Telegraph and doing several other mobile searches / browsing, the Top Sites grid hasn't changed at all. It's stuffed with some desktop sites that aren't used frequently, plus a smattering of open tabs and something I visited once. The good news is that the Firefox watermark is showing up in thumbnails instead of blank.
Created attachment 8649509
Pull req.

Read these commits in order. I split apart the shared history query function, tidied it up for history, and then revised it to reflect the new query. This slightly regresses strongly-recent top sites data, but doesn't seem to be noticeable for real data. History panel is now faster, which is nice.
Attachment #8649509 - Flags: review?(sleroux)
Comment on attachment 8649509
Pull req.

Tested on both iPhone and iPad (devices and sims) and everything matches. Wicked fast, too!
Comment on attachment 8649509
Pull req.

Code looks good from what I can see.
Attachment #8649509 - Flags: review?(sleroux) → review+
Steph landed this.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
On Nightly now, and it's better, although the Facebook thumbnail isn't resolving (even after tapping it twice now) BUT at least it's in the top row!
The FB favicon will continue to be a problem because it's desktop's URL that wins its group. So the favicon guesser gets it wrong, and we never load www.facebook.com, so we never get an icon. Changing that involves making favicon lookup domain-aware and/or biasing the internal ordering towards mobile sites. Favicons: still the hardest problem in browsing.
FWIW, I don't use desktop [remote] to go to Facebook except once in a blue moon - it's 95% done on my mobile [local]....
Noting for the historical record: we can only distinguish between 'local' and 'remote' -- local being your iOS device, and remote being _typically_ a desktop or laptop. But in Karen's case, remote also includes a bunch of phones and tablets. Their 100+ visits contribute no more weightily to m.facebook.com than her 500 recent desktop visits do to www.facebook.com, so www wins in the pool.

It's easy to see three tiers of behavior here:

1. On-device. I want my browsing on this device to be most accessible.
2. Same-category. My browsing on mobile devices is similar, and not the same as my desktop browsing. (This is also the split for m./www.)
3. Everything. Finally, I want to revisit sites that I visited on a desktop.

We're unable to support #2 without reworking Sync and Places. I filed Bug 1196243 to figure out how to get an icon for this kind of situation.