1418080 - Consider making "mozIAsyncFavicons.getFaviconURLForPage" and page-icon: redirect-aware

Reporter

Description

•

8 years ago

This is a feature request for this API, would like to hear your thoughts. Let's take "python.org" as an example, typing this in the awesome bar will generate 3 records in moz_places, "python.org/", "https://python.org/", and "https://www.python.org/", respectively. Two icons (a reguar one and a rich one) will be fetched and stored in the favicons database, the associated page_url is the final redirection "https://www.python.org/". However, query favicon with the original URL with getFaviconURLForPage won't work unless you pass it the exact URL "https://www.python.org/".

Ed Lee :Mardak

Comment 1

•

8 years ago

I was thinking a potential alternative is when we set the icon for a page, we could set the icon for the pages it redirected from. This trades off read time with write time. But then I wonder if this is generally desirable… E.g., a search result page could redirect to some other website, but that doesn't always mean the source and destination should have the same icon. In the particular case of activity stream top sites, we would be requesting favicons only for some small set of high frecency pages, so that would seem to favor not impacting write time in the common case. Although for activity stream's own performance, we would want reads to be fast too without potentially expensive redirect traversals…

Ed Lee :Mardak

Comment 2

•

8 years ago

Just playing around with queries… Start with the most recent visit to a url and follow any next visits that are redirects. Show the visit id and url of each visit along the way: WITH RECURSIVE path(visit) AS ( SELECT v.id FROM moz_places h JOIN moz_historyvisits v ON v.place_id = h.id WHERE h.url = "https://twitch.tv/" AND v.visit_date = h.last_visit_date UNION SELECT id FROM moz_historyvisits JOIN path ON visit = from_visit WHERE visit_type IN (5, 6) ) SELECT visit, ( SELECT ( SELECT url FROM moz_places WHERE id = place_id) FROM moz_historyvisits WHERE id = visit) FROM path; where pretend https://twitch.tv/ redirects to https://ed.twitch.tv/ to https://go.twitch.tv/ to https://www.twitch.tv/ 4|https://twitch.tv/ 5|https://ed.twitch.tv/ 6|https://go.twitch.tv/ 7|https://www.twitch.tv/

Ed Lee :Mardak

Comment 3

•

8 years ago

And that query's explain query plan: 2|0|1|SCAN TABLE moz_historyvisits AS v USING COVERING INDEX moz_historyvisits_placedateindex 2|1|0|SEARCH TABLE moz_places AS h USING INTEGER PRIMARY KEY (rowid=?) 3|0|0|SCAN TABLE moz_historyvisits 3|1|1|SCAN TABLE path 1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION) 0|0|0|SCAN SUBQUERY 1 0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 4 4|0|0|SEARCH TABLE moz_historyvisits USING INTEGER PRIMARY KEY (rowid=?) 4|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 5 5|0|0|SEARCH TABLE moz_places USING INTEGER PRIMARY KEY (rowid=?)

Nan Jiang [:nanj]

Reporter

Comment 4

•

8 years ago

:Mardak wow, that's beautiful. On my commute, I've figured out a similar query that only returns the final redirection. This should resolve the issue for activity stream for now, shall we wrap this into the places's API so that it could be used in other scenarios?

Nan Jiang [:nanj]

Reporter

Comment 5

•

8 years ago

:mak, I wonder If we could piggyback bug 1365913 to include the referrers as well as all the redirections, which could be done based on Mardak's query. WITH RECURSIVE path(visit) AS ( SELECT v.id FROM moz_places h JOIN moz_historyvisits v ON v.place_id = h.id WHERE h.url_hash = hash(:url) AND h.url = :url AND v.visit_date = h.last_visit_date UNION SELECT id FROM moz_historyvisits JOIN path ON visit = from_visit WHERE visit_type IN (${History.TRANSITIONS.REDIRECT_PERMANENT}, ${History.TRANSITIONS.REDIRECT_TEMPORARY}) ) SELECT visit, ( SELECT ( SELECT url FROM moz_places WHERE id = place_id) FROM moz_historyvisits WHERE id = visit) FROM path;

Nan Jiang [:nanj]

Reporter

Comment 6

•

8 years ago

I was also thinking if we could use those redirected URLs instead of the original ones when we generate the top sites for AS. That'll save us some trouble like the icon fetching. What do you think? :Mardak

Marco Bonardo [:mak]

Comment 7

•

8 years ago

(In reply to Ed Lee :Mardak from comment #1) > I was thinking a potential alternative is when we set the icon for a page, > we could set the icon for the pages it redirected from. This trades off read > time with write time. > > But then I wonder if this is generally desirable… E.g., a search result page > could redirect to some other website, but that doesn't always mean the > source and destination should have the same icon. Indeed, we do something like that for bookmarks and in some cases it ends up storing the wrong favicon. You could probably do that if both pages are from the same base host (ignoring sub domains). (In reply to Ed Lee :Mardak from comment #2) > Just playing around with queries… Start with the most recent visit to a url > and follow any next visits that are redirects. Show the visit id and url of > each visit along the way: > > WITH RECURSIVE path(visit) It doesn't seem particularly efficient :( In the past I thought of ways to more quickly fetch all the redirects chain, and I was evaluating to reuse the sessionid field in moz_historyvisits for that. Basically we'd store a monotonic integer there and every time a redirect chain starts we annotate all the visits in that chain with that session id. Though, that requires to fetch the largest integer before adding the first visit in the session, and when we are a redirect target, store the right session id from our source. That in the end may slow down visits insertions a little bit, but would allow very quick fetching of the redirects chain. There's still the problem of unrelated pages redirecting though, and we will never be 100% sure that even sub domains are strictly related. What we could potentially do, is ignore scheme and www. when storing icons and consider all of those pages the same. Similar to how I used fixed_icon_url_hash, we could replace page_url and page_url_hash with the "fixed" version. One downside is that the relation with moz_places will need some translation layer in the middle.

Priority: -- → P3

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Marco Bonardo [:mak]

Comment 8

•

2 years ago

This is still a valid problem, and lately I've heard various teams dealing with it, so we may want to do something sooner.

One example is having a page url of
https://www.yahoo.com/sports/jets-qb-aaron-rodgers-has-strong-words-for-insecure-sean-payton-over-nathaniel-hackett-comments-183609249.html
that redirects to
https://sports.yahoo.com/jets-qb-aaron-rodgers-has-strong-words-for-insecure-sean-payton-over-nathaniel-hackett-comments-183609249.html
We store a favicon for the latter, not for the former.
In this case even if we'd have a root icon as https://sports.yahoo.com/favicon.ico, it would not apply to our page-url that is in www.yahoo.com domain. We only ignore "www.", not other subdomains.

On the storing side, when we're not setting a root icon, we could probably walk up all redirect levels and set the icon on all the sources IFF the top level matches, and there's no root icon for the redirecting urls (so they'd be icon-less). This could not fix the www.yahoo.com => sports.yahoo.com root icon problem, cause we likely don't want to copy over a root icon of a subdomain to the main domain.

On the fetching side we could probably walk up 1 level and eventually accept imperfect root icons. But maybe the storing improvement will be good enough to not require this.

Priority: P3 → P2

Marco Bonardo [:mak]

Updated

•

2 years ago

Summary: Consider making "mozIAsyncFavicons.getFaviconURLForPage" redirect-aware → Consider making "mozIAsyncFavicons.getFaviconURLForPage" and page-icon: redirect-aware

Chris Bellini (:cbellini)

Updated

•

2 years ago

Whiteboard: [sng]

Jira Integration Bot

Updated

•

2 years ago

See Also: → https://mozilla-hub.atlassian.net/browse/SNG-860

Marco Bonardo [:mak]

Updated

•

2 years ago

Severity: S3 → N/A

Priority: P2 → P3

Bugzilla

Consider making "mozIAsyncFavicons.getFaviconURLForPage" and page-icon: redirect-aware

Categories

(Toolkit :: Places, enhancement, P3)

Tracking

()

People

(Reporter: nanj, Unassigned)

References

Details

(Whiteboard: [sng])

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Comment 8

Updated

Updated

Updated

Updated