Have the favicon URL guessing algorithm try stripping common subdomains.

Assigned to



4 years ago
4 years ago


(Reporter: ckitching, Assigned: ckitching)


Firefox Tracking Flags

(Not tracked)



(1 obsolete attachment)



4 years ago
It seems like some mobile sites (notably imgur.com) redirect us to an m. or mobile. subdomain, and don't provide a <link> tag with a favicon.
Their favicon is at imgur.com/favicon.ico, and they seem to expect us to guess it (currently we guess m.imgur.com and fail).

So, I suggest the behaviour is to try a guess using the subdomain we've been given first, and when that goes wrong try stripping the subdomains, before finally giving up.

Comment 1

4 years ago
Created attachment 8485488 [details] [diff] [review]
Try stripping subdomains when downloading favicons
Attachment #8485488 - Flags: review?(rnewman)

Comment 2

4 years ago
Comment on attachment 8485488 [details] [diff] [review]
Try stripping subdomains when downloading favicons

Actually, it's probably better to do a more complete rejig to do this more neatly.

Now, we have four URLs to consider:

* The given URL.
* The default URL.
* Either of the above with common subdomain stripping applied.

Of which we always have at least two (in the event of no explicit favicon URL we can guess the latter two, which may be the same, depending on the subdomains in use by the page)

How far do we want to go attempting to guess a good URL?

For example, imgur.com doesn't give us a link tag. Their favicon is at imgur.com/favicon.ico.
However, when we browse to them we end up at m.imgur.com and miss, unless we strip the subdomain.
But, when sharing a link from reddit we very often get an i.imgur.com URL, which has the same problem but doesn't count as a common subdomain to strip - perhaps we want to try stripping all subdomains?

If we go ahead, I imagine a "tryLoadFavicon(String URL)" method that tries to load the given favicon URL. Try database, then jar, then (if the "onlyLocal" flag has no been set) network.
Then, we'd rejig "doInBackground" to repeatedly call tryLoadFavicon with some combination of the above URLs until something works, or give up.

This wouldn't really be complicated, it'd just involve moving a lot of stuff inside LoadFaviconTask about... But as it stands we don't get any favicon from imgur which is mildly annoying.

Another example of brokenness that I'm not sure how to fix is reddit.com. They give us a <link> tag pointing to http://www.redditstatic.com/icon.png, which works nicely. There also exists an icon at reddit.com/favicon.ico.
So, we point Fennec at reddit.com. Our favicon url is http://www.redditstatic.com/icon.png, and we store that.
Then we share a Reddit self-post link to the overlays... and miss. The share link was reddit.com/something, we guessed reddit.com/favicon.ico, but unfortunately our stored favicon was http://www.redditstatic.com/icon.png, so we manage to fail to get the one in the database.

It's not clear what the best policy there is.

Lastly: Do we want to consider fetching favicons from the internet in the overlay? I guess that's an argument we're in no hurry to sort out: you just need to stop passing the localOnly flag to LoadFaviconTask to turn that on...

Attachment #8485488 - Attachment is obsolete: true
Attachment #8485488 - Flags: review?(rnewman)
I think it's worth being explicit about different use cases.

In one situation we're loading a page, and thus we have canonical content to rely on. In this case we can speculatively try to look up an icon for the URL. If that fails, we can wait for the page. If that doesn't tell us the icon, we might or might not consider falling back to various heuristics, but that can be fairly simple.

In another we want to show an icon, but we're not loading a page. In this case we can do more crazy things: try to find a page in history with the "greatest common subdomain", and use its icon.

We probably shouldn't put these two paths in the same chunk of code: I don't think doing this via multiple tryLoadFavicon is likely to result in the best experience. We probably want to do multiple cache checks (cheap), then a lookup in the DB that's more sophisticated than just ===.

Similarly, I don't think we should attempt to fetch icons from the internet using the same mechanism as for page loads, and certainly not short-term. Not enough value to add for the complexity.
You need to log in before you can comment on or make changes to this bug.