Closed Bug 990322 Opened 10 years ago Closed 10 years ago

[New Tab Page] Don't show multiple thumbnails from the same eTLD+1

Categories

(Firefox :: New Tab Page, defect)

defect
Not set
normal
Points:
5

Tracking

()

VERIFIED FIXED
Firefox 34
Iteration:
34.2
Tracking Status
firefox33 --- verified
firefox34 --- verified

People

(Reporter: markh, Assigned: dao)

References

Details

Attachments

(1 file, 2 obsolete files)

I'm starting to see the better thumbnails on release builds - yay :)  But anecdotally, I'm seeing user's thumbnails on about:newtab have many thumbnails from the same TLD.  Eg, one specific user has:

* 2 thumbnails from their ISP (member login page, plus account usage page)
* 2 thumbnails from google (one with search results, the other the search page)
* 4 youtube thumbnails, each for different videos (and for some reason, one with a thumbnail of a "502 Server Error")
* 1 facebook login page.

This user has many other very commonly-used sites which aren't displayed.
I wonder if it might be better to try and restrict too many thumbnails from the same TLD to be displayed by default.  I'm sure the devil is in the detail, and this might be judged INVALID, but figured it worth throwing out there.

CC and needinfo? boriss for a UX perspective.
Flags: needinfo?(jboriss)
Summary: [Page Thumbnails] Avoid using multiple thumbnails from the same TLD → [New Tab Page] Don't show multiple thumbnails from the same TLD
Yeah I've noticed this a lot too. My mom and I both have multiple gmail entries with slightly different URIs, for example. It would be nice to do something smarter.
Flags: firefox-backlog+
Summary: [New Tab Page] Don't show multiple thumbnails from the same TLD → [New Tab Page] Don't show multiple thumbnails from the same eTLD+1
(In reply to Mark Hammond [:markh] from comment #0)
> I'm starting to see the better thumbnails on release builds - yay :)  But
> anecdotally, I'm seeing user's thumbnails on about:newtab have many
> thumbnails from the same TLD.  Eg, one specific user has:
> 
> * 2 thumbnails from their ISP (member login page, plus account usage page)
> * 2 thumbnails from google (one with search results, the other the search
> page)
> * 4 youtube thumbnails, each for different videos (and for some reason, one
> with a thumbnail of a "502 Server Error")
> * 1 facebook login page.
> 
> This user has many other very commonly-used sites which aren't displayed.
> I wonder if it might be better to try and restrict too many thumbnails from
> the same TLD to be displayed by default.  I'm sure the devil is in the
> detail, and this might be judged INVALID, but figured it worth throwing out
> there.
> 
> CC and needinfo? boriss for a UX perspective.

Absolutely, this needs a fix.  There's at least three problems going on here:
1. Thumbnail duplicates
2. Thumbnails with logins (ugly rather than incorrect)
3. Thumbnails displaying errors (like 502)

We can probably fix #1 by avoiding displaying thumbnails from the same TLD.  

Logins needs a better fix.  I'm exploring some ways to do this now with an engineer, and can follow up on that.

For #3, can we detect errors separately from other pages?
Flags: needinfo?(jboriss)
Whiteboard: p=5
How do pinned sites interact with not having multiple thumbnails from the same eTLD+1? I would guess you could have multiple from the same eTLD+1 if the user somehow pinned multiple, but a pinned http://mozilla.com should cause a history http://mozilla.com/firefox to not show up even if the pinned tile is in position 9 and the history item would have been in position 1 (i.e., make sure pinned items show up).

The eTLD+1 de-duping functionality would work correctly in removing www. vs root vs m. or perhaps various *.wikipedia.org.

But de-duping at eTLD+1 means bugzilla.mozilla.org / developer.mozilla.org / intranet.mozilla.org would cancel the other out while www.mozilla.com / mail.mozilla.com / v.mozilla.com would cancel each other out.

Perhaps it would be safer to do things at eTLD+2 especially for some other sites like google (www, mail, maps, drive, plus) or wikia.
Actually.. if we end up at eTLD+2, in what situation would we actually not just make sure we have one page per full domain (all levels of subdomains)?
(In reply to Jennifer Morrow [:Boriss] (Firefox UX) from comment #2)
> For #3, can we detect errors separately from other pages?

Yes, actually right now we only store thumbnails for an errored page if that page doesn't already have a thumbnail.  A couple of related problems are (a) no thumbnail less than two days old is overwritten, even if that thumbnail is a bad thumbnail like an error page and we can take a better one, (b) once a page falls off of about:newtab, its thumbnail is deleted, so if it makes its way back to newtab but is now an error page, that error is captured.

Bug 952460 is about that.
Mass-move to Firefox::New Tab Page.

Filter on new-tab-page-component.
Component: Tabbed Browser → New Tab Page
Points: --- → 5
QA Whiteboard: [qa?]
Whiteboard: p=5
QA Whiteboard: [qa?] → [qa+]
Assignee: nobody → dao
Status: NEW → ASSIGNED
Iteration: --- → 34.2
QA Contact: cornel.ionce
Attached patch patch (obsolete) — Splinter Review
This appears to work except for this:

TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/newtab/browser_newtab_enhanced.js | history link now is enhanced - Got history, expected enhanced

TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/newtab/browser_newtab_enhanced.js | history link now has enhanced image - Didn't expect , but got it

Still trying to figure out what's going on there...
Attachment #8468991 - Flags: review?(edilee)
(In reply to Dão Gottwald [:dao] from comment #7)
> history link now is enhanced - Got history, expected enhanced
I'm thinking it's because the setLinks("1") call is creating example1.com instead of example.com, and the enhancement is happening only for "example.com". It should probably be setLinks("-1") with the new function.
Actually, my comment 8 might not be quite right as the purpose of this bug is to remove repeat entries from the same base domain. That means the directory tile entry example.com should cause the history tile example.com (created by setLinks("-1")) to be filtered out resulting in just one tile.

The test (enhanced = true) should then be updated to expect one organic tile if the history tile would have appeared after the organic tile. getData(0) -> organic/isnot. getData(1) -> null

Perhaps another test case added so that a history tile with frecency higher than 1000 would cause the organic tile to be filtered out. E.g., take the history entry that would have been filtered out in the previous test (enhanced = true) and bump up its frecency. getData(0) -> enhanced/isnot; getData(1) -> null
Attached patch patch v2 (obsolete) — Splinter Review
Attachment #8468991 - Attachment is obsolete: true
Attachment #8468991 - Flags: review?(edilee)
Attachment #8469206 - Flags: review?(edilee)
Comment on attachment 8469206 [details] [diff] [review]
patch v2

>+++ b/browser/base/content/test/newtab/browser_newtab_enhanced.js
>@@ -16,40 +16,48 @@ function runTests() {
>   // Make the page have a directory link followed by a history link
>-  yield setLinks("1");
>+  yield setLinks("-1");
> 
>   // Test with enhanced = false
>   NewTabUtils.allPages.enhanced = false;
>   yield addNewTabPageTab();
>   let {type, enhanced} = getData(0);
>   is(type, "organic", "directory link is organic");
>   isnot(enhanced, "", "directory link has enhanced image");
> 
>-  let {type, enhanced} = getData(1);
>-  is(type, "history", "history link is history");
>-  is(enhanced, "", "history link has no enhanced image");
>+  is(getData(1), null, "history link pushed out by directory link");

Thanks for adding the test for enhanced = true by pinning the history tile. This enhanced = false case loses its negative test of making sure history tiles aren't enhanced when enhanced = false. So it should do the same thing you did below it except expect the original conditions from above:

  // Test with a pinned link enhanced = false
  setPinnedLinks("-1");
  yield addNewTabPageTab();
  let {type, enhanced} = getData(0);
  is(type, "history", "pinned history link is history");
  is(enhanced, "", "pinned history link has no enhanced image");

Not sure if I can give official r+.
Attachment #8469206 - Flags: review?(edilee)
Attachment #8469206 - Flags: review?(adw)
Attachment #8469206 - Flags: feedback+
Blocks: 1045760
Comment on attachment 8469206 [details] [diff] [review]
patch v2

Review of attachment 8469206 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/modules/NewTabUtils.jsm
@@ +833,5 @@
> +      let uri = Services.io.newURI(url, null, null);
> +      try {
> +        return Services.eTLD.getBaseDomain(uri);
> +      } catch (e) {
> +        return uri.asciiHost;

There are a couple of things here I don't agree with.  First, we should try-catch the newURI() in case a provider includes a bad URI and return `url` in that case.  Second, why asciiHost?  That means that newtab will think that http://localhost and file:///foo are the same link, since eTLD.getBaseDomain throws for both of them due to not enough domain levels.  I think we should also return `url` when getBaseDomain throws.

Let me know if that's OK or if I'm wrong, and then I'll r+ accordingly.

@@ +837,5 @@
> +        return uri.asciiHost;
> +      }
> +    }
> +
> +    let baseDomains = new Set;

Nit: Set() to match the style here
(In reply to Drew Willcoxon :adw from comment #12)
> > +      let uri = Services.io.newURI(url, null, null);
> > +      try {
> > +        return Services.eTLD.getBaseDomain(uri);
> > +      } catch (e) {
> > +        return uri.asciiHost;
> 
> There are a couple of things here I don't agree with.  First, we should
> try-catch the newURI() in case a provider includes a bad URI and return
> `url` in that case.

I didn't think URLs could be invalid here. I don't know where exactly this should happen, but I think it would make more sense to completely reject such entries, as users couldn't click those thumbnails anyway.

> Second, why asciiHost?  That means that newtab will
> think that http://localhost and file:///foo are the same link, since
> eTLD.getBaseDomain throws for both of them due to not enough domain levels. 
> I think we should also return `url` when getBaseDomain throws.

For http://localhost, asciiHost is "localhost". I don't think we want the URL there as this would basically leave this bug unfixed for such domains.
Comment on attachment 8469206 [details] [diff] [review]
patch v2

Review of attachment 8469206 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Dão Gottwald [:dao] from comment #13)
> I didn't think URLs could be invalid here. I don't know where exactly this
> should happen, but I think it would make more sense to completely reject
> such entries, as users couldn't click those thumbnails anyway.

They shouldn't be invalid, but we might have a bug, there might be an add-on that implements a provider that produces a bad URI, we might have a typo in directory tiles JSON on the server, etc.  NewTabUtils doesn't do any URI validation on the links produced by providers.

Ignoring bad URIs rather than using `url` is a good idea.  I think you can just do it in the filter() callback you're changing here.  e.g., if getBaseDomain throws because newURI threw, then reject the link.  So, r+ with something like that.

> For http://localhost, asciiHost is "localhost".

You're right, I conflated it with file:///foo.  Are file URLs the only kind of URL that getBaseDomain throws for but whose asciiHost is empty?  I can imagine people who have lots of them on newtab for who knows what reason coming out of the woodwork...
Attachment #8469206 - Flags: review?(adw) → review+
(In reply to Drew Willcoxon :adw from comment #14)
> You're right, I conflated it with file:///foo.  Are file URLs the only kind
> of URL that getBaseDomain throws for but whose asciiHost is empty?

I think so.

> I can
> imagine people who have lots of them on newtab for who knows what reason
> coming out of the woodwork...

Technically they're all under the same (empty) domain... file:///random_path_1 and file:///random_path_2 aren't much different from http://somedomain/random_path_1 and http://somedomain/random_path_2.
and for browser_newtab_bug991111.js:
https://hg.mozilla.org/integration/fx-team/rev/333bca3fb89e
> Backed out for frequent failures in browser_thumbnails_privacy.js on OSX
> 10.8.
> https://hg.mozilla.org/integration/fx-team/rev/c871b95042c6
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=45578248&tree=Fx-Team

If somebody wants to investigate this, feel free to. Otherwise I'll need to do that in the next iteration as I'm on vacation now.
I'm not familiar with this test but after a quick glance, I would guess it's related to how

addVisitsAndRepopulateNewTabLinks adds a bunch of pages from const URL = "://example.com/browser/toolkit/components/thumbnails/test/privacy_cache_control.sjs"

So only one page ends up getting its thumbnail fetched ? Although not sure why the test is intermittent then.
Attached patch fixed testsSplinter Review
Landed with browser_thumbnails_privacy.js fixed. (Adds the visits for example.com [for both positive and negative checks] just before opening the page instead of all at once.)

https://hg.mozilla.org/integration/fx-team/rev/de9d70a14a16
Attachment #8469206 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/de9d70a14a16
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Firefox 34
Blocks: 1010628
Blocks: 1030832
Confirming the fix on Ubuntu 14.04 32bit, Windows 7 64bit and Mac OS X 10.9 using latest Nightly, build ID: 20140818030205.
There are no multiple thumbnails from the same TLD, a single one is displayed.
Status: RESOLVED → VERIFIED
QA Whiteboard: [qa+] → [qa!]
Depends on: 1055825
Blocks: 1057602
Uplift has been managed in bug 1057602
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:33.0) Gecko/20100101 Firefox/33.0
Mozilla/5.0 (X11; Linux i686; rv:33.0) Gecko/20100101 Firefox/33.0

Also confirming for latest Aurora, build ID: 20140828004002.
QA Whiteboard: [qa!]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: