Filter for specific suggested tiles adgroups/buckets/frecent_sites lists with display name

RESOLVED FIXED in Firefox 38

Status

()

Firefox
New Tab Page
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: Mardak, Assigned: Mardak)

Tracking

(Blocks: 1 bug)

Trunk
Firefox 40
Points:
5
Dependency tree / graph
Bug Flags:
firefox-backlog +
in-testsuite +
qe-verify -

Firefox Tracking Flags

(firefox38+ fixed, firefox38.0.5 fixed, firefox39+ fixed, firefox40 fixed)

Details

(Whiteboard: .001)

Attachments

(5 attachments, 3 obsolete attachments)

(Assignee)

Description

3 years ago
[Tracking Requested - why for this release]: Suggested Tiles bug 1120311 is in Beta 38, and how the code is right now can allow the server to control more than desired.

dougt, is an acceptable approach for 38 to only allow predefined lists of sites?
Flags: needinfo?(dougt)
(Assignee)

Comment 1

3 years ago
Created attachment 8589427 [details] [diff] [review]
v1

Check against a Set of comma-separated sites that are allowed.
Attachment #8589427 - Flags: review?(adw)
Attachment #8589427 - Flags: feedback?(msamuel)
Attachment #8589427 - Flags: feedback?(dougt)
(Assignee)

Comment 2

3 years ago
Comment on attachment 8589427 [details] [diff] [review]
v1

>+// Only allow explicitly approved frecent sites
>+const ALLOWED_FRECENT_SITES = new Set([
>+  // Mozilla
>+  // Android
>+  // Web developer
To be clear, I'll need to update this list. jterry has WIP media plans:

https://docs.google.com/a/mozilla.com/spreadsheets/d/1FiGT1Fb2w2147q2hTzw36KbPw1ikZGCL1MPWTSpA2cU/edit?usp=sharing

So far the media plans include:
- Mozilla brand
- MDN: web developers, web learners
- Fennec: android, ios, carriers
- Platform: html5 gaming, game distribution platforms
- Hello: communication tech
- Reading List: online readers

We'll probably also have some IAB categories to compare how they perform (or not) compared to our custom.
Comment on attachment 8589427 [details] [diff] [review]
v1

Review of attachment 8589427 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/modules/DirectoryLinksProvider.jsm
@@ +56,5 @@
> +  "addons.mozilla.org,air.mozilla.org,blog.mozilla.org,bugzilla.mozilla.org,developer.mozilla.org,etherpad.mozilla.org,hacks.mozilla.org,hg.mozilla.org,mozilla.org,planet.mozilla.org,quality.mozilla.org,support.mozilla.org,treeherder.mozilla.org,wiki.mozilla.org",
> +  // Android
> +  "android.com,androidandme.com,androidauthority.com,androidcentral.com,androidcommunity.com,androidforums.com,androidguys.com,androidpit.com,androidpolice.com,androinica.com,droid-life.com,droidforums.net,forum.xda-developers.com,phandroid.com,play.google.com,talkandroid.com",
> +  // Web developer
> +  "alistapart.com,caniuse.com,codepen.io,css-tricks.com,css3generator.com,cssdeck.com,csswizardry.com,html5demos.com,html5demos.com,html5rocks.com,html5rocks.com,html5test.com,quirksmode.org,smashingmagazine.com,teamtreehouse.com,validator.w3.org,w3.org,w3schools.com",

Cool! I assume if/when this list gets bigger we can pull it out into a json file
Attachment #8589427 - Flags: feedback?(msamuel) → feedback+
New feature and we want this to be as polished as possible for 38. tracking.

Ed, please make sure that lands asap!
tracking-firefox38: ? → +
tracking-firefox39: ? → +

Comment 5

3 years ago
Comment on attachment 8589427 [details] [diff] [review]
v1

Review of attachment 8589427 [details] [diff] [review]:
-----------------------------------------------------------------

So the frecent_sites string of a link from the server has to match a string in ALLOWED_FRECENT_SITES in order for it to become a suggested link?  (Or is it in order for it to become a tile that is eligible to be decorated with suggested links?)  Is that to guard against a compromised server?
Attachment #8589427 - Flags: review?(adw) → review+

Comment 6

3 years ago
Comment on attachment 8589427 [details] [diff] [review]
v1

Couple questions:

Where are we disabling Firefox from querying the server that responds with the set of URLS?

Where is the code that ensures that we have hit n elements in those sets?
Flags: needinfo?(dougt)
(Assignee)

Comment 7

3 years ago
Comment on attachment 8589427 [details] [diff] [review]
v1

(In reply to Doug Turner (:dougt) from comment #6)
> Where are we disabling Firefox from querying the server that responds with
> the set of URLS?
There is no separate request asking for the set of urls. It's part of the payload that contains all the tiles data, e.g., https://people.mozilla.org/~elee/suggested.unicorn.json (scroll to the bottom).

> Where is the code that ensures that we have hit n elements in those sets?
This conditional short circuits the logic for adding a potential suggested link:
>       rawLinks.suggested.filter(validityFilter).forEach((link, position) => {
>+        // Only allow suggested links with approved frecent sites
>+        if (!this.isFrecentSitesAllowed(link.frecent_sites)) {
>+          return;

By checking if the frecent_sites array on a given suggested link matches one of the predefined ones:
>+  isFrecentSitesAllowed(sites) {
>+    return ALLOWED_FRECENT_SITES.has(sites.join(","));

If a suggested link wasn't filtered out, its frecent_sites array is converted to a Map of possible sites to a Map of possible suggested links:
http://mxr.mozilla.org/mozilla-central/source/browser/modules/DirectoryLinksProvider.jsm?mark=219-229#219

When picking what suggested link to use as the suggested tile, we go through the top sites that have possible suggested links and select one:
http://mxr.mozilla.org/mozilla-central/source/browser/modules/DirectoryLinksProvider.jsm?mark=628-652#594

The key thing is that a potential suggested link is removed from consideration early on if it doesn't match the predefined expectations.
Flags: needinfo?(dougt)
(Assignee)

Comment 8

3 years ago
(In reply to Drew Willcoxon :adw from comment #5)
> So the frecent_sites string of a link from the server has to match a string
> in ALLOWED_FRECENT_SITES in order for it to become a suggested link?
Right, otherwise the link is ignored.

> Is that to guard against a compromised server?
Nod. A built in safeguard that allows only approved sets of sites.

For 38, we don't have a better way to identify a set of sites from the server, but potentially later we could have aliases, e.g., the server sends frecent_sites: "mozilla sites" and Firefox knows that "mozilla sites" is some list of sites.

Comment 9

3 years ago
(In reply to Ed Lee :Mardak from comment #7)
> Comment on attachment 8589427 [details] [diff] [review]
> v1
> 
> (In reply to Doug Turner (:dougt) from comment #6)
> > Where are we disabling Firefox from querying the server that responds with
> > the set of URLS?
> There is no separate request asking for the set of urls. It's part of the
> payload that contains all the tiles data, e.g.,
> https://people.mozilla.org/~elee/suggested.unicorn.json (scroll to the
> bottom).

Not following.

Here is my understanding at this point. For some reason, we are fetching the set of buckets from a remote server controlled by Mozilla.  My recommendation was to NOT do this because we do not want to control such a powerful feature (the ability to query all firefox user's browser history with arbitrary buckets) and the lack of transparency (does your FF install for the set of buckets differ from my set?, does it change, is there anything suspect in the buckets??).  With keeping a local list resolves both of these problems.

So, I was expecting to see some patch that removes the XHR that fetches this data.

> > Where is the code that ensures that we have hit n elements in those sets?
> This conditional short circuits the logic for adding a potential suggested
> link:
> >       rawLinks.suggested.filter(validityFilter).forEach((link, position) => {
> >+        // Only allow suggested links with approved frecent sites
> >+        if (!this.isFrecentSitesAllowed(link.frecent_sites)) {
> >+          return;
> 
> By checking if the frecent_sites array on a given suggested link matches one
> of the predefined ones:
> >+  isFrecentSitesAllowed(sites) {
> >+    return ALLOWED_FRECENT_SITES.has(sites.join(","));

So, if *any* bucket site is in the frecent list, we show some related ad?  I am confused because I thought it was some score or # of hits in a bucket before you show the suggested ad.


Can you point me at a document that explains this scoring?  Or confirm that it is only 1 hit?
Flags: needinfo?(dougt)
would it make more sense to load ALLOWED_FRECENT_SITES from preferences instead of hardcoding them in the source file?
(Assignee)

Comment 11

3 years ago
(In reply to Doug Turner (:dougt) from comment #9)
> Here is my understanding at this point. For some reason, we are fetching the
> set of buckets from a remote server controlled by Mozilla.
There is no separate set of buckets. Each suggested link has a bucket of sites that Firefox uses to determine if it should be shown as a suggested tile. The patch makes it so Firefox only accepts suggested links that conform to a predefined allowed list.

The patch prevents the ability to query with arbitrary buckets because only the allowed ones are used. There could be multiple suggested links that use the same bucket.

> I was expecting to see some patch that removes the XHR that fetches this data.
The server only provides one json response that contains link data including the title, url, and frecent sites to match for suggested links.

> So, if *any* bucket site is in the frecent list, we show some related ad?
Only one site from a bucket needs to match a site in the top 100 links of the new tab page. Those 100 links are based on the user's frecency, so this could avoid triggering a suggested tile on a site that the user only accidentally visited once.

> would it make more sense to load ALLOWED_FRECENT_SITES from preferences
> instead of hardcoding them in the source file?
We could. Is this so users could see and tweak this value from about:config? Or similarly make it easier for add-ons to replace this value as desired?
Ed, I am pretty sure we are talking passed each other here.  The recommendation of the team was to NOT do any network requests for frecent sites | buckets.  Where is the removal of that code?


> Those 100 links are based on the user's frecency, so this could avoid triggering a suggested tile on a site that the user only accidentally visited once.

False.  A user that only has visited a few sites will, of course, trigger an ad if they have been to <= 100 sites.  This is certainly true for new users.  It also might be true for a larger user population.  Do we know what % of our users have only a few 100 urls in their history?

> We could. Is this so users could see and tweak this value from about:config? Or similarly make it easier for add-ons to replace this value as desired?

Correct.  This seems like the best extensible solution we have at this point for your stuff.
(Assignee)

Comment 13

3 years ago
(In reply to Doug Turner (:dougt) from comment #12)
> The recommendation of the team was to NOT do any network requests for frecent sites | buckets.  Where is the removal of that code?
The recommendation was to prevent arbitrary querying of sites from the remote server.

If there cannot be any network requests to the server to figure out what bucket for suggested links should be shown, how is Firefox supposed know when to show a suggested tile?

There is no removal of that code because the server can provide any number of potential suggested links that can reuse any of the valid frecent sites lists. We might choose to stop serving a suggested link or serve a bunch for the same bucket.

Perhaps another way to look at this is, ideally if we had more time, we could assign a name to a bucket, say "mozilla sites" for "addons.mozilla.org,air.mozilla.org,blog.mozilla.org,..." The server says to use "mozilla sites" and Firefox knows that "mozilla sites" is an allowed bucket with an approved list of frecent sites. Because we did not implement our server to have a concept of bucket names and instead only with a list of sites, the list of sites /is/ the bucket name. The patch makes Firefox only accepts "bucket names" of "addons.mozilla.org,air.mozilla.org,blog.mozilla.org,..." and the frecent sites of that really-long-named bucket is also "addons.mozilla.org,air.mozilla.org,blog.mozilla.org,..."

> > Those 100 links are based on the user's frecency, so this could avoid triggering a suggested tile on a site that the user only accidentally visited once.
> False.  A user that only has visited a few sites will, of course, trigger an
> ad if they have been to <= 100 sites.
I don't see how what I said was false. I said it could avoid triggering and it does so when users have more than 100 new tab links that pushes out a site to spot 101.

If you don't think triggering based on 1 visit to a site is a good user experience, that's a fair comment/bug to file.

> > We could. Is this so users could see and tweak this value from about:config? Or similarly make it easier for add-ons to replace this value as desired?
> Correct.  This seems like the best extensible solution we have at this point
> for your stuff.
What is the extensibility needed for?

Comment 14

3 years ago
(In reply to Ed Lee :Mardak from comment #13)

> What is the extensibility needed for?
Giving the power back to the user.
(Assignee)

Comment 15

3 years ago
(In reply to Mardeg from comment #14)
> > What is the extensibility needed for?
> Giving the power back to the user.
The tiles functionality already provides user control in various ways. Users can turn off all suggestions from the new tab menu or block specific tiles directly from a new tab tile.

Why does this level of control need to be another pref? We already have other about:config prefs that can be adjusted such as pointing to a different set of tiles. Users could already completely control their own personal set of suggested tiles. For example, someone could set up their own server that fetches our normal endpoint, filters/adds/modifies/completely-replace and point Firefox to that custom endpoint. Other users could point to that too if they trust that alternative more than Mozilla's.

But those about:config prefs would probably only be touched by people who really want super detailed controls. We don't do this for all our features. As a silly example, should we provide a pref to change which port Firefox uses for http? It would give control to users to by default fetch on port X instead of port 80. Should clicking the back button in Firefox always go back Y pages instead of 1 page? Should Firefox automatically open Z links as the user browses to try to hide which links the user is actually visiting?

Comment 16

3 years ago
(In reply to Ed Lee :Mardak from comment #15)
> Why does this level of control need to be another pref?
We really shouldn't make domain whitelisting harder than a pref string. People like seeing a string they can simply edit manually (or have an extension provide a UI for), which does seem to be more sensible than the "set up their own server" example.
Ed, I am not sure to understand all the impacts for this discussion. I guess you are still targeting 38 for this change, right? If yes, could you make sure that it is going to land this week? Thanks
Flags: needinfo?(edilee)
(Assignee)

Comment 18

3 years ago
(In reply to Sylvestre Ledru [:sylvestre] from comment #17)
> I guess you are still targeting 38 for this change, right? If yes, could
> you make sure that it is going to land this week? Thanks
Yes, we're working through some privacy/control issues and also waiting on finalized approved lists. I'll get them sorted out and landed this week.
Flags: needinfo?(edilee)
(Assignee)

Comment 19

3 years ago
dougt, per comment 13, is preventing arbitrary querying of sites not sufficient enough and we need to remove fetching data from servers as well? adw has reviewed the code to confirm the patch indeed prevents Firefox from showing and reporting back on arbitrary adgroups.
Flags: needinfo?(dougt)
I would rather not leave footguns in our code.  The point is to remove the functionality not just prevent it from running.  When there is a clear security process around how ad buckets are served from the network, then reland it.
Flags: needinfo?(dougt)
(Assignee)

Comment 21

3 years ago
(In reply to Doug Turner (:dougt) from comment #20)
> I would rather not leave footguns in our code.
How is Firefox supposed to know what bucket to use for a given suggested link if the suggestions are coming from the server?
Flags: needinfo?(dougt)
ed,

please clarify my understanding (someone should write down a design of how this all works).

there are at least 3 separate network requests.  the first is the request to collect the set of uris (called buckets).  Each URI in the bucket is used to query the browser's history.

the second request is to fetch an ad when there is a collision between the user's browser history and one of the buckets.  This request is made at some point before showing the ad.

the third request is to inform mozilla when an ad has been displayed to the user or the user has clicked on the ad.

My understanding was that there was considerable risk if the buckets were fetched from the network. Given this, i thought I would have seen something in your code that removed the fetching of the buckets.

Does my understand match reality?
Flags: needinfo?(dougt)
(Assignee)

Comment 23

3 years ago
(In reply to Doug Turner (:dougt) from comment #22)
> there are at least 3 separate network requests.  the first is the request to
> collect the set of uris (called buckets).  Each URI in the bucket is used to
> query the browser's history.
There is no request for just buckets. There is one request that downloads all possible suggested links whether or not they will be shown for the user. Firefox might download a bunch of suggested links and decide to shown no suggested tiles because there's nothing matching, e.g., if the user has no browsing history. Firefox is able to make the decision because the downloaded suggested links has data about when Firefox should show a tile or not.

> the second request is to fetch an ad when there is a collision between the
> user's browser history and one of the buckets.  This request is made at some
> point before showing the ad.
There is no request containing the "collisions" / matches. Firefox downloads more than necessary so that we don't reveal matched or unmatched buckets.

> the third request is to inform mozilla when an ad has been displayed to the user or the user has clicked on the ad.
We do have separate impression and click pings when one of the served tiles is shown/clicked.

> My understanding was that there was considerable risk if the buckets were fetched from the network. Given this, i thought I would have seen something in your code that removed the fetching of the buckets.
There is no separate request for buckets, so that's why there was no code removed for fetching buckets.
Flags: needinfo?(dougt)
(Assignee)

Comment 24

3 years ago
(In reply to Doug Turner (:dougt) from comment #22)
> there are at least 3 separate network requests
I've been searching through emails/comments and still have no idea why you would think there's at least 3 separate network requests. Could you point me to where you found that?
>  Could you point me to where you found that?

We've had a number of different vidyo conference about this feature. I am sure I picked up this idea in one of them.  We do really need a one page document on how this system works.
Flags: needinfo?(dougt)

Updated

3 years ago
Attachment #8589427 - Flags: feedback?(dougt)
(Assignee)

Comment 26

3 years ago
(In reply to Doug Turner (:dougt) from comment #25)
> We do really need a one page document on how this system works.
https://wiki.mozilla.org/Tiles/Technical_Documentation

Not sure if that's detailed enough for what you're looking for though.

Updated

3 years ago
Iteration: 40.1 - 13 Apr → 40.2 - 27 Apr
(Assignee)

Comment 27

3 years ago
Created attachment 8593491 [details]
v2 screenshot

Turns out we need to hardcode a descriptive name instead of using the domain, so updating the patch.....
(Assignee)

Updated

3 years ago
Summary: Filter for specific suggested tiles adgroups/buckets/frecent_sites lists → Filter for specific suggested tiles adgroups/buckets/frecent_sites lists with display name
(Assignee)

Comment 28

3 years ago
Created attachment 8593499 [details] [diff] [review]
v2

I'm still waiting on updated lists and now display names from jterry+maksik, so I'll update the patch when those are ready later today.
Attachment #8589427 - Attachment is obsolete: true
Attachment #8593499 - Flags: review?(adw)
Attachment #8593499 - Flags: feedback?(msamuel)

Updated

3 years ago
Blocks: 1155443

Updated

3 years ago
No longer blocks: 1155443

Updated

3 years ago
Attachment #8593499 - Flags: review?(adw) → review+
(Assignee)

Comment 29

3 years ago
Created attachment 8593682 [details]
cats.json

maksik, here's a json file structured [["string of sites","name"],...]
(Assignee)

Comment 30

3 years ago
Comment on attachment 8593682 [details]
cats.json

https://docs.google.com/a/mozilla.com/spreadsheets/d/1FiGT1Fb2w2147q2hTzw36KbPw1ikZGCL1MPWTSpA2cU/edit#gid=199836947

d=require("fs").readFileSync("tmp.tsv","utf8").trim().split("\n").slice(1);
o={};d.forEach(function(r){x=r.split("\t");if(!o[x[0]])o[x[0]]=[[],x[2]];o[x[0]][0].push(x[1])});
a=[];Object.keys(o).forEach(function(k){a.push([o[k][0].sort().join(","),o[k][1]])});
JSON.stringify(a)
Attachment #8593682 - Attachment mime type: application/json → text/plain
(Assignee)

Comment 31

3 years ago
Created attachment 8593686 [details] [diff] [review]
v2.1 Mardak will checkin after privacy approval

updated with finalish buckets that's pending privacy review

Approval Request Comment
[Feature/regressing bug #]: Suggested Tiles bug 1120311
[User impact if declined]: A malicious server could potentially query for arbitrary sites in user history.
[Describe test coverage new/current, TreeHerder]: New tests added to only allow for the hardcoded sites
[Risks and why]: Low. New logic filters out content in malicious case but no change in normal behavior
[String/UUID change made/needed]: None
Attachment #8593499 - Attachment is obsolete: true
Attachment #8593499 - Flags: feedback?(msamuel)
Attachment #8593686 - Flags: approval-mozilla-beta?
Ed, is it going to land in m-c before or should I accept it right now? Thanks
Flags: needinfo?(edilee)
(Assignee)

Comment 33

3 years ago
We'll land this directly to beta 38 as it's a temporary fix.
Flags: needinfo?(edilee)
Attachment #8593686 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
OK, thanks. This should be in 38 beta 6 then.
(Assignee)

Comment 35

3 years ago
Created attachment 8594546 [details] [diff] [review]
v2.2

I removed Reading as feedly made up almost 99% of the impressions in that bucket. Everything else has the most popular item at most around 50% and many have much lower, e.g., arstechnica is roughly 15% of impressions of the tech news bucket.

I also chopped off video games and news general (extended) to the top 300.

prompt("",document.body.textContent.split("\n").slice(0, 300).map(r => r.split(/[, ]/)[0]).sort().join(","))
Attachment #8593686 - Attachment is obsolete: true
(Assignee)

Comment 36

3 years ago
https://hg.mozilla.org/releases/mozilla-beta/rev/e66ad17db13f
Status: NEW → RESOLVED
Points: --- → 5
Last Resolved: 3 years ago
status-firefox38: affected → fixed
Flags: qe-verify-
Flags: firefox-backlog+
Resolution: --- → FIXED
Summary: Filter for specific suggested tiles adgroups/buckets/frecent_sites lists with display name → Filter for specific suggested tiles adgroups/buckets/frecent_sites lists with display name for 38
Target Milestone: --- → Firefox 38
(Assignee)

Updated

3 years ago
Blocks: 1156921
Ed - Do we need this change on Aurora 39 and Nightly 40 as well?
Flags: needinfo?(edilee)
(Assignee)

Comment 38

3 years ago
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #37)
> Ed - Do we need this change on Aurora 39 and Nightly 40 as well?
Yes. Per bsmedberg, we'll also put this temporary fix on Nightly and Aurora until we have a better solution.
Status: RESOLVED → REOPENED
Flags: needinfo?(edilee)
Resolution: FIXED → ---
Summary: Filter for specific suggested tiles adgroups/buckets/frecent_sites lists with display name for 38 → Filter for specific suggested tiles adgroups/buckets/frecent_sites lists with display name
(Assignee)

Comment 39

3 years ago
Created attachment 8599169 [details] [diff] [review]
for nightly

Updated patch for changes from bug 1151876
(Assignee)

Comment 40

3 years ago
Documentation in bug 1156876 expects this bug on Nightly.
Blocks: 1156876
(Assignee)

Updated

3 years ago
Target Milestone: Firefox 38 → Firefox 40

Updated

3 years ago
Iteration: 40.2 - 27 Apr → ---
status-firefox38.0.5: --- → fixed
https://hg.mozilla.org/mozilla-central/rev/61f4eb52a506
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago3 years ago
status-firefox40: affected → fixed
Flags: in-testsuite+
Resolution: --- → FIXED

Updated

3 years ago
Iteration: --- → 40.3 - 11 May
(Assignee)

Updated

3 years ago
Blocks: 1161412
Doug and Ed, have your privacy concerns been resolved ?  

I'm going to go ahead and uplift this to 39 since it's already on 38.
Flags: needinfo?(edilee)
Flags: needinfo?(dougt)
Ed, exactly which patches here should be uplifted? The same one as for beta?
Actually it looks like the updated patch for nightly is probably the one that needs uplift.
(Assignee)

Comment 46

3 years ago
Created attachment 8602802 [details] [diff] [review]
for aurora
Flags: needinfo?(edilee)
(Assignee)

Updated

3 years ago
Blocks: 1140185
(Assignee)

Updated

3 years ago
No longer blocks: 1161412
(Assignee)

Comment 47

3 years ago
(In reply to Liz Henry (:lizzard) from comment #45)
> Actually it looks like the updated patch for nightly is probably the one
> that needs uplift.
Yup. I attached a "for aurora" which is the same as the nightly one. Uplifting to 39 is making the privacy concerns addressed consistent with what's on nightly 40 and beta 38.
Comment on attachment 8602802 [details] [diff] [review]
for aurora

Approved for uplift to aurora as requested in bug 1161412.
Attachment #8602802 - Flags: approval-mozilla-aurora+

Updated

3 years ago
Flags: needinfo?(dougt)

Updated

3 years ago
Blocks: 1203577
You need to log in before you can comment on or make changes to this bug.