Closed Bug 1062708 Opened 10 years ago Closed 10 years ago

Build Telemetry Experiment for new tab url data

Categories

(Content Services Graveyard :: Tiles, defect)

defect
Not set
normal
Points:
8

Tracking

(Not tracked)

RESOLVED FIXED
Iteration:
37.2

People

(Reporter: Mardak, Assigned: Mardak)

References

Details

(Whiteboard: .005)

Attachments

(1 file, 6 obsolete files)

Bug 1042214 added click/view pings that conditionally include the url based on whether the tile is enhanced. Bug 1062683 removed the url data.

Instead of building the code to send url data in Firefox, we'll do it through a telemetry experiment. Also, instead of using the enhanced list as the whitelist, we'll use a whitelist of popular domains (probably eTLD+1, e.g., mozilla.org) to send view/click data.

We won't send the full path of urls and instead use the granularity of how multiple tiles get combined into one (from bug 1045760), i.e., NewTabUtils.extractSite(url)

We'll send the data to the existing previously used server: tiles.up.mozillalabs.com
Attached patch wip (obsolete) — Splinter Review
This wip which only has a whitelist of mozilla.org sends these pings on my normal profile (+I dragged https://www.mozilla.org/en-US/ into the first tile):

POST /view (opening new tab with 3 rows 4 columns)

{"locale":"en-US","tiles":[{"id":5,"pin":1,"url":"mozilla.org"},{"score":860,"url":""},{"score":710,"url":""},{"score":134,"url":""},{"score":127,"url":""},{"score":64,"url":""},{"score":37,"url":""},{"score":36,"url":""},{"score":27,"url":""},{"score":24,"url":""},{"score":24,"url":""},{"score":24,"url":""},{"score":22,"url":"tbpl.mozilla.org"},{"score":20,"url":""},{"score":20,"url":""}],"view":11}

POST /click (pinning tbpl tile)

{"locale":"en-US","tiles":[{"id":5,"pin":1,"url":"mozilla.org"},{"score":860,"url":""},{"score":710,"url":""},{"score":134,"url":""},{"score":127,"url":""},{"score":64,"url":""},{"score":37,"url":""},{"score":36,"url":""},{"score":27,"url":""},{"score":24,"url":""},{"score":24,"url":""},{"score":24,"url":""},{"pin":1,"score":22,"url":"tbpl.mozilla.org"},{"score":20,"url":""},{"score":20,"url":""}],"pin":12}
Who should review the add-on code for the experiment?
Comment on attachment 8483948 [details] [diff] [review]
wip

>diff --git a/bootstrap.js b/bootstrap.js
>+const PING_ENDPOINT = "http://localhost:8000/";
Note: This will point to https://tiles.up.mozillalabs.com/v2/links/ as originally implemented in bug 1042214. The current endpoint for production is tiles.services.mozilla.com as changed in bug 1058935.
Attached patch v1 (obsolete) — Splinter Review
Reports at most eTLD+2 instead of full host.

Here's some sample output:

dragged in 3 urls:
1) https://bug1053530.bugzilla.mozilla.org/attachment.cgi?id=8476909
2) http://mozilla.org/foo
3) http://www.mozilla.org/bar

With just mozilla.org in whitelist:

{"locale":"en-US","tiles":[{"pin":1,"url":"bugzilla.mozilla.org"},{"id":16,"pin":1,"url":"mozilla.org"},{"id":16,"pin":1,"url":"www.mozilla.org"},{"score":924,"url":""},{"score":441,"url":""},{"score":133,"url":""},{"score":131,"url":""},{"score":67,"url":""},{"score":37,"url":""},{"score":37,"url":""},{"score":25,"url":""},{"score":24,"url":"tbpl.mozilla.org"},{"score":24,"url":""},{"score":23,"url":""},{"score":23,"url":""}],"view":11}

With top 50k eTLD+1s:

{"locale":"en-US","tiles":[{"score":924,"url":"m.neogaf.com"},{"score":441,"url":"www.bogleheads.org"},{"score":133,"url":"cloud.feedly.com"},{"score":131,"url":"www.engadget.com"},{"score":67,"url":"www.mint.com"},{"score":37,"url":"www.youtube.com"},{"score":37,"url":"mail.google.com"},{"score":25,"url":"online.citibank.com"},{"score":24,"url":"tbpl.mozilla.org"},{"score":24,"url":"www.fidelity.com"},{"score":23,"url":"www.fatwallet.com"},{"score":23,"url":"mail.mozilla.com"},{"score":23,"url":"arstechnica.com"},{"score":19,"url":"www.facebook.com"},{"score":17,"url":"bugzilla.mozilla.org"}],"view":11}

Generated list of top 50k eTLD+1s from  http://s3.amazonaws.com/alexa-static/top-1m.csv.zip:
prompt("",prompt().split(" ").map(u => { try { return Services.eTLD.getBaseDomain(Services.io.newURI("http://"+u,null,null)) } catch(ex) {}}).slice(0, 50000).sort())
Attachment #8484776 - Flags: review?(benjamin)
Comment on attachment 8484776 [details] [diff] [review]
v1

I am going to mark feedback+ for the data collection proposal; it's what we discussed and I think it strikes a good balance of privacy versus getting useful data.

I don't want to be the code reviewer for this, so I'm going to pass this review on to Felipe, who's done experiment devel and can do a careful review.

I'm a little concerned about the performance of parsing this list of domain names in JS... I wonder if it would be better to ship it as a separate file which is read asynchronously on startup.
Attachment #8484776 - Flags: review?(felipc)
Attachment #8484776 - Flags: review?(benjamin)
Attachment #8484776 - Flags: feedback+
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> I wonder if it would be better to ship it as a separate file
> which is read asynchronously on startup.
felipc, is there a standard way to read files from these restartless add-ons? In the past I've done stuff like..

function startup({id}) AddonManager.getAddonByID(id, addon => {
  let xhr = new XMLHttpRequest();
  xhr.open("GET", addon.getResourceURI("data.file").spec);
  xhr.onload = _ => console.log(xhr.responseText);
  xhr.send();
});
Ed, yeah, Benjamin and I were just talking about that. But it's simpler than your snipped, as the data obj in the first param to the startup function will give you the base resourceURI for the add-on root.

https://developer.mozilla.org/en-US/Add-ons/Bootstrapped_extensions#Bootstrap_data


It would be useful to read this file and create the Set in a jsm to measure the amount of memory that the Set will take.
Attached patch v2 (obsolete) — Splinter Review
Switches to a separate data file + jsm with the following about:memory results:

144,072 B (00.12%) -- add-ons/fx-newtab-data@mozilla.org/js-non-window/zones/zone(0x100396800)
├───72,544 B (00.06%) -- compartment([System Principal], jar:fx-newtab-data@mozilla.org.xpi!/NewTabWhitelist.jsm)
│   ├──35,440 B (00.03%) -- shapes 
│   │  ├──25,008 B (00.02%) -- gc-heap
│   │  │  ├──15,720 B (00.01%) ── tree/global-parented
│   │  │  └───9,288 B (00.01%) ── base
│   │  └──10,432 B (00.01%) ── malloc-heap/compartment-tables
│   ├──22,016 B (00.02%) -- objects
│   │  ├──22,016 B (00.02%) ── gc-heap/function
│   │  └───────0 B (00.00%) ── non-heap/code/asm.js
│   └──15,088 B (00.01%) -- sundries
│      ├───8,712 B (00.01%) ── malloc-heap
│      └───6,376 B (00.01%) ── gc-heap
└───71,528 B (00.06%) -- compartment([System Principal], jar:fx-newtab-data@mozilla.org.xpi!/bootstrap.js (from: resource://gre/modules/addons/XPIProvider.jsm:4174))
    ├──38,024 B (00.03%) -- shapes 
    │  ├──23,592 B (00.02%) -- gc-heap
    │  │  ├──12,000 B (00.01%) ── tree/global-parented
    │  │  └──11,592 B (00.01%) ── base
    │  └──14,432 B (00.01%) ── malloc-heap/compartment-tables
    ├──18,752 B (00.02%) -- sundries
    │  ├───9,776 B (00.01%) ── malloc-heap
    │  └───8,976 B (00.01%) ── gc-heap
    └──14,752 B (00.01%) -- objects
       ├──14,752 B (00.01%) ── gc-heap/function
       └───────0 B (00.00%) ── non-heap/code/asm.js
Attachment #8484776 - Attachment is obsolete: true
Attachment #8484776 - Flags: review?(felipc)
Attachment #8485206 - Flags: review?(felipc)
Attached patch v3 (obsolete) — Splinter Review
Removed the jsm and directly loaded the file in bootstrap.js:

108,616 B (00.09%) -- add-ons/fx-newtab-data@mozilla.org/js-non-window/zones/zone(0x100496800)/compartment([System Principal], jar:fx-newtab-data@mozilla.org.xpi!/bootstrap.js (from: resource://gre/modules/addons/XPIProvider.jsm:4174))
├───55,128 B (00.05%) -- shapes
│   ├──36,504 B (00.03%) -- gc-heap
│   │  ├──21,240 B (00.02%) ── tree/global-parented
│   │  └──15,264 B (00.01%) ── base
│   └──18,624 B (00.02%) ── malloc-heap/compartment-tables
├───29,888 B (00.03%) -- objects
│   ├──29,888 B (00.03%) ── gc-heap/function
│   └───────0 B (00.00%) ── non-heap/code/asm.js
└───23,600 B (00.02%) -- sundries
    ├──12,456 B (00.01%) ── malloc-heap
    └──11,144 B (00.01%) ── gc-heap
Attachment #8483948 - Attachment is obsolete: true
Attachment #8485206 - Attachment is obsolete: true
Attachment #8485206 - Flags: review?(felipc)
Attachment #8485210 - Flags: review?(felipc)
I don't understand what magic is going on.. But if I use a plain {} instead of Set():

3,679,656 B (03.12%) -- add-ons/fx-newtab-data@mozilla.org/js-non-window/zones/zone(0x100496800)/compartment([System Principal], jar:fx-newtab-data@mozilla.org.xpi!/bootstrap.js (from: resource://gre/modules/addons/XPIProvider.jsm:4174))
├──3,107,504 B (02.63%) -- shapes
│  ├──2,038,928 B (01.73%) -- gc-heap
│  │  ├──2,003,200 B (01.70%) ── dict
│  │  ├─────20,680 B (00.02%) ── tree/global-parented
│  │  └─────15,048 B (00.01%) ── base
│  └──1,068,576 B (00.90%) -- malloc-heap
│     ├──1,049,952 B (00.89%) ── dict-tables
│     └─────18,624 B (00.02%) ── compartment-tables
├────560,736 B (00.47%) -- objects
│    ├──531,456 B (00.45%) ── malloc-heap/slots
│    ├───29,280 B (00.02%) ── gc-heap/function
│    └────────0 B (00.00%) ── non-heap/code/asm.js
└─────11,416 B (00.01%) -- sundries
      ├───7,872 B (00.01%) ── gc-heap
      └───3,544 B (00.00%) ── malloc-heap
It looks like the memory storage for the Set is not being reported in the proper compartment. Either it's going somewhere else or it's going to heap-unclassified. OR it is extremely efficient, but I doubt it.. It would be good to understand this before shipping and if there's something we can do to reduce it, as it will add a 3.5MB penalty for users.
Comment on attachment 8485210 [details] [diff] [review]
v3

Review of attachment 8485210 [details] [diff] [review]:
-----------------------------------------------------------------

::: bootstrap.js
@@ +32,5 @@
> +
> +function replaceFunctionality() {
> +  // Save the original functionality to restore on uninstall
> +  let origFunctionality = DirectoryLinksProvider.reportSitesAction;
> +  cleanUp = _ => DirectoryLinksProvider.reportSitesAction = origFunctionality;

arrow functions with no params can be constructed as:

() =>


There are a few across this file that can be changed

@@ +98,5 @@
> +      xhr.onload = _ => {
> +        // Convert the newline separated file into a set
> +        gWhitelist = xhr.responseText.trim().split("\n").reduce((set, val) => {
> +          return set.add(val);
> +        }, new Set());

trim() shouldn't be necessary as you control the input text.

please use a normal loop (or for-of)  to add entries to the Set instead of triggering 50k function calls :)

@@ +103,5 @@
> +
> +        // Now that the data file is ready, replace the existing functionality
> +        replaceFunctionality();
> +      };
> +    xhr.send();

wrong indentation

@@ +112,5 @@
> +
> +// Do nothing unless replaced
> +function cleanUp() {}
> +function shutdown(data, reason) {
> +  cleanUp();

wrap this in:  if (reason != APP_SHUTDOWN)
Attachment #8485210 - Flags: review?(felipc) → feedback+
Sean, we need to specify which locales.  I'm assuming we want all locales.  Can you confirm?
Flags: needinfo?(sbohan)
OOTO last week and didn't see this. 

ALL LOCALES PLEASE
Flags: needinfo?(sbohan)
(In reply to Ed Lee :Mardak from comment #3)
> Note: This will point to https://tiles.up.mozillalabs.com/v2/links/ as
> originally implemented in bug 1042214. The current endpoint for production
> is tiles.services.mozilla.com as changed in bug 1058935.
oyiptong, will it be easier for us to just reuse tiles.services.mozilla.com instead of tiles.up.mozillalabs.com for this experiment? One benefit is that it's known that with bug 1030135, tiles.services.mozilla.com is pinned while we currently don't for tiles.up.mozillalabs.com.

Reusing .services avoids us needing to deploy more updates to .up, but we'll still need to do deploy some updates to .services to do the analysis.

Additionally, if the experiment shows that we're getting and analyzing the data that we need on .services, it's straightforward to turn on the functionality in nightly/aurora/beta which already point to .services.
Flags: needinfo?(oyiptong)
Oh following up on comment 15, one "downside" to reusing .services is that we'll want to avoid any deploys before November 10. Where if we use .up, we could get the experiment out sooner, but I believe we can wait for now unless we get a push to get the data this month.
It seems that we won't release an experiment for another month, so by then we'll target tiles.services.mozilla.com anyway.
Flags: needinfo?(oyiptong)
Here's a sample payload on a new Beta 34 profile and visiting two sites:

{"locale":"en-US","tiles":[{"url":"bugzilla.mozilla.org"},{"url":""},{"id":441},{"id":286},{"id":358},{"id":287},{"id":317},{"id":19},{"id":332},{"id":359},{"id":360},{"id":405},{"id":361}],"view":2}

tspurway, this should be good to send to https://tiles.services.mozilla.com/v2/links/ ?
Flags: needinfo?(tspurway)
Attached patch v4 (obsolete) — Splinter Review
In addition to the code changes from code review below, I made these changes:

+const PING_ENDPOINT = "https://tiles.services.mozilla.com/v2/links/";
Switched to the existing production endpoint that Release/etc use.

+        <em:maxVersion>34.0</em:maxVersion>
Lowered maxversion as the module moved to browser instead of toolkit in 35. We want to run this on Beta 34 anyway.

(In reply to :Felipe Gomes from comment #12)
> arrow functions with no params can be constructed as:
> () =>
> There are a few across this file that can be changed
Fixed up 3 instances of "_ =>" to "() =>"

> > +        gWhitelist = xhr.responseText.trim().split("\n").reduce((set, val) > trim() shouldn't be necessary as you control the input text.
Removed trim and removed EOL from eTLD1.whitelist file.

> please use a normal loop (or for-of)  to add entries to the Set instead of
> triggering 50k function calls :)
Switched to for of.

> > +      };
> > +    xhr.send();
> wrong indentation
Fixed.

> > +function shutdown(data, reason) {
> > +  cleanUp();
> wrap this in:  if (reason != APP_SHUTDOWN)
Done.
Assignee: nobody → edilee
Attachment #8485210 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8525126 - Flags: review?(felipc)
(In reply to Ed Lee :Mardak from comment #18)
> Here's a sample payload on a new Beta 34 profile and visiting two sites:
> 
> {"locale":"en-US","tiles":[{"url":"bugzilla.mozilla.org"},{"url":""},{"id":
> 441},{"id":286},{"id":358},{"id":287},{"id":317},{"id":19},{"id":332},{"id":
> 359},{"id":360},{"id":405},{"id":361}],"view":2}
> 
> tspurway, this should be good to send to
> https://tiles.services.mozilla.com/v2/links/ ?

Mardak, this looks good
Flags: needinfo?(tspurway)
tspurway, because we're using the same production endpoint, should those with the telemetry experiment send the original ping AND the ping with urls (which contains the original ping plus url data)? Or original ping AND ping with urls minus those with directory ids? Or just send one ping with urls?
Flags: needinfo?(tspurway)
Mardak, I am trying to think of a good reason to make it more complicated than just filling in the URL field on the tile, but can't find one.

If there is a URL, put it in the tile.  We can handle everything with a single ping.
Flags: needinfo?(tspurway)
Comment on attachment 8525126 [details] [diff] [review]
v4

I need to fix the double-ping issue by not calling the original implementation.
Attachment #8525126 - Flags: review?(felipc)
Blocks: 1105417
Attached patch v5 (obsolete) — Splinter Review
Add support for new browser module location on 35. Don't call the original functionality, which would result in a double ping sent to the same server.
Attachment #8525126 - Attachment is obsolete: true
Attachment #8530755 - Flags: review?(felipc)
Attachment #8530755 - Attachment is patch: true
Attachment #8530755 - Attachment mime type: text/x-patch → text/plain
Looping in ally to follow along to see some of the decisions made from before, e.g., 50k whitelist, eTLD+2, send to same production server with existing privacy practices.
Iteration: --- → 37.1
Points: --- → 8
Component: New Tab Page → Tiles
Product: Firefox → Content Services
Whiteboard: .?
Version: Trunk → unspecified
Iteration: 37.1 → 37.2
Attached patch v6Splinter Review
Aha! Looks like I was doing things the hard way. ;) Here's an updated patch against telemetry-experiment-server.

Built experiments manifest and served:

$ python build.py out http://localhost:8888/ && cd out && python -m SimpleHTTPServer 8888

Modified Beta 35 profile prefs:

experiments.manifest.uri;http://localhost:8888/firefox-manifest.json
experiments.logging.dump;true
experiments.logging.level;0

Console output:

1418279989811	Browser.Experiments.Experiments	TRACE	ExperimentEntry #0::isApplicable() - now=1418279989.811, randomValue=0.18167829130069946, data={"xpiURL":"http://localhost:8888/newtab-data-beta%40experiments.mozilla.org/experiment.xpi","xpiHash":"sha256:d4fa453f8ca3e54cd539bc3a6094932fdb71cd725393a93b8904ea913bb0edb3","appName":["Firefox"],"maxActiveSeconds":604800,"maxVersion":"37.*","sample":0.25,"startTime":1418169600,"minVersion":"33.0","endTime":1420848000,"id":"newtab-data-beta@experiments.mozilla.org","channel":["beta"]}

Sample ping:

{"locale":"en-US","tiles":[{"pin":1,"url":"cdn.mozilla.net"},{"id":500,"url":"www.mozilla.org"},{"url":""},{"url":"www.engadget.com"},{"id":498},{"id":499},{"id":501},{"id":502},{"id":503},{"id":504},{"id":505},{"id":506},{"id":507}],"view":5}

That's a pinned history tile for https://telemetry-experiment.cdn.mozilla.net/manifest/v1/firefox/35.0/beta, enhanced tile for https://www.mozilla.org/firefox/tiles/, history tile (unreported) for http://ed.agadak.net/, history tile (reported) for http://www.engadget.com/, and various directory tiles. 6 tiles are view-able.
Attachment #8530755 - Attachment is obsolete: true
Attachment #8530755 - Flags: review?(felipc)
Attachment #8534819 - Flags: review?(felipc)
The page at http://localhost:8888/ shows:

New Tab Data

Start Date
    10-Dec-2014
End Date
    10-Jan-2015
Maximum runtime
    7.0 days
Update Channels
    beta
Sample Rate
    25%

An experiment to analyze the data on about:newtab, see bug 1062708.

Related bug
Comment on attachment 8534819 [details] [diff] [review]
v6

Review of attachment 8534819 [details] [diff] [review]:
-----------------------------------------------------------------

::: experiments/newtab-data-beta/code/install.rdf
@@ +5,5 @@
> +    <em:id>newtab-data-beta@experiments.mozilla.org</em:id>
> +    <em:version>1.0.0</em:version>
> +    <em:type>128</em:type>
> +    <em:bootstrap>true</em:bootstrap>
> +    <em:unpack>false</em:unpack>

I wonder if unpack=true would be better given the eTLD1.whitelist file. Check with Benjamin?

::: experiments/newtab-data-beta/manifest.json
@@ +12,5 @@
> +    "appName"          : ["Firefox"],
> +    "channel"          : ["beta"],
> +    "minVersion"       : "33.0",
> +    "maxVersion"       : "37.*",
> +    "sample"           : 0.25

is this expected to go out for all locales?
Attachment #8534819 - Flags: review?(felipc) → review+
Whiteboard: .? → .005
(In reply to :Felipe Gomes from comment #28)
> > +    <em:unpack>false</em:unpack>
> I wonder if unpack=true would be better given the eTLD1.whitelist file.
> Check with Benjamin?
bsmedberg?

> ::: experiments/newtab-data-beta/manifest.json
> @@ +12,5 @@
> > +    "appName"          : ["Firefox"],
> > +    "channel"          : ["beta"],
> > +    "minVersion"       : "33.0",
> > +    "maxVersion"       : "37.*",
> > +    "sample"           : 0.25
> is this expected to go out for all locales?
Yup. The top 50k whitelist includes sites worldwide and there's plenty of sites that share the same domain but localizes with accept-lang.
Flags: needinfo?(benjamin)
We're going to read the whitelist file pretty much every session anyway, right? At that point, leaving the XPI packed probably gives us better I/O performance.
Flags: needinfo?(benjamin)
On the backend, mostlygeek suggested that we should QA the bigger payloads sent by selected users.

This is due to the way we're sending data from front-ends to our data processing pipeline.

It should be fairly straightforward for us to test
Depends on: 1111733
Tested with our production servers showing the raw entry before splitting up the data.

{"tiles":[{"url":"slashdot.org","pin":1},{"url":"arstechnica.com","pin":1},{"url":"www.engadget.com","pin":1},{"id":498,"pin":1},{"id":499,"pin":1},{"id":650},{"id":586},{"id":501},{"id":500},{"id":502},{"id":503},{"id":504},{"id":505},{"id":506},{"id":507}],"locale":"en-US","ip":"12.234.56.78","timestamp":1418766337757,"date":"2014-12-16","ua":"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10.10; rv:35.0) Gecko\/20100101 Firefox\/35.0","view":7}
Blocks: 1113256
Committed https://hg.mozilla.org/webtools/telemetry-experiment-server/rev/10ffcac75422
Tagged https://hg.mozilla.org/webtools/telemetry-experiment-server/rev/fdf692d5664f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1126891
Blocks: 1132683
Blocks: 1135738
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: