Closed Bug 1043669 Opened 10 years ago Closed 10 years ago

Clarify what data to send on click vs view vs fetch

Categories

(Firefox :: New Tab Page, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: Mardak, Unassigned)

References

Details

Bug 975235 added per-click data for directory tiles:
- list id, e.g., server-hosted-list-1
- link index, e.g., 0 for facebook
- tile index, e.g., 0 for top left
- action, e.g., "click" for regular click for visit
- score, e.g., 1000 frecency
- pinned, e.g., true if pinned

Bug 993906 added per-day directory links fetch data:
- locale, e.g., en-US
- optional directoryCount, e.g., {sponsored:1,affiliate:2,organic:3} or undefined

Bug 1042214 will add per-view data for each shown tile:
- link type, e.g., history
- (tile index, e.g., 0 for top left -- implied through the position)
- score, e.g., 12345 frecency
- pinned, e.g., true if pinned
- url, e.g., http://foo.bar


Do we still need directoryCount with the daily fetch request? We get that through the view ping, but is it needed to serve the links data?

For the per-view url data, do we want to cleanse/filter out some urls on the client instead of the server? E.g., passwords or intranet hostnames?

Should the per-view send back "null" if the user set it to blank tiles? Should we distinguish it from "[]" if a user blocked out all tiles even directory tiles?
Blocks: 1043674
Sounds like we'll want HLL value from bug 972933 to be sent with the click and view pings. Not sure about fetch?
Blocks: 972933
To be clear, the HLL value is to get cardinality estimates because we don't want to have unique IDs to reidentify users even though partners would really want that ability.
Blocks: 1030832
No longer blocks: 1030832
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
clarkbw, I got an update from oyiptong that we'll need the HLL value from bug 972933. And that impression and click pings will look the same (sending all visible tiles data) except click pings will have an "action" value for the appropriate tile.

E.g.,
click ping = [{tile 1 url}, {tile 2 url, action = click}, {tile 3 url}]
click ping = [{tile 1 url}, {tile 2 url}, {tile 3 url, action = block}]
payload
-------

We went back to our measurement requirements and here is what we'd like to be sent on click and impression, as a POST request with a JSON payload:

{
  locale: _, # locale for user agent
  urls: [ # metadata for list of urls currently appearing in a newtab page, including history links
    {
      tile_id: _, # an id provided in the fetch payload, if a tile
      url: _, # a url if a history tile
      pos: _, # the position of the tile in the grid
      pinned: _, # boolean, whether or not the tile is pinned
      action: _ # one of ["click", "block", "pin"] or null
    },
    ...
  ],
  hll: { count: _, index: _ } # HLL parameters
}

Can this payload be sent both for impression and click?
For impression, the action will be absent for all urls.

url meta data
-------------

* if the tile is an enhanced history tile, it will have both the tile_id and url
* if the tile is a directory tile, it will only have a tile_id
* if the tile is a history tile, it will have just the url

source attribution
------------------

for tiles that are "enhanced", when a click happen, can we attach a header?

Referer: https://tiles.mozilla.org/?type=directory
(In reply to Olivier Yiptong [:oyiptong] from comment #4)
> source attribution
> for tiles that are "enhanced", when a click happen, can we attach a header?
> Referer: https://tiles.mozilla.org/?type=directory
You're saying if the original link, say bugzilla.mozilla.org enhanced by mozilla.org, the browser should load bugzilla.mozilla.org and set the HTTP Referer header? What is this for?
clarkbw, should I also remove the telemetry probes for shown (per index and per type) and click (per type)?
Flags: needinfo?(clarkbw)
Flags: needinfo?(clarkbw)
In the interest of optimizing the size of the message and making it a bit easier to parse, I suggest replacing the 'action' attribute in each URL.  Instead we have a single attribute at the top level that indicates the action ('click', 'block', or 'pin') whose value is the *index* of the tile in the 'urls' array.
In the interest of optimizing the size of the message, we could remove the 'pos' attribute in the 'urls' array and just rely on the natural index of the tile within the array.  Unless, of course this is a sparse array.
(In reply to Tim Spurway [:tspurway] from comment #7)
> I suggest replacing the 'action' attribute in each URL.
My current patch has "action" (as well as "id", "url", "pin") as optional. So only the one tile that has an action will have an "action" entry.

(In reply to Tim Spurway [:tspurway] from comment #8)
> In the interest of optimizing the size of the message, we could remove the
> 'pos' attribute in the 'urls' array and just rely on the natural index of
> the tile within the array.  Unless, of course this is a sparse array.
Sounds good. The logic will be if 'pos' would be the same as the natural index, exclude it. E.g., [{id:"foo"},{url:"http://bar",pinned:1,pos:8}] for 2 tiles shown where the first tile is a directory tile and the second tile is a pinned history tile in the bottom right of a 3x3 grid.
(In reply to Ed Lee :Mardak from comment #5)
> (In reply to Olivier Yiptong [:oyiptong] from comment #4)
> > source attribution
> > for tiles that are "enhanced", when a click happen, can we attach a header?
> > Referer: https://tiles.mozilla.org/?type=directory
> You're saying if the original link, say bugzilla.mozilla.org enhanced by
> mozilla.org, the browser should load bugzilla.mozilla.org and set the HTTP
> Referer header? What is this for?

Yes. At some point, either Sean or Jason spoke about tile targets wanting to know that traffic came from Tiles. One way to do so would be to have a referrer url.

We can probably do that later, tracked by a separate bug. It's also not clear if this became a requirement.
oyiptong, I asked in bug 1042214 comment 14 about including the window client resolution (not the screen resolution) in pings. If we do want that data, should this just appear at the top level next to locale: en-US?

{
  "locale": "en-US",
  "height": 799,
  "width": 1436,
  "tiles": [..]
}

Or maybe "res": "1436x799" or "size": [1436,799] ?

Also, if we're saying view/click/fetch all share the same payload, this size information would be optional (in addition to tiles data optional) for fetch. (There's no associated window for fetch and there could be multiple monitors, etc.)
(In reply to Ed Lee :Mardak from comment #11)
> {
>   "locale": "en-US",
>   "height": 799,
>   "width": 1436,
>   "tiles": [..]
> }
> 
> Or maybe "res": "1436x799" or "size": [1436,799] ?
> 
separate height and width fields are OK.  

> Also, if we're saying view/click/fetch all share the same payload, this size
> information would be optional (in addition to tiles data optional) for
> fetch. (There's no associated window for fetch and there could be multiple
> monitors, etc.)
OK.  I will make the required changes in the inferno rules and Redshift schema.
clarkbw, I added 2 more action types for bug 1040369: sponsored (click or dismiss the sponsored explanation text), sponsored-link (click on the Learn more link within the explanation area). Do we care about them? If so, tspurway/oyiptong, we'll need to update rules/schema again.
Flags: needinfo?(clarkbw)
Sample payloads:

{"locale":"en-US","tiles":[{"id":5,"url":"https://www.mozilla.org/en-US/about/"},{"id":1},{"id":2},{"id":3},{"id":6},{"id":7},{"id":8},{"id":9},{"id":10}],"sponsored-link":3}

clicked "learn more..." for wired


{"locale":"en-US","tiles":[{"id":5,"url":"https://www.mozilla.org/en-US/about/"},{"id":1},{"id":2},{"id":3},{"id":6},{"id":7},{"id":8},{"id":9},{"id":10}],"sponsored":3}

clicked "sponsored" to show/hide the explanation text
(In reply to Ed Lee :Mardak from comment #14)
> Sample payloads:
> 
> {"locale":"en-US","tiles":[{"id":5,"url":"https://www.mozilla.org/en-US/
> about/"},{"id":1},{"id":2},{"id":3},{"id":6},{"id":7},{"id":8},{"id":9},
> {"id":10}],"sponsored-link":3}
> 
> clicked "learn more..." for wired
> 
> 
> {"locale":"en-US","tiles":[{"id":5,"url":"https://www.mozilla.org/en-US/
> about/"},{"id":1},{"id":2},{"id":3},{"id":6},{"id":7},{"id":8},{"id":9},
> {"id":10}],"sponsored":3}
> 
> clicked "sponsored" to show/hide the explanation text

Can I make a request that it be "sponsored_link" instead of "sponsored-link" ?  In general, it helps if we keep JSON properties legal identifiers so that we don't need to have a bunch of little transformations as it travels towards the data warehouse.
(In reply to Tim Spurway [:tspurway] from comment #15)
> Can I make a request that it be "sponsored_link" instead of "sponsored-link"
Not a problem. I'm guessing nothing is camelCase, so not sponsoredLink? Or slink / sLink?

> it helps if we keep JSON properties legal identifiers
Curious, what part of the data processing is constraining this identifier? JSON accepts any string I believe, so it could be "sponsored link" too.
(In reply to Ed Lee :Mardak from comment #16)
> > it helps if we keep JSON properties legal identifiers
> Curious, what part of the data processing is constraining this identifier?
> JSON accepts any string I believe, so it could be "sponsored link" too.

Well, it's not a huge problem, but it's convenient to reuse the JSON keys as column names in the database.  It saves one line of code, in this case:

parts['sponsored_link'] = parts['sponsored-link']

The problem becomes more pronounced when there are many such transformations, type conversions, and non-standard naming across multiple log files.
(In reply to Tim Spurway [:tspurway] from comment #17)
> Well, it's not a huge problem, but it's convenient to reuse the JSON keys as
> column names in the database.
I was asking more to just figure out what the actual constraints are. So as table columns, I would assume they're case insensitive?

No "click" for link click and "Click" for sponsored link click? ;) :p So we'll want to avoid camelCasing too.

I've changed the client to use "sponsored_link" as the action.
(In reply to Ed Lee :Mardak from comment #18)
> I was asking more to just figure out what the actual constraints are. So as
> table columns, I would assume they're case insensitive?
> 
> No "click" for link click and "Click" for sponsored link click? ;) :p So
> we'll want to avoid camelCasing too.
> 
> I've changed the client to use "sponsored_link" as the action.

right.  i would keep the following constraints regarding logfile key names:
- all lowercase
- delimit words with '_'s
- all dictionary keys should be valid identifiers in Javascript, Python and SQL (avoid reserved words if possible)
- name keys precisely.  don't worry about key length, compression is your friend
- invent new keys *only* when you can't find a suitable key in your codebase, other logfiles, or database schemas project-wide
- mind your datatypes, use string types only for string data.  (BAD: "impression_count": "15")
- consider if using JSON 'null' helps or hinders.  an alternative is to make the key optional
- ISO dates are sortable as strings (eg. 2014-10-27)
- timestamps in milliseconds
(In reply to Ed Lee :Mardak from comment #13)
> clarkbw, I added 2 more action types for bug 1040369: sponsored (click or
> dismiss the sponsored explanation text), sponsored-link (click on the Learn
> more link within the explanation area). Do we care about them? If so,
> tspurway/oyiptong, we'll need to update rules/schema again.

Interesting, but not required right now.
Flags: needinfo?(clarkbw)
You need to log in before you can comment on or make changes to this bug.